Sample records for bayesian clustering analysis

  1. Semiparametric Bayesian analysis of accelerated failure time models with cluster structures. (United States)

    Li, Zhaonan; Xu, Xinyi; Shen, Junshan


    In this paper, we develop a Bayesian semiparametric accelerated failure time model for survival data with cluster structures. Our model allows distributional heterogeneity across clusters and accommodates their relationships through a density ratio approach. Moreover, a nonparametric mixture of Dirichlet processes prior is placed on the baseline distribution to yield full distributional flexibility. We illustrate through simulations that our model can greatly improve estimation accuracy by effectively pooling information from multiple clusters, while taking into account the heterogeneity in their random error distributions. We also demonstrate the implementation of our method using analysis of Mayo Clinic Trial in Primary Biliary Cirrhosis. Copyright © 2017 John Wiley & Sons, Ltd.

  2. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. (United States)

    Mo, Qianxing; Shen, Ronglai; Guo, Cui; Vannucci, Marina; Chan, Keith S; Hilsenbeck, Susan G


    Identification of clinically relevant tumor subtypes and omics signatures is an important task in cancer translational research for precision medicine. Large-scale genomic profiling studies such as The Cancer Genome Atlas (TCGA) Research Network have generated vast amounts of genomic, transcriptomic, epigenomic, and proteomic data. While these studies have provided great resources for researchers to discover clinically relevant tumor subtypes and driver molecular alterations, there are few computationally efficient methods and tools for integrative clustering analysis of these multi-type omics data. Therefore, the aim of this article is to develop a fully Bayesian latent variable method (called iClusterBayes) that can jointly model omics data of continuous and discrete data types for identification of tumor subtypes and relevant omics features. Specifically, the proposed method uses a few latent variables to capture the inherent structure of multiple omics data sets to achieve joint dimension reduction. As a result, the tumor samples can be clustered in the latent variable space and relevant omics features that drive the sample clustering are identified through Bayesian variable selection. This method significantly improves on the existing integrative clustering method iClusterPlus in terms of statistical inference and computational speed. By analyzing TCGA and simulated data sets, we demonstrate the excellent performance of the proposed method in revealing clinically meaningful tumor subtypes and driver omics features. © The Author 2017. Published by Oxford University Press. All rights reserved.

  3. Bayesian Decision Theoretical Framework for Clustering (United States)

    Chen, Mo


    In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…

  4. Bayesian Mediation Analysis (United States)

    Yuan, Ying; MacKinnon, David P.


    In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…

  5. Identifying the source of farmed escaped Atlantic salmon (Salmo salar): Bayesian clustering analysis increases accuracy of assignment

    DEFF Research Database (Denmark)

    Glover, Kevin A.; Hansen, Michael Møller; Skaala, Oystein


    Farmed Atlantic salmon escapees represent a significant threat to the genetic integrity of natural populations. Not all escapement events are reported, and consequently, there is a need to develop an effective tool for the identification of escapees. In this study, > 2200 salmon were collected from 44 cages located on 26 farms in the Hardangerfjord, western Norway. This fjord represents one of the major salmon farming areas in Norway, with a production of 57,000 t in 2007. Based upon genetic data from 17 microsatellite markers, significant but highly variable differentiation was observed among the 44 samples (cages), with pair-wise FST values ranging between 0.000 and 0.185. Bayesian clustering of the samples revealed five major genetic groups, into which the 44 samples were re-organised. Bayesian clustering also identified two samples consisting of fish with mixed genetic background...

  6. Bayesian data analysis for newcomers. (United States)

    Kruschke, John K; Liddell, Torrin M


    This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.

  7. Bayesian Exploratory Factor Analysis

    DEFF Research Database (Denmark)

    Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.


    This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...

  8. Bayesian Exploratory Factor Analysis (United States)

    Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi


    This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates from a high dimensional set of psychological measurements. PMID:25431517

  9. Bayesian logistic regression analysis

    NARCIS (Netherlands)

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.


    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an

  10. Bayesian Independent Component Analysis

    DEFF Research Database (Denmark)

    Winther, Ole; Petersen, Kaare Brandt


    In this paper we present an empirical Bayesian framework for independent component analysis. The framework provides estimates of the sources, the mixing matrix and the noise parameters, and is flexible with respect to choice of source prior and the number of sources and sensors. Inside the engine... in a Matlab toolbox, is demonstrated for non-negative decompositions and compared with non-negative matrix factorization.

  11. Bayesian nonparametric data analysis

    CERN Document Server

    Müller, Peter; Jara, Alejandro; Hanson, Tim


    This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.

  12. Bayesian Nonparametric Clustering for Positive Definite Matrices. (United States)

    Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos


    Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.

  13. Bayesian Data Analysis (lecture 1)

    CERN Multimedia

    CERN. Geneva


    framework but we will also go into more detail and discuss, for example, the role of the prior. The second part of the lecture will cover further examples and applications that rely heavily on the Bayesian approach, as well as some computational tools needed to perform a Bayesian analysis.

  14. Bayesian Data Analysis (lecture 2)

    CERN Multimedia

    CERN. Geneva


    framework but we will also go into more detail and discuss, for example, the role of the prior. The second part of the lecture will cover further examples and applications that rely heavily on the Bayesian approach, as well as some computational tools needed to perform a Bayesian analysis.

  15. A Bayesian approach to two-mode clustering

    NARCIS (Netherlands)

    A. van Dijk (Bram); J.M. van Rosmalen (Joost); R. Paap (Richard)


    We develop a new Bayesian approach to estimate the parameters of a latent-class model for the joint clustering of both modes of two-mode data matrices. Posterior results are obtained using a Gibbs sampler with data augmentation. Our Bayesian approach has three advantages over existing

  16. Clustering analysis

    International Nuclear Information System (INIS)



    Cluster analysis is the name of a group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they possess. Several algorithms can be used for this analysis. This topic therefore discusses those algorithms, such as similarity measures and hierarchical clustering, which includes the single linkage, complete linkage and average linkage methods. The non-hierarchical clustering method popularly known as the K-means method will also be discussed. Finally, this paper will describe the advantages and disadvantages of each method.

  17. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel


    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real life examples are used throughout to demons

  18. Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza. (United States)

    Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A


    Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
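
    For orientation, the standard (non-phylogenetic) Chinese restaurant process that this abstract extends can be sketched in a few lines. This is only an illustrative sketch and not the authors' phylogenetic model: the function name `sample_crp`, the concentration parameter `alpha`, and the seed are assumptions introduced here.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw one partition from a standard Chinese restaurant process.

    Returns a list of table (cluster) labels, one per customer.
    `alpha` is the concentration parameter (hypothetical name).
    """
    if n_customers < 0 or alpha <= 0:
        raise ValueError("n_customers must be >= 0 and alpha must be > 0")
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of customers at table k
    labels = []
    for _ in range(n_customers):
        # An existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new table
        else:
            table_sizes[k] += 1     # join an existing table
        labels.append(k)
    return labels


if __name__ == "__main__":
    print(sample_crp(20, alpha=1.0))
```

    The number of clusters is not fixed in advance; larger values of `alpha` tend to produce more tables.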

  19. Seismic Signal Compression Using Nonparametric Bayesian Dictionary Learning via Clustering

    Directory of Open Access Journals (Sweden)

    Xin Tian


    We introduce a seismic signal compression method based on nonparametric Bayesian dictionary learning via clustering. The seismic data is compressed patch by patch, and the dictionary is learned online. Clustering is introduced for dictionary learning. A set of dictionaries could be generated, and each dictionary is used for one cluster’s sparse coding. In this way, the signals in one cluster could be well represented by their corresponding dictionaries. A nonparametric Bayesian dictionary learning method is used to learn the dictionaries, which naturally infers an appropriate dictionary size for each cluster. A uniform quantizer and an adaptive arithmetic coding algorithm are adopted to code the sparse coefficients. With comparisons to other state-of-the-art approaches, the effectiveness of the proposed method could be validated in the experiments.

  20. Bayesian analysis of CCDM models

    Energy Technology Data Exchange (ETDEWEB)

    Jesus, J.F. [Universidade Estadual Paulista (Unesp), Câmpus Experimental de Itapeva, Rua Geraldo Alckmin 519, Vila N. Sra. de Fátima, Itapeva, SP, 18409-010 Brazil]; Valentim, R. [Departamento de Física, Instituto de Ciências Ambientais, Químicas e Farmacêuticas—ICAQF, Universidade Federal de São Paulo (UNIFESP), Unidade José Alencar, Rua São Nicolau No. 210, Diadema, SP, 09913-030 Brazil]; Andrade-Oliveira, F. [Institute of Cosmology and Gravitation—University of Portsmouth, Burnaby Road, Portsmouth, PO1 3FX United Kingdom]


    Creation of Cold Dark Matter (CCDM), in the context of Einstein Field Equations, produces a negative pressure term which can be used to explain the accelerated expansion of the Universe. In this work we tested six different spatially flat models for matter creation using statistical criteria, in light of SNe Ia data: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Bayesian Evidence (BE). These criteria allow models to be compared considering goodness of fit and number of free parameters, penalizing excess complexity. We find that the JO model is slightly favoured over the LJO/ΛCDM model; however, neither of these, nor the Γ = 3αH_0 model, can be discarded from the current analysis. Three other scenarios are discarded either because of poor fit or because of an excess of free parameters. A method of increasing Bayesian evidence through reparameterization in order to reduce parameter degeneracy is also developed.
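
    As a reminder of how two of these criteria trade goodness of fit against the number of free parameters, the textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L can be sketched as follows. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical fits: (maximized log-likelihood, number of free parameters).
    fits = {"model_A": (-270.0, 1), "model_B": (-269.5, 3)}
    n_obs = 500  # hypothetical number of data points
    for name, (log_l, k) in fits.items():
        print(name, aic(log_l, k), bic(log_l, k, n_obs))
```

    In this made-up example the extra parameters of `model_B` buy only a small gain in likelihood, so both criteria prefer `model_A`; BIC penalizes the extra parameters more strongly once ln n exceeds 2.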

  1. Multiview Bayesian Correlated Component Analysis

    DEFF Research Database (Denmark)

    Kamronn, Simon Due; Poulsen, Andreas Trier; Hansen, Lars Kai


    are identical. Here we propose a hierarchical probabilistic model that can infer the level of universality in such multiview data, from completely unrelated representations, corresponding to canonical correlation analysis, to identical representations as in correlated component analysis. This new model, which we denote Bayesian correlated component analysis, evaluates favorably against three relevant algorithms in simulated data. A well-established benchmark EEG data set is used to further validate the new model and infer the variability of spatial representations across multiple subjects.

  2. Bayesian analysis of rare events (United States)

    Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang


    In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.
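
    The classical rejection-sampling view of Bayesian updating that BUS reinterprets can be sketched as follows: draw a parameter from the prior and accept it with probability L(θ)/c, where c bounds the likelihood from above. This is a plain rejection sampler under stated assumptions, not the BUS implementation with FORM, IS or SuS; the coin-bias example, the function names and the constant `c` are all hypothetical.

```python
import random


def rejection_posterior(sample_prior, likelihood, c, n_proposals, seed=0):
    """Return prior draws accepted with probability likelihood(theta) / c.

    `c` must satisfy c >= likelihood(theta) for all theta; the accepted
    draws are then distributed according to the posterior.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_proposals):
        theta = sample_prior(rng)
        if rng.random() * c < likelihood(theta):
            accepted.append(theta)
    return accepted


def coin_likelihood(theta):
    """Likelihood of 7 heads and 3 tails for a coin with bias theta."""
    return theta ** 7 * (1.0 - theta) ** 3


# The likelihood is maximized at theta = 0.7, which gives a valid bound c.
c = coin_likelihood(0.7)
samples = rejection_posterior(lambda rng: rng.random(), coin_likelihood, c, 20000)
posterior_mean = sum(samples) / len(samples)

if __name__ == "__main__":
    # With a Uniform(0, 1) prior the exact posterior is Beta(8, 4), mean 2/3.
    print(len(samples), posterior_mean)
```

    When the data are informative the acceptance probability becomes very small, which is the rare-event structure that the abstract exploits with dedicated reliability methods.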

  3. Advances in Bayesian Model Based Clustering Using Particle Learning

    Energy Technology Data Exchange (ETDEWEB)

    Merl, D M


    Recent work by Carvalho, Johannes, Lopes and Polson and Carvalho, Lopes, Polson and Taddy introduced a sequential Monte Carlo (SMC) alternative to traditional iterative Monte Carlo strategies (e.g. MCMC and EM) for Bayesian inference for a large class of dynamic models. The basis of SMC techniques involves representing the underlying inference problem as one of state space estimation, thus giving way to inference via particle filtering. The key insight of Carvalho et al was to construct the sequence of filtering distributions so as to make use of the posterior predictive distribution of the observable, a distribution usually only accessible in certain Bayesian settings. Access to this distribution allows a reversal of the usual propagate and resample steps characteristic of many SMC methods, thereby alleviating to a large extent many problems associated with particle degeneration. Furthermore, Carvalho et al point out that for many conjugate models the posterior distribution of the static variables can be parametrized in terms of [recursively defined] sufficient statistics of the previously observed data. For models where such sufficient statistics exist, particle learning, as it is being called, is especially well suited for the analysis of streaming data due to the relative invariance of its algorithmic complexity with the number of data observations. Through a particle learning approach, a statistical model can be fit to data as the data is arriving, allowing at any instant during the observation process direct quantification of uncertainty surrounding underlying model parameters. Here we describe the use of a particle learning approach for fitting a standard Bayesian semiparametric mixture model as described in Carvalho, Lopes, Polson and Taddy. In Section 2 we briefly review the previously presented particle learning algorithm for the case of a Dirichlet process mixture of multivariate normals. In Section 3 we describe several novel extensions to the original

  4. Bayesian Model Averaging for Propensity Score Analysis (United States)

    Kaplan, David; Chen, Jianshen


    The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…

  5. Bayesian methods for data analysis

    CERN Document Server

    Carlin, Bradley P.


    Approaches for statistical inference: Introduction; Motivating Vignettes; Defining the Approaches; The Bayes-Frequentist Controversy; Some Basic Bayesian Models. The Bayes approach: Introduction; Prior Distributions; Bayesian Inference; Hierarchical Modeling; Model Assessment; Nonparametric Methods. Bayesian computation: Introduction; Asymptotic Methods; Noniterative Monte Carlo Methods; Markov Chain Monte Carlo Methods. Model criticism and selection: Bayesian Modeling; Bayesian Robustness; Model Assessment; Bayes Factors via Marginal Density Estimation; Bayes Factors

  6. Bayesian Factor Analysis. (United States)


    Fundamental Factors of Comprehension in... University of Iowa/Novick 8 March 1985 Dr. James McBride Program Manager for Manpower, Psychological...Princeton, NJ 08541 Dr. Vern W. Urry Dr. Peter Stoloff Personnel R&D Center Center for Naval Analysis Office of Personnel Management 200 North

  7. Bayesian versus frequentist statistical inference for investigating a one-off cancer cluster reported to a health department

    Directory of Open Access Journals (Sweden)

    Wills Rachael A


    Background: The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods: This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results: Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion: In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
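
    The two ingredients named in the Methods can be sketched briefly. The Dunn-Sidak adjustment for m comparisons is p_adj = 1 - (1 - p)^m. For the Bayesian side, the sketch assumes the usual conjugate setup in which the observed case count is Poisson with mean (rate ratio × expected count) and the rate ratio has a Gamma(shape, rate) prior; the abstract only says the methods were "based on the Gamma distribution", so this model, the function names and the numbers are assumptions made for illustration.

```python
def sidak_adjust(p_value, n_comparisons):
    """Dunn-Sidak adjusted p-value for n_comparisons independent tests."""
    if not 0.0 <= p_value <= 1.0 or n_comparisons < 1:
        raise ValueError("need 0 <= p_value <= 1 and n_comparisons >= 1")
    return 1.0 - (1.0 - p_value) ** n_comparisons


def gamma_poisson_update(prior_shape, prior_rate, observed, expected):
    """Posterior Gamma(shape, rate) for a rate ratio.

    Assumes observed ~ Poisson(rate_ratio * expected) and a
    Gamma(prior_shape, prior_rate) prior on rate_ratio (conjugate update).
    """
    return prior_shape + observed, prior_rate + expected


if __name__ == "__main__":
    # A nominal p = 0.01 under a guessed 100 silent comparisons.
    print(sidak_adjust(0.01, 100))
    # Hypothetical cluster: 12 observed cases where 4 were expected.
    shape, rate = gamma_poisson_update(0.5, 0.5, observed=12, expected=4.0)
    print(shape, rate, shape / rate)  # posterior mean of the rate ratio
```

    The adjusted p-value depends entirely on the guessed number of comparisons, and the posterior depends on the chosen prior, which mirrors the subjectivity the abstract describes for both approaches.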

  8. Bayesian analysis in plant pathology. (United States)

    Mila, A L; Carriquiry, A L


    Bayesian methods are currently much discussed and applied in several disciplines from molecular biology to engineering. Bayesian inference is the process of fitting a probability model to a set of data and summarizing the results via probability distributions on the parameters of the model and unobserved quantities such as predictions for new observations. In this paper, after a short introduction of Bayesian inference, we present the basic features of Bayesian methodology using examples from sequencing genomic fragments and analyzing microarray gene-expression levels, reconstructing disease maps, and designing experiments.

  9. Low-Complexity Bayesian Estimation of Cluster-Sparse Channels

    KAUST Repository

    Ballal, Tarig


    This paper addresses the problem of channel impulse response estimation for cluster-sparse channels under the Bayesian estimation framework. We develop a novel low-complexity minimum mean squared error (MMSE) estimator by exploiting the sparsity of the received signal profile and the structure of the measurement matrix. It is shown that due to the banded Toeplitz/circulant structure of the measurement matrix, a channel impulse response, such as underwater acoustic channel impulse responses, can be partitioned into a number of orthogonal or approximately orthogonal clusters. The orthogonal clusters, the sparsity of the channel impulse response and the structure of the measurement matrix, all combined, result in a computationally superior realization of the MMSE channel estimator. The MMSE estimator calculations boil down to simpler in-cluster calculations that can be reused in different clusters. The reduction in computational complexity allows for a more accurate implementation of the MMSE estimator. The proposed approach is tested using synthetic Gaussian channels, as well as simulated underwater acoustic channels. Symbol-error-rate performance and computation time confirm the superiority of the proposed method compared to selected benchmark methods in systems with preamble-based training signals transmitted over cluster-sparse channels.

  10. Implementing the Bayesian paradigm in risk analysis

    International Nuclear Information System (INIS)

    Aven, T.; Kvaloey, J.T.


    The Bayesian paradigm comprises a unified and consistent framework for analyzing and expressing risk. Yet, we see rather few examples of applications where the full Bayesian setting has been adopted with specifications of priors of unknown parameters. In this paper, we discuss some of the practical challenges of implementing Bayesian thinking and methods in risk analysis, emphasizing the introduction of probability models and parameters and associated uncertainty assessments. We conclude that there is a need for a pragmatic view in order to 'successfully' apply the Bayesian approach, such that we can do the assignments of some of the probabilities without adopting the somewhat sophisticated procedure of specifying prior distributions of parameters. A simple risk analysis example is presented to illustrate ideas

  11. Bayesian analysis for the social sciences

    CERN Document Server

    Jackman, Simon


    Bayesian methods are increasingly being used in the social sciences, as the problems encountered lend themselves so naturally to the subjective qualities of Bayesian methodology. This book provides an accessible introduction to Bayesian methods, tailored specifically for social science students. It contains lots of real examples from political science, psychology, sociology, and economics, exercises in all chapters, and detailed descriptions of all the key concepts, without assuming any background in statistics beyond a first course. It features examples of how to implement the methods using WinBUGS - the most-widely used Bayesian analysis software in the world - and R - an open-source statistical software. The book is supported by a Website featuring WinBUGS and R code, and data sets.

  12. Reliability analysis with Bayesian networks


    Zwirglmaier, Kilian Martin


    Bayesian networks (BNs) represent a probabilistic modeling tool with large potential for reliability engineering. While BNs have been successfully applied to reliability engineering, there are remaining issues, some of which are addressed in this work. Firstly a classification of BN elicitation approaches is proposed. Secondly two approximate inference approaches, one of which is based on discretization and the other one on sampling, are proposed. These approaches are applicable to hybrid/con...

  13. Robust bayesian analysis of an autoregressive model with ...

    African Journals Online (AJOL)

    In this work, robust Bayesian analysis of the Bayesian estimation of an autoregressive model with exponential innovations is performed. Using a Bayesian robustness methodology, we show that, using a suitable generalized quadratic loss, we obtain optimal Bayesian estimators of the parameters corresponding to the ...

  14. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R


    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o

  15. Bayesian Meta-Analysis of Coefficient Alpha (United States)

    Brannick, Michael T.; Zhang, Nanhua


    The current paper describes and illustrates a Bayesian approach to the meta-analysis of coefficient alpha. Alpha is the most commonly used estimate of the reliability or consistency (freedom from measurement error) for educational and psychological measures. The conventional approach to meta-analysis uses inverse variance weights to combine…
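
    The conventional inverse-variance weighting mentioned in this abstract can be sketched as follows. This is a minimal fixed-effect illustration, not the authors' Bayesian method; the function name and the three study values are hypothetical.

```python
import math

def inverse_variance_pool(estimates, variances):
    """Fixed-effect pooled estimate using inverse-variance weights."""
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need equally sized, non-empty inputs")
    # Studies with smaller sampling variance receive larger weights.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    # Standard error of the pooled estimate under the fixed-effect model.
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical alpha estimates and sampling variances from three studies.
alphas = [0.80, 0.85, 0.90]
variances = [0.004, 0.002, 0.004]
pooled, se = inverse_variance_pool(alphas, variances)
print(f"pooled alpha = {pooled:.3f}, SE = {se:.4f}")
```

    The middle study has half the variance of the others, so it carries twice their weight in the pooled value.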

  16. Bayesian Nonparametric Longitudinal Data Analysis. (United States)

    Quintana, Fernando A; Johnson, Wesley O; Waetjen, Elaine; Gold, Ellen


    Practical Bayesian nonparametric methods have been developed across a wide variety of contexts. Here, we develop a novel statistical model that generalizes standard mixed models for longitudinal data that include flexible mean functions as well as combined compound symmetry (CS) and autoregressive (AR) covariance structures. AR structure is often specified through the use of a Gaussian process (GP) with covariance functions that allow longitudinal data to be more correlated if they are observed closer in time than if they are observed farther apart. We allow for AR structure by considering a broader class of models that incorporates a Dirichlet Process Mixture (DPM) over the covariance parameters of the GP. We are able to take advantage of modern Bayesian statistical methods in making full predictive inferences and inferences about characteristics of longitudinal profiles and their differences across covariate combinations. We also take advantage of the generality of our model, which provides for estimation of a variety of covariance structures. We observe that models that fail to incorporate CS or AR structure can result in very poor estimation of a covariance or correlation matrix. In our illustration using hormone data observed on women through the menopausal transition, biology dictates the use of a generalized family of sigmoid functions as a model for time trends across subpopulation categories.

  17. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša


    One area of application of cluster analysis in marketing is the identification of groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.

  18. On Bayesian System Reliability Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Soerensen Ringi, M.


    The view taken in this thesis is that reliability, the probability that a system will perform a required function for a stated period of time, depends on a person's state of knowledge. Reliability changes as this state of knowledge changes, i.e. when new relevant information becomes available. Most existing models for system reliability prediction are developed in a classical framework of probability theory and they overlook some information that is always present. Probability is just an analytical tool to handle uncertainty, based on judgement and subjective opinions. It is argued that the Bayesian approach gives a much more comprehensive understanding of the foundations of probability than the so-called frequentistic school. A new model for system reliability prediction is given in two papers. The model incorporates the fact that component failures are dependent because of a shared operational environment. The suggested model also naturally permits learning from failure data of similar components in non-identical environments. 85 refs.
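
    The idea that reliability changes as the state of knowledge changes can be illustrated with a conjugate Beta-Binomial update. This is a generic textbook sketch, not the dependent-failure model proposed in the thesis; it assumes each demand on the system is an independent pass/fail trial, and the prior and test counts are hypothetical.

```python
def update_reliability(prior_a, prior_b, successes, failures):
    """Conjugate Beta-Binomial update.

    Returns the posterior Beta parameters (a, b) and the posterior mean,
    read here as the current reliability assessment.
    """
    a = prior_a + successes
    b = prior_b + failures
    return a, b, a / (a + b)

# Hypothetical prior Beta(2, 2): a weak belief centred on 0.5.
a, b, mean = update_reliability(2.0, 2.0, successes=0, failures=0)
print(f"before data: {mean:.3f}")

# Hypothetical new information: 18 demands met, 2 failures.
a, b, mean = update_reliability(a, b, successes=18, failures=2)
print(f"after data:  {mean:.3f}")
```

    The assessed reliability moves from the prior mean towards the observed success rate as evidence accumulates.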

  19. On Bayesian System Reliability Analysis

    International Nuclear Information System (INIS)

    Soerensen Ringi, M.


    The view taken in this thesis is that reliability, the probability that a system will perform a required function for a stated period of time, depends on a person's state of knowledge. Reliability changes as this state of knowledge changes, i.e. when new relevant information becomes available. Most existing models for system reliability prediction are developed in a classical framework of probability theory and they overlook some information that is always present. Probability is just an analytical tool to handle uncertainty, based on judgement and subjective opinions. It is argued that the Bayesian approach gives a much more comprehensive understanding of the foundations of probability than the so-called frequentistic school. A new model for system reliability prediction is given in two papers. The model incorporates the fact that component failures are dependent because of a shared operational environment. The suggested model also naturally permits learning from failure data of similar components in non-identical environments. 85 refs.

  20. Combining morphological analysis and Bayesian networks for ...

    African Journals Online (AJOL)

    ... how these two computer aided methods may be combined to better facilitate modelling procedures. A simple example is presented, concerning a recent application in the field of environmental decision support. Keywords: Morphological analysis, Bayesian networks, strategic decision support. ORiON Vol. 23 (2) 2007: pp.

  1. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  2. CLEAN: CLustering Enrichment ANalysis (United States)

    Freudenberg, Johannes M; Joshi, Vineet K; Hu, Zhen; Medvedovic, Mario


    Background: Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results: We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at . The package integrates routines for calculating gene-specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView). Conclusion: Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the conclusions made about clusters of co

  3. Bayesian estimation and modeling: Editorial to the second special issue on Bayesian data analysis. (United States)

    Chow, Sy-Miin; Hoijtink, Herbert


    This editorial accompanies the second special issue on Bayesian data analysis published in this journal. The emphases of this issue are on Bayesian estimation and modeling. In this editorial, we outline the basics of current Bayesian estimation techniques and some notable developments in the statistical literature, as well as adaptations and extensions by psychological researchers to better tailor to the modeling applications in psychology. We end with a discussion on future outlooks of Bayesian data analysis in psychology. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  4. A Gentle Introduction to Bayesian Analysis : Applications to Developmental Research

    NARCIS (Netherlands)

    Van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B.; Neyer, Franz J.; van Aken, Marcel A G


    Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First,

  5. A gentle introduction to Bayesian analysis : Applications to developmental research

    NARCIS (Netherlands)

    van de Schoot, R.; Kaplan, D.; Denissen, J.J.A.; Asendorpf, J.B.; Neyer, F.J.; van Aken, M.A.G.


    Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First,

  6. Quantile regression and Bayesian cluster detection to identify radon prone areas. (United States)

    Sarra, Annalina; Fontanella, Lara; Valentini, Pasquale; Palermi, Sergio


    Although the dominant source of radon in indoor environments is the geology of the territory, many studies have demonstrated that indoor radon concentrations also depend on dwelling-specific characteristics. Following a stepwise analysis, in this study we propose a combined approach to delineate radon prone areas. We first investigate the impact of various building covariates on indoor radon concentrations. To achieve a more complete picture of this association, we exploit the flexible formulation of a Bayesian spatial quantile regression, which is also equipped with parameters that control the spatial dependence across data. The quantitative knowledge of the influence of each significant building-specific factor on the measured radon levels is employed to predict the radon concentrations that would have been found if the sampled buildings had possessed standard characteristics. Those normalised radon measures should reflect the geogenic radon potential of the underlying ground, which is a quantity directly related to the geological environment. The second stage of the analysis is aimed at identifying radon prone areas, and to this end, we adopt a Bayesian model for spatial cluster detection using as reference unit the building with standard characteristics. The case study is based on a data set of more than 2000 indoor radon measures, available for the Abruzzo region (Central Italy) and collected by the Agency of Environmental Protection of Abruzzo, during several indoor radon monitoring surveys. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Multilevel functional clustering analysis. (United States)

    Serban, Nicoleta; Jiang, Huijing


    In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between-unit variability induced by the hierarchical structure in the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the clustering membership and the cluster patterns under a series of settings: small versus moderate number of time points; various noise levels; and varying number of subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes activated by immune system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis. © 2012, The International Biometric Society.

  8. Bayesian Inference in Statistical Analysis

    CERN Document Server

    Box, George E P


    The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Rob

  9. Bayesian Analysis of Individual Level Personality Dynamics

    Directory of Open Access Journals (Sweden)

    Edward Cripps


    A Bayesian technique with analyses of within-person processes at the level of the individual is presented. The approach is used to examine if the patterns of within-person responses on a 12 trial simulation task are consistent with the predictions of ITA theory (Dweck, 1999). ITA theory states that the performance of an individual with an entity theory of ability is more likely to spiral down following a failure experience than the performance of an individual with an incremental theory of ability. This is because entity theorists interpret failure experiences as evidence of a lack of ability, which they believe is largely innate and therefore relatively fixed; whilst incremental theorists believe in the malleability of abilities and interpret failure experiences as evidence of more controllable factors such as poor strategy or lack of effort. The results of our analyses support ITA theory at both the within- and between-person levels of analyses and demonstrate the benefits of Bayesian techniques for the analysis of within-person processes. These include more formal specification of the theory and the ability to draw inferences about each individual, which allows for more nuanced interpretations of individuals within a personality category, such as differences in the individual probabilities of spiralling. While Bayesian techniques have many potential advantages for the analyses of within-person processes at the individual level, ease of use is not one of them for psychologists trained in traditional frequentist statistical techniques.

  10. On Bayesian Principal Component Analysis

    Czech Academy of Sciences Publication Activity Database

    Šmídl, Václav; Quinn, A.


    Vol. 51, No. 9 (2007), pp. 4101-4123 ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords: Principal component analysis (PCA) * Variational Bayes (VB) * von Mises–Fisher distribution Subject RIV: BC - Control Systems Theory Impact factor: 1.029, year: 2007


    Directory of Open Access Journals (Sweden)

    M R Sumathi


    According to the World Health Organization, 10-20% of children and adolescents all over the world are experiencing mental disorders. Correct diagnosis of mental disorders at an early stage improves the quality of life of children and avoids complicated problems. Various expert systems using artificial intelligence techniques have been developed for diagnosing mental disorders like Schizophrenia, Depression, Dementia, etc. This study focuses on predicting basic mental health problems of children, like Attention problem, Anxiety problem, Developmental delay, Attention Deficit Hyperactivity Disorder (ADHD), Pervasive Developmental Disorder (PDD), etc. using the machine learning techniques Bayesian Networks and Fuzzy clustering. The focus of the article is on learning the Bayesian network structure using a novel Fuzzy Clustering Based Bayesian network structure learning framework. The performance of the proposed framework was compared with the other existing algorithms and the experimental results have shown that the proposed framework performs better than the earlier algorithms.

  12. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.


    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  13. Bayesian data analysis tools for atomic physics (United States)

    Trassinelli, Martino


    We present an introduction to some concepts of Bayesian data analysis in the context of atomic physics. Starting from basic rules of probability, we present Bayes' theorem and its applications. In particular we discuss how to calculate simple and joint probability distributions and the Bayesian evidence, a model-dependent quantity that allows one to assign probabilities to different hypotheses from the analysis of the same data set. To give some practical examples, these methods are applied to two concrete cases. In the first example, the presence or absence of a satellite line in an atomic spectrum is investigated. In the second example, we determine the most probable model among a set of possible profiles from the analysis of a statistically poor spectrum. We also show how to calculate the probability distribution of the main spectral component without having to determine uniquely the spectrum modeling. For these two studies, we implement the program Nested_fit to calculate the different probability distributions and other related quantities. Nested_fit is a Fortran90/Python code developed in recent years for the analysis of atomic spectra. As indicated by the name, it is based on the nested sampling algorithm, which is presented in detail together with the program itself.
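
    How Bayesian evidence values translate into probabilities for competing hypotheses can be sketched as follows. This is a generic application of Bayes' theorem to model comparison, not code from Nested_fit; the function name and the two log-evidence values are hypothetical.

```python
import math

def model_probabilities(log_evidences, prior_probs=None):
    """Posterior model probabilities from log-evidences via Bayes' theorem."""
    n = len(log_evidences)
    if prior_probs is None:
        # Assume equal prior belief in each model when none is given.
        prior_probs = [1.0 / n] * n
    log_post = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = max(log_post)
    unnorm = [math.exp(lp - shift) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical log-evidences for two models of the same spectrum:
# index 0 = no satellite line, index 1 = with satellite line.
probs = model_probabilities([-1052.3, -1049.1])
print(probs)
```

    Only the difference between log-evidences matters, so a gap of about 3.2 already gives the second model most of the posterior probability under equal priors.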

  14. Joint Bayesian analysis of forensic mixtures. (United States)

    Pascali, Vince L; Merigioli, Sara


    Evaluation of series of PCR experiments referring to the same evidence is not infrequent in forensic casework. This situation is met when 'series of results in mixture' (EPGs produced by reiterating PCR experiments over the same DNA mixture extract) have to be interpreted or when 'potentially related traces' (mixtures that can have contributors in common) require a combined interpretation. In these cases, there can be uncertainty on the genotype assignment, since: (a) more than one genotype combination falls under the same peak profile; (b) PCR preferential amplification alters pre-PCR allelic proportions; (c) other, more unpredictable technical problems (dropouts/dropins, etc.) take place. The uncertainty in the genotype assignment is in most cases addressed by empirical methods (selection of just one particular profile; extraction of consensual or composite profiles) that disregard part of the evidence. Genotype assignment should conversely take advantage of a joint Bayesian analysis (JBA) of all STRs peak areas generated at each experiment. This is the typical case of Bayesian analysis in which adoption of object-oriented Bayesian networks (OOBNs) could be highly helpful. Starting from experimentally designed mixtures, we created typical examples of 'series of results in mixture' and of 'potentially related traces'. JBA was then administered to the whole peak area evidence, by specifically tailored OOBNs models, which enabled genotype assignment reflecting all the available evidence. Examples of a residual ambiguity in the genotype assignment came to light at assumed genotypes with partially overlapping alleles (for example: AB+AC→ABC). In the 'series of results in mixture', this uncertainty was in part refractory to the joint evaluation. Ambiguity was conversely dissipated at the 'potentially related' trace example, where the ABC allelic scheme at the first trace was interpreted together with other unambiguous combinations (ABCD; AB) at the related trace. We

  15. Bayesian Analysis of Bubbles in Asset Prices

    Directory of Open Access Journals (Sweden)

    Andras Fulop


    We develop a new model where the dynamic structure of the asset price, after the fundamental value is removed, is subject to two different regimes. One regime reflects the normal period where the asset price divided by the dividend is assumed to follow a mean-reverting process around a stochastic long run mean. The second regime reflects the bubble period with explosive behavior. Stochastic switches between two regimes and non-constant probabilities of exit from the bubble regime are both allowed. A Bayesian learning approach is employed to jointly estimate the latent states and the model parameters in real time. An important feature of our Bayesian method is that we are able to deal with parameter uncertainty and at the same time, to learn about the states and the parameters sequentially, allowing for real time model analysis. This feature is particularly useful for market surveillance. Analysis using simulated data reveals that our method has good power properties for detecting bubbles. Empirical analysis using price-dividend ratios of S&P500 highlights the advantages of our method.

  16. Doing bayesian data analysis a tutorial with R and BUGS

    CERN Document Server

    Kruschke, John K


    There is an explosion of interest in Bayesian statistics, primarily because recently created computational methods have finally made Bayesian analysis obtainable to a wide audience. Doing Bayesian Data Analysis, A Tutorial Introduction with R and BUGS provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples. The book begins with the basics, including essential concepts of probability and random sampling, and gradually progresses to advanced hierarchical modeling methods for realistic data. The text delivers comprehensive coverage of all

  17. Medical Inpatient Journey Modeling and Clustering: A Bayesian Hidden Markov Model Based Approach. (United States)

    Huang, Zhengxing; Dong, Wei; Wang, Fei; Duan, Huilong


    Modeling and clustering medical inpatient journeys is useful to healthcare organizations for a number of reasons including inpatient journey reorganization in a more convenient way for understanding and browsing, etc. In this study, we present a probabilistic model-based approach to model and cluster medical inpatient journeys. Specifically, we exploit a Bayesian Hidden Markov Model based approach to transform medical inpatient journeys into a probabilistic space, which can be seen as a richer representation of inpatient journeys to be clustered. Then, using hierarchical clustering on the matrix of similarities, inpatient journeys can be clustered into different categories w.r.t their clinical and temporal characteristics. We evaluated the proposed approach on a real clinical data set pertaining to the unstable angina treatment process. The experimental results reveal that our method can identify and model latent treatment topics underlying in personalized inpatient journeys, and yield impressive clustering quality.

  18. Current trends in Bayesian methodology with applications

    CERN Document Server

    Upadhyay, Satyanshu K; Dey, Dipak K; Loganathan, Appaia


    Collecting Bayesian material scattered throughout the literature, Current Trends in Bayesian Methodology with Applications examines the latest methodological and applied aspects of Bayesian statistics. The book covers biostatistics, econometrics, reliability and risk analysis, spatial statistics, image analysis, shape analysis, Bayesian computation, clustering, uncertainty assessment, high-energy astrophysics, neural networking, fuzzy information, objective Bayesian methodologies, empirical Bayes methods, small area estimation, and many more topics. Each chapter is self-contained and focuses on

  19. A novel Bayesian DNA motif comparison method for clustering and retrieval.

    Directory of Open Access Journals (Sweden)

    Naomi Habib


    Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.

  20. Bayesian Model Averaging for Propensity Score Analysis. (United States)

    Kaplan, David; Chen, Jianshen


    This article considers Bayesian model averaging as a means of addressing uncertainty in the selection of variables in the propensity score equation. We investigate an approximate Bayesian model averaging approach based on the model-averaged propensity score estimates produced by the R package BMA but that ignores uncertainty in the propensity score. We also provide a fully Bayesian model averaging approach via Markov chain Monte Carlo sampling (MCMC) to account for uncertainty in both parameters and models. A detailed study of our approach examines the differences in the causal estimate when incorporating noninformative versus informative priors in the model averaging stage. We examine these approaches under common methods of propensity score implementation. In addition, we evaluate the impact of changing the size of Occam's window used to narrow down the range of possible models. We also assess the predictive performance of both Bayesian model averaging propensity score approaches and compare it with the case without Bayesian model averaging. Overall, results show that both Bayesian model averaging propensity score approaches recover the treatment effect estimates well and generally provide larger uncertainty estimates, as expected. Both Bayesian model averaging approaches offer slightly better prediction of the propensity score compared with the Bayesian approach with a single propensity score equation. Covariate balance checks for the case study show that both Bayesian model averaging approaches offer good balance. The fully Bayesian model averaging approach also provides posterior probability intervals of the balance indices.
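
    The role of Occam's window in model averaging can be sketched as follows: models whose posterior probability falls too far below that of the best model are dropped, and the remaining weights are renormalised before averaging. This is an illustrative sketch only, not the procedure of the R package BMA or of the article; the function name, the cut-off ratio, the model probabilities and the per-model propensity scores are all hypothetical.

```python
def occams_window_average(model_probs, predictions, ratio=20.0):
    """Average predictions over the models inside Occam's window.

    A model is kept when best_prob / its_prob <= ratio, so a smaller
    ratio gives a narrower window and fewer models.
    """
    best = max(model_probs)
    kept = [(p, pred) for p, pred in zip(model_probs, predictions)
            if p > 0 and best / p <= ratio]
    total = sum(p for p, _ in kept)
    # Weights are renormalised over the retained models only.
    averaged = sum(p * pred for p, pred in kept) / total
    return averaged, len(kept)

# Hypothetical posterior model probabilities and, for one unit, the
# propensity score estimated under each candidate model.
probs = [0.60, 0.30, 0.08, 0.02]
scores = [0.40, 0.50, 0.70, 0.90]
avg, n_kept = occams_window_average(probs, scores, ratio=20.0)
narrow_avg, narrow_kept = occams_window_average(probs, scores, ratio=5.0)
print(avg, n_kept, narrow_avg, narrow_kept)
```

    Shrinking the ratio from 20 to 5 removes the third model as well as the fourth, which shifts the averaged score towards the two best-supported models.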

  1. Comprehensive cluster analysis with Transitivity Clustering. (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan


    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  2. Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination. (United States)

    Yau, Christopher; Holmes, Chris


    We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster specific variable selection. We demonstrate improved inference on a number of canonical problems.

  3. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang


    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.


    Directory of Open Access Journals (Sweden)

    O. Yu. Kydashev


    This paper presents a detailed description of the implementation of an agglomerative clustering system for speech segments based on the Bayesian information criterion. Results of numerical experiments with different acoustic features, and with full and diagonal covariance matrices, are given. The designed system achieved an error rate (DER) of 6.4% on audio recordings of Radio «Svoboda».
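
    A merge test of the kind used in BIC-based agglomerative clustering can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' system: each segment is modelled by a single Gaussian with diagonal covariance, and the names `log_det_diag`, `delta_bic`, and `penalty_weight`, as well as the synthetic feature vectors, are hypothetical.

```python
import math
import random

def log_det_diag(rows):
    """Log-determinant of the diagonal maximum-likelihood covariance of `rows`.

    `rows` is a list of equal-length feature vectors; every dimension must
    have non-zero variance.
    """
    n = len(rows)
    total = 0.0
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        total += math.log(sum((v - mean) ** 2 for v in col) / n)
    return total

def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """BIC difference between modelling two segments separately and jointly.

    Positive values favour keeping the segments apart; negative values
    favour merging them into one cluster.
    """
    merged = seg_a + seg_b
    n_a, n_b, n = len(seg_a), len(seg_b), len(merged)
    n_params = 2 * len(merged[0])  # one mean and one variance per dimension
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    gain = 0.5 * (n * log_det_diag(merged)
                  - n_a * log_det_diag(seg_a)
                  - n_b * log_det_diag(seg_b))
    return gain - penalty

# Synthetic 2-D "feature vectors": two segments from one source, one from another.
rng = random.Random(0)
same_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
same_b = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
other = [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(200)]

merge_same = delta_bic(same_a, same_b) < 0   # segments look alike
merge_other = delta_bic(same_a, other) < 0   # segments look different
```

    An agglomerative loop would repeatedly merge the pair with the lowest `delta_bic` and stop once no pair has a negative value. A full-covariance variant would replace the per-dimension variances with the log-determinant of the full covariance matrix and adjust `n_params` accordingly.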

  5. Power in Bayesian Mediation Analysis for Small Sample Research

    NARCIS (Netherlands)

    Miočević, M.; MacKinnon, David; Levy, Roy


    Bayesian methods have the potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This article compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product,

  6. BEAST: Bayesian evolutionary analysis by sampling trees. (United States)

    Drummond, Alexei J; Rambaut, Andrew


    The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available under the GNU LGPL license. BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.

  7. BEAST: Bayesian evolutionary analysis by sampling trees

    Directory of Open Access Journals (Sweden)

    Drummond Alexei J


    Background: The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. Results: BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available under the GNU LGPL license. Conclusion: BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.

  8. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model (United States)

    Ellefsen, Karl J.; Smith, David


    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  9. Bayesian hypothesis testing: Editorial to the Special Issue on Bayesian data analysis. (United States)

    Hoijtink, Herbert; Chow, Sy-Miin


    In the past 20 years, there has been a steadily increasing attention and demand for Bayesian data analysis across multiple scientific disciplines, including psychology. Bayesian methods and the related Markov chain Monte Carlo sampling techniques offered renewed ways of handling old and challenging new problems that may be difficult or impossible to handle using classical approaches. Yet, such opportunities and potential improvements have not been sufficiently explored and investigated. This is 1 of 2 special issues in Psychological Methods dedicated to the topic of Bayesian data analysis, with an emphasis on Bayesian hypothesis testing, model comparison, and general guidelines for applications in psychology. In this editorial, we provide an overview of the use of Bayesian methods in psychological research and a brief history of the Bayes factor and the posterior predictive p value. Translational abstracts that summarize the articles in this issue in very clear and understandable terms are included in the Appendix. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  10. ASteCA: Automated Stellar Cluster Analysis (United States)

    Perren, G. I.; Vázquez, R. A.; Piatti, A. E.


    We present the Automated Stellar Cluster Analysis package (ASteCA), a suite of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with their uncertainties. To validate the code we applied it on a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.

  11. Objective Bayesian Analysis of Skew-t Distributions

    KAUST Repository



    We study the Jeffreys prior and its properties for the shape parameter of univariate skew-t distributions with linear and nonlinear Student's t skewing functions. In both cases, we show that the resulting priors for the shape parameter are symmetric around zero and proper. Moreover, we propose a Student's t approximation of the Jeffreys prior that makes an objective Bayesian analysis easy to perform. We carry out a Monte Carlo simulation study that demonstrates an overall better behaviour of the maximum a posteriori estimator compared with the maximum likelihood estimator. We also compare the frequentist coverage of the credible intervals based on the Jeffreys prior and its approximation and show that they are similar. We further discuss location-scale models under scale mixtures of skew-normal distributions and show some conditions for the existence of the posterior distribution and its moments. Finally, we present three numerical examples to illustrate the implications of our results on inference for skew-t distributions. © 2012 Board of the Foundation of the Scandinavian Journal of Statistics.

  12. Relation chain based clustering analysis (United States)

    Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo


    Clustering analysis is currently one of the well-developed branches of data mining; it aims to find hidden structures in a multidimensional space called the feature or pattern space. A datum in this space usually takes the form of a vector whose elements represent several specifically selected features. These features are usually chosen for their usefulness to the problem at hand. Clustering analysis generally falls into two divisions: one based on agglomerative clustering and the other on divisive clustering. The former is a bottom-up process that starts by treating each datum as a singleton cluster, while the latter is a top-down process that starts by treating the entire data set as one cluster. According to the collected literature, divisive clustering currently dominates both application and research. Although some well-known divisive clustering methods have been designed and developed, clustering problems are still far from solved. The k-means algorithm is the original divisive clustering method; it requires some important values to be assigned initially, such as the number of clusters and the initial positions of the cluster prototypes, which may not be reasonable in some situations. Beyond the initialization problem, the k-means algorithm may also fall into a local optimum, clusters in a rigid way, and is not suitable for non-Gaussian distributions. The search for a good or natural clustering result in fact originates from one's understanding of the concept of clustering, so confusion about or misunderstanding of the definition of clustering leads to unsatisfactory clustering results. The definition deserves deep and serious consideration. This paper demonstrates the nature of clustering, gives a way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns. In

  13. Bayesian analysis of MEG visual evoked responses

    Energy Technology Data Exchange (ETDEWEB)

    Schmidt, D.M.; George, J.S.; Wood, C.C.


    The authors developed a method for analyzing neural electromagnetic data that allows probabilistic inferences to be drawn about regions of activation. The method involves the generation of a large number of possible solutions which fit both the data and prior expectations about the nature of probable solutions, made explicit by a Bayesian formalism. In addition, they have introduced a model for the current distributions that produce MEG (and EEG) data that allows extended regions of activity, and can easily incorporate prior information such as anatomical constraints from MRI. To evaluate the feasibility and utility of the Bayesian approach with actual data, they analyzed MEG data from a visual evoked response experiment. They compared Bayesian analyses of MEG responses to visual stimuli in the left and right visual fields, in order to examine the sensitivity of the method to detect known features of human visual cortex organization. They also examined the changing pattern of cortical activation as a function of time.

  14. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. (United States)

    Kruschke, John K; Liddell, Torrin M


    In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.

  15. Adaptive bayesian analysis for binomial proportions

    CSIR Research Space (South Africa)

    Das, Sonali


    The authors consider the problem of statistical inference of binomial proportions for non-matched, correlated samples, under the Bayesian framework. Such inference can arise when the same group is observed at a different number of times with the aim...

  16. Bayesian methods in clinical trials: a Bayesian analysis of ECOG trials E1684 and E1690

    Directory of Open Access Journals (Sweden)

    Ibrahim Joseph G


    Background: E1684 was the pivotal adjuvant melanoma trial for establishment of high-dose interferon (IFN) as effective therapy of high-risk melanoma patients. E1690 was an intriguing effort to corroborate E1684, and the differences between the outcomes of these trials have embroiled the field in controversy over the past several years. The analyses of E1684 and E1690 were carried out separately when the results were published, and there were no further attempts to perform a single analysis of the combined trials. Method: In this paper, we consider such a joint analysis by carrying out a Bayesian analysis of these two trials, thus providing us with a consistent and coherent methodology for combining the results from these two trials. Results: The Bayesian analysis using power priors provided a more coherent, flexible and potentially more accurate analysis than a separate analysis of these data or a frequentist analysis of these data. The methodology provides a consistent framework for carrying out a single unified analysis by combining data from two or more studies. Conclusions: Such Bayesian analyses can be crucial in situations where two theoretically identical trials yield somewhat conflicting or inconsistent results.
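
    The idea behind a power prior can be illustrated with a deliberately simplified Python sketch. It assumes a binomial outcome with a Beta(a, b) initial prior, which is not the survival setting of the E1684 and E1690 trials; the function name `power_prior_posterior`, its parameters, and the counts are hypothetical. The historical likelihood is raised to a power a0 between 0 and 1, so the historical data count as a0 times their original sample size.

```python
def power_prior_posterior(a, b, y0, n0, a0, y, n):
    """Beta posterior parameters under a power prior for binomial data.

    a, b   : parameters of the initial Beta prior
    y0, n0 : successes and sample size in the historical study
    a0     : discounting weight in [0, 1] applied to the historical likelihood
    y, n   : successes and sample size in the current study
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    post_a = a + a0 * y0 + y
    post_b = b + a0 * (n0 - y0) + (n - y)
    return post_a, post_b

# Hypothetical counts: the historical study is given half weight.
post_a, post_b = power_prior_posterior(1, 1, y0=20, n0=50, a0=0.5, y=15, n=40)
posterior_mean = post_a / (post_a + post_b)
```

    Setting a0 = 0 ignores the historical study entirely, while a0 = 1 pools the two studies as if they were one.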

  17. Remodularization Analysis Using Semantic Clustering


    Santos, Gustavo; Tulio Valente, Marco; Anquetil, Nicolas


    In this paper, we report an experience on using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis. We evaluate our adaptation using six real-world remodularizations of four software systems. We report th...

  18. Medical decision making tools: Bayesian analysis and ROC analysis

    International Nuclear Information System (INIS)

    Lee, Byung Do


    During the diagnostic process of the various oral and maxillofacial lesions, we should consider the following: 'When should we order diagnostic tests? What tests should be ordered? How should we interpret the results clinically? And how should we use this frequently imperfect information to make optimal medical decisions?' For clinicians to make proper judgements, several decision making tools are suggested. This article discusses the concept of diagnostic accuracy (sensitivity and specificity values) with several decision making tools such as the decision matrix, ROC analysis and Bayesian analysis. The article also explains the introductory concept of the ORAD program.
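
    The way Bayesian analysis combines sensitivity and specificity with a pre-test probability can be shown in a short Python sketch. It is a generic application of Bayes' theorem to a binary test, not code from the article or from the ORAD program; the function name `post_test_probability` and the numbers are hypothetical.

```python
def post_test_probability(prevalence, sensitivity, specificity, positive=True):
    """Probability of disease after a binary test result, by Bayes' theorem.

    prevalence  : pre-test probability of disease
    sensitivity : P(test positive | disease)
    specificity : P(test negative | no disease)
    positive    : True for a positive result, False for a negative one
    """
    if positive:
        true_pos = sensitivity * prevalence
        false_pos = (1.0 - specificity) * (1.0 - prevalence)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    return false_neg / (false_neg + true_neg)

# Hypothetical test: 10% prevalence, 90% sensitivity, 80% specificity.
after_positive = post_test_probability(0.10, 0.90, 0.80, positive=True)
after_negative = post_test_probability(0.10, 0.90, 0.80, positive=False)
```

    With these illustrative values, a positive result raises the probability of disease from 10% to about one in three, which shows why even a fairly accurate test can leave considerable uncertainty when the condition is uncommon.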

  19. Bayesian analysis of Markov point processes

    DEFF Research Database (Denmark)

    Berthelsen, Kasper Klitgaard; Møller, Jesper


    Recently Møller, Pettitt, Berthelsen and Reeves introduced a new MCMC methodology for drawing samples from a posterior distribution when the likelihood function is only specified up to a normalising constant. We illustrate the method in the setting of Bayesian inference for Markov point processes...... a partially ordered Markov point process as the auxiliary variable. As the method requires simulation from the "unknown" likelihood, perfect simulation algorithms for spatial point processes become useful....


    Directory of Open Access Journals (Sweden)

    Anass BAYAGA


    The objective of this second part of a two-phased study was to explore the predictive power of the quantitative risk analysis (QRA) method and process within a Higher Education Institution (HEI). The method and process investigated the use of impact analysis via the Nicholas risk model and Bayesian analysis, with a sample of one hundred (100) risk analysts in a historically black South African university in the greater Eastern Cape Province. The first finding supported and confirmed previous literature (King III report, 2009; Nicholas and Steyn, 2008; Stoney, 2007; COSA, 2004) that there was a direct relationship between a risk factor, its likelihood and its impact, ceteris paribus. The second finding, in relation to controlling either the likelihood or the impact of the occurrence of risk (Nicholas risk model), was that, to have a better risk reward, it was more important to control the likelihood of occurrence of risks than their impact, so as to have a direct effect on the entire university. On the Bayesian analysis, the third finding was that the impact of risk should be predicted along three aspects: the human impact (decisions made), the property impact (students and infrastructure) and the business impact. Lastly, the study revealed that, although business cycles vary considerably depending on the industry and/or the institution, most impacts in the HEI (university) fell within the period of one academic year. The recommendation was that the application of quantitative risk analysis should be related to the current legislative framework that affects HEIs.

  1. Bayesian analysis of a correlated binomial model


    Diniz, Carlos A. R.; Tutia, Marcelo H.; Leite, Jose G.


    In this paper a Bayesian approach is applied to the correlated binomial model, CB(n, p, ρ), proposed by Luceño (Comput. Statist. Data Anal. 20 (1995) 511–520). The data augmentation scheme is used in order to overcome the complexity of the mixture likelihood. MCMC methods, including Gibbs sampling and Metropolis within Gibbs, are applied to estimate the posterior marginal for the probability of success p and for the correlation coefficient ρ. The sensitivity of the posterior is studied taking...

  2. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K


    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  3. Spatiotemporal Bayesian inference dipole analysis for MEG neuroimaging data. (United States)

    Jun, Sung C; George, John S; Paré-Blagoev, Juliana; Plis, Sergey M; Ranken, Doug M; Schmidt, David M; Wood, C C


    Recently, we described a Bayesian inference approach to the MEG/EEG inverse problem that used numerical techniques to estimate the full posterior probability distributions of likely solutions upon which all inferences were based [Schmidt, D.M., George, J.S., Wood, C.C., 1999. Bayesian inference applied to the electromagnetic inverse problem. Human Brain Mapping 7, 195; Schmidt, D.M., George, J.S., Ranken, D.M., Wood, C.C., 2001. Spatial-temporal bayesian inference for MEG/EEG. In: Nenonen, J., Ilmoniemi, R. J., Katila, T. (Eds.), Biomag 2000: 12th International Conference on Biomagnetism. Espoo, Norway, p. 671]. Schmidt et al. (1999) focused on the analysis of data at a single point in time employing an extended region source model. They subsequently extended their work to a spatiotemporal Bayesian inference analysis of the full spatiotemporal MEG/EEG data set. Here, we formulate spatiotemporal Bayesian inference analysis using a multi-dipole model of neural activity. This approach is faster than the extended region model, does not require use of the subject's anatomical information, does not require prior determination of the number of dipoles, and yields quantitative probabilistic inferences. In addition, we have incorporated the ability to handle much more complex and realistic estimates of the background noise, which may be represented as a sum of Kronecker products of temporal and spatial noise covariance components. This reduces the effects of undermodeling noise. In order to reduce the rigidity of the multi-dipole formulation which commonly causes problems due to multiple local minima, we treat the given covariance of the background as uncertain and marginalize over it in the analysis. Markov Chain Monte Carlo (MCMC) was used to sample the many possible likely solutions. The spatiotemporal Bayesian dipole analysis is demonstrated using simulated and empirical whole-head MEG data.

  4. Bayesian cost-effectiveness analysis with the R package BCEA

    CERN Document Server

    Baio, Gianluca; Heath, Anna


    The book provides a description of the process of health economic evaluation and modelling for cost-effectiveness analysis, particularly from the perspective of a Bayesian statistical approach. Some relevant theory and introductory concepts are presented using practical examples and two running case studies. The book also describes in detail how to perform health economic evaluations using the R package BCEA (Bayesian Cost-Effectiveness Analysis). BCEA can be used to post-process the results of a Bayesian cost-effectiveness model and perform advanced analyses producing standardised and highly customisable outputs. It presents all the features of the package, including its many functions and their practical application, as well as its user-friendly web interface. The book is a valuable resource for statisticians and practitioners working in the field of health economics wanting to simplify and standardise their workflow, for example in the preparation of dossiers in support of marketing authorisation, or acade...

  5. The Application of Bayesian Spectral Analysis in Photometric Time Series

    Directory of Open Access Journals (Sweden)

    saeideh latif


    The present paper introduces Bayesian spectral analysis as a powerful and efficient method for the spectral analysis of photometric time series. For this purpose, Bayesian spectral analysis was programmed in Matlab for the XZ Dra photometric time series, which is non-uniform with large gaps, and the resulting power spectrum was compared with the power spectrum obtained from the Period04 software, which is designed for the statistical analysis of astronomical time series and uses artificial data to make the time series uniform. In the power spectrum from this software, the main spectral peak, which represents the main oscillation frequency of the variable star XZ Dra at f = 2.09864 day⁻¹, is clearly visible, but false spectral peaks are also seen. Moreover, it is not clear how this software generates the synthetic data. These false peaks are removed in the power spectrum obtained from the Bayesian analysis; in addition, the spectral peak around the desired frequency is narrower and more accurate. It should be noted that Bayesian spectral analysis does not require the time series to be made uniform in order to obtain the desired power spectrum. Moreover, the researcher is aware of the exact calculation process.

  6. A spatio-temporal nonparametric Bayesian variable selection model of fMRI data for clustering correlated time courses. (United States)

    Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina


    In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows us to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, therefore capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data. Copyright © 2014 Elsevier Inc. All rights reserved.

  7. A Bayesian Nonparametric Approach to Factor Analysis

    DEFF Research Database (Denmark)

    Piatek, Rémi; Papaspiliopoulos, Omiros


    This paper introduces a new approach for the inference of non-Gaussian factor models based on Bayesian nonparametric methods. It relaxes the usual normality assumption on the latent factors, widely used in practice, which is too restrictive in many settings. Our approach, on the contrary, does...... not impose any particular assumptions on the shape of the distribution of the factors, but still secures the basic requirements for the identification of the model. We design a new sampling scheme based on marginal data augmentation for the inference of mixtures of normals with location and scale...... restrictions. This approach is augmented by the use of a retrospective sampler, to allow for the inference of a constrained Dirichlet process mixture model for the distribution of the latent factors. We carry out a simulation study to illustrate the methodology and demonstrate its benefits. Our sampler is very...

  8. Research & development and growth: A Bayesian model averaging analysis

    Czech Academy of Sciences Publication Activity Database

    Horváth, Roman


    Vol. 28, No. 6 (2011), pp. 2669-2673. ISSN 0264-9993. [Society for Non-linear Dynamics and Econometrics Annual Conference. Washington DC, 16.03.2011-18.03.2011]. R&D Projects: GA ČR GA402/09/0965. Institutional research plan: CEZ:AV0Z10750506. Keywords: Research and development; Growth; Bayesian model averaging. Subject RIV: AH - Economics. Impact factor: 0.701 (2011).

  9. Stochastic back analysis of permeability coefficient using generalized Bayesian method

    Directory of Open Access Journals (Sweden)

    Zheng Guilan


    Because the conventional deterministic back analysis of the permeability coefficient cannot reflect the uncertainties of parameters, including the hydraulic head at the boundary, the permeability coefficient and the measured hydraulic head, a stochastic back analysis taking these uncertainties into consideration was performed using the generalized Bayesian method. Based on the stochastic finite element method (SFEM) for a seepage field, the variable metric algorithm and the generalized Bayesian method, formulas for stochastic back analysis of the permeability coefficient were derived. A case study of seepage analysis of a sluice foundation was performed to illustrate the proposed method. The results indicate that, with the generalized Bayesian method that considers the uncertainties of the measured hydraulic head, the permeability coefficient and the hydraulic head at the boundary, both the mean and standard deviation of the permeability coefficient can be obtained and the standard deviation is less than that obtained by the conventional Bayesian method. Therefore, the present method is valid and applicable.

  10. Bayesian model and spatial analysis of oral and oropharynx cancer mortality in Minas Gerais, Brazil. (United States)

    Fonseca, Emílio Prado da; Oliveira, Cláudia Di Lorenzo; Chiaravalloti, Francisco; Pereira, Antonio Carlos; Vedovello, Silvia Amélia Scudeler; Meneghim, Marcelo de Castro


    The objective of this study was to determine the oral and oropharynx cancer mortality rate and to analyze the results using spatial analysis with an Empirical Bayesian model. To this end, we used the information contained in the International Classification of Diseases (ICD-10), Chapter II, Categories C00 to C14, and the Brazilian Mortality Information System (SIM) for Minas Gerais State. Descriptive statistics were computed and the crude mortality rate was calculated for each municipality. Empirical Bayesian estimators were then applied. The results showed that, in 2012, 769 deaths of patients with oral and oropharynx cancer were registered in the state of Minas Gerais, of which 607 (78.96%) were men and 162 (21.04%) were women. There was wide variation in the spatial distribution of the crude mortality rate, and agglomerations in the South, Central and North regions were identified more accurately by the Global and Local Bayesian Estimator models. Bayesian models made it possible to map the spatial clustering of deaths from oral cancer more accurately, and applying spatial epidemiology methods yielded more accurate results that can support efforts to reduce the number of deaths from this type of cancer.

  11. A new Bayesian Earthquake Analysis Tool (BEAT) (United States)

    Vasyura-Bathke, Hannes; Dutta, Rishabh; Jónsson, Sigurjón; Mai, Martin


    Modern earthquake source estimation studies increasingly use non-linear optimization strategies to estimate kinematic rupture parameters, often considering geodetic and seismic data jointly. However, the optimization process is complex and consists of several steps that need to be followed in the earthquake parameter estimation procedure. These include pre-describing or modeling the fault geometry, calculating the Green's Functions (often assuming a layered elastic half-space), and estimating the distributed final slip and possibly other kinematic source parameters. Recently, Bayesian inference has become popular for estimating posterior distributions of earthquake source model parameters given measured/estimated/assumed data and model uncertainties. For instance, some research groups consider uncertainties of the layered medium and propagate these to the source parameter uncertainties. Other groups make use of informative priors to reduce the model parameter space. In addition, innovative sampling algorithms have been developed that efficiently explore the often high-dimensional parameter spaces. Compared to earlier studies, these improvements have resulted in overall more robust source model parameter estimates that include uncertainties. However, the computational demands of these methods are high and estimation codes are rarely distributed along with the published results. Even if codes are made available, it is often difficult to assemble them into a single optimization framework as they are typically coded in different programming languages. Therefore, further progress and future applications of these methods/codes are hampered, while reproducibility and validation of results have become essentially impossible.
    In the spirit of providing open-access and modular codes to facilitate progress and reproducible research in earthquake source estimations, we undertook the effort of producing BEAT, a Python package that comprises all the above-mentioned features in one

  12. Bayesian analysis of stress thallium-201 scintigraphy

    International Nuclear Information System (INIS)

    The variation of the diagnostic value of stress Tl-201 scintigraphy with prevalence of coronary heart disease (CHD) in the population has been investigated using Bayesian reasoning. From scintigraphic and arteriographic data obtained in 100 consecutive patients presenting with chest pain, the sensitivity of stress Tl-201 scintigraphy for the detection of significant CHD was 90% and the specificity was 88%. From Bayes' Theorem, the posterior probability of having CHD for a given test result was calculated for prevalences of CHD ranging from 1% to 99%. The discriminant value of stress Tl-201 scintigraphy was best when the prevalence of CHD lay between 30% and 70% and maximum for a prevalence of 52%. Thus, stress Tl-201 scintigraphy would be an unsuitable diagnostic test where the prior probability of CHD is low, e.g., population screening programmes, and would add little where the clinical probability of CHD is already high. Where the clinical probability of having CHD is intermediate, stress Tl-201 scintigraphy may provide valuable diagnostic information. (orig.)
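
    The Bayes' theorem calculation described in this record can be sketched in a few lines. The sensitivity (90%) and specificity (88%) are taken from the abstract; the function name and the definition of "discriminant value" as the gap between the post-test probabilities after a positive and a negative result are assumptions made for illustration, since the abstract does not define the term.

```python
def posterior_probabilities(prevalence, sensitivity=0.90, specificity=0.88):
    """Return (P(CHD | positive test), P(CHD | negative test)) via Bayes' theorem."""
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_negative = (1 - sensitivity) * prevalence + specificity * (1 - prevalence)
    post_given_positive = sensitivity * prevalence / p_positive
    post_given_negative = (1 - sensitivity) * prevalence / p_negative
    return post_given_positive, post_given_negative


# Scan prevalences from 1% to 99%, as in the abstract.
rows = []
for percent in range(1, 100):
    post_pos, post_neg = posterior_probabilities(percent / 100)
    # Assumed definition: discriminant value = separation of the two posteriors.
    rows.append((percent, post_pos, post_neg, post_pos - post_neg))

best_percent = max(rows, key=lambda row: row[3])[0]
print("prevalence with the largest separation (%):", best_percent)
```

    Under this assumed definition the separation peaks at a prevalence of 52% on a 1% grid, which agrees with the figure quoted in the abstract, and it shrinks towards zero at very low and very high prevalence.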

  13. Bayesian investigation of isochrone consistency using the old open cluster NGC 188

    Energy Technology Data Exchange (ETDEWEB)

    Hills, Shane; Courteau, Stéphane [Department of Physics, Engineering Physics and Astronomy, Queen’s University, Kingston, ON K7L 3N6 Canada (Canada)]; Von Hippel, Ted [Department of Physical Sciences, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114 (United States)]; Geller, Aaron M. [Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA) and Department of Physics and Astronomy, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208 (United States)]


    This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color–magnitude diagrams (CMDs) depending on the filters and models used. We examine the consistency and reliability of fitting three widely used stellar evolution models to 15 combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to this study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities that enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically favored three-band combinations of UBV and VRI can be meaningfully inconsistent with each other and with longer baseline data sets such as UBVRIJHK_S. Differences among model sets can also be substantial. For instance, fitting Yi et al. (2001) and Dotter et al. (2008) models to UBVRIJHK_S photometry for NGC 188 yields the following cluster parameters: age = (5.78 ± 0.03, 6.45 ± 0.04) Gyr, [Fe/H] = (+0.125 ± 0.003, −0.077 ± 0.003) dex, (m−M)_V = (11.441 ± 0.007, 11.525 ± 0.005) mag, and A_V = (0.162 ± 0.003, 0.236 ± 0.003) mag, respectively. Within the formal fitting errors, these two fits are substantially and statistically different. Such differences among fits using different filters and models are a cautionary tale regarding our current ability to fit star cluster CMDs. Additional modeling of this kind, with more models and star clusters, and future Gaia parallaxes are critical for isolating and quantifying the most relevant uncertainties in stellar evolutionary models.

  14. Reliability demonstration test planning using bayesian analysis

    International Nuclear Information System (INIS)

    Chandran, Senthil Kumar; Arul, John A.


    In nuclear power plants, the reliability of all the safety systems is critical from the safety viewpoint, and it is essential that the reliability requirements be met while satisfying the design constraints. From practical experience, it is found that the reliability of complex systems such as the Safety Rod Drive Mechanism is of the order of 10^-4 with an uncertainty factor of 10. Demonstrating the reliability of such systems is prohibitive in terms of cost and time, as the number of tests needed is very large. The purpose of this paper is to develop a Bayesian reliability demonstration testing procedure for exponentially distributed failure times with a gamma prior distribution on the failure rate, which can be easily and effectively used to demonstrate component/subsystem/system reliability conformance to stated requirements. The important questions addressed in this paper are: with zero failures, how long should the tests be performed and how many components are required to conclude, with a given degree of confidence, that the component under test meets the reliability requirement? The procedure is explained with an example. This procedure can also be extended to demonstration with a larger number of failures. The approach presented is applicable for deriving test plans for demonstrating component failure rates of nuclear power plants, as the failure data for similar components are becoming available in existing plants elsewhere. The advantages of this procedure are that the criterion upon which it is based is simple and pertinent; that the fitting of the prior distribution is an integral part of the procedure and is based on the use of information regarding two percentiles of this distribution; and, finally, that the procedure is straightforward and easy to apply in practice. (author)
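
    The zero-failure question posed in this record follows from gamma-exponential conjugacy: with a Gamma(a, b) prior on the failure rate and no failures in a total test time T, the posterior is Gamma(a, b + T). The sketch below solves for T under that standard result. It is not the authors' procedure; the function names, the restriction to an integer prior shape, and the numerical values (a requirement of 1e-4 failures per hour, 90% confidence, ten units on test) are hypothetical.

```python
import math


def erlang_cdf(x, shape):
    """CDF at x of a Gamma(shape, rate=1) variable; shape must be a positive integer."""
    term = 1.0
    partial_sum = 1.0
    for k in range(1, shape):
        term *= x / k
        partial_sum += term
    return 1.0 - math.exp(-x) * partial_sum


def required_test_time(lambda_req, confidence, prior_shape=1, prior_rate=0.0):
    """Total failure-free test time T such that P(rate <= lambda_req | data) >= confidence.

    Zero failures in time T turn a Gamma(prior_shape, prior_rate) prior into a
    Gamma(prior_shape, prior_rate + T) posterior, so the condition becomes
    erlang_cdf(lambda_req * (prior_rate + T), prior_shape) >= confidence.
    """
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, prior_shape) < confidence:
        hi *= 2.0
    for _ in range(200):  # bisection on the scaled quantile
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, prior_shape) < confidence:
            lo = mid
        else:
            hi = mid
    posterior_rate = hi / lambda_req
    return max(0.0, posterior_rate - prior_rate)


# Hypothetical requirement: failure rate below 1e-4 per hour with 90% confidence.
total_time = required_test_time(lambda_req=1e-4, confidence=0.90)
time_with_prior = required_test_time(1e-4, 0.90, prior_shape=1, prior_rate=5000.0)
hours_per_unit = total_time / 10  # the same total time spread over ten units
print(round(total_time), round(time_with_prior), round(hours_per_unit))
```

    With the flat (improper) default prior the total time reduces to ln(10)/1e-4, roughly 23,026 unit-hours, and an informative prior with rate 5000 shortens it by exactly that amount of equivalent prior test time.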

  15. Clustering and Bayesian hierarchical modeling for the definition of informative prior distributions in hydrogeology (United States)

    Cucchi, K.; Kawa, N.; Hesse, F.; Rubin, Y.


    In order to reduce uncertainty in the prediction of subsurface flow and transport processes, practitioners should use all data available. However, classic inverse modeling frameworks typically only make use of information contained in in-situ field measurements to provide estimates of hydrogeological parameters. Such hydrogeological information about an aquifer is difficult and costly to acquire. In this data-scarce context, the transfer of ex-situ information coming from previously investigated sites can be critical for improving predictions by better constraining the estimation procedure. Bayesian inverse modeling provides a coherent framework to represent such ex-situ information by virtue of the prior distribution and to combine it with in-situ information from the target site. In this study, we present an innovative data-driven approach for defining such informative priors for hydrogeological parameters at the target site. Our approach consists of two steps, both relying on statistical and machine learning methods. The first step is data selection; it consists of selecting sites similar to the target site. We use clustering methods for selecting similar sites based on observable hydrogeological features. The second step is data assimilation; it consists of assimilating data from the selected similar sites into the informative prior. We use a Bayesian hierarchical model to account for inter-site variability and to allow for the assimilation of multiple types of site-specific data. We present the application and validation of the presented methods on an established database of hydrogeological parameters. Data and methods are implemented in the form of an open-source R-package and therefore facilitate easy use by other practitioners.

  16. Operational modal analysis modeling, Bayesian inference, uncertainty laws

    CERN Document Server

    Au, Siu-Kui


    This book presents operational modal analysis (OMA), employing a coherent and comprehensive Bayesian framework for modal identification and covering stochastic modeling, theoretical formulations, computational algorithms, and practical applications. Mathematical similarities and philosophical differences between Bayesian and classical statistical approaches to system identification are discussed, allowing their mathematical tools to be shared and their results correctly interpreted. Many chapters can be used as lecture notes for the general topic they cover beyond the OMA context. After an introductory chapter (1), Chapters 2–7 present the general theory of stochastic modeling and analysis of ambient vibrations. Readers are first introduced to the spectral analysis of deterministic time series (2) and structural dynamics (3), which do not require the use of probability concepts. The concepts and techniques in these chapters are subsequently extended to a probabilistic context in Chapter 4 (on stochastic pro...

  17. An Overview of Bayesian Methods for Neural Spike Train Analysis

    Directory of Open Access Journals (Sweden)

    Zhe Chen


    Neural spike train analysis is an important task in computational neuroscience which aims to understand neural mechanisms and gain insights into neural circuits. With the advancement of multielectrode recording and imaging technologies, it has become increasingly demanding to develop statistical tools for analyzing large neuronal ensemble spike activity. Here we present a tutorial overview of Bayesian methods and their representative applications in neural spike train analysis, at both single neuron and population levels. On the theoretical side, we focus on various approximate Bayesian inference techniques as applied to latent state and parameter estimation. On the application side, the topics include spike sorting, tuning curve estimation, neural encoding and decoding, deconvolution of spike trains from calcium imaging signals, and inference of neuronal functional connectivity and synchrony. Some research challenges and opportunities for neural spike train analysis are discussed.

  18. A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles. (United States)

    Zhang, Lin; Meng, Jia; Liu, Hui; Huang, Yufei


    DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation include hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA and methylation data, and the determination of optimal cluster number is still problematic. A Dirichlet process beta mixture model (DPBMM) is proposed that models the DNA methylation expressions as an infinite number of beta mixture distribution. The model allows automatic learning of the relevant parameters such as the cluster mixing proportion, the parameters of beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we proposed a Gibbs sampling "no-gaps" solution for computing the posterior distributions, hence the estimates of the parameters. The proposed algorithm was tested on simulated data as well as methylation data from 55 Glioblastoma multiform (GBM) brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on data of different number of loci (P-value < 0.1), while hierarchical clustering cannot yield statistically significant clusters.
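
    The reason a Dirichlet process mixture can learn the number of clusters is easiest to see in its Chinese restaurant process representation, where each new item joins an existing cluster with probability proportional to that cluster's size or opens a new one with probability proportional to a concentration parameter. The sketch below only draws cluster assignments from that prior. It is not the "no-gaps" Gibbs sampler of the paper and fits no beta mixture; the function name, the concentration value and the seed are hypothetical, and 55 merely echoes the number of GBM samples mentioned above.

```python
import random


def sample_crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process prior."""
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


assignments, counts = sample_crp_assignments(n_items=55, alpha=1.0)
print("number of clusters drawn:", len(counts))
print("cluster sizes:", counts)
```

    The number of occupied clusters is random rather than fixed in advance; in a full mixture model the data likelihood would reweight these choices at every Gibbs step.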

  19. Bayesian phylogeny analysis via stochastic approximation Monte Carlo

    KAUST Repository

    Cheon, Sooyoung


    Monte Carlo methods have received much attention in the recent literature of phylogeny analysis. However, the conventional Markov chain Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, tend to get trapped in a local mode in simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo (SAMC) algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software packages, BAMBE and MrBayes, on simulated and real datasets. The numerical results indicate that our method outperforms BAMBE and MrBayes. Among the three methods, SAMC produces the consensus trees with the highest similarity to the true trees and the model parameter estimates with the smallest mean square errors, while costing the least CPU time. © 2009 Elsevier Inc. All rights reserved.

  20. Bayesian conformational analysis of ring molecules through reversible jump MCMC

    DEFF Research Database (Denmark)

    Nolsøe, Kim; Kessler, Mathieu; Pérez, José


    In this paper we address the problem of classifying the conformations of m-membered rings using experimental observations obtained by crystal structure analysis. We formulate a model for the data generation mechanism that consists in a multidimensional mixture model. We perform inference for the proportions and the components in a Bayesian framework, implementing an MCMC Reversible Jumps Algorithm to obtain samples of the posterior distributions. The method is illustrated on a simulated data set and on real data corresponding to cyclo-octane structures....

  1. A Bayesian Analysis of the Flood Frequency Hydrology Concept (United States)


    ERDC/CHL CHETN-X-1 February 2016 Approved for public release; distribution is unlimited. A Bayesian Analysis of the Flood Frequency Hydrology ...flood frequency hydrology concept as a formal probabilistic-based means by which to coherently combine and also evaluate the worth of different types...and development. INTRODUCTION: Merz and Blöschl (2008a,b) proposed the concept of flood frequency hydrology, which emphasizes the importance of

  2. Bayesian Analysis Toolkit: 1.0 and beyond (United States)

    Beaujean, Frederik; Caldwell, Allen; Greenwald, D.; Kluth, S.; Kröninger, Kevin; Schulz, O.


    The Bayesian Analysis Toolkit is a C++ package centered around Markov-chain Monte Carlo sampling. It is used in high-energy physics analyses by experimentalists and theorists alike. The software has matured over the last few years. We present new features to enter version 1.0, then summarize some of the software-engineering lessons learned and give an outlook on future versions.

  3. Bayesian-network-based safety risk analysis in construction projects

    International Nuclear Information System (INIS)

    Zhang, Limao; Wu, Xianguo; Skibniewski, Miroslaw J.; Zhong, Jingbing; Lu, Yujie


    This paper presents a systemic decision support approach for safety risk analysis under uncertainty in tunnel construction. Fuzzy Bayesian Networks (FBN) is used to investigate causal relationships between tunnel-induced damage and its influential variables based upon the risk/hazard mechanism analysis. Aiming to overcome limitations on the current probability estimation, an expert confidence indicator is proposed to ensure the reliability of the surveyed data for fuzzy probability assessment of basic risk factors. A detailed fuzzy-based inference procedure is developed, which has a capacity of implementing deductive reasoning, sensitivity analysis and abductive reasoning. The “3σ criterion” is adopted to calculate the characteristic values of a triangular fuzzy number in the probability fuzzification process, and the α-weighted valuation method is adopted for defuzzification. The construction safety analysis progress is extended to the entire life cycle of risk-prone events, including the pre-accident, during-construction continuous and post-accident control. A typical hazard concerning the tunnel leakage in the construction of Wuhan Yangtze Metro Tunnel in China is presented as a case study, in order to verify the applicability of the proposed approach. The results demonstrate the feasibility of the proposed approach and its application potential. A comparison of advantages and disadvantages between FBN and fuzzy fault tree analysis (FFTA) as risk analysis tools is also conducted. The proposed approach can be used to provide guidelines for safety analysis and management in construction projects, and thus increase the likelihood of a successful project in a complex environment. - Highlights: • A systemic Bayesian network based approach for safety risk analysis is developed. • An expert confidence indicator for probability fuzzification is proposed. • Safety risk analysis progress is extended to entire life cycle of risk-prone events. • A typical

  4. Analysis of lifespan monitoring data using Bayesian logic

    International Nuclear Information System (INIS)

    Pozzi, M; Zonta, D; Glisic, B; Inaudi, D; Lau, J M; Fong, C C


    In this contribution, we use a Bayesian approach to analyze the data from a 19-storey building block, which is part of the Punggol EC26 construction project undertaken by the Singapore Housing and Development Board in the early 2000s. The building was instrumented during construction with interferometric fiber optic average strain sensors, embedded in ten of the first-storey columns. The philosophy driving the design of the monitoring system was to instrument a representative number of structural elements, while maintaining the cost at a reasonable level. The analysis of the data, along with prior experience, allowed the engineer to recognize at an early stage an ongoing differential settlement of one base column. We show how the whole cognitive process followed by the engineer can be reproduced using Bayesian logic. In particular, we discuss to what extent prior knowledge and potential evidence from inspection can alter the perception of the building response based solely on instrumental data.

  5. A Bayesian on-off analysis of cosmic ray data (United States)

    Nosek, Dalibor; Nosková, Jana


    We deal with the analysis of on-off measurements designed for the confirmation of a weak source of events whose presence is hypothesized, based on former observations. The problem of a small number of source events that are masked by an imprecisely known background is addressed from a Bayesian point of view. We examine three closely related variables, the posterior distributions of which carry relevant information about various aspects of the investigated phenomena. This information is utilized for predictions of further observations, given actual data. Backed by details of detection, we propose how to quantify disparities between different measurements. The usefulness of the Bayesian inference is demonstrated on examples taken from cosmic ray physics.
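
    A generic Bayesian treatment of an on-off measurement can be sketched numerically. The model below is the textbook one (on-source counts are Poisson with mean s + alpha*b, off-source counts are Poisson with mean b, flat priors on s and b) and does not reproduce the three variables studied in the paper; the function names, the grid limits and the counts (30 on-source, 100 off-source, exposure ratio 0.1) are hypothetical.

```python
import math


def log_poisson(n, mu):
    """Log of the Poisson probability of n counts given mean mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def source_posterior(n_on, n_off, alpha, s_max, b_max, steps=200):
    """Marginal posterior density of the source mean s on a midpoint grid.

    Flat priors on s in (0, s_max) and on the off-region background mean b in
    (0, b_max); the background is summed out numerically.
    """
    ds = s_max / steps
    db = b_max / steps
    s_grid = [(i + 0.5) * ds for i in range(steps)]
    b_grid = [(j + 0.5) * db for j in range(steps)]
    unnormalised = []
    for s in s_grid:
        total = 0.0
        for b in b_grid:
            total += math.exp(log_poisson(n_on, s + alpha * b) + log_poisson(n_off, b))
        unnormalised.append(total)  # the constant factor db cancels on normalisation
    norm = sum(unnormalised) * ds
    return s_grid, [u / norm for u in unnormalised], ds


s_grid, density, ds = source_posterior(n_on=30, n_off=100, alpha=0.1, s_max=60.0, b_max=200.0)
mode = s_grid[density.index(max(density))]
prob_above_10 = sum(d for s, d in zip(s_grid, density) if s > 10.0) * ds
print("posterior mode of s:", round(mode, 1))
print("P(s > 10 | data):", round(prob_above_10, 3))
```

    The grid approach trades efficiency for transparency; with imprecisely known background the width of the marginal posterior grows as the off-source count shrinks, which can be checked by rerunning the sketch with a smaller n_off.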

  6. Bayesian tomography and integrated data analysis in fusion diagnostics (United States)

    Li, Dong; Dong, Y. B.; Deng, Wei; Shi, Z. B.; Fu, B. Z.; Gao, J. M.; Wang, T. B.; Zhou, Yan; Liu, Yi; Yang, Q. W.; Duan, X. R.


    In this article, a Bayesian tomography method using a non-stationary Gaussian process as a prior is introduced. The Bayesian formalism allows quantities which bear uncertainty to be expressed in probabilistic form, so that the uncertainty of a final solution can be fully resolved from the confidence interval of a posterior probability. Moreover, a consistency check of that solution can be performed by checking whether the misfits between predicted and measured data are reasonably within an assumed data error. In particular, the accuracy of reconstructions is significantly improved by using the non-stationary Gaussian process, which can adapt to the varying smoothness of the emission distribution. An implementation of this method for the soft X-ray diagnostic on HL-2A has been used to explore relevant physics in equilibrium and MHD instability modes. This project is carried out within a large-scale inference framework, aiming at an integrated analysis of heterogeneous diagnostics.

  7. Bayesian models and meta analysis for multiple tissue gene expression data following corticosteroid administration

    Directory of Open Access Journals (Sweden)

    Kelemen Arpad


    Background: This paper addresses key biological problems and statistical issues in the analysis of large gene expression data sets that describe systemic temporal response cascades to therapeutic doses in multiple tissues, such as liver, skeletal muscle, and kidney, from the same animals. Affymetrix U34A time course gene expression data are obtained from three different tissues: kidney, liver and muscle. Our goal is not only to find the concordance of genes across tissues and identify the common differentially expressed genes over time, but also to examine the reproducibility of the findings by integrating the results from multiple tissues through meta-analysis, in order to gain a significant increase in the power of detecting differentially expressed genes over time and to find the differences among the three tissues in their response to the drug. Results and conclusion: A Bayesian categorical model for estimating the proportion of the 'call' is used for pre-screening genes. A hierarchical Bayesian mixture model is further developed for the identification of differentially expressed genes across time and of dynamic clusters. The deviance information criterion is applied to determine the number of components for model comparison and selection. The Bayesian mixture model produces the gene-specific posterior probability of differential/non-differential expression and the 95% credible interval, which is the basis for our further Bayesian meta-inference. Meta-analysis is performed in order to identify commonly expressed genes from multiple tissues that may serve as ideal targets for novel treatment strategies and to integrate the results across separate studies. We have found the commonly expressed genes in the three tissues. However, the up/down/no regulation of these common genes differs at different time points. Moreover, the most differentially expressed genes were found in the liver, then in kidney, and then in muscle.

  8. A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles

    Directory of Open Access Journals (Sweden)

    Zhang Lin


    Background: DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation include hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA and methylation data, and the determination of optimal cluster number is still problematic. Method: A Dirichlet process beta mixture model (DPBMM) is proposed that models the DNA methylation expressions as an infinite number of beta mixture distribution. The model allows automatic learning of the relevant parameters such as the cluster mixing proportion, the parameters of beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we proposed a Gibbs sampling "no-gaps" solution for computing the posterior distributions, hence the estimates of the parameters. Result: The proposed algorithm was tested on simulated data as well as methylation data from 55 Glioblastoma multiform (GBM) brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on data of different number of loci (P-value

  9. A Bayesian Nonparametric Meta-Analysis Model (United States)

    Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.


    In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…

  10. Bayesian networks for omics data analysis

    NARCIS (Netherlands)

    Gavai, A.K.


    This thesis focuses on two aspects of high throughput technologies, i.e. data storage and data analysis, in particular in transcriptomics and metabolomics. Both technologies are part of a research field that is generally called ‘omics’ (or ‘-omics’, with a leading hyphen), which refers to genomics,

  11. A Bayesian framework for cell-level protein network analysis for multivariate proteomics image data (United States)

    Kovacheva, Violet N.; Sirinukunwattana, Korsuk; Rajpoot, Nasir M.


    The recent development of multivariate imaging techniques, such as the Toponome Imaging System (TIS), has facilitated the analysis of multiple co-localisation of proteins. This could hold the key to understanding complex phenomena such as protein-protein interaction in cancer. In this paper, we propose a Bayesian framework for cell level network analysis allowing the identification of several protein pairs having significantly higher co-expression levels in cancerous tissue samples when compared to normal colon tissue. It involves segmenting the DAPI-labeled image into cells and determining the cell phenotypes according to their protein-protein dependence profile. The cells are phenotyped using Gaussian Bayesian hierarchical clustering (GBHC) after feature selection is performed. The phenotypes are then analysed using Difference in Sums of Weighted cO-dependence Profiles (DiSWOP), which detects differences in the co-expression patterns of protein pairs. We demonstrate that the pairs highlighted by the proposed framework have high concordance with recent results using a different phenotyping method. This demonstrates that the results are independent of the clustering method used. In addition, the highlighted protein pairs are further analysed via protein interaction pathway databases and by considering the localization of high protein-protein dependence within individual samples. This suggests that the proposed approach could identify potentially functional protein complexes active in cancer progression and cell differentiation.

  12. Robust Bayesian Analysis of Generalized Half Logistic Distribution

    Directory of Open Access Journals (Sweden)

    Ajit Chaturvedi


    In this paper, robust Bayesian analysis of the generalized half logistic distribution (GHLD) under an $\epsilon$-contamination class of priors for the shape parameter $\lambda$ is considered. ML-II Bayes estimators of the parameters, reliability function and hazard function are derived under the squared-error loss function (SELF) and the linear exponential (LINEX) loss function by considering Type II censoring and the sampling scheme of Bartholomew (1963). Both the cases when the scale parameter is known and unknown are considered under Type II censoring and under the sampling scheme of Bartholomew. A simulation study and analysis of a real data set are presented.
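
    For orientation, the two loss functions named in this record lead to different Bayes estimators. The block below states the standard textbook forms, assuming the LINEX loss is parametrized by the difference $\Delta = \hat{\lambda} - \lambda$ with a constant $a \neq 0$; the paper may use another parametrization (for instance a ratio-based $\Delta$), so this is a sketch rather than the authors' derivation.

```latex
% Squared-error loss (SELF): the Bayes estimator is the posterior mean.
L_{\mathrm{SELF}}(\hat{\lambda}, \lambda) = (\hat{\lambda} - \lambda)^2
\quad\Longrightarrow\quad
\hat{\lambda}_{\mathrm{SELF}} = E[\lambda \mid x].

% LINEX loss with \Delta = \hat{\lambda} - \lambda and a \neq 0 (assumed parametrization).
L_{\mathrm{LINEX}}(\Delta) = e^{a\Delta} - a\Delta - 1 .

% Posterior expected LINEX loss and its minimizer:
E[L_{\mathrm{LINEX}} \mid x]
  = e^{a\hat{\lambda}}\, E[e^{-a\lambda} \mid x]
    - a\bigl(\hat{\lambda} - E[\lambda \mid x]\bigr) - 1,
\qquad
\frac{\partial}{\partial \hat{\lambda}} E[L_{\mathrm{LINEX}} \mid x] = 0
\;\Longrightarrow\;
\hat{\lambda}_{\mathrm{LINEX}} = -\frac{1}{a} \ln E\bigl[e^{-a\lambda} \mid x\bigr].
```

    For $a > 0$ overestimation is penalized more heavily than underestimation, so the LINEX estimator lies below the posterior mean; as $a \to 0$ it approaches the SELF estimator.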

  13. On the blind use of statistical tools in the analysis of globular cluster stars (United States)

    D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco


    As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.

  14. Prior Sensitivity Analysis in Default Bayesian Structural Equation Modeling. (United States)

    van Erp, Sara; Mulder, Joris; Oberski, Daniel L


    Bayesian structural equation modeling (BSEM) has recently gained popularity because it enables researchers to fit complex models and solve some of the issues often encountered in classical maximum likelihood estimation, such as nonconvergence and inadmissible solutions. An important component of any Bayesian analysis is the prior distribution of the unknown model parameters. Often, researchers rely on default priors, which are constructed in an automatic fashion without requiring substantive prior information. However, the prior can have a serious influence on the estimation of the model parameters, which affects the mean squared error, bias, coverage rates, and quantiles of the estimates. In this article, we investigate the performance of three different default priors: noninformative improper priors, vague proper priors, and empirical Bayes priors, with the latter being novel in the BSEM literature. Based on a simulation study, we find that these three default BSEM methods may perform very differently, especially with small samples. A careful prior sensitivity analysis is therefore needed when performing a default BSEM analysis. For this purpose, we provide a practical step-by-step guide for practitioners to conducting a prior sensitivity analysis in default BSEM. Our recommendations are illustrated using a well-known case study from the structural equation modeling literature, and all code for conducting the prior sensitivity analysis is available in the online supplemental materials. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  15. Bayesian Sensitivity Analysis of Statistical Models with Missing Data. (United States)

    Zhu, Hongtu; Ibrahim, Joseph G; Tang, Niansheng


    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures.

  16. A Bayesian analysis of pentaquark signals from CLAS data

    Energy Technology Data Exchange (ETDEWEB)

    David Ireland; Bryan McKinnon; Dan Protopopescu; Pawel Ambrozewicz; Marco Anghinolfi; G. Asryan; Harutyun Avakian; H. Bagdasaryan; Nathan Baillie; Jacques Ball; Nathan Baltzell; V. Batourine; Marco Battaglieri; Ivan Bedlinski; Ivan Bedlinskiy; Matthew Bellis; Nawal Benmouna; Barry Berman; Angela Biselli; Lukasz Blaszczyk; Sylvain Bouchigny; Sergey Boyarinov; Robert Bradford; Derek Branford; William Briscoe; William Brooks; Volker Burkert; Cornel Butuceanu; John Calarco; Sharon Careccia; Daniel Carman; Liam Casey; Shifeng Chen; Lu Cheng; Philip Cole; Patrick Collins; Philip Coltharp; Donald Crabb; Volker Crede; Natalya Dashyan; Rita De Masi; Raffaella De Vita; Enzo De Sanctis; Pavel Degtiarenko; Alexandre Deur; Richard Dickson; Chaden Djalali; Gail Dodge; Joseph Donnelly; David Doughty; Michael Dugger; Oleksandr Dzyubak; Hovanes Egiyan; Kim Egiyan; Lamiaa Elfassi; Latifa Elouadrhiri; Paul Eugenio; Gleb Fedotov; Gerald Feldman; Ahmed Fradi; Herbert Funsten; Michel Garcon; Gagik Gavalian; Nerses Gevorgyan; Gerard Gilfoyle; Kevin Giovanetti; Francois-Xavier Girod; John Goetz; Wesley Gohn; Atilla Gonenc; Ralf Gothe; Keith Griffioen; Michel Guidal; Nevzat Guler; Lei Guo; Vardan Gyurjyan; Kawtar Hafidi; Hayk Hakobyan; Charles Hanretty; Neil Hassall; F. Hersman; Ishaq Hleiqawi; Maurik Holtrop; Charles Hyde; Yordanka Ilieva; Boris Ishkhanov; Eugeny Isupov; D. Jenkins; Hyon-Suk Jo; John Johnstone; Kyungseon Joo; Henry Juengst; Narbe Kalantarians; James Kellie; Mahbubul Khandaker; Wooyoung Kim; Andreas Klein; Franz Klein; Mikhail Kossov; Zebulun Krahn; Laird Kramer; Valery Kubarovsky; Joachim Kuhn; Sergey Kuleshov; Viacheslav Kuznetsov; Jeff Lachniet; Jean Laget; Jorn Langheinrich; D. Lawrence; Kenneth Livingston; Haiyun Lu; Marion MacCormick; Nikolai Markov; Paul Mattione; Bernhard Mecking; Mac Mestayer; Curtis Meyer; Tsutomu Mibe; Konstantin Mikhaylov; Marco Mirazita; Rory Miskimen; Viktor Mokeev; Brahim Moreno; Kei Moriya; Steven Morrow; Maryam Moteabbed; Edwin Munevar Espitia; Gordon Mutchler; Pawel Nadel-Turonski; Rakhsha Nasseripour; Silvia Niccolai; Gabriel Niculescu; Maria-Ioana Niculescu; Bogdan Niczyporuk; Megh Niroula; Rustam Niyazov; Mina Nozar; Mikhail Osipenko; Alexander Ostrovidov; Kijun Park; Evgueni Pasyuk; Craig Paterson; Sergio Pereira; Joshua Pierce; Nikolay Pivnyuk; Oleg Pogorelko; Sergey Pozdnyakov; John Price; Sebastien Procureur; Yelena Prok; Brian Raue; Giovanni Ricco; Marco Ripani; Barry Ritchie; Federico Ronchetti; Guenther Rosner; Patrizia Rossi; Franck Sabatie; Julian Salamanca; Carlos Salgado; Joseph Santoro; Vladimir Sapunenko; Reinhard Schumacher; Vladimir Serov; Youri Sharabian; Dmitri Sharov; Nikolay Shvedunov; Elton Smith; Lee Smith; Daniel Sober; Daria Sokhan; Aleksey Stavinskiy; Samuel Stepanyan; Stepan Stepanyan; Burnham Stokes; Paul Stoler; Steffen Strauch; Mauro Taiuti; David Tedeschi; Ulrike Thoma; Avtandil Tkabladze; Svyatoslav Tkachenko; Clarisse Tur; Maurizio Ungaro; Michael Vineyard; Alexander Vlassov; Daniel Watts; Lawrence Weinstein; Dennis Weygand; M. Williams; Elliott Wolin; M.H. Wood; Amrit Yegneswaran; Lorenzo Zana; Jixie Zhang; Bo Zhao; Zhiwen Zhao


    We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a $\Theta^{+}$ pentaquark, whilst the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a $\Theta^{+}$. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner.

  17. Bayesian analysis of log Gaussian Cox processes for disease mapping

    DEFF Research Database (Denmark)

    Benes, Viktor; Bodlák, Karel; Møller, Jesper

    We consider a data set of locations where people in Central Bohemia have been infected by tick-borne encephalitis, and where population census data and covariates concerning vegetation and altitude are available. The aims are to estimate the risk map of the disease and to study the dependence...... of the risk on the covariates. Instead of using the common area level approaches we consider a Bayesian analysis for a log Gaussian Cox point process with covariates. Posterior characteristics for a discretized version of the log Gaussian Cox process are computed using Markov chain Monte Carlo methods...

  18. WebBUGS: Conducting Bayesian Statistical Analysis Online

    Directory of Open Access Journals (Sweden)

    Zhiyong Zhang


    A web interface, named WebBUGS, is developed to conduct Bayesian analysis online over the Internet through OpenBUGS and R. WebBUGS can be used with the minimum requirement of a web browser both remotely and locally. WebBUGS has many collaborative features such as email notification and sharing. WebBUGS also eases the use of OpenBUGS by providing built-in model templates, data management module, and other useful modules. In this paper, the use of WebBUGS is illustrated and discussed.

  19. Bayesian Reasoning in Data Analysis A Critical Introduction

    CERN Document Server

    D'Agostini, Giulio


    This book provides a multi-level introduction to Bayesian reasoning (as opposed to "conventional statistics") and its applications to data analysis. The basic ideas of this "new" approach to the quantification of uncertainty are presented using examples from research and everyday life. Applications covered include: parametric inference; combination of results; treatment of uncertainty due to systematic errors and background; comparison of hypotheses; unfolding of experimental distributions; upper/lower bounds in frontier-type measurements. Approximate methods for routine use are derived and ar

  20. Implementation of a Bayesian Engine for Uncertainty Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Leng Vang; Curtis Smith; Steven Prescott


    In probabilistic risk assessment, it is important to have an environment where analysts have access to shared and secured high-performance computing and a statistical analysis tool package. As part of the advanced small modular reactor probabilistic risk analysis framework implementation, we have identified the need for advanced Bayesian computations. However, in order to make this technology available to non-specialists, there is also a need for a simplified tool that allows users to author models and evaluate them within this framework. As a proof-of-concept, we have implemented an advanced open source Bayesian inference tool, OpenBUGS, within the browser-based cloud risk analysis framework that is under development at the Idaho National Laboratory. This development, the “OpenBUGS Scripter”, has been implemented as a client-side, visual, web-based integrated development environment for creating OpenBUGS language scripts. It depends on the shared server environment to execute the generated scripts and to transmit results back to the user. The visual models are in the form of linked diagrams, from which we automatically create the applicable OpenBUGS script that matches the diagram. These diagrams can be saved locally or stored on the server environment to be shared with other users.

  1. Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network. (United States)

    Kim, Hyun Uk; Kim, Tae Yong; Lee, Sang Yup


    Genome-scale metabolic network models have contributed to elucidating biological phenomena, and predicting gene targets to engineer for biotechnological applications. With their increasing importance, their precise network characterization has also been crucial for better understanding of the cellular physiology. We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate an organism's metabolism under perturbation. FMB reveals the direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to a specific perturbation using metabolic flux data. With metabolic flux data calculated by constraints-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at the metabolic modular level. As a demonstration, this framework was applied to the genetically perturbed Escherichia coli metabolism, which is a lpdA gene knockout mutant, using its genome-scale metabolic network model. Ultimately, it provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis.

  2. Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models. (United States)

    Hack, C Eric


    Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach.

  3. DATMAN: A reliability data analysis program using Bayesian updating

    International Nuclear Information System (INIS)

    Becker, M.; Feltus, M.A.


    Preventive maintenance (PM) techniques focus on the prevention of failures, in particular, system components that are important to plant functions. Reliability-centered maintenance (RCM) improves on the PM techniques by introducing a set of guidelines by which to evaluate the system functions. It also minimizes intrusive maintenance, labor, and equipment downtime without sacrificing system performance when its function is essential for plant safety. Both the PM and RCM approaches require that system reliability data be updated as more component failures and operation time are acquired. Systems reliability and the likelihood of component failures can be calculated by Bayesian statistical methods, which can update these data. The DATMAN computer code has been developed at Penn State to simplify the Bayesian analysis by performing tedious calculations needed for RCM reliability analysis. DATMAN reads data for updating, fits a distribution that best fits the data, and calculates component reliability. DATMAN provides a user-friendly interface menu that allows the user to choose from several common prior and posterior distributions, insert new failure data, and visually select the distribution that matches the data most accurately
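
    The Bayesian updating of reliability data described above can be illustrated with a minimal sketch. It assumes a constant failure rate with a gamma prior and Poisson-distributed failure counts, a common conjugate pairing; the abstract does not say which distributions DATMAN offers, so the function names and all numbers below are hypothetical.

```python
def update_failure_rate(prior_shape, prior_rate, failures, operating_hours):
    """Conjugate gamma-Poisson update: Gamma(a, b) prior -> Gamma(a + n, b + T)."""
    return prior_shape + failures, prior_rate + operating_hours


def reliability(shape, rate, mission_hours):
    """Posterior mean of exp(-lambda * t) when lambda ~ Gamma(shape, rate)."""
    return (rate / (rate + mission_hours)) ** shape


# Hypothetical prior: mean failure rate of 2 / 1000 = 0.002 per hour.
prior = (2.0, 1000.0)

# Hypothetical new evidence: 3 failures observed in 4000 operating hours.
post = update_failure_rate(*prior, failures=3, operating_hours=4000.0)
post_mean_rate = post[0] / post[1]

# Probability of surviving a 100-hour mission under the updated distribution.
mission_reliability = reliability(post[0], post[1], mission_hours=100.0)
print(post, post_mean_rate, mission_reliability)
```

    Each new batch of failure data can be fed through the same update, with the previous posterior serving as the next prior.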

  4. Bayesian data analysis of the dynamics of rolling leukocytes (United States)

    Moskopp, Mats Leif; Preuss, Roland; Deussen, Andreas; Chavakis, Triantafyllos; Dieterich, Peter


    The coordinated recruitment of leukocytes to sites of infection and inflammation is a central process of the immune system and proceeds in several steps. Here we focus on the dynamics of rolling leukocytes obtained from in vitro experiments. Trajectories of rolling leukocytes in small flow chambers are acquired with phase contrast microscopy under different levels of fluid shear stress and a variation of protein coatings of the (adhesive) surfaces. Bayesian data analysis of a random walk model including drift is applied to individual trajectories of leukocytes. The analysis allows the estimation of drift velocities and diffusion coefficients within an uncertainty of about 10% and shows a certain homogeneity of the cell groups. Drift velocities of cells saturate in spite of increasing fluid flow. In addition, the analysis reveals some correlated fluctuations of cells' translocations requiring a refinement of the stochastic model.
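
    As a rough illustration of estimating drift and diffusion from a single trajectory, the sketch below assumes a one-dimensional random walk with drift, equally spaced observations, and independent Gaussian increments with mean v*dt and variance 2*D*dt. The abstract does not give the authors' model details or priors; under a standard noninformative prior the posterior for the mean increment is centred on the sample mean, which is what the code reports, together with a plug-in estimate of D. The function name and the trajectory are hypothetical.

```python
import math


def drift_and_diffusion(positions, dt):
    """Estimate drift v and diffusion D from equally spaced 1-D positions."""
    steps = [b - a for a, b in zip(positions, positions[1:])]
    n = len(steps)
    mean_step = sum(steps) / n
    var_step = sum((s - mean_step) ** 2 for s in steps) / (n - 1)
    drift = mean_step / dt                     # centre of the posterior for v
    drift_se = math.sqrt(var_step / n) / dt    # scale of that posterior
    diffusion = var_step / (2 * dt)            # plug-in estimate, Var = 2 * D * dt
    return drift, drift_se, diffusion


# Hypothetical trajectory sampled every 0.5 time units.
positions = [0.0, 1.2, 1.8, 3.1, 4.0]
drift, drift_se, diffusion = drift_and_diffusion(positions, dt=0.5)
print(drift, drift_se, diffusion)
```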

  5. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann


    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, with usual applications comprising probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed by other researchers to address this issue. These range in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Hereby, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the method adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  6. Stability analysis in K-means clustering. (United States)

    Steinley, Douglas


    This paper develops a new procedure, called stability analysis, for K-means clustering. Instead of ignoring local optima and only considering the best solution found, this procedure takes advantage of additional information from a K-means cluster analysis. The information from the locally optimal solutions is collected in an object by object co-occurrence matrix. The co-occurrence matrix is clustered and subsequently reordered by a steepest ascent quadratic assignment procedure to aid visual interpretation of the multidimensional cluster structure. Subsequently, measures are developed to determine the overall structure of a data set, the number of clusters and the multidimensional relationships between the clusters.
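
    The object-by-object co-occurrence matrix described above can be sketched as follows. This is a minimal illustration, not the author's procedure: it runs a plain Lloyd's K-means from several random starts and records how often each pair of objects lands in the same cluster. The later steps in the abstract (clustering and reordering the matrix by quadratic assignment) are not shown, and the function names and toy data are hypothetical.

```python
import random


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's algorithm from a random start; returns one label per point."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: local optimum
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cooccurrence(points, k, runs=50, seed=0):
    """Fraction of runs in which each pair of objects shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                counts[i][j] += labels[i] == labels[j]
    return [[c / runs for c in row] for row in counts]


# Hypothetical toy data: two well-separated groups on a line.
points = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
co = cooccurrence(points, k=2)
for row in co:
    print(row)
```

    Entries near 1 mark pairs that are grouped together across almost all locally optimal solutions; intermediate values flag objects whose assignment depends on the starting point.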

  7. Evolutionary Analysis of Dengue Serotype 2 Viruses Using Phylogenetic and Bayesian Methods from New Delhi, India.

    Directory of Open Access Journals (Sweden)

    Nazia Afreen


    Dengue fever is the most important arboviral disease in the tropical and sub-tropical countries of the world. Delhi, the metropolitan capital state of India, has reported many dengue outbreaks, with the last outbreak occurring in 2013. We have recently reported predominance of dengue virus serotype 2 during 2011-2014 in Delhi. In the present study, we report molecular characterization and evolutionary analysis of dengue serotype 2 viruses which were detected in 2011-2014 in Delhi. Envelope genes of 42 DENV-2 strains were sequenced in the study. All DENV-2 strains grouped within the Cosmopolitan genotype and further clustered into three lineages; Lineage I, II and III. Lineage III replaced lineage I during dengue fever outbreak of 2013. Further, a novel mutation Thr404Ile was detected in the stem region of the envelope protein of a single DENV-2 strain in 2014. Nucleotide substitution rate and time to the most recent common ancestor were determined by molecular clock analysis using Bayesian methods. A change in effective population size of Indian DENV-2 viruses was investigated through Bayesian skyline plot. The study will be a vital road map for investigation of epidemiology and evolutionary pattern of dengue viruses in India.

  8. Evolutionary Analysis of Dengue Serotype 2 Viruses Using Phylogenetic and Bayesian Methods from New Delhi, India. (United States)

    Afreen, Nazia; Naqvi, Irshad H; Broor, Shobha; Ahmed, Anwar; Kazim, Syed Naqui; Dohare, Ravins; Kumar, Manoj; Parveen, Shama


    Dengue fever is the most important arboviral disease in the tropical and sub-tropical countries of the world. Delhi, the metropolitan capital state of India, has reported many dengue outbreaks, with the last outbreak occurring in 2013. We have recently reported predominance of dengue virus serotype 2 during 2011-2014 in Delhi. In the present study, we report molecular characterization and evolutionary analysis of dengue serotype 2 viruses which were detected in 2011-2014 in Delhi. Envelope genes of 42 DENV-2 strains were sequenced in the study. All DENV-2 strains grouped within the Cosmopolitan genotype and further clustered into three lineages; Lineage I, II and III. Lineage III replaced lineage I during dengue fever outbreak of 2013. Further, a novel mutation Thr404Ile was detected in the stem region of the envelope protein of a single DENV-2 strain in 2014. Nucleotide substitution rate and time to the most recent common ancestor were determined by molecular clock analysis using Bayesian methods. A change in effective population size of Indian DENV-2 viruses was investigated through Bayesian skyline plot. The study will be a vital road map for investigation of epidemiology and evolutionary pattern of dengue viruses in India.

  9. Bayesian networks inference algorithm to implement Dempster Shafer theory in reliability analysis

    International Nuclear Information System (INIS)

    Simon, C.; Weber, P.; Evsukoff, A.


    This paper deals with the use of Bayesian networks to compute system reliability. The reliability analysis problem is described and the usual methods for quantitative reliability analysis are presented within a case study. Some drawbacks that justify the use of Bayesian networks are identified. The basic concepts of the Bayesian networks application to reliability analysis are introduced and a model to compute the reliability for the case study is presented. Dempster Shafer theory to treat epistemic uncertainty in reliability analysis is then discussed and its basic concepts that can be applied thanks to the Bayesian network inference algorithm are introduced. Finally, it is shown, with a numerical example, how Bayesian networks' inference algorithms compute complex system reliability and what the Dempster Shafer theory can provide to reliability analysis

  10. BATMAN: Bayesian Technique for Multi-image Analysis (United States)

    Casado, J.; Ascasibar, Y.; García-Benito, R.; Guidi, G.; Choudhury, O. S.; Bellocchi, E.; Sánchez, S. F.; Díaz, A. I.


    This paper describes the Bayesian Technique for Multi-image Analysis (BATMAN), a novel image-segmentation technique based on Bayesian statistics that characterizes any astronomical data set containing spatial information and performs a tessellation based on the measurements and errors provided as input. The algorithm iteratively merges spatial elements as long as they are statistically consistent with carrying the same information (i.e., identical signal within the errors). We illustrate its operation and performance with a set of test cases including both synthetic and real integral-field spectroscopic data. The output segmentations adapt to the underlying spatial structure, regardless of its morphology and/or the statistical properties of the noise. The quality of the recovered signal represents an improvement with respect to the input, especially in regions with low signal-to-noise ratio. However, the algorithm may be sensitive to small-scale random fluctuations, and its performance in the presence of spatial gradients is limited. Due to these effects, errors may be underestimated by as much as a factor of 2. Our analysis reveals that the algorithm prioritizes conservation of all the statistically significant information over noise reduction, and that the precise choice of the input data has a crucial impact on the results. Hence, the philosophy of BaTMAn is not to be used as a 'black box' to improve the signal-to-noise ratio, but as a new approach to characterize spatially resolved data prior to its analysis. The source code is publicly available at

  11. BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

    Directory of Open Access Journals (Sweden)

    Arturo Medrano-Soto


    Based on mixture models, we present a Bayesian method (called BClass) to classify biological entities (e.g. genes) when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability of each entry to belong to each element (group) in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities of each entry to belong to the groups. The number of groups in the analysis is controlled dynamically by rendering the groups as 'alive' and 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked plots) and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.
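
    The posterior probability of an entry belonging to each group in a mixture follows from Bayes' rule. The sketch below shows this for a single continuous variable with Gaussian components whose weights, means and standard deviations are taken as known; BClass itself samples these quantities and handles mixed data types, so this is only an illustration with hypothetical names and values (and it is written in Python rather than Lisp-Stat).

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def membership_probabilities(x, weights, means, sds):
    """P(group | x) for each group: weight * likelihood, normalised to sum to 1.

    Assumes x is not so extreme that every component density underflows to 0.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-group mixture with equal weights and unit variances.
weights, means, sds = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
print(membership_probabilities(1.0, weights, means, sds))  # equidistant entry
print(membership_probabilities(0.0, weights, means, sds))  # favours group 1
```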

  12. A Bayesian Analysis of Unobserved Component Models Using Ox

    Directory of Open Access Journals (Sweden)

    Charles S. Bos


    This article details a Bayesian analysis of the Nile river flow data, using a similar state space model as other articles in this volume. For this data set, Metropolis-Hastings and Gibbs sampling algorithms are implemented in the programming language Ox. These Markov chain Monte Carlo methods only provide output conditioned upon the full data set. For filtered output, conditioning only on past observations, the particle filter is introduced. The sampling methods are flexible, and this advantage is used to extend the model to incorporate a stochastic volatility process. The volatility changes both in the Nile data and also in daily S&P 500 return data are investigated. The posterior density of parameters and states is found to provide information on which elements of the model are easily identifiable, and which elements are estimated with less precision.

  13. Bayesian analysis of factors associated with fibromyalgia syndrome subjects (United States)

    Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie


    Factors contributing to movement-related fear were assessed by Russek et al. (2014) for subjects with Fibromyalgia (FM) based on data collected by a national internet survey of community-based individuals. The study focused on the variables, Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), Pain, work status and physical activity dependent from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits the same data with a Bayesian analysis where appropriate priors were introduced for the variables selected in Russek's paper.

  14. Bayesian analysis for uncertainty estimation of a canopy transpiration model (United States)

    Samanta, S.; Mackay, D. S.; Clayton, M. K.; Kruger, E. L.; Ewers, B. E.


    A Bayesian approach was used to fit a conceptual transpiration model to half-hourly transpiration rates for a sugar maple (Acer saccharum) stand collected over a 5-month period and probabilistically estimate its parameter and prediction uncertainties. The model used the Penman-Monteith equation with the Jarvis model for canopy conductance. This deterministic model was extended by adding a normally distributed error term. This extension enabled using Markov chain Monte Carlo simulations to sample the posterior parameter distributions. The residuals revealed approximate conformance to the assumption of normally distributed errors. However, minor systematic structures in the residuals at fine timescales suggested model changes that would potentially improve the modeling of transpiration. Results also indicated considerable uncertainties in the parameter and transpiration estimates. This simple methodology of uncertainty analysis would facilitate the deductive step during the development cycle of deterministic conceptual models by accounting for these uncertainties while drawing inferences from data.
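
    The idea of adding a normally distributed error term to a deterministic model and sampling the posterior with Markov chain Monte Carlo can be sketched with a toy example. The code below is not the Penman-Monteith/Jarvis model from the abstract: it assumes a hypothetical one-parameter model y = theta * x, a known error standard deviation, a flat prior on theta, and a random-walk Metropolis sampler. All names and data are invented for illustration.

```python
import math
import random


def log_posterior(theta, xs, ys, sigma):
    """Flat prior on theta; normal errors around the deterministic model theta * x."""
    return -sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)


def metropolis(xs, ys, sigma, start, step, n_samples, burn_in, seed=0):
    """Random-walk Metropolis sampler; returns draws kept after burn-in."""
    rng = random.Random(seed)
    theta = start
    logp = log_posterior(theta, xs, ys, sigma)
    draws = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, xs, ys, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        if i >= burn_in:
            draws.append(theta)
    return draws


# Hypothetical observations scattered around y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
draws = metropolis(xs, ys, sigma=0.2, start=1.5, step=0.05,
                   n_samples=5000, burn_in=1000)
posterior_mean = sum(draws) / len(draws)
print(posterior_mean)
```

    The spread of the retained draws gives the parameter uncertainty, and pushing each draw back through the model gives the corresponding prediction uncertainty.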

  15. Limitations of cytochrome oxidase I for the barcoding of Neritidae (Mollusca: Gastropoda) as revealed by Bayesian analysis. (United States)

    Chee, S Y


    The mitochondrial DNA (mtDNA) cytochrome oxidase I (COI) gene has been universally and successfully utilized as a barcoding gene, mainly because it can be amplified easily, applied across a wide range of taxa, and results can be obtained cheaply and quickly. However, in rare cases, the gene can fail to distinguish between species, particularly when exposed to highly sensitive methods of data analysis, such as the Bayesian method, or when taxa have undergone introgressive hybridization, over-splitting, or incomplete lineage sorting. Such cases require the use of alternative markers, and nuclear DNA markers are commonly used. In this study, a dendrogram produced by Bayesian analysis of an mtDNA COI dataset was compared with that of a nuclear DNA ATPS-α dataset, in order to evaluate the efficiency of COI in barcoding Malaysian nerites (Neritidae). In the COI dendrogram, most of the species were in individual clusters, except for two species: Nerita chamaeleon and N. histrio. These two species were placed in the same subcluster, whereas in the ATPS-α dendrogram they were in their own subclusters. Analysis of the ATPS-α gene also placed the two genera of nerites (Nerita and Neritina) in separate clusters, whereas COI gene analysis placed both genera in the same cluster. Therefore, in the case of the Neritidae, the ATPS-α gene is a better barcoding gene than the COI gene.

  16. Tanzania: A Hierarchical Cluster Analysis Approach | Ngaruko ...

    African Journals Online (AJOL)

    Using survey data from Kibondo district, west Tanzania, we use hierarchical cluster analysis to classify borrower farmers according to their borrowing behaviour into four distinctive clusters. The appreciation of the existence of heterogeneous farmer clusters is vital in forging credit delivery policies that are not only ...

  17. Bayesian nonparametric meta-analysis using Polya tree mixture models. (United States)

    Branscum, Adam J; Hanson, Timothy E


    Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically disperse locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.
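    The parametric baseline described in this abstract is the usual normal-normal random effects model. In notation chosen here for illustration (not taken from the paper), with $y_i$ the summary statistic of study $i$, $s_i$ its standard error, $\theta_i$ the latent true effect, and $\mu$, $\tau$ the population mean and between-study standard deviation:

```latex
\begin{align*}
y_i \mid \theta_i &\sim \mathcal{N}(\theta_i,\, s_i^2), \\
\theta_i \mid \mu, \tau &\sim \mathcal{N}(\mu,\, \tau^2), \qquad i = 1, \dots, k.
\end{align*}
```

    As the abstract describes it, the nonparametric alternative relaxes the second line by replacing the normal random effects distribution with one drawn from a Polya tree mixture prior.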

  18. Spatial Dependence and Heterogeneity in Bayesian Factor Analysis : A Cross-National Investigation of Schwartz Values

    NARCIS (Netherlands)

    Stakhovych, Stanislav; Bijmolt, Tammo H. A.; Wedel, Michel


    In this article, we present a Bayesian spatial factor analysis model. We extend previous work on confirmatory factor analysis by including geographically distributed latent variables and accounting for heterogeneity and spatial autocorrelation. The simulation study shows excellent recovery of the

  19. Cluster analysis in phenotyping a Portuguese population. (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J


    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. The aim was to identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with those of other larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
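    Ward's clustering method, named in this abstract, merges at each step the two clusters whose union gives the smallest increase in within-cluster sum of squares. The sketch below is a naive pure-Python version on made-up two-dimensional points; it is not the study's analysis, and the function name and data are illustrative assumptions.

```python
def ward_clustering(points, n_clusters):
    """Agglomerative clustering that repeatedly merges the pair of clusters
    whose merge gives the smallest increase in within-cluster sum of squares."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                dist2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
                cost = na * nb / (na + nb) * dist2  # Ward merge cost
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = (ia + ib, [(na * u + nb * v) / (na + nb) for u, v in zip(ca, cb)])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(members) for members, _ in clusters]

# Made-up points: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
groups = sorted(ward_clustering(points, n_clusters=3))
```

    In practice, clinical variables on different scales would normally be standardized before computing distances; that step is omitted here.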

  20. Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis (United States)

    Dezfuli, Homayoon; Kelly, Dana; Smith, Curtis; Vedros, Kurt; Galyean, William


    This document, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis, is intended to provide guidelines for the collection and evaluation of risk and reliability-related data. It is aimed at scientists and engineers familiar with risk and reliability methods and provides a hands-on approach to the investigation and application of a variety of risk and reliability data assessment methods, tools, and techniques. This document provides both a broad perspective on data analysis collection and evaluation issues and a narrow focus on the methods to implement a comprehensive information repository. The topics addressed herein cover the fundamentals of how data and information are to be used in risk and reliability analysis models and their potential role in decision making. Understanding these topics is essential to attaining a risk-informed decision-making environment that is being sought by NASA requirements and procedures such as 8000.4 (Agency Risk Management Procedural Requirements), NPR 8705.05 (Probabilistic Risk Assessment Procedures for NASA Programs and Projects), and the System Safety requirements of NPR 8715.3 (NASA General Safety Program Requirements).

  1. Bayesian multivariate meta-analysis of multiple factors. (United States)

    Lin, Lifeng; Chu, Haitao


    In medical sciences, a disease condition is typically associated with multiple risk and protective factors. Although many studies report results of multiple factors, nearly all meta-analyses separately synthesize the association between each factor and the disease condition of interest. The collected studies usually report different subsets of factors, and the results from separate analyses on multiple factors may not be comparable because each analysis may use a different subpopulation. This may affect the selection of the most important factors when designing a multifactor intervention program. This article proposes a new concept, multivariate meta-analysis of multiple factors (MVMA-MF), to synthesize all available factors simultaneously. By borrowing information across factors, MVMA-MF can improve statistical efficiency and reduce biases compared with separate analyses when factors are missing not at random. As within-study correlations between factors are commonly unavailable from published articles, we use a Bayesian hybrid model to perform MVMA-MF, which effectively accounts for both within- and between-study correlations. The performance of MVMA-MF and the conventional methods is compared using simulations and an application to a pterygium dataset consisting of 29 studies on 8 risk factors. Copyright © 2018 John Wiley & Sons, Ltd.

  2. Thermodynamically consistent Bayesian analysis of closed biochemical reaction systems

    Directory of Open Access Journals (Sweden)

    Goutsias John


    Background: Estimating the rate constants of a biochemical reaction system with known stoichiometry from noisy time series measurements of molecular concentrations is an important step for building predictive models of cellular function. Inference techniques currently available in the literature may produce rate constant values that defy necessary constraints imposed by the fundamental laws of thermodynamics. As a result, these techniques may lead to biochemical reaction systems whose concentration dynamics could not possibly occur in nature. Therefore, development of a thermodynamically consistent approach for estimating the rate constants of a biochemical reaction system is highly desirable. Results: We introduce a Bayesian analysis approach for computing thermodynamically consistent estimates of the rate constants of a closed biochemical reaction system with known stoichiometry given experimental data. Our method employs an appropriately designed prior probability density function that effectively integrates fundamental biophysical and thermodynamic knowledge into the inference problem. Moreover, it takes into account experimental strategies for collecting informative observations of molecular concentrations through perturbations. The proposed method employs a maximization-expectation-maximization algorithm that provides thermodynamically feasible estimates of the rate constant values and computes appropriate measures of estimation accuracy. We demonstrate various aspects of the proposed method on synthetic data obtained by simulating a subset of a well-known model of the EGF/ERK signaling pathway, and examine its robustness under conditions that violate key assumptions. Software, coded in MATLAB®, which implements all Bayesian analysis techniques discussed in this paper, is available free of charge. Conclusions: Our approach provides an attractive statistical methodology for

  3. Prior sensitivity analysis in default Bayesian structural equation modeling

    NARCIS (Netherlands)

    van Erp, S.J.; Mulder, J.; Oberski, Daniel L.


    Bayesian structural equation modeling (BSEM) has recently gained popularity because it enables researchers to fit complex models while solving some of the issues often encountered in classical maximum likelihood (ML) estimation, such as nonconvergence and inadmissible solutions. An important

  4. How do you solve a problem like Letharia? A new look at cryptic species in lichen-forming fungi using Bayesian clustering and SNPs from multilocus sequence data.

    Directory of Open Access Journals (Sweden)

    Susanne Altermann

    The inclusion of molecular data is increasingly an integral part of studies assessing species boundaries. Analyses based on predefined groups may obscure patterns of differentiation, and population assignment tests provide an alternative for identifying population structure and barriers to gene flow. In this study, we apply population assignment tests implemented in the programs STRUCTURE and BAPS to single nucleotide polymorphisms from DNA sequence data generated for three previous studies of the lichenized fungal genus Letharia. Previous molecular work employing a gene genealogical approach circumscribed six species-level lineages within the genus, four putative lineages within the nominal taxon L. columbiana (Nutt.) J.W. Thomson and two sorediate lineages. We show that Bayesian clustering implemented in the program STRUCTURE was generally able to recover the same six putative Letharia lineages. Population assignments were largely consistent across a range of scenarios, including: extensive amounts of missing data, the exclusion of SNPs from variable markers, and inferences based on SNPs from as few as three gene regions. While our study provided additional evidence corroborating the six candidate Letharia species, the equivalence of these genetic clusters with species-level lineages is uncertain due, in part, to limited phylogenetic signal. Furthermore, both the BAPS analysis and the ad hoc ΔK statistic from results of the STRUCTURE analysis suggest that population structure can possibly be captured with fewer genetic groups. Our findings also suggest that uneven sampling across taxa may be responsible for the contrasting inferences of population substructure. Our results consistently supported two distinct sorediate groups, 'L. lupina' and L. vulpina, and subtle morphological differences support this distinction. Similarly, the putative apotheciate species 'L. lucida' was also consistently supported as a distinct genetic cluster. However

  5. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion. (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K


    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  6. Bayesian uncertainty analysis with applications to turbulence modeling

    International Nuclear Information System (INIS)

    Cheung, Sai Hung; Oliver, Todd A.; Prudencio, Ernesto E.; Prudhomme, Serge; Moser, Robert D.


    In this paper, we apply Bayesian uncertainty quantification techniques to the processes of calibrating complex mathematical models and predicting quantities of interest (QoI's) with such models. These techniques also enable the systematic comparison of competing model classes. The processes of calibration and comparison constitute the building blocks of a larger validation process, the goal of which is to accept or reject a given mathematical model for the prediction of a particular QoI for a particular scenario. In this work, we take the first step in this process by applying the methodology to the analysis of the Spalart-Allmaras turbulence model in the context of incompressible, boundary layer flows. Three competing model classes based on the Spalart-Allmaras model are formulated, calibrated against experimental data, and used to issue a prediction with quantified uncertainty. The model classes are compared in terms of their posterior probabilities and their prediction of QoI's. The model posterior probability represents the relative plausibility of a model class given the data. Thus, it incorporates the model's ability to fit experimental observations. Alternatively, comparing models using the predicted QoI connects the process to the needs of decision makers that use the results of the model. We show that by using both the model plausibility and predicted QoI, one has the opportunity to reject some model classes after calibration, before subjecting the remaining classes to additional validation challenges.
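    The model posterior probability mentioned in this abstract follows from Bayes' rule over model classes: P(M_i | D) is proportional to p(D | M_i) P(M_i). The sketch below computes these probabilities from log evidences using a max-shift for numerical stability. The function name, the equal default priors, and the three log-evidence values are made-up assumptions, not results from the paper.

```python
import math

def model_posteriors(log_evidences, priors=None):
    """Posterior model probabilities P(M_i | D) from log evidences log p(D | M_i)."""
    n = len(log_evidences)
    if priors is None:
        priors = [1.0 / n] * n  # equal prior plausibility (assumption)
    log_post = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    shift = max(log_post)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up log evidences for three competing model classes.
probs = model_posteriors([-1052.3, -1050.1, -1061.8])
```

    With equal priors, the ratio of two posterior probabilities equals the Bayes factor between the corresponding model classes.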

  7. Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations (United States)

    Scargle, Jeffrey D.; Norris, Jay P.; Jackson, Brad; Chiang, James


    This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it, an improved and generalized version of Bayesian Blocks [Scargle 1998], that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by [Arias-Castro, Donoho and Huo 2003]. In the spirit of Reproducible Research [Donoho et al. (2008)] all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material.
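    The optimal segmentation described in this abstract can be found by dynamic programming over candidate change points. The sketch below is a simplified version for event data only. It assumes at least two distinct event times, uses the block fitness N (log N - log T) for N events over duration T, and charges a constant penalty per block; the value `ncp_prior=4.0` and the synthetic event times are illustrative assumptions rather than the paper's calibration.

```python
import math

def bayesian_blocks_events(times, ncp_prior=4.0):
    """Return optimal block edges for distinct event times via dynamic programming."""
    t = sorted(times)
    n = len(t)
    # Candidate edges: first event, midpoints between events, last event.
    edges = [t[0]] + [0.5 * (t[i] + t[i + 1]) for i in range(n - 1)] + [t[-1]]
    best = [0.0] * n  # best[r]: optimal total fitness for events 0..r
    last = [0] * n    # last[r]: start index of the final block in that optimum
    for r in range(n):
        best_val, best_k = -math.inf, 0
        for k in range(r + 1):
            count = r - k + 1                  # events in the block k..r
            width = edges[r + 1] - edges[k]    # duration of that block
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            if k > 0:
                fit += best[k - 1]
            if fit > best_val:
                best_val, best_k = fit, k
        best[r], last[r] = best_val, best_k
    # Backtrack through the stored block start indices.
    change_points = []
    r = n
    while r > 0:
        k = last[r - 1]
        change_points.append(k)
        r = k
    change_points.reverse()
    return [edges[k] for k in change_points] + [edges[n]]

# Synthetic events: a low-rate stretch followed by a high-rate burst.
times = [float(i) for i in range(20)] + [20.0 + 0.05 * j for j in range(40)]
edges = bayesian_blocks_events(times)
```

    This version costs O(N^2) in the number of events, since every earlier index is tried as the start of the final block for each new event.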

  8. Combining morphological analysis and Bayesian networks for strategic decision support

    Directory of Open Access Journals (Sweden)

    A de Waal


    Morphological analysis (MA) and Bayesian networks (BN) are two closely related modelling methods, each of which has its advantages and disadvantages for strategic decision support modelling. MA is a method for defining, linking and evaluating problem spaces. BNs are graphical models which consist of a qualitative and quantitative part. The qualitative part is a cause-and-effect, or causal graph. The quantitative part depicts the strength of the causal relationships between variables. Combining MA and BN, as two phases in a modelling process, allows us to gain the benefits of both of these methods. The strength of MA lies in defining, linking and internally evaluating the parameters of problem spaces, while BN modelling allows for the definition and quantification of causal relationships between variables. Short summaries of MA and BN are provided in this paper, followed by a discussion of how these two computer-aided methods may be combined to better facilitate modelling procedures. A simple example is presented, concerning a recent application in the field of environmental decision support.

  9. Bayesian analysis of a reduced-form air quality model. (United States)

    Foley, Kristen M; Reich, Brian J; Napelenok, Sergey L


    Numerical air quality models are being used for assessing emission control strategies for improving ambient pollution levels across the globe. This paper applies probabilistic modeling to evaluate the effectiveness of emission reduction scenarios aimed at lowering ground-level ozone concentrations. A Bayesian hierarchical model is used to combine air quality model output and monitoring data in order to characterize the impact of emissions reductions while accounting for different degrees of uncertainty in the modeled emissions inputs. The probabilistic model predictions are weighted based on population density in order to better quantify the societal benefits/disbenefits of four hypothetical emission reduction scenarios in which domain-wide NO(x) emissions from various sectors are reduced individually and then simultaneously. Cross validation analysis shows the statistical model performs well compared to observed ozone levels. Accounting for the variability and uncertainty in the emissions and atmospheric systems being modeled is shown to impact how emission reduction scenarios would be ranked, compared to standard methodology.

  10. Bayesian analysis of inflation: Parameter estimation for single field models

    International Nuclear Information System (INIS)

    Mortonson, Michael J.; Peiris, Hiranya V.; Easther, Richard


    Future astrophysical data sets promise to strengthen constraints on models of inflation, and extracting these constraints requires methods and tools commensurate with the quality of the data. In this paper we describe ModeCode, a new, publicly available code that computes the primordial scalar and tensor power spectra for single-field inflationary models. ModeCode solves the inflationary mode equations numerically, avoiding the slow roll approximation. It is interfaced with CAMB and CosmoMC to compute cosmic microwave background angular power spectra and perform likelihood analysis and parameter estimation. ModeCode is easily extendable to additional models of inflation, and future updates will include Bayesian model comparison. Errors from ModeCode contribute negligibly to the error budget for analyses of data from Planck or other next generation experiments. We constrain representative single-field models (φ^n with n = 2/3, 1, 2, and 4; natural inflation; and 'hilltop' inflation) using current data, and provide forecasts for Planck. From current data, we obtain weak but nontrivial limits on the post-inflationary physics, which is a significant source of uncertainty in the predictions of inflationary models, while we find that Planck will dramatically improve these constraints. In particular, Planck will link the inflationary dynamics with the post-inflationary growth of the horizon, and thus begin to probe the "primordial dark ages" between TeV and grand unified theory scale energies.

  11. Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations (United States)

    Scargle, Jeffrey D.; Norris, Jay P.; Jackson, Brad; Chiang, James


    This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it—an improved and generalized version of Bayesian Blocks—that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro et al. In the spirit of Reproducible Research all of the code and data necessary to reproduce all of the figures in this paper are included as supplementary material.

  12. Binary naive Bayesian classifiers for correlated Gaussian features: a theoretical analysis

    CSIR Research Space (South Africa)

    Van Dyk, E


    We investigate the use of Naive Bayesian classifiers for correlated Gaussian feature spaces and derive error estimates for these classifiers. The error analysis is done by developing an exact expression for the error performance of a binary...

  13. Bayesian specification analysis and estimation of simultaneous equation models using Monte Carlo methods

    NARCIS (Netherlands)

    A. Zellner (Arnold); L. Bauwens (Luc); H.K. van Dijk (Herman)


    Bayesian procedures for specification analysis or diagnostic checking of modeling assumptions for structural equations of econometric models are developed and applied using Monte Carlo numerical methods. Checks on the validity of identifying restrictions, exogeneity assumptions and other

  14. Review of Bayesian statistical analysis methods for cytogenetic radiation biodosimetry, with a practical example

    International Nuclear Information System (INIS)

    Ainsbury, Elizabeth A.; Lloyd, David C.; Rothkamm, Kai; Vinnikov, Volodymyr A.; Maznyk, Nataliya A.; Puig, Pedro; Higueras, Manuel


    Classical methods of assessing the uncertainty associated with radiation doses estimated using cytogenetic techniques are now extremely well defined. However, several authors have suggested that a Bayesian approach to uncertainty estimation may be more suitable for cytogenetic data, which are inherently stochastic in nature. The Bayesian analysis framework focuses on identification of probability distributions (for yield of aberrations or estimated dose), which also means that uncertainty is an intrinsic part of the analysis, rather than an 'afterthought'. This paper reviews Bayesian data analysis methods, as well as some more advanced classical ones, that have been proposed in the literature for radiation cytogenetics. A practical overview of Bayesian cytogenetic dose estimation is also presented, with worked examples from the literature. (authors)

  15. [Cluster analysis and its application]. (United States)

    Půlpán, Zdenĕk


    The study exploits knowledge-oriented and context-based modification of well-known algorithms of (fuzzy) clustering. Fuzzy sets are also inherently suited to coping with linguistic domain knowledge. We aim to obtain, from rich and diverse data and knowledge, new information about the environment being explored.

  16. Cognition, quality-of-life, and symptom clusters in breast cancer: Using Bayesian networks to elucidate complex relationships. (United States)

    Xu, Selene; Thompson, Wesley; Ancoli-Israel, Sonia; Liu, Lianqi; Palmer, Barton; Natarajan, Loki


    Breast cancer patients frequently complain of cognitive dysfunction during chemotherapy. Patients also report experiencing a cluster of sleep problems, fatigue, and depressive symptoms during chemotherapy. We aimed to understand the complex dynamic interrelationships of depression, fatigue, and sleep to ultimately elucidate their role in cognitive performance and quality of life amongst breast cancer survivors undergoing chemotherapy treatment. Our study sample comprised 74 newly diagnosed stage I to III breast cancer patients scheduled to receive chemotherapy. An objective neuropsychological test battery and self-reported fatigue, mood, sleep quality, and quality of life were collected at 3 time points: before the start of chemotherapy (baseline: BL), at the end of cycle 4 chemotherapy (C4), and 1 year after the start of chemotherapy (Y1). We applied novel Bayesian network methods to investigate the role of sleep/fatigue/mood on cognition and quality of life prior to, during, and after chemotherapy. The fitted network exhibited strong direct and indirect links between symptoms, cognitive performance, and quality of life. The only symptom directly linked to cognitive performance was C4 sleep quality; at C4, fatigue was directly linked to sleep and thus indirectly influenced cognitive performance. Mood strongly influenced concurrent quality of life at C4 and Y1. Regression estimates indicated that worse sleep quality, fatigue, and mood were negatively associated with cognitive performance or quality of life. The Bayesian network identified local structure (eg, fatigue-mood-QoL or sleep-cognition) and possible intervention targets (eg, a sleep intervention to reduce cognitive complaints during chemotherapy). Copyright © 2017 John Wiley & Sons, Ltd.

  17. An Automated Bayesian Framework for Integrative Gene Expression Analysis and Predictive Medicine


    Parikh, Neena; Zollanvari, Amin; Alterovitz, Gil


    Motivation: This work constructs a closed loop Bayesian Network framework for predictive medicine via integrative analysis of publicly available gene expression findings pertaining to various diseases. Results: An automated pipeline was successfully constructed. Integrative models were made based on gene expression data obtained from GEO experiments relating to four different diseases using Bayesian statistical methods. Many of these models demonstrated a high level of accuracy and predictive...

  18. Bayesian analysis of right censored survival time data | Abiodun ...

    African Journals Online (AJOL)

    We analyzed cancer data using a fully Bayesian inference approach based on the Markov chain Monte Carlo (MCMC) simulation technique, which allows the estimation of very complex and realistic models. The results show that sex and age are significant risk factors for dying from some selected cancers. The risk of dying from ...

  19. Exploiting sensitivity analysis in Bayesian networks for consumer satisfaction study

    NARCIS (Netherlands)

    Jaronski, W.; Bloemer, J.M.M.; Vanhoof, K.; Wets, G.


    The paper presents an application of Bayesian network technology in an empirical customer satisfaction study. The findings of the study should provide insight as to the importance of product/service dimensions in terms of the strength of their influence on overall satisfaction. To this end we apply a

  20. Review of applications of Bayesian meta-analysis in systematic reviews

    Directory of Open Access Journals (Sweden)

    Melissa Glenda Lewis


    Background: Systematic reviews are important sources of evidence in health care research. These reviews may or may not include meta-analysis as a statistical assimilation of the results of several studies in order to acquire a pooled estimate. Systematic review with meta-analysis is considered a robust method of evidence synthesis. The methodology of traditional meta-analysis does not incorporate external prior information. Hence, Bayesian methods are essential because they naturally incorporate past information and update the belief. Bayesian methods for meta-analysis have been developed in response to the limitations of traditional meta-analysis, such as dealing with missing data, a limited number of studies, and sparse event data in both groups. The present article aims to determine the extent to which Bayesian methods have been used in systematic reviews, along with their evolution and applications. This article also highlights the existing challenges and opportunities. Methods: The literature search was performed in databases such as Cochrane, PubMed, ProQuest and Scopus using the keywords “Bayesian Meta-analysis” and “Bayesian Meta-analyses”. All the methodology and application oriented papers specific to Bayesian meta-analysis were considered relevant for this review. Conclusion: Bayesian meta-analysis has gained popularity in the field of evidence synthesis of clinical trials. However, it did not pick up momentum in summarizing public health interventions, owing to the fact that public health interventions are targeted to highly heterogeneous population, multi-component interventions, and multiple outcomes and influenced by the context

  1. RADYBAN: A tool for reliability analysis of dynamic fault trees through conversion into dynamic Bayesian networks

    International Nuclear Information System (INIS)

    Montani, S.; Portinale, L.; Bobbio, A.; Codetta-Raiteri, D.


    In this paper, we present RADYBAN (Reliability Analysis with DYnamic BAyesian Networks), a software tool that analyzes a dynamic fault tree by converting it into a dynamic Bayesian network. The tool implements a modular algorithm for automatically translating a dynamic fault tree into the corresponding dynamic Bayesian network and exploits classical algorithms for inference on dynamic Bayesian networks in order to compute reliability measures. After describing the basic features of the tool, we show how it operates on a real-world example and compare the unreliability results it generates with those returned by other methodologies, in order to verify the correctness and consistency of the results obtained.

  2. Built environment and Property Crime in Seattle, 1998–2000: A Bayesian Analysis (United States)

    Matthews, Stephen A.; Yang, Tse-chuan; Hayslett-McCall, Karen L.; Ruback, R. Barry


    The past decade has seen a rapid growth in the use of a spatial perspective in studies of crime. In part this growth has been driven by the availability of georeferenced data, and the tools to analyze and visualize them: geographic information systems (GIS), spatial analysis, and spatial statistics. In this paper we use exploratory spatial data analysis (ESDA) tools and Bayesian models to help better understand the spatial patterning and predictors of property crime in Seattle, Washington for 1998–2000, including a focus on built environment variables. We present results for aggregate property crime data as well as models for specific property crime types: residential burglary, nonresidential burglary, theft, auto theft, and arson. ESDA confirms the presence of spatial clustering of property crime and we seek to explain these patterns using spatial Poisson models implemented in WinBUGS. Our results indicate that built environment variables were significant predictors of property crime, especially the presence of a highway on auto theft and burglary. PMID:24737924

  3. Built environment and Property Crime in Seattle, 1998-2000: A Bayesian Analysis. (United States)

    Matthews, Stephen A; Yang, Tse-Chuan; Hayslett-McCall, Karen L; Ruback, R Barry


    The past decade has seen a rapid growth in the use of a spatial perspective in studies of crime. In part this growth has been driven by the availability of georeferenced data, and the tools to analyze and visualize them: geographic information systems (GIS), spatial analysis, and spatial statistics. In this paper we use exploratory spatial data analysis (ESDA) tools and Bayesian models to help better understand the spatial patterning and predictors of property crime in Seattle, Washington for 1998-2000, including a focus on built environment variables. We present results for aggregate property crime data as well as models for specific property crime types: residential burglary, nonresidential burglary, theft, auto theft, and arson. ESDA confirms the presence of spatial clustering of property crime and we seek to explain these patterns using spatial Poisson models implemented in WinBUGS. Our results indicate that built environment variables were significant predictors of property crime, especially the presence of a highway on auto theft and burglary.

  4. A Bayesian approach to meta-analysis of plant pathology studies. (United States)

    Mila, A L; Ngugi, H K


    Bayesian statistical methods are used for meta-analysis in many disciplines, including medicine, molecular biology, and engineering, but have not yet been applied for quantitative synthesis of plant pathology studies. In this paper, we illustrate the key concepts of Bayesian statistics and outline the differences between Bayesian and classical (frequentist) methods in the way parameters describing population attributes are considered. We then describe a Bayesian approach to meta-analysis and present a plant pathological example based on studies evaluating the efficacy of plant protection products that induce systemic acquired resistance for the management of fire blight of apple. In a simple random-effects model assuming a normal distribution of effect sizes and no prior information (i.e., a noninformative prior), the results of the Bayesian meta-analysis are similar to those obtained with classical methods. Implementing the same model with a Student's t distribution and a noninformative prior for the effect sizes, instead of a normal distribution, yields similar results for all but acibenzolar-S-methyl (Actigard) which was evaluated only in seven studies in this example. Whereas both the classical (P = 0.28) and the Bayesian analysis with a noninformative prior (95% credibility interval [CRI] for the log response ratio: -0.63 to 0.08) indicate a nonsignificant effect for Actigard, specifying a t distribution resulted in a significant, albeit variable, effect for this product (CRI: -0.73 to -0.10). These results confirm the sensitivity of the analytical outcome (i.e., the posterior distribution) to the choice of prior in Bayesian meta-analyses involving a limited number of studies. 
    We review some pertinent literature on more advanced topics, including modeling of among-study heterogeneity, publication bias, analyses involving a limited number of studies, and methods for dealing with missing data, and show how these issues can be approached in a Bayesian framework.
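
    The random-effects model described in this abstract can be sketched numerically. The block below is a minimal illustration, not the authors' analysis: it assumes known within-study standard errors, a flat prior on the pooled effect mu, and a uniform prior on the among-study standard deviation tau over a grid. The effect sizes, standard errors, and all function names are hypothetical.

```python
import math

def mu_given_tau(y, se, tau):
    """Conditional posterior mean and variance of the pooled effect mu
    for a fixed among-study SD tau (flat prior on mu)."""
    w = [1.0 / (s * s + tau * tau) for s in se]
    total = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    return mean, 1.0 / total

def log_marginal_tau(y, se, tau):
    """Log of p(tau | y) up to an additive constant, uniform prior on tau."""
    mean, var = mu_given_tau(y, se, tau)
    out = 0.5 * math.log(var)
    for yi, s in zip(y, se):
        v = s * s + tau * tau
        out += -0.5 * math.log(v) - 0.5 * (yi - mean) ** 2 / v
    return out

def posterior_mu(y, se, tau_grid):
    """Posterior mean and SD of mu, averaging over the grid of tau values."""
    logs = [log_marginal_tau(y, se, t) for t in tau_grid]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    norm = sum(weights)
    weights = [w / norm for w in weights]
    cond = [mu_given_tau(y, se, t) for t in tau_grid]
    post_mean = sum(w * m for w, (m, _) in zip(weights, cond))
    # Law of total variance: within-tau variance plus between-tau spread.
    post_var = sum(w * (v + (m - post_mean) ** 2) for w, (m, v) in zip(weights, cond))
    return post_mean, math.sqrt(post_var)

# Hypothetical log response ratios and standard errors from five studies.
effects = [-0.40, -0.10, -0.35, 0.05, -0.25]
std_errors = [0.20, 0.15, 0.25, 0.30, 0.18]
tau_grid = [i * 0.01 for i in range(101)]  # tau from 0.00 to 1.00

post_mean, post_sd = posterior_mu(effects, std_errors, tau_grid)
print(f"pooled effect: mean={post_mean:.3f}, sd={post_sd:.3f}")
```

    Setting the grid to the single value tau = 0 recovers the classical fixed-effect (inverse-variance) estimate, which is one way to see why the Bayesian and classical results agree under a noninformative prior.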

  5. Conjunction analysis and propositional logic in fMRI data analysis using Bayesian statistics. (United States)

    Rudert, Thomas; Lohmann, Gabriele


    To evaluate logical expressions over different effects in data analyses using the general linear model (GLM) and to evaluate logical expressions over different posterior probability maps (PPMs). In functional magnetic resonance imaging (fMRI) data analysis, the GLM was applied to estimate unknown regression parameters. Based on the GLM, Bayesian statistics can be used to determine the probability of conjunction, disjunction, implication, or any other arbitrary logical expression over different effects or contrast. For second-level inferences, PPMs from individual sessions or subjects are utilized. These PPMs can be combined to a logical expression and its probability can be computed. The methods proposed in this article are applied to data from a STROOP experiment and the methods are compared to conjunction analysis approaches for test-statistics. The combination of Bayesian statistics with propositional logic provides a new approach for data analyses in fMRI. Two different methods are introduced for propositional logic: the first for analyses using the GLM and the second for common inferences about different probability maps. The methods introduced extend the idea of conjunction analysis to a full propositional logic and adapt it from test-statistics to Bayesian statistics. The new approaches allow inferences that are not possible with known standard methods in fMRI. (c) 2008 Wiley-Liss, Inc.
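
    As a rough illustration of combining posterior probability maps into logical expressions, the sketch below computes conjunction, disjunction, and implication voxel by voxel. It assumes the two effects are independent at each voxel, which is a simplifying assumption made here and not necessarily the treatment used in the article; the maps and function names are hypothetical.

```python
def conjunction(p_a, p_b):
    """P(A and B), assuming A and B are independent."""
    return p_a * p_b

def disjunction(p_a, p_b):
    """P(A or B), assuming A and B are independent."""
    return p_a + p_b - p_a * p_b

def implication(p_a, p_b):
    """P(A implies B) = P(not A or B) = 1 - P(A and not B), under independence."""
    return 1.0 - p_a * (1.0 - p_b)

# Hypothetical posterior probabilities for two effects at four voxels.
ppm_a = [0.90, 0.20, 0.99, 0.50]
ppm_b = [0.80, 0.95, 0.10, 0.50]

conj_map = [conjunction(a, b) for a, b in zip(ppm_a, ppm_b)]
disj_map = [disjunction(a, b) for a, b in zip(ppm_a, ppm_b)]
impl_map = [implication(a, b) for a, b in zip(ppm_a, ppm_b)]
print(conj_map, disj_map, impl_map)
```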

  6. A Bayesian Analysis of the Radioactive Releases of Fukushima

    DEFF Research Database (Denmark)

    Tomioka, Ryota; Mørup, Morten


    The Fukushima Daiichi disaster of 11 March 2011 is considered the largest nuclear accident since the 1986 Chernobyl disaster and has been rated at level 7 on the International Nuclear Event Scale. As different radioactive materials have different effects on the human body, it is important to know the types of nuclides and their levels of concentration from the recorded mixture of radiations to take necessary measures. We presently formulate a Bayesian generative model for the data available on radioactive releases from the Fukushima Daiichi disaster across Japan. From the sparsely sampled ... the Fukushima Daiichi plant we establish that the model is able to account for the data. We further demonstrate how the model extends to include all the available measurements recorded throughout Japan. The model can be considered a first attempt to apply Bayesian learning unsupervised in order to give a more ...

  7. A Dynamic Bayesian Approach to Computational Laban Shape Quality Analysis

    Directory of Open Access Journals (Sweden)

    Dilip Swaminathan


    kinesiology. LMA (especially Effort/Shape) emphasizes how internal feelings and intentions govern the patterning of movement throughout the whole body. As we argue, a complex understanding of intention via LMA is necessary for human-computer interaction to become embodied in ways that resemble interaction in the physical world. We thus introduce a novel, flexible Bayesian fusion approach for identifying LMA Shape qualities from raw motion capture data in real time. The method uses a dynamic Bayesian network (DBN) to fuse movement features across the body and across time and, as we discuss, can be readily adapted for low-cost video. It has delivered excellent performance in preliminary studies comprising improvisatory movements. Our approach has been incorporated in Response, a mixed-reality environment where users interact via natural, full-body human movement and enhance their bodily-kinesthetic awareness through immersive sound and light feedback, with applications to kinesiology training, Parkinson's patient rehabilitation, interactive dance, and many other areas.

  8. Bayesian models for comparative analysis integrating phylogenetic uncertainty (United States)


    Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for

  9. Bayesian Nonparametric Measurement of Factor Betas and Clustering with Application to Hedge Fund Returns

    Directory of Open Access Journals (Sweden)

    Urbi Garay


    We define a dynamic and self-adjusting mixture of Gaussian Graphical Models to cluster financial returns, and provide a new method for extraction of nonparametric estimates of dynamic alphas (excess return) and betas (to a choice set of explanatory factors) in a multivariate setting. This approach, as well as the outputs, has a dynamic, nonstationary and nonparametric form, which circumvents the problem of model risk and parametric assumptions that the Kalman filter and other widely used approaches rely on. The by-product of clusters, used for shrinkage and information borrowing, can be of use to determine relationships around specific events. This approach exhibits a smaller Root Mean Squared Error than traditionally used benchmarks in financial settings, which we illustrate through simulation. As an illustration, we use hedge fund index data, and find that our estimated alphas are, on average, 0.13% per month higher (1.6% per year) than alphas estimated through Ordinary Least Squares. The approach exhibits fast adaptation to abrupt changes in the parameters, as seen in our estimated alphas and betas, which exhibit high volatility, especially in periods which can be identified as times of stressful market events, a reflection of the dynamic positioning of hedge fund portfolio managers.

  10. Use of SAMC for Bayesian analysis of statistical models with intractable normalizing constants

    KAUST Repository

    Jin, Ick Hoon


    Statistical inference for the models with intractable normalizing constants has attracted much attention. During the past two decades, various approximation- or simulation-based methods have been proposed for the problem, such as the Monte Carlo maximum likelihood method and the auxiliary variable Markov chain Monte Carlo methods. The Bayesian stochastic approximation Monte Carlo algorithm specifically addresses this problem: It works by sampling from a sequence of approximate distributions with their average converging to the target posterior distribution, where the approximate distributions can be achieved using the stochastic approximation Monte Carlo algorithm. A strong law of large numbers is established for the Bayesian stochastic approximation Monte Carlo estimator under mild conditions. Compared to the Monte Carlo maximum likelihood method, the Bayesian stochastic approximation Monte Carlo algorithm is more robust to the initial guess of model parameters. Compared to the auxiliary variable MCMC methods, the Bayesian stochastic approximation Monte Carlo algorithm avoids the requirement for perfect samples, and thus can be applied to many models for which perfect sampling is not available or very expensive. The Bayesian stochastic approximation Monte Carlo algorithm also provides a general framework for approximate Bayesian analysis. © 2012 Elsevier B.V. All rights reserved.

  11. Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection. (United States)

    Dhavala, Soma S; Datta, Sujay; Mallick, Bani K; Carroll, Raymond J; Khare, Sangeeta; Lawhon, Sara D; Adams, L Garry


    Massively Parallel Signature Sequencing (MPSS) is a high-throughput counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression (SAGE) and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflated Poisson (ZIP) distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models-one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base-pairs long. We utilize the discreteness of Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using non-parametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries.
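
    The zero-inflated Poisson (ZIP) distribution named in this abstract mixes a point mass at zero with an ordinary Poisson count. The sketch below shows only the probability mass function and a log-likelihood for fixed parameters; it is not the hierarchical model of the article, and the counts and parameter values are hypothetical.

```python
import math

def zip_pmf(k, lam, pi):
    """P(X = k) for a zero-inflated Poisson: with probability pi the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson

def zip_log_likelihood(counts, lam, pi):
    """Log-likelihood of independent counts under ZIP(lam, pi)."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)

# Hypothetical signature counts with many zeros.
counts = [0, 0, 0, 3, 0, 5, 2, 0, 0, 4]

ll_zip = zip_log_likelihood(counts, lam=3.5, pi=0.5)
ll_poisson = zip_log_likelihood(counts, lam=1.4, pi=0.0)  # plain Poisson at the sample mean
print(f"ZIP log-likelihood: {ll_zip:.2f}, Poisson log-likelihood: {ll_poisson:.2f}")
```

    With pi = 0 the ZIP reduces to a plain Poisson, so the same function can be used to compare the two on data with excess zeros.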

  12. Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection

    KAUST Repository

    Dhavala, Soma S.


    Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflated Poisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.

  13. Cluster analysis of pharmacists' work attitudes. (United States)

    Nakagomi, Keiichi; Hayashi, Yukikazu; Komiyama, Takako


    Few studies in Japan use clustering to examine the work attitudes of pharmacists. This study conducts an exploratory analysis to classify those attitudes based on previous studies to help staff pharmacists and their management to understand their mutually beneficial requirements. Survey data collected in previous studies from 1 228 community pharmacists and 419 hospital pharmacists were analyzed using Quantification Theory 3 and clustering. Among community pharmacists, two clusters, namely 30- to 34-year-old married males and married males aged over 35 years, reported the highest job satisfaction, intending to remain in their jobs for 5 years or more or until retirement. Conversely, one cluster of 35- to 39-year-old single females reported the lowest job satisfaction and intended to remain for less than 5  years or were undecided. Among hospital pharmacists, one cluster of 22- to 25-year-old single males reported the highest job satisfaction and intended to remain for more than 5 years. Conversely, one cluster of 30- to 34-year-old married males reported the lowest job satisfaction and a period of working undetermined. This study used clustering to explore how pharmacists of different ages, marital statuses, and experience felt regarding their work. Job satisfaction and human relationships are significant in considering future work plans of practicing pharmacists. Pharmacy staff, supervisors, and managers of community or hospital pharmacies must recognize features of pharmacists' work attitudes for offering high-quality service to patients.

  14. Fuzzy clustering analysis of microarray data. (United States)

    Han, Lixin; Zeng, Xiaoqin; Yan, Hong


    Fuzzy clustering is a useful tool for identifying relevant subsets of microarray data. This paper proposes a fuzzy clustering method for microarray data analysis. An advantage of the method is that it uses a combination of fuzzy c-means and principal component analysis to identify the groups of genes that show similar expression patterns. It allows a gene to belong to more than one gene expression pattern with different membership grades. The method is suitable for the analysis of large amounts of noisy microarray data.
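
    The idea of membership grades can be made concrete with the standard fuzzy c-means updates. The sketch below is a generic textbook version with fuzzifier m = 2 and fixed initial centers; it omits the principal component analysis step and is not the specific method proposed in the paper. The data points and function names are hypothetical.

```python
import math

def memberships(points, centers, m=2.0):
    """Membership grades: one row per point, one column per center; rows sum to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:  # point coincides with a center: full membership there
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            rows.append([h / sum(hits) for h in hits])
            continue
        rows.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return rows

def update_centers(points, u, m=2.0):
    """New centers as means of the points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(sum(wi * p[k] for wi, p in zip(w, points)) / total
                             for k in range(dims)))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Hypothetical 2-D expression summaries: two tight groups and one point in between.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
          (2.5, 2.6)]
centers, u = fuzzy_c_means(points, centers=[(0.0, 0.0), (1.0, 1.0)])
for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```

    The in-between point ends up with sizeable membership in both clusters, which is the behavior the abstract describes as belonging to more than one pattern with different grades.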

  15. Bayesian inference – a way to combine statistical data and semantic analysis meaningfully

    Directory of Open Access Journals (Sweden)

    Eila Lindfors


    This article focuses on presenting the possibilities of Bayesian modelling (Finite Mixture Modelling) in the semantic analysis of statistically modelled data. The probability of a hypothesis in relation to the data available is an important question in inductive reasoning. Bayesian modelling allows the researcher to use many models at a time and provides tools to evaluate the goodness of different models. The researcher should always be aware that there is no such thing as the exact probability of an exact event. This is the reason for using probabilistic models. Each model presents a different perspective on the phenomenon in focus, and the researcher has to choose the most probable model with a view to previous research and the knowledge available. The idea of Bayesian modelling is illustrated here by presenting two different sets of data, one from craft science research (n=167) and the other (n=63) from educational research (Lindfors, 2007, 2002). The principles of how to build models and how to combine different profiles are described in the light of the research mentioned. Bayesian modelling is an analysis based on calculating probabilities in relation to a specific set of quantitative data. It is a tool for handling data and interpreting it semantically. The reliability of the analysis arises from an argumentation of which model can be selected from the model space as the basis for an interpretation, and on which arguments. Keywords: method, sloyd, Bayesian modelling, student teachers. URN:NBN:no-29959

  16. Spatial Dependence and Heterogeneity in Bayesian Factor Analysis: A Cross-National Investigation of Schwartz Values (United States)

    Stakhovych, Stanislav; Bijmolt, Tammo H. A.; Wedel, Michel


    In this article, we present a Bayesian spatial factor analysis model. We extend previous work on confirmatory factor analysis by including geographically distributed latent variables and accounting for heterogeneity and spatial autocorrelation. The simulation study shows excellent recovery of the model parameters and demonstrates the consequences…

  17. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study. (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein


    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. Nuclear stockpile stewardship and Bayesian image analysis (DARHT and the BIE)

    Energy Technology Data Exchange (ETDEWEB)

    Carroll, James L [Los Alamos National Laboratory


    Since the end of nuclear testing, the reliability of our nation's nuclear weapon stockpile has been assessed using sub-critical hydrodynamic testing. These tests involve some pretty 'extreme' radiography. We will be discussing the challenges and solutions to these problems provided by DARHT (the world's premier hydrodynamic testing facility) and the BIE, or Bayesian Inference Engine (a powerful radiography analysis software tool). We will discuss the application of Bayesian image analysis techniques to this important and difficult problem.

  19. [Meta analysis of the use of Bayesian networks in breast cancer diagnosis]. (United States)

    Simões, Priscyla Waleska; Silva, Geraldo Doneda da; Moretti, Gustavo Pasquali; Simon, Carla Sasso; Winnikow, Erik Paul; Nassar, Silvia Modesto; Medeiros, Lidia Rosi; Rosa, Maria Inês


    The aim of this study was to determine the accuracy of Bayesian networks in supporting breast cancer diagnoses. Systematic review and meta-analysis were carried out, including articles and papers published between January 1990 and March 2013. We included prospective and retrospective cross-sectional studies of the accuracy of diagnoses of breast lesions (target conditions) made using Bayesian networks (index test). Four primary studies that included 1,223 breast lesions were analyzed, 89.52% (444/496) of the breast cancer cases and 6.33% (46/727) of the benign lesions were positive based on the Bayesian network analysis. The area under the curve (AUC) for the summary receiver operating characteristic curve (SROC) was 0.97, with a Q* value of 0.92. Using Bayesian networks to diagnose malignant lesions increased the pretest probability of a true positive from 40.03% to 90.05% and decreased the probability of a false negative to 6.44%. Therefore, our results demonstrated that Bayesian networks provide an accurate and non-invasive method to support breast cancer diagnosis.
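
    The shift from pretest to post-test probability reported here follows from Bayes' theorem. The sketch below applies it to the raw counts quoted in the abstract (444/496 positive among cancers, 46/727 positive among benign lesions); because the article reports pooled meta-analytic estimates, this simple calculation is only an illustration and will not necessarily reproduce the published percentages. The function name is hypothetical.

```python
def post_test_probability(prior, sensitivity, false_positive_rate, positive=True):
    """Probability of disease after a positive (or negative) test result."""
    if positive:
        num = sensitivity * prior
        den = num + false_positive_rate * (1.0 - prior)
    else:
        num = (1.0 - sensitivity) * prior
        den = num + (1.0 - false_positive_rate) * (1.0 - prior)
    return num / den

# Raw counts quoted in the abstract.
sensitivity = 444 / 496          # positive results among breast cancer cases
false_positive_rate = 46 / 727   # positive results among benign lesions
prior = 496 / 1223               # share of malignant lesions in the pooled sample

p_after_positive = post_test_probability(prior, sensitivity, false_positive_rate)
p_after_negative = post_test_probability(prior, sensitivity, false_positive_rate,
                                         positive=False)
print(f"after positive: {p_after_positive:.4f}, after negative: {p_after_negative:.4f}")
```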

  20. Analysis of Climate Change on Hydrologic Components by using Bayesian Neural Networks (United States)

    Kang, K.


    Representing hydrologic analysis under climate change is a challenging task. Hydrologic outputs of regional climate models (RCMs) driven by general circulation models (GCMs) are difficult to represent because of several uncertainties in the hydrologic impacts of climate change. To overcome this problem, this research presents practical options for hydrological climate change analysis using Bayesian and neural network approaches for regional adaptation to climate change. Applying Bayesian and neural network analysis to climate hydrologic components is a new research frontier in anticipating climate change. A strong advantage of Bayesian neural networks is detecting time series in hydrologic components, which is complicated by data, parameter, and model hypotheses in climate change scenarios, by stepwise removing and adding connections in the neural network process, combined with the Bayesian concept of parameter, predict, and update steps. As an example study, the Mekong River Watershed, which is surrounded by four countries (Myanmar, Laos, Thailand and Cambodia), is selected. Results will show the trends of hydrologic components in climate model simulations through Bayesian neural networks.

  1. Bayesian Analysis of Demand Elasticity in the Italian Electricity Market

    Directory of Open Access Journals (Sweden)

    Maria Chiara D'Errico


    The liberalization of the Italian electricity market is a decade old. Within these last ten years, the supply side has been extensively analyzed, but not the demand side. The aim of this paper is to provide a new method for estimation of the demand elasticity, based on Bayesian methods applied to the Italian electricity market. We used individual demand bids data in the day-ahead market in the Italian Power Exchange (IPEX), for 2011, in order to construct an aggregate demand function at the hourly level. We took into account the existence of both elastic and inelastic bidders on the demand side. The empirical results show that elasticity varies significantly during the day and across periods of the year. In addition, the elasticity hourly distribution is clearly skewed and more so in the daily hours. The Bayesian method is a useful tool for policy-making, insofar as the regulator can start with a priori historical information on market behavior and estimate actual market outcomes in response to new policy actions.

  2. Bayesian analysis of deterministic and stochastic prisoner's dilemma games

    Directory of Open Access Journals (Sweden)

    Howard Kunreuther


    This paper compares the behavior of individuals playing a classic two-person deterministic prisoner's dilemma (PD) game with choice data obtained from repeated interdependent security prisoner's dilemma games with varying probabilities of loss and the ability to learn (or not learn) about the actions of one's counterpart, an area of recent interest in experimental economics. This novel data set, from a series of controlled laboratory experiments, is analyzed using Bayesian hierarchical methods, the first application of such methods in this research domain. We find that individuals are much more likely to be cooperative when payoffs are deterministic than when the outcomes are probabilistic. A key factor explaining this difference is that subjects in a stochastic PD game respond not just to what their counterparts did but also to whether or not they suffered a loss. These findings are interpreted in the context of behavioral theories of commitment, altruism and reciprocity. The work provides a linkage between Bayesian statistics, experimental economics, and consumer psychology.

  3. Expert prior elicitation and Bayesian analysis of the Mycotic Ulcer Treatment Trial I. (United States)

    Sun, Catherine Q; Prajna, N Venkatesh; Krishnan, Tiruvengada; Mascarenhas, Jeena; Rajaraman, Revathi; Srinivasan, Muthiah; Raghavan, Anita; O'Brien, Kieran S; Ray, Kathryn J; McLeod, Stephen D; Porco, Travis C; Acharya, Nisha R; Lietman, Thomas M


    To perform a Bayesian analysis of the Mycotic Ulcer Treatment Trial I (MUTT I) using expert opinion as a prior belief. MUTT I was a randomized clinical trial comparing topical natamycin or voriconazole for treating filamentous fungal keratitis. A questionnaire elicited expert opinion on the best treatment of fungal keratitis before MUTT I results were available. A Bayesian analysis was performed using the questionnaire data as a prior belief and the MUTT I primary outcome (3-month visual acuity) by frequentist analysis as a likelihood. Corneal experts had a 41.1% prior belief that natamycin improved 3-month visual acuity compared with voriconazole. The Bayesian analysis found a 98.4% belief for natamycin treatment compared with voriconazole treatment for filamentous cases as a group (mean improvement 1.1 Snellen lines, 95% credible interval 0.1-2.1). The Bayesian analysis estimated a smaller treatment effect than the MUTT I frequentist analysis result of 1.8-line improvement with natamycin versus voriconazole (95% confidence interval 0.5-3.0, P = 0.006). For Fusarium cases, the posterior demonstrated a 99.7% belief for natamycin treatment, whereas non-Fusarium cases had a 57.3% belief. The Bayesian analysis suggests that natamycin is superior to voriconazole when filamentous cases are analyzed as a group. Subgroup analysis of Fusarium cases found improvement with natamycin compared with voriconazole, whereas there was almost no difference between treatments for non-Fusarium cases. These results were consistent with, though smaller in effect size than, the MUTT I primary outcome by frequentist analysis. The accordance between analyses further validates the trial results. (Trial registration number: NCT00996736.)
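
    Combining an expert prior with a frequentist estimate used as a likelihood can be sketched with a conjugate normal update. In the block below, the trial estimate (1.8 lines, 95% confidence interval 0.5 to 3.0) is taken from the abstract and converted to a standard error under a normal approximation; the prior mean and standard deviation are hypothetical values chosen only for illustration and are not the elicited prior of the study, so the output should not be read as a reproduction of the published posterior.

```python
import math

def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Posterior mean and SD for a normal prior combined with a normal likelihood."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / std_error ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

def prob_positive(mean, sd):
    """P(effect > 0) for a normal distribution with the given mean and SD."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))

# Likelihood: frequentist estimate from the abstract, SE from the 95% interval.
estimate = 1.8
std_error = (3.0 - 0.5) / (2 * 1.96)

# Hypothetical expert prior (assumption): slightly favors voriconazole, wide spread.
prior_mean, prior_sd = -0.2, 0.9

post_mean, post_sd = normal_update(prior_mean, prior_sd, estimate, std_error)
print(f"prior P(benefit)     = {prob_positive(prior_mean, prior_sd):.3f}")
print(f"posterior mean (sd)  = {post_mean:.2f} ({post_sd:.2f})")
print(f"posterior P(benefit) = {prob_positive(post_mean, post_sd):.3f}")
```

    The posterior mean is a precision-weighted average, so a skeptical prior pulls the estimate below the trial result, which matches the direction of the difference the abstract reports between the Bayesian and frequentist effects.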

  4. Multilevel Bayesian networks for the analysis of hierarchical health care data. (United States)

    Lappenschaar, Martijn; Hommersom, Arjen; Lucas, Peter J F; Lagro, Joep; Visscher, Stefan


    Large health care datasets normally have a hierarchical structure, in terms of levels, as the data have been obtained from different practices, hospitals, or regions. Multilevel regression is the technique commonly used to deal with such multilevel data. However, for the statistical analysis of interactions between entities from a domain, multilevel regression yields little to no insight. While Bayesian networks have proved to be useful for analysis of interactions, they do not have the capability to deal with hierarchical data. In this paper, we describe a new formalism, which we call multilevel Bayesian networks; its effectiveness for the analysis of hierarchically structured health care data is studied from the perspective of multimorbidity. Multilevel Bayesian networks are formally defined and applied to analyze clinical data from family practices in the Netherlands with the aim to predict interactions between heart failure and diabetes mellitus. We compare the results obtained with multilevel regression. The results obtained by multilevel Bayesian networks closely resembled those obtained by multilevel regression. For both diseases, the area under the curve of the prediction model improved, and the net reclassification improvements were significantly positive. In addition, the models offered considerably more insight, through their internal structure, into the interactions between the diseases. Multilevel Bayesian networks offer a suitable alternative to multilevel regression when analyzing hierarchical health care data. They provide more insight into the interactions between multiple diseases. Moreover, a multilevel Bayesian network model can be used for the prediction of the occurrence of multiple diseases, even when some of the predictors are unknown, which is typically the case in medicine. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. Estimating size and scope economies in the Portuguese water sector using the Bayesian stochastic frontier analysis

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, Pedro [Computational Modelling in Engineering and Geophysics Laboratory (LAMEMO), Department of Civil Engineering, COPPE, Federal University of Rio de Janeiro, Av. Pedro Calmon - Ilha do Fundão, 21941-596 Rio de Janeiro (Brazil); Center for Urban and Regional Systems (CESUR), CERIS, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001 Lisbon (Portugal)]; Marques, Rui Cunha [Center for Urban and Regional Systems (CESUR), CERIS, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001 Lisbon (Portugal)]


    This study searches for economies of size and scope in the Portuguese water sector, applying Bayesian and classical statistics to make inference in stochastic frontier analysis (SFA). It demonstrates the usefulness and advantages of applying Bayesian statistics for inference in SFA over traditional SFA, which uses only classical statistics. The resulting Bayesian methods make it possible to overcome some problems that arise in the application of traditional SFA, such as bias in small samples and skewness of residuals. In the present case study of the water sector in Portugal, these Bayesian methods provide more plausible and acceptable results. Based on the results obtained, we found that there are important economies of output density, economies of size, economies of vertical integration and economies of scope in the Portuguese water sector, pointing to the substantial advantages of undertaking mergers by joining the retail and wholesale components and by joining the drinking water and wastewater services. - Highlights: • This study aims to search for economies of size and scope in the water sector; • The usefulness of the application of Bayesian methods is highlighted; • Important economies of output density, economies of size, economies of vertical integration and economies of scope are found.

  6. Estimating size and scope economies in the Portuguese water sector using the Bayesian stochastic frontier analysis

    International Nuclear Information System (INIS)

    Carvalho, Pedro; Marques, Rui Cunha


    This study searches for economies of size and scope in the Portuguese water sector, applying Bayesian and classical statistics to make inference in stochastic frontier analysis (SFA). It demonstrates the usefulness and advantages of applying Bayesian statistics for inference in SFA over traditional SFA, which uses only classical statistics. The resulting Bayesian methods make it possible to overcome some problems that arise in the application of traditional SFA, such as bias in small samples and skewness of residuals. In the present case study of the water sector in Portugal, these Bayesian methods provide more plausible and acceptable results. Based on the results obtained, we found that there are important economies of output density, economies of size, economies of vertical integration and economies of scope in the Portuguese water sector, pointing to the substantial advantages of undertaking mergers by joining the retail and wholesale components and by joining the drinking water and wastewater services. - Highlights: • This study aims to search for economies of size and scope in the water sector; • The usefulness of the application of Bayesian methods is highlighted; • Important economies of output density, economies of size, economies of vertical integration and economies of scope are found.

  7. Introduction to Bayesian statistics

    CERN Document Server

    Bolstad, William M


    There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear Regression Model; and Computati...

  8. Bayesian analysis of repairable systems showing a bounded failure intensity

    International Nuclear Information System (INIS)

    Guida, Maurizio; Pulcini, Gianpaolo


    The failure pattern of repairable mechanical equipment subject to deterioration phenomena sometimes shows a finite bound for the increasing failure intensity. A non-homogeneous Poisson process with bounded increasing failure intensity is then illustrated and its characteristics are discussed. A Bayesian procedure, based on prior information on model-free quantities, is developed in order to allow technical information on the failure process to be incorporated into the inferential procedure and to improve the inference accuracy. Posterior estimation of the model-free quantities and of other quantities of interest (such as the optimal replacement interval) is provided, as well as prediction of the waiting time to the next failure and of the number of failures in a future time interval. Finally, numerical examples are given to illustrate the proposed inferential procedure.
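
    As a rough illustration of a non-homogeneous Poisson process whose failure intensity increases toward a finite bound, the sketch below assumes the form lambda(t) = a * (1 - exp(-t / b)). This functional form, the parameter names a and b, and all function names are assumptions made for illustration; the abstract does not state the model. The sketch simulates failure times by thinning and evaluates the expected number of failures.

```python
import math
import random


def intensity(t, a, b):
    """Assumed bounded intensity: rises from 0 toward the asymptote a."""
    return a * (1.0 - math.exp(-t / b))


def expected_failures(t, a, b):
    """Integral of the assumed intensity over (0, t)."""
    return a * (t - b * (1.0 - math.exp(-t / b)))


def simulate_failures(t_end, a, b, rng):
    """Simulate failure times on (0, t_end) by thinning a rate-a Poisson process."""
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(a)  # candidate event from the bounding process
        if t > t_end:
            return times
        # Accept the candidate with probability lambda(t) / a.
        if rng.random() < intensity(t, a, b) / a:
            times.append(t)
```

    Thinning works here because the assumed intensity never exceeds a, so a homogeneous process of rate a can serve as the bounding process.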

  9. FABADA: a Fitting Algorithm for Bayesian Analysis of DAta

    International Nuclear Information System (INIS)

    Pardo, L C; Rovira-Esteva, M; Ruiz-Martin, M D; Tamarit, J Ll; Busch, S


    The fit of data using a mathematical model is the standard way to know if the model describes data correctly and to obtain parameters that describe the physical processes hidden behind the experimental results. This is usually done by means of a χ² minimization procedure. Although this procedure is fast and quite reliable for simple models, it has many drawbacks when dealing with complicated problems such as models with many or correlated parameters. We present here a Bayesian method to explore the parameter space guided only by the probability laws underlying the χ² figure of merit. The presented method does not get stuck in local minima of the χ² landscape, as usually happens with classical minimization procedures. Moreover, correlations between parameters are taken into account in a natural way. Finally, parameters are obtained as probability distribution functions so that all the complexity of the parameter space is shown.
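
    One common way to explore a parameter space guided by the chi-square figure of merit is a random-walk Metropolis sampler in which a proposed step is accepted with probability exp(-(chi2_new - chi2_old) / 2). The sketch below is not the FABADA code; the straight-line model, the function names, and the fixed step size are assumptions chosen to keep the example short.

```python
import math
import random


def chi2(params, xs, ys, sigma):
    """Chi-square of a straight-line model y = m*x + c (hypothetical example model)."""
    m, c = params
    return sum(((y - (m * x + c)) / sigma) ** 2 for x, y in zip(xs, ys))


def metropolis(xs, ys, sigma, start, step, n_steps, rng):
    """Random-walk Metropolis sampler driven only by chi-square differences."""
    current = list(start)
    chi2_cur = chi2(current, xs, ys, sigma)
    chain = []
    for _ in range(n_steps):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        chi2_new = chi2(proposal, xs, ys, sigma)
        delta = chi2_new - chi2_cur
        # Always accept improvements; accept worse fits with prob exp(-delta/2).
        if delta <= 0 or rng.random() < math.exp(-0.5 * delta):
            current, chi2_cur = proposal, chi2_new
        chain.append(tuple(current))
    return chain
```

    The retained chain (after discarding an initial burn-in) approximates the joint distribution of the parameters, so correlations between m and c show up directly in the samples rather than in a separate covariance estimate.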

  10. Micronutrients in HIV: a Bayesian meta-analysis.

    Directory of Open Access Journals (Sweden)

    George M Carter

    Approximately 28.5 million people living with HIV are eligible for treatment (CD4<500) but currently have no access to antiretroviral therapy. Reduced serum level of micronutrients is common in HIV disease. Micronutrient supplementation (MNS) may mitigate disease progression and mortality. We synthesized evidence on the effect of micronutrient supplementation on mortality and rate of disease progression in HIV disease. We searched MEDLINE, EMBASE, the Cochrane Central, AMED and CINAHL databases through December 2014, without language restriction, for studies of greater than 3 micronutrients versus any or no comparator. We built a hierarchical Bayesian random effects model to synthesize results. Inferences are based on the posterior distribution of the population effects; posterior distributions were approximated by Markov chain Monte Carlo in OpenBUGS. From 2166 initial references, we selected 49 studies for full review and identified eight reporting on disease progression and/or mortality. Bayesian synthesis of data from 2,249 adults in three studies estimated the relative risk of disease progression in subjects on MNS vs. control as 0.62 (95% credible interval, 0.37, 0.96). Median number needed to treat is 8.4 (4.8, 29.9) and the Bayes Factor 53.4. Based on data from 4,095 adults in 7 randomized controlled studies reporting mortality, the RR was 0.84 (0.38, 1.85) and the NNT 25 (4.3, ∞). MNS significantly and substantially slows disease progression in HIV+ adults not on ARV, and possibly reduces mortality. Micronutrient supplements are effective in reducing progression with a posterior probability of 97.9%. Considering the low cost of MNS and its lack of adverse effects, MNS should be standard of care for HIV+ adults not yet on ARV.

  11. Cluster Analysis of Properties of Temperament

    Directory of Open Access Journals (Sweden)

    A I Krupnov


    The paper presents a cluster analysis of various properties of temperament, based on the systematic structure of its main components. On the basis of the data obtained, a qualitative psychological characterization of the four types of temperament is given.

  12. Bayesian analysis of the dynamic cosmic web in the SDSS galaxy survey

    International Nuclear Information System (INIS)

    Leclercq, Florent; Wandelt, Benjamin; Jasche, Jens


    Recent application of the Bayesian algorithm BORG to the Sloan Digital Sky Survey (SDSS) main sample galaxies resulted in the physical inference of the formation history of the observed large-scale structure from its origin to the present epoch. In this work, we use these inferences as inputs for a detailed probabilistic cosmic web-type analysis. To do so, we generate a large set of data-constrained realizations of the large-scale structure using a fast, fully non-linear gravitational model. We then perform a dynamic classification of the cosmic web into four distinct components (voids, sheets, filaments, and clusters) on the basis of the tidal field. Our inference framework automatically and self-consistently propagates typical observational uncertainties to web-type classification. As a result, this study produces accurate cosmographic classification of large-scale structure elements in the SDSS volume. By also providing the history of these structure maps, the approach allows an analysis of the origin and growth of the early traces of the cosmic web present in the initial density field and of the evolution of global quantities such as the volume and mass filling fractions of different structures. For the problem of web-type classification, the results described in this work constitute the first connection between theory and observations at non-linear scales including a physical model of structure formation and the demonstrated capability of uncertainty quantification. A connection between cosmology and information theory using real data also naturally emerges from our probabilistic approach. Our results constitute quantitative chrono-cosmography of the complex web-like patterns underlying the observed galaxy distribution.
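
    A tidal-field classification into voids, sheets, filaments, and clusters is often done by counting how many of the three eigenvalues of the tidal tensor at a location exceed a threshold. Whether the authors use exactly this rule, and the threshold value of zero, are assumptions here, as are the function names. The second function shows one way uncertainty could be carried through: classify each data-constrained realization separately and report the fraction assigned to each web type.

```python
def classify_web(eigenvalues, threshold=0.0):
    """Map three tidal-tensor eigenvalues to a web type by counting those above threshold."""
    labels = ("void", "sheet", "filament", "cluster")
    n_above = sum(1 for lam in eigenvalues if lam > threshold)
    return labels[n_above]


def web_type_probabilities(samples, threshold=0.0):
    """Fraction of realizations assigning each web type to one location.

    `samples` is a list of eigenvalue triples, one per realization.
    """
    counts = {"void": 0, "sheet": 0, "filament": 0, "cluster": 0}
    for eig in samples:
        counts[classify_web(eig, threshold)] += 1
    total = len(samples)
    return {label: n / total for label, n in counts.items()}
```

    Reporting a probability per web type, rather than a single label, is what lets the classification reflect how well the data constrain each location.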

  13. Using Discrete Loss Functions and Weighted Kappa for Classification: An Illustration Based on Bayesian Network Analysis (United States)

    Zwick, Rebecca; Lenaburg, Lubella


    In certain data analyses (e.g., multiple discriminant analysis and multinomial log-linear modeling), classification decisions are made based on the estimated posterior probabilities that individuals belong to each of several distinct categories. In the Bayesian network literature, this type of classification is often accomplished by assigning…

  14. Application of a data-mining method based on Bayesian networks to lesion-deficit analysis (United States)

    Herskovits, Edward H.; Gerring, Joan P.


    Although lesion-deficit analysis (LDA) has provided extensive information about structure-function associations in the human brain, LDA has suffered from the difficulties inherent to the analysis of spatial data, i.e., there are many more variables than subjects, and data may be difficult to model using standard distributions, such as the normal distribution. We herein describe a Bayesian method for LDA; this method is based on data-mining techniques that employ Bayesian networks to represent structure-function associations. These methods are computationally tractable, and can represent complex, nonlinear structure-function associations. When applied to the evaluation of data obtained from a study of the psychiatric sequelae of traumatic brain injury in children, this method generates a Bayesian network that demonstrates complex, nonlinear associations among lesions in the left caudate, right globus pallidus, right side of the corpus callosum, right caudate, and left thalamus, and subsequent development of attention-deficit hyperactivity disorder, confirming and extending our previous statistical analysis of these data. Furthermore, analysis of simulated data indicates that methods based on Bayesian networks may be more sensitive and specific for detecting associations among categorical variables than methods based on chi-square and Fisher exact statistics.

  15. Reporting of Bayesian analysis in epidemiologic research should become more transparent

    NARCIS (Netherlands)

    Rietbergen, Charlotte; Debray, Thomas P. A.; Klugkist, Irene; Janssen, Kristel J M; Moons, Karel G. M.

    Objectives: The objective of this systematic review is to investigate the use of Bayesian data analysis in epidemiology in the past decade and particularly to evaluate the quality of research papers reporting the results of these analyses. Study Design and Setting: Complete volumes of five major

  16. Calibration of Uncertainty Analysis of the SWAT Model Using Genetic Algorithms and Bayesian Model Averaging (United States)

    In this paper, genetic algorithms (GA) and Bayesian model averaging (BMA) were combined to simultaneously conduct calibration and uncertainty analysis for the Soil and Water Assessment Tool (SWAT). In this hybrid method, several SWAT models with different structures are first selected; next GA i...

  17. Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences. (United States)

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric


    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach.

  18. Cluster analysis for determining distribution center location (United States)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian


    Determining the location of distribution facilities is highly important for surviving the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. New problems then arise, namely how many facilities should be provided and where. This study examines a fast-food restaurant brand located in Greater Jakarta. The brand is among the top 5 fast-food restaurant chains by retail sales. The study had three stages: compiling spatial data, cluster analysis, and network analysis. The cluster analysis results are used to consider the location of an additional distribution center. The network analysis results show a more efficient process, with a shorter distance in the distribution process.

  19. Fuzzy clustering analysis of osteosarcoma related genes. (United States)

    Chen, Kai; Wu, Dajiang; Bai, Yushu; Zhu, Xiaodong; Chen, Ziqiang; Wang, Chuanfeng; Zhao, Yingchuan; Li, Ming


    Osteosarcoma is the most common malignant bone tumor, with a peak manifestation during the second and third decades of life. We aimed to explore the influence of genetic factors on the mechanism of osteosarcoma by analyzing the relationships between osteosarcoma and its related genes, and thereby to provide potential genetic references for the prevention, diagnosis and treatment of osteosarcoma. We collected osteosarcoma-related gene sequences from GenBank at the National Center for Biotechnology Information (NCBI) and carried out pairwise local alignment to measure the association among related sequences. A fuzzy clustering method was then used for clustering analysis, so as to relate unknown genes to the known osteosarcoma-related genes in the same class. From the result of the fuzzy clustering analysis, we classified the osteosarcoma-related genes into two groups and deduced that the genes clustered into one group had similar function. Based on this knowledge, we found more genes related to the pathogenesis of osteosarcoma; these genes could exert a function similar to that of Runx2, a risk factor confirmed in osteosarcoma. This study may help better understand the genetic mechanism and provide new molecular markers and therapies for osteosarcoma.

  20. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions (United States)

    Steinley, Douglas; Brusco, Michael J.


    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…

  1. A Pragmatic Bayesian Perspective on Correlation Analysis : The exoplanetary gravity - stellar activity case. (United States)

    Figueira, P; Faria, J P; Adibekyan, V Zh; Oshagh, M; Santos, N C


    We apply the Bayesian framework to assess the presence of a correlation between two quantities. To do so, we estimate the probability distribution of the parameter of interest, ρ, characterizing the strength of the correlation. We provide an implementation of these ideas and concepts using the Python programming language and the pyMC module in a very short (∼ 130 lines of code, heavily commented) and user-friendly program. We used this tool to assess the presence and properties of the correlation between planetary surface gravity and stellar activity level as measured by the log([Formula: see text]) indicator. The results of the Bayesian analysis are qualitatively similar to those obtained via p-value analysis, and support the presence of a correlation in the data. The results are more robust in their derivation and more informative, revealing interesting features such as asymmetric posterior distributions or markedly different credible intervals, and allowing for a deeper exploration. We encourage readers interested in this kind of problem to apply our code to their own scientific problems. A full understanding of what the Bayesian framework is can only be gained through the insight that comes from handling priors, assessing the convergence of Monte Carlo runs, and a multitude of other practical problems. We hope to contribute so that Bayesian analysis becomes a tool in the toolkit of researchers, and they understand by experience its advantages and limitations.
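
    To make the idea of a posterior distribution for ρ concrete, the sketch below evaluates it on a grid instead of sampling it. This is not the authors' pyMC program: it assumes a bivariate normal likelihood for standardized data (zero means, unit variances held fixed) and a uniform prior on (-1, 1), and all function names are hypothetical.

```python
import math


def standardize(values):
    """Shift and scale values to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def rho_posterior(x, y, n_grid=399):
    """Grid posterior of rho under a uniform prior on (-1, 1)."""
    xs, ys = standardize(x), standardize(y)
    n = len(xs)
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    grid = [-1.0 + 2.0 * (i + 1) / (n_grid + 1) for i in range(n_grid)]
    # Bivariate normal log-likelihood with zero means and unit variances.
    log_post = [
        -0.5 * n * math.log(1.0 - r * r)
        - (sxx - 2.0 * r * sxy + syy) / (2.0 * (1.0 - r * r))
        for r in grid
    ]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def credible_interval(grid, probs, mass=0.95):
    """Equal-tailed credible interval read off the grid posterior."""
    tail = (1.0 - mass) / 2.0
    cum, lower, upper = 0.0, grid[0], grid[-1]
    found_lower = False
    for r, p in zip(grid, probs):
        cum += p
        if not found_lower and cum >= tail:
            lower, found_lower = r, True
        if cum >= 1.0 - tail:
            upper = r
            break
    return lower, upper
```

    Because the whole distribution is available, asymmetry of the posterior and the width of the credible interval can be inspected directly, which a single p-value does not convey.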

  2. Bayesian Integrated Data Analysis of Fast-Ion Measurements by Velocity-Space Tomography

    DEFF Research Database (Denmark)

    Salewski, M.; Nocente, M.; Jacobsen, A.S.


    Bayesian integrated data analysis combines measurements from different diagnostics to jointly measure plasma parameters of interest such as temperatures, densities, and drift velocities. Integrated data analysis of fast-ion measurements has long been hampered by the complexity of the strongly non...... framework. The implementation for different types of diagnostics as well as the uncertainties are discussed, and we highlight the importance of integrated data analysis of all available detectors....

  3. Bayesian Analysis of Hot Jupiter Radii Points to Ohmic Dissipation (United States)

    Thorngren, Daniel; Fortney, Jonathan J.


    The cause of the unexpectedly large radii of hot Jupiters has been the subject of many hypotheses over the past 15 years and is one of the long-standing open issues in exoplanetary physics. In our work, we seek to examine the population of 300 hot Jupiters to identify a model that best explains their radii. Using a hierarchical Bayesian framework, we match structure evolution models to the observed giant planets’ masses, radii, and ages, with a prior for bulk composition based on the mass from Thorngren et al. (2016). We consider various models for the relationship between heating efficiency (the fraction of flux absorbed into the interior) and incident flux. For the first time, we are able to derive this heating efficiency as a function of planetary T_eq. Models in which the heating efficiency decreases at the higher temperatures (above ~1600 K) are strongly and statistically significantly preferred. Of the published models for the radius anomaly, only the Ohmic dissipation model predicts this feature, which it explains as being the result of magnetic drag reducing atmospheric wind speeds. We interpret our results as strong evidence in favor of the Ohmic dissipation model.

  4. Doubly Bayesian Analysis of Confidence in Perceptual Decision-Making (United States)

    Bahrami, Bahador; Latham, Peter E.


    Humans stand out from other animals in that they are able to explicitly report on the reliability of their internal operations. This ability, which is known as metacognition, is typically studied by asking people to report their confidence in the correctness of some decision. However, the computations underlying confidence reports remain unclear. In this paper, we present a fully Bayesian method for directly comparing models of confidence. Using a visual two-interval forced-choice task, we tested whether confidence reports reflect heuristic computations (e.g. the magnitude of sensory data) or Bayes optimal ones (i.e. how likely a decision is to be correct given the sensory data). In a standard design in which subjects were first asked to make a decision, and only then gave their confidence, subjects were mostly Bayes optimal. In contrast, in a less-commonly used design in which subjects indicated their confidence and decision simultaneously, they were roughly equally likely to use the Bayes optimal strategy or to use a heuristic but suboptimal strategy. Our results suggest that, while people’s confidence reports can reflect Bayes optimal computations, even a small unusual twist or additional element of complexity can prevent optimality. PMID:26517475

  5. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations (United States)


    Background: There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods: We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results: Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions: Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations

  6. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations. (United States)

    Corrigan, Neil; Bankart, Michael J G; Gray, Laura J; Smith, Karen L


    There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
    Interim recommendations include avoidance of cluster merges where
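
    The loss of power from merging can be illustrated with a standard design-effect calculation that allows for unequal cluster sizes, DE = 1 + ((cv^2 + 1) * m - 1) * ICC, where m is the mean cluster size and cv its coefficient of variation. The sketch below is only an illustration under the simplifying assumption that the ICC is unchanged by the merge (the abstract notes that the observed ICC can change); the function names are hypothetical.

```python
import math


def cluster_stats(sizes):
    """Mean cluster size and coefficient of variation of cluster size."""
    n = len(sizes)
    mean = sum(sizes) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sizes) / n)
    return mean, sd / mean


def design_effect(mean_size, cv, icc):
    """Design effect allowing for variable cluster size."""
    return 1.0 + ((cv ** 2 + 1.0) * mean_size - 1.0) * icc


def effective_sample_size(sizes, icc):
    """Total participants divided by the design effect."""
    mean, cv = cluster_stats(sizes)
    return sum(sizes) / design_effect(mean, cv, icc)


def merge_clusters(sizes, i, j):
    """Cluster sizes after merging clusters i and j into one cluster."""
    merged = [s for k, s in enumerate(sizes) if k not in (i, j)]
    merged.append(sizes[i] + sizes[j])
    return merged
```

    For ten clusters of 20 with an ICC of 0.05, merging two clusters leaves the number of participants unchanged but raises both the mean cluster size and its variability, so the effective sample size falls.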

  7. Personalized medicine for mucositis: Bayesian networks identify unique gene clusters which predict the response to gamma-D-glutamyl-L-tryptophan (SCV-07) for the attenuation of chemoradiation-induced oral mucositis. (United States)

    Alterovitz, Gil; Tuthill, Cynthia; Rios, Israel; Modelska, Katharina; Sonis, Stephen


    Gamma-D-glutamyl-L-tryptophan (SCV-07) demonstrated an overall efficacy signal in ameliorating oral mucositis (OM) in a clinical trial of head and neck cancer patients. However, not all SCV-07-treated subjects responded positively. Here we determined if specific gene clusters could discriminate between subjects who responded to SCV-07 and those who did not. Microarrays were done using peripheral blood RNA obtained at screening and on the last day of radiation from 28 subjects enrolled in the SCV-07 trial. An analytical technique was applied that relied on learned Bayesian networks to identify gene clusters which discriminated between individuals who received SCV-07 and those who received placebo, and which differentiated subjects for whom SCV-07 was an effective OM intervention from those for whom it was not. We identified 107 genes that discriminated SCV-07 responders from non-responders using four models and applied Akaike Information Criteria (AIC) and Bayes Factor (BF) analysis to evaluate predictive accuracy. AIC was superior to BF: the accuracy of predicting placebo vs. treatment was 78% using BF, but 91% using the AIC score. Our ability to differentiate responders from non-responders using the AIC score was dramatic and ranged from 93% to 100% depending on the dataset that was evaluated. Predictive Bayesian networks were identified and functional cluster analyses were performed. A specific 10-gene cluster was a critical contributor to the predictability of the dataset. Our results demonstrate proof of concept in which the application of a genomics-based analytical paradigm was capable of discriminating responders and non-responders for an OM intervention. Copyright © 2011 Elsevier Ltd. All rights reserved.

  8. Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video. (United States)

    Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S


    A method for online incremental mining of activity patterns from a surveillance video stream is presented in this paper. The framework consists of a learning block in which a Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on a Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection.

  9. Semi-supervised consensus clustering for gene expression data analysis


    Wang, Yunli; Pan, Youlian


    Background: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  10. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations


    Corrigan, Neil; Bankart, Michael J G; Gray, Laura J; Smith, Karen L


    Background: There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods: We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed ...

  11. Sensitivity analysis in Gaussian Bayesian networks using a symbolic-numerical technique

    International Nuclear Information System (INIS)

    Castillo, Enrique; Kjaerulff, Uffe


    The paper discusses the problem of sensitivity analysis in Gaussian Bayesian networks. The algebraic structure of the conditional means and variances, as rational functions involving linear and quadratic functions of the parameters, are used to simplify the sensitivity analysis. In particular the probabilities of conditional variables exceeding given values and related probabilities are analyzed. Two examples of application are used to illustrate all the concepts and methods

  12. Deep Learning Neural Networks and Bayesian Neural Networks in Data Analysis

    Directory of Open Access Journals (Sweden)

    Chernoded Andrey


    Most modern analyses in high energy physics use signal-versus-background classification techniques from machine learning, and neural networks in particular. Deep learning neural networks are the most promising modern technique for separating signal and background, and nowadays they can be widely and successfully implemented as part of a physics analysis. In this article we compare deep learning and Bayesian neural networks applied as classifiers in an instance of top quark analysis.


    Directory of Open Access Journals (Sweden)

    Jana Halčinová


    The aim of the present article is to show the possibility of using the methods of cluster analysis in the classification of stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The manner of sorting stocks of finished products by clusters is described in a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  14. Estimating size and scope economies in the Portuguese water sector using the Bayesian stochastic frontier analysis. (United States)

    Carvalho, Pedro; Marques, Rui Cunha


    This study aims to search for economies of size and scope in the Portuguese water sector applying Bayesian and classical statistics to make inference in stochastic frontier analysis (SFA). This study proves the usefulness and advantages of the application of Bayesian statistics for making inference in SFA over traditional SFA which just uses classical statistics. The resulting Bayesian methods allow overcoming some problems that arise in the application of the traditional SFA, such as the bias in small samples and skewness of residuals. In the present case study of the water sector in Portugal, these Bayesian methods provide more plausible and acceptable results. Based on the results obtained we found that there are important economies of output density, economies of size, economies of vertical integration and economies of scope in the Portuguese water sector, pointing out to the huge advantages in undertaking mergers by joining the retail and wholesale components and by joining the drinking water and wastewater services. Copyright © 2015 Elsevier B.V. All rights reserved.

  15. Advanced analysis of forest fire clustering (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean


    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
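    The grid-based estimate described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes 2-D points scaled to the unit square and a regular square grid, and the function name `multipoint_morisita` and its parameters are hypothetical. The index is computed as the probability that m points drawn at random without replacement share a grid cell, divided by the same probability under complete spatial randomness, which is Q^(1-m) for Q cells.

```python
import numpy as np
from math import prod


def multipoint_morisita(points, m, cells_per_side):
    """m-point Morisita index for 2-D points in the unit square (illustrative sketch)."""
    pts = np.asarray(points, dtype=float)
    n_total = len(pts)
    q = cells_per_side ** 2  # number of grid cells Q

    # Assign each point to a cell; clip so a coordinate of exactly 1.0 lands in the last cell.
    idx = np.clip((pts * cells_per_side).astype(int), 0, cells_per_side - 1)
    flat = idx[:, 0] * cells_per_side + idx[:, 1]
    counts = np.bincount(flat, minlength=q)

    def falling(n):
        # n (n-1) ... (n-m+1): ordered m-tuples of distinct points; zero when n < m.
        return prod(int(n) - k for k in range(m))

    numer = sum(falling(n) for n in counts)  # m-tuples that share a cell
    denom = falling(n_total)                 # all m-tuples
    return q ** (m - 1) * numer / denom
```

    Under these assumptions, values near 1 indicate a pattern close to random, values above 1 indicate clustering, and repeating the computation for several values of `cells_per_side` gives the scaling behaviour the abstract refers to.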

  16. Cluster Analysis in Rapeseed (Brassica Napus L.)

    International Nuclear Information System (INIS)

    Mahasi, J.M


    With a widening edible oil deficit, Kenya has become increasingly dependent on imported edible oils. Many oilseed crops (e.g. sunflower, soya beans, rapeseed/mustard, sesame, groundnuts etc.) can be grown in Kenya, but oilseed rape is preferred because it is very high yielding (1.5-4.0 tons/ha) with an oil content of 42-46%. Other uses include fitting into various cropping systems as relay/inter crops, rotational crops, trap crops and fodder. It is soft seeded, hence oil extraction is relatively easy. The meal is high in protein and very useful in livestock supplementation. Rapeseed can be straight combined using adjusted wheat combines. The priority is to expand domestic oilseed production, hence the need to introduce improved rapeseed germplasm from other countries. The success of any crop improvement programme depends on the extent of genetic diversity in the material. Hence, it is essential to understand the adaptation of introduced genotypes and the similarities, if any, among them. Evaluation trials were carried out on 17 rapeseed genotypes (nine of Canadian origin and eight of European origin) grown at 4 locations, namely Endebess, Njoro, Timau and Mau Narok, in three years (1992, 1993 and 1994). Results for 1993 were discarded due to severe drought. An analysis of variance was carried out only on seed yields and the treatments were found to be significantly different. Cluster analysis was then carried out on mean seed yields and, based on this analysis, only one major group exists within the material. In 1992, varieties 2, 3, 8 and 9 didn't fall in the same cluster as the rest. Variety 8 was the only one not classified with the rest of the Canadian varieties. Three European varieties (2, 3 and 9) were however not classified with the others. In 1994, varieties 10 and 6 didn't fall in the major cluster. Of these two, variety 10 is of Canadian origin. Varieties were more similar in 1994 than 1992 due to favorable weather. It is evident that genotypes from different geographical

  17. Chaotic map clustering algorithm for EEG analysis (United States)

    Bellotti, R.; De Carlo, F.; Stramaglia, S.


    The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, chaotic map clustering gives natural evidence of the pathological class without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.

  18. Clustering Analysis within Text Classification Techniques

    Directory of Open Access Journals (Sweden)

    Madalina ZURINI


    The paper presents a personal approach to the main applications of classification in the knowledge-based society, by means of methods and techniques widely used in the literature. Text classification is covered in chapter two, where the main techniques are described, along with an integrated taxonomy. The transition is made through the concept of spatial representation. Building on the basic elements of geometry and artificial intelligence analysis, spatial representation models are presented. Using a parallel approach, the spatial dimension is introduced into the process of classification. The main clustering methods are described in an aggregated taxonomy. As an example, spam and ham words are clustered and spatially represented, and the concepts of spam, ham, common and linkage words are presented and explained in the xOy space representation.

  19. Tweets clustering using latent semantic analysis (United States)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul


    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter allows users to broadcast a short message called a "tweet". In this study, we extract tweets related to MH370 over a certain period of time. In this paper, we present an overview of our approach to tweet clustering, used to analyze users' responses to the MH370 tragedy. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for text classification is Latent Semantic Analysis. As a result, we find two types of tweets responding to the MH370 tragedy: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.

  20. An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations. (United States)

    Majumdar, Arunabha; Haldar, Tanushree; Bhattacharya, Sourabh; Witte, John S


    Simultaneous analysis of genetic associations with multiple phenotypes may reveal shared genetic susceptibility across traits (pleiotropy). For a locus exhibiting overall pleiotropy, it is important to identify which specific traits underlie this association. We propose a Bayesian meta-analysis approach (termed CPBayes) that uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. This method uses a unified Bayesian statistical framework based on a spike and slab prior. CPBayes performs a fully Bayesian analysis by employing the Markov Chain Monte Carlo (MCMC) technique Gibbs sampling. It takes into account heterogeneity in the size and direction of the genetic effects across traits. It can be applied to both cohort data and separate studies of multiple traits having overlapping or non-overlapping subjects. Simulations show that CPBayes can produce higher accuracy in the selection of associated traits underlying a pleiotropic signal than the subset-based meta-analysis ASSET. We used CPBayes to undertake a genome-wide pleiotropic association study of 22 traits in the large Kaiser GERA cohort and detected six independent pleiotropic loci associated with at least two phenotypes. This includes a locus at chromosomal region 1q24.2 which exhibits an association simultaneously with the risk of five different diseases: Dermatophytosis, Hemorrhoids, Iron Deficiency, Osteoporosis and Peripheral Vascular Disease. We provide an R-package 'CPBayes' implementing the proposed method.

  1. A Pareto scale-inflated outlier model and its Bayesian analysis


    Scollnik, David P. M.


    This paper develops a Pareto scale-inflated outlier model. This model is intended for use when data from some standard Pareto distribution of interest is suspected to have been contaminated with a relatively small number of outliers from a Pareto distribution with the same shape parameter but with an inflated scale parameter. The Bayesian analysis of this Pareto scale-inflated outlier model is considered and its implementation using the Gibbs sampler is discussed. The paper contains three wor...

  2. Environmental Modeling and Bayesian Analysis for Assessing Human Health Impacts from Radioactive Waste Disposal (United States)

    Stockton, T.; Black, P.; Tauxe, J.; Catlett, K.


    Bayesian decision analysis provides a unified framework for coherent decision-making. Two key components of Bayesian decision analysis are probability distributions and utility functions. Calculating posterior distributions and performing decision analysis can be computationally challenging, especially for complex environmental models. In addition, probability distributions and utility functions for environmental models must be specified through expert elicitation, stakeholder consensus, or data collection, all of which have their own set of technical and political challenges. Nevertheless, a grand appeal of the Bayesian approach for environmental decision-making is the explicit treatment of uncertainty, including expert judgment. The impact of expert judgment on the environmental decision process, though integral, goes largely unassessed. Regulations and orders of the Environmental Protection Agency, Department of Energy, and Nuclear Regulatory Agency require assessing the impact on human health of radioactive waste contamination over periods of up to ten thousand years. Towards this end, complex environmental simulation models are used to assess "risk" to human and ecological health from migration of radioactive waste. As the computational burden of environmental modeling is continually reduced, probabilistic process modeling using Monte Carlo simulation is becoming routinely used to propagate uncertainty from model inputs through model predictions. The utility of a Bayesian approach to environmental decision-making is discussed within the context of a buried radioactive waste example. This example highlights the desirability and difficulties of merging the cost of monitoring, the cost of the decision analysis, the cost and viability of clean up, and the probability of human health impacts within a rigorous decision framework.

  3. EM cluster analysis for categorical data

    Czech Academy of Sciences Publication Activity Database

    Grim, Jiří


    Vol. 44, No. 4109 (2006), pp. 640-648 ISSN 0302-9743. [Joint IAPR International Workshops SSPR 2006 and SPR 2006. Hong Kong, 17.08.2006-19.08.2006] R&D Projects: GA AV ČR 1ET400750407; GA MŠk 1M0572 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords: cluster analysis * categorical data * EM algorithm Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.402, year: 2005

  4. A Bayesian analysis of rare B decays with advanced Monte Carlo methods

    Energy Technology Data Exchange (ETDEWEB)

    Beaujean, Frederik


    Searching for new physics in rare B meson decays governed by $b \to s$ transitions, we perform a model-independent global fit of the short-distance couplings $C_7$, $C_9$, and $C_{10}$ of the $\Delta B = 1$ effective field theory. We assume the standard-model set of $b \to s\gamma$ and $b \to s l^+ l^-$ operators with real-valued $C_i$. A total of 59 measurements by the experiments BaBar, Belle, CDF, CLEO, and LHCb of observables in $B \to K^*\gamma$, $B \to K^{(*)} l^+ l^-$, and $B_s \to \mu^+\mu^-$ decays are used in the fit. Our analysis is the first of its kind to harness the full power of the Bayesian approach to probability theory. All main sources of theory uncertainty explicitly enter the fit in the form of nuisance parameters. We make optimal use of the experimental information to simultaneously constrain the Wilson coefficients as well as hadronic form factors - the dominant theory uncertainty. Generating samples from the posterior probability distribution to compute marginal distributions and predict observables by uncertainty propagation is a formidable numerical challenge for two reasons. First, the posterior has multiple well separated maxima and degeneracies. Second, the computation of the theory predictions is very time consuming. A single posterior evaluation requires O(1 s), and a few million evaluations are needed. Population Monte Carlo (PMC) provides a solution to both issues; a mixture density is iteratively adapted to the posterior, and samples are drawn in a massively parallel way using importance sampling. The major shortcoming of PMC is the need for cogent knowledge of the posterior at the initial stage. In an effort towards a general black-box Monte Carlo sampling algorithm, we present a new method to extract the necessary information in a reliable and automatic manner from Markov chains with the help of hierarchical clustering. Exploiting the latest 2012 measurements, the fit

  5. Dating ancient Chinese celadon porcelain by neutron activation analysis and bayesian classification

    International Nuclear Information System (INIS)

    Xie Guoxi; Feng Songlin; Feng Xiangqian; Zhu Jihao; Yan Lingtong; Li Li


    Dating ancient Chinese porcelain is one of the most important and difficult problems in the field of porcelain archaeology. Eighteen elements, including La, Sm, U, Ce, etc., were determined by neutron activation analysis (NAA) in the bodies of ancient celadon porcelains fired in the Southern Song to Yuan period (AD 1127-1368) and the Ming dynasty (AD 1368-1644). After the outliers in the experimental data were excluded and the multivariate normal distribution was tested, Bayesian classification was used for dating 165 ancient celadon porcelain samples. The results show that 98.2% of the ancient celadon porcelain samples are classified correctly. This means that NAA and Bayesian classification are very useful for dating ancient porcelain. (authors)

  6. Introduction of Bayesian network in risk analysis of maritime accidents in Bangladesh (United States)

    Rahman, Sohanur


    Due to the unique geographic location, complex navigation environment and intense vessel traffic, a considerable number of maritime accidents have occurred in Bangladesh, causing serious loss of life and property as well as environmental contamination. Based on historical data on maritime accidents from 1981 to 2015, collected from the Department of Shipping (DOS) and the Bangladesh Inland Water Transport Authority (BIWTA), this paper conducts a risk analysis of maritime accidents by applying a Bayesian network. A Bayesian network model has been developed, based on the accident investigation reports of Bangladesh, to find the relations among the parameters that affect accidents and their probabilities. Furthermore, the number of accidents in different categories has also been investigated in this paper. Finally, some viable recommendations have been proposed in order to ensure greater safety of inland vessels in Bangladesh.

  7. Dizzy-Beats: a Bayesian evidence analysis tool for systems biology. (United States)

    Aitken, Stuart; Kilpatrick, Alastair M; Akman, Ozgur E


    Model selection and parameter inference are complex problems of long-standing interest in systems biology. Selecting between competing models arises commonly as underlying biochemical mechanisms are often not fully known, hence alternative models must be considered. Parameter inference yields important information on the extent to which the data and the model constrain parameter values. We report Dizzy-Beats, a graphical Java Bayesian evidence analysis tool implementing nested sampling - an algorithm yielding an estimate of the log of the Bayesian evidence Z and the moments of model parameters, thus addressing two outstanding challenges in systems modelling. A likelihood function based on the L1-norm is adopted as it is generically applicable to replicated time series data. © The Author 2015. Published by Oxford University Press.

  8. Bayesian estimation of dynamic matching function for U-V analysis in Japan (United States)

    Kyo, Koki; Noda, Hideo; Kitagawa, Genshiro


    In this paper we propose a Bayesian method for analyzing unemployment dynamics. We derive a Beveridge curve for unemployment and vacancy (U-V) analysis from a Bayesian model based on a labor market matching function. In our framework, the efficiency of matching and the elasticities of new hiring with respect to unemployment and vacancy are regarded as time varying parameters. To construct a flexible model and obtain reasonable estimates in an underdetermined estimation problem, we treat the time varying parameters as random variables and introduce smoothness priors. The model is then described in a state space representation, enabling the parameter estimation to be carried out using Kalman filter and fixed interval smoothing. In such a representation, dynamic features of the cyclic unemployment rate and the structural-frictional unemployment rate can be accurately captured.
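    The estimation scheme named in the abstract above (time-varying parameters with smoothness priors, a state space representation, Kalman filtering and fixed-interval smoothing) can be illustrated with a generic sketch. It is not the authors' model: it assumes a linear observation equation y_t = x_t . b_t + e_t, a first-order random-walk smoothness prior b_t = b_{t-1} + v_t, and known noise variances; the function name `tvp_kalman_smoother` and all parameter names are hypothetical.

```python
import numpy as np


def tvp_kalman_smoother(y, X, obs_var, state_var, init_var=1e4):
    """Kalman filter plus fixed-interval (RTS) smoother for random-walk coefficients."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    Q = state_var * np.eye(k)  # covariance of the random-walk innovations

    b_pred = np.zeros((T, k)); P_pred = np.zeros((T, k, k))
    b_filt = np.zeros((T, k)); P_filt = np.zeros((T, k, k))
    b, P = np.zeros(k), init_var * np.eye(k)  # diffuse-ish initial state

    for t in range(T):
        # Prediction: a random walk keeps the mean and inflates the covariance.
        P = P + Q
        b_pred[t], P_pred[t] = b, P
        # Update with the scalar observation y[t].
        x = X[t]
        s = x @ P @ x + obs_var          # innovation variance
        gain = P @ x / s                 # Kalman gain
        b = b + gain * (y[t] - x @ b)
        P = P - np.outer(gain, x @ P)
        b_filt[t], P_filt[t] = b, P

    # Backward (fixed-interval) smoothing pass; the transition matrix is the identity.
    b_smooth = b_filt.copy()
    for t in range(T - 2, -1, -1):
        J = P_filt[t] @ np.linalg.inv(P_pred[t + 1])
        b_smooth[t] = b_filt[t] + J @ (b_smooth[t + 1] - b_pred[t + 1])
    return b_filt, b_smooth
```

    In a matching-function setting, the columns of `X` might hold a constant and the logs of unemployment and vacancies, so that the smoothed coefficients play the role of time-varying efficiency and elasticities; that mapping is an assumption made here for illustration.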

  9. Risks Analysis of Logistics Financial Business Based on Evidential Bayesian Network

    Directory of Open Access Journals (Sweden)

    Ying Yan


    Risks in logistics financial business are identified and classified. Taking the failure of the business as the root node, a Bayesian network is constructed to measure the risk levels in the business. Three importance indexes are calculated to find the most important risks in the business. Moreover, considering the epistemic uncertainties in the risks, evidence theory combined with a Bayesian network is used as an evidential network in the risk analysis of logistics finance. To find how much uncertainty in the root node is produced by each risk, a new index, epistemic importance, is defined. Numerical examples show that the proposed methods can provide a lot of useful information. With this information, effective approaches can be found to control and avoid these sensitive risks, thus keeping logistics financial business working more reliably. The proposed method also gives a quantitative measure of risk levels in logistics financial business, which provides guidance for the selection of financing solutions.

  10. Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data. (United States)

    Yu, Zhiwen; Chen, Hantao; You, Jane; Liu, Jiming; Wong, Hau-San; Han, Guoqiang; Li, Le


    Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While there are quite a number of research works which perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and improve the performance of RDCFCE further by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms.

  11. Convergence analysis of surrogate-based methods for Bayesian inverse problems (United States)

    Yan, Liang; Zhang, Yuan-Xiang


    The major challenges in Bayesian inverse problems arise from the need for repeated evaluations of the forward model, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. Many attempts at accelerating Bayesian inference have relied on surrogates for the forward model, typically constructed through repeated forward simulations that are performed in an offline phase. Although such approaches can be quite effective at reducing computation cost, there has been little analysis of the effect of the approximation on posterior inference. In this work, we prove error bounds on the Kullback–Leibler (KL) distance between the true posterior distribution and the approximation based on surrogate models. Our rigorous error analysis shows that if the forward model approximation converges at a certain rate in the prior-weighted L2 norm, then the posterior distribution generated by the approximation converges to the true posterior at least two times faster in the KL sense. The error bound on the Hellinger distance is also provided. To provide concrete examples focusing on the use of surrogate model based methods, we present an efficient technique for constructing stochastic surrogate models to accelerate the Bayesian inference approach. The Christoffel least squares algorithms, based on generalized polynomial chaos, are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. The numerical strategy and the predicted convergence rates are then demonstrated on nonlinear inverse problems involving the inference of parameters appearing in partial differential equations.
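    One plausible reading of the convergence statement in the abstract above can be written schematically as follows. The notation is an assumption introduced here, not taken from the paper: G is the forward model, G_N its surrogate, \pi_0 the prior, \pi^y and \pi_N^y the exact and surrogate-based posteriors, and \alpha > 0 a generic rate. The order of the arguments in the KL divergence is likewise assumed.

```latex
% Schematic only: symbols and the argument order of D_KL are assumptions.
\[
  \| G - G_N \|_{L^2_{\pi_0}} = \mathcal{O}\!\left(N^{-\alpha}\right)
  \quad\Longrightarrow\quad
  D_{\mathrm{KL}}\!\left(\pi^y \,\|\, \pi_N^y\right) = \mathcal{O}\!\left(N^{-2\alpha}\right).
\]
```

    Read this way, "two times faster" means that the exponent of the posterior error in the KL sense is at least twice the exponent of the forward-model error.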

  12. bspmma: An R Package for Bayesian Semiparametric Models for Meta-Analysis

    Directory of Open Access Journals (Sweden)

    Deborah Burr


    We introduce an R package, bspmma, which implements a Dirichlet-based random effects model specific to meta-analysis. In meta-analysis, when combining effect estimates from several heterogeneous studies, it is common to use a random-effects model. The usual frequentist or Bayesian models specify a normal distribution for the true effects. However, in many situations, the effect distribution is not normal, e.g., it can have thick tails, be skewed, or be multi-modal. A Bayesian nonparametric model based on mixtures of Dirichlet process priors has been proposed in the literature, for the purpose of accommodating the non-normality. We review this model and then describe a competitor, a semiparametric version which has the feature that it allows for a well-defined centrality parameter convenient for determining whether the overall effect is significant. This second Bayesian model is based on a different version of the Dirichlet process prior, and we call it the "conditional Dirichlet model". The package contains functions to carry out analyses based on either the ordinary or the conditional Dirichlet model, functions for calculating certain Bayes factors that provide a check on the appropriateness of the conditional Dirichlet model, and functions that enable an empirical Bayes selection of the precision parameter of the Dirichlet process. We illustrate the use of the package on two examples, and give an interpretation of the results in these two different scenarios.

  13. Bayesian analysis of interacting quantitative trait loci (QTL) for yield ...

    African Journals Online (AJOL)

    7×Lycopersicon pimpinellifolium LA2184 was used for genome-wide linkage analysis for yield traits in tomato. The genetic map, spanning the tomato genome of 808.4 cM long was constructed with 112 SSR markers distributing on 16 linkage ...

  14. Flexible and efficient implementations of Bayesian independent component analysis

    DEFF Research Database (Denmark)

    Winther, Ole; Petersen, Kaare Brandt


    In this paper we present an empirical Bayes method for flexible and efficient independent component analysis (ICA). The method is flexible with respect to choice of source prior, dimensionality and constraints of the mixing matrix (unconstrained or non-negativity), and structure of the noise cova...... Elsevier B.V. All rights reserved....

  15. Bayesian statistics applied to neutron activation data for reactor flux spectrum analysis

    International Nuclear Information System (INIS)

    Chiesa, Davide; Previtali, Ezio; Sisti, Monica


    Highlights: • Bayesian statistics to analyze the neutron flux spectrum from activation data. • Rigorous statistical approach for accurate evaluation of the neutron flux groups. • Cross section and activation data uncertainties included for the problem solution. • Flexible methodology applied to analyze different nuclear reactor flux spectra. • The results are in good agreement with the MCNP simulations of neutron fluxes. - Abstract: In this paper, we present a statistical method, based on Bayesian statistics, to analyze the neutron flux spectrum from the activation data of different isotopes. The experimental data were acquired during a neutron activation experiment performed at the TRIGA Mark II reactor of Pavia University (Italy) in four irradiation positions characterized by different neutron spectra. In order to evaluate the neutron flux spectrum, subdivided into energy groups, a system of linear equations, containing the group effective cross sections and the activation rate data, has to be solved. However, since the system's coefficients are experimental data affected by uncertainties, a rigorous statistical approach is fundamental for an accurate evaluation of the neutron flux groups. For this purpose, we applied Bayesian statistical analysis, which makes it possible to include the uncertainties of the coefficients and the a priori information about the neutron flux. A program for the analysis of Bayesian hierarchical models, based on Markov Chain Monte Carlo (MCMC) simulations, was used to define the statistical model of the problem and solve it. The first analysis involved the determination of the thermal, resonance-intermediate and fast flux components, and the dependence of the results on the choice of prior distribution was investigated to confirm the reliability of the Bayesian analysis. After that, the main resonances of the activation cross sections were analyzed to implement multi-group models with finer energy subdivisions that would make it possible to determine the

  16. Modified Bayesian Kriging for Noisy Response Problems for Reliability Analysis (United States)


    52242, USA Mary Kathryn Cowles Department of Statistics & Actuarial Science College of Liberal Arts and Sciences, The...Forrester, A. I. J., & Keane, A. J. (2009). Recent advances in surrogate-based optimization. Progress in Aerospace Sciences, 45(1–3), 50-79. doi...Wiley. [27] Sacks, J., Welch, W. J., Toby J. Mitchell, & Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4

  17. Bayesian GWAS and network analysis revealed new candidate genes for number of teats in pigs. (United States)

    Verardo, L L; Silva, F F; Varona, L; Resende, M D V; Bastiaansen, J W M; Lopes, P S; Guimarães, S E F


    The genetic improvement of reproductive traits such as the number of teats is essential to the success of the pig industry. In contrast to most SNP association studies, which consider continuous phenotypes under Gaussian assumptions, this trait is a discrete variable that could potentially follow other distributions, such as the Poisson. Therefore, Bayesian inference tools become necessary to address the complexity of a count random regression that considers all SNPs simultaneously as covariates in a GWAS model. Another point that currently deserves attention in GWAS is the genetic dissection of complex phenotypes through candidate gene networks derived from significant SNPs. We present a full Bayesian treatment of SNP association analysis for number of teats, assuming alternatively Gaussian and Poisson distributions for this trait. Under this framework, significant SNP effects were identified by hypothesis tests using 95% highest posterior density intervals. These SNPs were used to construct associated candidate gene networks aiming to explain the genetic mechanism behind this reproductive trait. Bayesian model comparisons based on the deviance posterior distribution indicated the superiority of the Gaussian model. In general, our results suggest the presence of 19 significant SNPs, which mapped to 13 genes. In addition, we predicted gene interactions through networks that are consistent with known mammalian breast biology (e.g., development of prolactin receptor signaling, and cell proliferation), captured known regulation binding sites, and provided candidate genes for that trait (e.g., TINAGL1 and ICK).
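
    The 95% highest posterior density (HPD) criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes posterior draws of a single SNP effect are already available from a sampler; the simulated draws, the function names hpd_interval and is_significant, and the "interval excludes zero" rule are illustrative assumptions, not the authors' code.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval covering `mass` of the posterior draws."""
    xs = sorted(samples)
    n = len(xs)
    k = min(n, math.ceil(mass * n))  # number of draws that must fall inside
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def is_significant(samples, mass=0.95):
    """Flag an effect whose HPD interval excludes zero (assumed decision rule)."""
    lo, hi = hpd_interval(samples, mass)
    return not (lo <= 0.0 <= hi)


# Simulated draws standing in for real MCMC output (hypothetical values).
rng = random.Random(0)
null_effect = [rng.gauss(0.0, 1.0) for _ in range(5000)]
clear_effect = [rng.gauss(3.0, 1.0) for _ in range(5000)]

print(is_significant(null_effect))
print(is_significant(clear_effect))
```

    For a unimodal posterior, the sliding-window search over sorted draws gives the narrowest interval with the requested coverage, which is what distinguishes an HPD interval from an equal-tailed one.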

  18. Unavailability analysis of a PWR safety system by a Bayesian network

    International Nuclear Information System (INIS)

    Estevao, Lilian B.; Melo, Paulo Fernando F. Frutuoso e; Rivero, Jose J.


    Bayesian networks (BN) are directed acyclic graphs that have dependencies between variables, which are represented by nodes. These dependencies are represented by lines connecting the nodes and can be directed or not. Thus, it is possible to model conditional probabilities and calculate them with the help of Bayes' Theorem. The objective of this paper is to present the modeling of the failure of a safety system of a typical second generation light water reactor plant, the Containment Heat Removal System (CHRS), whose function is to cool the water of the containment reservoir being recirculated through the Containment Spray Recirculation System (CSRS). The CSRS is automatically initiated after a loss of coolant accident (LOCA) and, together with the CHRS, cools the reservoir water. This system was chosen because its analysis by a fault tree is available in Appendix II of the Reactor Safety Study Report (WASH-1400), and therefore all the necessary technical information is also available, such as system diagrams, failure data input and the fault tree itself that was developed to study system failure. The reason for using a Bayesian network in this context was to assess its ability to reproduce the results of fault tree analyses and also to verify the feasibility of treating dependent events. Comparing the fault trees and Bayesian networks, the results obtained for the system failure were very close. (author)
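
    The contrast between a fault tree with independent basic events and a Bayesian network with dependent events can be sketched on a toy system. The structure below (two redundant trains sharing one common cause) and all probabilities are hypothetical; they are not taken from WASH-1400 or from the paper.

```python
from itertools import product

# Hypothetical numbers chosen only for illustration.
P_COMMON_CAUSE = 0.01                             # P(C): shared stressor present
P_TRAIN_FAILS_GIVEN = {True: 0.5, False: 0.02}    # P(train fails | C)


def bn_system_failure():
    """P(both trains fail) by full enumeration of the network C -> A, C -> B."""
    total = 0.0
    for c, a, b in product([True, False], repeat=3):
        p_c = P_COMMON_CAUSE if c else 1.0 - P_COMMON_CAUSE
        p_a = P_TRAIN_FAILS_GIVEN[c] if a else 1.0 - P_TRAIN_FAILS_GIVEN[c]
        p_b = P_TRAIN_FAILS_GIVEN[c] if b else 1.0 - P_TRAIN_FAILS_GIVEN[c]
        if a and b:  # the system fails only when both redundant trains fail
            total += p_c * p_a * p_b
    return total


def fault_tree_independent():
    """AND gate over the two trains, treating them as independent events."""
    p_train = (P_COMMON_CAUSE * P_TRAIN_FAILS_GIVEN[True]
               + (1.0 - P_COMMON_CAUSE) * P_TRAIN_FAILS_GIVEN[False])
    return p_train ** 2


p_bn = bn_system_failure()
p_ft = fault_tree_independent()
print(p_bn, p_ft)
```

    In this toy case the network that models the shared cause yields a larger system failure probability than the AND gate with independent inputs, which is the kind of dependence effect a plain fault tree does not capture without extra modeling.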

  19. Inference on the Univariate Frailty Model: A Bayesian Reference Analysis Approach (United States)

    Tomazella, Vera Lucia D.; Martins, Camila Bertini; Bernardo, Jose Miguel


    In this work we present an approach involving objective Bayesian reference analysis for the frailty model with univariate survival time and sources of heterogeneity that are not captured by covariates. The derivation of the unconditional hazard and survival functions leads to the Lomax distribution, also known as the Pareto distribution of the second kind. This distribution has an important position in life testing and in fitting data from business failures. Reference analysis, introduced by Bernardo (1979), produces a new solution for this problem. The results are illustrated with survival data analyzed in the literature and with simulated data.
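
    One standard route from a frailty model to the Lomax distribution is sketched below. The abstract does not state the frailty or baseline distributions, so the gamma frailty Z with shape \alpha and rate \beta, and the constant baseline hazard \lambda, are assumptions made here for illustration.

```latex
% Assumptions (not stated in the abstract): h(t \mid z) = z\lambda and
% Z \sim \mathrm{Gamma}(\alpha, \beta) with rate parameter \beta.
\begin{align}
  S(t \mid z) &= \exp(-z \lambda t), \\
  S(t) &= \mathbb{E}_Z\!\left[ e^{-Z \lambda t} \right]
        = \left( \frac{\beta}{\beta + \lambda t} \right)^{\alpha}
        = \left( 1 + \frac{\lambda t}{\beta} \right)^{-\alpha}, \\
  h(t) &= -\frac{d}{dt} \log S(t) = \frac{\alpha \lambda}{\beta + \lambda t}.
\end{align}
```

    The second line uses the Laplace transform of the gamma distribution. The resulting S(t) is the survival function of a Lomax distribution with shape \alpha and scale \beta / \lambda, and the unconditional hazard decreases in t even though each individual hazard is constant.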

  20. Bayesian latent variable models for the analysis of experimental psychology data. (United States)

    Merkle, Edgar C; Wang, Ting


    In this paper, we address the use of Bayesian factor analysis and structural equation models to draw inferences from experimental psychology data. While such application is non-standard, the models are generally useful for the unified analysis of multivariate data that stem from, e.g., subjects' responses to multiple experimental stimuli. We first review the models and the parameter identification issues inherent in the models. We then provide details on model estimation via JAGS and on Bayes factor estimation. Finally, we use the models to re-analyze experimental data on risky choice, comparing the approach to simpler, alternative methods.

  1. HICOSMO - X-ray analysis of a complete sample of galaxy clusters (United States)

    Schellenberger, G.; Reiprich, T.


    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both, a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.

  2. Application of cluster analysis for data driven market segmentation ...

    African Journals Online (AJOL)

    This research aims to capture which standards for applying cluster analysis have emerged in the academic marketing literature, to compare those standards of applying methodological knowledge about clustering procedures, and to delineate sudden changes in clustering habits. These goals are achieved by ...

  3. Cluster analysis of word frequency dynamics (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.


    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  4. Cluster analysis of word frequency dynamics

    International Nuclear Information System (INIS)

    Maslennikova, Yu S; Bochkarev, V V; Belashova, I A


    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  5. Bayesian meta-analysis models for microarray data: a comparative study

    Directory of Open Access Journals (Sweden)

    Song Joon J


    Abstract. Background: With the growing abundance of microarray data, statistical methods are increasingly needed to integrate results across studies. Two common approaches for meta-analysis of microarrays include either combining gene expression measures across studies or combining summaries such as p-values, probabilities or ranks. Here, we compare two Bayesian meta-analysis models that are analogous to these methods. Results: Two Bayesian meta-analysis models for microarray data have recently been introduced. The first model combines standardized gene expression measures across studies into an overall mean, accounting for inter-study variability, while the second combines probabilities of differential expression without combining expression values. Both models produce the gene-specific posterior probability of differential expression, which is the basis for inference. Since the standardized expression integration model includes inter-study variability, it may improve accuracy of results versus the probability integration model. However, due to the small number of studies typical in microarray meta-analyses, the variability between studies is challenging to estimate. The probability integration model eliminates the need to model variability between studies, and thus its implementation is more straightforward. We found in simulations of two and five studies that combining probabilities outperformed combining standardized gene expression measures for three comparison values: the percent of true discovered genes in meta-analysis versus individual studies; the percent of true genes omitted in meta-analysis versus separate studies, and the number of true discovered genes for fixed levels of Bayesian false discovery. We identified similar results when pooling two independent studies of Bacillus subtilis. We assumed that each study was produced from the same microarray platform with only two conditions: a treatment and control, and that the data sets

  6. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study (United States)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam


    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), and provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we find that these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduces the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.

  7. Cluster-based exposure variation analysis (United States)


    Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate

  8. A Bayesian analysis of inflationary primordial spectrum models using Planck data (United States)

    Santos da Costa, Simony; Benetti, Micol; Alcaniz, Jailson


    The current available Cosmic Microwave Background (CMB) data show an anomalously low value of the CMB temperature fluctuations at large angular scales (l physics. In this paper, we analyse a set of cutoff inflationary PPS models using a Bayesian model comparison approach in light of the latest CMB data from the Planck Collaboration. Our results show that the standard power-law parameterisation is preferred over all models considered in the analysis, which motivates the search for alternative explanations for the observed lack of power in the CMB anisotropy spectrum.

  9. Bayesian Analysis of Linear and Nonlinear Latent Variable Models with Fixed Covariate and Ordered Categorical Data

    Directory of Open Access Journals (Sweden)

    Thanoon Y. Thanoon


    In this paper, ordered categorical variables are used to compare linear and nonlinear interactions of fixed covariates and latent variables in Bayesian structural equation models. The Gibbs sampling method is applied for estimation and model comparison. A hidden continuous normal distribution (censored normal distribution) is used to handle the problem of ordered categorical data. Statistical inferences, which involve estimation of parameters and their standard deviations, and residual analyses for testing the selected model, are discussed. The proposed procedure is illustrated with simulated data obtained from an R program. Analyses are done using the OpenBUGS program.

  10. BayesLCA: An R Package for Bayesian Latent Class Analysis

    Directory of Open Access Journals (Sweden)

    Arthur White


    The BayesLCA package for R provides tools for performing latent class analysis within a Bayesian setting. Three methods for fitting the model are provided, incorporating an expectation-maximization algorithm, Gibbs sampling and a variational Bayes approximation. The article briefly outlines the methodology behind each of these techniques and discusses some of the technical difficulties associated with them. Methods to remedy these problems are also described. Visualization methods for each of these techniques are included, as well as criteria to aid model selection.

  11. A Bayesian analysis of the chromosome architecture of human disorders by integrating reductionist data. (United States)

    Emmert-Streib, Frank; de Matos Simoes, Ricardo; Tripathi, Shailesh; Glazko, Galina V; Dehmer, Matthias


    In this paper, we present a Bayesian approach to estimate a chromosome and a disorder network from the Online Mendelian Inheritance in Man (OMIM) database. In contrast to other approaches, we obtain statistical rather than deterministic networks, enabling parametric control of the uncertainty of the underlying disorder-disease gene associations contained in the OMIM, on which the networks are based. From a structural investigation of the chromosome network, we identify three chromosome subgroups that reflect architectural differences in chromosome-disorder associations that are predictively exploitable for a functional analysis of diseases.

  12. An analysis of hospital brand mark clusters. (United States)

    Vollmers, Stacy M; Miller, Darryl W; Kilic, Ozcan


    This study analyzed brand mark clusters (i.e., various types of brand marks displayed in combination) used by hospitals in the United States. The brand marks were assessed against several normative criteria for creating brand marks that are memorable and that elicit positive affect. Overall, results show a reasonably high level of adherence to many of these normative criteria. Many of the clusters exhibited pictorial elements that reflected benefits and that were conceptually consistent with the verbal content of the cluster. Also, many clusters featured icons that were balanced and moderately complex. However, only a few contained interactive imagery or taglines communicating benefits.

  13. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci


    Smart cities have recently been recognized as the most pleasing and attractive places to live in; because of this, both scholars and policy-makers pay close attention to this topic. Specifically, urban “smartness” has been identified by many characteristics that can be grouped into six dimensions (Giffinger et al. 2007): smart Economy (competitiveness), smart People (social and human capital), smart Governance (participation), smart Mobility (both ICTs and transport), smart Environment (natural resources), and smart Living (quality of life). According to this analytical framework, in the present paper the relation between urban attractiveness and the “smart” characteristics has been investigated in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, descriptive statistics have been followed by a regression analysis (OLS), where the dependent variable measuring urban attractiveness has been proxied by housing market prices. In addition, a Cluster Analysis (CA) has been developed in order to find differences and commonalities among the province capitals. The OLS results indicate that living, people and economy are the key drivers for achieving better urban attractiveness. Environment, instead, continues to play a minor role. Besides, the CA groups the province capitals a

  14. A Two-Step Bayesian Approach for Propensity Score Analysis: Simulations and Case Study (United States)

    Kaplan, David; Chen, Jianshen


    A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for…


    International Nuclear Information System (INIS)

    Trotta, R.; Johannesson, G.; Moskalenko, I. V.; Porter, T. A.; Ruiz de Austri, R.; Strong, A. W.


    Research in many areas of modern physics such as, e.g., indirect searches for dark matter and particle acceleration in supernova remnant shocks rely heavily on studies of cosmic rays (CRs) and associated diffuse emissions (radio, microwave, X-rays, γ-rays). While very detailed numerical models of CR propagation exist, a quantitative statistical analysis of such models has been so far hampered by the large computational effort that those models require. Although statistical analyses have been carried out before using semi-analytical models (where the computation is much faster), the evaluation of the results obtained from such models is difficult, as they necessarily suffer from many simplifying assumptions. The main objective of this paper is to present a working method for a full Bayesian parameter estimation for a numerical CR propagation model. For this study, we use the GALPROP code, the most advanced of its kind, which uses astrophysical information, and nuclear and particle data as inputs to self-consistently predict CRs, γ-rays, synchrotron, and other observables. We demonstrate that a full Bayesian analysis is possible using nested sampling and Markov Chain Monte Carlo methods (implemented in the SuperBayeS code) despite the heavy computational demands of a numerical propagation code. The best-fit values of parameters found in this analysis are in agreement with previous, significantly simpler, studies also based on GALPROP.
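
    The Markov Chain Monte Carlo part of such an analysis can be illustrated with a minimal random-walk Metropolis sketch. It is unrelated to the GALPROP or SuperBayeS codes and does not cover nested sampling: the one-parameter Gaussian model with a flat prior, the synthetic data, and the step size are all assumptions chosen so that the example runs in well under a second.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0):
    """Log posterior of mu for Gaussian data with a flat prior (up to a constant)."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma ** 2


def metropolis(data, n_steps=20000, step=0.3, start=0.0, seed=1):
    """Random-walk Metropolis sampler for the single parameter mu."""
    rng = random.Random(seed)
    mu = start
    logp = log_posterior(mu, data)
    chain = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if logp_new >= logp or rng.random() < math.exp(logp_new - logp):
            mu, logp = proposal, logp_new
        chain.append(mu)
    return chain


# Synthetic observations standing in for real measurements (hypothetical).
data_rng = random.Random(0)
data = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

chain = metropolis(data)
kept = chain[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
sample_mean = sum(data) / len(data)
print(posterior_mean, sample_mean)
```

    In a real application each call to log_posterior would run the expensive numerical model, which is why the abstract stresses the computational demands: the sampler itself is simple, and the cost lies in the tens of thousands of likelihood evaluations.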

  16. Fuzzy Bayesian Network-Bow-Tie Analysis of Gas Leakage during Biomass Gasification.

    Directory of Open Access Journals (Sweden)

    Fang Yan

    Biomass gasification technology has been rapidly developed recently. But fire and poisoning accidents caused by gas leakage restrict the development and promotion of biomass gasification. Therefore, probabilistic safety assessment (PSA) is necessary for biomass gasification systems. Subsequently, Bayesian network-bow-tie (BN-bow-tie) analysis was proposed by mapping bow-tie analysis into a Bayesian network (BN). Causes of gas leakage and the accidents triggered by gas leakage can be obtained by bow-tie analysis, and the BN was used to confirm the critical nodes of accidents by introducing three corresponding importance measures. Meanwhile, a definite occurrence probability of failure was needed in PSA. In view of the insufficient failure data for biomass gasification, the occurrence probability of failures that cannot be obtained from standard reliability data sources was determined by fuzzy methods based on expert judgment. An improved approach that considers expert weighting to aggregate fuzzy numbers, including triangular and trapezoidal numbers, was proposed, and the occurrence probability of failure was obtained. Finally, safety measures were indicated based on the obtained critical nodes. The theoretical occurrence probabilities in one year of gas leakage and the accidents caused by it were reduced to 1/10.3 of the original values by these safety measures.

  17. Fuzzy Bayesian Network-Bow-Tie Analysis of Gas Leakage during Biomass Gasification. (United States)

    Yan, Fang; Xu, Kaili; Yao, Xiwen; Li, Yang


    Biomass gasification technology has been rapidly developed recently. But fire and poisoning accidents caused by gas leakage restrict the development and promotion of biomass gasification. Therefore, probabilistic safety assessment (PSA) is necessary for biomass gasification systems. Subsequently, Bayesian network-bow-tie (BN-bow-tie) analysis was proposed by mapping bow-tie analysis into a Bayesian network (BN). Causes of gas leakage and the accidents triggered by gas leakage can be obtained by bow-tie analysis, and the BN was used to confirm the critical nodes of accidents by introducing three corresponding importance measures. Meanwhile, a definite occurrence probability of failure was needed in PSA. In view of the insufficient failure data for biomass gasification, the occurrence probability of failures that cannot be obtained from standard reliability data sources was determined by fuzzy methods based on expert judgment. An improved approach that considers expert weighting to aggregate fuzzy numbers, including triangular and trapezoidal numbers, was proposed, and the occurrence probability of failure was obtained. Finally, safety measures were indicated based on the obtained critical nodes. The theoretical occurrence probabilities in one year of gas leakage and the accidents caused by it were reduced to 1/10.3 of the original values by these safety measures.
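
    The idea of aggregating expert opinions expressed as fuzzy numbers can be sketched in generic form. The abstract does not give the paper's improved aggregation rule, so the weighted component-wise average of triangular fuzzy numbers and the centroid defuzzification below are common textbook choices used here as assumptions; the expert opinions and weights are made up.

```python
def aggregate_triangular(opinions, weights):
    """Weighted average of triangular fuzzy numbers given as (low, mode, high)."""
    total = sum(weights)
    return tuple(
        sum(w * tfn[i] for tfn, w in zip(opinions, weights)) / total
        for i in range(3)
    )


def centroid(tfn):
    """Centroid defuzzification of a triangular fuzzy number."""
    low, mode, high = tfn
    return (low + mode + high) / 3.0


# Hypothetical possibility scores from three experts, with hypothetical weights.
opinions = [(0.1, 0.2, 0.3), (0.2, 0.3, 0.4), (0.0, 0.1, 0.2)]
weights = [0.5, 0.3, 0.2]

combined = aggregate_triangular(opinions, weights)
crisp_score = centroid(combined)
print(combined, crisp_score)
```

    The crisp score is a possibility value rather than a probability; converting it into a failure probability requires a further mapping, which this sketch leaves out.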

  18. Combination of Bayesian and Latent Semantic Analysis with Domain Specific Knowledge

    Directory of Open Access Journals (Sweden)

    Shen Lu


    With the development of information technology, electronic publications have become popular. However, it is a challenge to retrieve information from electronic publications because of the large number of words, the synonymy problem and the polysemy problem. In this paper, we introduce a new algorithm called Bayesian Latent Semantic Analysis (BLSA). We chose to model text based not on terms but on associations between words. Also, the significance of interesting features was improved by expanding the number of similar terms with glossaries. Latent Semantic Analysis (LSA) was chosen to discover significant features. Bayesian posterior probability was used to discover segmentation boundaries. Also, the Dirichlet distribution was chosen to represent the vector of topic distribution and calculate the maximum probability of the topics. Experimental results showed that both Pk [8] and WindowDiff [27] decreased by 10% using BLSA in comparison to Lexical Cohesion with the original data. [8] Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R. (1990), 'Indexing by latent semantic analysis', Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407. [27] Pevzner, L. and Hearst, M.A. (2002), 'A critique and improvement of an evaluation metric for text segmentation', Computational Linguistics, vol. 28, no. 1, pp. 19-36.

  19. The Psychology of Yoga Practitioners: A Cluster Analysis. (United States)

    Genovese, Jeremy E C; Fondran, Kristine M


    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  20. Meta-analysis of the effect of natural frequencies on Bayesian reasoning. (United States)

    McDowell, Michelle; Jacobs, Perke


    The natural frequency facilitation effect describes the finding that people are better able to solve descriptive Bayesian inference tasks when represented as joint frequencies obtained through natural sampling, known as natural frequencies, than as conditional probabilities. The present meta-analysis reviews 20 years of research seeking to address when, why, and for whom natural frequency formats are most effective. We review contributions from research associated with the 2 dominant theoretical perspectives, the ecological rationality framework and nested-sets theory, and test potential moderators of the effect. A systematic review of relevant literature yielded 35 articles representing 226 performance estimates. These estimates were statistically integrated using a bivariate mixed-effects model that yields summary estimates of average performances across the 2 formats and estimates of the effects of different study characteristics on performance. These study characteristics range from moderators representing individual characteristics (e.g., numeracy, expertise), to methodological differences (e.g., use of incentives, scoring criteria) and features of problem representation (e.g., short menu format, visual aid). Short menu formats (less computationally complex representations showing joint-events) and visual aids demonstrated some of the strongest moderation effects, improving performance for both conditional probability and natural frequency formats. A number of methodological factors (e.g., exposure to both problem formats) were also found to affect performance rates, emphasizing the importance of a systematic approach. We suggest how research on Bayesian reasoning can be strengthened by broadening the definition of successful Bayesian reasoning to incorporate choice and process and by applying different research methodologies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
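
    The two problem formats compared in this meta-analysis can be made concrete with a small worked example. The diagnostic-test numbers below (1% prevalence, 80% sensitivity, 9.6% false positive rate, reference population of 1000) are a hypothetical textbook-style scenario, not data from the reviewed studies.

```python
def posterior_from_probabilities(prevalence, sensitivity, false_positive_rate):
    """Bayes' rule stated in the conditional probability format."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


def posterior_from_natural_frequencies(population, prevalence, sensitivity,
                                       false_positive_rate):
    """The same inference stated as whole-number counts from natural sampling."""
    sick = round(population * prevalence)
    sick_positive = round(sick * sensitivity)
    healthy_positive = round((population - sick) * false_positive_rate)
    return sick_positive / (sick_positive + healthy_positive)


p_prob = posterior_from_probabilities(0.01, 0.8, 0.096)
p_freq = posterior_from_natural_frequencies(1000, 0.01, 0.8, 0.096)
print(round(p_prob, 4), round(p_freq, 4))
```

    In the frequency format the answer reduces to "8 of the 103 people who test positive are sick", so the base rate is carried by the counts themselves instead of entering as a separate term in the formula.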

  1. Effective connectivity analysis of default mode network based on the Bayesian network learning approach (United States)

    Li, Rui; Chen, Kewei; Zhang, Nan; Fleisher, Adam S.; Li, Yao; Wu, Xia


    This work proposes using the linear Gaussian Bayesian network (BN) to construct the effective connectivity model of the brain's default mode network (DMN), a set of regions characterized by greater neural activity during the resting state than during most goal-oriented tasks. In a completely unsupervised, data-driven manner, a Bayesian information criterion (BIC) based learning approach was used to identify the highest-scoring network, whose nodes (brain regions) were selected based on the results of a group independent component analysis (Group ICA) examining the DMN. We propose adopting the statistical significance testing method for regression coefficients used in stepwise regression analysis to further refine the network identified by BIC. The final BN, learned from functional magnetic resonance imaging (fMRI) data acquired from 12 healthy young subjects during the resting state, revealed that the hippocampus (HC) was the most influential brain region, affecting activity in all other regions included in the BN. In contrast, the posterior cingulate cortex (PCC) was influenced by other regions, but had no reciprocal effects on any other region. Overall, the configuration of our BN illustrated that a prominent connection from HC to PCC existed in the DMN.

  2. Risk analysis of emergent water pollution accidents based on a Bayesian Network. (United States)

    Tang, Caihong; Yi, Yujun; Yang, Zhifeng; Sun, Jie


    To guarantee water quality in water transfer channels, especially open channels, analysis of potential emergent pollution sources in the water transfer process is critical. It is also indispensable for forewarning of and protection from emergent pollution accidents. Bridges above open channels with large amounts of truck traffic are the main locations where emergent accidents could occur. A Bayesian Network model consisting of six root nodes and three middle-layer nodes was developed in this paper and employed to identify the potential pollution risk. Dianbei Bridge is examined as a typical bridge on an open channel of the Middle Route of the South to North Water Transfer Project where emergent traffic accidents could occur. This study focuses on the risk of water pollution caused by leakage of pollutants into the water. The risk of potential traffic accidents at the Dianbei Bridge implies a risk of water pollution in the canal. The Bayesian Network model was established based on survey data, statistical analysis, and domain specialist knowledge, and it takes the human factor in emergent accidents into account. Additionally, this model has been employed to describe the probability of accidents and the risk level. The factors to which pollution accidents are most sensitive have been deduced. The case in which the sensitive factors are in the state most likely to lead to accidents has also been simulated. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. A discrete-time Bayesian network reliability modeling and analysis framework

    International Nuclear Information System (INIS)

    Boudali, H.; Dugan, J.B.


    Dependability tools are becoming indispensable for modeling and analyzing (critical) systems. However, the growing complexity of such systems calls for increasing sophistication of these tools. Dependability tools need not only to capture the complex dynamic behavior of the system components, but must also be easy to use, intuitive, and computationally efficient. In general, current tools have a number of shortcomings, including a lack of modeling power, an inability to efficiently handle general component failure distributions, and ineffectiveness in solving large models that exhibit complex dependencies between their components. We propose a novel reliability modeling and analysis framework based on the Bayesian network (BN) formalism. The overall approach is to investigate timed Bayesian networks and to find a suitable reliability framework for dynamic systems. We have applied our methodology to two example systems, and preliminary results are promising. We have defined a discrete-time BN reliability formalism and demonstrated its capabilities from a modeling and analysis point of view. This research shows that a BN-based reliability formalism is a powerful potential solution for modeling and analyzing various kinds of system component behaviors and interactions. Moreover, being based on the BN formalism, the framework is easy to use and intuitive for non-experts, and provides a basis for more advanced and useful analyses such as system diagnosis.

  4. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis (United States)

    Hwang, Heungsun; Dillon, William R.


    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…

  5. Using Cluster Analysis for Data Mining in Educational Technology Research (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.


    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  6. Exact Bayesian bin classification: a fast alternative to Bayesian classification and its application to neural response analysis. (United States)

    Endres, D; Földiák, P


    We investigate the general problem of signal classification and, in particular, that of assigning stimulus labels to neural spike trains recorded from single cortical neurons. Finding efficient ways of classifying neural responses is especially important in experiments involving rapid presentation of stimuli. We introduce a fast, exact alternative to Bayesian classification. Instead of estimating the class-conditional densities p(x|y) (where x is a scalar function of the feature[s], y the class label) and converting them to P(y|x) via Bayes' theorem, this probability is evaluated directly and without the need for approximations. This is achieved by integrating over all possible binnings of x with an upper limit on the number of bins. Computational time is quadratic in both the number of observed data points and the number of bins. The algorithm also allows for the computation of feedback signals, which can be used as input to subsequent stages of inference, e.g. neural network training. Responses of single neurons from high-level visual cortex (area STSa) to rapid sequences of complex visual stimuli are analysed. Information latency and response duration increase nonlinearly with presentation duration, suggesting that neural processing speeds adapt to presentation speeds.

  7. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A


    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show...

  8. Visual cluster analysis and pattern recognition methods (United States)

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco


    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  9. Performance of an iterative two-stage bayesian technique for population pharmacokinetic analysis of rich data sets

    NARCIS (Netherlands)

    Proost, Johannes H.; Eleveld, Douglas J.


    Purpose. To test the suitability of an Iterative Two-Stage Bayesian (ITSB) technique for population pharmacokinetic analysis of rich data sets, and to compare ITSB with Standard Two-Stage (STS) analysis and nonlinear Mixed Effect Modeling (MEM). Materials and Methods. Data from a clinical study with

  10. Genetic analysis of loose cluster architecture in grapevine

    Directory of Open Access Journals (Sweden)

    Richter Robert


    Loose cluster architecture is a well-known trait supporting Botrytis resilience by permitting faster drying of bunches. Furthermore, a loose bunch enables better application of fungicides into the cluster. For QTL analysis, 150 F1 plants of the superior breeding line GF.GA-47-42 (‘Bacchus’ x ‘Seyval blanc’) crossed with ‘Villard blanc’, segregating for compactness of the cluster, were used. Many QTLs were identified reproducibly in two years; QTLs stable over three growing seasons were identified for rachis length, peduncle length, and pedicel length. In a second approach, ‘Pinot noir’ clones showing variation in cluster architecture were analyzed for differential gene expression. Grown in three different German viticultural areas, loosely versus compactly clustered ‘Pinot noir’ clones showed, in gene expression experiments, a candidate gene expressed fivefold higher in loosely clustered clones between stages BBCH57 and BBCH71.

  11. Developing a Validation Methodology for Expert-Informed Bayesian Network Models Supporting Nuclear Nonproliferation Analysis

    Energy Technology Data Exchange (ETDEWEB)

    White, Amanda M.; Gastelum, Zoe N.; Whitney, Paul D.


    Under the auspices of Pacific Northwest National Laboratory’s Signature Discovery Initiative (SDI), the research team developed a series of Bayesian Network models to assess multi-source signatures of nuclear programs. A Bayesian network is a mathematical model that can be used to marshal evidence to assess competing hypotheses. The purpose of the models was to allow non-expert analysts to benefit from the use of expert-informed mathematical models to assess nuclear programs, because such assessments require significant technical expertise ranging from the nuclear fuel cycle, construction and engineering, imagery analysis, and so forth. One such model developed under this research was aimed at assessing the consistency of open-source information about a nuclear facility with the facility’s declared use. The model incorporates factors such as location, security and safety features among others identified by subject matter experts as crucial to their assessments. The model includes key features, observables and their relationships. The model also provides documentation, which serves as training materials for the non-experts.

  12. Assessing State Nuclear Weapons Proliferation: Using Bayesian Network Analysis of Social Factors

    Energy Technology Data Exchange (ETDEWEB)

    Coles, Garill A.; Brothers, Alan J.; Olson, Jarrod; Whitney, Paul D.


    A Bayesian network (BN) model of social factors can support proliferation assessments by estimating the likelihood that a state will pursue a nuclear weapon. Social factors, including political, economic, nuclear capability, security, and national identity and psychology factors, may play as important a role in whether a state pursues nuclear weapons as more physical factors. This paper will show how Bayesian reasoning on a generic case of a would-be proliferator state can be used to combine evidence that supports proliferation assessment. Theories and analysis by political scientists can be leveraged in a quantitative and transparent way to indicate proliferation risk. BN models facilitate diagnosis and inference in a probabilistic environment by using a network of nodes and acyclic directed arcs between the nodes, whose connections, or the absence of them, indicate probabilistic relevance or independence. We propose a BN model that would use information from both traditional safeguards and the strengthened safeguards associated with the Additional Protocol to indicate countries with a high risk of proliferating nuclear weapons. This model could be used in a variety of applications, such as a prioritization tool and a component of state safeguards evaluations. This paper will discuss the benefits of BN reasoning, the development of Pacific Northwest National Laboratory’s (PNNL) BN state proliferation model, and how it could be employed as an analytical tool.
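    The description of a BN as nodes joined by acyclic directed arcs can be made concrete with a minimal sketch. The three-node network below (arcs A -> C and B -> C), its node names, and all of its probabilities are hypothetical and are not taken from the PNNL model; inference is done by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical network with arcs A -> C and B -> C. There is no arc between
# A and B, so they are marginally independent. All numbers are illustrative.
P_A = 0.2
P_B = 0.3
P_C_GIVEN = {
    (True, True): 0.6,
    (True, False): 0.05,
    (False, True): 0.02,
    (False, False): 0.001,
}


def joint(a, b, c):
    """Joint probability factorized along the directed acyclic graph."""
    p_a = P_A if a else 1 - P_A
    p_b = P_B if b else 1 - P_B
    p_c = P_C_GIVEN[(a, b)] if c else 1 - P_C_GIVEN[(a, b)]
    return p_a * p_b * p_c


def posterior_a_given_c():
    """P(A = True | C = True), summing the joint over the unobserved B."""
    numerator = sum(joint(True, b, True) for b in (True, False))
    evidence = sum(
        joint(a, b, True) for a, b in product((True, False), repeat=2)
    )
    return numerator / evidence


print(round(posterior_a_given_c(), 3))
```

    Observing the child node C raises the probability of its parent A well above the prior of 0.2, which is the diagnostic direction of reasoning the abstract refers to.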

  13. Bayesian analysis of gene essentiality based on sequencing of transposon insertion libraries (United States)

    DeJesus, Michael A.; Zhang, Yanjia J.; Sassetti, Christopher M.; Rubin, Eric J.; Sacchettini, James C.; Ioerger, Thomas R.


    Motivation: Next-generation sequencing affords an efficient analysis of transposon insertion libraries, which can be used to identify essential genes in bacteria. To analyse these high-resolution data, we present a formal Bayesian framework for estimating the posterior probability of essentiality for each gene, using the extreme-value distribution to characterize the statistical significance of the longest region lacking insertions within a gene. We describe a sampling procedure based on the Metropolis–Hastings algorithm to calculate posterior probabilities of essentiality while simultaneously integrating over unknown internal parameters. Results: Using a sequence dataset from a transposon library for Mycobacterium tuberculosis, we show that this Bayesian approach predicts essential genes that correspond well with genes shown to be essential in previous studies. Furthermore, we show that by using the extreme-value distribution to characterize genomic regions lacking transposon insertions, this method is capable of identifying essential domains within genes. This approach can be used for analysing transposon libraries in other organisms and augmenting essentiality predictions with statistical confidence scores. Availability: A Python script implementing the method described is available for download. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23361328
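    As a rough illustration of Metropolis–Hastings sampling in general, the sketch below draws from the posterior of a single probability under a uniform prior. It is a toy model with hypothetical data (7 successes in 10 trials) and assumed function names; it is not the authors' extreme-value model or their script.

```python
import math
import random


def log_posterior(theta, successes, trials):
    """Log of the unnormalized posterior for a binomial rate, uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (successes * math.log(theta)
            + (trials - successes) * math.log(1.0 - theta))


def metropolis_hastings(successes, trials, n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings sampler for a single probability."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


samples = metropolis_hastings(successes=7, trials=10)
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

    For this toy model the exact posterior is Beta(8, 4) with mean 2/3, so the sample mean offers a simple check that the sampler is behaving.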

  14. Bayesian soft x-ray tomography and MHD mode analysis on HL-2A (United States)

    Li, Dong; Liu, Yi; Svensson, J.; Liu, Y. Q.; Song, X. M.; Yu, L. M.; Mao, Rui; Fu, B. Z.; Deng, Wei; Yuan, B. S.; Ji, X. Q.; Xu, Yuan; Chen, Wei; Zhou, Yan; Yang, Q. W.; Duan, X. R.; Liu, Yong; HL-2A Team


    A Bayesian tomography method using Gaussian processes (GPs) for the emission model has been applied to the soft x-ray (SXR) diagnostics on the HL-2A tokamak. To improve the accuracy of reconstructions, the standard GP is extended to a non-stationary version so that the different smoothness of the plasma center and the edge can be taken into account in the algorithm. The uncertainty in the reconstruction arising from measurement errors and incapability can be fully analyzed using Bayesian probability theory. In this work, the SXR reconstructions by this non-stationary Gaussian processes tomography (NSGPT) method have been compared with the equilibrium magnetic flux surfaces, generally achieving satisfactory agreement in terms of both shape and position. In addition, singular-value-decomposition (SVD) and Fast Fourier Transform (FFT) techniques have been applied to the analysis of SXR and magnetic diagnostics, in order to explore the spatial and temporal features of the saturated long-lived magnetohydrodynamics (MHD) instability induced by energetic particles during neutral beam injection (NBI) on HL-2A. The result shows that this ideal internal kink instability has a dominant m/n = 1/1 mode structure along with a harmonic m/n = 2/2, which are coupled near the q = 1 surface with a rotation frequency of 12 kHz.

  15. Uncertainty analysis of depth predictions from seismic reflection data using Bayesian statistics (United States)

    Michelioudakis, Dimitrios G.; Hobbs, Richard W.; Caiado, Camila C. S.


    Estimating the depths of target horizons from seismic reflection data is an important task in exploration geophysics. To constrain these depths we need a reliable and accurate velocity model. Here, we build an optimum 2D seismic reflection data processing flow focused on pre-stack deghosting filters and velocity model building and apply Bayesian methods, including Gaussian process emulation and Bayesian History Matching (BHM), to estimate the uncertainties of the depths of key horizons near the borehole DSDP-258 located in the Mentelle Basin, south west of Australia, and compare the results with the drilled core from that well. Following this strategy, the tie between the modelled and observed depths from the DSDP-258 core was in accordance with the ± 2σ posterior credibility intervals, and predictions for depths to key horizons were made for the two new drill sites adjacent to the existing borehole in the area. The probabilistic analysis allowed us to generate multiple realizations of pre-stack depth migrated images, which can be directly used to better constrain interpretation and identify potential risk at drill sites. The method will be applied to constrain the drilling targets for the upcoming International Ocean Discovery Program (IODP), leg 369.

  16. Capturing cognitive causal paths in human reliability analysis with Bayesian network models

    International Nuclear Information System (INIS)

    Zwirglmaier, Kilian; Straub, Daniel; Groth, Katrina M.


    In the last decade, Bayesian networks (BNs) have been identified as a powerful tool for human reliability analysis (HRA), with multiple advantages over traditional HRA methods. In this paper we illustrate how BNs can be used to include additional, qualitative causal paths to provide traceability. The proposed framework provides the foundation to resolve several needs frequently expressed by the HRA community. First, the developed extended BN structure reflects the causal paths found in cognitive psychology literature, thereby addressing the need for causal traceability and strong scientific basis in HRA. Secondly, the use of node reduction algorithms allows the BN to be condensed to a level of detail at which quantification is as straightforward as the techniques used in existing HRA. We illustrate the framework by developing a BN version of the critical data misperceived crew failure mode in the IDHEAS HRA method, which is currently under development at the US NRC. We illustrate how the model could be quantified with a combination of expert-probabilities and information from operator performance databases such as SACADA. This paper lays the foundations necessary to expand the cognitive and quantitative foundations of HRA. - Highlights: • A framework for building traceable BNs for HRA, based on cognitive causal paths. • A qualitative BN structure, directly showing these causal paths, is developed. • Node reduction algorithms are used for making the BN structure quantifiable. • BN quantified through expert estimates and observed data (Bayesian updating). • The framework is illustrated for a crew failure mode of IDHEAS.

  17. Value of information analysis for interventional and counterfactual Bayesian networks in forensic medical sciences. (United States)

    Constantinou, Anthony Costa; Yet, Barbaros; Fenton, Norman; Neil, Martin; Marsh, William


    Inspired by real-world examples from the forensic medical sciences domain, we seek to determine whether a decision about an interventional action could be subject to amendments on the basis of some incomplete information within the model, and whether it would be worthwhile for the decision maker to seek further information prior to suggesting a decision. The method is based on the underlying principle of Value of Information to enhance decision analysis in interventional and counterfactual Bayesian networks. The method is applied to two real-world Bayesian network models (previously developed for decision support in forensic medical sciences) to examine the average gain in terms of both Value of Information (average relative gain ranging from 11.45% to 59.91%) and decision making (potential amendments in decision making ranging from 0% to 86.8%). We have shown how the method becomes useful for decision makers, not only when decision making is subject to amendments on the basis of some unknown risk factors, but also when it is not. Knowing that a decision outcome is independent of one or more unknown risk factors saves us from the trouble of seeking information about the particular set of risk factors. Further, we have also extended the assessment of this implication to the counterfactual case and demonstrated how answers about interventional actions are expected to change when some unknown factors become known, and how useful this becomes in forensic medical science. Copyright © 2015 Elsevier B.V. All rights reserved.
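    The general Value of Information principle can be sketched with the textbook expected value of perfect information (EVPI): the expected utility of deciding after an unknown factor is revealed, minus the expected utility of the best decision made without it. The states, actions, and utilities below are hypothetical and are not taken from the paper's forensic models.

```python
def expected_utility(action, prior, utility):
    """Expected utility of one action under the prior over states."""
    return sum(prior[s] * utility[action][s] for s in prior)


def expected_value_of_perfect_information(prior, utility):
    # Best single action chosen now, before the unknown factor is observed.
    best_now = max(expected_utility(a, prior, utility) for a in utility)
    # Best action chosen separately for each state, weighted by the prior.
    best_informed = sum(
        prior[s] * max(utility[a][s] for a in utility) for s in prior
    )
    return best_informed - best_now


# Hypothetical decision problem (illustrative numbers only).
prior = {"factor_present": 0.3, "factor_absent": 0.7}
utility = {
    "intervene": {"factor_present": 80, "factor_absent": 60},
    "do_not_intervene": {"factor_present": 0, "factor_absent": 100},
}
evpi = expected_value_of_perfect_information(prior, utility)

# When one action is best in every state, the extra information is worthless.
dominated = {
    "intervene": {"factor_present": 80, "factor_absent": 90},
    "do_not_intervene": {"factor_present": 0, "factor_absent": 50},
}
evpi_dominated = expected_value_of_perfect_information(prior, dominated)
print(evpi, evpi_dominated)
```

    The second case mirrors the abstract's point that a decision independent of an unknown factor removes the need to seek information about it.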

  18. Rational hypocrisy: a Bayesian analysis based on informal argumentation and slippery slopes. (United States)

    Rai, Tage S; Holyoak, Keith J


    Moral hypocrisy is typically viewed as an ethical accusation: Someone is applying different moral standards to essentially identical cases, dishonestly claiming that one action is acceptable while otherwise equivalent actions are not. We suggest that in some instances the apparent logical inconsistency stems from different evaluations of a weak argument, rather than dishonesty per se. Extending Corner, Hahn, and Oaksford's (2006) analysis of slippery slope arguments, we develop a Bayesian framework in which accusations of hypocrisy depend on inferences of shared category membership between proposed actions and previous standards, based on prior probabilities that inform the strength of competing hypotheses. Across three experiments, we demonstrate that inferences of hypocrisy increase as perceptions of the likelihood of shared category membership between precedent cases and current cases increase, that these inferences follow established principles of category induction, and that the presence of self-serving motives increases inferences of hypocrisy independent of changes in the actions themselves. Taken together, these results demonstrate that Bayesian analyses of weak arguments may have implications for assessing moral reasoning. © 2014 Cognitive Science Society, Inc.

  19. Bayesian bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximations. (United States)

    Paul, M; Riebler, A; Bachmann, L M; Rue, H; Held, L


    For bivariate meta-analysis of diagnostic studies, likelihood approaches are very popular. However, they often run into numerical problems with possible non-convergence. In addition, the construction of confidence intervals is controversial. Bayesian methods based on Markov chain Monte Carlo (MCMC) sampling could be used, but are often difficult to implement, and require long running times and diagnostic convergence checks. Recently, a new Bayesian deterministic inference approach for latent Gaussian models using integrated nested Laplace approximations (INLA) has been proposed. With this approach MCMC sampling becomes redundant as the posterior marginal distributions are directly and accurately approximated. By means of a real data set we investigate the influence of the prior information provided and compare the results obtained by INLA, MCMC, and the maximum likelihood procedure SAS PROC NLMIXED. Using a simulation study we further extend the comparison of INLA and SAS PROC NLMIXED by assessing their performance in terms of bias, mean-squared error, coverage probability, and convergence rate. The results indicate that INLA is more stable and gives generally better coverage probabilities for the pooled estimates and less biased estimates of variance parameters. The user-friendliness of INLA is demonstrated by documented R-code. Copyright (c) 2010 John Wiley & Sons, Ltd.

  20. Fast Bayesian whole-brain fMRI analysis with spatial 3D priors. (United States)

    Sidén, Per; Eklund, Anders; Bolin, David; Villani, Mattias


    Spatial whole-brain Bayesian modeling of task-related functional magnetic resonance imaging (fMRI) is a great computational challenge. Most of the currently proposed methods therefore do inference in subregions of the brain separately or do approximate inference without comparison to the true posterior distribution. A popular such method, which is now the standard method for Bayesian single subject analysis in the SPM software, is introduced in Penny et al. (2005b). The method processes the data slice-by-slice and uses an approximate variational Bayes (VB) estimation algorithm that enforces posterior independence between activity coefficients in different voxels. We introduce a fast and practical Markov chain Monte Carlo (MCMC) scheme for exact inference in the same model, both slice-wise and for the whole brain using a 3D prior on activity coefficients. The algorithm exploits sparsity and uses modern techniques for efficient sampling from high-dimensional Gaussian distributions, leading to speed-ups without which MCMC would not be a practical option. Using MCMC, we are for the first time able to evaluate the approximate VB posterior against the exact MCMC posterior, and show that VB can lead to spurious activation. In addition, we develop an improved VB method that drops the assumption of independent voxels a posteriori. This algorithm is shown to be much faster than both MCMC and the original VB for large datasets, with negligible error compared to the MCMC posterior. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis. (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun


    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

  2. EM Clustering Analysis of Diabetes Patients Basic Diagnosis Index


    Wu, Cai; Steinbauer, Jeffrey R.; Kuo, Grace M.


    Cluster analysis can group similar instances into the same group. Partitioning clustering assigns classes to samples without knowing the classes in advance. The most common algorithms are K-means and Expectation Maximization (EM). The EM clustering algorithm can find the number of distributions generating the data and build “mixture models”. It identifies groups that are either overlapping or of varying sizes and shapes. In this project, by using EM in Machine Learning Algorithm in JAVA (WEKA) syste...
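    A minimal sketch of EM clustering with a mixture model is given below. It fits two one-dimensional Gaussian components to hypothetical data in plain Python; it fixes the number of components at two rather than estimating it, and it is not the WEKA implementation the record refers to.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1D Gaussian mixture with EM."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = max(min_var, sum((x - overall_mean) ** 2 for x in data) / n)
    means = [min(data), max(data)]          # simple initial guess
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k])
                    for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(min_var, var_k)  # floor avoids collapse
    return weights, means, variances


# Hypothetical data: two well-separated groups.
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances = em_two_gaussians(data)
print([round(m, 2) for m in means])
```

    Unlike K-means, each point receives a soft responsibility for every component, which is what lets mixture models describe overlapping groups of different spread.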

  3. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann


    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  4. Bayesian nonparametric estimation of continuous monotone functions with applications to dose-response analysis. (United States)

    Bornkamp, Björn; Ickstadt, Katja


    In this article, we consider monotone nonparametric regression in a Bayesian framework. The monotone function is modeled as a mixture of shifted and scaled parametric probability distribution functions, and a general random probability measure is assumed as the prior for the mixing distribution. We investigate the choice of the underlying parametric distribution function and find that the two-sided power distribution function is well suited from both a computational and a mathematical point of view. The model is motivated by traditional nonlinear models for dose-response analysis, and provides possibilities to elicit informative prior distributions on different aspects of the curve. The method is compared with other recent approaches to monotone nonparametric regression in a simulation study and is illustrated on a data set from dose-response analysis.

  5. Bayesian analysis for exponential random graph models using the adaptive exchange sampler

    KAUST Repository

    Jin, Ick Hoon


    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the existence of intractable normalizing constants. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the issue of intractable normalizing constants encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as an MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.

  6. Bayesian methods for meta-analysis of causal relationships estimated using genetic instrumental variables

    DEFF Research Database (Denmark)

    Burgess, Stephen; Thompson, Simon G; Thompson, Grahame


    Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context of multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable. This can be implemented using a Bayesian approach, which is next extended to include multiple genetic markers. We then propose a hierarchical model for undertaking a meta-analysis of multiple studies, in which it is not necessary that the same genetic markers are measured in each study. This provides...

  7. Multi-Objective data analysis using Bayesian Inference for MagLIF experiments (United States)

    Knapp, Patrick; Glinksy, Michael; Evans, Matthew; Gom, Matth; Han, Stephanie; Harding, Eric; Slutz, Steve; Hahn, Kelly; Harvey-Thompson, Adam; Geissel, Matthias; Ampleford, David; Jennings, Christopher; Schmit, Paul; Smith, Ian; Schwarz, Jens; Peterson, Kyle; Jones, Brent; Rochau, Gregory; Sinars, Daniel


    The MagLIF concept has recently demonstrated Gbar pressures and confinement of charged fusion products at stagnation. We present a new analysis methodology that allows for integration of multiple diagnostics including nuclear, x-ray imaging, and x-ray power to determine the temperature, pressure, liner areal density, and mix fraction. A simplified hot-spot model is used with a Bayesian inference network to determine the most probable model parameters that describe the observations while simultaneously revealing the principal uncertainties in the analysis. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

  8. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea. (United States)

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun


    Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. This study aimed to investigate allergen sensitization characteristics according to gender. The multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tended to sensitize the female group more frequently than the male group.

  9. An efficient Bayesian data-worth analysis using a multilevel Monte Carlo method (United States)

    Lu, Dan; Ricciuto, Daniel; Evans, Katherine


    Improving the understanding of subsurface systems and thus reducing prediction uncertainty requires collection of data. As the collection of subsurface data is costly, it is important that the data collection scheme is cost-effective. Design of a cost-effective data collection scheme, i.e., data-worth analysis, requires quantifying model parameter, prediction, and both current and potential data uncertainties. Assessment of these uncertainties in large-scale stochastic subsurface hydrological model simulations using standard Monte Carlo (MC) sampling or surrogate modeling is extremely computationally intensive, sometimes even infeasible. In this work, we propose an efficient Bayesian data-worth analysis using a multilevel Monte Carlo (MLMC) method. Compared to the standard MC that requires a significantly large number of high-fidelity model executions to achieve a prescribed accuracy in estimating expectations, the MLMC can substantially reduce computational costs using multifidelity approximations. Since the Bayesian data-worth analysis involves a great deal of expectation estimation, the cost saving of the MLMC in the assessment can be outstanding. While the proposed MLMC-based data-worth analysis is broadly applicable, we use it for a highly heterogeneous two-phase subsurface flow simulation to select an optimal candidate data set that gives the largest uncertainty reduction in predicting mass flow rates at four production wells. The choices made by the MLMC estimation are validated by the actual measurements of the potential data, and consistent with the standard MC estimation. But compared to the standard MC, the MLMC greatly reduces the computational costs.
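
    The multilevel idea described above, replacing one expensive expectation by a cheap coarse estimate plus a telescoping sum of corrections, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "fidelity levels" are Taylor truncations of exp(x) rather than a subsurface flow model, and the sample counts per level are arbitrary.

```python
import math
import random

def approx(x, level):
    """Level-`level` approximation of exp(x): Taylor series with level + 2 terms."""
    return sum(x**k / math.factorial(k) for k in range(level + 2))

def mlmc_estimate(samples_per_level, seed=0):
    """Estimate E[exp(X)], X ~ Uniform(0, 1), via the telescoping sum
    E[P_L] = E[P_0] + sum_{l=1..L} E[P_l - P_{l-1}].
    Each correction uses the same random draw for its fine and coarse terms."""
    rng = random.Random(seed)
    total = 0.0
    for level, n in enumerate(samples_per_level):
        acc = 0.0
        for _ in range(n):
            x = rng.random()
            fine = approx(x, level)
            coarse = approx(x, level - 1) if level > 0 else 0.0
            acc += fine - coarse
        total += acc / n
    return total

# Many cheap coarse samples, few expensive fine ones.
estimate = mlmc_estimate([4000, 1000, 250, 60, 20])
print(estimate, math.e - 1)  # exact value of E[exp(X)] is e - 1
```

    Because the corrections between adjacent levels have small variance, the finest levels need only a handful of samples, which is where the cost saving over standard MC comes from.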

  10. Antiplatelets versus anticoagulants for the treatment of cervical artery dissection: Bayesian meta-analysis.

    Directory of Open Access Journals (Sweden)

    Hakan Sarikaya

    To compare the effects of antiplatelets and anticoagulants on stroke and death in patients with acute cervical artery dissection. Systematic review with Bayesian meta-analysis. The reviewers searched MEDLINE and EMBASE from inception to November 2012, checked reference lists, and contacted authors. Studies were eligible if they were randomised, quasi-randomised or observational comparisons of antiplatelets and anticoagulants in patients with cervical artery dissection. Data were extracted by one reviewer and checked by another. Bayesian techniques were used to appropriately account for studies with scarce event data and imbalances in the size of comparison groups. Thirty-seven studies (1991 patients) were included. We found no randomised trial. The primary analysis revealed a large treatment effect in favour of antiplatelets for preventing the primary composite outcome of ischaemic stroke, intracranial haemorrhage or death within the first 3 months after treatment initiation (relative risk 0.32, 95% credibility interval 0.12 to 0.63), while the degree of between-study heterogeneity was moderate (τ² = 0.18). In an analysis restricted to studies of higher methodological quality, the possible advantage of antiplatelets over anticoagulants was less obvious than in the main analysis (relative risk 0.73, 95% credibility interval 0.17 to 2.30). In view of these results and the safety advantages, easier usage and lower cost of antiplatelets, we conclude that antiplatelets should be given precedence over anticoagulants as a first line treatment in patients with cervical artery dissection unless results of an adequately powered randomised trial suggest the opposite.

  11. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling. (United States)

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall


    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.

  12. Entropic Approach to Multiscale Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Antonio Insolia


    Recently, a novel method has been introduced to estimate the statistical significance of clustering in the direction distribution of objects. The method involves a multiscale procedure, based on the Kullback–Leibler divergence and the Gumbel statistics of extreme values, providing high discrimination power, even in the presence of strong background isotropic contamination. It is shown that the method is: (i) semi-analytical, drastically reducing computation time; (ii) very sensitive to small, medium and large scale clustering; (iii) not biased against the null hypothesis. Applications to the physics of ultra-high energy cosmic rays, as a cosmological probe, are presented and discussed.

  13. Statistical Analysis of Firearms/Toolmarks Interpretation of Cartridge Case Evidence Using IBIS and Bayesian Networks (United States)


    Taroni, F, Aitken, C, Garbolino, P, Biedermann, A, Bayesian Networks and Probabilistic Inference in Forensic Science (Statistics in Practice), Wiley...were transformed into a Bayesian network . Bayesian networks allow for the assessment of evidence based upon two propositions (same gun or different...gun). This allows a forensic scientist to provide insight to courts and investigators as to the value of the evidence. The breech face (BF) and

  14. Failure rate modeling using fault tree analysis and Bayesian network: DEMO pulsed operation turbine study case

    International Nuclear Information System (INIS)

    Dongiovanni, Danilo Nicola; Iesmantas, Tomas


    Highlights: • RAMI (Reliability, Availability, Maintainability and Inspectability) assessment of secondary heat transfer loop for a DEMO nuclear fusion plant. • Definition of a fault tree for a nuclear steam turbine operated in pulsed mode. • Turbine failure rate models updated by means of a Bayesian network reflecting the fault tree analysis in the considered scenario. • Sensitivity analysis on system availability performance. - Abstract: Availability will play an important role in the Demonstration Power Plant (DEMO) success from an economic and safety perspective. Availability performance is commonly assessed by Reliability Availability Maintainability Inspectability (RAMI) analysis, strongly relying on the accurate definition of system components' failure modes (FM) and failure rates (FR). Little component experience is available in fusion applications, therefore requiring the adaptation of literature FR to fusion plant operating conditions, which may differ in several aspects. As a possible solution to this problem, a new methodology to extrapolate/estimate component failure rates under different operating conditions is presented. The DEMO Balance of Plant nuclear steam turbine component operated in pulsed mode is considered as a case study. The methodology starts from the definition of a fault tree taking into account failure modes possibly enhanced by pulsed operation. The fault tree is then translated into a Bayesian network. A statistical model for the turbine system failure rate in terms of subcomponents' FR is hence obtained, allowing for sensitivity analyses on the structured mixture of literature and unknown FR data for which plausible value intervals are investigated to assess their impact on the whole turbine system FR. Finally, the impact of the resulting turbine system FR on plant availability is assessed exploiting a Reliability Block Diagram (RBD) model for a typical secondary cooling system implementing a Rankine cycle. Mean inherent availability

  15. Amiodarone, lidocaine, magnesium or placebo in shock refractory ventricular arrhythmia: A Bayesian network meta-analysis. (United States)

    Khan, Safi U; Winnicka, Lydia; Saleem, Muhammad A; Rahman, Hammad; Rehman, Najeeb

    Recent evidence challenges the superiority of amiodarone, compared to other anti-arrhythmic medications, as the agent of choice in pulseless ventricular tachycardia (VT) or ventricular fibrillation (VF). We conducted Bayesian network and traditional meta-analyses to investigate the relative efficacies of amiodarone, lidocaine, magnesium (MgSO4) and placebo as treatments for pulseless VT or VF. Eleven studies [5,200 patients, 7 randomized trials (4,611 patients) and 4 non-randomized studies (589 patients)] were included in this meta-analysis. The search was conducted from 1981 to February 2017 using MEDLINE, EMBASE and The Cochrane Library. Estimates were reported as odds ratios (OR) with 95% credible intervals (CrI). Markov chain Monte Carlo (MCMC) modeling was used to estimate the relative ranking probability of each treatment group based on the surface under the cumulative ranking curve (SUCRA). Bayesian analysis demonstrated that lidocaine had superior effects on survival to hospital discharge, compared to amiodarone (OR 2.18, 95% CrI 1.26-3.13), MgSO4 (OR 2.03, 95% CrI 0.74-4.82) and placebo (OR 2.42, 95% CrI 1.39-3.54). There were no statistical differences among treatment groups regarding survival to hospital admission/24 h and return of spontaneous circulation (ROSC). Probability analysis revealed that lidocaine was the most effective therapy for survival to hospital discharge (SUCRA, 97%). We conclude that lidocaine may be the most effective anti-arrhythmic agent for survival to hospital discharge in patients with pulseless VT or VF. Copyright © 2017 Elsevier Inc. All rights reserved.
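
    As a rough illustration of the SUCRA ranking summary used above, the sketch below computes it from one treatment's rank probabilities using the standard definition (the average of the cumulative rank probabilities over the first a - 1 ranks, for a treatments). The function name and the example probabilities are hypothetical and are not taken from the study.

```python
def sucra(rank_probs):
    """SUCRA for one treatment.

    rank_probs[k] is the posterior probability that the treatment has
    rank k + 1 (rank 1 = best) among a = len(rank_probs) treatments.
    """
    a = len(rank_probs)
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:      # ranks 1 .. a - 1
        cumulative += p
        total += cumulative
    return total / (a - 1)

# Hypothetical rank probabilities for four treatments.
print(sucra([1.0, 0.0, 0.0, 0.0]))      # certainly best
print(sucra([0.0, 0.0, 0.0, 1.0]))      # certainly worst
print(sucra([0.25, 0.25, 0.25, 0.25]))  # no information about rank
```

    In practice the rank probabilities would be estimated as the fraction of MCMC draws in which a treatment holds each rank.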

  16. Bayesian Networks for the Age Classification of Living Individuals: A Study on Transition Analysis

    Directory of Open Access Journals (Sweden)

    Emanuele Sironi


    Over the past few decades, age estimation of living persons has represented a challenging task for many forensic services worldwide. In general, the process for age estimation includes the observation of the degree of maturity reached by some physical attributes, such as dentition or several ossification centers. The estimated chronological age or the probability that an individual belongs to a meaningful class of ages is then obtained from the observed degree of maturity by means of various statistical methods. Among these methods, those developed in a Bayesian framework offer users the possibility of coherently dealing with the uncertainty associated with age estimation and of assessing in a transparent and logical way the probability that an examined individual is younger or older than a given age threshold. Recently, a Bayesian network for age estimation has been presented in the scientific literature; this kind of probabilistic graphical tool may facilitate the use of the probabilistic approach. Probabilities of interest in the network are assigned by means of transition analysis, a statistical parametric model, which links the chronological age and the degree of maturity by means of specific regression models, such as logit or probit models. Since different regression models can be employed in transition analysis, the aim of this paper is to study the influence of the model on the classification of individuals. The analysis was performed using a dataset related to the ossification status of the medial clavicular epiphysis, and the results support that the classification of individuals is not dependent on the choice of the regression model.

  17. Failure rate modeling using fault tree analysis and Bayesian network: DEMO pulsed operation turbine study case

    Energy Technology Data Exchange (ETDEWEB)

    Dongiovanni, Danilo Nicola, E-mail: [ENEA, Nuclear Fusion and Safety Technologies Department, via Enrico Fermi 45, Frascati 00040 (Italy); Iesmantas, Tomas [LEI, Breslaujos str. 3 Kaunas (Lithuania)


    Highlights: • RAMI (Reliability, Availability, Maintainability and Inspectability) assessment of secondary heat transfer loop for a DEMO nuclear fusion plant. • Definition of a fault tree for a nuclear steam turbine operated in pulsed mode. • Turbine failure rate models updated by means of a Bayesian network reflecting the fault tree analysis in the considered scenario. • Sensitivity analysis on system availability performance. - Abstract: Availability will play an important role in the Demonstration Power Plant (DEMO) success from an economic and safety perspective. Availability performance is commonly assessed by Reliability Availability Maintainability Inspectability (RAMI) analysis, strongly relying on the accurate definition of system components' failure modes (FM) and failure rates (FR). Little component experience is available in fusion applications, therefore requiring the adaptation of literature FR to fusion plant operating conditions, which may differ in several aspects. As a possible solution to this problem, a new methodology to extrapolate/estimate component failure rates under different operating conditions is presented. The DEMO Balance of Plant nuclear steam turbine component operated in pulsed mode is considered as a case study. The methodology starts from the definition of a fault tree taking into account failure modes possibly enhanced by pulsed operation. The fault tree is then translated into a Bayesian network. A statistical model for the turbine system failure rate in terms of subcomponents' FR is hence obtained, allowing for sensitivity analyses on the structured mixture of literature and unknown FR data for which plausible value intervals are investigated to assess their impact on the whole turbine system FR. Finally, the impact of the resulting turbine system FR on plant availability is assessed exploiting a Reliability Block Diagram (RBD) model for a typical secondary cooling system implementing a Rankine cycle. Mean inherent availability

  18. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale. (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy


    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
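
    To make one of the stand-alone metrics above concrete, here is a minimal sketch of Newman modularity for an undirected, unweighted graph, assuming the usual definition Q = sum over communities c of [e_c / m - (d_c / 2m)^2], where e_c counts edges inside c, d_c is the total degree of c, and m is the number of edges. The tiny example graph and the function name are illustrative and unrelated to the study's data sets.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once
    community: dict mapping node -> community label
    """
    m = len(edges)
    internal = defaultdict(int)   # edges with both ends in the community
    degree = defaultdict(int)     # sum of node degrees per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge (hypothetical example).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, labels)
print(q)
```

    Modularity needs no ground-truth labels, which is what makes it a stand-alone metric; the information recovery metrics in the abstract instead compare a clustering against known communities.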

  19. The Reliability of Inverse Screen Tests for Cluster Analysis. (United States)

    Lathrop, Richard G.; Williams, Janice E.


    A Monte Carlo study, involving 6,000 "computer subjects" and three raters, explored the reliability of the inverse screen test for cluster analysis. Results indicate that the inverse screen may be a useful and reliable cluster analytic technique for determining the number of true groups. (TJH)

  20. Blaeu: Mapping and navigating large tables with cluster analysis

    NARCIS (Netherlands)

    T.H.J. Sellam (Thibault); C.P. Cijvat (Robin); R.A. Koopmanschap (Richard); M.L. Kersten (Martin)


    Blaeu is an interactive database exploration tool. Its aim is to guide casual users through large data tables, ultimately triggering insights and serendipity. To do so, it relies on a double cluster analysis mechanism. It clusters the data vertically: it detects themes, groups of


    Directory of Open Access Journals (Sweden)

    Anton Agus Setyawan


    This study examines SMEs in Indonesia in relation to business networks and the performance of these business organizations. In many cases, regional administrations in Indonesia develop SME business networks in the form of clusters. This study analyzes SME fisheries clusters using supply chain analysis. We also develop a performance assessment of the SME fisheries cluster using a multivariate model. The study involves 62 SMEs in Sragen, Central Java, Indonesia, all of which belong to the fisheries cluster in the area. Our findings show that the SME fisheries cluster has an inefficient supply chain. The cluster has problems in profit setting and delivery time, which harm its performance. We measure business performance using sales, profit rate and asset growth. We found that cost structure, manpower and physical production have positive effects on business performance.

  2. Merging Galaxy Clusters: Analysis of Simulated Analogs (United States)

    Nguyen, Jayke; Wittman, David; Cornell, Hunter


    The nature of dark matter can be better constrained by observing merging galaxy clusters. However, uncertainty in the viewing angle leads to uncertainty in dynamical quantities such as 3-d velocities, 3-d separations, and time since pericenter. The classic timing argument links these quantities via equations of motion, but neglects effects of nonzero impact parameter (i.e. it assumes velocities are parallel to the separation vector), dynamical friction, substructure, and larger-scale environment. We present a new approach using n-body cosmological simulations that naturally incorporate these effects. By uniformly sampling viewing angles about simulated cluster analogs, we see projected merger parameters in the many possible configurations of a given cluster. We select comparable simulated analogs and evaluate the likelihood of particular merger parameters as a function of viewing angle. We present viewing angle constraints for a sample of observed mergers including the Bullet cluster and El Gordo, and show that the separation vectors are closer to the plane of the sky than previously reported.

  3. Genotypic stability and clustering analysis of confectionery ...

    African Journals Online (AJOL)

    Nine groundnut genotypes were evaluated in terminal moisture-stress areas of northeastern Ethiopia during 2005 and 2006 cropping seasons with the objective of analyzing genotypic stability and clustering of confectionery groundnut for seed and protein yield. The genotypes were evaluated on a plot size of 15 m2 at Kobo ...

  4. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation. (United States)

    Karabatsos, George


    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected

  5. Bayesian analysis of the effective charge from spectroscopic bremsstrahlung measurement in fusion plasmas (United States)

    Krychowiak, M.; König, R.; Klinger, T.; Fischer, R.


    At the stellarator Wendelstein 7-AS (W7-AS) a spectrally resolving two-channel system for the measurement of line-of-sight averaged Zeff values has been tested in preparation for its planned installation as a multichannel Zeff-profile measurement system on the stellarator Wendelstein 7-X (W7-X), which is presently under construction. The measurement is performed using the bremsstrahlung intensity in the wavelength region from ultraviolet to near infrared. The spectrally resolved measurement makes it possible to eliminate signal contamination by line radiation. For statistical data analysis a procedure based on Bayesian probability theory has been developed. With this method it is possible to estimate the bremsstrahlung background in the measured signal and its error without the necessity to fit the spectral lines. For evaluation of the random error in Zeff the signal noise has been investigated. Furthermore, the linearity and behavior of the charge-coupled device detector at saturation have been analyzed.

  6. A Bayesian approach to probabilistic sensitivity analysis in structured benefit-risk assessment. (United States)

    Waddingham, Ed; Mt-Isa, Shahrul; Nixon, Richard; Ashby, Deborah


    Quantitative decision models such as multiple criteria decision analysis (MCDA) can be used in benefit-risk assessment to formalize trade-offs between benefits and risks, providing transparency to the assessment process. There is however no well-established method for propagating uncertainty of treatment effects data through such models to provide a sense of the variability of the benefit-risk balance. Here, we present a Bayesian statistical method that directly models the outcomes observed in randomized placebo-controlled trials and uses this to infer indirect comparisons between competing active treatments. The resulting treatment effects estimates are suitable for use within the MCDA setting, and it is possible to derive the distribution of the overall benefit-risk balance through Markov Chain Monte Carlo simulation. The method is illustrated using a case study of natalizumab for relapsing-remitting multiple sclerosis. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Bayesian analysis of genetic association across tree-structured routine healthcare data in the UK Biobank. (United States)

    Cortes, Adrian; Dendrou, Calliope A; Motyer, Allan; Jostins, Luke; Vukcevic, Damjan; Dilthey, Alexander; Donnelly, Peter; Leslie, Stephen; Fugger, Lars; McVean, Gil


    Genetic discovery from the multitude of phenotypes extractable from routine healthcare data can transform understanding of the human phenome and accelerate progress toward precision medicine. However, a critical question when analyzing high-dimensional and heterogeneous data is how best to interrogate increasingly specific subphenotypes while retaining statistical power to detect genetic associations. Here we develop and employ a new Bayesian analysis framework that exploits the hierarchical structure of diagnosis classifications to analyze genetic variants against UK Biobank disease phenotypes derived from self-reporting and hospital episode statistics. Our method displays a more than 20% increase in power to detect genetic effects over other approaches and identifies new associations between classical human leukocyte antigen (HLA) alleles and common immune-mediated diseases (IMDs). By applying the approach to genetic risk scores (GRSs), we show the extent of genetic sharing among IMDs and expose differences in disease perception or diagnosis with potential clinical implications.

  8. Bayesian Reliability Analysis of Non-Stationarity in Multi-agent Systems

    Directory of Open Access Journals (Sweden)

    TONT Gabriela


    The Bayesian methods provide information about the meaningful parameters in a statistical analysis obtained by combining the prior and sampling distributions to form the posterior distribution of the parameters. The desired inferences are obtained from this joint posterior. An estimation strategy for hierarchical models, where the resulting joint distribution of the associated model parameters cannot be evaluated analytically, is to use sampling algorithms, known as Markov Chain Monte Carlo (MCMC) methods, from which approximate solutions can be obtained. Both serial and parallel configurations of subcomponents are permitted. The capability of the time-dependent method to describe a multi-state system is based on a case study assessing the operational situation of the studied system. The rationality and validity of the presented model are demonstrated via a case study. The effect of randomness of the structural parameters is also examined.

  9. Bayesian operational modal analysis with asynchronous data, part I: Most probable value (United States)

    Zhu, Yi-Chen; Au, Siu-Kui


    In vibration tests, multiple sensors are used to obtain detailed mode shape information about the tested structure. Time synchronisation among data channels is required in conventional modal identification approaches. Modal identification can be more flexibly conducted if this is not required. Motivated by the potential gain in feasibility and economy, this work proposes a Bayesian frequency domain method for modal identification using asynchronous 'output-only' ambient data, i.e. 'operational modal analysis'. It provides a rigorous means for identifying the global mode shape taking into account the quality of the measured data and their asynchronous nature. This paper (Part I) proposes an efficient algorithm for determining the most probable values of modal properties. The method is validated using synthetic and laboratory data. The companion paper (Part II) investigates identification uncertainty and challenges in applications to field vibration data.

  10. Analytic Bayesian solution of the two-stage poisson-type problem in probabilistic risk analysis

    International Nuclear Information System (INIS)

    Frohner, F.H.


    The basic purpose of probabilistic risk analysis is to make inferences about the probabilities of various postulated events, with an account of all relevant information such as prior knowledge and operating experience with the specific system under study, as well as experience with other similar systems. Estimation of the failure rate of a Poisson-type system leads to an especially simple Bayesian solution in closed form if the prior probability implied by the invariance properties of the problem is properly taken into account. This basic simplicity persists if a more realistic prior, representing order of magnitude knowledge of the rate parameter, is employed instead. Moreover, the more realistic prior allows direct incorporation of experience gained from other similar systems, without need to postulate a statistical model for an underlying ensemble. The analytic formalism is applied to actual nuclear reactor data.

  11. Bayesian Analysis of Multidimensional Item Response Theory Models: A Discussion and Illustration of Three Response Style Models (United States)

    Leventhal, Brian C.; Stone, Clement A.


    Interest in Bayesian analysis of item response theory (IRT) models has grown tremendously due to the appeal of the paradigm among psychometricians, advantages of these methods when analyzing complex models, and availability of general-purpose software. Possible models include models which reflect multidimensionality due to designed test structure,…

  12. Time Series Analysis of Non-Gaussian Observations Based on State Space Models from Both Classical and Bayesian Perspectives

    NARCIS (Netherlands)

    Durbin, J.; Koopman, S.J.M.


    The analysis of non-Gaussian time series using state space models is considered from both classical and Bayesian perspectives. The treatment in both cases is based on simulation using importance sampling and antithetic variables; Monte Carlo Markov chain methods are not employed. Non-Gaussian

  13. Prognostic value of cluster analysis of severe asthma phenotypes. (United States)

    Bourdin, Arnaud; Molinari, Nicolas; Vachier, Isabelle; Varrin, Muriel; Marin, Grégory; Gamez, Anne-Sophie; Paganin, Fabrice; Chanez, Pascal


    Cross-sectional severe asthma cluster analysis identified different phenotypes. We tested the hypothesis that these clusters will follow different courses. We aimed to identify which asthma outcomes are specific and coherently associated with these different phenotypes in a prospective longitudinal cohort. In a longitudinal cohort of 112 patients with severe asthma, the 5 Severe Asthma Research Program (SARP) clusters were identified by means of algorithm application. Because patients of the present cohort all had severe asthma compared with the SARP cohort, homemade clusters were identified and also tested. At the subsequent visit, we investigated several outcomes related to asthma control at 1 year (6-item Asthma Control Questionnaire [ACQ-6], lung function, and medication requirement) and then recorded the 3-year exacerbations rate and time to first exacerbation. The SARP algorithm discriminated the 5 clusters at entry for age, asthma duration, lung function, blood eosinophil measurement, ACQ-6 scores, and diabetes comorbidity. Four homemade clusters were mostly segregated by best ever achieved FEV1 values and discriminated the groups by a few clinical characteristics. Nonetheless, all these clusters shared similar asthma outcomes related to asthma control as follows. The ACQ-6 score did not change in any cluster. Exacerbation rate and time to first exacerbation were similar, as were treatment requirements. Severe asthma phenotypes identified by using a previously reported cluster analysis or newly homemade clusters do not behave differently concerning asthma control-related outcomes, which are used to assess the response to innovative therapies. This study demonstrates a potential limitation of the cluster analysis approach in the field of severe asthma. Copyright © 2014. Published by Elsevier Inc.

  14. PFG NMR and Bayesian analysis to characterise non-Newtonian fluids (United States)

    Blythe, Thomas W.; Sederman, Andrew J.; Stitt, E. Hugh; York, Andrew P. E.; Gladden, Lynn F.


    Many industrial flow processes are sensitive to changes in the rheological behaviour of process fluids, and there therefore exists a need for methods that provide online, or inline, rheological characterisation necessary for process control and optimisation over timescales of minutes or less. Nuclear magnetic resonance (NMR) offers a non-invasive technique for this application, without limitation on optical opacity. We present a Bayesian analysis approach using pulsed field gradient (PFG) NMR to enable estimation of the rheological parameters of Herschel-Bulkley fluids in a pipe flow geometry, characterised by a flow behaviour index n, yield stress τ0, and consistency factor k, by analysis of the signal in q-space. This approach eliminates the need for velocity image acquisition and expensive gradient hardware. We investigate the robustness of the proposed Bayesian NMR approach to noisy data and reduced sampling using simulated NMR data and show that even with a signal-to-noise ratio (SNR) of 100, only 16 points are required to be sampled to provide rheological parameters accurate to within 2% of the ground truth. Experimental validation is provided through an experimental case study on Carbopol 940 solutions (model Herschel-Bulkley fluids) using PFG NMR at a 1H resonance frequency of 85.2 MHz; for SNR > 1000, only 8 points are required to be sampled. This corresponds to a total acquisition time of probably due to shear history-dependent behaviour and the different geometries used. This behaviour highlights the need for online, or inline, rheological characterisation in industrial process applications.

  15. A Bayesian-based multilevel factorial analysis method for analyzing parameter uncertainty of hydrological model (United States)

    Liu, Y. R.; Li, Y. P.; Huang, G. H.; Zhang, J. L.; Fan, Y. R.


    In this study, a Bayesian-based multilevel factorial analysis (BMFA) method is developed to assess parameter uncertainties and their effects on hydrological model responses. In BMFA, the Differential Evolution Adaptive Metropolis (DREAM) algorithm is employed to approximate the posterior distributions of model parameters with Bayesian inference; the factorial analysis (FA) technique is used to measure the specific variations of hydrological responses in terms of posterior distributions, in order to investigate the individual and interactive effects of parameters on model outputs. BMFA is then applied to a case study of the Jinghe River watershed in the Loess Plateau of China to display its validity and applicability. The uncertainties of four sensitive parameters, including soil conservation service runoff curve number to moisture condition II (CN2), soil hydraulic conductivity (SOL_K), plant available water capacity (SOL_AWC), and soil depth (SOL_Z), are investigated. Results reveal that (i) CN2 has a positive effect on peak flow, implying that the concentrated rainfall during the rainy season can cause infiltration-excess surface flow, which is a considerable contributor to peak flow in this watershed; (ii) SOL_K has a positive effect on average flow, implying that the widely distributed cambisols can lead to medium percolation capacity; (iii) the interaction between SOL_AWC and SOL_Z has a noticeable effect on the peak flow and their effects are dependent upon each other, which discloses that soil depth can significantly influence the processes of plant uptake of soil water in this watershed. Based on the above findings, the significant parameters and the relationship among uncertain parameters can be specified, such that the hydrological model's capability for simulating/predicting water resources of the Jinghe River watershed can be improved.

  16. From "weight of evidence" to quantitative data integration using multicriteria decision analysis and Bayesian methods. (United States)

    Linkov, Igor; Massey, Olivia; Keisler, Jeff; Rusyn, Ivan; Hartung, Thomas


    "Weighing" available evidence in the process of decision-making is unavoidable, yet it is one step that routinely raises suspicions: what evidence should be used, how much does it weigh, and whose thumb may be tipping the scales? This commentary aims to evaluate the current state and future roles of various types of evidence for hazard assessment as it applies to environmental health. In its recent evaluation of the US Environmental Protection Agency's Integrated Risk Information System assessment process, the National Research Council committee singled out the term "weight of evidence" (WoE) for critique, deeming the process too vague and detractive to the practice of evaluating human health risks of chemicals. Moving the methodology away from qualitative, vague and controversial methods towards generalizable, quantitative and transparent methods for appropriately managing diverse lines of evidence is paramount for both regulatory and public acceptance of the hazard assessments. The choice of terminology notwithstanding, a number of recent Bayesian WoE-based methods, the emergence of multi criteria decision analysis for WoE applications, as well as the general principles behind the foundational concepts of WoE, show promise in how to move forward and regain trust in the data integration step of the assessments. We offer our thoughts on the current state of WoE as a whole and while we acknowledge that many WoE applications have been largely qualitative and subjective in nature, we see this as an opportunity to turn WoE towards a quantitative direction that includes Bayesian and multi criteria decision analysis.

  17. BGX: a Bioconductor package for the Bayesian integrated analysis of Affymetrix GeneChips

    Directory of Open Access Journals (Sweden)

    Hein Anne-Mette K


    Background: Affymetrix 3' GeneChip microarrays are widely used to profile the expression of thousands of genes simultaneously. They differ from many other microarray types in that GeneChips are hybridised using a single labelled extract and because they contain multiple 'match' and 'mismatch' sequences for each transcript. Most algorithms extract the signal from GeneChip experiments in a sequence of separate steps, including background correction and normalisation, which inhibits the simultaneous use of all available information. They principally provide a point estimate of gene expression and, in contrast to BGX, do not fully integrate the uncertainty arising from potentially heterogeneous responses of the probes. Results: BGX is a new Bioconductor R package that implements an integrated Bayesian approach to the analysis of 3' GeneChip data. The software takes into account additive and multiplicative error, non-specific hybridisation and replicate summarisation in the spirit of the model outlined in [1]. It also provides a posterior distribution for the expression of each gene. Moreover, BGX can take into account probe affinity effects from probe sequence information where available. The package employs a novel adaptive Markov chain Monte Carlo (MCMC) algorithm that considerably raises the efficiency with which the posterior distributions are sampled. Finally, BGX incorporates various ways to analyse the results, such as ranking genes by expression level as well as statistically based methods for estimating the number of up- and down-regulated genes between two conditions. Conclusion: BGX performs well relative to other widely used methods at estimating expression levels and fold changes. It has the advantage that it provides a statistically sound measure of uncertainty for its estimates. BGX includes various analysis functions to visualise and exploit the rich output that is produced by the Bayesian model.

  18. Analysis of Aspects of Innovation in a Brazilian Cluster

    Directory of Open Access Journals (Sweden)

    Adriana Valélia Saraceni


    Innovation through clustering has gained importance as interaction has become more significant to the concepts of innovation and the learning process. This study aims to identify what a case analysis of the innovation process in a cluster contributes to the learning process. The study is developed in two stages. In the first stage, we used a preliminary case study, verifying a cluster innovation analysis and its Innovation Index, to explore a combined body of theory and practice. The second stage explores the learning process concept. Together, the two stages allowed us to build a theoretical model for the development of the learning process in clusters. The main result of the model is a mechanism for implementing improvements in clusters when case studies are applied.

  19. Describing the homeless mentally ill: cluster analysis results. (United States)

    Mowbray, C T; Bybee, D; Cohen, E


    Presented descriptive data on a group of homeless, mentally ill individuals (N = 108) served by a two-site demonstration project, funded by NIMH. Comparing results with those from other studies of this population produced some differences and some similarities. Cluster analysis techniques were applied to the data, producing a 4-group solution. Data validating the cluster solution are presented. It is suggested that the cluster results provide a more meaningful and useful method of understanding the descriptive data. Results suggest that while the population of individuals served as homeless and mentally ill is quite heterogeneous, many have well-developed functioning skills--only one cluster, making up 35.2% of the sample, fits the stereotype of the aggressive, psychotic individual with skill deficits in many areas. Further discussion is presented concerning the implications of the cluster analysis results for demonstrating contextual effects and thus better interpreting research results from other studies and assisting in future services planning.

  20. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL


    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithms for real document clustering.

  1. Systemic antibiotics in the treatment of aggressive periodontitis. A systematic review and a Bayesian Network meta-analysis. (United States)

    Rabelo, Cleverton Correa; Feres, Magda; Gonçalves, Cristiane; Figueiredo, Luciene C; Faveri, Marcelo; Tu, Yu-Kang; Chambrone, Leandro


    The aim of this study was to assess the effect of systemic antibiotic therapy on the treatment of aggressive periodontitis (AgP). This study was conducted and reported in accordance with the PRISMA statement. The MEDLINE, EMBASE and CENTRAL databases were searched up to June 2014 for randomized clinical trials comparing the treatment of subjects with AgP with either scaling and root planing (SRP) alone or associated with systemic antibiotics. Bayesian network meta-analysis was prepared using the Bayesian random-effects hierarchical models and the outcomes reported at 6-month post-treatment. Out of 350 papers identified, 14 studies were eligible. Greater gain in clinical attachment (CA) (mean difference [MD]: 1.08 mm; p < 0.0001) and reduction in probing depth (PD) (MD: 1.05 mm; p < 0.00001) were observed for SRP + metronidazole (Mtz), and for SRP + Mtz + amoxicillin (Amx) (MD: 0.45 mm, MD: 0.53 mm, respectively; p < 0.00001) than SRP alone/placebo. Bayesian network meta-analysis showed additional benefits in CA gain and PD reduction when SRP was associated with systemic antibiotics. SRP plus systemic antibiotics led to an additional clinical effect compared with SRP alone in the treatment of AgP. Of the antibiotic protocols available for inclusion into the Bayesian network meta-analysis, Mtz and Mtz/Amx provided the most beneficial outcomes. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  2. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis. (United States)

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost


    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  3. Bayesian analysis of risk associated with workplace accidents in earthmoving operations

    Directory of Open Access Journals (Sweden)

    J. F. García


    This paper analyses the characteristics of earthmoving operations involving a workplace accident. Bayesian networks were used to identify the factors that best predicted potential risk situations. Inference studies were then conducted to analyse the interplay between different risk factors. We demonstrate the potential of Bayesian networks to describe workplace contexts and predict risk situations from a safety and production planning perspective.

  4. A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report. (United States)

    Glas, Cees A. W.; Meijer, Rob R.

    A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…

  5. Spectral analysis of the IntCal98 calibration curve: a Bayesian view

    International Nuclear Information System (INIS)

    Palonen, V.; Tikkanen, P.


    Preliminary results from a Bayesian approach to find periodicities in the IntCal98 calibration curve are given. It has been shown in the literature that the discrete Fourier transform (Schuster periodogram) corresponds to the use of an approximate Bayesian model of one harmonic frequency and Gaussian noise. Advantages of the Bayesian approach include the possibility to use models for variable, attenuated and multiple frequencies, the capability to analyze unevenly spaced data and the possibility to assess the significance and uncertainties of spectral estimates. In this work, a new Bayesian model using random walk noise to take care of the trend in the data is developed. Both Bayesian models are described and the first results of the new model are reported and compared with results from straightforward discrete-Fourier-transform and maximum-entropy-method spectral analyses
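
    For readers unfamiliar with the Schuster periodogram mentioned in this abstract, here is a minimal sketch of it for evenly spaced data. The test signal, sample count, and frequency grid are assumptions made up for illustration; nothing here reproduces the paper's random-walk-noise model or the IntCal98 data.

```python
import math


def schuster_periodogram(y, freqs, dt=1.0):
    """Squared magnitude of the discrete Fourier sum of y at each frequency.

    y is assumed to be evenly sampled with spacing dt; the mean is removed first.
    """
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    power = []
    for f in freqs:
        c = sum(v * math.cos(2 * math.pi * f * k * dt) for k, v in enumerate(centered))
        s = sum(v * math.sin(2 * math.pi * f * k * dt) for k, v in enumerate(centered))
        power.append((c * c + s * s) / n)
    return power


# Hypothetical noise-free signal: one harmonic at 0.1 cycles per sample.
signal = [math.sin(2 * math.pi * 0.1 * k) for k in range(200)]
freqs = [i / 200 for i in range(1, 100)]
power = schuster_periodogram(signal, freqs)
peak_freq = freqs[power.index(max(power))]
print(peak_freq)
```

    The location of the largest periodogram value serves as the frequency estimate in this single-harmonic setting; the extensions the abstract lists (multiple or varying frequencies, uneven spacing) are beyond this sketch.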

  6. A semi-parametric Bayesian model for unsupervised differential co-expression analysis

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario


    Background: Differential co-expression analysis is an emerging strategy for characterizing disease related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims at identifying genes that are co-expressed in one, but not in the other set of samples. Results: We developed a novel probabilistic framework for jointly uncovering contexts (i.e. groups of samples with specific co-expression patterns), and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure in this model used for grouping biological samples is based on the clustering structure of genes within each sample and not on traditional measures of gene expression level similarities. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method to date for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of the disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERα regulatory network. Conclusions: We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.

  7. Application of dynamic Bayesian network to risk analysis of domino effects in chemical infrastructures

    International Nuclear Information System (INIS)

    Khakzad, Nima


    A domino effect is a low frequency high consequence chain of accidents where a primary accident (usually fire and explosion) in a unit triggers secondary accidents in adjacent units. High complexity and growing interdependencies of chemical infrastructures make them increasingly vulnerable to domino effects. Domino effects can be considered as time dependent processes. Thus, not only the identification of involved units but also their temporal entailment in the chain of accidents matter. More importantly, in the case of domino-induced fires which can generally last much longer compared to explosions, foreseeing the temporal evolution of domino effects and, in particular, predicting the most probable sequence of accidents (or involved units) in a domino effect can be of significance in the allocation of preventive and protective safety measures. Although many attempts have been made to identify the spatial evolution of domino effects, the temporal evolution of such accidents has been overlooked. We have proposed a methodology based on dynamic Bayesian network to model both the spatial and temporal evolutions of domino effects and also to quantify the most probable sequence of accidents in a potential domino effect. The application of the developed methodology has been demonstrated via a hypothetical fuel storage plant.

    Highlights:
    • A Dynamic Bayesian Network methodology has been developed to model domino effects.
    • Considering time-dependencies, both spatial and temporal evolutions of domino effects have been modeled.
    • The concept of most probable sequence of accidents has been proposed instead of the most probable combination of accidents.
    • Using backward analysis, the most vulnerable units have been identified during a potential domino effect.
    • The proposed methodology does not need to identify a unique primary unit (accident) for domino effect modeling.

  8. Gamma prior distribution selection for Bayesian analysis of failure rate and reliability

    International Nuclear Information System (INIS)

    Waller, R.A.; Johnson, M.M.; Waterman, M.S.; Martz, H.F. Jr.


    It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure-rate parameter, lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this paper is to present a methodology which can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate, lambda, simultaneously satisfies the probability statements, P(lambda < 1.0 x 10^-3) = 0.50 and P(lambda < 1.0 x 10^-5) = 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure-rate percentiles illustrated above, one can use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t_0) < 0.99) = 0.50 and P(R(t_0) < 0.99999) = 0.95 for some operating time t_0. Also, the paper includes graphs for selected percentiles which assist an engineer in applying the methodology.
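
    The conjugacy that this abstract relies on can be sketched in a few lines. The example below shows only the standard gamma-exponential update (a Gamma(shape, rate) prior on lambda combined with observed times-to-failure); it does not implement the paper's percentile-matching procedure for choosing the prior. The prior values and failure times are hypothetical.

```python
def gamma_exponential_update(shape, rate, failure_times):
    """Update a Gamma(shape, rate) prior on the failure rate lambda.

    With exponentially distributed times-to-failure, the posterior is
    Gamma(shape + n, rate + total observed time), where n is the number of failures.
    """
    n = len(failure_times)
    total_time = sum(failure_times)
    return shape + n, rate + total_time


# Hypothetical prior with mean shape / rate = 2e-3 failures per hour.
prior_shape, prior_rate = 2.0, 1000.0
# Hypothetical observed times-to-failure in hours.
times = [400.0, 900.0, 1700.0]

post_shape, post_rate = gamma_exponential_update(prior_shape, prior_rate, times)
post_mean = post_shape / post_rate
print(post_shape, post_rate, post_mean)
```

    Because the posterior stays in the gamma family, choosing the prior reduces to choosing two numbers, which is what makes a two-percentile elicitation of the kind described above sufficient.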

  9. Bayesian analysis of data and model error in rainfall-runoff hydrological models (United States)

    Kavetski, D.; Franks, S. W.; Kuczera, G.


    A major unresolved issue in the identification and use of conceptual hydrologic models is realistic description of uncertainty in the data and model structure. In particular, hydrologic parameters often cannot be measured directly and must be inferred (calibrated) from observed forcing/response data (typically, rainfall and runoff). However, rainfall varies significantly in space and time, yet is often estimated from sparse gauge networks. Recent work showed that current calibration methods (e.g., standard least squares, multi-objective calibration, generalized likelihood uncertainty estimation) ignore forcing uncertainty and assume that the rainfall is known exactly. Consequently, they can yield strongly biased and misleading parameter estimates. This deficiency confounds attempts to reliably test model hypotheses, to generalize results across catchments (the regionalization problem) and to quantify predictive uncertainty when the hydrologic model is extrapolated. This paper continues the development of a Bayesian total error analysis (BATEA) methodology for the calibration and identification of hydrologic models, which explicitly incorporates the uncertainty in both the forcing and response data, and allows systematic model comparison based on residual model errors and formal Bayesian hypothesis testing (e.g., using Bayes factors). BATEA is based on explicit stochastic models for both forcing and response uncertainty, whereas current techniques focus solely on response errors. Hence, unlike existing methods, the BATEA parameter equations directly reflect the modeler's confidence in all the data. We compare several approaches to approximating the parameter distributions: a) full Markov Chain Monte Carlo methods and b) simplified approaches based on linear approximations. Studies using synthetic and real data from the US and Australia show that BATEA systematically reduces the parameter bias, leads to more meaningful model fits and allows model comparison taking

  10. Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

    Directory of Open Access Journals (Sweden)

    Briggs William M


    Background: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow for multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power. Results: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression. A description is given as to how these algorithms can be applied in that context. Conclusions: Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm, providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and negative findings due to variation in tag formation probabilities across samples when testing for differential expression.

  11. Bayesian adaptive methods for clinical trials

    CERN Document Server

    Berry, Scott M; Muller, Peter


    Already popular in the analysis of medical device trials, adaptive Bayesian designs are increasingly being used in drug development for a wide variety of diseases and conditions, from Alzheimer's disease and multiple sclerosis to obesity, diabetes, hepatitis C, and HIV. Written by leading pioneers of Bayesian clinical trial designs, Bayesian Adaptive Methods for Clinical Trials explores the growing role of Bayesian thinking in the rapidly changing world of clinical trial analysis. The book first summarizes the current state of clinical trial design and analysis and introduces the main ideas and potential benefits of a Bayesian alternative. It then gives an overview of basic Bayesian methodological and computational tools needed for Bayesian clinical trials. With a focus on Bayesian designs that achieve good power and Type I error, the next chapters present Bayesian tools useful in early (Phase I) and middle (Phase II) clinical trials as well as two recent Bayesian adaptive Phase II studies: the BATTLE and ISP...

  12. Cluster analysis of typhoid cases in Kota Bharu, Kelantan, Malaysia

    Directory of Open Access Journals (Sweden)

    Nazarudin Safian


    Full Text Available Typhoid fever is still a major public health problem globally as well as in Malaysia. This study was done to identify the spatial epidemiology of typhoid fever in the Kota Bharu District of Malaysia as a first step to developing more advanced analysis of the whole country. The main characteristic of the epidemiological pattern that interested us was whether typhoid cases occurred in clusters or whether they were evenly distributed throughout the area. We also wanted to know at what spatial distances they were clustered. All confirmed typhoid cases that were reported to the Kota Bharu District Health Department from the year 2001 to June of 2005 were taken as the samples. From the home address of the cases, the location of the house was traced and a coordinate was taken using handheld GPS devices. Spatial statistical analysis was done to determine the distribution of typhoid cases, whether clustered, random or dispersed. The spatial statistical analysis was done using CrimeStat III software to determine whether typhoid cases occur in clusters, and later on to determine at what distances they clustered. From 736 cases involved in the study there was significant clustering for cases occurring in the years 2001, 2002, 2003 and 2005. There was no significant clustering in year 2004. Typhoid clustering also occurred strongly for distances up to 6 km. This study shows that typhoid cases occur in clusters, and this method could be applicable to describe spatial epidemiology for a specific area. (Med J Indones 2008; 17: 175-82) Keywords: typhoid, clustering, spatial epidemiology, GIS
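
    The abstract does not say which statistic produced the clustered/random/dispersed verdict. One common choice for that kind of question is the Clark-Evans nearest neighbour index, sketched below under that assumption. The function name, the coordinates, and the 5 km x 5 km study area are all hypothetical and are not the study's data; no edge correction is applied.

```python
import math

def nearest_neighbour_index(points, area):
    """Clark-Evans index: observed / expected mean nearest-neighbour distance.

    Values well below 1 suggest clustering, values near 1 a random pattern,
    and values above 1 a dispersed pattern. No edge correction is applied.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # distance from point i to its closest other point
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    # expected mean nearest-neighbour distance under complete spatial randomness
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical coordinates in km inside a 5 km x 5 km area (illustration only).
grid = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
clumped = [(2.0 + 0.01 * i, 2.0 + 0.01 * j) for i in range(5) for j in range(5)]

nni_grid = nearest_neighbour_index(grid, area=25.0)
nni_clumped = nearest_neighbour_index(clumped, area=25.0)
print(f"grid NNI = {nni_grid:.2f}, clumped NNI = {nni_clumped:.2f}")
```

    The evenly spaced grid gives an index above 1 (dispersed), while the tightly packed points give an index far below 1 (clustered). A real analysis would also need a significance test and a distance-based statistic to answer the "up to what distance" question.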

  13. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis (United States)

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.


    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
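
    The equal-size tendency mentioned above can be seen in a toy case. The sketch below is a plain Lloyd-style k-means in one dimension, written for illustration only; the data (a wide group of 21 points and a tight group of 3 points) and the starting centroids are made up and have nothing to do with the simulation design of the study.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd-style k-means on scalar data; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # assignment step: each value goes to its nearest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
            for x in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # update step: each centroid moves to the mean of its members
        for k in range(len(centroids)):
            members = [x for x, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids

# Hypothetical data: a wide group (0, 1, ..., 20) and a tight group of three points.
wide_group = [float(v) for v in range(21)]
tight_group = [24.0, 24.5, 25.0]
labels, centroids = kmeans_1d(wide_group + tight_group, centroids=[10.0, 24.5])

sizes = [labels.count(0), labels.count(1)]
print("true sizes: [21, 3]  recovered sizes:", sizes)
```

    Even though the starting centroids sit in the middle of the true groups, the boundary drifts into the wide group until the two recovered clusters are of similar size, which is the kind of distortion the study set out to quantify.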

  14. Dirichlet Process Parsimonious Mixtures for clustering


    Chamroukhi, Faicel; Bartcus, Marius; Glotin, Hervé


    The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have shown their success in particular in cluster analysis. Their estimation is in general performed by maximum likelihood estimation and has also been considered from a parametric Bayesian perspective. We propose new Dirichlet Process Parsimonious mixtures (DPPM) which represent a Bayesian nonparametric formulation of these parsimonious Gaussian mixtur...
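
    For readers unfamiliar with Dirichlet process priors, the partition they induce over observations can be simulated with the Chinese restaurant process. The sketch below is only that generic simulation, not the DPPM model or its inference scheme; the function name and the concentration value `alpha=1.0` are assumptions chosen for illustration.

```python
import random

def chinese_restaurant_process(n, alpha, rng):
    """Draw cluster sizes for n items from a Chinese restaurant process."""
    sizes = []
    for i in range(n):
        # item i joins existing cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha)
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if u < cumulative:
                sizes[k] += 1
                break
        else:
            sizes.append(1)
    return sizes

sizes = chinese_restaurant_process(100, alpha=1.0, rng=random.Random(1))
print(len(sizes), "clusters with sizes", sizes)
```

    The number of clusters is not fixed in advance; it grows slowly with the number of items, and larger values of `alpha` favour more clusters.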

  15. A Bayesian network meta-analysis of whole brain radiotherapy and stereotactic radiotherapy for brain metastasis. (United States)

    Yuan, Xi; Liu, Wen-Jie; Li, Bing; Shen, Ze-Tian; Shen, Jun-Shu; Zhu, Xi-Xu


    This study was conducted to compare the effects of whole brain radiotherapy (WBRT) and stereotactic radiotherapy (SRS) in treatment of brain metastasis. A systematic search of the PubMed and Embase databases was performed for relevant literature on the effects of WBRT and SRS in treatment of brain metastasis. A Bayesian network meta-analysis was performed by using the ADDIS software. The effect sizes included odds ratio (OR) and 95% confidence interval (CI). A random effects model was used for the pooled analysis for all the outcome measures, including 1-year distant control rate, 1-year local control rate, 1-year survival rate, and complication. The consistency was tested by using node-splitting analysis and inconsistency standard deviation. The convergence was estimated according to the Brooks-Gelman-Rubin method. A total of 12 studies were included in this meta-analysis. WBRT + SRS showed higher 1-year distant control rate than SRS. WBRT + SRS was better for the 1-year local control rate than WBRT. SRS and WBRT + SRS had higher 1-year survival rate than the WBRT. In addition, there was no difference in complication among the three therapies. Overall, WBRT + SRS might be the choice of treatment for brain metastasis.

  16. Comparative analysis of genomic signal processing for microarray data clustering. (United States)

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe


    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  17. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li


    Full Text Available In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited due to computational complexity, noise resistant ability and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  18. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL


    Intelligence analysts are currently overwhelmed with the amount of information streams generated every day. There is a lack of comprehensive tools that can analyze the information streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require a large amount of computational resources and a long time to get accurate results. It is very difficult to cluster dynamically changing text information streams on an individual computer. Our early research has resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for cluster analysis of dynamically changing document information, such as text information streams. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase the clustering speed of the algorithm. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balance and status information synchronization in this approach.

  19. The Phylogeographic History of the New World Screwworm Fly, Inferred by Approximate Bayesian Computation Analysis (United States)

    Azeredo-Espin, Ana Maria L.


    Insect pest phylogeography might be shaped both by biogeographic events and by human influence. Here, we conducted an approximate Bayesian computation (ABC) analysis to investigate the phylogeography of the New World screwworm fly, Cochliomyia hominivorax, with the aim of understanding its population history and its order and time of divergence. Our ABC analysis supports that populations spread from North to South in the Americas, in at least two different moments. The first split occurred between the North/Central American and South American populations in the end of the Last Glacial Maximum (15,300-19,000 YBP). The second split occurred between the North and South Amazonian populations in the transition between the Pleistocene and the Holocene eras (9,100-11,000 YBP). The species also experienced population expansion. Phylogenetic analysis likewise suggests this north to south colonization and Maxent models suggest an increase in the number of suitable areas in South America from the past to present. We found that the phylogeographic patterns observed in C. hominivorax cannot be explained only by climatic oscillations and can be connected to host population histories. Interestingly we found these patterns are very coincident with general patterns of ancient human movements in the Americas, suggesting that humans might have played a crucial role in shaping the distribution and population structure of this insect pest. This work presents the first hypothesis test regarding the processes that shaped the current phylogeographic structure of C. hominivorax and represents an alternate perspective on investigating the problem of insect pests. PMID:24098436

  20. Analysis of human and organizational factors that influence mining accidents based on Bayesian network. (United States)

    Mirzaei Aliabadi, Mostafa; Aghaei, Hamed; Kalatpour, Omid; Soltanian, Ali Reza; Nikravesh, Asghar


    The present study was aimed to analyze human and organizational factors involved in mining accidents and determine the relationships among these factors. In this study, Human Factors Analysis and Classification System (HFACS) with Bayesian network (BN) were combined in order to analyze contributing factors in mining accidents. BN was constructed based on a hierarchal structure of HFACS. The required data were collected from a total of 295 cases of Iranian mining accidents and analyzed using HFACS. Afterwards, prior probability of contributing factors was computed using the expectation-maximization algorithm. Sensitivity analysis was applied to determine which contributing factor had a higher influence on unsafe acts to select the best intervention strategy. The analyses showed that skill based errors, routine violations, environmental factors, and planned inappropriate operation had a higher relative importance in the accidents. Moreover, sensitivity analysis revealed that environmental factors, failed to correct known problem, and personnel factors had a higher influence on unsafe acts. The results of the present study could provide guidance to help safety and health management by adopting proper intervention strategies to reduce mining accidents.

  1. Integrated data analysis of fusion diagnostics by means of the Bayesian probability theory

    International Nuclear Information System (INIS)

    Fischer, R.; Dinklage, A.


    Integrated data analysis (IDA) of fusion diagnostics is the combination of heterogeneous diagnostics to obtain validated physical results. Benefits from the integrated approach result from a systematic use of interdependencies; in that sense IDA optimizes the extraction of information from sets of different data. For that purpose IDA requires a systematic and formalized error analysis of all (statistical and systematic) uncertainties involved in each diagnostic. Bayesian probability theory allows for a systematic combination of all information entering the diagnostic model by considering all uncertainties of the measured data, the calibration measurements, and the physical model. Prior physics knowledge on model parameters can be included. Handling of systematic errors is provided. A central goal of the integration of redundant or complementary diagnostics is to provide information to resolve inconsistencies by exploiting interdependencies. A comparable analysis of sets of diagnostics (meta-diagnostics) is performed by combining statistical and systematical uncertainties with model parameters and model uncertainties. Diagnostics improvement and experimental optimization and design of meta-diagnostics will be discussed

  2. Efficient Methods for Bayesian Uncertainty Analysis and Global Optimization of Computationally Expensive Environmental Models (United States)

    Shoemaker, Christine; Espinet, Antoine; Pang, Min


    Models of complex environmental systems can be computationally expensive in order to describe the dynamic interactions of the many components over a sizeable time period. Diagnostics of these systems can include forward simulations of calibrated models under uncertainty and analysis of alternatives of systems management. This discussion will focus on applications of new surrogate optimization and uncertainty analysis methods to environmental models that can enhance our ability to extract information and understanding. For complex models, optimization and especially uncertainty analysis can require a large number of model simulations, which is not feasible for computationally expensive models. Surrogate response surfaces can be used in Global Optimization and Uncertainty methods to obtain accurate answers with far fewer model evaluations, which made the methods practical for computationally expensive models for which conventional methods are not feasible. In this paper we will discuss the application of the SOARS surrogate method for estimating Bayesian posterior density functions for model parameters for a TOUGH2 model of geologic carbon sequestration. We will also briefly discuss new parallel surrogate global optimization algorithm applied to two groundwater remediation sites that was implemented on a supercomputer with up to 64 processors. The applications will illustrate the use of these methods to predict the impact of monitoring and management on subsurface contaminants.

  3. Bayesian Inference for Neural Electromagnetic Source Localization: Analysis of MEG Visual Evoked Activity

    International Nuclear Information System (INIS)

    George, J.S.; Schmidt, D.M.; Wood, C.C.


    We have developed a Bayesian approach to the analysis of neural electromagnetic (MEG/EEG) data that can incorporate or fuse information from other imaging modalities and addresses the ill-posed inverse problem by sampling the many different solutions which could have produced the given data. From these samples one can draw probabilistic inferences about regions of activation. Our source model assumes a variable number of variable size cortical regions of stimulus-correlated activity. An active region consists of locations on the cortical surface, within a sphere centered on some location in cortex. The number and radii of active regions can vary to defined maximum values. The goal of the analysis is to determine the posterior probability distribution for the set of parameters that govern the number, location, and extent of active regions. Markov Chain Monte Carlo is used to generate a large sample of sets of parameters distributed according to the posterior distribution. This sample is representative of the many different source distributions that could account for given data, and allows identification of probable (i.e. consistent) features across solutions. Examples of the use of this analysis technique with both simulated and empirical MEG data are presented
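
    The idea of drawing a sample "distributed according to the posterior distribution" can be illustrated with a random-walk Metropolis sampler. The sketch below is a one-parameter toy, not the authors' sampler: their parameter space (number, location, and extent of regions) is far richer and would need moves that change the number of regions. The target density, step size, and function names here are assumptions for illustration.

```python
import math
import random

def metropolis(log_post, start, step, n_samples, rng):
    """Random-walk Metropolis sampler for a one-dimensional log posterior."""
    x = start
    lp = log_post(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        samples.append(x)
    return samples

def toy_log_posterior(x):
    """Unnormalised log density of a Normal(3, 1) stand-in posterior."""
    return -0.5 * (x - 3.0) ** 2

samples = metropolis(toy_log_posterior, start=0.0, step=1.0,
                     n_samples=20000, rng=random.Random(0))
kept = samples[2000:]  # discard an initial burn-in stretch
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean estimate: {posterior_mean:.2f}")
```

    Summaries such as the posterior mean, or the fraction of samples in which a given feature appears, are then read directly off the retained sample.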

  4. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis (United States)

    Down, Thomas A.; Rakyan, Vardhman K.; Turner, Daniel J.; Flicek, Paul; Li, Heng; Kulesha, Eugene; Gräf, Stefan; Johnson, Nathan; Herrero, Javier; Tomazou, Eleni M.; Thorne, Natalie P.; Bäckdahl, Liselotte; Herberth, Marlis; Howe, Kevin L.; Jackson, David K.; Miretti, Marcos M.; Marioni, John C.; Birney, Ewan; Hubbard, Tim J. P.; Durbin, Richard; Tavaré, Simon; Beck, Stephan


    DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation. PMID:18612301

  5. Combining data and meta-analysis to build Bayesian networks for clinical decision support. (United States)

    Yet, Barbaros; Perkins, Zane B; Rasmussen, Todd E; Tai, Nigel R M; Marsh, D William R


    Complex clinical decisions require the decision maker to evaluate multiple factors that may interact with each other. Many clinical studies, however, report 'univariate' relations between a single factor and outcome. Such univariate statistics are often insufficient to provide useful support for complex clinical decisions even when they are pooled using meta-analysis. More useful decision support could be provided by evidence-based models that take the interaction between factors into account. In this paper, we propose a method of integrating the univariate results of a meta-analysis with a clinical dataset and expert knowledge to construct multivariate Bayesian network (BN) models. The technique reduces the size of the dataset needed to learn the parameters of a model of a given complexity. Supplementing the data with the meta-analysis results avoids the need to either simplify the model - ignoring some complexities of the problem - or to gather more data. The method is illustrated by a clinical case study into the prediction of the viability of severely injured lower extremities. The case study illustrates the advantages of integrating combined evidence into BN development: the BN developed using our method outperformed four different data-driven structure learning methods, and a well-known scoring model (MESS) in this domain. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Bayesian reliability analysis for non-periodic inspection with estimation of uncertain parameters; Bayesian shinraisei kaiseki wo tekiyoshita hiteiki kozo kensa ni kansuru kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    Itagaki, H. [Yokohama National University, Yokohama (Japan). Faculty of Engineering; Asada, H.; Ito, S. [National Aerospace Laboratory, Tokyo (Japan); Shinozuka, M.


    Risk assessed structural positions in a pressurized fuselage of a transport-type aircraft applied with damage tolerance design are taken up as the subject of discussion. A small number of data obtained from inspections on the positions was used to discuss the Bayesian reliability analysis that can estimate also a proper non-periodic inspection schedule, while estimating proper values for uncertain factors. As a result, time period of generating fatigue cracks was determined according to procedure of detailed visual inspections. The analysis method was found capable of estimating values that are thought reasonable and the proper inspection schedule using these values, in spite of placing the fatigue crack progress expression in a very simple form and estimating both factors as the uncertain factors. Thus, the present analysis method was verified of its effectiveness. This study has discussed at the same time the structural positions, modeling of fatigue cracks generated and develop in the positions, conditions for destruction, damage factors, and capability of the inspection from different viewpoints. This reliability analysis method is thought effective also on such other structures as offshore structures. 18 refs., 8 figs., 1 tab.

  7. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  8. Bayesian Analysis Made Simple An Excel GUI for WinBUGS

    CERN Document Server

    Woodward, Philip


    From simple NLMs to complex GLMMs, this book describes how to use the GUI for WinBUGS - BugsXLA - an Excel add-in written by the author that allows a range of Bayesian models to be easily specified. With case studies throughout, the text shows how to routinely apply even the more complex aspects of model specification, such as GLMMs, outlier robust models, random effects Emax models, auto-regressive errors, and Bayesian variable selection. It provides brief, up-to-date discussions of current issues in the practical application of Bayesian methods. The author also explains how to obtain free so

  9. Statistical analysis of modal parameters of a suspension bridge based on Bayesian spectral density approach and SHM data (United States)

    Li, Zhijun; Feng, Maria Q.; Luo, Longxi; Feng, Dongming; Xu, Xiuli


    Uncertainty in modal parameter estimation appears to a significant extent in the structural health monitoring (SHM) practice of civil engineering due to environmental influences and modeling errors. Reasonable methodologies are needed for processing the uncertainty. Bayesian inference can provide a promising and feasible identification solution for the purpose of SHM. However, there is relatively little research on the application of the Bayesian spectral method to modal identification using SHM data sets. To extract modal parameters from large data sets collected by an SHM system, the Bayesian spectral density algorithm was applied to address the uncertainty of mode extraction from output-only response of a long-span suspension bridge. The posterior most probable values of modal parameters and their uncertainties were estimated through Bayesian inference. A long-term variation and statistical analysis was performed using the sensor data sets collected from the SHM system of the suspension bridge over a one-year period. The t location-scale distribution was shown to be a better candidate function for frequencies of lower modes. On the other hand, the Burr distribution provided the best fit to the higher modes, which are sensitive to the temperature. In addition, wind-induced variation of modal parameters was also investigated. It was observed that both the damping ratios and modal forces increased during the period of typhoon excitations. Meanwhile, the modal damping ratios exhibit significant correlation with the spectral intensities of the corresponding modal forces.

  10. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.


    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...

  11. Bayesian change-point analysis reveals developmental change in a classic theory of mind task. (United States)

    Baker, Sara T; Leslie, Alan M; Gallistel, C R; Hood, Bruce M


    Although learning and development reflect changes situated in an individual brain, most discussions of behavioral change are based on the evidence of group averages. Our reliance on group-averaged data creates a dilemma. On the one hand, we need to use traditional inferential statistics. On the other hand, group averages are highly ambiguous when we need to understand change in the individual; the average pattern of change may characterize all, some, or none of the individuals in the group. Here we present a new method for statistically characterizing developmental change in each individual child we study. Using false-belief tasks, fifty-two children in two cohorts were repeatedly tested for varying lengths of time between 3 and 5 years of age. Using a novel Bayesian change point analysis, we determined both the presence and, just as importantly, the absence of change in individual longitudinal cumulative records. Whenever the analysis supports a change conclusion, it identifies in that child's record the most likely point at which change occurred. Results show striking variability in patterns of change and stability across individual children. We then group the individuals by their various patterns of change or no change. The resulting patterns provide scarce support for sudden changes in competence and shed new light on the concepts of "passing" and "failing" in developmental studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
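
    To make the idea of weighing "change" against "no change" in a single record concrete, the sketch below compares two simple models for a pass/fail sequence: one constant pass probability versus two pass probabilities separated by a single change point. This is a generic textbook-style formulation with uniform Beta(1, 1) priors and a uniform prior over change points; it is an assumption for illustration and not the algorithm used in the paper. The two records are invented.

```python
from math import exp, lgamma, log

def log_marginal(successes, failures):
    """Log marginal likelihood of a Bernoulli sequence under a Beta(1, 1) prior."""
    return lgamma(successes + 1) + lgamma(failures + 1) - lgamma(successes + failures + 2)

def change_point_analysis(record):
    """Return (most likely change point, Bayes factor for change vs. no change)."""
    n = len(record)
    total = sum(record)
    log_no_change = log_marginal(total, n - total)
    log_change = []
    passed = 0
    for t in range(1, n):  # change after trial t
        passed += record[t - 1]
        before = log_marginal(passed, t - passed)
        after = log_marginal(total - passed, (n - t) - (total - passed))
        log_change.append(before + after)
    peak = max(log_change)
    # average the change-point likelihoods (uniform prior), shifted by peak for stability
    log_avg = peak + log(sum(exp(v - peak) for v in log_change) / len(log_change))
    best_t = log_change.index(peak) + 1
    return best_t, exp(log_avg - log_no_change)

# Invented records: 0 = fail, 1 = pass on successive test sessions.
switching_child = [0] * 6 + [1] * 6
steady_child = [1] * 12

best_t, bf_switch = change_point_analysis(switching_child)
_, bf_steady = change_point_analysis(steady_child)
print(f"switching record: change after trial {best_t}, Bayes factor {bf_switch:.1f}")
print(f"steady record: Bayes factor {bf_steady:.2f}")
```

    A Bayes factor well above 1 supports a change and the location of the peak gives its most likely position; a Bayes factor below 1 is positive evidence for the absence of change, which is the distinction the abstract emphasizes.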

  12. Sustainable Technology Analysis of Artificial Intelligence Using Bayesian and Social Network Models

    Directory of Open Access Journals (Sweden)

    Juhwan Kim


    Full Text Available Recent developments in artificial intelligence (AI) have led to a significant increase in the use of AI technologies. Many experts are researching and developing AI technologies in their respective fields, often submitting papers and patent applications as a result. In particular, owing to the characteristics of the patent system that is used to protect the exclusive rights to registered technology, patent documents contain detailed information on the developed technology. Therefore, in this study, we propose a statistical method for analyzing patent data on AI technology to improve our understanding of sustainable technology in the field of AI. We collect patent documents that are related to AI technology, and then analyze the patent data to identify sustainable AI technology. In our analysis, we develop a statistical method that combines social network analysis and Bayesian modeling. Based on the results of the proposed method, we provide a technological structure that can be applied to understand the sustainability of AI technology. To show how the proposed method can be applied to a practical problem, we apply the technological structure to a case study in order to analyze sustainable AI technology.

  13. Bayesian Analysis for Dynamic Generalized Linear Latent Model with Application to Tree Survival Rate

    Directory of Open Access Journals (Sweden)

    Yu-sheng Cheng


    Full Text Available The logistic regression model is the most popular regression technique available for modeling categorical data, especially dichotomous variables. The classic logistic regression model is typically used to interpret the relationship between response variables and explanatory variables. However, in real applications, most data sets are collected in follow-up, which leads to temporal correlation among the data. In order to characterize the correlations among the different variables, a new method based on latent variables is introduced in this study. At the same time, latent variables following an AR(1) model are used to depict time dependence. In the framework of Bayesian analysis, parameter estimates and statistical inferences are carried out via the Gibbs sampler with the Metropolis-Hastings (MH) algorithm. Model comparison, based on the Bayes factor, and forecasting/smoothing of the survival rate of the tree are established. A simulation study is conducted to assess the performance of the proposed method and a pika data set is analyzed to illustrate the real application. Since Bayes factor approaches vary significantly, efficiency tests have been performed in order to decide which solution provides a better tool for the analysis of real relational data sets.

  14. Single molecule force spectroscopy at high data acquisition: A Bayesian nonparametric analysis (United States)

    Sgouralis, Ioannis; Whitmore, Miles; Lapidus, Lisa; Comstock, Matthew J.; Pressé, Steve


    Bayesian nonparametrics (BNPs) are poised to have a deep impact in the analysis of single molecule data as they provide posterior probabilities over entire models consistent with the supplied data, not just model parameters of one preferred model. Thus they provide an elegant and rigorous solution to the difficult problem encountered when selecting an appropriate candidate model. Nevertheless, BNPs' flexibility to learn models and their associated parameters from experimental data is a double-edged sword. Most importantly, BNPs are prone to increasing the complexity of the estimated models due to artifactual features present in time traces. Thus, because of experimental challenges unique to single molecule methods, naive application of available BNP tools is not possible. Here we consider traces with time correlations and, as a specific example, we deal with force spectroscopy traces collected at high acquisition rates. While high acquisition rates are required in order to capture dwells in short-lived molecular states, in this setup, a slow response of the optical trap instrumentation (i.e., trapped beads, ambient fluid, and tethering handles) distorts the molecular signals introducing time correlations into the data that may be misinterpreted as true states by naive BNPs. Our adaptation of BNP tools explicitly takes into consideration these response dynamics, in addition to drift and noise, and makes unsupervised time series analysis of correlated single molecule force spectroscopy measurements possible, even at acquisition rates similar to or below the trap's response times.


    Stingo, Francesco C.; Vannucci, Marina; Downey, Gerard


    Discriminant analysis is an effective tool for the classification of experimental units into groups. When the number of variables is much larger than the number of observations, it is necessary to include a dimension reduction procedure in the inferential process. Here we present a typical example from chemometrics that deals with the classification of different types of food into species via near infrared spectroscopy. We take a nonparametric approach by modeling the functional predictors via wavelet transforms and then apply discriminant analysis in the wavelet domain. We consider a Bayesian conjugate normal discriminant model, either linear or quadratic, that avoids independence assumptions among the wavelet coefficients. We introduce latent binary indicators for the selection of the discriminatory wavelet coefficients and propose prior formulations that use Markov random tree (MRT) priors to map scale-location connections among wavelet coefficients. We conduct posterior inference via MCMC methods, show performance on our case study on food authenticity, and compare results to several other procedures. PMID:24761126

  16. Estimation of a quantity of interest in uncertainty analysis: Some help from Bayesian decision theory

    International Nuclear Information System (INIS)

    Pasanisi, Alberto; Keller, Merlin; Parent, Eric


    In the context of risk analysis under uncertainty, we focus here on the problem of estimating a so-called quantity of interest of an uncertainty analysis problem, i.e. a given feature of the probability distribution function (pdf) of the output of a deterministic model with uncertain inputs. We will stay here in a fully probabilistic setting. A common problem is how to account for epistemic uncertainty tainting the parameter of the probability distribution of the inputs. In standard practice, this uncertainty is often neglected (plug-in approach). When a specific uncertainty assessment is made, on the basis of the available information (expertise and/or data), a common solution consists in marginalizing the joint distribution of both observable inputs and parameters of the probabilistic model (i.e. computing the predictive pdf of the inputs), then propagating it through the deterministic model. We will reinterpret this approach in the light of Bayesian decision theory, and will show that this practice leads the analyst to adopt implicitly a specific loss function which may be inappropriate for the problem under investigation, and suboptimal from a decisional perspective. These concepts are illustrated on a simple numerical example, concerning a case of flood risk assessment.
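
    The contrast between the plug-in and the predictive approach can be sketched on a toy case. Everything below is an assumption made for illustration and is not taken from the paper: the deterministic model is the identity, the input is a water level X ~ N(mu, sigma^2) with known sigma, the prior on mu is flat, and the quantity of interest is the exceedance probability P(X > threshold). Under these assumptions the predictive distribution of X is N(xbar, sigma^2 (1 + 1/n)), and the predictive exceedance probability equals the posterior mean of the plug-in quantity, i.e. the Bayes estimate under squared-error loss.

```python
import math
from statistics import NormalDist, fmean


def exceedance_plug_in(data, sigma, threshold):
    """P(X > threshold) with mu replaced by its point estimate (sample mean)."""
    xbar = fmean(data)
    return 1.0 - NormalDist(xbar, sigma).cdf(threshold)


def exceedance_predictive(data, sigma, threshold):
    """P(X > threshold) under the posterior predictive N(xbar, sigma^2 * (1 + 1/n)).

    Assumes a flat prior on mu and a known sigma (illustrative assumptions).
    """
    n = len(data)
    xbar = fmean(data)
    predictive_sd = sigma * math.sqrt(1.0 + 1.0 / n)
    return 1.0 - NormalDist(xbar, predictive_sd).cdf(threshold)


# Hypothetical water levels (m) and dyke height; not data from the paper.
levels = [2.1, 1.8, 2.4, 2.0, 1.7]
p_plug = exceedance_plug_in(levels, sigma=0.5, threshold=3.5)
p_pred = exceedance_predictive(levels, sigma=0.5, threshold=3.5)
print(f"plug-in: {p_plug:.5f}  predictive: {p_pred:.5f}")
```

    With a threshold above the sample mean, the predictive value is larger than the plug-in value because parameter uncertainty widens the tails. Whether either number is the right summary depends on the loss function, which is the point the abstract raises.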

  17. Bayesian Analysis of Evolutionary Divergence with Genomic Data under Diverse Demographic Models. (United States)

    Chung, Yujin; Hey, Jody


    We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail:

  18. Development of small scale cluster computer for numerical analysis (United States)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.


    In this study, two personal computers were successfully networked together to form a small scale cluster. Each computer has a multicore processor with four cores, giving the cluster eight processors in total. The cluster runs an Ubuntu 14.04 Linux environment with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem, using a simple MPI Hello program written in C. Additionally, a performance test was done to show that the calculation performance of this cluster is much better than that of a single-CPU computer. In this performance test, the same code was run four times, using a single node, 2 processors, 4 processors, and 8 processors. The results show that with additional processors, the time required to solve the problem decreases. The time required for the calculation is halved when the number of processors is doubled. To conclude, we successfully developed a small scale cluster computer using common hardware that is capable of higher computing power than a single-CPU computer, which can be beneficial for research that requires high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
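
    The reported scaling (run time halving each time the processor count doubles) corresponds to ideal linear speedup. A minimal sketch of how speedup and parallel efficiency could be computed from such timings follows; the timing values are hypothetical placeholders chosen to match the halving pattern, not measurements from the study.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0}

for procs, seconds in timings.items():
    s = speedup(timings[1], seconds)
    e = efficiency(timings[1], seconds, procs)
    print(f"{procs} procs: speedup {s:.1f}, efficiency {e:.2f}")
```

    In practice, communication overhead between the two machines would usually push efficiency below 1.0 as the processor count grows.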

  19. Fuzzy clustering analysis to study geomagnetic coastal effects

    Directory of Open Access Journals (Sweden)

    M. Sridharan


    The utility of fuzzy set theory in cluster analysis and pattern recognition has been evolving since the mid 1960s, in conjunction with the emergence and evolution of computer technology. The classification of objects into categories is the subject of cluster analysis. The aim of this paper is to employ a fuzzy-clustering technique to examine the interrelationship of geomagnetic coastal and other effects at Indian observatories. Data from the observatories used for the present studies are from Alibag on the West Coast, Visakhapatnam and Pondicherry on the East Coast, Hyderabad and Nagpur as central inland stations which are located far from either of the coasts; all the above stations are free from the influence of the daytime equatorial electrojet. It has been found that Alibag and Pondicherry Observatories form a separate cluster showing anomalous variations in the vertical (Z) component. H- and D-components form different clusters. The results are compared with the graphical method. Analytical technique and the results of Fuzzy-clustering analysis are discussed here.

  20. Bayesian Analysis Diagnostics: Diagnosing Predictive and Parameter Uncertainty for Hydrological Models (United States)

    Thyer, Mark; Kavetski, Dmitri; Evin, Guillaume; Kuczera, George; Renard, Ben; McInerney, David


    All scientific and statistical analysis, particularly in natural sciences, is based on approximations and assumptions. For example, the calibration of hydrological models using approaches such as Nash-Sutcliffe efficiency and/or simple least squares (SLS) objective functions may appear to be 'assumption-free'. However, this is a naïve point of view, as SLS assumes that the model residuals (residuals=observed-predictions) are independent, homoscedastic and Gaussian. If these assumptions are poor, parameter inference and model predictions will be correspondingly poor. An essential step in model development is therefore to verify the assumptions and approximations made in the modeling process. Diagnostics play a key role in verifying modeling assumptions. An important advantage of the formal Bayesian approach is that the modeler is required to make the assumptions explicit. Specialized diagnostics can then be developed and applied to test and verify their assumptions. This paper presents a suite of statistical and modeling diagnostics that can be used by environmental modelers to test their modeling calibration assumptions and diagnose model deficiencies. Three major types of diagnostics are presented. Residual diagnostics: Residual diagnostics are used to test whether the assumptions of the residual error model within the likelihood function are compatible with the data. This includes testing for statistical independence, homoscedasticity, unbiasedness, Gaussianity and any distributional assumptions. Parameter uncertainty and MCMC diagnostics: An important part of Bayesian analysis is to assess parameter uncertainty. Markov Chain Monte Carlo (MCMC) methods are a powerful numerical tool for estimating these uncertainties. Diagnostics based on posterior parameter distributions can be used to assess parameter identifiability, interactions and correlations. This provides a very useful tool for detecting and remedying model deficiencies. In addition, numerical diagnostics are
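
    Two of the quantities mentioned above can be sketched in a few lines: the Nash-Sutcliffe efficiency, NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2), and a simple check of the independence assumption via the lag-1 autocorrelation of the residuals. This is only an illustrative sketch; the function names and the flow values are hypothetical, and the paper's own diagnostic suite is not reproduced here.

```python
from statistics import fmean


def residuals(observed, predicted):
    """Residuals defined as observed minus predicted, as in the abstract."""
    return [o - p for o, p in zip(observed, predicted)]


def nash_sutcliffe(observed, predicted):
    """NSE = 1 - SSE / (sum of squared deviations of observations from their mean)."""
    mean_obs = fmean(observed)
    sse = sum(r ** 2 for r in residuals(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation; values far from 0 suggest dependent residuals."""
    m = fmean(series)
    num = sum((series[i] - m) * (series[i + 1] - m) for i in range(len(series) - 1))
    den = sum((v - m) ** 2 for v in series)
    return num / den


# Hypothetical streamflow observations and model predictions.
obs = [3.0, 5.0, 9.0, 6.0, 4.0, 3.5]
pred = [2.5, 4.0, 7.5, 6.5, 5.0, 4.0]
print("NSE:", round(nash_sutcliffe(obs, pred), 3))
print("lag-1 autocorrelation of residuals:", round(lag1_autocorrelation(residuals(obs, pred)), 3))
```

    Checks for homoscedasticity and Gaussianity would need further tools (for example residuals plotted against predictions, or quantile-quantile plots) that are beyond this sketch.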

  1. Wavelet-Based Bayesian Methods for Image Analysis and Automatic Target Recognition

    National Research Council Canada - National Science Library

    Nowak, Robert


    .... We have developed two new techniques. First, we have developed a wavelet-based approach to image restoration and deconvolution problems using Bayesian image models and an alternating-maximization method...

  2. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003-2012. (United States)

    Khan, Diba; Rossen, Lauren M; Hamilton, Brady E; He, Yulei; Wei, Rong; Dienes, Erin


    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the U.S. have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003-2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. Published by Elsevier Ltd.

  3. Using ICD for structural analysis of clusters: a case study on NeAr clusters (United States)

    Fasshauer, E.; Förstel, M.; Pallmann, S.; Pernpointner, M.; Hergenhahn, U.


    We present a method to utilize interatomic Coulombic decay (ICD) to retrieve information about the mean geometric structures of heteronuclear clusters. It is based on observation and modelling of competing ICD channels, which involve the same initial vacancy, but energetically different final states with vacancies in different components of the cluster. Using binary rare gas clusters of Ne and Ar as an example, we measure the relative intensity of ICD into (Ne+)2 and Ne+Ar+ final states with spectroscopically well separated ICD peaks. We compare in detail the experimental ratios of the Ne-Ne and Ne-Ar ICD contributions and their positions and widths to values calculated for a diverse set of possible structures. We conclude that NeAr clusters exhibit a core-shell structure with an argon core surrounded by complete neon shells and, possibly, further an incomplete shell of neon atoms for the experimental conditions investigated. Our analysis allows one to differentiate between clusters of similar size and stoichiometric Ar content, but different internal structure. We find evidence for ICD of Ne 2s-1, producing Ar+ vacancies in the second coordination shell of the initial site.

  4. Bus Route Design with a Bayesian Network Analysis of Bus Service Revenues


    Liu, Yi; Jia, Yuanhua; Feng, Xuesong; Wu, Jiang


    A Bayesian network is used to estimate revenues of bus services in consideration of the effect of bus travel demands, passenger transport distances, and so on. In this research, the area X in Beijing has been selected as the study area because of its relatively high bus travel demand and, on the contrary, unsatisfactory bus services. It is suggested that the proposed Bayesian network approach is able to rationally predict the probabilities of different revenues of various route services, from...

  5. A Bayesian network meta-analysis on second-line systemic therapy in advanced gastric cancer. (United States)

    Zhu, Xiaofu; Ko, Yoo-Joung; Berry, Scott; Shah, Keya; Lee, Esther; Chan, Kelvin


    It is unclear which regimen is the most efficacious among the available therapies for advanced gastric cancer in the second-line setting. We performed a network meta-analysis to determine their relative benefits. We conducted a systematic review of randomized controlled trials (RCTs) through the MEDLINE, Embase, and Cochrane Central Register of Controlled Trials databases and American Society of Clinical Oncology abstracts up to June 2014 to identify phase III RCTs on advanced gastric cancer in the second-line setting. Overall survival (OS) data were the primary outcome of interest. Hazard ratios (HRs) were extracted from the publications on the basis of reported values or were extracted from survival curves by established methods. A Bayesian network meta-analysis was performed with WinBUGS to compare all regimens simultaneously. Eight RCTs (2439 patients) were identified and contained extractable data for quantitative analysis. Network meta-analysis showed that paclitaxel plus ramucirumab was superior to single-agent ramucirumab [OS HR 0.51, 95 % credible region (CR) 0.30-0.86], paclitaxel (OS HR 0.81, 95 % CR 0.68-0.96), docetaxel (OS HR 0.56, 95 % CR 0.33-0.94), and irinotecan (OS HR 0.71, 95 % CR 0.52-0.99). Paclitaxel plus ramucirumab also had an 89 % probability of being the best regimen among all these regimens. Single-agent ramucirumab, paclitaxel, docetaxel, and irinotecan were comparable to each other with respect to OS and were superior to best supportive care. This is the first network meta-analysis to compare all second-line regimens reported in phase III gastric cancer trials. The results suggest the paclitaxel plus ramucirumab combination is the most effective therapy and should be the reference regimen for future comparative trials.

  6. A comparison of Bayesian and Monte Carlo sensitivity analysis for unmeasured confounding. (United States)

    McCandless, Lawrence C; Gustafson, Paul


    Bias from unmeasured confounding is a persistent concern in observational studies, and sensitivity analysis has been proposed as a solution. In recent years, probabilistic sensitivity analysis using either Monte Carlo sensitivity analysis (MCSA) or Bayesian sensitivity analysis (BSA) has emerged as a practical analytic strategy when there are multiple bias parameter inputs. BSA uses Bayes theorem to formally combine evidence from the prior distribution and the data. In contrast, MCSA samples bias parameters directly from the prior distribution. Intuitively, one would think that BSA and MCSA ought to give similar results. Both methods use similar models and the same (prior) probability distributions for the bias parameters. In this paper, we illustrate the surprising finding that BSA and MCSA can give very different results. Specifically, we demonstrate that MCSA can give inaccurate uncertainty assessments (e.g. 95% intervals) that do not reflect the data's influence on uncertainty about unmeasured confounding. Using a data example from epidemiology and simulation studies, we show that certain combinations of data and prior distributions can result in dramatic prior-to-posterior changes in uncertainty about the bias parameters. This occurs because the application of Bayes theorem in a non-identifiable model can sometimes rule out certain patterns of unmeasured confounding that are not compatible with the data. Consequently, the MCSA approach may give 95% intervals that are either too wide or too narrow and that do not have 95% frequentist coverage probability. Based on our findings, we recommend that analysts use BSA for probabilistic sensitivity analysis. Copyright © 2017 John Wiley & Sons, Ltd.
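
    The MCSA side of this comparison can be sketched as follows. The sketch assumes a single binary unmeasured confounder U and the standard external-adjustment bias factor [1 + p1 (RR_UD - 1)] / [1 + p0 (RR_UD - 1)], where p1 and p0 are the prevalences of U among the exposed and unexposed and RR_UD is the confounder-disease risk ratio. The prior distributions and the observed risk ratio below are hypothetical, random sampling error is ignored, and this is not the authors' implementation.

```python
import math
import random


def bias_factor(p1, p0, rr_ud):
    """Ratio of observed to confounder-adjusted risk ratio for a binary confounder."""
    return (1.0 + p1 * (rr_ud - 1.0)) / (1.0 + p0 * (rr_ud - 1.0))


def mcsa(rr_observed, n_draws=10000, seed=0):
    """Draw bias parameters from their priors only (no updating by the data)
    and return the 2.5th, 50th and 97.5th percentiles of the adjusted risk ratio."""
    rng = random.Random(seed)
    adjusted = []
    for _ in range(n_draws):
        p1 = rng.uniform(0.2, 0.6)   # hypothetical prior: prevalence of U in exposed
        p0 = rng.uniform(0.1, 0.4)   # hypothetical prior: prevalence of U in unexposed
        rr_ud = math.exp(rng.gauss(math.log(2.0), 0.2))  # hypothetical log-normal prior
        adjusted.append(rr_observed / bias_factor(p1, p0, rr_ud))
    adjusted.sort()
    return adjusted[int(0.025 * n_draws)], adjusted[n_draws // 2], adjusted[int(0.975 * n_draws)]


low, median, high = mcsa(rr_observed=1.8)
print(f"adjusted RR: {median:.2f} (95% simulation interval {low:.2f} to {high:.2f})")
```

    Because the bias parameters are never confronted with the data, the resulting interval reflects only the priors, which is the feature the abstract identifies as the source of disagreement with BSA.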


    International Nuclear Information System (INIS)

    Malo, Lison; Doyon, René; Lafrenière, David; Artigau, Étienne; Gagné, Jonathan; Baron, Frédérique; Riedel, Adric


    We present a new method based on a Bayesian analysis to identify new members of nearby young kinematic groups. The analysis minimally takes into account the position, proper motion, magnitude, and color of a star, but other observables can be readily added (e.g., radial velocity, distance). We use this method to find new young low-mass stars in the β Pictoris and AB Doradus moving groups and in the TW Hydrae, Tucana-Horologium, Columba, Carina, and Argus associations. Starting from a sample of 758 mid-K to mid-M (K5V-M5V) stars showing youth indicators such as Hα and X-ray emission, our analysis yields 214 new highly probable low-mass members of the kinematic groups analyzed. One is in TW Hydrae, 37 in β Pictoris, 17 in Tucana-Horologium, 20 in Columba, 6 in Carina, 50 in Argus, 32 in AB Doradus, and the remaining 51 candidates are likely young but have an ambiguous membership to more than one association. The false alarm rate for new candidates is estimated to be 5% for β Pictoris and TW Hydrae, 10% for Tucana-Horologium, Columba, Carina, and Argus, and 14% for AB Doradus. Our analysis confirms the membership of 58 stars proposed in the literature. Firm membership confirmation of our new candidates will require measurement of their radial velocity (predicted by our analysis), parallax, and lithium 6708 Å equivalent width. We have initiated these follow-up observations for a number of candidates, and we have identified two stars (2MASSJ01112542+1526214, 2MASSJ05241914-1601153) as very strong candidate members of the β Pictoris moving group and one strong candidate member (2MASSJ05332558-5117131) of the Tucana-Horologium association; these three stars have radial velocity measurements confirming their membership and lithium detections consistent with young age.

  8. Cluster Analytical Method of Fault Risk Analysis in Systems (United States)

    Michaľčonok, German; Horalová Kalinová, Michaela


    In providing safety functions, the proposal of safety functions of control systems is an important part of a risk reduction strategy. In the specification of safety requirements, it is necessary to determine and document individual characteristics and the desired performance level for each safety function. This article presents the results of a cluster analysis experiment. The results of the experiment show that the methods of cluster analysis provide a suitable tool for analyzing the reliability of safety systems. Regarding the increasing complexity of the systems, we can state that the application of these methods in the subject area is a good choice.

  9. A Bayesian Nonparametric Approach for the Analysis of Multiple Categorical Item Responses. (United States)

    Waters, Andrew; Fronczyk, Kassandra; Guindani, Michele; Baraniuk, Richard G; Vannucci, Marina


    We develop a modeling framework for joint factor and cluster analysis of datasets where multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogenous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, therefore helping in the estimation of the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.

  10. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan


    Teaching quality is the lifeline of higher education. Many universities have made progress in evaluating teaching quality. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  11. Proteome Profiling of Vitreoretinal Diseases by Cluster Analysis


    Shitama, Tomomi; Hayashi, Hideyuki; Noge, Sumiyo; Uchio, Eiichi; Oshima, Kenji; Haniu, Hisao; Takemori, Nobuaki; Komori, Naoka; Matsumoto, Hiroyuki


    Vitreous samples collected in retinopathic surgeries have diverse properties, making proteomics analysis difficult. We report a cluster analysis to evade this difficulty. Vitreous and subretinal fluid samples were collected from 60 patients during surgical operation of non-proliferative diabetic retinopathy, proliferative diabetic retinopathy, proliferative vitreoretinopathy, and rhegmatogenous retinal detachment. For controls we collected vitreous fluid from patients of idiopathic macular ho...

  12. Cyclist activity and injury risk analysis at signalized intersections: a Bayesian modelling approach. (United States)

    Strauss, Jillian; Miranda-Moreno, Luis F; Morency, Patrick


    This study proposes a two-equation Bayesian modelling approach to simultaneously study cyclist injury occurrence and bicycle activity at signalized intersections as joint outcomes. This approach deals with the potential presence of endogeneity and unobserved heterogeneities and is used to identify factors associated with both cyclist injuries and volumes. Its application to identify high-risk corridors is also illustrated. Montreal, Quebec, Canada is the application environment, using an extensive inventory of a large sample of signalized intersections containing disaggregate motor-vehicle traffic volumes and bicycle flows, geometric design, traffic control and built environment characteristics in the vicinity of the intersections. Cyclist injury data for the period of 2003-2008 is used in this study. Also, manual bicycle counts were standardized using temporal and weather adjustment factors to obtain average annual daily volumes. Results confirm and quantify the effects of both bicycle and motor-vehicle flows on cyclist injury occurrence. Accordingly, more cyclists at an intersection translate into more cyclist injuries but lower injury rates due to the non-linear association between bicycle volume and injury occurrence. Furthermore, the results emphasize the importance of turning motor-vehicle movements. The presence of bus stops and total crosswalk length increase cyclist injury occurrence whereas the presence of a raised median has the opposite effect. Bicycle activity through intersections was found to increase as employment, number of metro stations, land use mix, area of commercial land use type, length of bicycle facilities and the presence of schools within 50-800 m of the intersection increase. Intersections with three approaches are expected to have fewer cyclists than those with four. Using Bayesian analysis, expected injury frequency and injury rates were estimated for each intersection and used to rank corridors. Corridors with high bicycle volumes

  13. Pattern recognition in menstrual bleeding diaries by statistical cluster analysis

    Directory of Open Access Journals (Sweden)

    Wessel Jens


    Background: The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries of clinical studies on various sex hormone containing drugs. Methods: We used the four cluster analysis methods single, average and complete linkage as well as the method of Ward for the pattern recognition in menstrual bleeding diaries. The optimal number of clusters was determined using the semi-partial R2, the cubic cluster criterion, the pseudo-F- and the pseudo-t2-statistic. Finally, the interpretability of the results from a gynecological point of view was assessed. Results: The method of Ward yielded distinct clusters of the bleeding diaries. The other methods successively chained the observations into one cluster. The optimal number of distinctive bleeding patterns was six. We found two desirable and four undesirable bleeding patterns. Cyclic and non cyclic bleeding patterns were well separated. Conclusion: Using this cluster analysis with the method of Ward, medications and devices having an impact on bleeding can be easily compared and categorized.
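
    For readers unfamiliar with the method of Ward, a minimal sketch of agglomerative clustering under Ward's criterion follows. At each step it merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares, n_a n_b / (n_a + n_b) * ||c_a - c_b||^2, where c denotes a cluster centroid. The input points are hypothetical two-dimensional features, not bleeding-diary data, and the naive search is meant for small inputs only.

```python
from itertools import combinations


def merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2


def ward(points, k):
    """Merge clusters greedily under Ward's criterion until k clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > k:
        # Find the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # Centroid of the union is the size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]


# Hypothetical feature vectors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = ward(points, k=2)
print(sorted(groups))
```

    Single linkage, by contrast, merges on the minimum pairwise distance, which is what makes it prone to the chaining behaviour reported in the results.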

  14. Breast cancer clustering in Kanagawa, Japan: a geographic analysis. (United States)

    Katayama, Kayoko; Yokoyama, Kazuhito; Yako-Suketomo, Hiroko; Okamoto, Naoyuki; Tango, Toshiro; Inaba, Yutaka


    The purpose of the present study was to determine geographic clustering of breast cancer incidence in Kanagawa Prefecture, using cancer registry data. The study also aimed at examining the association between socio-economic factors and any identified cluster. Incidence data were collected for women who were first diagnosed with breast cancer during the period from January to December 2006 in Kanagawa. The data consisted of 2,326 incidence cases extracted from the total of 34,323 Kanagawa Cancer Registration data issued in 2011. To adjust for differences in age distribution, the standardized mortality ratio (SMR) and the standardized incidence ratio (SIR) of breast cancer were calculated for each of 56 municipalities (e.g., city, special ward, town, and village) in Kanagawa by an indirect method using Kanagawa female population data. Spatial scan statistics were used to detect any area of elevated risk as a cluster for breast cancer deaths and/or incidences. The Student t-test was performed to examine differences in socio-economic variables, viz, persons per household, total fertility rate, age at first marriage for women, and marriage rate, between cluster and other regions. There was a statistically significant cluster of breast cancer incidence (p=0.001) composed of 11 municipalities in the southeastern area of Kanagawa Prefecture, whose SIR was 35 percent higher than that of the remainder of Kanagawa Prefecture. In this cluster, the average age at first marriage for women was significantly higher than in the rest of Kanagawa (p=0.017). No statistically significant clusters of breast cancer deaths were detected (p=0.53). There was a statistically significant cluster of high breast cancer incidence in the southeastern area of Kanagawa Prefecture. It was suggested that the cluster region was related to the tendency to marry later. This study methodology will be helpful in the analysis of geographical disparities in cancer deaths and incidence.
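
    Indirect standardization, as used for the SIR above, compares the observed number of cases in an area with the number expected if the reference population's age-specific rates applied to the area's own age structure: SIR = observed / sum over age groups of (local population x reference rate). The sketch below uses hypothetical age bands, populations, rates and case counts, not figures from the Kanagawa registry.

```python
def expected_cases(population_by_age, reference_rates):
    """Cases expected if reference age-specific rates applied to the local population."""
    return sum(population_by_age[age] * reference_rates[age] for age in population_by_age)


def standardized_incidence_ratio(observed, population_by_age, reference_rates):
    """SIR = observed cases / expected cases (indirect standardization)."""
    return observed / expected_cases(population_by_age, reference_rates)


# Hypothetical female population of one municipality, by age band.
population = {"40-49": 10000, "50-59": 8000, "60+": 12000}
# Hypothetical reference incidence rates per person-year for the whole prefecture.
rates = {"40-49": 0.0010, "50-59": 0.0015, "60+": 0.0020}

sir = standardized_incidence_ratio(62, population, rates)
print(f"expected {expected_cases(population, rates):.1f} cases, SIR = {sir:.2f}")
```

    An SIR above 1 indicates more cases than the age structure alone would predict; the SMR is computed the same way with deaths in place of incident cases.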

  15. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim


    This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D) and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP) clustering, which is effective while grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF) weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO) to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
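
    The TF-IDF weighting mentioned above can be sketched briefly. The sketch uses one common variant (term frequency as a share of document length, inverse document frequency as log(N / df)); the paper does not state which variant it used, and the short "abstracts" below are hypothetical stand-ins for patent text.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical patent abstracts reduced to a few keywords.
abstracts = [
    "battery charging system",
    "battery cooling system",
    "motor control unit",
]
weights = tf_idf(abstracts)
ranked = sorted(weights[0], key=weights[0].get, reverse=True)
print(ranked)  # terms of the first abstract, most distinctive first
```

    Terms that appear in few documents rank highest, which is why such weights are useful for labelling clusters once the documents have been grouped.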

  16. No control genes required: Bayesian analysis of qRT-PCR data.

    Directory of Open Access Journals (Sweden)

    Mikhail V Matz

    Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.

  17. No control genes required: Bayesian analysis of qRT-PCR data. (United States)

    Matz, Mikhail V; Wright, Rachel M; Scott, James G


    Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.
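
    For context, the standard delta-delta Ct calculation that the abstract uses as its baseline can be sketched as below. This is the classic approach, not the authors' Bayesian method or the MCMC.qpcr API. It assumes one target gene, one control (reference) gene, and perfect doubling per PCR cycle; the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression by the delta-delta Ct method.

    delta Ct       = Ct(target) - Ct(reference gene), within each condition
    delta-delta Ct = delta Ct(treated) - delta Ct(control)
    fold change    = 2 ** (-delta-delta Ct), assuming 100% amplification efficiency
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


# Hypothetical mean Ct values: the target crosses threshold 2 cycles earlier
# in treated samples while the reference gene is unchanged.
fold = fold_change_ddct(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold)
```

    The calculation depends entirely on the reference gene being stable across conditions, which is the assumption the model-based approach described above is designed to relax.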

  18. clusters

    Indian Academy of Sciences (India)


    Sep 27, 2017 ... while CuCoNO, Co3NO, Cu3CoNO, Cu2Co3NO, Cu3Co3NO and Cu6CoNO clusters display stronger chemical stability. Magnetic and electronic properties are also discussed. The magnetic moment is affected by charge transfer and the spd hybridization. Keywords. CumConNO (m + n = 2–7) clusters; ...

  19. A Hybrid Approach for Reliability Analysis Based on Analytic Hierarchy Process and Bayesian Network

    International Nuclear Information System (INIS)

    Zubair, Muhammad


    By using the analytic hierarchy process (AHP) and Bayesian networks (BN), the present research highlights the technical and non-technical issues behind nuclear accidents. The study found that technical faults were one major cause of these accidents. From another point of view, it becomes clear that human behaviors such as dishonesty, insufficient training, and selfishness also play a key role in causing these accidents. In this study, a hybrid approach for reliability analysis based on AHP and BN has been developed to increase nuclear power plant (NPP) safety. Using AHP, the best alternative for improving safety, design, and operation, and for allocating budget across all technical and non-technical factors related to nuclear safety, has been investigated. We use a special structure of BN based on the AHP method. The graphs of the BN and the probabilities associated with its nodes are designed to translate the knowledge of experts on the selection of the best alternative. The results show that improvements in regulatory authorities will decrease failure probabilities and increase safety and reliability in the industrial area.

  20. Nonparametric Bayesian inference for mean residual life functions in survival analysis. (United States)

    Poynor, Valerie; Kottas, Athanasios


    Modeling and inference for survival analysis problems typically revolves around different functions related to the survival distribution. Here, we focus on the mean residual life (MRL) function, which provides the expected remaining lifetime given that a subject has survived (i.e. is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the MRL function characterizes the survival distribution. We develop general Bayesian nonparametric inference for MRL functions built from a Dirichlet process mixture model for the associated survival distribution. The resulting model for the MRL function admits a representation as a mixture of the kernel MRL functions with time-dependent mixture weights. This model structure allows for a wide range of shapes for the MRL function. Particular emphasis is placed on the selection of the mixture kernel, taken to be a gamma distribution, to obtain desirable properties for the MRL function arising from the mixture model. The inference method is illustrated with a data set of two experimental groups and a data set involving right censoring. The supplementary material available at Biostatistics online provides further results on empirical performance of the model, using simulated data examples. © The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail:
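    The mixture representation mentioned in this abstract can be checked with a short derivation. The sketch below assumes a finite mixture with fixed weights $w_j$ and kernel survival functions $S_j$ purely for notational convenience (the abstract describes a Dirichlet process mixture); the symbols $w_j$, $S_j$, $m_j$ and $q_j$ are introduced here for illustration and are not taken from the paper.

```latex
% Mean residual life (MRL) of a survival function S:
m(t) = \mathrm{E}[\,T - t \mid T > t\,] = \frac{\int_t^\infty S(u)\,du}{S(t)} .

% Mixture survival function with kernel components S_j and weights w_j:
S(t) = \sum_j w_j S_j(t), \qquad
\int_t^\infty S(u)\,du = \sum_j w_j \int_t^\infty S_j(u)\,du = \sum_j w_j S_j(t)\, m_j(t) .

% Hence the mixture MRL is a weighted average of the kernel MRL functions m_j,
% with weights that depend on t:
m(t) = \sum_j q_j(t)\, m_j(t), \qquad
q_j(t) = \frac{w_j S_j(t)}{\sum_l w_l S_l(t)} .
```

    The weights $q_j(t)$ are nonnegative and sum to one for every $t$, which is what "time-dependent mixture weights" amounts to in this simplified setting.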

  1. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan


    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  2. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

    KAUST Repository

    Rubio, Francisco J.


    We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.

  3. Identification of Watershed-scale Critical Source Areas Using Bayesian Maximum Entropy Spatiotemporal Analysis (United States)

    Roostaee, M.; Deng, Z.


    The states' environmental agencies are required by The Clean Water Act to assess all waterbodies and evaluate potential sources of impairments. Spatial and temporal distributions of water quality parameters are critical in identifying Critical Source Areas (CSAs). However, due to limitations in monetary resources and a large number of waterbodies, available monitoring stations are typically sparse with intermittent periods of data collection. Hence, scarcity of water quality data is a major obstacle in addressing sources of pollution through management strategies. In this study spatiotemporal Bayesian Maximum Entropy method (BME) is employed to model the inherent temporal and spatial variability of measured water quality indicators such as Dissolved Oxygen (DO) concentration for Turkey Creek Watershed. Turkey Creek is located in northern Louisiana and has been listed in 303(d) list for DO impairment since 2014 in Louisiana Water Quality Inventory Reports due to agricultural practices. BME method is proved to provide more accurate estimates than the methods of purely spatial analysis by incorporating space/time distribution and uncertainty in available measured soft and hard data. This model would be used to estimate DO concentration at unmonitored locations and times and subsequently identifying CSAs. The USDA's crop-specific land cover data layers of the watershed were then used to determine those practices/changes that led to low DO concentration in identified CSAs. Primary results revealed that cultivation of corn and soybean as well as urban runoff are main contributing sources in low dissolved oxygen in Turkey Creek Watershed.

  4. Assessment of occupational safety risks in Floridian solid waste systems using Bayesian analysis. (United States)

    Bastani, Mehrad; Celik, Nurcin


    Safety risks embedded within solid waste management systems continue to be a significant issue and are prevalent at every step in the solid waste management process. To recognise and address these occupational hazards, it is necessary to discover the potential safety concerns that cause them, as well as their direct and/or indirect impacts on the different types of solid waste workers. In this research, our goal is to statistically assess occupational safety risks to solid waste workers in the state of Florida. Here, we first review the related standard industrial codes to major solid waste management methods including recycling, incineration, landfilling, and composting. Then, a quantitative assessment of major risks is conducted based on the data collected using a Bayesian data analysis and predictive methods. The risks estimated in this study for the period of 2005-2012 are then compared with historical statistics (1993-1997) from previous assessment studies. The results have shown that the injury rates among refuse collectors in both musculoskeletal and dermal injuries have decreased from 88 and 15 to 16 and three injuries per 1000 workers, respectively. However, a contrasting trend is observed for the injury rates among recycling workers, for whom musculoskeletal and dermal injuries have increased from 13 and four injuries to 14 and six injuries per 1000 workers, respectively. Lastly, a linear regression model has been proposed to identify major elements of the high number of musculoskeletal and dermal injuries. © The Author(s) 2015.

  5. Dissecting high-dimensional phenotypes with bayesian sparse factor analysis of genetic covariance matrices. (United States)

    Runcie, Daniel E; Mukherjee, Sayan


    Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression where the number of traits assayed per individual can reach many thousand. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed-effects model. The key idea of our model is that we need consider only G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse - affecting only a few observed traits. The advantages of this approach are twofold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set.

  6. Integration of Bayesian analysis for eutrophication prediction and assessment in a landscape lake. (United States)

    Yang, Likun; Zhao, Xinhua; Peng, Sen; Zhou, Guangyu


    Eutrophication models have been widely used to assess water quality in landscape lakes. Because flow rate in landscape lakes is relatively low and similar to that of natural lakes, eutrophication is more dominant in landscape lakes. To assess the risk of eutrophication in landscape lakes, a set of dynamic equations was developed to simulate lake water quality for total nitrogen (TN), total phosphorus (TP), dissolved oxygen (DO) and chlorophyll a (Chl a). Firstly, the Bayesian calibration results were described. Moreover, the ability of the model to reproduce adequately the observed mean patterns and major cause-effect relationships for water quality conditions in landscape lakes was presented. Two loading scenarios were used. A Monte Carlo algorithm was applied to calculate the predicted water quality distributions, which were used in the established hierarchical assessment system for lake water quality risk. The important factors affecting the lake water quality risk were defined using linear regression analysis. The results indicated that the variations in the landscape lake receiving recharge water quality caused considerable landscape lake water quality risk in the surrounding area. Moreover, the Chl a concentration in lake water was significantly affected by TP and TN concentrations; the lake TP concentration was the limiting factor for growth of plankton in lake water. The lake water TN concentration provided the basic nutritional requirements. Lastly, lower TN and TP concentrations in the receiving recharge water caused increased lake water quality risk.

  7. Bayesian probability analysis: a prospective demonstration of its clinical utility in diagnosing coronary disease

    International Nuclear Information System (INIS)

    Detrano, R.; Yiannikas, J.; Salcedo, E.E.; Rincon, G.; Go, R.T.; Williams, G.; Leatherman, J.


    One hundred fifty-four patients referred for coronary arteriography were prospectively studied with stress electrocardiography, stress thallium scintigraphy, cine fluoroscopy (for coronary calcifications), and coronary angiography. Pretest probabilities of coronary disease were determined based on age, sex, and type of chest pain. These and pooled literature values for the conditional probabilities of test results based on disease state were used in Bayes theorem to calculate posttest probabilities of disease. The results of the three noninvasive tests were compared for statistical independence, a necessary condition for their simultaneous use in Bayes theorem. The test results were found to demonstrate pairwise independence in patients with and those without disease. Some dependencies that were observed between the test results and the clinical variables of age and sex were not sufficient to invalidate application of the theorem. Sixty-eight of the study patients had at least one major coronary artery obstruction of greater than 50%. When these patients were divided into low-, intermediate-, and high-probability subgroups according to their pretest probabilities, noninvasive test results analyzed by Bayesian probability analysis appropriately advanced 17 of them by at least one probability subgroup while only seven were moved backward. Of the 76 patients without disease, 34 were appropriately moved into a lower probability subgroup while 10 were incorrectly moved up. We conclude that posttest probabilities calculated from Bayes theorem more accurately classified patients with and without disease than did pretest probabilities, thus demonstrating the utility of the theorem in this application
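    The calculation described in this abstract, combining a pretest probability with several test results through Bayes theorem, can be sketched as follows. This is a minimal illustration that assumes the tests are conditionally independent given disease state (the condition the abstract says was checked); the function name and all sensitivity and specificity values are hypothetical and are not taken from the study.

```python
def posttest_probability(pretest, results):
    """Update a pretest probability with a sequence of test results.

    results: iterable of (sensitivity, specificity, positive) tuples.
    Assumes the tests are conditionally independent given disease state,
    so their likelihood ratios can simply be multiplied.
    """
    odds = pretest / (1.0 - pretest)  # probability -> odds
    for sensitivity, specificity, positive in results:
        if positive:
            likelihood_ratio = sensitivity / (1.0 - specificity)
        else:
            likelihood_ratio = (1.0 - sensitivity) / specificity
        odds *= likelihood_ratio
    return odds / (1.0 + odds)  # odds -> probability


# Hypothetical example: 30% pretest probability, one positive test
# (sensitivity 0.80, specificity 0.90) and one negative test
# (sensitivity 0.70, specificity 0.80).
p = posttest_probability(0.30, [(0.80, 0.90, True), (0.70, 0.80, False)])
print(round(p, 4))
```

    Working in odds keeps each update to a single multiplication; applying the tests one at a time gives the same result as applying Bayes theorem once to the joint result, but only under the independence assumption stated above.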

  8. Paired Comparison Analysis of the van Baaren Model Using Bayesian Approach with Noninformative Prior

    Directory of Open Access Journals (Sweden)

    Saima Altaf


    One technique commonly studied these days because of its attractive applications for the comparison of several objects is the method of paired comparisons. This technique permits the ranking of the objects by means of a score, which reflects the merit of the items on a linear scale. The present study is concerned with the Bayesian analysis of a paired comparison model, namely the van Baaren model VI, using a noninformative uniform prior. For this purpose, the joint posterior distribution for the parameters of the model, their marginal distributions, posterior estimates (means and modes), the posterior probabilities for comparing the two treatment parameters and the predictive probabilities are obtained.

  9. Bayesian model accounting for within-class biological variability in Serial Analysis of Gene Expression (SAGE)

    Directory of Open Access Journals (Sweden)

    Brentani Helena


    Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS), is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model, are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression to a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user friendly web-based on-line tool or as R language scripts at supplemental web-site.

  10. Bayesian model accounting for within-class biological variability in Serial Analysis of Gene Expression (SAGE). (United States)

    Vêncio, Ricardo Z N; Brentani, Helena; Patrão, Diogo F C; Pereira, Carlos A B


    An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS), is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. We introduce a Bayesian model that accounts for the within-class variability by means of mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model, are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression to a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user friendly web-based on-line tool or as R language scripts at supplemental web-site.

  11. To be certain about the uncertainty: Bayesian statistics for 13C metabolic flux analysis. (United States)

    Theorell, Axel; Leweke, Samuel; Wiechert, Wolfgang; Nöh, Katharina


    13C Metabolic Flux Analysis (13C MFA) remains the most powerful approach to determine intracellular metabolic reaction rates. Decisions on strain engineering and experimentation heavily rely upon the certainty with which these fluxes are estimated. For uncertainty quantification, the vast majority of 13C MFA studies rely on confidence intervals from the paradigm of Frequentist statistics. However, it is well known that the confidence intervals for a given experimental outcome are not uniquely defined. As a result, confidence intervals produced by different methods can be different, but nevertheless equally valid. This is of high relevance to 13C MFA, since practitioners regularly use three different approximate approaches for calculating confidence intervals. By means of a computational study with a realistic model of the central carbon metabolism of E. coli, we provide strong evidence that confidence intervals used in the field depend strongly on the technique with which they were calculated and, thus, their use leads to misinterpretation of the flux uncertainty. In order to provide a better alternative to confidence intervals in 13C MFA, we demonstrate that credible intervals from the paradigm of Bayesian statistics give more reliable flux uncertainty quantifications which can be readily computed with high accuracy using Markov chain Monte Carlo. In addition, the widely applied chi-square test, as a means of testing whether the model reproduces the data, is examined more closely. © 2017 Wiley Periodicals, Inc.

  12. A Bayesian Approach to the Design and Analysis of Computer Experiments

    Energy Technology Data Exchange (ETDEWEB)

    Currin, C.


    We consider the problem of designing and analyzing experiments for prediction of the function $y(t)$, $t \in T$, where $y$ is evaluated by means of a computer code (typically by solving complicated equations that model a physical system), and $T$ represents the domain of inputs to the code. We use a Bayesian approach, in which uncertainty about $y$ is represented by a spatial stochastic process (random function); here we restrict attention to stationary Gaussian processes. The posterior mean function can be used as an interpolating function, with uncertainties given by the posterior standard deviations. Instead of completely specifying the prior process, we consider several families of priors, and suggest some cross-validational methods for choosing one that performs relatively well on the function at hand. As a design criterion, we use the expected reduction in the entropy of the random vector $y(T^*)$, where $T^* \subset T$ is a given finite set of "sites" (input configurations) at which predictions are to be made. We describe an exchange algorithm for constructing designs that are optimal with respect to this criterion. To demonstrate the use of these design and analysis methods, several examples are given, including one experiment on a computer model of a thermal energy storage device and another on an integrated circuit simulator.
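    The idea of using a Gaussian-process posterior mean as an interpolator of computer-code output, with posterior standard deviations as uncertainties, can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration: a zero-mean prior, a squared-exponential covariance with fixed hyperparameters, a one-dimensional input, and a toy function standing in for the computer code. It does not reproduce the prior families, cross-validation, or entropy-based design criterion of the abstract.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=0.3, variance=1.0):
    """Stationary squared-exponential covariance between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(t_train, y_train, t_new, jitter=1e-10):
    """Posterior mean and standard deviation of a zero-mean GP at t_new."""
    K = sq_exp_kernel(t_train, t_train) + jitter * np.eye(len(t_train))
    K_s = sq_exp_kernel(t_new, t_train)
    K_ss = sq_exp_kernel(t_new, t_new)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    # Clip tiny negative values caused by round-off before taking the root.
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, sd


def toy_code(t):
    """Hypothetical stand-in for an expensive deterministic computer code."""
    return np.sin(2.0 * np.pi * t)


t_train = np.linspace(0.0, 1.0, 6)  # design sites where the code is run
y_train = toy_code(t_train)

# At the design sites the posterior mean reproduces the code output.
mean_at_design, sd_at_design = gp_posterior(t_train, y_train, t_train)

# At new sites the mean is a prediction and sd quantifies its uncertainty.
t_new = np.array([0.1, 0.5, 0.9])
mean_new, sd_new = gp_posterior(t_train, y_train, t_new)
print(mean_new, sd_new)
```

    Because the code output is treated as noise-free, only a small jitter term is added to the covariance matrix for numerical stability, so the posterior mean interpolates the training runs and the posterior standard deviation is essentially zero at the design sites.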

  13. Composite behavior analysis for video surveillance using hierarchical dynamic Bayesian networks (United States)

    Cheng, Huanhuan; Shan, Yong; Wang, Runsheng


    Analyzing composite behaviors involving objects from multiple categories in surveillance videos is a challenging task due to the complicated relationships among humans and objects. This paper presents a novel behavior analysis framework using a hierarchical dynamic Bayesian network (DBN) for video surveillance systems. The model is built for extracting objects' behaviors and their relationships by representing behaviors using spatial-temporal characteristics. The recognition of object behaviors is processed by the DBN at multiple levels: features of objects at low level, objects and their relationships at middle level, and event at high level, where event refers to behaviors of a single type of object as well as behaviors consisting of several types of objects such as "a person getting in a car." Furthermore, to reduce the complexity, a simple model selection criterion is addressed, by which the appropriate model is picked out from a pool of candidate models. Experiments are shown to demonstrate that the proposed framework could efficiently recognize and semantically describe composite object and human activities in surveillance videos.

  14. Intrinsic Properties of tRNA Molecules as Deciphered via Bayesian Network and Distribution Divergence Analysis

    Directory of Open Access Journals (Sweden)

    Sergio Branciamore


    The identity/recognition of tRNAs, in the context of aminoacyl tRNA synthetases (and other molecules), is a complex phenomenon that has major implications ranging from the origins and evolution of translation machinery and genetic code to the evolution and speciation of tRNAs themselves to human mitochondrial diseases to artificial genetic code engineering. Deciphering it via laboratory experiments, however, is difficult and necessarily time- and resource-consuming. In this study, we propose a mathematically rigorous two-pronged in silico approach to identifying and classifying tRNA positions important for tRNA identity/recognition, rooted in machine learning and information-theoretic methodology. We apply Bayesian Network modeling to elucidate the structure of intra-tRNA-molecule relationships, and distribution divergence analysis to identify meaningful inter-molecule differences between various tRNA subclasses. We illustrate the complementary application of these two approaches using tRNA examples across the three domains of life, and identify and discuss important (informative) positions therein. In summary, we deliver to the tRNA research community a novel, comprehensive methodology for identifying the specific elements of interest in various tRNA molecules, which can be followed up by the corresponding experimental work and/or high-resolution position-specific statistical analyses.

  15. Associations between sexual habits, menstrual hygiene practices, demographics and the vaginal microbiome as revealed by Bayesian network analysis


    Noyes, Noelle; Cho, Kyu-Chul; Ravel, Jacques; Forney, Larry J.; Abdo, Zaid


    The vaginal microbiome plays an influential role in several disease states in reproductive age women, including bacterial vaginosis (BV). While demographic characteristics are associated with differences in vaginal microbiome community structure, little is known about the influence of sexual and hygiene habits. Furthermore, associations between the vaginal microbiome and risk symptoms of bacterial vaginosis have not been fully elucidated. Using Bayesian network (BN) analysis of 16S rRNA gene ...

  16. Bayesian Analysis for Food-Safety Risk Assessment: Evaluation of Dose-Response Functions within WinBUGS


    Williams, Michael S.; Ebel, Eric D.; Hoeting, Jennifer A.


    Bayesian methods are becoming increasingly popular in the field of food-safety risk assessment. Risk assessment models often require the integration of a dose-response function over the distribution of all possible doses of a pathogen ingested with a specific food. This requires the evaluation of an integral for every sample for a Markov chain Monte Carlo analysis of a model. While many statistical software packages have functions that allow for the evaluation of the integral, this functional...

  17. The phylogeographic history of the new world screwworm fly, inferred by approximate bayesian computation analysis.

    Directory of Open Access Journals (Sweden)

    Pablo Fresia

    Insect pest phylogeography might be shaped both by biogeographic events and by human influence. Here, we conducted an approximate Bayesian computation (ABC) analysis to investigate the phylogeography of the New World screwworm fly, Cochliomyia hominivorax, with the aim of understanding its population history and its order and time of divergence. Our ABC analysis supports that populations spread from North to South in the Americas, in at least two different moments. The first split occurred between the North/Central American and South American populations at the end of the Last Glacial Maximum (15,300-19,000 YBP). The second split occurred between the North and South Amazonian populations in the transition between the Pleistocene and the Holocene eras (9,100-11,000 YBP). The species also experienced population expansion. Phylogenetic analysis likewise suggests this north to south colonization and Maxent models suggest an increase in the number of suitable areas in South America from the past to present. We found that the phylogeographic patterns observed in C. hominivorax cannot be explained only by climatic oscillations and can be connected to host population histories. Interestingly we found these patterns are very coincident with general patterns of ancient human movements in the Americas, suggesting that humans might have played a crucial role in shaping the distribution and population structure of this insect pest. This work presents the first hypothesis test regarding the processes that shaped the current phylogeographic structure of C. hominivorax and represents an alternate perspective on investigating the problem of insect pests.

  18. Recurrent-neural-network-based Boolean factor analysis and its application to word clustering. (United States)

    Frolov, Alexander A; Husek, Dusan; Polyakov, Pavel Yu


    The objective of this paper is to introduce a neural-network-based algorithm for word clustering as an extension of the neural-network-based Boolean factor analysis algorithm (Frolov , 2007). It is shown that this extended algorithm supports even the more complex model of signals that are supposed to be related to textual documents. It is hypothesized that every topic in textual data is characterized by a set of words which coherently appear in documents dedicated to a given topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, hence, revealing them as attractors of the network dynamics. The found factors are eliminated from the network memory by the Hebbian unlearning rule facilitating the search of other factors. Topics related to the found sets of words can be identified based on the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for the following purposes: first, to provide a complete description of factors in terms of component probability, and second, to enhance the accuracy of classification of signals to determine whether it contains the factor. Since it is assumed that every word may possibly contribute to several topics, the proposed method might be related to the method of fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this attempt, the method is applied to two types of textual data on neural networks in two different languages. The obtained topics and corresponding words are at a good level of agreement despite the fact that identical topics in Russian and English conferences contain different sets of keywords.

  19. Traffic Accident, System Model and Cluster Analysis in GIS

    Directory of Open Access Journals (Sweden)

    Veronika Vlčková


    Traffic accidents are a frequent topic both in mainstream journalism and among professionals. This article directs attention to a less familiar context of accidents, with the help of constructive systems theory and its methods, cluster analysis and geoinformation engineering. A traffic accident reshapes space-time, and can therefore be studied with the tools of geographic information system technology. Applying a systems approach enables the formulation of a system model, which is handled with the tools of geoinformation engineering and of multicriteria and cluster analysis.

  20. Application of microarray analysis on computer cluster and cloud platforms. (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J


    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
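
    The independence of permutation iterations described above can be sketched as follows. This is a minimal illustration, not the authors' microarray workflow: the function names, the two-sample mean-difference statistic, and the chunking scheme are assumptions. A thread pool is used only to keep the sketch portable; for CPU-bound work a process pool or a cluster job array would be the realistic choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def perm_chunk(args):
    """Count permutations whose |mean difference| reaches the observed one."""
    x, y, observed, n_perm, seed = args
    rng = random.Random(seed)            # independent random stream per chunk
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(x)]) - mean(pooled[len(x):])
        if abs(diff) >= abs(observed):
            hits += 1
    return hits


def permutation_p_value(x, y, n_perm=2000, n_chunks=4, seed=0):
    """Two-sided permutation p-value, computed in independent chunks."""
    observed = mean(x) - mean(y)
    per_chunk = n_perm // n_chunks
    # Each task is self-contained, so chunks could run on separate machines.
    tasks = [(x, y, observed, per_chunk, seed + i) for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        hits = sum(pool.map(perm_chunk, tasks))
    total = per_chunk * n_chunks
    return (hits + 1) / (total + 1)      # add-one correction avoids p = 0
```

    Because each chunk carries its own seed and data, the only communication needed is the final sum of hit counts, which is why such procedures map easily onto either a local cluster or rented cloud instances.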

  1. Bayesian biostatistics

    CERN Document Server

    Lesaffre, Emmanuel


    The growth of biostatistics has been phenomenal in recent years and has been marked by considerable technical innovation in both methodology and computational practicality. One area that has experienced significant growth is Bayesian methods. The growing use of Bayesian methodology has taken place partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. In addition, computational advances have allowed for more complex models to be fitted routinely to realistic data sets. Through examples, exercises and a combination of introd

  2. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

    Directory of Open Access Journals (Sweden)

    Hero Alfred


    Full Text Available Abstract Background Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  3. Bayesian Sensitivity Analysis of a Nonlinear Dynamic Factor Analysis Model with Nonparametric Prior and Possible Nonignorable Missingness. (United States)

    Tang, Niansheng; Chow, Sy-Miin; Ibrahim, Joseph G; Zhu, Hongtu


    Many psychological concepts are unobserved and usually represented as latent factors apprehended through multiple observed indicators. When multiple-subject multivariate time series data are available, dynamic factor analysis models with random effects offer one way of modeling patterns of within- and between-person variations by combining factor analysis and time series analysis at the factor level. Using the Dirichlet process (DP) as a nonparametric prior for individual-specific time series parameters further allows the distributional forms of these parameters to deviate from commonly imposed (e.g., normal or other symmetric) functional forms, arising as a result of these parameters' restricted ranges. Given the complexity of such models, a thorough sensitivity analysis is critical but computationally prohibitive. We propose a Bayesian local influence method that allows for simultaneous sensitivity analysis of multiple modeling components within a single fitting of the model of choice. Five illustrations and an empirical example are provided to demonstrate the utility of the proposed approach in facilitating the detection of outlying cases and common sources of misspecification in dynamic factor analysis models, as well as identification of modeling components that are sensitive to changes in the DP prior specification.

  4. Identifying clinical course patterns in SMS data using cluster analysis. (United States)

    Kent, Peter; Kongsted, Alice


    Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important subgroups in the outcomes of research studies. Two previous studies have investigated detailed clinical course patterns in SMS data obtained from people seeking care for low back pain. One used a visual analysis approach and the other performed a cluster analysis of SMS data that had first been transformed by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole group, by including all SMS time points in their original form. It was a 'proof of concept' study to explore the potential, clinical relevance, strengths and weakness of such an approach. This was a secondary analysis of longitudinal SMS data collected in two randomised controlled trials conducted simultaneously from a single clinical population (n = 322). Fortnightly SMS data collected over a year on 'days of problematic low back pain' and on 'days of sick leave' were analysed using Two-Step (probabilistic) Cluster Analysis. Clinical course patterns were identified that were clinically interpretable and different from those of the whole group. Similar patterns were obtained when the number of SMS time points was reduced to monthly. The advantages and disadvantages of this method were contrasted to that of first transforming SMS data by spline analysis. This study showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there

  5. A Bayesian SIRS model for the analysis of respiratory syncytial virus in the region of Valencia, Spain. (United States)

    Corberán-Vallet, Ana; Santonja, Francisco J


    We present a Bayesian stochastic susceptible-infected-recovered-susceptible (SIRS) model in discrete time to understand respiratory syncytial virus dynamics in the region of Valencia, Spain. A SIRS model based on ordinary differential equations has also been proposed to describe RSV dynamics in the region of Valencia. However, this continuous-time deterministic model is not suitable when the initial number of infected individuals is small. Stochastic epidemic models based on a probability of disease transmission provide a more natural description of the spread of infectious diseases. In addition, by allowing the transmission rate to vary stochastically over time, the proposed model provides an improved description of RSV dynamics. The Bayesian analysis of the model allows us to calculate both the posterior distribution of the model parameters and the posterior predictive distribution, which facilitates the computation of point forecasts and prediction intervals for future observations. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
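
    A discrete-time stochastic SIRS process of the general kind described above can be sketched as a chain-binomial simulation. This is only a forward simulator under stated assumptions: the parameter names (beta, gamma, omega), the infection probability 1 - exp(-beta * I / N), and the fixed transmission rate are illustrative choices, whereas the paper lets the transmission rate vary stochastically and estimates parameters in a Bayesian framework.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_sirs(s0, i0, r0, beta, gamma, omega, steps, seed=0):
    """Simulate (S, I, R) counts; all rates are hypothetical illustration values."""
    rng = random.Random(seed)
    s, i, r = s0, i0, r0
    n = s + i + r
    path = [(s, i, r)]
    for _ in range(steps):
        p_inf = 1.0 - math.exp(-beta * i / n)   # per-susceptible infection probability
        new_inf = binomial(rng, s, p_inf)       # S -> I
        new_rec = binomial(rng, i, gamma)       # I -> R
        new_sus = binomial(rng, r, omega)       # R -> S (loss of immunity)
        s = s - new_inf + new_sus
        i = i + new_inf - new_rec
        r = r + new_rec - new_sus
        path.append((s, i, r))
    return path
```

    Because transitions are integer-valued random draws, the simulator behaves sensibly when the initial number of infected individuals is small, which is the situation where the abstract notes that a deterministic ODE model is unsuitable.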

  6. Applied Hierarchical Cluster Analysis with Average Linkage Algorithm

    Directory of Open Access Journals (Sweden)

    Cindy Cahyaning Astuti


    Full Text Available This research was conducted in Sidoarjo District, using secondary data from the book "Kabupaten Sidoarjo Dalam Angka 2016". The authors chose 12 variables that represent sub-district characteristics in Sidoarjo across four sectors: geography, education, agriculture and industry. Determining how evenly geographical conditions, education, agriculture and industry are distributed across sub-districts requires an analysis that classifies sub-districts by these characteristics. Hierarchical cluster analysis is an analytical technique that classifies objects into relatively homogeneous groups, called clusters. The results are expected to provide information about the dominant and non-dominant sub-district characteristics in the four sectors, based on the clusters formed.
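
    Agglomerative clustering with average linkage, the technique named in the title, can be sketched in a few lines. This is a minimal illustration under assumptions: the Euclidean distance, the function name, and the stopping rule (a fixed number of clusters) are choices made here, and the abstract does not say which software or distance the authors used.

```python
import math
from itertools import combinations


def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Average linkage: mean pairwise distance between members of a and b.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]   # a < b, so deleting b is safe
        del clusters[b]
    return clusters
```

    For a study like this one, each point would be one sub-district's vector of (standardized) variables; standardizing first matters because variables on larger scales otherwise dominate the distances.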

  7. Examining lower urinary tract symptom constellations using cluster analysis. (United States)

    Coyne, Karin S; Matza, Louis S; Kopp, Zoe S; Thompson, Christine; Henry, David; Irwin, Debra E; Artibani, Walter; Herschorn, Sender; Milsom, Ian


    To gain a better understanding of how patients experience lower urinary tract symptoms (LUTS) and to determine whether particular symptoms cluster together, as LUTS seldom occur alone. A secondary analysis of a cross-sectional, population-based survey of adults in Sweden, Italy, Germany, UK and Canada was undertaken to examine the presence of LUTS groups. Of the 19,165 telephone surveys, 13,519 respondents reported at least one LUTS and were included in the analysis. All respondents were asked about the presence of 14 LUTS (International Prostate Symptom Score plus seven additional LUTS). K-means cluster analysis, a statistical method for sorting objects into groups so that similar objects are grouped together, was used to identify groups of people based on their symptoms. Men and women were analysed separately. A split-half random sample was selected from the dataset so that exploratory analyses could be conducted in one half and confirmed in the second. On model confirmation, the sample was analysed in its entirety. Included in this analysis were 5014 men (mean age 49.8 years; 95% white) and 8505 women (mean age 50.4 years; 96% white). Among both men and women, six distinct symptom cluster groups were identified and the symptom patterns of each cluster were examined. For both, the largest cluster consisted of respondents with minimal symptoms (i.e. reporting essentially one symptom), 56% of men and 57% of women. The remaining five clusters for men and women were labelled based on their predominant symptoms. For men, the clusters were nocturia of twice or more per night (12%); terminal dribble (11%); urgency (10%); multiple symptoms (9%); and postvoid incontinence (5%). For women, the clusters were nocturia of twice or more per night (12%); terminal dribble (10%); urgency (8%); stress incontinence (8%); and multiple symptoms (5%). The multiple-symptom groups had several and varied LUTS, were older, and had more comorbidities. Clusters of terminal dribble and male
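
    The k-means step described in this record can be sketched with Lloyd's algorithm. This is a minimal illustration under assumptions: the function name, the random initialization from the data, and the Euclidean distance are choices made here, and the abstract does not describe the software or settings the authors used.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center
        if new_centers == centers:
            break                                # converged
        centers = new_centers
    return labels, centers
```

    A split-half check in the spirit of the abstract would run this on one random half of the respondents, then confirm that the same cluster profiles reappear when the second half is clustered independently.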

  8. Reconstruction of a beech population bottleneck using archival demographic information and Bayesian analysis of genetic data. (United States)

    Lander, Tonya A; Oddou-Muratorio, Sylvie; Prouillet-Leplat, Helene; Klein, Etienne K


    Range expansion and contraction has occurred in the history of most species and can seriously impact patterns of genetic diversity. Historical data about range change are rare and generally appropriate for studies at large scales, whereas the individual pollen and seed dispersal events that form the basis of geneflow and colonization generally occur at a local scale. In this study, we investigated range change in Fagus sylvatica on Mont Ventoux, France, using historical data from 1838 to the present and approximate Bayesian computation (ABC) analyses of genetic data. From the historical data, we identified a population minimum in 1845 and located remnant populations at least 200 years old. The ABC analysis selected a demographic scenario with three populations, corresponding to two remnant populations and one area of recent expansion. It also identified expansion from a smaller ancestral population but did not find that this expansion followed a population bottleneck, as suggested by the historical data. Despite a strong support to the selected scenario for our data set, the ABC approach showed a low power to discriminate among scenarios on average and a low ability to accurately estimate effective population sizes and divergence dates, probably due to the temporal scale of the study. This study provides an unusual opportunity to test ABC analysis in a system with a well-documented demographic history and identify discrepancies between the results of historical, classical population genetic and ABC analyses. The results also provide valuable insights into genetic processes at work at a fine spatial and temporal scale in range change and colonization. © 2011 Blackwell Publishing Ltd.

  9. Bayesian network modeling: A case study of an epidemiologic system analysis of cardiovascular risk. (United States)

    Fuster-Parra, P; Tauler, P; Bennasar-Veny, M; Ligęza, A; López-González, A A; Aguiló, A


    An extensive, in-depth study of cardiovascular risk factors (CVRF) seems to be of crucial importance in the research of cardiovascular disease (CVD) in order to prevent (or reduce) the chance of developing or dying from CVD. The main focus of data analysis is on the use of models able to discover and understand the relationships between different CVRF. This paper reports on applying Bayesian network (BN) modeling to discover the relationships among thirteen relevant epidemiological features of the heart age domain in order to analyze cardiovascular lost years (CVLY), cardiovascular risk score (CVRS), and metabolic syndrome (MetS). Furthermore, the induced BN was used to make inferences taking into account three reasoning patterns: causal reasoning, evidential reasoning, and intercausal reasoning. Application of BN tools has led to the discovery of several direct and indirect relationships between different CVRF. The BN analysis showed several interesting results, among them: CVLY was highly influenced by smoking, with men being the group at highest risk in CVLY; MetS was highly influenced by physical activity (PA), with men again being the group at highest risk in MetS, and smoking did not show any influence. BNs produce an intuitive, transparent, graphical representation of the relationships between different CVRF. The ability of BNs to predict new scenarios when hypothetical information is introduced makes BN modeling an Artificial Intelligence (AI) tool of special interest in epidemiological studies. As CVD is multifactorial, the use of BNs seems to be an adequate modeling tool. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  10. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal


    Full Text Available This study was carried out to assess the physicochemical quality of the river Varuna in Varanasi, India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of the relationships between physicochemical parameters. Hierarchical cluster analysis was also performed to determine the sources of pollution in the river Varuna. The results showed quite high values of DO, nitrate, BOD, COD and total alkalinity, above the BIS permissible limit. The correlation analysis identified pH, electrical conductivity, total alkalinity and nitrate as key water parameters that influence the concentration of other water parameters. Cluster analysis identified three major clusters among the 10 sampling sites, according to similarity in water quality. This study illustrated the usefulness of correlation and cluster analysis for obtaining better information about river water quality. International Journal of Environment Vol. 5 (1) 2016, pp: 32-44

  11. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José


    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  12. How few countries will do? Comparative survey analysis from a Bayesian perspective

    Directory of Open Access Journals (Sweden)

    Joop J.C.M. Hox


    Full Text Available Meuleman and Billiet (2009) have carried out a simulation study aimed at the question of how many countries are needed for accurate multilevel SEM estimation in comparative studies. The authors concluded that a sample of 50 to 100 countries is needed for accurate estimation. Recently, Bayesian estimation methods have been introduced in structural equation modeling which should work well with much lower sample sizes. The current study reanalyzes the simulation of Meuleman and Billiet using Bayesian estimation to find the lowest number of countries needed when conducting multilevel SEM. The main result of our simulations is that a sample of about 20 countries is sufficient for accurate Bayesian estimation, which makes multilevel SEM practicable for the number of countries commonly available in large scale comparative surveys.

  13. Hierarchical Bayesian Spatio-Temporal Analysis of Climatic and Socio-Economic Determinants of Rocky Mountain Spotted Fever.

    Directory of Open Access Journals (Sweden)

    Ram K Raghavan

    Full Text Available This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of the Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space-only and time-only trends and the spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence levels of RMSF are influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases is discussed.

  14. Bayesian ideas and data analysis an introduction for scientists and statisticians

    CERN Document Server

    Christensen, Ronald; Branscum, Adam; Hanson, Timothy E.


    This book provides a good introduction to Bayesian approaches to applied statistical modelling. … The authors have fulfilled their main aim of introducing Bayesian ideas through examples using a large number of statistical models. An interesting feature of this book is the humour of the authors that make it more fun than typical statistics books. In summary, this is a very interesting introductory book, very well organised and has been written in a style that is extremely pleasant and enjoyable to read. Both the statistical concepts and examples are very well explained. In conclusion, I highly

  15. Accurate phenotyping: Reconciling approaches through Bayesian model averaging.

    Directory of Open Access Journals (Sweden)

    Carla Chia-Ming Chen

    Full Text Available Genetic research into complex diseases is frequently hindered by a lack of clear biomarkers for phenotype ascertainment. Phenotypes for such diseases are often identified on the basis of clinically defined criteria; however, such criteria may not be suitable for understanding the genetic composition of the diseases. Various statistical approaches have been proposed for phenotype definition; however, our previous studies have shown that differences in phenotypes estimated using different approaches have a substantial impact on subsequent analyses. Instead of obtaining results based upon a single model, we propose a new method, using Bayesian model averaging to overcome problems associated with phenotype definition. Although Bayesian model averaging has been used in other fields of research, this is the first study that uses Bayesian model averaging to reconcile phenotypes obtained using multiple models. We illustrate the new method by applying it to simulated genetic and phenotypic data for Kofendred personality disorder, an imaginary disease with several sub-types. Two separate statistical methods were used to identify clusters of individuals with distinct phenotypes: latent class analysis and grade of membership. Bayesian model averaging was then used to combine the two clusterings for the purpose of subsequent linkage analyses. We found that causative genetic loci for the disease produced higher LOD scores using model averaging than under either individual model separately. We attribute this improvement to consolidation of the cores of phenotype clusters identified using each individual method.
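
    The general idea of combining two clusterings by Bayesian model averaging can be sketched as weighting each model's membership probabilities by its posterior model probability. This is a generic illustration, not the authors' procedure: the function names are hypothetical, the log evidences would have to come from each fitted model (or an approximation such as BIC), and the sketch assumes the cluster labels of the two models have already been aligned.

```python
import math


def model_weights(log_evidences, priors=None):
    """Posterior model probabilities from log marginal likelihoods."""
    k = len(log_evidences)
    priors = priors or [1.0 / k] * k
    logs = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    m = max(logs)                                  # subtract max for stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def average_memberships(memberships, weights):
    """Weighted average of per-model membership probabilities.

    memberships[m][s][c] is model m's probability that subject s is in class c;
    class labels are assumed to be aligned across models.
    """
    n_subjects = len(memberships[0])
    n_classes = len(memberships[0][0])
    return [
        [
            sum(w * model[s][c] for w, model in zip(weights, memberships))
            for c in range(n_classes)
        ]
        for s in range(n_subjects)
    ]
```

    Subjects on whom both models agree keep sharp averaged memberships, while subjects on whom they disagree are pulled toward intermediate values, which is one way to read the abstract's remark about consolidating the cores of the phenotype clusters.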

  16. Cluster analysis as a prediction tool for pregnancy outcomes. (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L


    Considering the specific physiological changes during gestation, and thinking of pregnancy as a "critical window", classification of pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of cluster analysis, a method drawn from intelligent data mining. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen to determine whether pregnant women can be classified in early pregnancy and to analyze unknown correlations between different variables so that certain outcomes could be predicted. 222 pregnant women from two general obstetric offices were recruited. The main focus was on the characteristics of these women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering a baby of higher birth weight, but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women in early pregnancy to predict certain outcomes.

  17. Language Learner Motivational Types: A Cluster Analysis Study (United States)

    Papi, Mostafa; Teimouri, Yasser


    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  18. Characterization of population exposure to organochlorines: A cluster analysis application

    NARCIS (Netherlands)

    R.M. Guimarães (Raphael Mendonça); S. Asmus (Sven); A. Burdorf (Alex)


    textabstractThis study aimed to show the results from a cluster analysis application in the characterization of population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticides residues

  19. Cluster analysis for validated climatology stations using precipitation in Mexico

    NARCIS (Netherlands)

    Bravo Cabrera, J. L.; Azpra-Romero, E.; Zarraluqui-Such, V.; Gay-García, C.; Estrada Porrúa, F.


    Annual average of daily precipitation was used to group climatological stations into clusters using the k-means procedure and principal component analysis with varimax rotation. After a careful selection of the stations deployed in Mexico since 1950, we selected 349 characterized by having 35 to 40

  20. cluster

    Indian Academy of Sciences (India)

    has been investigated electrochemically in positive and negative microenvironments, both in solution and in film. Charge nature around the active centre ... in plants, bacteria and also in mammals. This cluster is also an important constituent of a ..... selection of non-cysteine amino acid in the active centre of Rieske proteins.

  1. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan


    We describe a system for exact inference with relational Bayesian networks as defined in the publicly available Primula tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...... and differentiating these circuits in time linear in their size. We report on experimental results showing the successful compilation, and efficient inference, on relational Bayesian networks whose Primula-generated propositional instances have thousands of variables, and whose jointrees have clusters...

  2. Analysis of Roadway Traffic Accidents Based on Rough Sets and Bayesian Networks

    Directory of Open Access Journals (Sweden)

    Xiaoxia Xiong


    Full Text Available The paper integrates Rough Sets (RS) and Bayesian Networks (BN) for roadway traffic accident analysis. RS reduction of attributes is first employed to generate the key set of attributes affecting accident outcomes, which are then fed into a BN structure as nodes for BN construction and accident outcome classification. Such an RS-based BN framework combines the advantages of RS in knowledge reduction capability and BN in describing interrelationships among different attributes. The framework is demonstrated using the 100-car naturalistic driving data from Virginia Tech Transportation Institute to predict accident type. Comparative evaluation with the baseline BNs shows the RS-based BNs generally have a higher prediction accuracy and lower network complexity, with comparable prediction coverage and receiver operating characteristic curve area, proving that the proposed RS-based BN overall outperforms the BNs with/without traditional feature selection approaches. The proposed RS-based BN indicates the most significant attributes that affect accident types include pre-crash manoeuvre, driver’s attention from forward roadway to centre mirror, number of secondary tasks undertaken, traffic density, and relation to junction, most of which feature pre-crash driver states and driver behaviours that have not been extensively researched in literature, and could give further insight into the nature of traffic accidents.

  3. Analysis of ASR Clogging Investigations at Three Australian ASR Sites in a Bayesian Context

    Directory of Open Access Journals (Sweden)

    Peter Dillon


    Full Text Available When evaluating uncertainties in developing an aquifer storage and recovery (ASR) system, under normal budgetary constraints, a systematic approach is needed to prioritise investigations. Three case studies where field trials have been undertaken, and clogging evaluated, reveal the changing perceptions of viability of ASR from a clogging perspective as a result of the progress of investigations. Two stormwater and one recycled water ASR investigations in siliceous aquifers are described that involved different strategies to evaluate the potential for clogging. This paper reviews these sites, as well as earlier case studies and information relating water quality to clogging in column studies. Two novel theoretical concepts are introduced in the paper. Bayesian analysis is applied to demonstrate the increase in expected net benefit in developing a new ASR operation by undertaking clogging experiments (that have an assumed known reliability for predicting viability) for the injectant treatment options and aquifer material from the site. Results for an example situation demonstrate benefit cost ratios of experiments ranging from 1.5 to 6 and apply if decisions are based on experimental results whether success or failure are predicted. Additionally, a theoretical assessment of clogging rates characterised as acute and chronic is given, to explore their combined impact, for two operating parameters that define the onset of purging for recovery of reversible clogging and the onset of occasional advanced bore rehabilitation to address recovery of chronic clogging. These allow the assessment of net recharge and the proportion of water purged or redeveloped. Both analyses could inform economic decisions and help motivate an improved investigation methodology. It is expected that aquifer heterogeneity will result in differing injection rates among wells, so operational experience will ultimately be valuable in differentiating clogging behaviour under

  4. Prokinetics for the treatment of functional dyspepsia: Bayesian network meta-analysis. (United States)

    Yang, Young Joo; Bang, Chang Seok; Baik, Gwang Ho; Park, Tae Young; Shin, Suk Pyo; Suk, Ki Tae; Kim, Dong Joon


    Controversies persist regarding the effect of prokinetics for the treatment of functional dyspepsia (FD). This study aimed to assess the comparative efficacy of prokinetic agents for the treatment of FD. Randomized controlled trials (RCTs) of prokinetics for the treatment of FD were identified from core databases. Symptom response rates were extracted and analyzed using odds ratios (ORs). A Bayesian network meta-analysis was performed using the Markov chain Monte Carlo method in WinBUGS and NetMetaXL. In total, 25 RCTs, which included 4473 patients with FD who were treated with 6 different prokinetics or placebo, were identified and analyzed. Metoclopramide showed the best surface under the cumulative ranking curve (SUCRA) probability (92.5%), followed by trimebutine (74.5%) and mosapride (63.3%). However, the therapeutic efficacy of metoclopramide was not significantly different from that of trimebutine (OR: 1.32, 95% credible interval: 0.27-6.06), mosapride (OR: 1.99, 95% credible interval: 0.87-4.72), or domperidone (OR: 2.04, 95% credible interval: 0.92-4.60). Metoclopramide showed better efficacy than itopride (OR: 2.79, 95% credible interval: 1.29-6.21) and acotiamide (OR: 3.07, 95% credible interval: 1.43-6.75). Domperidone (SUCRA probability 62.9%) showed better efficacy than itopride (OR: 1.37, 95% credible interval: 1.07-1.77) and acotiamide (OR: 1.51, 95% credible interval: 1.04-2.18). Metoclopramide, trimebutine, mosapride, and domperidone showed better efficacy for the treatment of FD than itopride or acotiamide. Considering the adverse events related to metoclopramide or domperidone, the short-term use of these agents or the alternative use of trimebutine or mosapride could be recommended for the symptomatic relief of FD.
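
    As a reading aid for the SUCRA values quoted above, the sketch below shows how a SUCRA score is commonly computed from a treatment's rank probabilities: it is the average of the cumulative probabilities of being ranked first, first or second, and so on, up to the second-to-last rank. The function name and the rank probabilities are hypothetical; the study's own values came from its WinBUGS/NetMetaXL output, which is not reproduced here.

```python
def sucra(rank_probs):
    """SUCRA for one treatment.

    rank_probs[j] is the (assumed, illustrative) posterior probability that
    the treatment takes rank j + 1, with rank 1 being the best.
    """
    a = len(rank_probs)
    if a < 2:
        raise ValueError("need at least two competing treatments")
    if abs(sum(rank_probs) - 1.0) > 1e-9:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = 0.0
    total = 0.0
    # Sum the cumulative rank probabilities for ranks 1 .. a-1.
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (a - 1)


# Hypothetical rank probabilities for a treatment that is usually ranked high.
example = sucra([0.6, 0.3, 0.1])
```

    A treatment that is certain to rank first scores 1, and one that is certain to rank last scores 0, which is why SUCRA is usually reported as a percentage.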

  5. Bayesian Total Error Analysis - An Error Sensitive Approach to Model Calibration (United States)

    Franks, S. W.; Kavetski, D.; Kuczera, G.


    The majority of environmental models require calibration of their parameters before meaningful predictions of catchment behaviour can be made. Despite the importance of reliable parameter estimates, there are growing concerns about the ability of objective-based inference methods to adequately calibrate environmental models. The problem lies with the formulation of the objective or likelihood function, which is currently implemented using essentially ad-hoc methods. We outline limitations of current calibration methodologies and introduce a more systematic Bayesian Total Error Analysis (BATEA) framework for environmental model calibration and validation, which imposes a hitherto missing rigour in environmental modelling by requiring the specification of physically realistic model and data uncertainty models with explicit assumptions that can and must be tested against available evidence. The BATEA formalism enables inference of the hydrological parameters and also of any latent variables of the uncertainty models, e.g., precipitation depth errors. The latter could be useful for improving data sampling and measurement methodologies. In addition, distinguishing between the various sources of errors will reduce the current ambiguity about parameter and predictive uncertainty and enable rational testing of environmental models' hypotheses. Monte Carlo Markov Chain methods are employed to manage the increased computational requirements of BATEA. A case study using synthetic data demonstrates that explicitly accounting for forcing errors leads to immediate advantages over traditional regression methods (e.g., standard least squares calibration) that ignore rainfall history corruption and over pseudo-likelihood methods (e.g., GLUE) that do not explicitly characterise data and model errors. It is precisely data and model errors that are responsible for the need for calibration in the first place; we expect that understanding these errors will force fundamental shifts in the model

  6. Integrating Bayesian variable selection with Modular Response Analysis to infer biochemical network topology. (United States)

    Santra, Tapesh; Kolch, Walter; Kholodenko, Boris N


    Recent advancements in genetics and proteomics have led to the acquisition of large quantitative data sets. However, the use of these data to reverse engineer biochemical networks has remained a challenging problem. Many methods have been proposed to infer biochemical network topologies from different types of biological data. Here, we focus on unraveling network topologies from steady state responses of biochemical networks to successive experimental perturbations. We propose a computational algorithm which combines a deterministic network inference method termed Modular Response Analysis (MRA) and a statistical model selection algorithm called Bayesian Variable Selection, to infer functional interactions in cellular signaling pathways and gene regulatory networks. It can be used to identify interactions among individual molecules involved in a biochemical pathway or reveal how different functional modules of a biological network interact with each other to exchange information. In cases where not all network components are known, our method reveals functional interactions which are not direct but correspond to the interaction routes through unknown elements. Using computer simulated perturbation responses of signaling pathways and gene regulatory networks from the DREAM challenge, we demonstrate that the proposed method is robust against noise and scalable to large networks. We also show that our method can infer network topologies using incomplete perturbation datasets. Consequently, we have used this algorithm to explore the ERBB regulated G1/S transition pathway in certain breast cancer cells to understand the molecular mechanisms which cause these cells to become drug resistant. The algorithm successfully inferred many well characterized interactions of this pathway by analyzing experimentally obtained perturbation data. Additionally, it identified some molecular interactions which promote drug resistance in breast cancer cells. The proposed algorithm

  7. A method of spherical harmonic analysis in the geosciences via hierarchical Bayesian inference (United States)

    Muir, J. B.; Tkalčić, H.


    The problem of decomposing irregular data on the sphere into a set of spherical harmonics is common in many fields of geosciences where it is necessary to build a quantitative understanding of a globally varying field. For example, in global seismology, a compressional or shear wave speed that emerges from tomographic images is used to interpret current state and composition of the mantle, and in geomagnetism, secular variation of magnetic field intensity measured at the surface is studied to better understand the changes in the Earth's core. Optimization methods are widely used for spherical harmonic analysis of irregular data, but they typically do not treat the dependence of the uncertainty estimates on the imposed regularization. This can cause significant difficulties in interpretation, especially when the best-fit model requires more variables as a result of underestimating data noise. Here, with the above limitations in mind, the problem of spherical harmonic expansion of irregular data is treated within the hierarchical Bayesian framework. The hierarchical approach significantly simplifies the problem by removing the need for regularization terms and user-supplied noise estimates. The use of the corrected Akaike Information Criterion for picking the optimal maximum degree of spherical harmonic expansion and the resulting spherical harmonic analyses are first illustrated on a noisy synthetic data set. Subsequently, the method is applied to two global data sets sensitive to the Earth's inner core and lowermost mantle, consisting of PKPab-df and PcP-P differential traveltime residuals relative to a spherically symmetric Earth model. The posterior probability distributions for each spherical harmonic coefficient are calculated via Markov Chain Monte Carlo sampling; the uncertainty obtained for the coefficients thus reflects the noise present in the real data and the imperfections in the spherical harmonic expansion.

  8. K-means cluster analysis and seismicity partitioning for Pakistan (United States)

    Rehman, Khaista; Burton, Paul W.; Weatherill, Graeme A.


    Pakistan and the western Himalaya is a region of high seismic activity located at the triple junction between the Arabian, Eurasian and Indian plates. Four devastating earthquakes have resulted in significant numbers of fatalities in Pakistan and the surrounding region in the past century (Quetta, 1935; Makran, 1945; Pattan, 1974 and the recent 2005 Kashmir earthquake). It is therefore necessary to develop an understanding of the spatial distribution of seismicity and the potential seismogenic sources across the region. This forms an important basis for the calculation of seismic hazard; a crucial input in seismic design codes needed to begin to effectively mitigate the high earthquake risk in Pakistan. The development of seismogenic source zones for seismic hazard analysis is driven by both geological and seismotectonic inputs. Despite the many developments in seismic hazard in recent decades, the manner in which seismotectonic information feeds the definition of the seismic source can, in many parts of the world including Pakistan and the surrounding regions, remain a subjective process driven primarily by expert judgment. Whilst much research is ongoing to map and characterise active faults in Pakistan, knowledge of the seismogenic properties of the active faults is still incomplete in much of the region. Consequently, seismicity, both historical and instrumental, remains a primary guide to the seismogenic sources of Pakistan. This study utilises a cluster analysis approach for the purposes of identifying spatial differences in seismicity, which can be utilised to form a basis for delineating seismogenic source regions. An effort is made to examine seismicity partitioning for Pakistan with respect to earthquake database, seismic cluster analysis and seismic partitions in a seismic hazard context. A magnitude-homogeneous earthquake catalogue has been compiled using various available earthquake data. The earthquake catalogue covers a time span from 1930 to 2007 and
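
    To illustrate the kind of partitioning the record above describes, the following is a minimal sketch of plain k-means (Lloyd's algorithm) applied to epicentre coordinates. It rests on several assumptions: the coordinates are invented for illustration, distances are Euclidean in degrees rather than great-circle, and the function names are hypothetical. The record does not state which k-means variant, distance measure or number of clusters the study used.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical (longitude, latitude) epicentres forming two loose groups.
epicentres = [
    (66.0, 25.0), (66.2, 25.1), (65.9, 24.9),
    (73.0, 34.0), (73.2, 34.1), (72.9, 33.9),
]
centroids, labels = kmeans(epicentres, k=2)
```

    In a hazard study the resulting partitions would only be a starting point for delineating source zones; the choice of k and the treatment of magnitude and depth are modelling decisions this sketch leaves out.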

  9. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.

  10. Bayesian Analysis for Risk Assessment of Selected Medical Events in Support of the Integrated Medical Model Effort (United States)

    Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.


    The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes' theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.
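
    The Bayesian update described above can be sketched with a conjugate gamma-Poisson model, a common choice for incidence rates when data are sparse. This is an assumption made for illustration only: the record does not say which prior family or likelihood the IMM team used, and every number below is invented.

```python
def gamma_poisson_update(prior_shape, prior_rate, events, exposure):
    """Update a Gamma(shape, rate) prior on an incidence rate.

    prior_shape, prior_rate: prior parameters, e.g. derived from an analog
        population (hypothetical values in the example below).
    events: number of observed medical events.
    exposure: observation time, e.g. in person-years.
    Returns the posterior (shape, rate).
    """
    if events < 0 or exposure < 0:
        raise ValueError("events and exposure must be non-negative")
    return prior_shape + events, prior_rate + exposure


def posterior_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution."""
    return shape / rate


# Hypothetical prior worth 2 events in 1000 person-years, combined with
# 1 observed event in 500 person-years of flight data.
shape, rate = gamma_poisson_update(2.0, 1000.0, events=1, exposure=500.0)
rate_estimate = posterior_mean(shape, rate)  # events per person-year
```

    The prior behaves like pseudo-observations: the more exposure it represents relative to the observed data, the more the posterior stays near the analog-population rate.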

  11. Estimation of expected number of accidents and workforce unavailability through Bayesian population variability analysis and Markov-based model

    International Nuclear Information System (INIS)

    Chagas Moura, Márcio das; Azevedo, Rafael Valença; Droguett, Enrique López; Chaves, Leandro Rego; Lins, Isis Didier


    Occupational accidents pose several negative consequences to employees, employers, environment and people surrounding the locale where the accident takes place. Some types of accidents correspond to low frequency-high consequence (long sick leaves) events, so classical statistical approaches are ineffective in these cases because the available dataset is generally sparse and contains censored recordings. In this context, we propose a Bayesian population variability method for the estimation of the distributions of the rates of accident and recovery. Given these distributions, a Markov-based model will be used to estimate the uncertainty over the expected number of accidents and the work time loss. Thus, the use of Bayesian analysis along with the Markov approach aims at investigating future trends regarding occupational accidents in a workplace as well as enabling a better management of the labor force and prevention efforts. One application example is presented in order to validate the proposed approach; this case uses available data gathered from a hydropower company in Brazil. - Highlights: • This paper proposes a Bayesian method to estimate rates of accident and recovery. • The model requires simple data likely to be available in the company database. • These results show the proposed model is not too sensitive to the prior estimates.
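
    One simple way to picture how accident and recovery rates feed a Markov model is a two-state continuous-time chain in which a worker is either available or on sick leave. The sketch below assumes exactly that structure, with constant rates; it is not taken from the paper, whose actual state space is not described in the record, and the rate values are hypothetical.

```python
import math


def unavailability(accident_rate, recovery_rate, t):
    """P(worker is on sick leave at time t | available at time 0)."""
    total = accident_rate + recovery_rate
    if total <= 0:
        raise ValueError("rates must sum to a positive value")
    steady_state = accident_rate / total
    return steady_state * (1.0 - math.exp(-total * t))


def expected_lost_time(accident_rate, recovery_rate, horizon):
    """Expected time on leave over [0, horizon] (integral of unavailability)."""
    total = accident_rate + recovery_rate
    if total <= 0:
        raise ValueError("rates must sum to a positive value")
    steady_state = accident_rate / total
    return steady_state * (horizon - (1.0 - math.exp(-total * horizon)) / total)


# Hypothetical rates per year: 0.1 accidents and 0.9 recoveries.
lost_years = expected_lost_time(0.1, 0.9, horizon=5.0)
```

    Uncertainty in the outputs could then be explored by evaluating these functions over samples of the two rates drawn from their estimated distributions.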


    Directory of Open Access Journals (Sweden)

    Cristina SUCIU


    Small and medium-sized enterprises (SMEs) have made, even during the economic crisis, a major contribution to gross domestic product, to job creation, and to economic efficiency, by stimulating competition through their speed of adaptation to conditions, their adoption of new strategies, and their ability to adapt to market requirements. Although several hundred thousand companies in Romania were suspended or cancelled at the beginning of the economic crisis, a revival of SMEs has been observed since 2012. We could say that the post-crisis period, thanks to measures in support of SMEs, is the beginning of an economic boost for SMEs in Romania. Cluster analysis is a multivariate analysis technique that includes a number of algorithms for classifying objects into homogeneous groups. Analysing the effectiveness of SMEs in Romania using cluster analysis is a new method of economic analysis which enables an analysis, using mathematical methods, of the regional development of SMEs and of increases in their competitiveness.

  13. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.


    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Their study allows cosmological scenarios of structure formation to be tested with precision, bringing constraints complementary to those stemming from the cosmological background radiation, supernovae or galaxies. They are identified through the X-ray emission of their heated gas, thus facilitating their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and after one decade of expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed for the purpose of characterizing the detected objects. A particular emphasis is placed on the most distant ones (z ≥ 1) through the complementarity of observations in X-ray, optical and infrared bands. Then the X-CLASS survey is fully described. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed thanks to 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Propositions are made for applying this method to future surveys such as XMM-XXL and eRosita. (author)

  14. A Framework for the Statistical Analysis of Probability of Mission Success Based on Bayesian Theory (United States)


    regression, and two non-probabilistic methods, fuzzy logic and neural networks, are discussed and compared below to determine which gives the best ... accurate measure than possibilistic methods, such as fuzzy logic discussed below [5]. Bayesian inference easily accounts for subjectivity and

  15. Joint Bayesian Analysis of Parameters and States in Nonlinear, Non-Gaussian State Space Models

    NARCIS (Netherlands)

    Barra, I.; Hoogerheide, L.F.; Koopman, S.J.; Lucas, A.


    We propose a new methodology for designing flexible proposal densities for the joint posterior density of parameters and states in a nonlinear, non-Gaussian state space model. We show that a highly efficient Bayesian procedure emerges when these proposal densities are used in an independent

  16. Bayesian networks for multivariate data analysis and prognostic modelling in cardiac surgery

    NARCIS (Netherlands)

    Peek, Niels; Verduijn, Marion; Rosseel, Peter M. J.; de Jonge, Evert; de Mol, Bas A.


    Prognostic models are tools to predict the outcome of disease and disease treatment. These models are traditionally built with supervised machine learning techniques, and consider prognosis as a static, one-shot activity. This paper presents a new type of prognostic model that builds on the Bayesian

  17. Fuzzy cluster analysis of air quality in Beijing district (United States)

    Liu, Hongkai


    The principle of fuzzy clustering analysis is applied in this article: using the transitive closure method, the main air pollutants in 17 districts of Beijing from 2014 to 2016 are classified. The results of the analysis reflect the changes in the main air pollutants in Beijing over nearly three years. This can provide a scientific basis and data support for atmospheric governance in the Beijing area.
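
    The transitive closure method mentioned above can be sketched as follows: a fuzzy similarity matrix is repeatedly composed with itself under max-min composition until it stops changing, and the resulting fuzzy equivalence matrix is cut at a threshold to obtain crisp clusters. The similarity values and the threshold below are invented for illustration; how the article builds its similarity matrix from pollutant data is not stated in the record.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(similarity):
    """Square the relation under max-min composition until it is stable."""
    current = similarity
    while True:
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared


def lambda_cut_clusters(closure, lam):
    """Group items whose closure similarity is at least lam."""
    n = len(closure)
    assigned = [False] * n
    clusters = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [j for j in range(n) if closure[i][j] >= lam]
        for j in group:
            assigned[j] = True
        clusters.append(group)
    return clusters


# Hypothetical reflexive, symmetric similarity matrix for three districts.
similarity = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
closure = transitive_closure(similarity)
clusters = lambda_cut_clusters(closure, lam=0.5)
```

    Lowering the threshold merges clusters and raising it splits them, so sweeping it from 1 down to 0 yields a full clustering hierarchy.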

  18. Bayesian road safety analysis: incorporation of past evidence and effect of hyper-prior choice. (United States)

    Miranda-Moreno, Luis F; Heydari, Shahram; Lord, Dominique; Fu, Liping


    This paper aims to address two related issues when applying hierarchical Bayesian models for road safety analysis, namely: (a) how to incorporate available information from previous studies or past experiences in the (hyper) prior distributions for model parameters and (b) what are the potential benefits of incorporating past evidence on the results of a road safety analysis when working with scarce accident data (i.e., when calibrating models with crash datasets characterized by a very low average number of accidents and a small number of sites). A simulation framework was developed to evaluate the performance of alternative hyper-priors including informative and non-informative Gamma, Pareto, as well as Uniform distributions. Based on this simulation framework, different data scenarios (i.e., number of observations and years of data) were defined and tested using crash data collected at 3-legged rural intersections in California and crash data collected for rural 4-lane highway segments in Texas. This study shows how the accuracy of model parameter estimates (inverse dispersion parameter) is considerably improved when incorporating past evidence, in particular when working with a small number of observations and crash data with a low mean. The results also illustrate that when the sample size (more than 100 sites) and the number of years of crash data are relatively large, neither the incorporation of past experience nor the choice of the hyper-prior distribution may affect the final results of a traffic safety analysis. As a potential solution to the problem of low sample mean and small sample size, this paper suggests some practical guidance on how to incorporate past evidence into informative hyper-priors. By combining evidence from past studies and data available, the model parameter estimates can be significantly improved. The effect of prior choice seems to be less important for hotspot identification. The results show the benefits of incorporating prior

  19. Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions. (United States)

    Vernon, Ian; Liu, Junli; Goldstein, Michael; Rowe, James; Topping, Jen; Lindsey, Keith


    Many mathematical models have now been employed across every area of systems biology. These models increasingly involve large numbers of unknown parameters, have complex structure which can result in substantial evaluation time relative to the needs of the analysis, and need to be compared to observed data of various forms. The correct analysis of such models usually requires a global parameter search, over a high dimensional parameter space, that incorporates and respects the most important sources of uncertainty. This can be an extremely difficult task, but it is essential for any meaningful inference or prediction to be made about any biological system. It hence represents a fundamental challenge for the whole of systems biology. Bayesian statistical methodology for the uncertainty analysis of complex models is introduced, which is designed to address the high dimensional global parameter search problem. Bayesian emulators that mimic the systems biology model but which are extremely fast to evaluate are embedded within an iterative history match: an efficient method to search high dimensional spaces within a more formal statistical setting, while incorporating major sources of uncertainty. The approach is demonstrated via application to a model of hormonal crosstalk in Arabidopsis root development, which has 32 rate parameters, for which we identify the sets of rate parameter values that lead to acceptable matches between model output and observed trend data. The multiple insights into the model's structure that this analysis provides are discussed. The methodology is applied to a second related model, and the biological consequences of the resulting comparison, including the evaluation of gene functions, are described. Bayesian uncertainty analysis for complex models using both emulators and history matching is shown to be a powerful technique that can greatly aid the study of a large class of systems biology models. It both provides insight into model behaviour

  20. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko


    Domain Generation Algorithms (DGAs) are a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate a DGA feed using reversed DGAs, third-party feeds, and a smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of the whole malware family, specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  1. Visual Analysis and Processing of Clusters Structures in Multidimensional Datasets (United States)

    Bondarev, A. E.


    The article is devoted to problems of visual analysis of cluster structures in multidimensional datasets. For visual analysis, the elastic maps approach [1,2] is applied. This approach is well suited to processing and visualizing multidimensional datasets. To analyze clusters in the original data volume, elastic maps are used to map the original data points onto embedded manifolds of lower dimensionality. By decreasing the elasticity parameters, one can design a map surface that approximates the multidimensional dataset in question much more closely. The points of the dataset are then projected onto the map. Unfolding the designed map onto a flat plane gives insight into the cluster structure of the multidimensional dataset. The elastic maps approach does not require any a priori information about the data in question and does not depend on the nature or origin of the data. Elastic maps are usually combined with the PCA approach. When presented in the space spanned by the first three principal components, elastic maps provide quite good results. The article describes the results of applying the elastic maps approach to the visual analysis of clusters in different multidimensional datasets, including medical data.

  2. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
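
    The cosine coefficient between two documents, as used in the record above, depends only on the terms that occur in those two documents, which is why it avoids computations over the full vocabulary. The sketch below computes it from raw term counts; the whitespace tokenisation and the lack of term weighting are simplifying assumptions, and the function name is hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def cosine_coefficient(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


similarity = cosine_coefficient(
    "gene expression in tumor cells",
    "tumor gene expression profiling",
)
```

    A pairwise similarity matrix built this way is the usual input to affinity propagation, which then selects exemplar documents instead of computing cluster means.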

  3. Mobility in Europe: Recent Trends from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ioana Manafi


    During the past decade, Europe was confronted with major changes and events offering large opportunities for mobility. The EU enlargement process, the EU policies regarding youth, the economic crisis affecting national economies on different levels, political instabilities in some European countries, high rates of unemployment or the increasing number of refugees are only a few of the factors influencing net migration in Europe. Based on a set of socio-economic indicators for EU/EFTA countries and cluster analysis, the paper provides an overview of regional differences across European countries, related to migration magnitude in the identified clusters. The obtained clusters are in accordance with previous studies in migration, and appear stable during the period of 2005-2013, with only some exceptions. The analysis revealed three country clusters: EU/EFTA center-receiving countries, EU/EFTA periphery-sending countries and EU/EFTA outlier countries, the names suggesting not only the geographical position within Europe, but the trends in net migration flows during the years. Therewith, the results provide evidence for the persistence of a movement from periphery to center countries, which is correlated with recent flows of mobility in Europe.

  4. The Productivity Analysis of Chennai Automotive Industry Cluster (United States)

    Bhaskaran, E.


    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is Data Envelopment Analysis using the Output Oriented Banker Charnes Cooper model, taking net worth, fixed assets and employment as inputs and gross output as the output. The non-zero values represent the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover, for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  5. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis. (United States)

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed


    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (Pgait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam


    Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.

  7. The Quantitative Analysis of Chennai Automotive Industry Cluster (United States)

    Bhaskaran, Ethirajan


    Chennai is also called the Detroit of India owing to the presence of an automotive industry producing over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the quantitative performance of the Chennai Automotive Industry Cluster before (2001-2002) and after (2008-2009) the CDA. The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire, followed by Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT), and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals that there is a high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. The KWT shows that there is no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows that there is no significant difference between industrial units in respect of costs such as Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting the CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  8. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches. (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C


    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
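    The contrast drawn in this abstract, between a case belonging to several clusters at once and a case being assigned to exactly one, can be illustrated with a minimal sketch. It assumes one-dimensional data, fixed cluster centers, and the standard fuzzy c-means membership formula with fuzzifier m; the function names and the example values are hypothetical and are not taken from the paper.

```python
def fuzzy_memberships(points, centers, m=2.0):
    """Return, for each point, its membership degree in every cluster.

    Standard fuzzy c-means membership with fuzzifier m > 1:
    u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)),
    where d_ik is the distance from point i to center k.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on a center: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
        memberships.append(row)
    return memberships


def hard_labels(points, centers):
    """K-means-style assignment: each point belongs only to its nearest center."""
    return [min(range(len(centers)), key=lambda k: abs(x - centers[k])) for x in points]


# Hypothetical scores and centers, for illustration only.
scores = [0.0, 2.0, 5.0, 10.0]
centers = [0.0, 10.0]
print(fuzzy_memberships(scores, centers))
print(hard_labels(scores, centers))
```

    A larger fuzzifier m spreads memberships more evenly across clusters, which is one way to read the abstract's remark that agreement with K-means depends on the amount of overlap allowed.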

  9. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin


    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  10. Bayesian community detection. (United States)

    Mørup, Morten; Schmidt, Mikkel N


    Many networks of scientific interest naturally decompose into clusters or communities with comparatively fewer external than internal links; however, current Bayesian models of network communities do not exert this intuitive notion of communities. We formulate a nonparametric Bayesian model for community detection consistent with an intuitive definition of communities and present a Markov chain Monte Carlo procedure for inferring the community structure. A Matlab toolbox with the proposed inference procedure is available for download. On synthetic and real networks, our model detects communities consistent with ground truth, and on real networks, it outperforms existing approaches in predicting missing links. This suggests that community structure is an important structural property of networks that should be explicitly modeled.

  11. Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis

    Directory of Open Access Journals (Sweden)

    Posch Stefan


    Background: One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared with the goal of improving their performance and deepening our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While in many comparative studies different learning principles or different statistical models have been compared, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked, possibly leading to questionable conclusions. Results: With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly-used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites. Conclusions: We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different

  12. A Bayesian ridge regression analysis of congestion's impact on urban expressway safety. (United States)

    Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung


    With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the

  13. Neglected chaos in international stock markets: Bayesian analysis of the joint return-volatility dynamical system (United States)

    Tsionas, Mike G.; Michaelides, Panayotis G.


    We use a novel Bayesian inference procedure for the Lyapunov exponent in the dynamical system of returns and their unobserved volatility. In the dynamical system, computation of largest Lyapunov exponent by traditional methods is impossible as the stochastic nature has to be taken explicitly into account due to unobserved volatility. We apply the new techniques to daily stock return data for a group of six countries, namely USA, UK, Switzerland, Netherlands, Germany and France, from 2003 to 2014, by means of Sequential Monte Carlo for Bayesian inference. The evidence points to the direction that there is indeed noisy chaos both before and after the recent financial crisis. However, when a much simpler model is examined where the interaction between returns and volatility is not taken into consideration jointly, the hypothesis of chaotic dynamics does not receive much support by the data ("neglected chaos").

  14. A Bayesian stochastic frontier analysis of Chinese fossil-fuel electricity generation companies

    International Nuclear Information System (INIS)

    Chen, Zhongfei; Barros, Carlos Pestana; Borges, Maria Rosa


    This paper analyses the technical efficiency of Chinese fossil-fuel electricity generation companies from 1999 to 2011, using a Bayesian stochastic frontier model. The results reveal that efficiency varies among the fossil-fuel electricity generation companies that were analysed. We also focus on the factors of size, location, government ownership and mixed sources of electricity generation for the fossil-fuel electricity generation companies, and also examine their effects on the efficiency of these companies. Policy implications are derived.

    Highlights:
    • We analyze the efficiency of 27 quoted Chinese fossil-fuel electricity generation companies during 1999–2011.
    • We adopt a Bayesian stochastic frontier model taking into consideration the identified heterogeneity.
    • With reform background in Chinese energy industry, we propose four hypotheses and check their influence on efficiency.
    • Big size, coastal location, government control and hydro energy sources all have increased costs

  15. Time independent seismic hazard analysis of Greece deduced from Bayesian statistics

    Directory of Open Access Journals (Sweden)

    T. M. Tsapanos


    A Bayesian statistics approach is applied to the seismogenic sources of Greece and the surrounding area in order to assess seismic hazard, assuming that earthquake occurrence follows a Poisson process. The Bayesian approach applied supplies the probability that a certain cut-off magnitude of Ms = 6.0 will be exceeded in time intervals of 10, 20 and 75 years. We also produced graphs which present the different seismic hazard in the seismogenic sources examined in terms of varying probability, which is useful for engineering and civil protection purposes, allowing the designation of priority sources for earthquake-resistant design. It is shown that within the above time intervals the seismogenic source (4) called Igoumenitsa (in NW Greece and west Albania) has the highest probability to experience an earthquake with magnitude M > 6.0. High probabilities are found also for Ochrida (source 22), Samos (source 53) and Chios (source 56).
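    The Poisson assumption in this abstract has a simple consequence that a short sketch can make concrete: if events above the cut-off magnitude occur at a constant annual rate, the probability of at least one such event in t years is 1 - exp(-rate * t). The code below is only that plain Poisson calculation with a fixed, hypothetical rate; it is not the Bayesian estimate described in the abstract, which would place a distribution on the rate.

```python
import math


def exceedance_probability(annual_rate, years):
    """P(at least one event in `years`) for a Poisson process with a fixed annual rate."""
    return 1.0 - math.exp(-annual_rate * years)


# Hypothetical rate of events above the cut-off magnitude in one source (per year).
ASSUMED_RATE = 0.02

# The three time intervals named in the abstract.
for t in (10, 20, 75):
    print(t, round(exceedance_probability(ASSUMED_RATE, t), 3))
```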

  16. Bus Route Design with a Bayesian Network Analysis of Bus Service Revenues

    Directory of Open Access Journals (Sweden)

    Yi Liu


    A Bayesian network is used to estimate revenues of bus services in consideration of the effect of bus travel demands, passenger transport distances, and so on. In this research, the area X in Beijing has been selected as the study area because of its relatively high bus travel demand and, in contrast, unsatisfactory bus services. It is suggested that the proposed Bayesian network approach is able to rationally predict the probabilities of different revenues of various route services, from the perspectives of both satisfying passenger demand and decreasing bus operation cost. In this way, the existing bus routes in the studied area can be optimized for their most probable high revenues.


    KAUST Repository

    Prudencio, Ernesto


    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.

  18. An analysis on operational risk in international banking: A Bayesian approach (2007–2011)

    Directory of Open Access Journals (Sweden)

    José Francisco Martínez-Sánchez


    This study aims to develop a Bayesian methodology to identify, quantify and measure operational risk in several business lines of commercial banking. To do this, a Bayesian network (BN) model is designed with prior and posterior distributions to estimate the frequency and severity. Regarding the posterior distributions, an inference procedure for the maximum expected loss, for a period of 20 days, is carried out by using the Monte Carlo simulation method. The business lines analyzed are marketing and sales, retail banking and private banking, which together accounted for 88.5% of the losses in 2011. Data were obtained for the period 2007–2011 from the Riskdata Operational Exchange Association (ORX), and external data were provided by qualified experts to complete the missing records or to improve their poor quality.

  19. Spatial Intensity Duration Frequency Relationships Using Hierarchical Bayesian Analysis for Urban Areas (United States)

    Rupa, Chandra; Mujumdar, Pradeep


    In urban areas, quantification of extreme precipitation is important in the design of storm water drains and other infrastructure. Intensity Duration Frequency (IDF) relationships are generally used to obtain design return level for a given duration and return period. Due to lack of availability of extreme precipitation data for sufficiently large number of years, estimating the probability of extreme events is difficult. Typically, a single station data is used to obtain the design return levels for various durations and return periods, which are used in the design of urban infrastructure for the entire city. In an urban setting, the spatial variation of precipitation can be high; the precipitation amounts and patterns often vary within short distances of less than 5 km. Therefore it is crucial to study the uncertainties in the spatial variation of return levels for various durations. In this work, the extreme precipitation is modeled spatially using the Bayesian hierarchical analysis and the spatial variation of return levels is studied. The analysis is carried out with Block Maxima approach for defining the extreme precipitation, using Generalized Extreme Value (GEV) distribution for Bangalore city, Karnataka state, India. Daily data for nineteen stations in and around Bangalore city is considered in the study. The analysis is carried out for summer maxima (March - May), monsoon maxima (June - September) and the annual maxima rainfall. In the hierarchical analysis, the statistical model is specified in three layers. The data layer models the block maxima, pooling the extreme precipitation from all the stations. In the process layer, the latent spatial process characterized by geographical and climatological covariates (lat-lon, elevation, mean temperature etc.) which drives the extreme precipitation is modeled and in the prior level, the prior distributions that govern the latent process are modeled. 
    Markov Chain Monte Carlo (MCMC) algorithm (Metropolis Hastings
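    The "design return level" mentioned in this abstract follows directly from the GEV quantile function once the location, scale and shape parameters are known. The sketch below shows that one step for a single set of parameters; the parameter values are hypothetical, and the spatial hierarchy and MCMC estimation described in the abstract are not reproduced here. The shape parameter xi uses the common convention in which xi > 0 means a heavy upper tail.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Level exceeded on average once every `return_period` blocks under GEV(mu, sigma, xi)."""
    if return_period <= 1:
        raise ValueError("return_period must exceed 1")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as xi tends to 0.
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Hypothetical annual-maximum rainfall parameters (mm), for illustration only.
for period in (2, 10, 50, 100):
    print(period, round(gev_return_level(60.0, 15.0, 0.1, period), 1))
```

    In a Bayesian analysis the same function would be evaluated on each posterior draw of (mu, sigma, xi), giving a distribution of return levels instead of a single value.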


    Directory of Open Access Journals (Sweden)

    Ghausia Masood Gilani


    Sometimes it may be difficult for a panelist to rank or compare more than two objects or treatments at the same time. For this reason, the paired comparison method is used. In this study, the Davidson and Beaver (1977) model for paired comparisons with order effects is analyzed through the Bayesian approach. For this purpose, the posterior means and the posterior modes are compared using noninformative priors.

  1. The Directional Identification Problem in Bayesian Factor Analysis: An Ex-Post Approach


    Pape, Markus; Aßmann, Christian; Boysen-Hogrefe, Jens


    Due to their well-known indeterminacies, factor models require identifying assumptions to guarantee unique parameter estimates. For Bayesian estimation, these identifying assumptions are usually implemented by imposing constraints on certain model parameters. This strategy, however, may result in posterior distributions with shapes that depend on the ordering of cross-sections in the data set. We propose an alternative approach, which relies on a sampler without the usual identifying constrai...

  2. Spatial prediction of N2O emissions in pasture: a Bayesian model averaging analysis.

    Directory of Open Access Journals (Sweden)

    Xiaodong Huang

    Nitrous oxide (N2O) is one of the greenhouse gases that can contribute to global warming. Spatial variability of N2O can lead to large uncertainties in prediction. However, previous studies have often ignored spatial dependency when quantifying the relationships between N2O and environmental factors. Few studies have examined the impacts of various spatial correlation structures (e.g. independence, distance-based and neighbourhood-based) on spatial prediction of N2O emissions. This study aimed to assess the impact of three spatial correlation structures on spatial predictions and calibrate the spatial prediction using Bayesian model averaging (BMA) based on replicated, irregular point-referenced data. The data were measured in 17 chambers randomly placed across a 271 m² field between October 2007 and September 2008 in the southeast of Australia. We used a Bayesian geostatistical model and a Bayesian spatial conditional autoregressive (CAR) model to investigate and accommodate spatial dependency, and to estimate the effects of environmental variables on N2O emissions across the study site. We compared these with a Bayesian regression model with independent errors. The three approaches resulted in different derived maps of spatial prediction of N2O emissions. We found that incorporating spatial dependency in the model not only substantially improved predictions of N2O emission from soil, but also better quantified uncertainties of soil parameters in the study. The hybrid model structure obtained by BMA improved the accuracy of spatial prediction of N2O emissions across this study region.

  3. Bayesian networks and statistical analysis application to analyze the diagnostic test accuracy (United States)

    Orzechowski, P.; Makal, Jaroslaw; Onisko, A.


    A computer-aided BPH diagnosis system based on a Bayesian network is described in the paper. First results are compared to a given statistical method. Different statistical methods have been used successfully in medicine for years. However, the undoubted advantages of probabilistic methods make them useful in newly created systems, which are frequent in medicine but do not have full and competent knowledge. The article presents the advantages of the computer-aided BPH diagnosis system in clinical practice for urologists.

  4. Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis. (United States)

    Sharpe, J Danielle; Hopkins, Richard S; Cook, Robert L; Striley, Catherine W


    Traditional influenza surveillance relies on influenza-like illness (ILI) syndrome that is reported by health care providers. It primarily captures individuals who seek medical care and misses those who do not. Recently, Web-based data sources have been studied for application to public health surveillance, as there is a growing number of people who search, post, and tweet about their illnesses before seeking medical care. Existing research has shown some promise of using data from Google, Twitter, and Wikipedia to complement traditional surveillance for ILI. However, past studies have evaluated these Web-based sources individually or dually without comparing all 3 of them, and it would be beneficial to know which of the Web-based sources performs best in order to be considered to complement traditional methods. The objective of this study is to comparatively analyze Google, Twitter, and Wikipedia by examining which best corresponds with Centers for Disease Control and Prevention (CDC) ILI data. It was hypothesized that Wikipedia will best correspond with CDC ILI data as previous research found it to be least influenced by high media coverage in comparison with Google and Twitter. Publicly available, deidentified data were collected from the CDC, Google Flu Trends, HealthTweets, and Wikipedia for the 2012-2015 influenza seasons. Bayesian change point analysis was used to detect seasonal changes, or change points, in each of the data sources. Change points in Google, Twitter, and Wikipedia that occurred during the exact week, 1 preceding week, or 1 week after the CDC's change points were compared with the CDC data as the gold standard. All analyses were conducted using the R package "bcp" version 4.0.0 in RStudio version 0.99.484 (RStudio Inc). In addition, sensitivity and positive predictive values (PPV) were calculated for Google, Twitter, and Wikipedia. During the 2012-2015 influenza seasons, a high sensitivity of 92% was found for Google, whereas the PPV for
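    The comparison step described in this abstract, counting Web-source change points that fall in the same week as a CDC change point or within one week either side, can be sketched as follows. The abstract does not spell out how sensitivity and PPV were computed, so the definitions in the docstring are an assumption, and the function name and week numbers are hypothetical. The change points themselves would come from a separate analysis (the study used the R package "bcp"), which is not reproduced here.

```python
def match_change_points(reference_weeks, candidate_weeks, tolerance=1):
    """Score candidate change points against reference (gold-standard) ones.

    Assumed definitions:
    - sensitivity: share of reference change points that have a candidate
      change point within `tolerance` weeks;
    - PPV: share of candidate change points that lie within `tolerance`
      weeks of some reference change point.
    """
    detected = sum(
        any(abs(r - c) <= tolerance for c in candidate_weeks) for r in reference_weeks
    )
    correct = sum(
        any(abs(c - r) <= tolerance for r in reference_weeks) for c in candidate_weeks
    )
    sensitivity = detected / len(reference_weeks) if reference_weeks else float("nan")
    ppv = correct / len(candidate_weeks) if candidate_weeks else float("nan")
    return sensitivity, ppv


# Hypothetical week indices, for illustration only.
cdc_weeks = [5, 20, 40]
web_weeks = [6, 21, 30, 45]
print(match_change_points(cdc_weeks, web_weeks))
```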

  5. An imprecise Dirichlet model for Bayesian analysis of failure data including right-censored observations

    International Nuclear Information System (INIS)

    Coolen, F.P.A.


    This paper is intended to make researchers in reliability theory aware of a recently introduced Bayesian model with imprecise prior distributions for statistical inference on failure data, that can also be considered as a robust Bayesian model. The model consists of a multinomial distribution with Dirichlet priors, making the approach basically nonparametric. New results for the model are presented, related to right-censored observations, where estimation based on this model is closely related to the product-limit estimator, which is an important statistical method to deal with reliability or survival data including right-censored observations. As for the product-limit estimator, the model considered in this paper aims at not using any information other than that provided by observed data, but our model fits into the robust Bayesian context which has the advantage that all inferences can be based on probabilities or expectations, or bounds for probabilities or expectations. The model uses a finite partition of the time-axis, and as such it is also related to life-tables

  6. Bayesian analysis of the astrobiological implications of life's early emergence on Earth. (United States)

    Spiegel, David S; Turner, Edwin L


    Life arose on Earth sometime in the first few hundred million years after the young planet had cooled to the point that it could support water-based organisms on its surface. The early emergence of life on Earth has been taken as evidence that the probability of abiogenesis is high, if starting from young Earth-like conditions. We revisit this argument quantitatively in a Bayesian statistical framework. By constructing a simple model of the probability of abiogenesis, we calculate a Bayesian estimate of its posterior probability, given the data that life emerged fairly early in Earth's history and that, billions of years later, curious creatures noted this fact and considered its implications. We find that, given only this very limited empirical information, the choice of Bayesian prior for the abiogenesis probability parameter has a dominant influence on the computed posterior probability. Although terrestrial life's early emergence provides evidence that life might be abundant in the universe if early-Earth-like conditions are common, the evidence is inconclusive and indeed is consistent with an arbitrarily low intrinsic probability of abiogenesis for plausible uninformative priors. Finding a single case of life arising independently of our lineage (on Earth, elsewhere in the solar system, or on an extrasolar planet) would provide much stronger evidence that abiogenesis is not extremely rare in the universe.

  7. Bayesian Inference on Gravitational Waves

    Directory of Open Access Journals (Sweden)

    Asad Ali


    The Bayesian approach is becoming increasingly popular among the astrophysics data analysis communities. However, the Pakistan statistics communities are unaware of this fertile interaction between the two disciplines. Bayesian methods have been used to address astronomical problems since the very birth of Bayesian probability in the eighteenth century. Today, Bayesian methods for the detection and parameter estimation of gravitational waves have solid theoretical grounds and hold strong promise for realistic applications. This article aims to introduce the Pakistan statistics communities to the applications of Bayesian Monte Carlo methods in the analysis of gravitational wave data, with an overview of Bayesian signal detection and estimation methods and a demonstration through a couple of simplified examples.

  8. Poisson cluster analysis of cardiac arrest incidence in Columbus, Ohio. (United States)

    Warden, Craig; Cudnik, Michael T; Sasson, Comilla; Schwartz, Greg; Semple, Hugh


    Scarce resources in disease prevention and emergency medical services (EMS) need to be focused on high-risk areas of out-of-hospital cardiac arrest (OHCA). Cluster analysis using geographic information systems (GISs) was used to find these high-risk areas and test potential predictive variables. This was a retrospective cohort analysis of EMS-treated adults with OHCAs occurring in Columbus, Ohio, from April 1, 2004, through March 31, 2009. The OHCAs were aggregated to census tracts and incidence rates were calculated based on their adult populations. Poisson cluster analysis determined significant clusters of high-risk census tracts. Both census tract-level and case-level characteristics were tested for association with high-risk areas by multivariate logistic regression. A total of 2,037 eligible OHCAs occurred within the city limits during the study period. The mean incidence rate was 0.85 OHCAs/1,000 population/year. There were five significant geographic clusters with 76 high-risk census tracts out of the total of 245 census tracts. In the case-level analysis, being in a high-risk cluster was associated with a slightly younger age (-3 years, adjusted odds ratio [OR] 0.99, 95% confidence interval [CI] 0.99-1.00), not being white, non-Hispanic (OR 0.54, 95% CI 0.45-0.64), cardiac arrest occurring at home (OR 1.53, 95% CI 1.23-1.71), and not receiving bystander cardiopulmonary resuscitation (CPR) (OR 0.77, 95% CI 0.62-0.96), but with higher survival to hospital discharge (OR 1.78, 95% CI 1.30-2.46). In the census tract-level analysis, high-risk census tracts were also associated with a slightly lower average age (-0.1 years, OR 1.14, 95% CI 1.06-1.22) and a lower proportion of white, non-Hispanic patients (-0.298, OR 0.04, 95% CI 0.01-0.19), but also a lower proportion of high-school graduates (-0.184, OR 0.00, 95% CI 0.00-0.00). This analysis identified high-risk census tracts and associated census tract-level and case-level characteristics that can be used to

  9. Fuzzy cluster analysis of high-field functional MRI data. (United States)

    Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald


    Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast is today an established brain research method and is quickly gaining acceptance for complementary clinical diagnosis. However, the basic mechanisms, such as the coupling between neuronal activation and haemodynamic response, are not known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, identification and separation of artifacts, as well as quantification of expected (i.e., stimulus-correlated) and novel information on brain activity, are important both for new insights in neuroscience and for future developments in functional MRI of the human brain. After an introduction on fuzzy clustering and very high-field fMRI we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiating temporal patterns in MRI using (a) a test object with static and dynamic parts and (b) artifacts due to gross head motion. Using a synthetic fMRI data set we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) an fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. Exploratory data analysis methods in general and fuzzy cluster analysis in particular may
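
    As an illustration of the fuzzy clustering idea in the record above, the following is a minimal sketch of the standard fuzzy c-means update (soft memberships followed by weighted centers). It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, and the `init_centers` argument are assumptions made for this example, and each "point" stands in for one voxel time course.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6,
                  init_centers=None, seed=0):
    """Fuzzy c-means on a list of equal-length float lists.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    All parameter names and defaults are illustrative assumptions.
    """
    dim = len(points[0])
    if init_centers is None:
        # Fallback: pick distinct input points at random as starting centers.
        rng = random.Random(seed)
        centers = [list(p) for p in rng.sample(points, n_clusters)]
    else:
        centers = [list(c) for c in init_centers]

    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        memberships = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if any(x == 0.0 for x in d):
                # Point sits exactly on a center: share membership among those.
                hits = [1.0 if x == 0.0 else 0.0 for x in d]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                inv = [x ** (-2.0 / (m - 1.0)) for x in d]
                total = sum(inv)
                memberships.append([v / total for v in inv])

        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(n_clusters):
            w = [u[k] ** m for u in memberships]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[j] for wi, p in zip(w, points)) / total
                 for j in range(dim)]
            )

        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

    Unlike k-means, every point keeps a graded membership in every cluster, which is what allows ambiguous time courses (for example, mixed signal and artifact) to be flagged rather than forced into a single class.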

  10. Risk Analysis on Leakage Failure of Natural Gas Pipelines by Fuzzy Bayesian Network with a Bow-Tie Model

    Directory of Open Access Journals (Sweden)

    Xian Shan


    Pipelines are the major mode of natural gas transportation. Leakage of natural gas pipelines may cause explosions and fires, resulting in casualties, environmental damage, and material loss. Efficient risk analysis is of great significance for preventing and mitigating such potential accidents. The objective of this study is to present a practical risk assessment method based on the Bow-tie model and Bayesian networks for risk analysis of natural gas pipeline leakage. First, the potential risk factors and consequences of the failure are identified. Then the Bow-tie model is constructed, and quantitative analysis of the Bayesian network is used to find the weak links in the system and to predict how control measures reduce the accident rate. To deal with the uncertainty in determining the probabilities of basic events, a fuzzy logic method is used. Results of a case study show that the most likely causes of natural gas pipeline leakage are parties ignoring signage, implicit signage, overload, and design defects of auxiliaries. Once leakage occurs, it is most likely to result in fire and explosion. Corresponding measures taken in time will reduce the severity of accidents as far as possible.

  11. Performance Based Clustering for Benchmarking of Container Ports: an Application of Dea and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu


    The operational performance of container ports has received increasing attention in both academic and practitioner circles, and the performance evaluation and process improvement of container ports have been the focus of several studies. In this paper, Data Envelopment Analysis (DEA), an effective tool for relative efficiency assessment, is utilized for measuring the performance and benchmarking of 77 world container ports in 2007. The approaches used in the current study consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity) and a single output (Container Throughput). The efficiency scores are analyzed, and a unique ordering of the ports based on average cross efficiency is provided. Cluster analysis is also used to select more appropriate targets for poorly performing ports to use as benchmarks.

  12. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao


    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  13. Cervical disc arthroplasty for symptomatic cervical disc disease: Traditional and Bayesian meta-analysis with trial sequential analysis. (United States)

    Kan, Shun-Li; Yuan, Zhi-Fang; Ning, Guang-Zhi; Liu, Fei-Fei; Sun, Jing-Cheng; Feng, Shi-Qing


    Cervical disc arthroplasty (CDA) has been designed as a substitute for anterior cervical discectomy and fusion (ACDF) in the treatment of symptomatic cervical disc disease (CDD). Several researchers have compared CDA with ACDF for the treatment of symptomatic CDD; however, the findings of these studies are inconclusive. Using recently published evidence, this meta-analysis was conducted to further verify the benefits and harms of using CDA for treatment of symptomatic CDD. Relevant trials were identified by searching the PubMed, EMBASE, and Cochrane Library databases. Outcomes were reported as odds ratio or standardized mean difference. Both traditional frequentist and Bayesian approaches were used to synthesize evidence within random-effects models. Trial sequential analysis (TSA) was applied to test the robustness of our findings and obtain more conservative estimates. Nineteen trials were included. The findings of this meta-analysis demonstrated better overall, neck disability index (NDI), and neurological success; lower NDI and neck and arm pain scores; higher 36-Item Short Form Health Survey (SF-36) Physical Component Summary (PCS) and Mental Component Summary (MCS) scores; more patient satisfaction; greater range of motion at the operative level; and fewer secondary surgical procedures (all P < 0.05). TSA of overall success suggested that the cumulative z-curve crossed both the conventional boundary and the trial sequential monitoring boundary for benefit, indicating sufficient and conclusive evidence had been ascertained. For treating symptomatic CDD, CDA was superior to ACDF in terms of overall, NDI, and neurological success; NDI and neck and arm pain scores; SF-36 PCS and MCS scores; patient satisfaction; ROM at the operative level; and secondary surgical procedures rate. Additionally, there was no significant difference between CDA and ACDF in the rate of adverse events. However, as the CDA procedure is a relatively newer operative technique, long

  14. A Bayesian technique for improving the sensitivity of the atmospheric neutrino L/E analysis

    Energy Technology Data Exchange (ETDEWEB)

    Blake, A. S. T. [Univ. of Cambridge (United Kingdom)]; Chapman, J. D. [Univ. of Cambridge (United Kingdom)]; Thomson, M. A. [Univ. of Cambridge (United Kingdom)]


    Here, a Bayesian technique is used to estimate the Lν/Eν resolution of observed atmospheric neutrinos on an event-by-event basis. By separating the events into bins of Lν/Eν resolution in the oscillation analysis, a significant improvement in oscillation sensitivity can be achieved.

  15. Fractal Segmentation and Clustering Analysis for Seismic Time Slices (United States)

    Ronquillo, G.; Oleschko, K.; Korvin, G.; Arizabalo, R. D.


    Fractal analysis has become part of the standard approach for quantifying texture on gray-tone or colored images. In this research we introduce a multi-stage fractal procedure to segment, classify and measure the clustering patterns on seismic time slices from a 3-D seismic survey. Five fractal classifiers (c1)-(c5) were designed to yield standardized, unbiased and precise measures of the clustering of seismic signals. The classifiers were tested on seismic time slices from the AKAL field, Cantarell Oil Complex, Mexico. The generalized lacunarity (c1), fractal signature (c2), heterogeneity (c3), rugosity of boundaries (c4) and continuity resp. tortuosity (c5) of the clusters are shown to be efficient measures of the time-space variability of seismic signals. The Local Fractal Analysis (LFA) of time slices has proved to be a powerful edge detection filter to detect and enhance linear features, like faults or buried meandering rivers. The local fractal dimensions of the time slices were also compared with the self-affinity dimensions of the corresponding parts of porosity-logs. It is speculated that the spectral dimension of the negative-amplitude parts of the time-slice yields a measure of connectivity between the formation's high-porosity zones, and correlates with overall permeability.

  16. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D


    Background: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. Results: We compare performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion: The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.

  17. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth


    One of the key performance indicators of an organization's quality management system is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction using data on how customers use the organisation's services and on customer leaving rates. Cluster analysis is used to segment customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor thesis.

  19. Market analysis of Serbia's raspberry sector and cluster development initiatives

    Directory of Open Access Journals (Sweden)

    Paraušić Vesna


    The authors analyze the competitive strengths and weaknesses of raspberry producers in Serbia and propose the key prerequisites on which the development of a successful cluster initiative in the Serbian raspberry sector will depend. The research results indicate that Serbian raspberry growers can develop a successful cluster and keep their leading position in the global raspberry market only if several conditions are met, such as: (a) better organized marketing channels through the vertical and horizontal integration of all actors in this sector; (b) strengthening of specialized cooperatives for raspberry production and associations of raspberry growers, and in the future the setting up of producer organizations and associations; (c) inclusion of producers of other berries and producers of processed berries; (d) introduction of innovations, scientific knowledge, and research and development in the production, processing, packing, logistics, and export of raspberries. The analysis is based on a case study in the Šumadija and Western Serbia region, which is the major raspberry-producing region in Serbia.

  20. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song


    To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, the Harris corner detection algorithm is used to extract the feature points of two images. Secondly, the Normalized Cross Correlation (NCC) function is used to perform approximate matching of the feature points, yielding the initial feature pairs. Then, according to the parallax constraint condition, the initial feature pairs are preprocessed by the K-means clustering algorithm, which removes feature point pairs with obvious errors from the approximate matching step. Finally, the Random Sample Consensus (RANSAC) algorithm is adopted to optimize the feature points and obtain the final feature point matching result, achieving fast and accurate image registration. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.
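
    The approximate matching step in the record above relies on Normalized Cross Correlation between image patches around candidate corner points. The sketch below shows the usual zero-mean NCC score for two small patches; the function name `ncc` and the nested-list patch representation are assumptions for illustration, and the Harris, K-means, and RANSAC stages of the paper are not reproduced here.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Patches are lists of rows of grey values (a hypothetical representation).
    The score lies in [-1, 1]; 1 means identical up to gain and offset.
    """
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("patches must be non-empty and of equal size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]

    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A flat patch has no texture to correlate; treat as "no match".
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom
```

    In a matcher of this kind, each corner in the first image would typically be paired with the corner in the second image whose surrounding patch gives the highest score, which is why a later outlier-rejection stage is still needed.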

  1. Application of Bayesian configural frequency analysis (BCFA) to determine characteristics user and non-user motor X (United States)

    Mawardi, Muhamad Iqbal; Padmadisastra, Septiadi; Tantular, Bertho


    Configural Frequency Analysis (CFA) is a method for cell-wise testing in contingency tables that searches for types and antitypes, i.e., cells in which the observed frequency differs significantly from the frequency expected under the model. The analysis focuses on interactions among categories of different variables rather than on interactions among the variables themselves. One extension of CFA is Bayesian CFA (B-CFA). This alternative pursues the same goal as the frequentist version of CFA, with the advantages that adjustment of the experiment-wise significance level α is not necessary and that it can test whether groups of types and antitypes form composite types or composite antitypes. This research therefore presents the concept of Bayesian CFA and shows how it works on real data. The data in this paper come from a company case study on the decrease in Brand Awareness & Image of motor X on the Top Of Mind Unit indicator in Cirebon City for users (30.8%) and non-users (9.8%). The B-CFA results yield four characteristics from the deviations; one of them, configuration 2212, needs more attention from the company in determining a promotion strategy to maintain and improve the Top Of Mind Unit in Cirebon City.

  2. Joint High-Dimensional Bayesian Variable and Covariance Selection with an Application to eQTL Analysis

    KAUST Repository

    Bhadra, Anindya


    We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.

  3. Steady state subchannel analysis of AHWR fuel cluster

    International Nuclear Information System (INIS)

    Dasgupta, A.; Chandraker, D.K.; Vijayan, P.K.; Saha, D.


    Subchannel analysis is a technique used to predict the thermal hydraulic behavior of reactor fuel assemblies. The rod cluster is subdivided into a number of parallel interacting flow subchannels. The conservation equations are solved for each of these subchannels, taking into account subchannel interactions. Subchannel analysis of AHWR D-5 fuel cluster has been carried out to determine the variations in thermal hydraulic conditions of coolant and fuel temperatures along the length of the fuel bundle. The hottest regions within the AHWR fuel bundle have been identified. The effect of creep on the fuel performance has also been studied. MCHFR has been calculated using Jansen-Levy correlation. The calculations have been backed by sensitivity analysis for parameters whose values are not known accurately. The sensitivity analysis showed the calculations to have a very low sensitivity to these parameters. Apart from the analysis, the report also includes a brief introduction of a few subchannel codes. A brief description of the equations and solution methodology used in COBRA-IIIC and COBRA-IV-I is also given. (author)

  4. Uncertainty analysis of pollutant build-up modelling based on a Bayesian weighted least squares approach

    International Nuclear Information System (INIS)

    Haddad, Khaled; Egodawatta, Prasanna; Rahman, Ataur; Goonetilleke, Ashantha


    Reliable pollutant build-up prediction plays a critical role in the accuracy of urban stormwater quality modelling outcomes. However, water quality data collection is resource demanding compared to streamflow data monitoring, where a greater quantity of data is generally available. Consequently, available water quality datasets span only relatively short time scales unlike water quantity data. Therefore, the ability to take due consideration of the variability associated with pollutant processes and natural phenomena is constrained. This in turn gives rise to uncertainty in the modelling outcomes as research has shown that pollutant loadings on catchment surfaces and rainfall within an area can vary considerably over space and time scales. Therefore, the assessment of model uncertainty is an essential element of informed decision making in urban stormwater management. This paper presents the application of a range of regression approaches such as ordinary least squares regression, weighted least squares regression and Bayesian weighted least squares regression for the estimation of uncertainty associated with pollutant build-up prediction using limited datasets. The study outcomes confirmed that the use of ordinary least squares regression with fixed model inputs and limited observational data may not provide realistic estimates. The stochastic nature of the dependent and independent variables need to be taken into consideration in pollutant build-up prediction. It was found that the use of the Bayesian approach along with the Monte Carlo simulation technique provides a powerful tool, which attempts to make the best use of the available knowledge in prediction and thereby presents a practical solution to counteract the limitations which are otherwise imposed on water quality modelling. - Highlights: ► Water quality data spans short time scales leading to significant model uncertainty. ► Assessment of uncertainty essential for informed decision making in water
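
    To make the contrast between ordinary and weighted least squares in the record above concrete, here is a minimal sketch of a straight-line fit in which each observation carries a weight (for example, the inverse of its assumed variance). The function name `weighted_least_squares` and the single-predictor form are assumptions for illustration; the Bayesian and Monte Carlo parts of the study are not reproduced.

```python
def weighted_least_squares(x, y, w):
    """Fit y = intercept + slope * x by minimizing sum(w_i * residual_i**2).

    With all weights equal this reduces to ordinary least squares.
    Returns (intercept, slope).
    """
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w must have the same length (>= 2)")

    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # Weighted sums of squares and cross-products about the weighted means.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

    Down-weighting an unreliable observation pulls the fitted line back toward the trusted points, whereas equal weights let a single poor measurement dominate a short data set.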


    Directory of Open Access Journals (Sweden)

    Sheveleva O. A.


    In this paper the interaction between the production macroeconomic indicators of the Russian economy and MIBOR (the main operational benchmark of the Bank of Russia), as well as the relationship between the inflation indicators and money supply, were investigated using a Bayesian approach. A conjugate normal-inverse-Wishart prior was used. According to the study, tough monetary policy has a deterrent effect on the Russian economy. The growth of the money market rate causes a reduction in investments and output in the main sectors of the economy, as well as a drop in the income of the population with an increase in the unemployment rate.

  6. A Bayesian approach to the analysis of quantal bioassay studies using nonparametric mixture models. (United States)

    Fronczyk, Kassandra; Kottas, Athanasios


    We develop a Bayesian nonparametric mixture modeling framework for quantal bioassay settings. The approach is built upon modeling dose-dependent response distributions. We adopt a structured nonparametric prior mixture model, which induces a monotonicity restriction for the dose-response curve. Particular emphasis is placed on the key risk assessment goal of calibration for the dose level that corresponds to a specified response. The proposed methodology yields flexible inference for the dose-response relationship as well as for other inferential objectives, as illustrated with two data sets from the literature. © 2013, The International Biometric Society.

  7. Small-signal analysis in high-energy physics: A Bayesian approach

    International Nuclear Information System (INIS)

    Prosper, H.B.


    The statistics of small signals masked by a background of imprecisely known magnitude is addressed from a Bayesian viewpoint using a simple statistical model which may be derived from the principle of maximum entropy. The issue of the correct assignment of prior probabilities is resolved by invoking an invariance principle proposed by Jaynes. We calculate the posterior probability and use it to calculate point estimates and upper limits for the magnitude of the signal. The results are applicable to high-energy physics experiments searching for new phenomena. We illustrate this by reanalyzing some published data from a few experiments.

  8. Decision-theoretic analysis of forensic sampling criteria using bayesian decision networks. (United States)

    Biedermann, A; Bozza, S; Garbolino, P; Taroni, F


    Sampling issues represent a topic of ongoing interest to the forensic science community essentially because of their crucial role in laboratory planning and working protocols. For this purpose, forensic literature described thorough (bayesian) probabilistic sampling approaches. These are now widely implemented in practice. They allow, for instance, to obtain probability statements that parameters of interest (e.g., the proportion of a seizure of items that present particular features, such as an illegal substance) satisfy particular criteria (e.g., a threshold or an otherwise limiting value). Currently, there are many approaches that allow one to derive probability statements relating to a population proportion, but questions on how a forensic decision maker--typically a client of a forensic examination or a scientist acting on behalf of a client--ought actually to decide about a proportion or a sample size, remained largely unexplored to date. The research presented here intends to address methodology from decision theory that may help to cope usefully with the wide range of sampling issues typically encountered in forensic science applications. The procedures explored in this paper enable scientists to address a variety of concepts such as the (net) value of sample information, the (expected) value of sample information or the (expected) decision loss. All of these aspects directly relate to questions that are regularly encountered in casework. Besides probability theory and bayesian inference, the proposed approach requires some additional elements from decision theory that may increase the efforts needed for practical implementation. In view of this challenge, the present paper will emphasise the merits of graphical modelling concepts, such as decision trees and bayesian decision networks. These can support forensic scientists in applying the methodology in practice. How this may be achieved is illustrated with several examples. The graphical devices invoked

  9. Qubit feedback and control with kicked quantum nondemolition measurements: A quantum Bayesian analysis (United States)

    Jordan, Andrew N.; Korotkov, Alexander N.


    The informational approach to continuous quantum measurement is derived from positive operator-valued measure formalism for a mesoscopic scattering detector measuring a charge qubit. Quantum Bayesian equations for the qubit density matrix are derived, and cast into the form of a stochastic conformal map. Measurement statistics are derived for kicked quantum nondemolition measurements, combined with conditional unitary operations. These results are applied to derive a feedback protocol to produce an arbitrary pure state after a weak measurement, as well as to investigate how an initially mixed state becomes purified with and without feedback.

  10. Bayesian analysis of spatial point processes in the neighbourhood of Voronoi networks

    DEFF Research Database (Denmark)

    Skare, Øivind; Møller, Jesper; Vedel Jensen, Eva B.

    A model for an inhomogeneous Poisson process with high intensity near the edges of a Voronoi tessellation in 2D or 3D is proposed. The model is analysed in a Bayesian setting with priors on nuclei of the Voronoi tessellation and other model parameters. An MCMC algorithm is constructed to sample...... from the posterior, which contains information about the unobserved Voronoi tessellation and the model parameters. A major element of the MCMC algorithm is the reconstruction of the Voronoi tessellation after a proposed local change of the tessellation. A simulation study and examples of applications...

  11. Bayesian analysis of spatial point processes in the neighbourhood of Voronoi networks

    DEFF Research Database (Denmark)

    Skare, Øivind; Møller, Jesper; Jensen, Eva Bjørn Vedel


    A model for an inhomogeneous Poisson process with high intensity near the edges of a Voronoi tessellation in 2D or 3D is proposed. The model is analysed in a Bayesian setting with priors on nuclei of the Voronoi tessellation and other model parameters. An MCMC algorithm is constructed to sample...... from the posterior, which contains information about the unobserved Voronoi tessellation and the model parameters. A major element of the MCMC algorithm is the reconstruction of the Voronoi tessellation after a proposed local change of the tessellation. A simulation study and examples of applications...

  12. Bayesian analysis of longitudinal Johne's disease diagnostic data without a gold standard test

    DEFF Research Database (Denmark)

    Wang, C.; Turnbull, B.W.; Nielsen, Søren Saxmose


    A Bayesian methodology was developed based on a latent change-point model to evaluate the performance of milk ELISA and fecal culture tests for longitudinal Johne's disease diagnostic data. The situation of no perfect reference test was considered; that is, no “gold standard.” A change-point proc...... an area under the receiver operating characteristic curve (AUC) of 0.984, and is superior to the raw ELISA (AUC = 0.911) and fecal culture (sensitivity = 0.358, specificity = 0.980) tests for Johne's disease diagnosis....

  13. Attributes of GRB pulses: Bayesian blocks analysis of TTE data; a microburst in GRB920229 (United States)

    Scargle, Jeffrey D.; Norris, Jay; Bonnell, Jerry


    Bayesian Blocks is a new time series algorithm for detecting localized structures (spikes or shots), revealing pulse shapes, and generally characterizing intensity variations. It maps raw counting data into a maximum likelihood piecewise constant representation of the underlying signal. This bin-free method imposes no lower limit on measurable time scales. Applied to BATSE TTE data, it reveals the shortest known burst structure: a spike superimposed on the main burst in GRB920229 (BATSE trigger 1453), with rise and decay timescales of a few 100 μs.
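
    The piecewise-constant segmentation described in the record above can be sketched for time-tagged event data as follows. This example uses the dynamic-programming formulation published later by Scargle and co-workers, which is an assumption here and not necessarily the procedure used in this record; the function name `bayesian_blocks` and the penalty default `ncp_prior=4.0` are likewise illustrative.

```python
import math


def bayesian_blocks(times, ncp_prior=4.0):
    """Return block edges for distinct event times (piecewise-constant rate).

    Each block with N events and duration T scores N * (log N - log T),
    minus `ncp_prior` per block (an assumed penalty on the number of blocks).
    Runs in O(n^2) time, so it is only meant for small examples.
    """
    t = sorted(times)
    n = len(t)
    if n < 2 or any(t[i] == t[i + 1] for i in range(n - 1)):
        raise ValueError("need at least two distinct event times")

    # One cell per event, bounded by midpoints between neighbouring events.
    edges = [t[0]] + [0.5 * (t[i] + t[i + 1]) for i in range(n - 1)] + [t[-1]]

    best = [0.0] * n   # best[k]: optimal total fitness for cells 0..k
    last = [0] * n     # last[k]: first cell of the final block in that optimum
    for k in range(n):
        best_fit, best_start = -math.inf, 0
        for r in range(k + 1):
            width = edges[k + 1] - edges[r]
            count = k + 1 - r
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            if r > 0:
                fit += best[r - 1]
            if fit > best_fit:
                best_fit, best_start = fit, r
        best[k], last[k] = best_fit, best_start

    # Backtrack through the stored block starts to recover the change points.
    change_points = [n]
    idx = n
    while idx > 0:
        idx = last[idx - 1]
        change_points.append(idx)
    change_points.reverse()
    return [edges[i] for i in change_points]
```

    Because the cells are defined by the events themselves rather than by fixed bins, a short dense run of events produces a narrow block of its own, which is the property that lets very brief spikes stand out.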

  14. Bayesian Multi-Trait Analysis Reveals a Useful Tool to Increase Oil Concentration and to Decrease Toxicity in Jatropha curcas L. (United States)

    Silva Junqueira, Vinícius; Azevedo Peixoto, Leonardo de; Galvêas Laviola, Bruno; Lopes Bhering, Leonardo; Mendonça, Simone; Agostini Costa, Tania da Silveira; Antoniassi, Rosemar


    The biggest challenge for jatropha breeding is to identify superior genotypes that present high seed yield and seed oil content with reduced toxicity levels. Therefore, the objective of this study was to estimate genetic parameters for three important traits (100-seed weight, seed oil content, and phorbol ester concentration), and to select superior genotypes to be used as progenitors in jatropha breeding. Additionally, the genotypic values and the genetic parameters estimated under the Bayesian multi-trait approach were used to evaluate different selection index scenarios for 179 half-sib families. Three different scenarios and economic weights were considered. It was possible to simultaneously reduce toxicity and increase seed oil content and 100-seed weight by using index selection based on genotypic values estimated by the Bayesian multi-trait approach. Indeed, we identified two families that present these characteristics by evaluating genetic diversity using the Ward clustering method, which suggested nine homogeneous clusters. Future research should integrate the Bayesian multi-trait methods with a realized relationship matrix, aiming to build accurate selection index models.

  15. [The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis]. (United States)

    Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong


    This paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. First, the ISODATA algorithm is used to obtain an initial clustering of the hyperspectral image, and the clusters of this initial result are treated as the visual words of PLSA. Second, an object-oriented image segmentation algorithm is used to partition the hyperspectral image, and segments with relatively pure pixels are regarded as the documents in PLSA. Third, several identification methods that estimate the best number of cluster centers are combined to obtain the number of latent semantic topics. The conditional distributions of visual words in topics and the mixtures of topics in different documents are then estimated using PLSA. Finally, the conditional probabilities of the latent semantic topics are distinguished using a statistical pattern recognition method, the topic type for each visual word in each document is assigned, and the clustering result of the hyperspectral image is obtained. Experimental results show that the clusters produced by the proposed algorithm are better than those of K-MEANS and ISODATA in terms of object-oriented properties, and that the clustering result is closer to the real spatial distribution of the surface.
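
    The PLSA estimation step, in which the word-in-topic distributions and the per-document topic mixtures are fitted, can be sketched with the standard EM updates. This is a generic illustration, not the authors' code: the function name, the random initialization, and the small document-by-word count matrix used to exercise it are assumptions, and it requires every document and every word to have a nonzero total count.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (documents x words) count matrix.

    Returns P(word | topic), P(topic | document) and the log-likelihood
    evaluated at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    loglik = []
    for _ in range(n_iter):
        # E-step: responsibilities P(topic | document, word),
        # arranged as (documents, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        denom = joint.sum(axis=1, keepdims=True) + 1e-300  # guards log(0)
        loglik.append(float((counts * np.log(denom[:, 0, :])).sum()))
        weighted = counts[:, None, :] * (joint / denom)
        # M-step: re-estimate both sets of conditional distributions.
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_w_z, p_z_d, loglik

# Hypothetical counts: rows are image segments ("documents"),
# columns are initial cluster labels ("visual words").
counts = np.array([[8, 7, 0, 0],
                   [9, 6, 1, 0],
                   [0, 1, 7, 9],
                   [0, 0, 8, 6]], dtype=float)
p_w_z, p_z_d, loglik = plsa(counts, n_topics=2)
```

    In a pipeline like the one described, each row of `p_z_d` would give the topic mixture of one segment, and its largest entry could serve as that segment's cluster label.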

  16. Segmentation of Residential Gas Consumers Using Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Marta P. Fernandes


    The growing environmental concerns and liberalization of energy markets have resulted in increased competition between utilities and a strong focus on efficiency. To develop new energy efficiency measures and optimize operations, utilities seek new market-related insights and customer engagement strategies. This paper proposes a clustering-based methodology to define the segmentation of residential gas consumers. The segments of gas consumers are obtained through a detailed clustering analysis using smart metering data. Insights are derived from the segmentation, where the segments result from the clustering process and are characterized based on the consumption profiles, as well as according to information regarding consumers’ socio-economic and household key features. The study is based on a sample of approximately one thousand households over one year. The representative load profiles of consumers are essentially characterized by two evident consumption peaks, one in the morning and the other in the evening, and an off-peak consumption. Significant insights can be derived from this methodology regarding typical consumption curves of the different segments of consumers in the population. This knowledge can assist energy utilities and policy makers in the development of consumer engagement strategies, demand forecasting tools and in the design of more sophisticated tariff systems.

  17. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering. (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing


    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of the traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide expression profiles of gene sets for individuals. We developed an application called IGSA that offers powerful analytical capacity in gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy that lets the samples gather into sets according to the progression of disease from mild to severe. We evaluated the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared the results of IGSA in KEGG pathway enrichment with DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering and of different similarity measurement methods. Notably, IGSA proved to be more sensitive and specific in finding significant pathways, and can indicate related changes in pathways with the severity of disease. In addition, IGSA provides a significant gene set profile for each sample.

  18. Analysis of Learning Development With Sugeno Fuzzy Logic And Clustering

    Directory of Open Access Journals (Sweden)

    Maulana Erwin Saputra


    In this first paper, I attempt to analyze the factors that affect student achievement, which naturally vary from school to school. Students are central to the goals of a successful educational organization, and students’ emotions and behaviors themselves influence learning performance. Fuzzy logic can be used in various fields, as can clustering for grouping, for example in the analysis of learning development. The analysis is performed on students based on the symptoms they exhibit. This research uses fuzzy logic and clustering. Fuzzy logic handles uncertainty, and its advantage is that it supports linguistic reasoning, so its design does not require complicated mathematical equations. The clustering method used is K-means, in which the data are partitioned into k groups (k = 1, 2, 3, ...) in order to find the optimal number of performance groups. In this research, questionnaire data entered into MATLAB produce values that are used to generate graphs. This makes it easier for the school to assess student performance in the learning process using certain criteria. The system thus provides results that support the decision-making required by the school.
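
    The K-means step, including the comparison of several values of k, can be sketched as follows. This is a plain illustration rather than the author's MATLAB procedure: the questionnaire scores are invented, the function names are hypothetical, and the deterministic farthest-point initialization is an assumption made so that the result is reproducible.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Return (labels, centers, inertia) for k clusters."""
    # Farthest-point initialization: start from the first point, then keep
    # adding the point that is farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(centers[j])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    inertia = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia

# Hypothetical (motivation, behavior) scores for nine students.
scores = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]
inertia_by_k = {k: kmeans(scores, k)[2] for k in (1, 2, 3, 4)}
```

    Plotting `inertia_by_k` against k and looking for the point where the decrease levels off is one common way to choose the number of groups.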

  19. Visualizing dynamical neural assemblies with a fuzzy synchronization clustering analysis. (United States)

    Zhou, Shu; Wu, Yan; Dos Santos, Claudia C


    Phase synchrony has been proposed as a possible communication mechanism between cerebral regions. The participation index method (PIM) may be used to investigate integrating structures within an oscillatory network, based on the eigenvalue decomposition of a matrix of bivariate synchronization indices. However, eigenvector orthogonality between clusters may result in categorization difficulties for hub oscillators and a pseudoclustering phenomenon. Here, we propose a method of fuzzy synchronization clustering analysis (FSCA) to avoid the constraint of orthogonality by combining the fuzzy c-means algorithm with the phase-locking value. Following mathematical derivation, we cross-validated the FSCA and the PIM using the same multichannel phase time series of event-related EEG from a subject performing a working memory task. Both clustering methods produced consistent findings for the qualitatively salient configuration of the original network, illustrated here by a visualization technique. In contrast to PIM, use of common virtual oscillatory centroids enabled the FSCA to reveal multiple dynamical neural assemblies as well as the unitary phase information within each assembly.
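
    The two ingredients named above, the phase-locking value and the fuzzy c-means membership rule, can be sketched separately. How the FSCA combines them is not given in this record, so using 1 - PLV as the distance to a centroid is an assumption made purely for illustration, as are the function names and the synthetic phase series.

```python
import cmath

def plv(phase_a, phase_b):
    """Phase-locking value of two equally long phase series (radians)."""
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)

def fuzzy_memberships(distances, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships of one item.

    distances : distance from the item to every centroid
    m         : fuzzifier (> 1); eps avoids division by zero
    """
    inverse = [(d + eps) ** (-2.0 / (m - 1.0)) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

# Synthetic phases: 'locked' follows 'reference' at a fixed lag,
# while 'drifting' runs at a different frequency.
reference = [0.1 * t for t in range(1000)]
locked = [p + 0.5 for p in reference]
drifting = [0.3 * t for t in range(1000)]

# Assumed distance: 1 - PLV between the item and each centroid series.
memberships = fuzzy_memberships([1.0 - plv(reference, locked),
                                 1.0 - plv(reference, drifting)])
```

    Memberships sum to one across centroids, which is what lets a hub oscillator belong partly to several assemblies instead of being forced into a single one.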

  20. Bayesian optimization analysis of containment-venting operation in a boiling water reactor severe accident

    Energy Technology Data Exchange (ETDEWEB)

    Zheng, Xiaoyu; Ishikawa, Jun; Sugiyama, Tomoyuki; Maruyama, Yu [Nuclear Safety Research Center, Japan Atomic Energy Agency, Ibaraki (Japan)]


    Containment venting is one of several essential measures to protect the integrity of the final barrier of a nuclear reactor during severe accidents, by which the uncontrollable release of fission products can be avoided. The authors seek to develop an optimization approach to venting operations, from a simulation-based perspective, using an integrated severe accident code, THALES2/KICHE. The effectiveness of the containment-venting strategies needs to be verified via numerical simulations based on various settings of the venting conditions. The number of iterations, however, needs to be controlled to avoid the cumbersome computational burden of integrated codes. Bayesian optimization is an efficient global optimization approach. Using Gaussian process regression, a surrogate model of the “black-box” code is constructed. It can be updated whenever new simulation results are acquired. With predictions via the surrogate model, upcoming locations of the most probable optimum can be revealed. The sampling procedure is adaptive. Compared with pure random search, the number of code queries needed to find the optimum is greatly reduced. One typical severe accident scenario of a boiling water reactor is chosen as an example. The research demonstrates the applicability of the Bayesian optimization approach to the design and establishment of containment-venting strategies during severe accidents.
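
    The surrogate-and-query loop described above can be sketched in a few lines. Everything specific here is an assumption: a cheap one-dimensional toy function stands in for a THALES2/KICHE run, the kernel is a unit-variance squared exponential with a fixed length scale, candidates come from a fixed grid, and expected improvement is used as the acquisition rule, which the record does not state.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = np.clip(1.0 - np.sum(v * v, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement for minimization."""
    z = (best - mean) / std
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(f, lo, hi, n_init=3, n_iter=12):
    """Minimize f on [lo, hi]; returns (best x, best y, evaluations used)."""
    grid = np.linspace(lo, hi, 201)      # candidate settings
    x = np.linspace(lo, hi, n_init)      # initial design
    y = np.array([f(v) for v in x])
    for _ in range(n_iter):
        offset = y.mean()                # center data for the zero-mean GP
        mean, std = gp_posterior(x, y - offset, grid)
        ei = expected_improvement(mean + offset, std, y.min())
        x_next = grid[int(np.argmax(ei))]
        x = np.append(x, x_next)         # query the expensive code once
        y = np.append(y, f(x_next))
    i = int(np.argmin(y))
    return float(x[i]), float(y[i]), len(y)

# Toy stand-in for one simulation run with a single venting parameter.
x_best, y_best, n_evals = bayes_opt(lambda v: (v - 0.3) ** 2, 0.0, 1.0)
```

    Each pass through the loop costs one call to the expensive function, so the evaluation budget is fixed in advance by `n_init + n_iter`.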