WorldWideScience

Sample records for bayesian clustering analysis

  1. Bayesian Analysis of Multiple Populations in Galactic Globular Clusters

    Science.gov (United States)

    Wagner-Kaiser, Rachel A.; Sarajedini, Ata; von Hippel, Ted; Stenning, David; Piotto, Giampaolo; Milone, Antonino; van Dyk, David A.; Robinson, Elliot; Stein, Nathan

    2016-01-01

    We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way.Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.

  2. A Bayesian Analysis of the Ages of Four Open Clusters

    CERN Document Server

    Jeffery, Elizabeth J; van Dyk, David A; Stenning, David C; Robinson, Elliot; Stein, Nathan; Jefferys, W H

    2016-01-01

    In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models to find the main sequence turn off age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo technique to fit these various parameters, objectively finding the best-fit isochrone for each cluster. The result is a high-precision isochrone fit. We compare these results with the those of traditional "by-eye" isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960, we determine the ages of these clusters to be 1.35 +/- 0.05, 1.02 +/- 0.02, 1.64 +/- 0.04, and 0.860 +/- 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to higher precision than that offered by these traditional methods of isochrone fitting.

  3. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters

    CERN Document Server

    Wagner-Kaiser, R; Sarajedini, A; von Hippel, T; van Dyk, D A; Robinson, E; Stein, N; Jefferys, W H

    2016-01-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster and also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed g...

  4. Individual organisms as units of analysis: Bayesian-clustering alternatives in population genetics.

    Science.gov (United States)

    Mank, Judith E; Avise, John C

    2004-12-01

    Population genetic analyses traditionally focus on the frequencies of alleles or genotypes in 'populations' that are delimited a priori. However, there are potential drawbacks of amalgamating genetic data into such composite attributes of assemblages of specimens: genetic information on individual specimens is lost or submerged as an inherent part of the analysis. A potential also exists for circular reasoning when a population's initial identification and subsequent genetic characterization are coupled. In principle, these problems are circumvented by some newer methods of population identification and individual assignment based on statistical clustering of specimen genotypes. Here we evaluate a recent method in this genre--Bayesian clustering--using four genotypic data sets involving different types of molecular markers in non-model organisms from nature. As expected, measures of population genetic structure (F(ST) and phiST) tended to be significantly greater in Bayesian a posteriori data treatments than in analyses where populations were delimited a priori. In the four biological contexts examined, which involved both geographic population structures and hybrid zones, Bayesian clustering was able to recover differentiated populations, and Bayesian assignments were able to identify likely population sources of specific individuals.

  5. Bayesian Agglomerative Clustering with Coalescents

    OpenAIRE

    Teh, Yee Whye; Daumé III, Hal; Roy, Daniel

    2009-01-01

    We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.

  6. A Nonparametric Bayesian Model for Nested Clustering.

    Science.gov (United States)

    Lee, Juhee; Müller, Peter; Zhu, Yitan; Ji, Yuan

    2016-01-01

    We propose a nonparametric Bayesian model for clustering where clusters of experimental units are determined by a shared pattern of clustering another set of experimental units. The proposed model is motivated by the analysis of protein activation data, where we cluster proteins such that all proteins in one cluster give rise to the same clustering of patients. That is, we define clusters of proteins by the way that patients group with respect to the corresponding protein activations. This is in contrast to (almost) all currently available models that use shared parameters in the sampling model to define clusters. This includes in particular model based clustering, Dirichlet process mixtures, product partition models, and more. We show results for two typical biostatistical inference problems that give rise to clustering. PMID:26519174

  7. Dynamic Bayesian clustering.

    Science.gov (United States)

    Fowler, Anna; Menon, Vilas; Heard, Nicholas A

    2013-10-01

    Clusters of time series data may change location and memberships over time; in gene expression data, this occurs as groups of genes or samples respond differently to stimuli or experimental conditions at different times. In order to uncover this underlying temporal structure, we consider dynamic clusters with time-dependent parameters which split and merge over time, enabling cluster memberships to change. These interesting time-dependent structures are useful in understanding the development of organisms or complex organs, and could not be identified using traditional clustering methods. In cell cycle data, these time-dependent structure may provide links between genes and stages of the cell cycle, whilst in developmental data sets they may highlight key developmental transitions. PMID:24131050

  8. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. II. NGC 5024, NGC 5272, and NGC 6352

    Science.gov (United States)

    Wagner-Kaiser, R.; Stenning, D. C.; Robinson, E.; von Hippel, T.; Sarajedini, A.; van Dyk, D. A.; Stein, N.; Jefferys, W. H.

    2016-07-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival Advanced Camera for Surveys Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from ∼0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.

  9. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. II. NGC 5024, NGC 5272, and NGC 6352

    Science.gov (United States)

    Wagner-Kaiser, R.; Stenning, D. C.; Robinson, E.; von Hippel, T.; Sarajedini, A.; van Dyk, D. A.; Stein, N.; Jefferys, W. H.

    2016-07-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival Advanced Camera for Surveys Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from ˜0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.

  10. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters II: NGC 5024, NGC 5272, and NGC 6352

    CERN Document Server

    Wagner-Kaiser, R; Robinson, E; von Hippel, T; Sarajedini, A; van Dyk, D A; Stein, N; Jefferys, W H

    2016-01-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from $\\sim$0.05 to 0.11 for these three clusters. Model grids with solar $\\alpha$-element abundances ([$\\alpha$/Fe] =0.0) and enhanced $\\alpha$-elements ([$\\alpha$/Fe]=0.4) are adopted.

  11. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. I. Statistical and Computational Methods

    Science.gov (United States)

    Stenning, D. C.; Wagner-Kaiser, R.; Robinson, E.; van Dyk, D. A.; von Hippel, T.; Sarajedini, A.; Stein, N.

    2016-07-01

    We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations. Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties—age, metallicity, helium abundance, distance, absorption, and initial mass—are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show how model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).

  12. Bayesian Mediation Analysis

    OpenAIRE

    Yuan, Ying; MacKinnon, David P.

    2009-01-01

    This article proposes Bayesian analysis of mediation effects. Compared to conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian mediation analysis, inference is straightforward and exact, which makes it appealing for studies with small samples. Third, the Bayesian approach is conceptua...

  13. Bayesian data analysis

    CERN Document Server

    Gelman, Andrew; Stern, Hal S; Dunson, David B; Vehtari, Aki; Rubin, Donald B

    2013-01-01

    FUNDAMENTALS OF BAYESIAN INFERENCEProbability and InferenceSingle-Parameter Models Introduction to Multiparameter Models Asymptotics and Connections to Non-Bayesian ApproachesHierarchical ModelsFUNDAMENTALS OF BAYESIAN DATA ANALYSISModel Checking Evaluating, Comparing, and Expanding ModelsModeling Accounting for Data Collection Decision AnalysisADVANCED COMPUTATION Introduction to Bayesian Computation Basics of Markov Chain Simulation Computationally Efficient Markov Chain Simulation Modal and Distributional ApproximationsREGRESSION MODELS Introduction to Regression Models Hierarchical Linear

  14. A Bayesian self-clustering analysis of the highest energy cosmic rays detected by the Pierre Auger Observatory

    CERN Document Server

    Khanin, Alexander

    2014-01-01

    Cosmic rays (CRs) are protons and atomic nuclei that flow into our Solar system and reach the Earth with energies of up to ~10^21 eV. The sources of ultra-high energy cosmic rays (UHECRs) with E >~ 10^19 eV remain unknown, although there are theoretical reasons to think that at least some come from active galactic nuclei (AGNs). One way to assess the different hypotheses is by analysing the arrival directions of UHECRs, in particular their self-clustering. We have developed a fully Bayesian approach to analyzing the self-clustering of points on the sphere, which we apply to the UHECR arrival directions. The analysis is based on a multi-step approach that enables the application of Bayesian model comparison to cases with weak prior information. We have applied this approach to the 69 highest energy events recorded by the Pierre Auger Observatory (PAO), which is the largest current UHECR data set. We do not detect self-clustering, but simulations show that this is consistent with the AGN-sourced model for a dat...

  15. Bayesian Mediation Analysis

    Science.gov (United States)

    Yuan, Ying; MacKinnon, David P.

    2009-01-01

    In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…

  16. R/BHC: fast Bayesian hierarchical clustering for microarray data

    Directory of Open Access Journals (Sweden)

    Grant Murray

    2009-08-01

    Full Text Available Abstract Background Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. Results We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. Conclusion Biologically plausible results are presented from a well studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, for example how many clusters there should be and how to choose a principled distance metric.

  17. Market Segmentation Using Bayesian Model Based Clustering

    OpenAIRE

    Van Hattum, P.

    2009-01-01

    This dissertation deals with two basic problems in marketing, that are market segmentation, which is the grouping of persons who share common aspects, and market targeting, which is focusing your marketing efforts on one or more attractive market segments. For the grouping of persons who share common aspects a Bayesian model based clustering approach is proposed such that it can be applied to data sets that are specifically used for market segmentation. The cluster algorithm can handle very l...

  18. Bayesian Exploratory Factor Analysis

    DEFF Research Database (Denmark)

    Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.;

    2014-01-01

    This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corr......This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor......, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...

  19. Bayesian logistic regression analysis

    NARCIS (Netherlands)

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.

    2012-01-01

    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuissance parameters, the Jacobian transformation is an

  20. Bayesian Independent Component Analysis

    DEFF Research Database (Denmark)

    Winther, Ole; Petersen, Kaare Brandt

    2007-01-01

    In this paper we present an empirical Bayesian framework for independent component analysis. The framework provides estimates of the sources, the mixing matrix and the noise parameters, and is flexible with respect to choice of source prior and the number of sources and sensors. Inside the engine...

  1. Identifying the source of farmed escaped Atlantic salmon (Salmo salar): Bayesian clustering analysis increases accuracy of assignment

    DEFF Research Database (Denmark)

    Glover, Kevin A.; Hansen, Michael Møller; Skaala, Oystein

    2009-01-01

    . Accuracy of assignment varied greatly among the individual samples. For the Bayesian clustered data set consisting of five genetic groups, overall accuracy of self-assignment was 99%, demonstrating the effectiveness of this strategy to significantly increase accuracy of assignment, albeit at the expense....... Performing self-assignment simulations with the data divided into different sub-sets, overall accuracy of assignment was 44% within the entire material (44 samples), 44% for the 28 spring samples, 59% for the 16 autumn samples, and 70% for 8 autumn samples collected from a geographically restricted area...

  2. Bayesian Cosmic Web Reconstruction: BARCODE for Clusters

    Science.gov (United States)

    Patrick Bos, E. G.; van de Weygaert, Rien; Kitaura, Francisco; Cautun, Marius

    2016-10-01

    We describe the Bayesian \\barcode\\ formalism that has been designed towards the reconstruction of the Cosmic Web in a given volume on the basis of the sampled galaxy cluster distribution. Based on the realization that the massive compact clusters are responsible for the major share of the large scale tidal force field shaping the anisotropic and in particular filamentary features in the Cosmic Web. Given the nonlinearity of the constraints imposed by the cluster configurations, we resort to a state-of-the-art constrained reconstruction technique to find a proper statistically sampled realization of the original initial density and velocity field in the same cosmic region. Ultimately, the subsequent gravitational evolution of these initial conditions towards the implied Cosmic Web configuration can be followed on the basis of a proper analytical model or an N-body computer simulation. The BARCODE formalism includes an implicit treatment for redshift space distortions. This enables a direct reconstruction on the basis of observational data, without the need for a correction of redshift space artifacts. In this contribution we provide a general overview of the the Cosmic Web connection with clusters and a description of the Bayesian BARCODE formalism. We conclude with a presentation of its successful workings with respect to test runs based on a simulated large scale matter distribution, in physical space as well as in redshift space.

  3. Bayesian Cosmic Web Reconstruction: BARCODE for Clusters

    CERN Document Server

    Bos, E G Patrick; Kitaura, Francisco; Cautun, Marius

    2016-01-01

    We describe the Bayesian BARCODE formalism that has been designed towards the reconstruction of the Cosmic Web in a given volume on the basis of the sampled galaxy cluster distribution. Based on the realization that the massive compact clusters are responsible for the major share of the large scale tidal force field shaping the anisotropic and in particular filamentary features in the Cosmic Web. Given the nonlinearity of the constraints imposed by the cluster configurations, we resort to a state-of-the-art constrained reconstruction technique to find a proper statistically sampled realization of the original initial density and velocity field in the same cosmic region. Ultimately, the subsequent gravitational evolution of these initial conditions towards the implied Cosmic Web configuration can be followed on the basis of a proper analytical model or an N-body computer simulation. The BARCODE formalism includes an implicit treatment for redshift space distortions. This enables a direct reconstruction on the ...

  4. Bayesian nonparametric data analysis

    CERN Document Server

    Müller, Peter; Jara, Alejandro; Hanson, Tim

    2015-01-01

    This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.

  5. Bayesian Nonparametric Clustering for Positive Definite Matrices.

    Science.gov (United States)

    Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos

    2016-05-01

    Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, rather belongs to a curved Riemannian manifold,existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms. PMID:27046838

  6. Bayesian analysis of volcanic eruptions

    Science.gov (United States)

    Ho, Chih-Hsiang

    1990-10-01

    The simple Poisson model generally gives a good fit to many volcanoes for volcanic eruption forecasting. Nonetheless, empirical evidence suggests that volcanic activity in successive equal time-periods tends to be more variable than a simple Poisson with constant eruptive rate. An alternative model is therefore examined in which eruptive rate(λ) for a given volcano or cluster(s) of volcanoes is described by a gamma distribution (prior) rather than treated as a constant value as in the assumptions of a simple Poisson model. Bayesian analysis is performed to link two distributions together to give the aggregate behavior of the volcanic activity. When the Poisson process is expanded to accomodate a gamma mixing distribution on λ, a consequence of this mixed (or compound) Poisson model is that the frequency distribution of eruptions in any given time-period of equal length follows the negative binomial distribution (NBD). Applications of the proposed model and comparisons between the generalized model and simple Poisson model are discussed based on the historical eruptive count data of volcanoes Mauna Loa (Hawaii) and Etna (Italy). Several relevant facts lead to the conclusion that the generalized model is preferable for practical use both in space and time.

  7. Bayesian Analysis of Experimental Data

    Directory of Open Access Journals (Sweden)

    Lalmohan Bhar

    2013-10-01

    Full Text Available Analysis of experimental data from Bayesian point of view has been considered. Appropriate methodology has been developed for application into designed experiments. Normal-Gamma distribution has been considered for prior distribution. Developed methodology has been applied to real experimental data taken from long term fertilizer experiments.

  8. Bayesian analysis of rare events

    Science.gov (United States)

    Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang

    2016-06-01

    In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.

  9. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.Real life examples are used throughout to demons

  10. Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods

    Directory of Open Access Journals (Sweden)

    Morita Mitsuo

    2011-06-01

    Full Text Available Abstract Background A Bayesian approach based on a Dirichlet process (DP prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and therefore, the use of this method is relatively uncommon. We characterized the DP prior method to increase its practical use. Results First, we evaluated the usefulness of the sequentially-allocated merge-split (SAMS sampler, which is a technique for improving the mixing of Markov chain Monte Carlo algorithms. Although this sampler has been implemented in a preceding program, HWLER, its effectiveness has not been investigated. We showed that this sampler was effective for population structure analysis. Implementation of this sampler was useful with regard to the accuracy of inference and computational time. Second, we examined the effect of a hyperparameter for the prior distribution of allele frequencies and showed that the specification of this parameter was important and could be resolved by considering the parameter as a variable. Third, we compared the DP prior method with other Bayesian clustering methods and showed that the DP prior method was suitable for data sets with unbalanced sample sizes among populations. In contrast, although current popular algorithms for population structure analysis, such as those implemented in STRUCTURE, were suitable for data sets with uniform sample sizes, inferences with these algorithms for unbalanced sample sizes tended to be less accurate than those with the DP prior method. Conclusions The clustering method based on the DP prior was found to be useful because it can infer the number of populations and simultaneously assign individuals into populations, and it is suitable for data sets with unbalanced sample sizes among populations. Here we presented a novel program, DPART, that implements the SAMS

  11. Bayesian Model Averaging for Propensity Score Analysis

    Science.gov (United States)

    Kaplan, David; Chen, Jianshen

    2013-01-01

    The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…

  12. Comparison of linear mixed model analysis and genealogy-based haplotype clustering with a Bayesian approach for association mapping in a pedigreed population

    DEFF Research Database (Denmark)

    Dashab, Golam Reza; Kadri, Naveen Kumar; Mahdi Shariati, Mohammad;

    2012-01-01

    ) Mixed model analysis (MMA), 2) Random haplotype model (RHM), 3) Genealogy-based mixed model (GENMIX), and 4) Bayesian variable selection (BVS). The data consisted of phenotypes of 2000 animals from 20 sire families and were genotyped with 9990 SNPs on five chromosomes. Results: Out of the eight...

  13. Bayesian Group Factor Analysis

    OpenAIRE

    Virtanen, Seppo; Klami, Arto; Khan, Suleiman A; Kaski, Samuel

    2011-01-01

    We introduce a factor analysis model that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does. A group may correspond to one view of the same set of objects, one of many data sets tied by co-occurrence, or a set of alternative variables collected from statistics tables to measure one property of interest. We show that by assuming group-wise sparse factors, active in a subset of the sets, the variat...

  14. Advances in Bayesian Model Based Clustering Using Particle Learning

    Energy Technology Data Exchange (ETDEWEB)

    Merl, D M

    2009-11-19

    Recent work by Carvalho, Johannes, Lopes and Polson and Carvalho, Lopes, Polson and Taddy introduced a sequential Monte Carlo (SMC) alternative to traditional iterative Monte Carlo strategies (e.g. MCMC and EM) for Bayesian inference for a large class of dynamic models. The basis of SMC techniques involves representing the underlying inference problem as one of state space estimation, thus giving way to inference via particle filtering. The key insight of Carvalho et al was to construct the sequence of filtering distributions so as to make use of the posterior predictive distribution of the observable, a distribution usually only accessible in certain Bayesian settings. Access to this distribution allows a reversal of the usual propagate and resample steps characteristic of many SMC methods, thereby alleviating to a large extent many problems associated with particle degeneration. Furthermore, Carvalho et al point out that for many conjugate models the posterior distribution of the static variables can be parametrized in terms of [recursively defined] sufficient statistics of the previously observed data. For models where such sufficient statistics exist, particle learning as it is being called, is especially well suited for the analysis of streaming data do to the relative invariance of its algorithmic complexity with the number of data observations. Through a particle learning approach, a statistical model can be fit to data as the data is arriving, allowing at any instant during the observation process direct quantification of uncertainty surrounding underlying model parameters. Here we describe the use of a particle learning approach for fitting a standard Bayesian semiparametric mixture model as described in Carvalho, Lopes, Polson and Taddy. In Section 2 we briefly review the previously presented particle learning algorithm for the case of a Dirichlet process mixture of multivariate normals. In Section 3 we describe several novel extensions to the original

  15. Multiview Bayesian Correlated Component Analysis

    DEFF Research Database (Denmark)

    Kamronn, Simon Due; Poulsen, Andreas Trier; Hansen, Lars Kai

    2015-01-01

    Correlated component analysis as proposed by Dmochowski, Sajda, Dias, and Parra (2012) is a tool for investigating brain process similarity in the responses to multiple views of a given stimulus. Correlated components are identified under the assumption that the involved spatial networks...... are identical. Here we propose a hierarchical probabilistic model that can infer the level of universality in such multiview data, from completely unrelated representations, corresponding to canonical correlation analysis, to identical representations as in correlated component analysis. This new model, which...... we denote Bayesian correlated component analysis, evaluates favorably against three relevant algorithms in simulated data. A well-established benchmark EEG data set is used to further validate the new model and infer the variability of spatial representations across multiple subjects....

  16. Detecting Galaxy Clusters in the DLS and CARS: a Bayesian Cluster Finder

    CERN Document Server

    Ascaso, Begoña; Benítez, Narciso

    2010-01-01

    The detection of galaxy clusters in present and future surveys enables measuring mass-to-light ratios, clustering properties or galaxy cluster abundances and therefore, constraining cosmological parameters. We present a new technique for detecting galaxy clusters, which is based on the Matched Filter Algorithm from a Bayesian point of view. The method is able to determine the position, redshift and richness of the cluster through the maximization of a filter depending on galaxy luminosity, density and photometric redshift combined with a galaxy cluster prior. We tested the algorithm through realistic mock galaxy catalogs, revealing that the detections are 100% complete and 80% pure for clusters up to z 25 (Abell Richness > 0). We applied the algorithm to the CFHTLS Archive Research Survey (CARS) data, recovering similar detections as previously published using the same data plus additional clusters that are very probably real. We also applied this algorithm to the Deep Lens Survey (DLS), obtaining the first ...

  17. Bayesian analysis of exoplanet and binary orbits

    OpenAIRE

    Schulze-Hartung, Tim; Launhardt, Ralf; Henning, Thomas

    2012-01-01

    We introduce BASE (Bayesian astrometric and spectroscopic exoplanet detection and characterisation tool), a novel program for the combined or separate Bayesian analysis of astrometric and radial-velocity measurements of potential exoplanet hosts and binary stars. The capabilities of BASE are demonstrated using all publicly available data of the binary Mizar A.

  18. Low-Complexity Bayesian Estimation of Cluster-Sparse Channels

    KAUST Repository

    Ballal, Tarig

    2015-09-18

    This paper addresses the problem of channel impulse response estimation for cluster-sparse channels under the Bayesian estimation framework. We develop a novel low-complexity minimum mean squared error (MMSE) estimator by exploiting the sparsity of the received signal profile and the structure of the measurement matrix. It is shown that due to the banded Toeplitz/circulant structure of the measurement matrix, a channel impulse response, such as underwater acoustic channel impulse responses, can be partitioned into a number of orthogonal or approximately orthogonal clusters. The orthogonal clusters, the sparsity of the channel impulse response and the structure of the measurement matrix, all combined, result in a computationally superior realization of the MMSE channel estimator. The MMSE estimator calculations boil down to simpler in-cluster calculations that can be reused in different clusters. The reduction in computational complexity allows for a more accurate implementation of the MMSE estimator. The proposed approach is tested using synthetic Gaussian channels, as well as simulated underwater acoustic channels. Symbol-error-rate performance and computation time confirm the superiority of the proposed method compared to selected benchmark methods in systems with preamble-based training signals transmitted over clustersparse channels.

  19. BASE-9: Bayesian Analysis for Stellar Evolution with nine variables

    Science.gov (United States)

    Robinson, Elliot; von Hippel, Ted; Stein, Nathan; Stenning, David; Wagner-Kaiser, Rachel; Si, Shijing; van Dyk, David

    2016-08-01

    The BASE-9 (Bayesian Analysis for Stellar Evolution with nine variables) software suite recovers star cluster and stellar parameters from photometry and is useful for analyzing single-age, single-metallicity star clusters, binaries, or single stars, and for simulating such systems. BASE-9 uses a Markov chain Monte Carlo (MCMC) technique along with brute force numerical integration to estimate the posterior probability distribution for the age, metallicity, helium abundance, distance modulus, line-of-sight absorption, and parameters of the initial-final mass relation (IFMR) for a cluster, and for the primary mass, secondary mass (if a binary), and cluster probability for every potential cluster member. The MCMC technique is used for the cluster quantities (the first six items listed above) and numerical integration is used for the stellar quantities (the last three items in the above list).

  20. Clustering and Bayesian network for image of faces classification

    CERN Document Server

    Jayech, Khlifia

    2012-01-01

    In a content based image classification system, target images are sorted by feature similarities with respect to the query (CBIR). In this paper, we propose to use new approach combining distance tangent, k-means algorithm and Bayesian network for image classification. First, we use the technique of tangent distance to calculate several tangent spaces representing the same image. The objective is to reduce the error in the classification phase. Second, we cut the image in a whole of blocks. For each block, we compute a vector of descriptors. Then, we use K-means to cluster the low-level features including color and texture information to build a vector of labels for each image. Finally, we apply five variants of Bayesian networks classifiers (Na\\"ive Bayes, Global Tree Augmented Na\\"ive Bayes (GTAN), Global Forest Augmented Na\\"ive Bayes (GFAN), Tree Augmented Na\\"ive Bayes for each class (TAN), and Forest Augmented Na\\"ive Bayes for each class (FAN) to classify the image of faces using the vector of labels. ...

  1. Subjective Bayesian Analysis: Principles and Practice

    OpenAIRE

    Goldstein, Michael

    2006-01-01

    We address the position of subjectivism within Bayesian statistics. We argue, first, that the subjectivist Bayes approach is the only feasible method for tackling many important practical problems. Second, we describe the essential role of the subjectivist approach in scientific analysis. Third, we consider possible modifications to the Bayesian approach from a subjectivist viewpoint. Finally, we address the issue of pragmatism in implementing the subjectivist approach.

  2. Bayesian Analysis of Multivariate Probit Models

    OpenAIRE

    Siddhartha Chib; Edward Greenberg

    1996-01-01

    This paper provides a unified simulation-based Bayesian and non-Bayesian analysis of correlated binary data using the multivariate probit model. The posterior distribution is simulated by Markov chain Monte Carlo methods, and maximum likelihood estimates are obtained by a Markov chain Monte Carlo version of the E-M algorithm. Computation of Bayes factors from the simulation output is also considered. The methods are applied to a bivariate data set, to a 534-subject, four-year longitudinal dat...

  3. Bayesian analysis of contingency tables

    OpenAIRE

    Gómez Villegas, Miguel A.; González Pérez, Beatriz

    2005-01-01

    The display of the data by means of contingency tables is used in different approaches to statistical inference, for example, to broach the test of homogeneity of independent multinomial distributions. We develop a Bayesian procedure to test simple null hypotheses versus bilateral alternatives in contingency tables. Given independent samples of two binomial distributions and taking a mixed prior distribution, we calculate the posterior probability that the proportion of successes in the first...

  4. Pair-Wise Cluster Analysis

    CERN Document Server

    Hardoon, David R

    2010-01-01

    This paper studies the problem of learning clusters which are consistently present in different (continuously valued) representations of observed data. Our setup differs slightly from the standard approach of (co-) clustering as we use the fact that some form of `labeling' becomes available in this setup: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and an analysis in terms of the PAC-Bayesian theorem is presented, (ii) a practical kernel-based algorithm is derived exploiting the inherent relation to Canonical Correlation Analysis (CCA), as well as its extension to multiple views. A content based information retrieval (CBIR) case study is presented on the multi-lingual aligned Europal document dataset which supports the above findings.

  5. Bayesian modelling of clusters of galaxies from multi-frequency pointed Sunyaev--Zel'dovich observations

    OpenAIRE

    Feroz, F.; Hobson, M. P.; Zwart, J T L; Saunders, R. D. E.; Grainge, K. J. B.

    2008-01-01

    We present a Bayesian approach to modelling galaxy clusters using multi-frequency pointed observations from telescopes that exploit the Sunyaev--Zel'dovich effect. We use the recently developed MultiNest technique (Feroz, Hobson & Bridges, 2008) to explore the high-dimensional parameter spaces and also to calculate the Bayesian evidence. This permits robust parameter estimation as well as model comparison. Tests on simulated Arcminute Microkelvin Imager observations of a cluster, in the prese...

  6. Bayesian analysis of cosmic structures

    CERN Document Server

    Kitaura, Francisco-Shu

    2011-01-01

    We revise the Bayesian inference steps required to analyse the cosmological large-scale structure. Here we make special emphasis in the complications which arise due to the non-Gaussian character of the galaxy and matter distribution. In particular we investigate the advantages and limitations of the Poisson-lognormal model and discuss how to extend this work. With the lognormal prior using the Hamiltonian sampling technique and on scales of about 4 h^{-1} Mpc we find that the over-dense regions are excellent reconstructed, however, under-dense regions (void statistics) are quantitatively poorly recovered. Contrary to the maximum a posteriori (MAP) solution which was shown to over-estimate the density in the under-dense regions we obtain lower densities than in N-body simulations. This is due to the fact that the MAP solution is conservative whereas the full posterior yields samples which are consistent with the prior statistics. The lognormal prior is not able to capture the full non-linear regime at scales ...

  7. Fast Bayesian Semiparametric Curve-Fitting and Clustering in Massive Data With Application to Cosmology

    CERN Document Server

    Mukhopadhyay, Sabyasachi; Bhattacharya, Sourabh

    2009-01-01

    Recent technological advances have led to a flood of new data on cosmology rich in information about the formation and evolution of the universe, e.g., the data collected in Sloan Digital Sky Survey (SDSS) for more than 200 million objects. The analyses of such data demand cutting edge statistical technologies. Here, we have used the concept of mixture model within Bayesian semiparametric methodology to fit the regression curve with the bivariate data for the apparent magnitude and redshift for Quasars in SDSS (2007) catalogue. Associated with the mixture modeling is a highly efficient curve-fitting procedure, which is central to the application considered in this paper. Moreover, we adopt a new method for analysing the posterior distribution of clusterings, also generated as a by-product of our methodology. The results of our analysis of the cosmological data clearly indicate the existence of four change points on the regression curve andposssibiltiy of clustering of quasars specially at high redshift. This ...

  8. A Bayesian nonparametric meta-analysis model.

    Science.gov (United States)

    Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G

    2015-03-01

    In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.

  9. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis.Comprised of 10 chapters, this book begins with an introduction to the subject o

  10. On the Informativeness of Dominant and Co-Dominant Genetic Markers for Bayesian Supervised Clustering

    DEFF Research Database (Denmark)

    Guillot, Gilles; Carpentier-Skandalis, Alexandra

    2011-01-01

    We study the accuracy of a Bayesian supervised method used to cluster individuals into genetically homogeneous groups on the basis of dominant or codominant molecular markers. We provide a formula relating an error criterion to the number of loci used and the number of clusters. This formula...

  11. A Gentle Introduction to Bayesian Analysis : Applications to Developmental Research

    NARCIS (Netherlands)

    Van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B.; Neyer, Franz J.; van Aken, Marcel A G

    2014-01-01

    Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, t

  12. Book review: Bayesian analysis for population ecology

    Science.gov (United States)

    Link, William A.

    2011-01-01

    Brian Dennis described the field of ecology as “fertile, uncolonized ground for Bayesian ideas.” He continued: “The Bayesian propagule has arrived at the shore. Ecologists need to think long and hard about the consequences of a Bayesian ecology. The Bayesian outlook is a successful competitor, but is it a weed? I think so.” (Dennis 2004)

  13. A Bayesian Alternative to Mutual Information for the Hierarchical Clustering of Dependent Random Variables.

    Directory of Open Access Journals (Sweden)

    Guillaume Marrelec

    Full Text Available The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity, provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.

  14. Bayesian Analysis of Individual Level Personality Dynamics

    Directory of Open Access Journals (Sweden)

    Edward Cripps

    2016-07-01

    Full Text Available A Bayesian technique with analyses of within-person processes at the level of the individual is presented. The approach is used to examine if the patterns of within-person responses on a 12 trial simulation task are consistent with the predictions of ITA theory (Dweck, 1999. ITA theory states that the performance of an individual with an entity theory of ability is more likely to spiral down following a failure experience than the performance of an individual with an incremental theory of ability. This is because entity theorists interpret failure experiences as evidence of a lack of ability, which they believe is largely innate and therefore relatively fixed; whilst incremental theorists believe in the malleability of abilities and interpret failure experiences as evidence of more controllable factors such as poor strategy or lack of effort. The results of our analyses support ITA theory at both the within- and between-person levels of analyses and demonstrate the benefits of Bayesian techniques for the analysis of within-person processes. These include more formal specification of the theory and the ability to draw inferences about each individual, which allows for more nuanced interpretations of individuals within a personality category, such as differences in the individual probabilities of spiralling. While Bayesian techniques have many potential advantages for the analyses of within-person processes at the individual level, ease of use is not one of them for psychologists trained in traditional frequentist statistical techniques.

  15. Bayesian Analysis of Individual Level Personality Dynamics

    Science.gov (United States)

    Cripps, Edward; Wood, Robert E.; Beckmann, Nadin; Lau, John; Beckmann, Jens F.; Cripps, Sally Ann

    2016-01-01

    A Bayesian technique with analyses of within-person processes at the level of the individual is presented. The approach is used to examine whether the patterns of within-person responses on a 12-trial simulation task are consistent with the predictions of ITA theory (Dweck, 1999). ITA theory states that the performance of an individual with an entity theory of ability is more likely to spiral down following a failure experience than the performance of an individual with an incremental theory of ability. This is because entity theorists interpret failure experiences as evidence of a lack of ability which they believe is largely innate and therefore relatively fixed; whilst incremental theorists believe in the malleability of abilities and interpret failure experiences as evidence of more controllable factors such as poor strategy or lack of effort. The results of our analyses support ITA theory at both the within- and between-person levels of analyses and demonstrate the benefits of Bayesian techniques for the analysis of within-person processes. These include more formal specification of the theory and the ability to draw inferences about each individual, which allows for more nuanced interpretations of individuals within a personality category, such as differences in the individual probabilities of spiraling. While Bayesian techniques have many potential advantages for the analyses of processes at the level of the individual, ease of use is not one of them for psychologists trained in traditional frequentist statistical techniques. PMID:27486415

  16. Bayesian Analysis of Individual Level Personality Dynamics.

    Science.gov (United States)

    Cripps, Edward; Wood, Robert E; Beckmann, Nadin; Lau, John; Beckmann, Jens F; Cripps, Sally Ann

    2016-01-01

    A Bayesian technique with analyses of within-person processes at the level of the individual is presented. The approach is used to examine whether the patterns of within-person responses on a 12-trial simulation task are consistent with the predictions of ITA theory (Dweck, 1999). ITA theory states that the performance of an individual with an entity theory of ability is more likely to spiral down following a failure experience than the performance of an individual with an incremental theory of ability. This is because entity theorists interpret failure experiences as evidence of a lack of ability which they believe is largely innate and therefore relatively fixed; whilst incremental theorists believe in the malleability of abilities and interpret failure experiences as evidence of more controllable factors such as poor strategy or lack of effort. The results of our analyses support ITA theory at both the within- and between-person levels of analyses and demonstrate the benefits of Bayesian techniques for the analysis of within-person processes. These include more formal specification of the theory and the ability to draw inferences about each individual, which allows for more nuanced interpretations of individuals within a personality category, such as differences in the individual probabilities of spiraling. While Bayesian techniques have many potential advantages for the analyses of processes at the level of the individual, ease of use is not one of them for psychologists trained in traditional frequentist statistical techniques. PMID:27486415

  17. Bayesian Inference in Statistical Analysis

    CERN Document Server

    Box, George E P

    2011-01-01

    The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Rob

  18. Bayesian analysis for extreme climatic events: A review

    Science.gov (United States)

    Chu, Pao-Shin; Zhao, Xin

    2011-11-01

    This article reviews Bayesian analysis methods applied to extreme climatic data. We particularly focus on applications to three different problems related to extreme climatic events including detection of abrupt regime shifts, clustering tropical cyclone tracks, and statistical forecasting for seasonal tropical cyclone activity. For identifying potential change points in an extreme event count series, a hierarchical Bayesian framework involving three layers - data, parameter, and hypothesis - is formulated to demonstrate the posterior probability of the shifts throughout the time. For the data layer, a Poisson process with a gamma distributed rate is presumed. For the hypothesis layer, multiple candidate hypotheses with different change-points are considered. To calculate the posterior probability for each hypothesis and its associated parameters we developed an exact analytical formula, a Markov Chain Monte Carlo (MCMC) algorithm, and a more sophisticated reversible jump Markov Chain Monte Carlo (RJMCMC) algorithm. The algorithms are applied to several rare event series: the annual tropical cyclone or typhoon counts over the central, eastern, and western North Pacific; the annual extremely heavy rainfall event counts at Manoa, Hawaii; and the annual heat wave frequency in France. Using an Expectation-Maximization (EM) algorithm, a Bayesian clustering method built on a mixture Gaussian model is applied to objectively classify historical, spaghetti-like tropical cyclone tracks (1945-2007) over the western North Pacific and the South China Sea into eight distinct track types. A regression based approach to forecasting seasonal tropical cyclone frequency in a region is developed. Specifically, by adopting large-scale environmental conditions prior to the tropical cyclone season, a Poisson regression model is built for predicting seasonal tropical cyclone counts, and a probit regression model is alternatively developed toward a binary classification problem. With a non

  19. ASteCA - Automated Stellar Cluster Analysis

    CERN Document Server

    Perren, Gabriel I; Piatti, Andrés E

    2014-01-01

    We present ASteCA (Automated Stellar Cluster Analysis), a suit of tools designed to fully automatize the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code make use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its unce...

  20. Bayesian Analysis of Type Ia Supernova Data

    Institute of Scientific and Technical Information of China (English)

    王晓峰; 周旭; 李宗伟; 陈黎

    2003-01-01

    Recently, the distances to type Ia supernova (SN Ia) at z ~ 0.5 have been measured with the motivation of estimating cosmological parameters. However, different sleuthing techniques tend to give inconsistent measurements for SN Ia distances (~0.3 mag), which significantly affects the determination of cosmological parameters.A Bayesian "hyper-parameter" procedure is used to analyse jointly the current SN Ia data, which considers the relative weights of different datasets. For a flat Universe, the combining analysis yields ΩM = 0.20 ± 0.07.

  1. Bayesian global analysis of neutrino oscillation data

    CERN Document Server

    Bergstrom, Johannes; Maltoni, Michele; Schwetz, Thomas

    2015-01-01

    We perform a Bayesian analysis of current neutrino oscillation data. When estimating the oscillation parameters we find that the results generally agree with those of the $\\chi^2$ method, with some differences involving $s_{23}^2$ and CP-violating effects. We discuss the additional subtleties caused by the circular nature of the CP-violating phase, and how it is possible to obtain correlation coefficients with $s_{23}^2$. When performing model comparison, we find that there is no significant evidence for any mass ordering, any octant of $s_{23}^2$ or a deviation from maximal mixing, nor the presence of CP-violation.

  2. Pitman Yor Diffusion Trees for Bayesian Hierarchical Clustering.

    Science.gov (United States)

    Knowles, David A; Ghahramani, Zoubin

    2015-02-01

    In this paper we introduce the Pitman Yor Diffusion Tree (PYDT), a Bayesian non-parametric prior over tree structures which generalises the Dirichlet Diffusion Tree [30] and removes the restriction to binary branching structure. The generative process is described and shown to result in an exchangeable distribution over data points. We prove some theoretical properties of the model including showing its construction as the continuum limit of a nested Chinese restaurant process model. We then present two alternative MCMC samplers which allow us to model uncertainty over tree structures, and a computationally efficient greedy Bayesian EM search algorithm. Both algorithms use message passing on the tree structure. The utility of the model and algorithms is demonstrated on synthetic and real world data, both continuous and binary. PMID:26353241

  3. Current trends in Bayesian methodology with applications

    CERN Document Server

    Upadhyay, Satyanshu K; Dey, Dipak K; Loganathan, Appaia

    2015-01-01

    Collecting Bayesian material scattered throughout the literature, Current Trends in Bayesian Methodology with Applications examines the latest methodological and applied aspects of Bayesian statistics. The book covers biostatistics, econometrics, reliability and risk analysis, spatial statistics, image analysis, shape analysis, Bayesian computation, clustering, uncertainty assessment, high-energy astrophysics, neural networking, fuzzy information, objective Bayesian methodologies, empirical Bayes methods, small area estimation, and many more topics.Each chapter is self-contained and focuses on

  4. A Bayesian Approach Accounting for Stochastic Fluctuations in Stellar Cluster Properties

    CERN Document Server

    Fouesneau, M

    2009-01-01

    The integrated spectro-photometric properties of star clusters are subject to large cluster-to-cluster variations. They are distributed in non trivial ways around the average properties predicted by standard population synthesis models. This results from the stochastic mass distribution of the finite (small) number of luminous stars in each cluster, stars which may be either particularly blue or particularly red. The color distributions are broad and usually far from Gaussian, especially for young and intermediate age clusters, as found in interacting galaxies. When photometric measurements of clusters are used to estimate ages and masses in conjunction with standard models, biases are to be expected. We present a Bayesian approach that explicitly accounts for stochasticity when estimating ages and masses of star clusters that cannot be resolved into stars. Based on Monte-Carlo simulations, we are starting to explore the probability distributions of star cluster properties obtained given a set of multi-wavele...

  5. Doing bayesian data analysis a tutorial with R and BUGS

    CERN Document Server

    Kruschke, John K

    2011-01-01

    There is an explosion of interest in Bayesian statistics, primarily because recently created computational methods have finally made Bayesian analysis obtainable to a wide audience. Doing Bayesian Data Analysis, A Tutorial Introduction with R and BUGS provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples. The book begins with the basics, including essential concepts of probability and random sampling, and gradually progresses to advanced hierarchical modeling methods for realistic data. The text delivers comprehensive coverage of all

  6. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    Science.gov (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. ?? 2011 by the authors; licensee MDPI, Basel, Switzerland.

  7. Bayesian networks as a tool for epidemiological systems analysis

    OpenAIRE

    Lewis, F.I.

    2012-01-01

    Bayesian network analysis is a form of probabilistic modeling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. Bayesian networks are increasingly finding application in areas such as computational and systems biology, and more recently in epidemiological analyses. The key distinction between standard empirical modeling approaches, such as generalised linear modeling, and Bayesian network analyses is that the latter ...

  8. Bayesian Analysis of High Dimensional Classification

    Science.gov (United States)

    Mukhopadhyay, Subhadeep; Liang, Faming

    2009-12-01

    Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases , there is a lot of interest in searching for sparse model in High Dimensional regression(/classification) setup. we first discuss two common challenges for analyzing high dimensional data. The first one is the curse of dimensionality. The complexity of many existing algorithms scale exponentially with the dimensionality of the space and by virtue of that algorithms soon become computationally intractable and therefore inapplicable in many real applications. secondly, multicollinearities among the predictors which severely slowdown the algorithm. In order to make Bayesian analysis operational in high dimension we propose a novel 'Hierarchical stochastic approximation monte carlo algorithm' (HSAMC), which overcomes the curse of dimensionality, multicollinearity of predictors in high dimension and also it possesses the self-adjusting mechanism to avoid the local minima separated by high energy barriers. Models and methods are illustrated by simulation inspired from from the feild of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high dimensional complex model space.

  9. The bugs book a practical introduction to Bayesian analysis

    CERN Document Server

    Lunn, David; Best, Nicky; Thomas, Andrew; Spiegelhalter, David

    2012-01-01

    Introduction: Probability and ParametersProbabilityProbability distributionsCalculating properties of probability distributionsMonte Carlo integrationMonte Carlo Simulations Using BUGSIntroduction to BUGSDoodleBUGSUsing BUGS to simulate from distributionsTransformations of random variablesComplex calculations using Monte CarloMultivariate Monte Carlo analysisPredictions with unknown parametersIntroduction to Bayesian InferenceBayesian learningPosterior predictive distributionsConjugate Bayesian inferenceInference about a discrete parameterCombinations of conjugate analysesBayesian and classica

  10. BEAST: Bayesian evolutionary analysis by sampling trees

    Directory of Open Access Journals (Sweden)

    Drummond Alexei J

    2007-11-01

    Full Text Available Abstract Background The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. Results BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at http://beast-mcmc.googlecode.com/ under the GNU LGPL license. Conclusion BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.

  11. Bayesian data analysis in population ecology: motivations, methods, and benefits

    Science.gov (United States)

    Dorazio, Robert

    2016-01-01

    During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.

  12. Bayesian analysis for EMP damaged function based on Weibull distribution

    International Nuclear Information System (INIS)

    Weibull distribution is one of the most commonly used statistical distribution in EMP vulnerability analysis. In the paper, the EMP damage function based on Weibull distribution of solid state relays was solved by bayesian computation using gibbs sampling algorithm. (authors)

  13. Stochastic back analysis of permeability coefficient using generalized Bayesian method

    Institute of Scientific and Technical Information of China (English)

    Zheng Guilan; Wang Yuan; Wang Fei; Yang Jian

    2008-01-01

    Owing to the fact that the conventional deterministic back analysis of the permeability coefficient cannot reflect the uncertainties of parameters, including the hydraulic head at the boundary, the permeability coefficient and measured hydraulic head, a stochastic back analysis taking consideration of uncertainties of parameters was performed using the generalized Bayesian method. Based on the stochastic finite element method (SFEM) for a seepage field, the variable metric algorithm and the generalized Bayesian method, formulas for stochastic back analysis of the permeability coefficient were derived. A case study of seepage analysis of a sluice foundation was performed to illustrate the proposed method. The results indicate that, with the generalized Bayesian method that considers the uncertainties of measured hydraulic head, the permeability coefficient and the hydraulic head at the boundary, both the mean and standard deviation of the permeability coefficient can be obtained and the standard deviation is less than that obtained by the conventional Bayesian method. Therefore, the present method is valid and applicable.

  14. Bayesian Analysis of the Cosmic Microwave Background

    Science.gov (United States)

    Jewell, Jeffrey

    2007-01-01

    There is a wealth of cosmological information encoded in the spatial power spectrum of temperature anisotropies of the cosmic microwave background! Experiments designed to map the microwave sky are returning a flood of data (time streams of instrument response as a beam is swept over the sky) at several different frequencies (from 30 to 900 GHz), all with different resolutions and noise properties. The resulting analysis challenge is to estimate, and quantify our uncertainty in, the spatial power spectrum of the cosmic microwave background given the complexities of "missing data", foreground emission, and complicated instrumental noise. Bayesian formulation of this problem allows consistent treatment of many complexities including complicated instrumental noise and foregrounds, and can be numerically implemented with Gibbs sampling. Gibbs sampling has now been validated as an efficient, statistically exact, and practically useful method for low-resolution (as demonstrated on WMAP 1 and 3 year temperature and polarization data). Continuing development for Planck - the goal is to exploit the unique capabilities of Gibbs sampling to directly propagate uncertainties in both foreground and instrument models to total uncertainty in cosmological parameters.

  15. Objective Bayesian Analysis of Skew- t Distributions

    KAUST Repository

    BRANCO, MARCIA D'ELIA

    2012-02-27

    We study the Jeffreys prior and its properties for the shape parameter of univariate skew-t distributions with linear and nonlinear Student\\'s t skewing functions. In both cases, we show that the resulting priors for the shape parameter are symmetric around zero and proper. Moreover, we propose a Student\\'s t approximation of the Jeffreys prior that makes an objective Bayesian analysis easy to perform. We carry out a Monte Carlo simulation study that demonstrates an overall better behaviour of the maximum a posteriori estimator compared with the maximum likelihood estimator. We also compare the frequentist coverage of the credible intervals based on the Jeffreys prior and its approximation and show that they are similar. We further discuss location-scale models under scale mixtures of skew-normal distributions and show some conditions for the existence of the posterior distribution and its moments. Finally, we present three numerical examples to illustrate the implications of our results on inference for skew-t distributions. © 2012 Board of the Foundation of the Scandinavian Journal of Statistics.

  16. Bayesian Analysis of Multiple Populations I: Statistical and Computational Methods

    CERN Document Server

    Stenning, D C; Robinson, E; van Dyk, D A; von Hippel, T; Sarajedini, A; Stein, N

    2016-01-01

    We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations (vanDyk et al. 2009, Stein et al. 2013). Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties---age, metallicity, helium abundance, distance, absorption, and initial mass---are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and al...

  17. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    Science.gov (United States)

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  18. A Bayesian Analysis of Spectral ARMA Model

    Directory of Open Access Journals (Sweden)

    Manoel I. Silvestre Bezerra

    2012-01-01

    Full Text Available Bezerra et al. (2008 proposed a new method, based on Yule-Walker equations, to estimate the ARMA spectral model. In this paper, a Bayesian approach is developed for this model by using the noninformative prior proposed by Jeffreys (1967. The Bayesian computations, simulation via Markov Monte Carlo (MCMC is carried out and characteristics of marginal posterior distributions such as Bayes estimator and confidence interval for the parameters of the ARMA model are derived. Both methods are also compared with the traditional least squares and maximum likelihood approaches and a numerical illustration with two examples of the ARMA model is presented to evaluate the performance of the procedures.

  19. Bayesian methods for the design and analysis of noninferiority trials.

    Science.gov (United States)

    Gamalo-Siebers, Margaret; Gao, Aijun; Lakshminarayanan, Mani; Liu, Guanghan; Natanegara, Fanni; Railkar, Radha; Schmidli, Heinz; Song, Guochen

    2016-01-01

    The gold standard for evaluating treatment efficacy of a medical product is a placebo-controlled trial. However, when the use of placebo is considered to be unethical or impractical, a viable alternative for evaluating treatment efficacy is through a noninferiority (NI) study where a test treatment is compared to an active control treatment. The minimal objective of such a study is to determine whether the test treatment is superior to placebo. An assumption is made that if the active control treatment remains efficacious, as was observed when it was compared against placebo, then a test treatment that has comparable efficacy with the active control, within a certain range, must also be superior to placebo. Because of this assumption, the design, implementation, and analysis of NI trials present challenges for sponsors and regulators. In designing and analyzing NI trials, substantial historical data are often required on the active control treatment and placebo. Bayesian approaches provide a natural framework for synthesizing the historical data in the form of prior distributions that can effectively be used in design and analysis of a NI clinical trial. Despite a flurry of recent research activities in the area of Bayesian approaches in medical product development, there are still substantial gaps in recognition and acceptance of Bayesian approaches in NI trial design and analysis. The Bayesian Scientific Working Group of the Drug Information Association provides a coordinated effort to target the education and implementation issues on Bayesian approaches for NI trials. In this article, we provide a review of both frequentist and Bayesian approaches in NI trials, and elaborate on the implementation for two common Bayesian methods including hierarchical prior method and meta-analytic-predictive approach. Simulations are conducted to investigate the properties of the Bayesian methods, and some real clinical trial examples are presented for illustration.

  20. Bayesian analysis of Markov point processes

    DEFF Research Database (Denmark)

    Berthelsen, Kasper Klitgaard; Møller, Jesper

    2006-01-01

    Recently Møller, Pettitt, Berthelsen and Reeves introduced a new MCMC methodology for drawing samples from a posterior distribution when the likelihood function is only specified up to a normalising constant. We illustrate the method in the setting of Bayesian inference for Markov point processes...

  1. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research. PMID:24640781

  2. PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

    CERN Document Server

    Seldin, Yevgeny; Shawe-Taylor, John; Peters, Jan; Auer, Peter

    2011-01-01

    We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to Hoeffding-Azuma inequality to bound concentration of martingale values. Our second approach is based on integration of Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situation of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many ...

  3. ASteCA: Automated Stellar Cluster Analysis

    Science.gov (United States)

    Perren, G. I.; Vázquez, R. A.; Piatti, A. E.

    2015-04-01

    We present the Automated Stellar Cluster Analysis package (ASteCA), a suit of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code make use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its uncertainties. To validate the code we applied it on a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.

  4. MATHEMATICAL RISK ANALYSIS: VIA NICHOLAS RISK MODEL AND BAYESIAN ANALYSIS

    Directory of Open Access Journals (Sweden)

    Anass BAYAGA

    2010-07-01

    Full Text Available The objective of this second part of a two-phased study was to explorethe predictive power of quantitative risk analysis (QRA method andprocess within Higher Education Institution (HEI. The method and process investigated the use impact analysis via Nicholas risk model and Bayesian analysis, with a sample of hundred (100 risk analysts in a historically black South African University in the greater Eastern Cape Province.The first findings supported and confirmed previous literature (KingIII report, 2009: Nicholas and Steyn, 2008: Stoney, 2007: COSA, 2004 that there was a direct relationship between risk factor, its likelihood and impact, certiris paribus. The second finding in relation to either controlling the likelihood or the impact of occurrence of risk (Nicholas risk model was that to have a brighter risk reward, it was important to control the likelihood ofoccurrence of risks as compared with its impact so to have a direct effect on entire University. On the Bayesian analysis, thus third finding, the impact of risk should be predicted along three aspects. These aspects included the human impact (decisions made, the property impact (students and infrastructural based and the business impact. Lastly, the study revealed that although in most business cases, where as business cycles considerably vary dependingon the industry and or the institution, this study revealed that, most impacts in HEI (University was within the period of one academic.The recommendation was that application of quantitative risk analysisshould be related to current legislative framework that affects HEI.

  5. Bayesian clustering of fuzzy feature vectors using a quasi-likelihood approach.

    Science.gov (United States)

    Marttinen, Pekka; Tang, Jing; De Baets, Bernard; Dawyndt, Peter; Corander, Jukka

    2009-01-01

    Bayesian model-based classifiers, both unsupervised and supervised, have been studied extensively and their value and versatility have been demonstrated on a wide spectrum of applications within science and engineering. A majority of the classifiers are built on the assumption of intrinsic discreteness of the considered data features or on the discretization of them prior to the modeling. On the other hand, Gaussian mixture classifiers have also been utilized to a large extent for continuous features in the Bayesian framework. Often the primary reason for discretization in the classification context is the simplification of the analytical and numerical properties of the models. However, the discretization can be problematic due to its \\textit{ad hoc} nature and the decreased statistical power to detect the correct classes in the resulting procedure. We introduce an unsupervised classification approach for fuzzy feature vectors that utilizes a discrete model structure while preserving the continuous characteristics of data. This is achieved by replacing the ordinary likelihood by a binomial quasi-likelihood to yield an analytical expression for the posterior probability of a given clustering solution. The resulting model can be justified from an information-theoretic perspective. Our method is shown to yield highly accurate clusterings for challenging synthetic and empirical data sets. PMID:19029547

  6. Elite Athletes Refine Their Internal Clocks: A Bayesian Analysis.

    Science.gov (United States)

    Chen, Yin-Hua; Verdinelli, Isabella; Cesari, Paola

    2016-07-01

    This paper carries out a full Bayesian analysis for a data set examined in Chen & Cesari (2015). These data were collected for assessing people's ability in evaluating short intervals of time. Chen & Cesari (2015) showed evidence of the existence of two independent internal clocks for evaluating time intervals below and above the second. We reexamine here, the same question by performing a complete statistical Bayesian analysis of the data. The Bayesian approach can be used to analyze these data thanks to the specific trial design. Data were obtained from evaluation of time ranges from two groups of individuals. More specifically, information gathered from a nontrained group (considered as baseline) allowed us to build a prior distribution for the parameter(s) of interest, and data from the trained group determined the likelihood function. This paper's main goals are (i) showing how the Bayesian inferential method can be used in statistical analyses and (ii) showing that the Bayesian methodology gives additional support to the findings presented in Chen & Cesari (2015) regarding the existence of two internal clocks in assessing duration of time intervals.

  7. Analysis of Gumbel Model for Software Reliability Using Bayesian Paradigm

    Directory of Open Access Journals (Sweden)

    Raj Kumar

    2012-12-01

    Full Text Available In this paper, we have illustrated the suitability of Gumbel Model for software reliability data. The model parameters are estimated using likelihood based inferential procedure: classical as well as Bayesian. The quasi Newton-Raphson algorithm is applied to obtain the maximum likelihood estimates and associated probability intervals. The Bayesian estimates of the parameters of Gumbel model are obtained using Markov Chain Monte Carlo(MCMC simulation method in OpenBUGS(established software for Bayesian analysis using Markov Chain Monte Carlo methods. The R functions are developed to study the statistical properties, model validation and comparison tools of the model and the output analysis of MCMC samples generated from OpenBUGS. Details of applying MCMC to parameter estimation for the Gumbel model are elaborated and a real software reliability data set is considered to illustrate the methods of inference discussed in this paper.

  8. Nested sampling applied in Bayesian room-acoustics decay analysis.

    Science.gov (United States)

    Jasa, Tomislav; Xiang, Ning

    2012-11-01

    Room-acoustic energy decays often exhibit single-rate or multiple-rate characteristics in a wide variety of rooms/halls. Both the energy decay order and decay parameter estimation are of practical significance in architectural acoustics applications, representing two different levels of Bayesian probabilistic inference. This paper discusses a model-based sound energy decay analysis within a Bayesian framework utilizing the nested sampling algorithm. The nested sampling algorithm is specifically developed to evaluate the Bayesian evidence required for determining the energy decay order with decay parameter estimates as a secondary result. Taking the energy decay analysis in architectural acoustics as an example, this paper demonstrates that two different levels of inference, decay model-selection and decay parameter estimation, can be cohesively accomplished by the nested sampling algorithm. PMID:23145609

  9. Phycas: software for Bayesian phylogenetic analysis.

    Science.gov (United States)

    Lewis, Paul O; Holder, Mark T; Swofford, David L

    2015-05-01

    Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used Harmonic Mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand-Ghosh and Conditional Predictive Ordinates. The General Time Reversible family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length. PMID:25577605

  10. Bayesian networks as a tool for epidemiological systems analysis

    Science.gov (United States)

    Lewis, F. I.

    2012-11-01

    Bayesian network analysis is a form of probabilistic modeling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. Bayesian networks are increasingly finding application in areas such as computational and systems biology, and more recently in epidemiological analyses. The key distinction between standard empirical modeling approaches, such as generalised linear modeling, and Bayesian network analyses is that the latter attempts not only to identify statistically associated variables, but to additionally, and empirically, separate these into those directly and indirectly dependent with one or more outcome variables. Such discrimination is vastly more ambitious but has the potential to reveal far more about key features of complex disease systems. Applying Bayesian network modeling to biological and medical data has considerable computational demands, combined with the need to ensure robust model selection given the vast model space of possible DAGs. These challenges require the use of approximation techniques, such as the Laplace approximation, Markov chain Monte Carlo simulation and parametric bootstrapping, along with computational parallelization. A case study in structure discovery - identification of an optimal DAG for given data - is presented which uses additive Bayesian networks to explore veterinary disease data of industrial and medical relevance.

  11. On Bayesian analysis of on-off measurements

    CERN Document Server

    Nosek, Dalibor

    2016-01-01

    We propose an analytical solution to the on-off problem within the framework of Bayesian statistics. Both the statistical significance for the discovery of new phenomena and credible intervals on model parameters are presented in a consistent way. We use a large enough family of prior distributions of relevant parameters. The proposed analysis is designed to provide Bayesian solutions that can be used for any number of observed on-off events, including zero. The procedure is checked using Monte Carlo simulations. The usefulness of the method is demonstrated on examples from gamma-ray astronomy.

  12. Bayesian analysis of log Gaussian Cox processes for disease mapping

    DEFF Research Database (Denmark)

    Benes, Viktor; Bodlák, Karel; Møller, Jesper;

    of the risk on the covariates. Instead of using the common area level approaches we consider a Bayesian analysis for a log Gaussian Cox point process with covariates. Posterior characteristics for a discretized version of the log Gaussian Cox process are computed using markov chain Monte Carlo methods...

  13. Clustering analysis using Swarm Intelligence

    OpenAIRE

    Farmani, Mohammad Reza

    2016-01-01

    This thesis is concerned with the application of the swarm intelligence methods in clustering analysis of datasets. The main objectives of the thesis are ∙ Take the advantage of a novel evolutionary algorithm, called artificial bee colony, to improve the capability of K-means in finding global optimum clusters in nonlinear partitional clustering problems. ∙ Consider partitional clustering as an optimization problem and an improved antbased algorithm, named Opposition-Based A...

  14. A spatio-temporal nonparametric Bayesian variable selection model of fMRI data for clustering correlated time courses.

    Science.gov (United States)

    Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina

    2014-07-15

    In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, therefore capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data. PMID:24650600

  15. Bayesian Investigation of Isochrone Consistency Using the Old Open Cluster NGC 188

    CERN Document Server

    Hills, Shane; Courteau, Stephane; Geller, Aaron M

    2015-01-01

    This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color-magnitude diagrams depending on the filters and models used. We examine the consistency and reliability of fitting three widely-used stellar evolution models to fifteen combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of the theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to the present study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities which enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically-favored three band combinations of UBV and VRI can be meaningfully inconsistent with each other and with lo...

  16. A new sparse Bayesian learning method for inverse synthetic aperture radar imaging via exploiting cluster patterns

    Science.gov (United States)

    Fang, Jun; Zhang, Lizao; Duan, Huiping; Huang, Lei; Li, Hongbin

    2016-05-01

    The application of sparse representation to SAR/ISAR imaging has attracted much attention over the past few years. This new class of sparse representation based imaging methods present a number of unique advantages over conventional range-Doppler methods, the basic idea behind these works is to formulate SAR/ISAR imaging as a sparse signal recovery problem. In this paper, we propose a new two-dimensional pattern-coupled sparse Bayesian learning(SBL) method to capture the underlying cluster patterns of the ISAR target images. Based on this model, an expectation-maximization (EM) algorithm is developed to infer the maximum a posterior (MAP) estimate of the hyperparameters, along with the posterior distribution of the sparse signal. Experimental results demonstrate that the proposed method is able to achieve a substantial performance improvement over existing algorithms, including the conventional SBL method.

  17. Bayesian analysis of genetic differentiation between populations.

    Science.gov (United States)

    Corander, Jukka; Waldmann, Patrik; Sillanpää, Mikko J

    2003-01-01

    We introduce a Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design. The joint posterior distribution of the substructure and allele frequencies of the respective populations is available in an analytical form when the number of populations is small, whereas an approximation based on a Markov chain Monte Carlo simulation approach can be obtained for a moderate or large number of populations. Using the joint posterior distribution, posteriors can also be derived for any evolutionary population parameters, such as the traditional fixation indices. A major advantage compared to most earlier methods is that the number of populations is treated here as an unknown parameter. What is traditionally considered as two genetically distinct populations, either recently founded or connected by considerable gene flow, is here considered as one panmictic population with a certain probability based on marker data and prior information. Analyses of previously published data on the Moroccan argan tree (Argania spinosa) and of simulated data sets suggest that our method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist. The software (BAPS) used for the computations is freely available from http://www.rni.helsinki.fi/~mjs. PMID:12586722

  18. A Clustering Method of Highly Dimensional Patent Data Using Bayesian Approach

    OpenAIRE

    Sunghae Jun

    2012-01-01

    Patent data have diversely technological information of any technology field. So, many companies have managed the patent data to build their RD policy. Patent analysis is an approach to the patent management. Also, patent analysis is an important tool for technology forecasting. Patent clustering is one of the works for patent analysis. In this paper, we propose an efficient clustering method of patent documents. Generally, patent data are consisted of text document. The patent documents have...

  19. Evaluation of a Partial Genome Screening of Two Asthma Susceptibility Regions Using Bayesian Network Based Bayesian Multilevel Analysis of Relevance

    OpenAIRE

    Ildikó Ungvári; Gábor Hullám; Péter Antal; Petra Sz Kiszel; András Gézsi; Éva Hadadi; Viktor Virág; Gergely Hajós; András Millinghoffer; Adrienne Nagy; András Kiss; Semsei, Ágnes F.; Gergely Temesi; Béla Melegh; Péter Kisfali

    2012-01-01

    Genetic studies indicate high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called bayesian network based bayesian multilevel analysis of relevance (BN-BMLA). Th...

  20. Bayesian History Reconstruction of Complex Human Gene Clusters on a Phylogeny

    CERN Document Server

    Vinař, Tomáš; Song, Giltae; Siepel, Adam

    2009-01-01

    Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. Improved understanding of these clusters is of utmost importance, since they have been shown to be the source of evolutionary innovation, and have been linked to multiple diseases, including HIV and a variety of cancers. Previously, Zhang et al. (2008) developed an algorithm for reconstructing parsimonious evolutionary histories of such gene clusters, using only human genomic sequence data. In this paper, we propose a probabilistic model for the evolution of gene clusters on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate that our method will be useful in analyzing these valuable new data sets.

  1. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  2. Bayesian phylogeny analysis via stochastic approximation Monte Carlo

    KAUST Repository

    Cheon, Sooyoung

    2009-11-01

    Monte Carlo methods have received much attention in the recent literature of phylogeny analysis. However, the conventional Markov chain Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, tend to get trapped in a local mode in simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software, BAMBE and MrBayes, on simulated and real datasets. The numerical results indicate that our method outperforms BAMBE and MrBayes. Among the three methods, SAMC produces the consensus trees which have the highest similarity to the true trees, and the model parameter estimates which have the smallest mean square errors, but costs the least CPU time. © 2009 Elsevier Inc. All rights reserved.

  3. Bayesian investigation of isochrone consistency using the old open cluster NGC 188

    Energy Technology Data Exchange (ETDEWEB)

    Hills, Shane; Courteau, Stéphane [Department of Physics, Engineering Physics and Astronomy, Queen’s University, Kingston, ON K7L 3N6 Canada (Canada); Von Hippel, Ted [Department of Physical Sciences, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114 (United States); Geller, Aaron M., E-mail: shane.hills@queensu.ca, E-mail: courteau@astro.queensu.ca, E-mail: ted.vonhippel@erau.edu, E-mail: a-geller@northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA) and Department of Physics and Astronomy, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208 (United States)

    2015-03-01

    This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color–magnitude diagrams (CMDs) depending on the filters and models used. We examine the consistency and reliability of fitting three widely used stellar evolution models to 15 combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to this study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities that enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically favored three-band combinations of UBV and VRI can be meaningfully inconsistent with each other and with longer baseline data sets such as UBVRIJHK{sub S}. Differences among model sets can also be substantial. For instance, fitting Yi et al. (2001) and Dotter et al. (2008) models to UBVRIJHK{sub S} photometry for NGC 188 yields the following cluster parameters: age = (5.78 ± 0.03, 6.45 ± 0.04) Gyr, [Fe/H] = (+0.125 ± 0.003, −0.077 ± 0.003) dex, (m−M){sub V} = (11.441 ± 0.007, 11.525 ± 0.005) mag, and A{sub V} = (0.162 ± 0.003, 0.236 ± 0.003) mag, respectively. Within the formal fitting errors, these two fits are substantially and statistically different. Such differences among fits using different filters and models are a cautionary tale regarding our current ability to fit star cluster CMDs. Additional modeling of this kind, with more models and star clusters, and future Gaia parallaxes are critical for isolating and quantifying the most relevant uncertainties in stellar evolutionary models.

  4. REMITTANCES, DUTCH DISEASE, AND COMPETITIVENESS: A BAYESIAN ANALYSIS

    OpenAIRE

    FARID MAKHLOUF; MAZHAR MUGHAL

    2013-01-01

    The paper studies symptoms of Dutch disease in the Pakistani economy arising from international remittances. An IV Bayesian analysis is carried out to take care of the endogeneity and uncertainty due to the managed float of Pakistani Rupee. We find evidence for both spending and resource movement effects in both the short and the long-run. These impacts are stronger and different from those the Official Development Assistance and the FDI exert. We find that while aggregate remittances and the...

  5. Optimizing Nuclear Reaction Analysis (NRA) using Bayesian Experimental Design

    OpenAIRE

    von Toussaint, U.; Schwarz-Selinger, T.; Gori, S.

    2008-01-01

    Nuclear Reaction Analysis with ${}^{3}$He holds the promise to measure Deuterium depth profiles up to large depths. However, the extraction of the depth profile from the measured data is an ill-posed inversion problem. Here we demonstrate how Bayesian Experimental Design can be used to optimize the number of measurements as well as the measurement energies to maximize the information gain. Comparison of the inversion properties of the optimized design with standard settings reveals huge possi...

  6. Applications of Bayesian Procrustes shape analysis to ensemble radar reflectivity nowcast verification

    Science.gov (United States)

    Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang

    2016-07-01

    This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.

  7. Bayesian-network-based safety risk analysis in construction projects

    International Nuclear Information System (INIS)

    This paper presents a systemic decision support approach for safety risk analysis under uncertainty in tunnel construction. Fuzzy Bayesian Networks (FBN) is used to investigate causal relationships between tunnel-induced damage and its influential variables based upon the risk/hazard mechanism analysis. Aiming to overcome limitations on the current probability estimation, an expert confidence indicator is proposed to ensure the reliability of the surveyed data for fuzzy probability assessment of basic risk factors. A detailed fuzzy-based inference procedure is developed, which has a capacity of implementing deductive reasoning, sensitivity analysis and abductive reasoning. The “3σ criterion” is adopted to calculate the characteristic values of a triangular fuzzy number in the probability fuzzification process, and the α-weighted valuation method is adopted for defuzzification. The construction safety analysis progress is extended to the entire life cycle of risk-prone events, including the pre-accident, during-construction continuous and post-accident control. A typical hazard concerning the tunnel leakage in the construction of Wuhan Yangtze Metro Tunnel in China is presented as a case study, in order to verify the applicability of the proposed approach. The results demonstrate the feasibility of the proposed approach and its application potential. A comparison of advantages and disadvantages between FBN and fuzzy fault tree analysis (FFTA) as risk analysis tools is also conducted. The proposed approach can be used to provide guidelines for safety analysis and management in construction projects, and thus increase the likelihood of a successful project in a complex environment. - Highlights: • A systemic Bayesian network based approach for safety risk analysis is developed. • An expert confidence indicator for probability fuzzification is proposed. • Safety risk analysis progress is extended to entire life cycle of risk-prone events. • A typical

  8. BaTMAn: Bayesian Technique for Multi-image Analysis

    CERN Document Server

    Casado, J; García-Benito, R; Guidi, G; Choudhury, O S; Bellocchi, E; Sánchez, S; Díaz, A I

    2016-01-01

    This paper describes the Bayesian Technique for Multi-image Analysis (BaTMAn), a novel image segmentation technique based on Bayesian statistics, whose main purpose is to characterize an astronomical dataset containing spatial information and perform a tessellation based on the measurements and errors provided as input. The algorithm will iteratively merge spatial elements as long as they are statistically consistent with carrying the same information (i.e. signal compatible with being identical within the errors). We illustrate its operation and performance with a set of test cases that comprises both synthetic and real Integral-Field Spectroscopic (IFS) data. Our results show that the segmentations obtained by BaTMAn adapt to the underlying structure of the data, regardless of the precise details of their morphology and the statistical properties of the noise. The quality of the recovered signal represents an improvement with respect to the input, especially in those regions where the signal is actually con...

  9. Bayesian tomography and integrated data analysis in fusion diagnostics

    Science.gov (United States)

    Li, Dong; Dong, Y. B.; Deng, Wei; Shi, Z. B.; Fu, B. Z.; Gao, J. M.; Wang, T. B.; Zhou, Yan; Liu, Yi; Yang, Q. W.; Duan, X. R.

    2016-11-01

    In this article, a Bayesian tomography method using non-stationary Gaussian process for a prior has been introduced. The Bayesian formalism allows quantities which bear uncertainty to be expressed in the probabilistic form so that the uncertainty of a final solution can be fully resolved from the confidence interval of a posterior probability. Moreover, a consistency check of that solution can be performed by checking whether the misfits between predicted and measured data are reasonably within an assumed data error. In particular, the accuracy of reconstructions is significantly improved by using the non-stationary Gaussian process that can adapt to the varying smoothness of emission distribution. The implementation of this method to a soft X-ray diagnostics on HL-2A has been used to explore relevant physics in equilibrium and MHD instability modes. This project is carried out within a large size inference framework, aiming at an integrated analysis of heterogeneous diagnostics.

  10. Bayesian networks for omics data analysis

    NARCIS (Netherlands)

    Gavai, A.K.

    2009-01-01

    This thesis focuses on two aspects of high throughput technologies, i.e. data storage and data analysis, in particular in transcriptomics and metabolomics. Both technologies are part of a research field that is generally called ‘omics’ (or ‘-omics’, with a leading hyphen), which refers to genomics,

  11. Nonparametric Bayesian Negative Binomial Factor Analysis

    OpenAIRE

    Zhou, Mingyuan

    2016-01-01

    A common approach to analyze an attribute-instance count matrix, an element of which represents how many times an attribute appears in an instance, is to factorize it under the Poisson likelihood. We show its limitation in capturing the tendency for an attribute present in an instance to both repeat itself and excite related ones. To address this limitation, we construct negative binomial factor analysis (NBFA) to factorize the matrix under the negative binomial likelihood, and relate it to a...

  12. Bayesian analysis to detect abrupt changes in extreme hydrological processes

    Science.gov (United States)

    Jo, Seongil; Kim, Gwangsu; Jeon, Jong-June

    2016-07-01

    In this study, we develop a new method for a Bayesian change point analysis. The proposed method is easy to implement and can be extended to a wide class of distributions. Using a generalized extreme-value distribution, we investigate the annual maximum of precipitations observed at stations in the South Korean Peninsula, and find significant changes in the considered sites. We evaluate the hydrological risk in predictions using the estimated return levels. In addition, we explain that the misspecification of the probability model can lead to a bias in the number of change points and using a simple example, show that this problem is difficult to avoid by technical data transformation.

  13. A Bayesian analysis of pentaquark signals from CLAS data

    CERN Document Server

    Ireland, D G; Protopopescu, D; Ambrozewicz, P; Anghinolfi, M; Asryan, G; Avakian, H; Bagdasaryan, H; Baillie, N; Ball, J P; Baltzell, N A; Batourine, V; Battaglieri, M; Bedlinskiy, I; Bellis, M; Benmouna, N; Berman, B L; Biselli, A S; Blaszczyk, L; Bouchigny, S; Boiarinov, S; Bradford, R; Branford, D; Briscoe, W J; Brooks, W K; Burkert, V D; Butuceanu, C; Calarco, J R; Careccia, S L; Carman, D S; Casey, L; Chen, S; Cheng, L; Cole, P L; Collins, P; Coltharp, P; Crabb, D; Credé, V; Dashyan, N; De Masi, R; De Vita, R; De Sanctis, E; Degtyarenko, P V; Deur, A; Dickson, R; Djalali, C; Dodge, G E; Donnelly, J; Doughty, D; Dugger, M; Dzyubak, O P; Egiyan, H; Egiyan, K S; El Fassi, L; Elouadrhiri, L; Eugenio, P; Fedotov, G; Feldman, G; Fradi, A; Funsten, H; Garçon, M; Gavalian, G; Gevorgyan, N; Gilfoyle, G P; Giovanetti, K L; Girod, F X; Goetz, J T; Gohn, W; Gonenc, A; Gothe, R W; Griffioen, K A; Guidal, M; Guler, N; Guo, L; Gyurjyan, V; Hafidi, K; Hakobyan, H; Hanretty, C; Hassall, N; Hersman, F W; Hleiqawi, I; Holtrop, M; Hyde-Wright, C E; Ilieva, Y; Ishkhanov, B S; Isupov, E L; Jenkins, D; Jo, H S; Johnstone, J R; Joo, K; Jüngst, H G; Kalantarians, N; Kellie, J D; Khandaker, M; Kim, W; Klein, A; Klein, F J; Kossov, M; Krahn, Z; Kramer, L H; Kubarovski, V; Kühn, J; Kuleshov, S V; Kuznetsov, V; Lachniet, J; Laget, J M; Langheinrich, J; Lawrence, D; Livingston, K; Lu, H Y; MacCormick, M; Markov, N; Mattione, P; Mecking, B A; Mestayer, M D; Meyer, C A; Mibe, T; Mikhailov, K; Mirazita, M; Miskimen, R; Mokeev, V; Moreno, B; Moriya, K; Morrow, S A; Moteabbed, M; Munevar, E; Mutchler, G S; Nadel-Turonski, P; Nasseripour, R; Niccolai, S; Niculescu, G; Niculescu, I; Niczyporuk, B B; Niroula, M R; Niyazov, R A; Nozar, M; Osipenko, M; Ostrovidov, A I; Park, K; Pasyuk, E; Paterson, C; Anefalos Pereira, S; Pierce, J; Pivnyuk, N; Pogorelko, O; Pozdniakov, S; Price, J W; Procureur, S; Prok, Y; Raue, B A; Ricco, G; Ripani, M; Ritchie, B G; Ronchetti, F; Rosner, G; Rossi, P; Sabatie, F; Salamanca, J; Salgado, C; Santoro, J P; Sapunenko, V; Schumacher, R A; Serov, V S; Sharabyan, Yu G; Sharov, D; Shvedunov, N V; Smith, E S; Smith, L C; Sober, D I; Sokhan, D; Stavinsky, A; Stepanyan, S S; Stepanyan, S; Stokes, B E; Stoler, P; Strauch, S; Taiuti, M; Tedeschi, D J; Thoma, U; Tkabladze, A; Tkachenko, S; Tur, C; Ungaro, M; Vineyard, M F; Vlassov, A V; Watts, D P; Weinstein, L B; Weygand, D P; Williams, M; Wolin, E; Wood, M H; Yegneswaran, A; Zana, L; Zhang, J; Zhao, B; Zhao, Z W

    2007-01-01

    We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a $\\Theta^{+}$ pentaquark, whilst the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a $\\Theta^{+}$. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner.

  14. A Bayesian analysis of pentaquark signals from CLAS data

    Energy Technology Data Exchange (ETDEWEB)

    David Ireland; Bryan McKinnon; Dan Protopopescu; Pawel Ambrozewicz; Marco Anghinolfi; G. Asryan; Harutyun Avakian; H. Bagdasaryan; Nathan Baillie; Jacques Ball; Nathan Baltzell; V. Batourine; Marco Battaglieri; Ivan Bedlinski; Ivan Bedlinskiy; Matthew Bellis; Nawal Benmouna; Barry Berman; Angela Biselli; Lukasz Blaszczyk; Sylvain Bouchigny; Sergey Boyarinov; Robert Bradford; Derek Branford; William Briscoe; William Brooks; Volker Burkert; Cornel Butuceanu; John Calarco; Sharon Careccia; Daniel Carman; Liam Casey; Shifeng Chen; Lu Cheng; Philip Cole; Patrick Collins; Philip Coltharp; Donald Crabb; Volker Crede; Natalya Dashyan; Rita De Masi; Raffaella De Vita; Enzo De Sanctis; Pavel Degtiarenko; Alexandre Deur; Richard Dickson; Chaden Djalali; Gail Dodge; Joseph Donnelly; David Doughty; Michael Dugger; Oleksandr Dzyubak; Hovanes Egiyan; Kim Egiyan; Lamiaa Elfassi; Latifa Elouadrhiri; Paul Eugenio; Gleb Fedotov; Gerald Feldman; Ahmed Fradi; Herbert Funsten; Michel Garcon; Gagik Gavalian; Nerses Gevorgyan; Gerard Gilfoyle; Kevin Giovanetti; Francois-Xavier Girod; John Goetz; Wesley Gohn; Atilla Gonenc; Ralf Gothe; Keith Griffioen; Michel Guidal; Nevzat Guler; Lei Guo; Vardan Gyurjyan; Kawtar Hafidi; Hayk Hakobyan; Charles Hanretty; Neil Hassall; F. Hersman; Ishaq Hleiqawi; Maurik Holtrop; Charles Hyde; Yordanka Ilieva; Boris Ishkhanov; Eugeny Isupov; D. Jenkins; Hyon-Suk Jo; John Johnstone; Kyungseon Joo; Henry Juengst; Narbe Kalantarians; James Kellie; Mahbubul Khandaker; Wooyoung Kim; Andreas Klein; Franz Klein; Mikhail Kossov; Zebulun Krahn; Laird Kramer; Valery Kubarovsky; Joachim Kuhn; Sergey Kuleshov; Viacheslav Kuznetsov; Jeff Lachniet; Jean Laget; Jorn Langheinrich; D. Lawrence; Kenneth Livingston; Haiyun Lu; Marion MacCormick; Nikolai Markov; Paul Mattione; Bernhard Mecking; Mac Mestayer; Curtis Meyer; Tsutomu Mibe; Konstantin Mikhaylov; Marco Mirazita; Rory Miskimen; Viktor Mokeev; Brahim Moreno; Kei Moriya; Steven Morrow; Maryam Moteabbed; Edwin Munevar Espitia; Gordon Mutchler; Pawel Nadel-Turonski; Rakhsha Nasseripour; Silvia Niccolai; Gabriel Niculescu; Maria-Ioana Niculescu; Bogdan Niczyporuk; Megh Niroula; Rustam Niyazov; Mina Nozar; Mikhail Osipenko; Alexander Ostrovidov; Kijun Park; Evgueni Pasyuk; Craig Paterson; Sergio Pereira; Joshua Pierce; Nikolay Pivnyuk; Oleg Pogorelko; Sergey Pozdnyakov; John Price; Sebastien Procureur; Yelena Prok; Brian Raue; Giovanni Ricco; Marco Ripani; Barry Ritchie; Federico Ronchetti; Guenther Rosner; Patrizia Rossi; Franck Sabatie; Julian Salamanca; Carlos Salgado; Joseph Santoro; Vladimir Sapunenko; Reinhard Schumacher; Vladimir Serov; Youri Sharabian; Dmitri Sharov; Nikolay Shvedunov; Elton Smith; Lee Smith; Daniel Sober; Daria Sokhan; Aleksey Stavinskiy; Samuel Stepanyan; Stepan Stepanyan; Burnham Stokes; Paul Stoler; Steffen Strauch; Mauro Taiuti; David Tedeschi; Ulrike Thoma; Avtandil Tkabladze; Svyatoslav Tkachenko; Clarisse Tur; Maurizio Ungaro; Michael Vineyard; Alexander Vlassov; Daniel Watts; Lawrence Weinstein; Dennis Weygand; M. Williams; Elliott Wolin; M.H. Wood; Amrit Yegneswaran; Lorenzo Zana; Jixie Zhang; Bo Zhao; Zhiwen Zhao

    2008-02-01

    We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a $\\Theta^{+}$ pentaquark, whilst the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a $\\Theta^{+}$. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner.

  15. Bayesian conformational analysis of ring molecules through reversible jump MCMC

    DEFF Research Database (Denmark)

    Nolsøe, Kim; Kessler, Mathieu; Pérez, José;

    2005-01-01

    In this paper we address the problem of classifying the conformations of mmembered rings using experimental observations obtained by crystal structure analysis. We formulate a model for the data generation mechanism that consists in a multidimensional mixture model. We perform inference for the p...... for the proportions and the components in a Bayesian framework, implementing an MCMC Reversible Jumps Algorithm to obtain samples of the posterior distributions. The method is illustrated on a simulated data set and on real data corresponding to cyclo-octane structures....

  16. Bayesian Reasoning in Data Analysis A Critical Introduction

    CERN Document Server

    D'Agostini, Giulio

    2003-01-01

    This book provides a multi-level introduction to Bayesian reasoning (as opposed to "conventional statistics") and its applications to data analysis. The basic ideas of this "new" approach to the quantification of uncertainty are presented using examples from research and everyday life. Applications covered include: parametric inference; combination of results; treatment of uncertainty due to systematic errors and background; comparison of hypotheses; unfolding of experimental distributions; upper/lower bounds in frontier-type measurements. Approximate methods for routine use are derived and ar

  17. A Bayesian analysis of pentaquark signals from CLAS data

    International Nuclear Information System (INIS)

    We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a Θ+ pentaquark, whilst the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a Θ+. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner

  18. Safety Analysis of Liquid Rocket Engine Using Bayesian Networks

    Institute of Scientific and Technical Information of China (English)

    WANG Hua-wei; YAN Zhi-qiang

    2007-01-01

    Safety analysis for liquid rocket engine has a great meaning for shortening development cycle, saving development expenditure and reducing development risk. The relationship between the structure and component of liquid rocket engine is much more complex, furthermore test data are absent in development phase. Thereby, the uncertainties exist in safety analysis for liquid rocket engine. A safety analysis model integrated with FMEA(failure mode and effect analysis)based on Bayesian networks (BN) is brought forward for liquid rocket engine, which can combine qualitative analysis with quantitative decision. The method has the advantages of fusing multi-information, saving sample amount and having high veracity. An example shows that the method is efficient.

  19. Implementation of a Bayesian Engine for Uncertainty Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Leng Vang; Curtis Smith; Steven Prescott

    2014-08-01

    In probabilistic risk assessment, it is important to have an environment where analysts have access to a shared and secured high performance computing and a statistical analysis tool package. As part of the advanced small modular reactor probabilistic risk analysis framework implementation, we have identified the need for advanced Bayesian computations. However, in order to make this technology available to non-specialists, there is also a need of a simplified tool that allows users to author models and evaluate them within this framework. As a proof-of-concept, we have implemented an advanced open source Bayesian inference tool, OpenBUGS, within the browser-based cloud risk analysis framework that is under development at the Idaho National Laboratory. This development, the “OpenBUGS Scripter” has been implemented as a client side, visual web-based and integrated development environment for creating OpenBUGS language scripts. It depends on the shared server environment to execute the generated scripts and to transmit results back to the user. The visual models are in the form of linked diagrams, from which we automatically create the applicable OpenBUGS script that matches the diagram. These diagrams can be saved locally or stored on the server environment to be shared with other users.

  20. Inference algorithms and learning theory for Bayesian sparse factor analysis

    International Nuclear Information System (INIS)

    Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.

  1. Analysis of Wave Directional Spreading by Bayesian Parameter Estimation

    Institute of Scientific and Technical Information of China (English)

    钱桦; 莊士贤; 高家俊

    2002-01-01

    A spatial array of wave gauges installed on an observatoion platform has been designed and arranged to measure the lo-cal features of winter monsoon directional waves off Taishi coast of Taiwan. A new method, named the Bayesian ParameterEstimation Method( BPEM), is developed and adopted to determine the main direction and the directional spreading parame-ter of directional spectra. The BPEM could be considered as a regression analysis to find the maximum joint probability ofparameters, which best approximates the observed data from the Bayesian viewpoint. The result of the analysis of field wavedata demonstrates the highly dependency of the characteristics of normalized directional spreading on the wave age. The Mit-suyasu type empirical formula of directional spectnun is therefore modified to be representative of monsoon wave field. More-over, it is suggested that Smax could be expressed as a function of wave steepness. The values of Smax decrease with increas-ing steepness. Finally, a local directional spreading model, which is simple to be utilized in engineering practice, is prop-osed.

  2. Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.

    Science.gov (United States)

    Hack, C Eric

    2006-04-17

    Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach. PMID:16466842

  3. Node Augmentation Technique in Bayesian Network Evidence Analysis and Marshaling

    Energy Technology Data Exchange (ETDEWEB)

    Keselman, Dmitry [Los Alamos National Laboratory; Tompkins, George H [Los Alamos National Laboratory; Leishman, Deborah A [Los Alamos National Laboratory

    2010-01-01

    Given a Bayesian network, sensitivity analysis is an important activity. This paper begins by describing a network augmentation technique which can simplifY the analysis. Next, we present two techniques which allow the user to determination the probability distribution of a hypothesis node under conditions of uncertain evidence; i.e. the state of an evidence node or nodes is described by a user specified probability distribution. Finally, we conclude with a discussion of three criteria for ranking evidence nodes based on their influence on a hypothesis node. All of these techniques have been used in conjunction with a commercial software package. A Bayesian network based on a directed acyclic graph (DAG) G is a graphical representation of a system of random variables that satisfies the following Markov property: any node (random variable) is independent of its non-descendants given the state of all its parents (Neapolitan, 2004). For simplicities sake, we consider only discrete variables with a finite number of states, though most of the conclusions may be generalized.

  4. Statistical analysis using the Bayesian nonparametric method for irradiation embrittlement of reactor pressure vessels

    Science.gov (United States)

    Takamizawa, Hisashi; Itoh, Hiroto; Nishiyama, Yutaka

    2016-10-01

    In order to understand neutron irradiation embrittlement in high fluence regions, statistical analysis using the Bayesian nonparametric (BNP) method was performed for the Japanese surveillance and material test reactor irradiation database. The BNP method is essentially expressed as an infinite summation of normal distributions, with input data being subdivided into clusters with identical statistical parameters, such as mean and standard deviation, for each cluster to estimate shifts in ductile-to-brittle transition temperature (DBTT). The clusters typically depend on chemical compositions, irradiation conditions, and the irradiation embrittlement. Specific variables contributing to the irradiation embrittlement include the content of Cu, Ni, P, Si, and Mn in the pressure vessel steels, neutron flux, neutron fluence, and irradiation temperatures. It was found that the measured shifts of DBTT correlated well with the calculated ones. Data associated with the same materials were subdivided into the same clusters even if neutron fluences were increased. Comparing cluster IDs 2 and 6, embrittlement of high-Cu-bearing materials ( A flux effect with a higher flux range was demonstrated for cluster ID 3 comprising MTR irradiation in a high flux region (≤1 × 1013 n/cm2/s) [44]. For cluster ID 10, classification is rendered based upon flux effect, where embrittlement is accelerated in high Cu-bearing materials irradiated at lower flux levels (less than 5 × 109 n/cm2·s). This is possibly due to increased thermal equilibrium vacancies [44,45]. Per all the above considerations, it was hence ascertained that data belonging to identical cluster ID's maintain the similar embrittlement mechanism attributes.With the aim to examine the clustering versus the neutron fluence, the relationship between the Cu content representative of materials and the fluence for PWR and MTR irradiation was plotted in Fig. 13 (a) and (b). For enhancing plot clarities, data points were slightly

  5. A Bayesian analysis of two probability models describing thunderstorm activity at Cape Kennedy, Florida

    Science.gov (United States)

    Williford, W. O.; Hsieh, P.; Carter, M. C.

    1974-01-01

    A Bayesian analysis of the two discrete probability models, the negative binomial and the modified negative binomial distributions, which have been used to describe thunderstorm activity at Cape Kennedy, Florida, is presented. The Bayesian approach with beta prior distributions is compared to the classical approach which uses a moment method of estimation or a maximum-likelihood method. The accuracy and simplicity of the Bayesian method is demonstrated.

  6. A Bayesian Framework for Reliability Analysis of Spacecraft Deployments

    Science.gov (United States)

    Evans, John W.; Gallo, Luis; Kaminsky, Mark

    2012-01-01

    Deployable subsystems are essential to mission success of most spacecraft. These subsystems enable critical functions including power, communications and thermal control. The loss of any of these functions will generally result in loss of the mission. These subsystems and their components often consist of unique designs and applications for which various standardized data sources are not applicable for estimating reliability and for assessing risks. In this study, a two stage sequential Bayesian framework for reliability estimation of spacecraft deployment was developed for this purpose. This process was then applied to the James Webb Space Telescope (JWST) Sunshield subsystem, a unique design intended for thermal control of the Optical Telescope Element. Initially, detailed studies of NASA deployment history, "heritage information", were conducted, extending over 45 years of spacecraft launches. This information was then coupled to a non-informative prior and a binomial likelihood function to create a posterior distribution for deployments of various subsystems uSing Monte Carlo Markov Chain sampling. Select distributions were then coupled to a subsequent analysis, using test data and anomaly occurrences on successive ground test deployments of scale model test articles of JWST hardware, to update the NASA heritage data. This allowed for a realistic prediction for the reliability of the complex Sunshield deployment, with credibility limits, within this two stage Bayesian framework.

  7. Risk analysis of dust explosion scenarios using Bayesian networks.

    Science.gov (United States)

    Yuan, Zhi; Khakzad, Nima; Khan, Faisal; Amyotte, Paul

    2015-02-01

    In this study, a methodology has been proposed for risk analysis of dust explosion scenarios based on Bayesian network. Our methodology also benefits from a bow-tie diagram to better represent the logical relationships existing among contributing factors and consequences of dust explosions. In this study, the risks of dust explosion scenarios are evaluated, taking into account common cause failures and dependencies among root events and possible consequences. Using a diagnostic analysis, dust particle properties, oxygen concentration, and safety training of staff are identified as the most critical root events leading to dust explosions. The probability adaptation concept is also used for sequential updating and thus learning from past dust explosion accidents, which is of great importance in dynamic risk assessment and management. We also apply the proposed methodology to a case study to model dust explosion scenarios, to estimate the envisaged risks, and to identify the vulnerable parts of the system that need additional safety measures. PMID:25264172

  8. Evolutionary Analysis of Dengue Serotype 2 Viruses Using Phylogenetic and Bayesian Methods from New Delhi, India.

    Science.gov (United States)

    Afreen, Nazia; Naqvi, Irshad H; Broor, Shobha; Ahmed, Anwar; Kazim, Syed Naqui; Dohare, Ravins; Kumar, Manoj; Parveen, Shama

    2016-03-01

    Dengue fever is the most important arboviral disease in the tropical and sub-tropical countries of the world. Delhi, the metropolitan capital state of India, has reported many dengue outbreaks, with the last outbreak occurring in 2013. We have recently reported predominance of dengue virus serotype 2 during 2011-2014 in Delhi. In the present study, we report molecular characterization and evolutionary analysis of dengue serotype 2 viruses which were detected in 2011-2014 in Delhi. Envelope genes of 42 DENV-2 strains were sequenced in the study. All DENV-2 strains grouped within the Cosmopolitan genotype and further clustered into three lineages; Lineage I, II and III. Lineage III replaced lineage I during dengue fever outbreak of 2013. Further, a novel mutation Thr404Ile was detected in the stem region of the envelope protein of a single DENV-2 strain in 2014. Nucleotide substitution rate and time to the most recent common ancestor were determined by molecular clock analysis using Bayesian methods. A change in effective population size of Indian DENV-2 viruses was investigated through Bayesian skyline plot. The study will be a vital road map for investigation of epidemiology and evolutionary pattern of dengue viruses in India.

  9. Evolutionary Analysis of Dengue Serotype 2 Viruses Using Phylogenetic and Bayesian Methods from New Delhi, India.

    Directory of Open Access Journals (Sweden)

    Nazia Afreen

    2016-03-01

    Full Text Available Dengue fever is the most important arboviral disease in the tropical and sub-tropical countries of the world. Delhi, the metropolitan capital state of India, has reported many dengue outbreaks, with the last outbreak occurring in 2013. We have recently reported predominance of dengue virus serotype 2 during 2011-2014 in Delhi. In the present study, we report molecular characterization and evolutionary analysis of dengue serotype 2 viruses which were detected in 2011-2014 in Delhi. Envelope genes of 42 DENV-2 strains were sequenced in the study. All DENV-2 strains grouped within the Cosmopolitan genotype and further clustered into three lineages; Lineage I, II and III. Lineage III replaced lineage I during dengue fever outbreak of 2013. Further, a novel mutation Thr404Ile was detected in the stem region of the envelope protein of a single DENV-2 strain in 2014. Nucleotide substitution rate and time to the most recent common ancestor were determined by molecular clock analysis using Bayesian methods. A change in effective population size of Indian DENV-2 viruses was investigated through Bayesian skyline plot. The study will be a vital road map for investigation of epidemiology and evolutionary pattern of dengue viruses in India.

  10. Nonlinear analysis of EAS clusters

    CERN Document Server

    Zotov, M Yu; Fomin, Y A; Fomin, Yu. A.

    2002-01-01

    We apply certain methods of nonlinear time series analysis to the extensive air shower clusters found earlier in the data set obtained with the EAS-1000 Prototype array. In particular, we use the Grassberger-Procaccia algorithm to compute the correlation dimension of samples in the vicinity of the clusters. The validity of the results is checked by surrogate data tests and some additional quantities. We compare our conclusions with the results of similar investigations performed by the EAS-TOP and LAAS groups.

  11. Bayesian large-scale structure inference and cosmic web analysis

    CERN Document Server

    Leclercq, Florent

    2015-01-01

    Surveys of the cosmic large-scale structure carry opportunities for building and testing cosmological theories about the origin and evolution of the Universe. This endeavor requires appropriate data assimilation tools, for establishing the contact between survey catalogs and models of structure formation. In this thesis, we present an innovative statistical approach for the ab initio simultaneous analysis of the formation history and morphology of the cosmic web: the BORG algorithm infers the primordial density fluctuations and produces physical reconstructions of the dark matter distribution that underlies observed galaxies, by assimilating the survey data into a cosmological structure formation model. The method, based on Bayesian probability theory, provides accurate means of uncertainty quantification. We demonstrate the application of BORG to the Sloan Digital Sky Survey data and describe the primordial and late-time large-scale structure in the observed volume. We show how the approach has led to the fi...

  12. Bayesian analysis of factors associated with fibromyalgia syndrome subjects

    Science.gov (United States)

    Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie

    2015-01-01

    Factors contributing to movement-related fear were assessed by Russek, et al. 2014 for subjects with Fibromyalgia (FM) based on the collected data by a national internet survey of community-based individuals. The study focused on the variables, Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), Pain, work status and physical activity dependent from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits same data with a Bayesian analysis where appropriate priors were introduced for variables selected in the Russek's paper.

  13. Reference priors of nuisance parameters in Bayesian sequential population analysis

    CERN Document Server

    Bousquet, Nicolas

    2010-01-01

    Prior distributions elicited for modelling the natural fluctuations or the uncertainty on parameters of Bayesian fishery population models, can be chosen among a vast range of statistical laws. Since the statistical framework is defined by observational processes, observational parameters enter into the estimation and must be considered random, similarly to parameters or states of interest like population levels or real catches. The former are thus perceived as nuisance parameters whose values are intrinsically linked to the considered experiment, which also require noninformative priors. In fishery research Jeffreys methodology has been presented by Millar (2002) as a practical way to elicit such priors. However they can present wrong properties in multiparameter contexts. Therefore we suggest to use the elicitation method proposed by Berger and Bernardo to avoid paradoxical results raised by Jeffreys priors. These benchmark priors are derived here in the framework of sequential population analysis.

  14. Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations

    CERN Document Server

    Scargle, Jeffrey D; Jackson, Brad; Chiang, James

    2012-01-01

    This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it - an improved and generalized version of Bayesian Blocks (Scargle 1998) - that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multi-variate time series data, analysis of vari...

  15. A Bayesian Seismic Hazard Analysis for the city of Naples

    Science.gov (United States)

    Faenza, Licia; Pierdominici, Simona; Hainzl, Sebastian; Cinti, Francesca R.; Sandri, Laura; Selva, Jacopo; Tonini, Roberto; Perfetti, Paolo

    2016-04-01

    In the last years many studies have been focused on determination and definition of the seismic, volcanic and tsunamogenic hazard in the city of Naples. The reason is that the town of Naples with its neighboring area is one of the most densely populated places in Italy. In addition, the risk is increased also by the type and condition of buildings and monuments in the city. It is crucial therefore to assess which active faults in Naples and surrounding area could trigger an earthquake able to shake and damage the urban area. We collect data from the most reliable and complete databases of macroseismic intensity records (from 79 AD to present). For each seismic event an active tectonic structure has been associated. Furthermore a set of active faults, well-known from geological investigations, located around the study area that they could shake the city, not associated with any earthquake, has been taken into account for our studies. This geological framework is the starting point for our Bayesian seismic hazard analysis for the city of Naples. We show the feasibility of formulating the hazard assessment procedure to include the information of past earthquakes into the probabilistic seismic hazard analysis. This strategy allows on one hand to enlarge the information used in the evaluation of the hazard, from alternative models for the earthquake generation process to past shaking and on the other hand to explicitly account for all kinds of information and their uncertainties. The Bayesian scheme we propose is applied to evaluate the seismic hazard of Naples. We implement five different spatio-temporal models to parameterize the occurrence of earthquakes potentially dangerous for Naples. Subsequently we combine these hazard curves with ShakeMap of past earthquakes that have been felt in Naples. The results are posterior hazard assessment for three exposure times, e.g., 50, 10 and 5 years, in a dense grid that cover the municipality of Naples, considering bedrock soil

  16. An Exploratory Study Examining the Feasibility of Using Bayesian Networks to Predict Circuit Analysis Understanding

    Science.gov (United States)

    Chung, Gregory K. W. K.; Dionne, Gary B.; Kaiser, William J.

    2006-01-01

    Our research question was whether we could develop a feasible technique, using Bayesian networks, to diagnose gaps in student knowledge. Thirty-four college-age participants completed tasks designed to measure conceptual knowledge, procedural knowledge, and problem-solving skills related to circuit analysis. A Bayesian network was used to model…

  17. A Bayesian latent group analysis for detecting poor effort in the assessment of malingering

    NARCIS (Netherlands)

    A. Ortega; E.-J. Wagenmakers; M.D. Lee; H.J. Markowitsch; M. Piefke

    2012-01-01

    Despite their theoretical appeal, Bayesian methods for the assessment of poor effort and malingering are still rarely used in neuropsychological research and clinical diagnosis. In this article, we outline a novel and easy-to-use Bayesian latent group analysis of malingering whose goal is to identif

  18. Survey and Analysis of University Clustering

    Directory of Open Access Journals (Sweden)

    Srinatha Karur

    2013-07-01

    Full Text Available This paper gives on Clustering of Universities in the world with respect to their country policies OR local polices OR continent level polices with sub aims. So clustering method can generally apply when objective is specifically mentioned. For general objectives clusters are available in the form of logical or physical groups without networks. In this paper we emphasis on only University Clusters directly or University Clusters with some other clusters. Data miming methods are used for useful for Sampling Analysis and Clustering of Universities and Colleges with respect to local clusters [1] pp 1.

  19. Evaluation of a partial genome screening of two asthma susceptibility regions using bayesian network based bayesian multilevel analysis of relevance.

    Directory of Open Access Journals (Sweden)

    Ildikó Ungvári

    Full Text Available Genetic studies indicate high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls. The results were evaluated with traditional frequentist methods and we applied a new statistical method, called bayesian network based bayesian multilevel analysis of relevance (BN-BMLA. This method uses bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the bayesian statistical framework, in order to estimate the posteriors whether a variable is directly relevant or its association is only mediated.With frequentist methods one SNP (rs3751464 in the FRMD6 gene provided evidence for an association with asthma (OR = 1.43(1.2-1.8; p = 3×10(-4. The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics.In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance.

  20. Multivariate meta-analysis of mixed outcomes: a Bayesian approach.

    Science.gov (United States)

    Bujkiewicz, Sylwia; Thompson, John R; Sutton, Alex J; Cooper, Nicola J; Harrison, Mark J; Symmons, Deborah P M; Abrams, Keith R

    2013-09-30

    Multivariate random effects meta-analysis (MRMA) is an appropriate way for synthesizing data from studies reporting multiple correlated outcomes. In a Bayesian framework, it has great potential for integrating evidence from a variety of sources. In this paper, we propose a Bayesian model for MRMA of mixed outcomes, which extends previously developed bivariate models to the trivariate case and also allows for combination of multiple outcomes that are both continuous and binary. We have constructed informative prior distributions for the correlations by using external evidence. Prior distributions for the within-study correlations were constructed by employing external individual patent data and using a double bootstrap method to obtain the correlations between mixed outcomes. The between-study model of MRMA was parameterized in the form of a product of a series of univariate conditional normal distributions. This allowed us to place explicit prior distributions on the between-study correlations, which were constructed using external summary data. Traditionally, independent 'vague' prior distributions are placed on all parameters of the model. In contrast to this approach, we constructed prior distributions for the between-study model parameters in a way that takes into account the inter-relationship between them. This is a flexible method that can be extended to incorporate mixed outcomes other than continuous and binary and beyond the trivariate case. We have applied this model to a motivating example in rheumatoid arthritis with the aim of incorporating all available evidence in the synthesis and potentially reducing uncertainty around the estimate of interest. PMID:23630081

  1. BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

    Directory of Open Access Journals (Sweden)

    Arturo Medrano-Soto

    2004-12-01

    Full Text Available Based on mixture models, we present a Bayesian method (called BClass to classify biological entities (e.g. genes when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability of each entry to belong to each element (group in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities of each entry to belong to the groups. The number of groups in the analysis is controlled dynamically by rendering the groups as 'alive' and 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked plots and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.

  2. Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis

    Science.gov (United States)

    Dezfuli, Homayoon; Kelly, Dana; Smith, Curtis; Vedros, Kurt; Galyean, William

    2009-01-01

    This document, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis, is intended to provide guidelines for the collection and evaluation of risk and reliability-related data. It is aimed at scientists and engineers familiar with risk and reliability methods and provides a hands-on approach to the investigation and application of a variety of risk and reliability data assessment methods, tools, and techniques. This document provides both: A broad perspective on data analysis collection and evaluation issues. A narrow focus on the methods to implement a comprehensive information repository. The topics addressed herein cover the fundamentals of how data and information are to be used in risk and reliability analysis models and their potential role in decision making. Understanding these topics is essential to attaining a risk informed decision making environment that is being sought by NASA requirements and procedures such as 8000.4 (Agency Risk Management Procedural Requirements), NPR 8705.05 (Probabilistic Risk Assessment Procedures for NASA Programs and Projects), and the System Safety requirements of NPR 8715.3 (NASA General Safety Program Requirements).

  3. Bayesian Analysis of Graphical Models of Marginal Independence for Three Way Contingency Tables

    OpenAIRE

    Tarantola, Claudia; Ntzoufras, Ioannis

    2012-01-01

    This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. Each marginal independence model corresponds to a particular factorization of the cell probabilities and a conjugate analysis based on Dirichlet prior can be performed. We illustrate a comprehensive Bayesian analysis of such models, involving suitable choices of prior parameters, estimation, model determination, as well as the allied computational issues. The posterior di...

  4. New Ephemeris for LSI+61 303, A Bayesian Analysis

    Science.gov (United States)

    Gregory, P. C.

    1997-12-01

    The luminous early-type binary LSI+61 303 is an interesting radio, X-ray and possible gamma-ray source. At radio wavelengths it exhibits periodic outbursts with an approximate period of 26.5 days as well as a longer term modulation of the outburst peaks of approximately 4 years. Recently Paredes et al. have found evidence that the X-ray outbursts are very likely to recur with the same radio outburst period from an analysis of RXTE all sky monitoring data. The system has been observed by many groups at all wavelengths but still the energy source powering the radio outbursts and their relation to the high energy emission remains a mystery. For more details see the "LSI+61 303 Resource Page" at http://www.srl.caltech.edu/personnel/paulr/lsi.html . There has been increasing evidence for a change in the period of the system. We will present a new ephemeris for the system based on a Bayesian analysis of 20 years of radio observations including the GBI-NASA radio monitoring data.

  5. Thermodynamically consistent Bayesian analysis of closed biochemical reaction systems

    Directory of Open Access Journals (Sweden)

    Goutsias John

    2010-11-01

    Full Text Available Abstract Background Estimating the rate constants of a biochemical reaction system with known stoichiometry from noisy time series measurements of molecular concentrations is an important step for building predictive models of cellular function. Inference techniques currently available in the literature may produce rate constant values that defy necessary constraints imposed by the fundamental laws of thermodynamics. As a result, these techniques may lead to biochemical reaction systems whose concentration dynamics could not possibly occur in nature. Therefore, development of a thermodynamically consistent approach for estimating the rate constants of a biochemical reaction system is highly desirable. Results We introduce a Bayesian analysis approach for computing thermodynamically consistent estimates of the rate constants of a closed biochemical reaction system with known stoichiometry given experimental data. Our method employs an appropriately designed prior probability density function that effectively integrates fundamental biophysical and thermodynamic knowledge into the inference problem. Moreover, it takes into account experimental strategies for collecting informative observations of molecular concentrations through perturbations. The proposed method employs a maximization-expectation-maximization algorithm that provides thermodynamically feasible estimates of the rate constant values and computes appropriate measures of estimation accuracy. We demonstrate various aspects of the proposed method on synthetic data obtained by simulating a subset of a well-known model of the EGF/ERK signaling pathway, and examine its robustness under conditions that violate key assumptions. Software, coded in MATLAB®, which implements all Bayesian analysis techniques discussed in this paper, is available free of charge at http://www.cis.jhu.edu/~goutsias/CSS%20lab/software.html. Conclusions Our approach provides an attractive statistical methodology for

  6. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity with usual applications comprising of probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed to address this issue from other researchers. These have differing ranges of complexity ranging from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Hereby, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  7. Bayesian Analysis of Dynamic Multivariate Models with Multiple Structural Breaks

    OpenAIRE

    Sugita, Katsuhiro

    2006-01-01

    This paper considers a vector autoregressive model or a vector error correction model with multiple structural breaks in any subset of parameters, using a Bayesian approach with Markov chain Monte Carlo simulation technique. The number of structural breaks is determined as a sort of model selection by the posterior odds. For a cointegrated model, cointegrating rank is also allowed to change with breaks. Bayesian approach by Strachan (Journal of Business and Economic Statistics 21 (2003) 185) ...

  8. Using Bayesian analysis in repeated preclinical in vivo studies for a more effective use of animals.

    Science.gov (United States)

    Walley, Rosalind; Sherington, John; Rastrick, Joe; Detrait, Eric; Hanon, Etienne; Watt, Gillian

    2016-05-01

    Whilst innovative Bayesian approaches are increasingly used in clinical studies, in the preclinical area Bayesian methods appear to be rarely used in the reporting of pharmacology data. This is particularly surprising in the context of regularly repeated in vivo studies where there is a considerable amount of data from historical control groups, which has potential value. This paper describes our experience with introducing Bayesian analysis for such studies using a Bayesian meta-analytic predictive approach. This leads naturally either to an informative prior for a control group as part of a full Bayesian analysis of the next study or using a predictive distribution to replace a control group entirely. We use quality control charts to illustrate study-to-study variation to the scientists and describe informative priors in terms of their approximate effective numbers of animals. We describe two case studies of animal models: the lipopolysaccharide-induced cytokine release model used in inflammation and the novel object recognition model used to screen cognitive enhancers, both of which show the advantage of a Bayesian approach over the standard frequentist analysis. We conclude that using Bayesian methods in stable repeated in vivo studies can result in a more effective use of animals, either by reducing the total number of animals used or by increasing the precision of key treatment differences. This will lead to clearer results and supports the "3Rs initiative" to Refine, Reduce and Replace animals in research. Copyright © 2016 John Wiley & Sons, Ltd.

  9. Dynamic sensor action selection with Bayesian decision analysis

    Science.gov (United States)

    Kristensen, Steen; Hansen, Volker; Kondak, Konstantin

    1998-10-01

    The aim of this work is to create a framework for the dynamic planning of sensor actions for an autonomous mobile robot. The framework uses Bayesian decision analysis, i.e., a decision-theoretic method, to evaluate possible sensor actions and selecting the most appropriate ones given the available sensors and what is currently known about the state of the world. Since sensing changes the knowledge of the system and since the current state of the robot (task, position, etc.) determines what knowledge is relevant, the evaluation and selection of sensing actions is an on-going process that effectively determines the behavior of the robot. The framework has been implemented on a real mobile robot and has been proven to be able to control in real-time the sensor actions of the system. In current work we are investigating methods to reduce or automatically generate the necessary model information needed by the decision- theoretic method to select the appropriate sensor actions.

  10. STUDIES IN ASTRONOMICAL TIME SERIES ANALYSIS. VI. BAYESIAN BLOCK REPRESENTATIONS

    Energy Technology Data Exchange (ETDEWEB)

    Scargle, Jeffrey D. [Space Science and Astrobiology Division, MS 245-3, NASA Ames Research Center, Moffett Field, CA 94035-1000 (United States); Norris, Jay P. [Physics Department, Boise State University, 2110 University Drive, Boise, ID 83725-1570 (United States); Jackson, Brad [The Center for Applied Mathematics and Computer Science, Department of Mathematics, San Jose State University, One Washington Square, MH 308, San Jose, CA 95192-0103 (United States); Chiang, James, E-mail: jeffrey.d.scargle@nasa.gov [W. W. Hansen Experimental Physics Laboratory, Kavli Institute for Particle Astrophysics and Cosmology, Department of Physics and SLAC National Accelerator Laboratory, Stanford University, Stanford, CA 94305 (United States)

    2013-02-20

    This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it-an improved and generalized version of Bayesian Blocks-that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro et al. In the spirit of Reproducible Research all of the code and data necessary to reproduce all of the figures in this paper are included as supplementary material.

  11. A Bayesian model for the analysis of transgenerational epigenetic variation.

    Science.gov (United States)

    Varona, Luis; Munilla, Sebastián; Mouresan, Elena Flavia; González-Rodríguez, Aldemar; Moreno, Carlos; Altarriba, Juan

    2015-01-23

    Epigenetics has become one of the major areas of biological research. However, the degree of phenotypic variability that is explained by epigenetic processes still remains unclear. From a quantitative genetics perspective, the estimation of variance components is achieved by means of the information provided by the resemblance between relatives. In a previous study, this resemblance was described as a function of the epigenetic variance component and a reset coefficient that indicates the rate of dissipation of epigenetic marks across generations. Given these assumptions, we propose a Bayesian mixed model methodology that allows the estimation of epigenetic variance from a genealogical and phenotypic database. The methodology is based on the development of a T: matrix of epigenetic relationships that depends on the reset coefficient. In addition, we present a simple procedure for the calculation of the inverse of this matrix ( T-1: ) and a Gibbs sampler algorithm that obtains posterior estimates of all the unknowns in the model. The new procedure was used with two simulated data sets and with a beef cattle database. In the simulated populations, the results of the analysis provided marginal posterior distributions that included the population parameters in the regions of highest posterior density. In the case of the beef cattle dataset, the posterior estimate of transgenerational epigenetic variability was very low and a model comparison test indicated that a model that did not included it was the most plausible.

  12. STUDIES IN ASTRONOMICAL TIME SERIES ANALYSIS. VI. BAYESIAN BLOCK REPRESENTATIONS

    International Nuclear Information System (INIS)

    This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it—an improved and generalized version of Bayesian Blocks—that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro et al. In the spirit of Reproducible Research all of the code and data necessary to reproduce all of the figures in this paper are included as supplementary material.

  13. Bayesian Spectral Analysis of Metal Abandance Deficient Stars

    CERN Document Server

    Sourlas, E; Kashyap, V L; Drake, J; Pease, D; Sourlas, Epaminondas; Dyk, David van; Kashyap, Vinay; Drake, Jeremy; Pease, Deron

    2002-01-01

    Metallicity can be measured by analyzing the spectra in the X-ray region and comparing the flux in spectral lines to the flux in the underlying Bremsstrahlung continuum. In this paper we propose new Bayesian methods which directly model the Poisson nature of the data and thus are expected to exhibit improved sampling properties. Our model also accounts for the Poisson nature of background contamination of the observations, image blurring due to instrument response, and the absorption of photons in space. The resulting highly structured hierarchical model is fit using the Gibbs sampler, data augmentation and Metropolis-Hasting. We demonstrate our methods with the X-ray spectral analysis of several "Metal Abundance Deficient" stars. The model is designed to summarize the relative frequency of the energy of photons (X-ray or gamma-ray) arriving at a detector. Independent Poisson distributions are more appropriate to model the counts than the commonly used normal approximation. We model the high energy tail of th...

  14. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).

    Science.gov (United States)

    Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W

    2016-07-20

    Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance, for both continuous and binary outcomes, and exceeding that of its competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022

  15. Compositional ideas in the bayesian analysis of categorical data with application to dose finding clinical trials

    OpenAIRE

    Gasparini, Mauro; Eisele, J

    2003-01-01

    Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data. Many of the issues that are discussed with reference to the statistical analysis of compositional data have a natural counterpart in the construction of a Bayesian statistical model for categorical data. This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986) in his seminal book on compositional data. Particular emphasis is put on the pro...

  16. Bayesian Analysis of Marginal Log-Linear Graphical Models for Three Way Contingency Tables

    OpenAIRE

    Ntzoufras, Ioannis; Tarantola, Claudia

    2008-01-01

    This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. We use a marginal log-linear parametrization, under which the model is defined through suitable zero-constraints on the interaction parameters calculated within marginal distributions. We undertake a comprehensive Bayesian analysis of these models, involving suitable choices of prior distributions, estimation, model determination, as well as the allied computational issue...

  17. A Latent Variable Bayesian Approach to Spatial Clustering with Background Noise

    NARCIS (Netherlands)

    Kayabol, K.

    2011-01-01

    We propose a finite mixture model for clustering of the spatial data patterns. The model is based on the spatial distances between the data locations in such a way that both the distances of the points to the cluster centers and the distances of a given point to its neighbors within a defined window

  18. Bayesian forecasting and scalable multivariate volatility analysis using simultaneous graphical dynamic models

    OpenAIRE

    Gruber, Lutz F.; West, Mike

    2016-01-01

    The recently introduced class of simultaneous graphical dynamic linear models (SGDLMs) defines an ability to scale on-line Bayesian analysis and forecasting to higher-dimensional time series. This paper advances the methodology of SGDLMs, developing and embedding a novel, adaptive method of simultaneous predictor selection in forward filtering for on-line learning and forecasting. The advances include developments in Bayesian computation for scalability, and a case study in exploring the resu...

  19. JBASE: Joint Bayesian Analysis of Subphenotypes and Epistasis

    Science.gov (United States)

    Colak, Recep; Kim, TaeHyung; Kazan, Hilal; Oh, Yoomi; Cruz, Miguel; Valladares-Salgado, Adan; Peralta, Jesus; Escobedo, Jorge; Parra, Esteban J.; Kim, Philip M.; Goldenberg, Anna

    2016-01-01

    Motivation: Rapid advances in genotyping and genome-wide association studies have enabled the discovery of many new genotype–phenotype associations at the resolution of individual markers. However, these associations explain only a small proportion of theoretically estimated heritability of most diseases. In this work, we propose an integrative mixture model called JBASE: joint Bayesian analysis of subphenotypes and epistasis. JBASE explores two major reasons of missing heritability: interactions between genetic variants, a phenomenon known as epistasis and phenotypic heterogeneity, addressed via subphenotyping. Results: Our extensive simulations in a wide range of scenarios repeatedly demonstrate that JBASE can identify true underlying subphenotypes, including their associated variants and their interactions, with high precision. In the presence of phenotypic heterogeneity, JBASE has higher Power and lower Type 1 Error than five state-of-the-art approaches. We applied our method to a sample of individuals from Mexico with Type 2 diabetes and discovered two novel epistatic modules, including two loci each, that define two subphenotypes characterized by differences in body mass index and waist-to-hip ratio. We successfully replicated these subphenotypes and epistatic modules in an independent dataset from Mexico genotyped with a different platform. Availability and implementation: JBASE is implemented in C++, supported on Linux and is available at http://www.cs.toronto.edu/∼goldenberg/JBASE/jbase.tar.gz. The genotype data underlying this study are available upon approval by the ethics review board of the Medical Centre Siglo XXI. Please contact Dr Miguel Cruz at mcruzl@yahoo.com for assistance with the application. Contact: anna.goldenberg@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26411870

  20. Exploiting sensitivity analysis in Bayesian networks for consumer satisfaction study

    NARCIS (Netherlands)

    Jaronski, W.; Bloemer, J.M.M.; Vanhoof, K.; Wets, G.

    2004-01-01

    The paper presents an application of Bayesian network technology in a empirical customer satisfaction study. The findings of the study should provide insight as to the importance of product/service dimensions in terms of the strength of their influence on overall satisfaction. To this end we apply a

  1. 一种基于非参数贝叶斯模型的聚类算法%Data Clustering via Nonparametric Bayesian Models

    Institute of Scientific and Technical Information of China (English)

    张媛媛

    2013-01-01

    鉴于聚类分析是机器学习和数据挖掘领域的一项重要技术,并且与监督学习不同的是聚类分析中没有类别或标签的指导信息,所以如何选择合适的聚类个数(即模型选择)一直是聚类分析中的难点。由此提出了一种基于Dirichlet过程混合模型的聚类算法,并用collapsed Gibbs采样算法对混合模型的参数进行估计。新算法基于非参数贝叶斯模型的框架,能够在不断的采样过程中优化模型参数并形成合适的聚类个数。在人工合成数据集和真实数据集上的聚类实验结果表明:基于Dirichlet过程混合模型的聚类算法不但能够自动确定聚类个数,而且具有较强灵活性和鲁棒性。%Clustering is one of the most useful techniques in machine learning and data mining. In cluster analysis, model selection concerning how to determine the number of clusters is an important issue. Unlike supervised learning, there are no class labels and criteria to guide the search, so the model for clustering is always difficult to select. To tackle this problem, we present the concept of nonparametric clustering approach based on Dirichlet process mixture model (DPMM), and apply a collapsed Gibbs sampling technique to sample the posterior distribution. The proposed clustering algorithm follows the Bayesian nonparametric framework and can optimize the number of components and the parameters of the model. The experimental result of clustering shows that this Bayes model has promising properties and robust performance.

  2. Estimating size and scope economies in the Portuguese water sector using the Bayesian stochastic frontier analysis.

    Science.gov (United States)

    Carvalho, Pedro; Marques, Rui Cunha

    2016-02-15

    This study aims to search for economies of size and scope in the Portuguese water sector applying Bayesian and classical statistics to make inference in stochastic frontier analysis (SFA). This study proves the usefulness and advantages of the application of Bayesian statistics for making inference in SFA over traditional SFA which just uses classical statistics. The resulting Bayesian methods allow overcoming some problems that arise in the application of the traditional SFA, such as the bias in small samples and skewness of residuals. In the present case study of the water sector in Portugal, these Bayesian methods provide more plausible and acceptable results. Based on the results obtained we found that there are important economies of output density, economies of size, economies of vertical integration and economies of scope in the Portuguese water sector, pointing out to the huge advantages in undertaking mergers by joining the retail and wholesale components and by joining the drinking water and wastewater services.

  3. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    Science.gov (United States)

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  4. Bayesian analysis of the dynamic cosmic web in the SDSS galaxy survey

    CERN Document Server

    Leclercq, Florent; Wandelt, Benjamin

    2015-01-01

    Recent application of the Bayesian algorithm BORG to the Sloan Digital Sky Survey (SDSS) main sample galaxies resulted in the physical inference of the formation history of the observed large-scale structure from its origin to the present epoch. In this work, we use these inferences as inputs for a detailed probabilistic cosmic web-type analysis. To do so, we generate a large set of data-constrained realizations of the large-scale structure using a fast, fully non-linear gravitational model. We then perform a dynamic classification of the cosmic web into four distinct components (voids, sheets, filaments and clusters) on the basis of the tidal field. Our inference framework automatically and self-consistently propagates typical observational uncertainties to web-type classification. As a result, this study produces highly detailed and accurate cosmographic classification of large-scale structure elements in the SDSS volume. By also providing the history of these structure maps, the approach allows an analysis...

  5. A BAYESIAN HIERARCHICAL SPATIAL POINT PROCESS MODEL FOR MULTI-TYPE NEUROIMAGING META-ANALYSIS.

    Science.gov (United States)

    Kang, Jian; Nichols, Thomas E; Wager, Tor D; Johnson, Timothy D

    2014-09-01

    Neuroimaging meta-analysis is an important tool for finding consistent effects over studies that each usually have 20 or fewer subjects. Interest in meta-analysis in brain mapping is also driven by a recent focus on so-called "reverse inference": where as traditional "forward inference" identifies the regions of the brain involved in a task, a reverse inference identifies the cognitive processes that a task engages. Such reverse inferences, however, requires a set of meta-analysis, one for each possible cognitive domain. However, existing methods for neuroimaging meta-analysis have significant limitations. Commonly used methods for neuroimaging meta-analysis are not model based, do not provide interpretable parameter estimates, and only produce null hypothesis inferences; further, they are generally designed for a single group of studies and cannot produce reverse inferences. In this work we address these limitations by adopting a non-parametric Bayesian approach for meta analysis data from multiple classes or types of studies. In particular, foci from each type of study are modeled as a cluster process driven by a random intensity function that is modeled as a kernel convolution of a gamma random field. The type-specific gamma random fields are linked and modeled as a realization of a common gamma random field, shared by all types, that induces correlation between study types and mimics the behavior of a univariate mixed effects model. We illustrate our model on simulation studies and a meta analysis of five emotions from 219 studies and check model fit by a posterior predictive assessment. In addition, we implement reverse inference by using the model to predict study type from a newly presented study. We evaluate this predictive performance via leave-one-out cross validation that is efficiently implemented using importance sampling techniques.

  6. Health at the borders: Bayesian multilevel analysis of women's malnutrition determinants in Ethiopia

    Directory of Open Access Journals (Sweden)

    Tefera Darge Delbiso

    2016-07-01

    Full Text Available Background: Women's malnutrition, particularly undernutrition, remains an important public health challenge in Ethiopia. Although various studies examined the levels and determinants of women's nutritional status, the influence of living close to an international border on women's nutrition has not been investigated. Yet, Ethiopian borders are regularly affected by conflict and refugee flows, which might ultimately impact health. Objective: To investigate the impact of living close to borders in the nutritional status of women in Ethiopia, while considering other important covariates. Design: Our analysis was based on the body mass index (BMI of 6,334 adult women aged 20–49 years, obtained from the 2011 Ethiopian Demographic and Health Survey (EDHS. A Bayesian multilevel multinomial logistic regression analysis was used to capture the clustered structure of the data and the possible correlation that may exist within and between clusters. Results: After controlling for potential confounders, women living close to borders (i.e. ≤100 km in Ethiopia were 59% more likely to be underweight (posterior odds ratio [OR]=1.59; 95% credible interval [CrI]: 1.32–1.90 than their counterparts living far from the borders. This result was robust to different choices of border delineation (i.e. ≤50, ≤75, ≤125, and ≤150 km. Women from poor families, those who have no access to improved toilets, reside in lowland areas, and are Muslim, were independently associated with underweight. In contrast, more wealth, higher education, older age, access to improved toilets, being married, and living in urban or lowlands were independently associated with overweight. Conclusions: The problem of undernutrition among women in Ethiopia is most worrisome in the border areas. Targeted interventions to improve nutritional status in these areas, such as improved access to sanitation, economic and livelihood support, are recommended.

  7. A Dynamic Bayesian Approach to Computational Laban Shape Quality Analysis

    Directory of Open Access Journals (Sweden)

    Dilip Swaminathan

    2009-01-01

    kinesiology. LMA (especially Effort/Shape emphasizes how internal feelings and intentions govern the patterning of movement throughout the whole body. As we argue, a complex understanding of intention via LMA is necessary for human-computer interaction to become embodied in ways that resemble interaction in the physical world. We thus introduce a novel, flexible Bayesian fusion approach for identifying LMA Shape qualities from raw motion capture data in real time. The method uses a dynamic Bayesian network (DBN to fuse movement features across the body and across time and as we discuss can be readily adapted for low-cost video. It has delivered excellent performance in preliminary studies comprising improvisatory movements. Our approach has been incorporated in Response, a mixed-reality environment where users interact via natural, full-body human movement and enhance their bodily-kinesthetic awareness through immersive sound and light feedback, with applications to kinesiology training, Parkinson's patient rehabilitation, interactive dance, and many other areas.

  8. Use of SAMC for Bayesian analysis of statistical models with intractable normalizing constants

    KAUST Repository

    Jin, Ick Hoon

    2014-03-01

    Statistical inference for the models with intractable normalizing constants has attracted much attention. During the past two decades, various approximation- or simulation-based methods have been proposed for the problem, such as the Monte Carlo maximum likelihood method and the auxiliary variable Markov chain Monte Carlo methods. The Bayesian stochastic approximation Monte Carlo algorithm specifically addresses this problem: It works by sampling from a sequence of approximate distributions with their average converging to the target posterior distribution, where the approximate distributions can be achieved using the stochastic approximation Monte Carlo algorithm. A strong law of large numbers is established for the Bayesian stochastic approximation Monte Carlo estimator under mild conditions. Compared to the Monte Carlo maximum likelihood method, the Bayesian stochastic approximation Monte Carlo algorithm is more robust to the initial guess of model parameters. Compared to the auxiliary variable MCMC methods, the Bayesian stochastic approximation Monte Carlo algorithm avoids the requirement for perfect samples, and thus can be applied to many models for which perfect sampling is not available or very expensive. The Bayesian stochastic approximation Monte Carlo algorithm also provides a general framework for approximate Bayesian analysis. © 2012 Elsevier B.V. All rights reserved.

  9. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    Science.gov (United States)

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain. PMID:24246289

  10. Bayesian network models in brain functional connectivity analysis

    OpenAIRE

    Ide, Jaime S.; Zhang, Sheng; Chiang-shan R. Li

    2013-01-01

    Much effort has been made to better understand the complex integration of distinct parts of the human brain using functional magnetic resonance imaging (fMRI). Altered functional connectivity between brain regions is associated with many neurological and mental illnesses, such as Alzheimer and Parkinson diseases, addiction, and depression. In computational science, Bayesian networks (BN) have been used in a broad range of studies to model complex data set in the presence of uncertainty and wh...

  11. Bayesian Analysis of the Black-Scholes Option Price

    OpenAIRE

    Darsinos, Theofanis; Stephen E Satchell

    2001-01-01

    This paper investigates the statistical properties of the Black-Scholes option price under a Bayesian approach. We incorporate randomness, both in the price process and in volatility, to derive the prior and posterior densities of a European call option. Expressions for the density of the option price conditional on the sample estimates of volatility and on the asset price respectively, are also derived. Numerical results are presented to compare how the dispersion of the option price changes...

  12. Hierarchical Bayesian analysis of somatic mutation data in cancer

    OpenAIRE

    Ding, Jie; Trippa, Lorenzo; Zhong, Xiaogang; Parmigiani, Giovanni

    2013-01-01

    Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim at discovering genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data ...

  13. Bayesian analysis of the Hector’s Dolphin data

    OpenAIRE

    King, R; Brooks, S.P.

    2004-01-01

    In recent years there have been increasing concerns for many wildlife populations, due to decreasing population trends. This has led to the introduction of management schemes to increase the survival rates and hence the population size of many species of animals. We concentrate on a particular dolphin population situated off the coast of New Zealand, and investigate whether the introduction of a fishing gill net ban was effective in decreasing dolphin mortality. We undertake a Bayesian analys...

  14. Bayesian models for comparative analysis integrating phylogenetic uncertainty

    Directory of Open Access Journals (Sweden)

    Villemereuil Pierre de

    2012-06-01

    Full Text Available Abstract Background Uncertainty in comparative analyses can come from at least two sources: a phylogenetic uncertainty in the tree topology or branch lengths, and b uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow and inflated significance in hypothesis testing (e.g. p-values will be too small. Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible

  15. Bayesian analysis of hierarchical multi-fidelity codes

    CERN Document Server

    Gratiet, Loic Le

    2011-01-01

    This paper deals with the Gaussian process based approximation of a code which can be run at different levels of accuracy. This co-kriging method allows us to improve a surrogate model of a complex computer code using fast approximations of it. In particular, we focus on the case of a large number of code levels on the one hand and on a Bayesian approach when we have 2 levels on the other hand. Moreover, based on a Bayes linear formulation, an extension of the universal kriging equations are provided for the co-kriging model. We also address the problem of nested space-filling design for multi-fidelity computer experiments and we provide a significant simplification of the computation of the co-kriging cross-validation equations. A hydrodynamic simulator example is used to illustrate the comparison Bayesian versus non-Bayesian co-kriging. A thermodynamic example is used to illustrate the comparison between 2-level and 3-level co-kriging.

  16. Bayesian Analysis for Stellar Evolution with Nine Parameters (BASE-9): User's Manual

    CERN Document Server

    von Hippel, Ted; Jeffery, Elizabeth; Wagner-Kaiser, Rachel; DeGennaro, Steven; Stein, Nathan; Stenning, David; Jefferys, William H; van Dyk, David

    2014-01-01

    BASE-9 is a Bayesian software suite that recovers star cluster and stellar parameters from photometry. BASE-9 is useful for analyzing single-age, single-metallicity star clusters, binaries, or single stars, and for simulating such systems. BASE-9 uses Markov chain Monte Carlo and brute-force numerical integration techniques to estimate the posterior probability distributions for the age, metallicity, helium abundance, distance modulus, and line-of-sight absorption for a cluster, and the mass, binary mass ratio, and cluster membership probability for every stellar object. BASE-9 is provided as open source code on a version-controlled web server. The executables are also available as Amazon Elastic Compute Cloud images. This manual provides potential users with an overview of BASE-9, including instructions for installation and use.

  17. Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection

    KAUST Repository

    Dhavala, Soma S.

    2010-09-01

    Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflatedPoisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models-one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.

  18. Spatial Dependence and Heterogeneity in Bayesian Factor Analysis : A Cross-National Investigation of Schwartz Values

    NARCIS (Netherlands)

    Stakhovych, Stanislav; Bijmolt, Tammo H. A.; Wedel, Michel

    2012-01-01

    In this article, we present a Bayesian spatial factor analysis model. We extend previous work on confirmatory factor analysis by including geographically distributed latent variables and accounting for heterogeneity and spatial autocorrelation. The simulation study shows excellent recovery of the mo

  19. Nuclear stockpile stewardship and Bayesian image analysis (DARHT and the BIE)

    Energy Technology Data Exchange (ETDEWEB)

    Carroll, James L [Los Alamos National Laboratory

    2011-01-11

    Since the end of nuclear testing, the reliability of our nation's nuclear weapon stockpile has been performed using sub-critical hydrodynamic testing. These tests involve some pretty 'extreme' radiography. We will be discussing the challenges and solutions to these problems provided by DARHT (the world's premiere hydrodynamic testing facility) and the BIE or Bayesian Inference Engine (a powerful radiography analysis software tool). We will discuss the application of Bayesian image analysis techniques to this important and difficult problem.

  20. A Bayesian Analysis of the Radioactive Releases of Fukushima

    DEFF Research Database (Denmark)

    Tomioka, Ryota; Mørup, Morten

    2012-01-01

    The Fukushima Daiichi disaster 11 March, 2011 is considered the largest nuclear accident since the 1986 Chernobyl disaster and has been rated at level 7 on the International Nuclear Event Scale. As different radioactive materials have different effects to human body, it is important to know...... the types of nuclides and their levels of concentration from the recorded mixture of radiations to take necessary measures. We presently formulate a Bayesian generative model for the data available on radioactive releases from the Fukushima Daiichi disaster across Japan. From the sparsely sampled...... detailed account also of the latent structure present in the data of the Fukushima Daiichi disaster....

  1. Type Ia Supernova Light Curve Inference: Hierarchical Bayesian Analysis in the Near Infrared

    CERN Document Server

    Mandel, Kaisey S; Friedman, Andrew S; Kirshner, Robert P

    2009-01-01

    We present a comprehensive statistical analysis of the properties of Type Ia SN light curves in the near infrared using recent data from PAIRITEL and the literature. We construct a hierarchical Bayesian framework, incorporating several uncertainties including photometric error, peculiar velocities, dust extinction and intrinsic variations, for coherent statistical inference. SN Ia light curve inferences are drawn from the global posterior probability of parameters describing both individual supernovae and the population conditioned on the entire SN Ia NIR dataset. The logical structure of the hierarchical Bayesian model is represented by a directed acyclic graph. Fully Bayesian analysis of the model and data is enabled by an efficient MCMC algorithm exploiting the conditional structure using Gibbs sampling. We apply this framework to the JHK_s SN Ia light curve data. A new light curve model captures the observed J-band light curve shape variations. The intrinsic variances in peak absolute magnitudes are: sigm...

  2. Introduction to Bayesian statistics

    CERN Document Server

    Bolstad, William M

    2016-01-01

    There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian staistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inferenfe cfor discrete random variables, bionomial proprotion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear RegressionModel; and Computati...

  3. Bayesian Nonparametric Measurement of Factor Betas and Clustering with Application to Hedge Fund Returns

    Directory of Open Access Journals (Sweden)

    Urbi Garay

    2016-03-01

    Full Text Available We define a dynamic and self-adjusting mixture of Gaussian Graphical Models to cluster financial returns, and provide a new method for extraction of nonparametric estimates of dynamic alphas (excess return and betas (to a choice set of explanatory factors in a multivariate setting. This approach, as well as the outputs, has a dynamic, nonstationary and nonparametric form, which circumvents the problem of model risk and parametric assumptions that the Kalman filter and other widely used approaches rely on. The by-product of clusters, used for shrinkage and information borrowing, can be of use to determine relationships around specific events. This approach exhibits a smaller Root Mean Squared Error than traditionally used benchmarks in financial settings, which we illustrate through simulation. As an illustration, we use hedge fund index data, and find that our estimated alphas are, on average, 0.13% per month higher (1.6% per year than alphas estimated through Ordinary Least Squares. The approach exhibits fast adaptation to abrupt changes in the parameters, as seen in our estimated alphas and betas, which exhibit high volatility, especially in periods which can be identified as times of stressful market events, a reflection of the dynamic positioning of hedge fund portfolio managers.

  4. Objective Bayesian analysis of "on/off" measurements

    CERN Document Server

    Casadei, Diego

    2014-01-01

    In high-energy astrophysics, it is common practice to account for the background overlaid with the counts from the source of interest with the help of auxiliary measurements carried on by pointing off-source. In this "on/off" measurement, one knows the number of photons detected while pointing to the source, the number of photons collected while pointing away of the source, and how to estimate the background counts in the source region from the flux observed in the auxiliary measurements. For very faint sources, the number of detected photons is so low that the approximations which hold asymptotically are not valid. On the other hand, the analytical solution exists for the Bayesian statistical inference, which is valid at low and high counts. The Bayesian approach to statistical inference provides a probability distribution describing our degree of belief on the possible outcomes of the parameter of interest, over its entire range. In addition to the specification of the model, in this case assumed to obey Po...

  5. Bayesian analysis of deterministic and stochastic prisoner's dilemma games

    Directory of Open Access Journals (Sweden)

    Howard Kunreuther

    2009-08-01

    Full Text Available This paper compares the behavior of individuals playing a classic two-person deterministic prisoner's dilemma (PD game with choice data obtained from repeated interdependent security prisoner's dilemma games with varying probabilities of loss and the ability to learn (or not learn about the actions of one's counterpart, an area of recent interest in experimental economics. This novel data set, from a series of controlled laboratory experiments, is analyzed using Bayesian hierarchical methods, the first application of such methods in this research domain. We find that individuals are much more likely to be cooperative when payoffs are deterministic than when the outcomes are probabilistic. A key factor explaining this difference is that subjects in a stochastic PD game respond not just to what their counterparts did but also to whether or not they suffered a loss. These findings are interpreted in the context of behavioral theories of commitment, altruism and reciprocity. The work provides a linkage between Bayesian statistics, experimental economics, and consumer psychology.

  6. Exclusive breastfeeding practice in Nigeria: a bayesian stepwise regression analysis.

    Science.gov (United States)

    Gayawan, Ezra; Adebayo, Samson B; Chitekwe, Stanley

    2014-11-01

    Despite the importance of breast milk, the prevalence of exclusive breastfeeding (EBF) in Nigeria is far lower than what has been recommended for developing countries. Worse still, the practise has been on downward trend in the country recently. This study was aimed at investigating the determinants and geographical variations of EBF in Nigeria. Any intervention programme would require a good knowledge of factors that enhance the practise. A pooled data set from Nigeria Demographic and Health Survey conducted in 1999, 2003, and 2008 were analyzed using a Bayesian stepwise approach that involves simultaneous selection of variables and smoothing parameters. Further, the approach allows for geographical variations at a highly disaggregated level of states to be investigated. Within a Bayesian context, appropriate priors are assigned on all the parameters and functions. Findings reveal that education of women and their partners, place of delivery, mother's age at birth, and current age of child are associated with increasing prevalence of EBF. However, visits for antenatal care during pregnancy are not associated with EBF in Nigeria. Further, results reveal considerable geographical variations in the practise of EBF. The likelihood of exclusively breastfeeding children are significantly higher in Kwara, Kogi, Osun, and Oyo states but lower in Jigawa, Katsina, and Yobe. Intensive interventions that can lead to improved practise are required in all states in Nigeria. The importance of breastfeeding needs to be emphasized to women during antenatal visits as this can encourage and enhance the practise after delivery. PMID:24619227

  7. Risk Analysis of New Product Development Using Bayesian Networks

    Directory of Open Access Journals (Sweden)

    MohammadRahim Ramezanian

    2012-06-01

    Full Text Available The process of presenting new product development (NPD to market is of great importance due to variability of competitive rules in the business world. The product development teams face a lot of pressures due to rapid growth of technology, increased risk-taking of world markets and increasing variations in the customers` needs. However, the process of NPD is always associated with high uncertainties and complexities. To be successful in completing NPD project, existing risks should be identified and assessed. On the other hand, the Bayesian networks as a strong approach of decision making modeling of uncertain situations has attracted many researchers in various areas. These networks provide a decision supporting system for problems with uncertainties or probable reasoning. In this paper, the available risk factors in product development have been first identified in an electric company and then, the Bayesian network has been utilized and their interrelationships have been modeled to evaluate the available risk in the process. To determine the primary and conditional probabilities of the nodes, the viewpoints of experts in this area have been applied. The available risks in this process have been divided to High (H, Medium (M and Low (L groups and analyzed by the Agena Risk software. The findings derived from software output indicate that the production of the desired product has relatively high risk. In addition, Predictive support and Diagnostic support have been performed on the model with two different scenarios..

  8. Risk Analysis of New Product Development Using Bayesian Networks

    Directory of Open Access Journals (Sweden)

    Mohammad Rahim Ramezanian

    2012-01-01

    Full Text Available The process of presenting new product development (NPD to market is of great importance due to variability of competitive rules in the business world. The product development teams face a lot of pressures due to rapid growth of technology, increased risk-taking of world markets and increasing variations in the customers` needs. However, the process of NPD is always associated with high uncertainties and complexities. To be successful in completing NPD project, existing risks should be identified and assessed. On the other hand, the Bayesian networks as a strong approach of decision making modeling of uncertain situations has attracted many researchers in various areas. These networks provide a decision supporting system for problems with uncertainties or probable reasoning. In this paper, the available risk factors in product development have been first identified in an electric company and then, the Bayesian network has been utilized and their interrelationships have been modeled to evaluate the available risk in the process. To determine the primary and conditional probabilities of the nodes, the viewpoints of experts in this area have been applied. The available risks in this process have been divided to High (H, Medium (M and Low (L groups and analyzed by the Agena Risk software. The findings derived from software output indicate that the production of the desired product has relatively high risk. In addition, Predictive support and Diagnostic support have been performed on the model with two different scenarios.

  9. Nonparametric Bayesian Clustering of Structural Whole Brain Connectivity in Full Image Resolution

    DEFF Research Database (Denmark)

    Ambrosen, Karen Marie Sandø; Albers, Kristoffer Jon; Dyrby, Tim B.;

    2014-01-01

    groups) that defines structural units at the resolution of statistical support. We apply the model to a network of structural brain connectivity in full image resolution with more than one hundred thousand regions (voxels in the gray-white matter boundary) and around one hundred million connections......Diffusion magnetic resonance imaging enables measuring the structural connectivity of the human brain at a high spatial resolution. Local noisy connectivity estimates can be derived using tractography approaches and statistical models are necessary to quantify the brain’s salient structural....... The derived clustering identifies in the order of one thousand salient structural units and we find that the identified units provide better predictive performance than predicting using the full graph or two commonly used atlases. Extracting structural units of brain connectivity at the full image resolution...

  10. Cluster Analysis of Adolescent Blogs

    Science.gov (United States)

    Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

    2012-01-01

    Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…

  11. Calibration of crash risk models on freeways with limited real-time traffic data using Bayesian meta-analysis and Bayesian inference approach.

    Science.gov (United States)

    Xu, Chengcheng; Wang, Wei; Liu, Pan; Li, Zhibin

    2015-12-01

    This study aimed to develop a real-time crash risk model with limited data in China by using Bayesian meta-analysis and Bayesian inference approach. A systematic review was first conducted by using three different Bayesian meta-analyses, including the fixed effect meta-analysis, the random effect meta-analysis, and the meta-regression. The meta-analyses provided a numerical summary of the effects of traffic variables on crash risks by quantitatively synthesizing results from previous studies. The random effect meta-analysis and the meta-regression produced a more conservative estimate for the effects of traffic variables compared with the fixed effect meta-analysis. Then, the meta-analyses results were used as informative priors for developing crash risk models with limited data. Three different meta-analyses significantly affect model fit and prediction accuracy. The model based on meta-regression can increase the prediction accuracy by about 15% as compared to the model that was directly developed with limited data. Finally, the Bayesian predictive densities analysis was used to identify the outliers in the limited data. It can further improve the prediction accuracy by 5.0%.

  12. Learning Bayesian networks from big meteorological spatial datasets. An alternative to complex network analysis

    Science.gov (United States)

    Gutiérrez, Jose Manuel; San Martín, Daniel; Herrera, Sixto; Santiago Cofiño, Antonio

    2016-04-01

    The growing availability of spatial datasets (observations, reanalysis, and regional and global climate models) demands efficient multivariate spatial modeling techniques for many problems of interest (e.g. teleconnection analysis, multi-site downscaling, etc.). Complex networks have been recently applied in this context using graphs built from pairwise correlations between the different stations (or grid boxes) forming the dataset. However, this analysis does not take into account the full dependence structure underlying the data, gien by all possible marginal and conditional dependencies among the stations, and does not allow a probabilistic analysis of the dataset. In this talk we introduce Bayesian networks as an alternative multivariate analysis and modeling data-driven technique which allows building a joint probability distribution of the stations including all relevant dependencies in the dataset. Bayesian networks is a sound machine learning technique using a graph to 1) encode the main dependencies among the variables and 2) to obtain a factorization of the joint probability distribution of the stations given by a reduced number of parameters. For a particular problem, the resulting graph provides a qualitative analysis of the spatial relationships in the dataset (alternative to complex network analysis), and the resulting model allows for a probabilistic analysis of the dataset. Bayesian networks have been widely applied in many fields, but their use in climate problems is hampered by the large number of variables (stations) involved in this field, since the complexity of the existing algorithms to learn from data the graphical structure grows nonlinearly with the number of variables. In this contribution we present a modified local learning algorithm for Bayesian networks adapted to this problem, which allows inferring the graphical structure for thousands of stations (from observations) and/or gridboxes (from model simulations) thus providing new

  13. Unsupervised Transient Light Curve Analysis Via Hierarchical Bayesian Inference

    CERN Document Server

    Sanders, Nathan; Soderberg, Alicia

    2014-01-01

    Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometr...

  14. Variational Bayesian Causal Connectivity Analysis for fMRI

    Directory of Open Access Journals (Sweden)

    Martin eLuessi

    2014-05-01

    Full Text Available The ability to accurately estimate effective connectivity among brain regions from neuroimaging data could help answering many open questions in neuroscience. We propose a method which uses causality to obtain a measure of effective connectivity from fMRI data. The method uses a vector autoregressive model for the latent variables describing neuronal activity in combination with a linear observation model based on a convolution with a hemodynamic response function. Due to the employed modeling, it is possible to efficiently estimate all latent variables of the model using a variational Bayesian inference algorithm. The computational efficiency of the method enables us to apply it to large scale problems with high sampling rates and several hundred regions of interest. We use a comprehensive empirical evaluation with synthetic and real fMRI data to evaluate the performance of our method under various conditions.

  15. A Software Risk Analysis Model Using Bayesian Belief Network

    Institute of Scientific and Technical Information of China (English)

    Yong Hu; Juhua Chen; Mei Liu; Yang Yun; Junbiao Tang

    2006-01-01

    The uncertainty during the period of software project development often brings huge risks to contractors and clients. Ifwe can find an effective method to predict the cost and quality of software projects based on facts like the project character and two-side cooperating capability at the beginning of the project, we can reduce the risk.Bayesian Belief Network(BBN) is a good tool for analyzing uncertain consequences, but it is difficult to produce precise network structure and conditional probability table. In this paper, we built up network structure by Delphi method for conditional probability table learning, and learn update probability table and nodes' confidence levels continuously according to the application cases, which made the evaluation network have learning abilities, and evaluate the software development risk of organization more accurately. This paper also introduces EM algorithm, which will enhance the ability to produce hidden nodes caused by variant software projects.

  16. Family Background Variables as Instruments for Education in Income Regressions: A Bayesian Analysis

    Science.gov (United States)

    Hoogerheide, Lennart; Block, Joern H.; Thurik, Roy

    2012-01-01

    The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in case of moderate direct…

  17. Calibration of Uncertainty Analysis of the SWAT Model Using Genetic Algorithms and Bayesian Model Averaging

    Science.gov (United States)

    In this paper, the Genetic Algorithms (GA) and Bayesian model averaging (BMA) were combined to simultaneously conduct calibration and uncertainty analysis for the Soil and Water Assessment Tool (SWAT). In this hybrid method, several SWAT models with different structures are first selected; next GA i...

  18. A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data

    NARCIS (Netherlands)

    DeSarbo, WS; Kim, Y; Fong, D

    1999-01-01

    We present a new Bayesian formulation of a vector multidimensional scaling procedure for the spatial analysis of binary choice data. The Gibbs sampler is gainfully employed to estimate the posterior distribution of the specified scalar products, bilinear model parameters. The computational procedure

  19. Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.

    Science.gov (United States)

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric

    2016-01-01

    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach. PMID:27314566

  20. Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.

    Science.gov (United States)

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric

    2016-01-01

    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach.

  1. Micronutrients in HIV: a Bayesian meta-analysis.

    Directory of Open Access Journals (Sweden)

    George M Carter

    Full Text Available Approximately 28.5 million people living with HIV are eligible for treatment (CD4<500, but currently have no access to antiretroviral therapy. Reduced serum level of micronutrients is common in HIV disease. Micronutrient supplementation (MNS may mitigate disease progression and mortality.We synthesized evidence on the effect of micronutrient supplementation on mortality and rate of disease progression in HIV disease.We searched MEDLINE, EMBASE, the Cochrane Central, AMED and CINAHL databases through December 2014, without language restriction, for studies of greater than 3 micronutrients versus any or no comparator. We built a hierarchical Bayesian random effects model to synthesize results. Inferences are based on the posterior distribution of the population effects; posterior distributions were approximated by Markov chain Monte Carlo in OpenBugs.From 2166 initial references, we selected 49 studies for full review and identified eight reporting on disease progression and/or mortality. Bayesian synthesis of data from 2,249 adults in three studies estimated the relative risk of disease progression in subjects on MNS vs. control as 0.62 (95% credible interval, 0.37, 0.96. Median number needed to treat is 8.4 (4.8, 29.9 and the Bayes Factor 53.4. Based on data reporting on 4,095 adults reporting mortality in 7 randomized controlled studies, the RR was 0.84 (0.38, 1.85, NNT is 25 (4.3, ∞.MNS significantly and substantially slows disease progression in HIV+ adults not on ARV, and possibly reduces mortality. Micronutrient supplements are effective in reducing progression with a posterior probability of 97.9%. Considering MNS low cost and lack of adverse effects, MNS should be standard of care for HIV+ adults not yet on ARV.

  2. Direct message passing for hybrid Bayesian networks and performance analysis

    Science.gov (United States)

    Sun, Wei; Chang, K. C.

    2010-04-01

    Probabilistic inference for hybrid Bayesian networks, which involves both discrete and continuous variables, has been an important research topic over the recent years. This is not only because a number of efficient inference algorithms have been developed and used maturely for simple types of networks such as pure discrete model, but also for the practical needs that continuous variables are inevitable in modeling complex systems. Pearl's message passing algorithm provides a simple framework to compute posterior distribution by propagating messages between nodes and can provides exact answer for polytree models with pure discrete or continuous variables. In addition, applying Pearl's message passing to network with loops usually converges and results in good approximation. However, for hybrid model, there is a need of a general message passing algorithm between different types of variables. In this paper, we develop a method called Direct Message Passing (DMP) for exchanging messages between discrete and continuous variables. Based on Pearl's algorithm, we derive formulae to compute messages for variables in various dependence relationships encoded in conditional probability distributions. Mixture of Gaussian is used to represent continuous messages, with the number of mixture components up to the size of the joint state space of all discrete parents. For polytree Conditional Linear Gaussian (CLG) Bayesian network, DMP has the same computational requirements and can provide exact solution as the one obtained by the Junction Tree (JT) algorithm. However, while JT can only work for the CLG model, DMP can be applied for general nonlinear, non-Gaussian hybrid model to produce approximate solution using unscented transformation and loopy propagation. Furthermore, we can scale the algorithm by restricting the number of mixture components in the messages. Empirically, we found that the approximation errors are relatively small especially for nodes that are far away from

  3. A Pragmatic Bayesian Perspective on Correlation Analysis. The exoplanetary gravity - stellar activity case

    Science.gov (United States)

    Figueira, P.; Faria, J. P.; Adibekyan, V. Zh.; Oshagh, M.; Santos, N. C.

    2016-11-01

    We apply the Bayesian framework to assess the presence of a correlation between two quantities. To do so, we estimate the probability distribution of the parameter of interest, ρ, characterizing the strength of the correlation. We provide an implementation of these ideas and concepts using python programming language and the pyMC module in a very short (˜ 130 lines of code, heavily commented) and user-friendly program. We used this tool to assess the presence and properties of the correlation between planetary surface gravity and stellar activity level as measured by the log(R^' }_{ {HK}}) indicator. The results of the Bayesian analysis are qualitatively similar to those obtained via p-value analysis, and support the presence of a correlation in the data. The results are more robust in their derivation and more informative, revealing interesting features such as asymmetric posterior distributions or markedly different credible intervals, and allowing for a deeper exploration. We encourage the reader interested in this kind of problem to apply our code to his/her own scientific problems. The full understanding of what the Bayesian framework is can only be gained through the insight that comes by handling priors, assessing the convergence of Monte Carlo runs, and a multitude of other practical problems. We hope to contribute so that Bayesian analysis becomes a tool in the toolkit of researchers, and they understand by experience its advantages and limitations.

  4. A Pragmatic Bayesian Perspective on Correlation Analysis - The exoplanetary gravity - stellar activity case

    Science.gov (United States)

    Figueira, P.; Faria, J. P.; Adibekyan, V. Zh.; Oshagh, M.; Santos, N. C.

    2016-05-01

    We apply the Bayesian framework to assess the presence of a correlation between two quantities. To do so, we estimate the probability distribution of the parameter of interest, ρ, characterizing the strength of the correlation. We provide an implementation of these ideas and concepts using python programming language and the pyMC module in a very short (˜ 130 lines of code, heavily commented) and user-friendly program. We used this tool to assess the presence and properties of the correlation between planetary surface gravity and stellar activity level as measured by the log( R^' }_{{HK}}) indicator. The results of the Bayesian analysis are qualitatively similar to those obtained via p-value analysis, and support the presence of a correlation in the data. The results are more robust in their derivation and more informative, revealing interesting features such as asymmetric posterior distributions or markedly different credible intervals, and allowing for a deeper exploration. We encourage the reader interested in this kind of problem to apply our code to his/her own scientific problems. The full understanding of what the Bayesian framework is can only be gained through the insight that comes by handling priors, assessing the convergence of Monte Carlo runs, and a multitude of other practical problems. We hope to contribute so that Bayesian analysis becomes a tool in the toolkit of researchers, and they understand by experience its advantages and limitations.

  5. A fully Bayesian before-after analysis of permeable friction course (PFC) pavement wet weather safety.

    Science.gov (United States)

    Buddhavarapu, Prasad; Smit, Andre F; Prozzi, Jorge A

    2015-07-01

    Permeable friction course (PFC), a porous hot-mix asphalt, is typically applied to improve wet weather safety on high-speed roadways in Texas. In order to warrant expensive PFC construction, a statistical evaluation of its safety benefits is essential. Generally, the literature on the effectiveness of porous mixes in reducing wet-weather crashes is limited and often inconclusive. In this study, the safety effectiveness of PFC was evaluated using a fully Bayesian before-after safety analysis. First, two groups of road segments overlaid with PFC and non-PFC material were identified across Texas; the non-PFC or reference road segments selected were similar to their PFC counterparts in terms of site specific features. Second, a negative binomial data generating process was assumed to model the underlying distribution of crash counts of PFC and reference road segments to perform Bayesian inference on the safety effectiveness. A data-augmentation based computationally efficient algorithm was employed for a fully Bayesian estimation. The statistical analysis shows that PFC is not effective in reducing wet weather crashes. It should be noted that the findings of this study are in agreement with the existing literature, although these studies were not based on a fully Bayesian statistical analysis. Our study suggests that the safety effectiveness of PFC road surfaces, or any other safety infrastructure, largely relies on its interrelationship with the road user. The results suggest that the safety infrastructure must be properly used to reap the benefits of the substantial investments. PMID:25897515

  6. Bayesian dose-response analysis for epidemiological studies with complex uncertainty in dose estimation.

    Science.gov (United States)

    Kwon, Deukwoo; Hoffman, F Owen; Moroz, Brian E; Simon, Steven L

    2016-02-10

    Most conventional risk analysis methods rely on a single best estimate of exposure per person, which does not allow for adjustment for exposure-related uncertainty. Here, we propose a Bayesian model averaging method to properly quantify the relationship between radiation dose and disease outcomes by accounting for shared and unshared uncertainty in estimated dose. Our Bayesian risk analysis method utilizes multiple realizations of sets (vectors) of doses generated by a two-dimensional Monte Carlo simulation method that properly separates shared and unshared errors in dose estimation. The exposure model used in this work is taken from a study of the risk of thyroid nodules among a cohort of 2376 subjects who were exposed to fallout from nuclear testing in Kazakhstan. We assessed the performance of our method through an extensive series of simulations and comparisons against conventional regression risk analysis methods. When the estimated doses contain relatively small amounts of uncertainty, the Bayesian method using multiple a priori plausible draws of dose vectors gave similar results to the conventional regression-based methods of dose-response analysis. However, when large and complex mixtures of shared and unshared uncertainties are present, the Bayesian method using multiple dose vectors had significantly lower relative bias than conventional regression-based risk analysis methods and better coverage, that is, a markedly increased capability to include the true risk coefficient within the 95% credible interval of the Bayesian-based risk estimate. An evaluation of the dose-response using our method is presented for an epidemiological study of thyroid disease following radiation exposure.

  7. Competing risk models in reliability systems, a weibull distribution model with bayesian analysis approach

    Science.gov (United States)

    Iskandar, Ismed; Satria Gondokaryono, Yudi

    2016-02-01

    In reliability theory, the most important problem is to determine the reliability of a complex system from the reliability of its components. The weakness of most reliability theories is that the systems are described and explained as simply functioning or failed. In many real situations, the failures may be from many causes depending upon the age and the environment of the system and its components. Another problem in reliability theory is one of estimating the parameters of the assumed failure models. The estimation may be based on data collected over censored or uncensored life tests. In many reliability problems, the failure data are simply quantitatively inadequate, especially in engineering design and maintenance system. The Bayesian analyses are more beneficial than the classical one in such cases. The Bayesian estimation analyses allow us to combine past knowledge or experience in the form of an apriori distribution with life test data to make inferences of the parameter of interest. In this paper, we have investigated the application of the Bayesian estimation analyses to competing risk systems. The cases are limited to the models with independent causes of failure by using the Weibull distribution as our model. A simulation is conducted for this distribution with the objectives of verifying the models and the estimators and investigating the performance of the estimators for varying sample size. The simulation data are analyzed by using Bayesian and the maximum likelihood analyses. The simulation results show that the change of the true of parameter relatively to another will change the value of standard deviation in an opposite direction. For a perfect information on the prior distribution, the estimation methods of the Bayesian analyses are better than those of the maximum likelihood. The sensitivity analyses show some amount of sensitivity over the shifts of the prior locations. They also show the robustness of the Bayesian analysis within the range

  8. Clustering analysis of telecommunication customers

    Institute of Scientific and Technical Information of China (English)

    REN Hong; ZHENG Yan; WU Ye-rong

    2009-01-01

    In this article, a clustering method based on genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as the calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distance between samples on a two-dimensional plane. Finally, the distances are adjusted to approximate the similarities gradually by GA. One advantage of this method is the independent distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.

  9. Bayesian Analysis of Cosmic Ray Propagation: Evidence against Homogeneous Diffusion

    Science.gov (United States)

    Jóhannesson, G.; Ruiz de Austri, R.; Vincent, A. C.; Moskalenko, I. V.; Orlando, E.; Porter, T. A.; Strong, A. W.; Trotta, R.; Feroz, F.; Graff, P.; Hobson, M. P.

    2016-06-01

    We present the results of the most complete scan of the parameter space for cosmic ray (CR) injection and propagation. We perform a Bayesian search of the main GALPROP parameters, using the MultiNest nested sampling algorithm, augmented by the BAMBI neural network machine-learning package. This is the first study to separate out low-mass isotopes (p, \\bar{p}, and He) from the usual light elements (Be, B, C, N, and O). We find that the propagation parameters that best-fit p,\\bar{p}, and He data are significantly different from those that fit light elements, including the B/C and 10Be/9Be secondary-to-primary ratios normally used to calibrate propagation parameters. This suggests that each set of species is probing a very different interstellar medium, and that the standard approach of calibrating propagation parameters using B/C can lead to incorrect results. We present posterior distributions and best-fit parameters for propagation of both sets of nuclei, as well as for the injection abundances of elements from H to Si. The input GALDEF files with these new parameters will be included in an upcoming public GALPROP update.

  10. On the On-Off Problem: An Objective Bayesian Analysis

    CERN Document Server

    Ahnen, Max Ludwig

    2015-01-01

    The On-Off problem, aka. Li-Ma problem, is a statistical problem where a measured rate is the sum of two parts. The first is due to a signal and the second due to a background, both of which are unknown. Mostly frequentist solutions are being used that are only adequate for high count numbers. When the events are rare such an approximation is not good enough. Indeed, in high-energy astrophysics this is often the rule rather than the exception. I will present a universal objective Bayesian solution that depends only on the initial three parameters of the On-Off problem: the number of events in the "on" region, the number of events in the "off" region, and their ratio-of-exposure. With a two-step approach it is possible to infer the signal's significance, strength, uncertainty or upper limit in a unified a way. The approach is valid without restrictions for any count number including zero and may be widely applied in particle physics, cosmic-ray physics and high-energy astrophysics. I apply the method to Gamma ...

  11. Filtering Genes for Cluster and Network Analysis

    Directory of Open Access Journals (Sweden)

    Parkhomenko Elena

    2009-06-01

    Full Text Available Abstract Background Prior to cluster analysis or genetic network analysis it is customary to filter, or remove genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.

  12. Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.

    Science.gov (United States)

    Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S

    2016-05-01

    A method for online incremental mining of activity patterns from the surveillance video stream is presented in this paper. The framework consists of a learning block in which Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using the Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection. PMID:26978823

  13. Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler*

    OpenAIRE

    Jin, Ick Hoon; Yuan, Ying; Liang, Faming

    2013-01-01

    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulati...

  14. OVERALL SENSITIVITY ANALYSIS UTILIZING BAYESIAN NETWORK FOR THE QUESTIONNAIRE INVESTIGATION ON SNS

    OpenAIRE

    Tsuyoshi Aburai; Kazuhiro Takeyasu

    2013-01-01

    Social Networking Service (SNS) is prevailing rapidly in Japan in recent years. The most popular ones are Facebook, mixi, and Twitter, which are utilized in various fields of life together with the convenient tool such as smart-phone. In this work, a questionnaire investigation is carried out in order to clarify the current usage condition, issues and desired functions. More than 1,000 samples are gathered. Bayesian network is utilized for this analysis. Sensitivity analysis is carried out by...

  15. UNSUPERVISED TRANSIENT LIGHT CURVE ANALYSIS VIA HIERARCHICAL BAYESIAN INFERENCE

    Energy Technology Data Exchange (ETDEWEB)

    Sanders, N. E.; Soderberg, A. M. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Betancourt, M., E-mail: nsanders@cfa.harvard.edu [Department of Statistics, University of Warwick, Coventry CV4 7AL (United Kingdom)

    2015-02-10

    Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometric observations of 76 SNe, corresponding to a joint posterior distribution with 9176 parameters under our model. Our hierarchical model fits provide improved constraints on light curve parameters relevant to the physical properties of their progenitor stars relative to modeling individual light curves alone. Moreover, we directly evaluate the probability for occurrence rates of unseen light curve characteristics from the model hyperparameters, addressing observational biases in survey methodology. We view this modeling framework as an unsupervised machine learning technique with the ability to maximize scientific returns from data to be collected by future wide field transient searches like LSST.

  16. Clustering analysis of seismicity and aftershock identification.

    Science.gov (United States)

    Zaliapin, Ilya; Gabrielov, Andrei; Keilis-Borok, Vladimir; Wong, Henry

    2008-07-01

    We introduce a statistical methodology for clustering analysis of seismicity in the time-space-energy domain and use it to establish the existence of two statistically distinct populations of earthquakes: clustered and nonclustered. This result can be used, in particular, for nonparametric aftershock identification. The proposed approach expands the analysis of Baiesi and Paczuski [Phys. Rev. E 69, 066106 (2004)10.1103/PhysRevE.69.066106] based on the space-time-magnitude nearest-neighbor distance eta between earthquakes. We show that for a homogeneous Poisson marked point field with exponential marks, the distance eta has the Weibull distribution, which bridges our results with classical correlation analysis for point fields. The joint 2D distribution of spatial and temporal components of eta is used to identify the clustered part of a point field. The proposed technique is applied to several seismicity models and to the observed seismicity of southern California.

  17. Learning Bayesian Networks from Correlated Data

    Science.gov (United States)

    Bae, Harold; Monti, Stefano; Montano, Monty; Steinberg, Martin H.; Perls, Thomas T.; Sebastiani, Paola

    2016-05-01

    Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.

  18. Application of Bayesian Network Learning Methods to Land Resource Evaluation

    Institute of Scientific and Technical Information of China (English)

    HUANG Jiejun; HE Xiaorong; WAN Youchuan

    2006-01-01

    Bayesian network has a powerful ability for reasoning and semantic representation, which combined with qualitative analysis and quantitative analysis, with prior knowledge and observed data, and provides an effective way to deal with prediction, classification and clustering. Firstly, this paper presented an overview of Bayesian network and its characteristics, and discussed how to learn a Bayesian network structure from given data, and then constructed a Bayesian network model for land resource evaluation with expert knowledge and the dataset. The experimental results based on the test dataset are that evaluation accuracy is 87.5%, and Kappa index is 0.826. All these prove the method is feasible and efficient, and indicate that Bayesian network is a promising approach for land resource evaluation.

  19. Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures.

    Science.gov (United States)

    Orbanz, Peter; Roy, Daniel M

    2015-02-01

    The natural habitat of most Bayesian methods is data represented by exchangeable sequences of observations, for which de Finetti's theorem provides the theoretical foundation. Dirichlet process clustering, Gaussian process regression, and many other parametric and nonparametric Bayesian models fall within the remit of this framework; many problems arising in modern data analysis do not. This article provides an introduction to Bayesian models of graphs, matrices, and other data that can be modeled by random structures. We describe results in probability theory that generalize de Finetti's theorem to such data and discuss their relevance to nonparametric Bayesian modeling. With the basic ideas in place, we survey example models available in the literature; applications of such models include collaborative filtering, link prediction, and graph and network analysis. We also highlight connections to recent developments in graph theory and probability, and sketch the more general mathematical foundation of Bayesian methods for other types of data beyond sequences and arrays. PMID:26353253

  20. Application of Bayesian graphs to SN Ia data analysis and compression

    Science.gov (United States)

    Ma, Cong; Corasaniti, Pier-Stefano; Bassett, Bruce A.

    2016-08-01

    Bayesian graphical models are an efficient tool for modelling complex data and derive self-consistent expressions of the posterior distribution of model parameters. We apply Bayesian graphs to perform statistical analyses of Type Ia supernova (SN Ia) luminosity distance measurements from the Joint Light-curve Analysis (JLA) dataset (Betoule et al. 2014). In contrast to the χ2 approach used in previous studies, the Bayesian inference allows us to fully account for the standard-candle parameter dependence of the data covariance matrix. Comparing with χ2 analysis results we find a systematic offset of the marginal model parameter bounds. We demonstrate that the bias is statistically significant in the case of the SN Ia standardization parameters with a maximal 6σ shift of the SN light-curve colour correction. In addition, we find that the evidence for a host galaxy correction is now only 2.4σ. Systematic offsets on the cosmological parameters remain small, but may increase by combining constraints from complementary cosmological probes. The bias of the χ2 analysis is due to neglecting the parameter-dependent log-determinant of the data covariance, which gives more statistical weight to larger values of the standardization parameters. We find a similar effect on compressed distance modulus data. To this end we implement a fully consistent compression method of the JLA dataset that uses a Gaussian approximation of the posterior distribution for fast generation of compressed data. Overall, the results of our analysis emphasize the need for a fully consistent Bayesian statistical approach in the analysis of future large SN Ia datasets.

  1. Bayesian Statistical Analysis Applied to NAA Data for Neutron Flux Spectrum Determination

    Science.gov (United States)

    Chiesa, D.; Previtali, E.; Sisti, M.

    2014-04-01

    In this paper, we present a statistical method, based on Bayesian statistics, to evaluate the neutron flux spectrum from the activation data of different isotopes. The experimental data were acquired during a neutron activation analysis (NAA) experiment [A. Borio di Tigliole et al., Absolute flux measurement by NAA at the Pavia University TRIGA Mark II reactor facilities, ENC 2012 - Transactions Research Reactors, ISBN 978-92-95064-14-0, 22 (2012)] performed at the TRIGA Mark II reactor of Pavia University (Italy). In order to evaluate the neutron flux spectrum, subdivided in energy groups, we must solve a system of linear equations containing the grouped cross sections and the activation rate data. We solve this problem with Bayesian statistical analysis, including the uncertainties of the coefficients and the a priori information about the neutron flux. A program for the analysis of Bayesian hierarchical models, based on Markov Chain Monte Carlo (MCMC) simulations, is used to define the problem statistical model and solve it. The energy group fluxes and their uncertainties are then determined with great accuracy and the correlations between the groups are analyzed. Finally, the dependence of the results on the prior distribution choice and on the group cross section data is investigated to confirm the reliability of the analysis.

  2. A pragmatic Bayesian perspective on correlation analysis: The exoplanetary gravity - stellar activity case

    CERN Document Server

    Figueira, P; Adibekyan, V Zh; Oshagh, M; Santos, N C

    2016-01-01

    We apply the Bayesian framework to assess the presence of a correlation between two quantities. To do so, we estimate the probability distribution of the parameter of interest, $\\rho$, characterizing the strength of the correlation. We provide an implementation of these ideas and concepts using python programming language and the pyMC module in a very short ($\\sim$130 lines of code, heavily commented) and user-friendly program. We used this tool to assess the presence and properties of the correlation between planetary surface gravity and stellar activity level as measured by the log($R'_{\\mathrm{HK}}$) indicator. The results of the Bayesian analysis are qualitatively similar to those obtained via p-value analysis, and support the presence of a correlation in the data. The results are more robust in their derivation and more informative, revealing interesting features such as asymmetric posterior distributions or markedly different credible intervals, and allowing for a deeper exploration. We encourage the re...

  3. Hierarchical genetic clusters for phenotypic analysis

    Directory of Open Access Journals (Sweden)

    Luiza Barbosa da Matta

    2015-10-01

    Full Text Available Methods to obtain phenotypic information were evaluated to help breeders choosing the best methodology for analysis of genetic diversity in backcross populations. Phenotypes were simulated for 13 characteristics generated in 10 populations with 100 individuals each. Genotypic information was generated from 100 loci of which 20 were taken at random to determine the characteristics expressing two alleles. Dissimilarity measures were calculated, and genetic diversity was analyzed through hierarchical clustering and graphic projection of the distances. A backcross was performed from the two most divergent populations. A set of characteristics with variable heritability was taken into account. The environmental effect was simulated assuming . For hierarchical clusters, the following methods were used: Gower Method, average linkage within the cluster, average linkage among clusters, the furthest neighbor method, the nearest neighbor method, Ward’s method, and the median method. The environmental effect and heritability of the analyzed variables had an influence on the pattern of hierarchical clustering populations according to the backcrossed generations. The nearest neighbor method was the most efficient in reconstructing the system of backcrossing, and it presented the highest cophenetic correlation. The efficiency of the nearest neighbor method was the highest when the analysis involved characteristics of high heritability.

  4. Cluster and constraint analysis in tetrahedron packings.

    Science.gov (United States)

    Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

    2015-04-01

    The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.

  5. Semiadaptive Fault Diagnosis via Variational Bayesian Mixture Factor Analysis with Application to Wastewater Treatment

    OpenAIRE

    Hongjun Xiao; Yiqi Liu; Daoping Huang

    2016-01-01

    Mainly due to the hostile environment in wastewater plants (WWTPs), the reliability of sensors with respect to important qualities is often poor. In this work, we present the design of a semiadaptive fault diagnosis method based on the variational Bayesian mixture factor analysis (VBMFA) to support process monitoring. The proposed method is capable of capturing strong nonlinearity and the significant dynamic feature of WWTPs that seriously limit the application of conventional multivariate st...

  6. A Hybrid Approach for Reliability Analysis Based on Analytic Hierarchy Process and Bayesian Network

    OpenAIRE

    Zubair, Muhammad

    2014-01-01

    By using analytic hierarchy process (AHP) and Bayesian Network (BN) the present research signifies the technical and non-technical issues of nuclear accidents. The study exposed that the technical faults was one major reason of these accidents. Keep an eye on other point of view it becomes clearer that human behavior like dishonesty, insufficient training, and selfishness are also play a key role to cause these accidents. In this study, a hybrid approach for reliability analysis based on AHP ...

  7. Multiple SNP-sets Analysis for Genome-wide Association Studies through Bayesian Latent Variable Selection

    OpenAIRE

    Lu, Zhaohua; Zhu, Hongtu; Knickmeyer, Rebecca C.; Sullivan, Patrick F.; Stephanie, Williams N.; Zou, Fei

    2015-01-01

    The power of genome-wide association studies (GWAS) for mapping complex traits with single SNP analysis may be undermined by modest SNP effect sizes, unobserved causal SNPs, correlation among adjacent SNPs, and SNP-SNP interactions. Alternative approaches for testing the association between a single SNP-set and individual phenotypes have been shown to be promising for improving the power of GWAS. We propose a Bayesian latent variable selection (BLVS) method to simultaneously model the joint a...

  8. conting : an R package for Bayesian analysis of complete and incomplete contingency tables

    OpenAIRE

    Overstall, Antony M.; Ruth King

    2014-01-01

    The aim of this paper is to demonstrate the R package conting for the Bayesian analysis of complete and incomplete contingency tables using hierarchical log-linear models. This package allows a user to identify interactions between categorical factors (via complete contingency tables) and to estimate closed population sizes using capture-recapture studies (via incomplete contingency tables). The models are fitted using Markov chain Monte Carlo methods. In particular, implementations of the ...

  9. Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

    OpenAIRE

    Refik Soyer; M. Murat Tarimcilar

    2008-01-01

    In this paper, we present a modulated Poisson process model to describe and analyze arrival data to a call center. The attractive feature of this model is that it takes into account both covariate and time effects on the call volume intensity, and in so doing, enables us to assess the effectiveness of different advertising strategies along with predicting the arrival patterns. A Bayesian analysis of the model is developed and an extension of the model is presented to describe potential hetero...

  10. Bayesian non-negative factor analysis for reconstructing transcription factor mediated regulatory networks

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2011-10-01

    Full Text Available Abstract Background Transcriptional regulation by transcription factor (TF controls the time and abundance of mRNA transcription. Due to the limitation of current proteomics technologies, large scale measurements of protein level activities of TFs is usually infeasible, making computational reconstruction of transcriptional regulatory network a difficult task. Results We proposed here a novel Bayesian non-negative factor model for TF mediated regulatory networks. Particularly, the non-negative TF activities and sample clustering effect are modeled as the factors from a Dirichlet process mixture of rectified Gaussian distributions, and the sparse regulatory coefficients are modeled as the loadings from a sparse distribution that constrains its sparsity using knowledge from database; meantime, a Gibbs sampling solution was developed to infer the underlying network structure and the unknown TF activities simultaneously. The developed approach has been applied to simulated system and breast cancer gene expression data. Result shows that, the proposed method was able to systematically uncover TF mediated transcriptional regulatory network structure, the regulatory coefficients, the TF protein level activities and the sample clustering effect. The regulation target prediction result is highly coordinated with the prior knowledge, and sample clustering result shows superior performance over previous molecular based clustering method. Conclusions The results demonstrated the validity and effectiveness of the proposed approach in reconstructing transcriptional networks mediated by TFs through simulated systems and real data.

  11. Clustering

    Directory of Open Access Journals (Sweden)

    Jinfei Liu

    2013-04-01

    Full Text Available DBSCAN is a well-known density-based clustering algorithm which offers advantages for finding clusters of arbitrary shapes compared to partitioning and hierarchical clustering methods. However, there are few papers studying the DBSCAN algorithm under the privacy preserving distributed data mining model, in which the data is distributed between two or more parties, and the parties cooperate to obtain the clustering results without revealing the data at the individual parties. In this paper, we address the problem of two-party privacy preserving DBSCAN clustering. We first propose two protocols for privacy preserving DBSCAN clustering over horizontally and vertically partitioned data respectively and then extend them to arbitrarily partitioned data. We also provide performance analysis and privacy proof of our solution..

  12. Bayesian Propensity Score Analysis: Simulation and Case Study

    Science.gov (United States)

    Kaplan, David; Chen, Cassie J. S.

    2011-01-01

    Propensity score analysis (PSA) has been used in a variety of settings, such as education, epidemiology, and sociology. Most typically, propensity score analysis has been implemented within the conventional frequentist perspective of statistics. This perspective, as is well known, does not account for uncertainty in either the parameters of the…

  13. A Bayesian analysis of rare B decays with advanced Monte Carlo methods

    Energy Technology Data Exchange (ETDEWEB)

    Beaujean, Frederik

    2012-11-12

    Searching for new physics in rare B meson decays governed by b {yields} s transitions, we perform a model-independent global fit of the short-distance couplings C{sub 7}, C{sub 9}, and C{sub 10} of the {Delta}B=1 effective field theory. We assume the standard-model set of b {yields} s{gamma} and b {yields} sl{sup +}l{sup -} operators with real-valued C{sub i}. A total of 59 measurements by the experiments BaBar, Belle, CDF, CLEO, and LHCb of observables in B{yields}K{sup *}{gamma}, B{yields}K{sup (*)}l{sup +}l{sup -}, and B{sub s}{yields}{mu}{sup +}{mu}{sup -} decays are used in the fit. Our analysis is the first of its kind to harness the full power of the Bayesian approach to probability theory. All main sources of theory uncertainty explicitly enter the fit in the form of nuisance parameters. We make optimal use of the experimental information to simultaneously constrain theWilson coefficients as well as hadronic form factors - the dominant theory uncertainty. Generating samples from the posterior probability distribution to compute marginal distributions and predict observables by uncertainty propagation is a formidable numerical challenge for two reasons. First, the posterior has multiple well separated maxima and degeneracies. Second, the computation of the theory predictions is very time consuming. A single posterior evaluation requires O(1s), and a few million evaluations are needed. Population Monte Carlo (PMC) provides a solution to both issues; a mixture density is iteratively adapted to the posterior, and samples are drawn in a massively parallel way using importance sampling. The major shortcoming of PMC is the need for cogent knowledge of the posterior at the initial stage. In an effort towards a general black-box Monte Carlo sampling algorithm, we present a new method to extract the necessary information in a reliable and automatic manner from Markov chains with the help of hierarchical clustering. Exploiting the latest 2012 measurements, the fit

  14. A Bayesian analysis of rare B decays with advanced Monte Carlo methods

    International Nuclear Information System (INIS)

    Searching for new physics in rare B meson decays governed by b → s transitions, we perform a model-independent global fit of the short-distance couplings C7, C9, and C10 of the ΔB=1 effective field theory. We assume the standard-model set of b → sγ and b → sl+l- operators with real-valued Ci. A total of 59 measurements by the experiments BaBar, Belle, CDF, CLEO, and LHCb of observables in B→K*γ, B→K(*)l+l-, and Bs→μ+μ- decays are used in the fit. Our analysis is the first of its kind to harness the full power of the Bayesian approach to probability theory. All main sources of theory uncertainty explicitly enter the fit in the form of nuisance parameters. We make optimal use of the experimental information to simultaneously constrain theWilson coefficients as well as hadronic form factors - the dominant theory uncertainty. Generating samples from the posterior probability distribution to compute marginal distributions and predict observables by uncertainty propagation is a formidable numerical challenge for two reasons. First, the posterior has multiple well separated maxima and degeneracies. Second, the computation of the theory predictions is very time consuming. A single posterior evaluation requires O(1s), and a few million evaluations are needed. Population Monte Carlo (PMC) provides a solution to both issues; a mixture density is iteratively adapted to the posterior, and samples are drawn in a massively parallel way using importance sampling. The major shortcoming of PMC is the need for cogent knowledge of the posterior at the initial stage. In an effort towards a general black-box Monte Carlo sampling algorithm, we present a new method to extract the necessary information in a reliable and automatic manner from Markov chains with the help of hierarchical clustering. Exploiting the latest 2012 measurements, the fit reveals a flipped-sign solution in addition to a standard-model-like solution for the couplings Ci. The two solutions are related

  15. Dating ancient Chinese celadon porcelain by neutron activation analysis and bayesian classification

    International Nuclear Information System (INIS)

    Dating ancient Chinese porcelain is one of the most important and difficult problems in porcelain archaeological field. Eighteen elements in bodies of ancient celadon porcelains fired in Southern Song to Yuan period (AD 1127-1368) and Ming dynasty (AD 1368-1644), including La, Sm, U, Ce, etc., were determined by neutron activation analysis (NAA). After the outliers of experimental data were excluded and multivariate normal distribution was tested, and Bayesian classification was used for dating of 165 ancient celadon porcelain samples. The results show that 98.2% of total ancient celadon porcelain samples are classified correctly. It means that NAA and Bayesian classification are very useful for dating ancient porcelain. (authors)

  16. Bayesian analysis of longitudinal Johne's disease diagnostic data without a gold standard test

    DEFF Research Database (Denmark)

    Wang, C.; Turnbull, B.W.; Nielsen, Søren Saxmose;

    2011-01-01

    A Bayesian methodology was developed based on a latent change-point model to evaluate the performance of milk ELISA and fecal culture tests for longitudinal Johne's disease diagnostic data. The situation of no perfect reference test was considered; that is, no “gold standard.” A change......-point process with a Weibull survival hazard function was used to model the progression of the hidden disease status. The model adjusted for the fixed effects of covariate variables and random effects of subject on the diagnostic testing procedure. Markov chain Monte Carlo methods were used to compute....... An application is presented to an analysis of ELISA and fecal culture test outcomes in the diagnostic testing of paratuberculosis (Johne's disease) for a Danish longitudinal study from January 2000 to March 2003. The posterior probability criterion based on the Bayesian model with 4 repeated observations has...

  17. Risks Analysis of Logistics Financial Business Based on Evidential Bayesian Network

    Directory of Open Access Journals (Sweden)

    Ying Yan

    2013-01-01

    Full Text Available Risks in logistics financial business are identified and classified. Making the failure of the business as the root node, a Bayesian network is constructed to measure the risk levels in the business. Three importance indexes are calculated to find the most important risks in the business. And more, considering the epistemic uncertainties in the risks, evidence theory associate with Bayesian network is used as an evidential network in the risk analysis of logistics finance. To find how much uncertainty in root node is produced by each risk, a new index, epistemic importance, is defined. Numerical examples show that the proposed methods could provide a lot of useful information. With the information, effective approaches could be found to control and avoid these sensitive risks, thus keep logistics financial business working more reliable. The proposed method also gives a quantitative measure of risk levels in logistics financial business, which provides guidance for the selection of financing solutions.

  18. MorePower 6.0 for ANOVA with relational confidence intervals and Bayesian analysis.

    Science.gov (United States)

    Campbell, Jamie I D; Thompson, Valerie A

    2012-12-01

    MorePower 6.0 is a flexible freeware statistical calculator that computes sample size, effect size, and power statistics for factorial ANOVA designs. It also calculates relational confidence intervals for ANOVA effects based on formulas from Jarmasz and Hollands (Canadian Journal of Experimental Psychology 63:124-138, 2009), as well as Bayesian posterior probabilities for the null and alternative hypotheses based on formulas in Masson (Behavior Research Methods 43:679-690, 2011). The program is unique in affording direct comparison of these three approaches to the interpretation of ANOVA tests. Its high numerical precision and ability to work with complex ANOVA designs could facilitate researchers' attention to issues of statistical power, Bayesian analysis, and the use of confidence intervals for data interpretation. MorePower 6.0 is available at https://wiki.usask.ca/pages/viewpageattachments.action?pageId=420413544 .

  19. Application of Bayesian graphs to SN Ia data analysis and compression

    CERN Document Server

    Ma, Con; Bassett, Bruce A

    2016-01-01

    Bayesian graphical models are an efficient tool for modelling complex data and derive self-consistent expressions of the posterior distribution of model parameters. We apply Bayesian graphs to perform statistical analyses of Type Ia supernova (SN Ia) luminosity distance measurements from the Joint Light-curve Analysis (JLA) dataset (Betoule et al. 2014, arXiv:1401.4064). In contrast to the $\\chi^2$ approach used in previous studies, the Bayesian inference allows us to fully account for the standard-candle parameter dependence of the data covariance matrix. Comparing with $\\chi^2$ analysis results we find a systematic offset of the marginal model parameter bounds. We demonstrate that the bias is statistically significant in the case of the SN Ia standardization parameters with a maximal $6\\sigma$ shift of the SN light-curve colour correction. In addition, we find that the evidence for a host galaxy correction is now only $2.4\\sigma$. Systematic offsets on the cosmological parameters remain small, but may incre...

  20. bspmma: An R Package for Bayesian Semiparametric Models for Meta-Analysis

    Directory of Open Access Journals (Sweden)

    Deborah Burr

    2012-07-01

    Full Text Available We introduce an R package, bspmma, which implements a Dirichlet-based random effects model specific to meta-analysis. In meta-analysis, when combining effect estimates from several heterogeneous studies, it is common to use a random-effects model. The usual frequentist or Bayesian models specify a normal distribution for the true effects. However, in many situations, the effect distribution is not normal, e.g., it can have thick tails, be skewed, or be multi-modal. A Bayesian nonparametric model based on mixtures of Dirichlet process priors has been proposed in the literature, for the purpose of accommodating the non-normality. We review this model and then describe a competitor, a semiparametric version which has the feature that it allows for a well-defined centrality parameter convenient for determining whether the overall effect is significant. This second Bayesian model is based on a different version of the Dirichlet process prior, and we call it the "conditional Dirichlet model". The package contains functions to carry out analyses based on either the ordinary or the conditional Dirichlet model, functions for calculating certain Bayes factors that provide a check on the appropriateness of the conditional Dirichlet model, and functions that enable an empirical Bayes selection of the precision parameter of the Dirichlet process. We illustrate the use of the package on two examples, and give an interpretation of the results in these two different scenarios.

  1. cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Adelino R. Ferreira da Silva

    2011-10-01

    Full Text Available Graphic processing units (GPUs are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI, the volume of the data to be processed, and the type of statistical analysis to perform call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution for the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here has been able to reduce significantly the run-time processing of Markov chain Monte Carlo (MCMC simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.

  2. Time-varying nonstationary multivariate risk analysis using a dynamic Bayesian copula

    Science.gov (United States)

    Sarhadi, Ali; Burn, Donald H.; Concepción Ausín, María.; Wiper, Michael P.

    2016-03-01

    A time-varying risk analysis is proposed for an adaptive design framework in nonstationary conditions arising from climate change. A Bayesian, dynamic conditional copula is developed for modeling the time-varying dependence structure between mixed continuous and discrete multiattributes of multidimensional hydrometeorological phenomena. Joint Bayesian inference is carried out to fit the marginals and copula in an illustrative example using an adaptive, Gibbs Markov Chain Monte Carlo (MCMC) sampler. Posterior mean estimates and credible intervals are provided for the model parameters and the Deviance Information Criterion (DIC) is used to select the model that best captures different forms of nonstationarity over time. This study also introduces a fully Bayesian, time-varying joint return period for multivariate time-dependent risk analysis in nonstationary environments. The results demonstrate that the nature and the risk of extreme-climate multidimensional processes are changed over time under the impact of climate change, and accordingly the long-term decision making strategies should be updated based on the anomalies of the nonstationary environment.

  3. caBIG™ VISDA: Modeling, visualization, and discovery for cluster analysis of genomic data

    Directory of Open Access Journals (Sweden)

    Xuan Jianhua

    2008-09-01

    Full Text Available Abstract Background The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables. Results In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample

  4. SOMBI: Bayesian identification of parameter relations in unstructured cosmological data

    CERN Document Server

    Frank, Philipp; Enßlin, Torsten A

    2016-01-01

    This work describes the implementation and application of a correlation determination method based on Self Organizing Maps and Bayesian Inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the Self Organizing Map neural network algorithm. Parameter relations are then revealed by means of a Bayesian inference within respective identified data clusters. Specifically such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for respective polynomial coefficients. To decide which polynomial order suffices to describe correlation structures in data, we include a method for model selection, the Bayesian Information Criterion, to the analysis. The performance of the SOMBI algorithm is tested with mock data. As illustration we also provide ...

  5. The GRB Golentskii Correlation as a Cosmological Probe via Bayesian Analysis

    Science.gov (United States)

    Burgess, Michael

    2016-07-01

    Gamma-ray bursts (GRBs) are characterized by a strong correlation between the instantaneous luminosity and the spectral peak energy within a burst. This correlation, which is known as the hardness-intensity correlation or the Golenetskii correlation, not only holds important clues to the physics of GRBs but is thought to have the potential to determine redshifts of bursts. In this paper, I use a hierarchical Bayesian model to study the universality of the rest-frame Golenetskii correlation and in particular I assess its use as a redshift estimator for GRBs. I find that, using a power-law prescription of the correlation, the power-law indices cluster near a common value, but have a broader variance than previously reported ( 1-2). Furthermore, I find evidence that there is spread in intrinsic rest-frame correlation normalizations for the GRBs in our sample ( 10 ^{51}-10 ^{53} erg/s). This points towards variable physical settings of the emission (magnetic field strength, number of emitting electrons, photospheric radius, viewing angle, etc.). Subsequently, these results eliminate the Golenetskii correlation as a useful tool for redshift determination and hence a cosmological probe. Nevertheless, the Bayesian method introduced in this paper allows for a better determination of the rest frame properties of the correlation, which in turn allows for more stringent limitations for physical models of the emission to be set.

  6. Bayesian analysis of longitudinal Johne's disease diagnostic data without a gold standard test.

    Science.gov (United States)

    Wang, C; Turnbull, B W; Nielsen, S S; Gröhn, Y T

    2011-05-01

    A Bayesian methodology was developed based on a latent change-point model to evaluate the performance of milk ELISA and fecal culture tests for longitudinal Johne's disease diagnostic data. The situation of no perfect reference test was considered; that is, no "gold standard." A change-point process with a Weibull survival hazard function was used to model the progression of the hidden disease status. The model adjusted for the fixed effects of covariate variables and random effects of subject on the diagnostic testing procedure. Markov chain Monte Carlo methods were used to compute the posterior estimates of the model parameters that provide the basis for inference concerning the accuracy of the diagnostic procedure. Based on the Bayesian approach, the posterior probability distribution of the change-point onset time can be obtained and used as a criterion for infection diagnosis. An application is presented to an analysis of ELISA and fecal culture test outcomes in the diagnostic testing of paratuberculosis (Johne's disease) for a Danish longitudinal study from January 2000 to March 2003. The posterior probability criterion based on the Bayesian model with 4 repeated observations has an area under the receiver operating characteristic curve (AUC) of 0.984, and is superior to the raw ELISA (AUC=0.911) and fecal culture (sensitivity=0.358, specificity=0.980) tests for Johne's disease diagnosis. PMID:21524521

  7. Speculation and volatility spillover in the crude oil and agricultural commodity markets: A Bayesian analysis

    International Nuclear Information System (INIS)

    This paper assesses factors that potentially influence the volatility of crude oil prices and the possible linkage between this volatility and agricultural commodity markets. Stochastic volatility models are applied to weekly crude oil, corn, and wheat futures prices from November 1998 to January 2009. Model parameters are estimated using Bayesian Markov Chain Monte Carlo methods. Speculation, scalping, and petroleum inventories are found to be important in explaining the volatility of crude oil prices. Several properties of crude oil price dynamics are established, including mean-reversion, an asymmetry between returns and volatility, volatility clustering, and infrequent compound jumps. We find evidence of volatility spillover among crude oil, corn, and wheat markets after the fall of 2006. This can be largely explained by tightened interdependence between crude oil and these commodity markets induced by ethanol production.

  8. Speculation and volatility spillover in the crude oil and agricultural commodity markets: A Bayesian analysis

    Energy Technology Data Exchange (ETDEWEB)

    Du Xiaodong, E-mail: xdu23@wisc.ed [Department of Agricultural and Applied Economics, University of Wisconsin-Madison, WI (United States); Yu, Cindy L., E-mail: cindyyu@iastate.ed [Department of Statistics, Iowa State University, IA (United States); Hayes, Dermot J., E-mail: dhayes@iastate.ed [Department of Economics and Department of Finance, Iowa State University, IA (United States)

    2011-05-15

    This paper assesses factors that potentially influence the volatility of crude oil prices and the possible linkage between this volatility and agricultural commodity markets. Stochastic volatility models are applied to weekly crude oil, corn, and wheat futures prices from November 1998 to January 2009. Model parameters are estimated using Bayesian Markov Chain Monte Carlo methods. Speculation, scalping, and petroleum inventories are found to be important in explaining the volatility of crude oil prices. Several properties of crude oil price dynamics are established, including mean-reversion, an asymmetry between returns and volatility, volatility clustering, and infrequent compound jumps. We find evidence of volatility spillover among crude oil, corn, and wheat markets after the fall of 2006. This can be largely explained by tightened interdependence between crude oil and these commodity markets induced by ethanol production.

  9. An intake prior for the Bayesian analysis of plutonium and uranium exposures in an epidemiology study.

    Science.gov (United States)

    Puncher, M; Birchall, A; Bull, R K

    2014-12-01

    In Bayesian inference, the initial knowledge regarding the value of a parameter, before additional data are considered, is represented as a prior probability distribution. This paper describes the derivation of a prior distribution of intake that was used for the Bayesian analysis of plutonium and uranium worker doses in a recent epidemiology study. The chosen distribution is log-normal with a geometric standard deviation of 6 and a median value that is derived for each worker based on the duration of the work history and the number of reported acute intakes. The median value is a function of the work history and a constant related to activity in air concentration, M, which is derived separately for uranium and plutonium. The value of M is based primarily on measurements of plutonium and uranium in air derived from historical personal air sampler (PAS) data. However, there is significant uncertainty on the value of M that results from paucity of PAS data and from extrapolating these measurements to actual intakes. This paper compares posterior and prior distributions of intake and investigates the sensitivity of the Bayesian analyses to the assumed value of M. It is found that varying M by a factor of 10 results in a much smaller factor of 2 variation in mean intake and lung dose for both plutonium and uranium. It is concluded that if a log-normal distribution is considered to adequately represent worker intakes, then the Bayesian posterior distribution of dose is relatively insensitive to the value assumed of M. PMID:24191121

  10. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in classification of stocks of finished products. Cluster analysis creates groups (clusters of finished products according to similarity in demand i.e. customer requirements for each product. Manner stocks sorting of finished products by clusters is described a practical example. The resultants clusters are incorporated into the draft layout of the distribution warehouse.

  11. Intelligent Pattern Mining and Data Clustering for Pattern Cluster Analysis using Cancer Data

    Directory of Open Access Journals (Sweden)

    G.Raj Kumar

    2010-12-01

    Full Text Available Data mining techniques are used for the knowledge discovery process under the large data set environment. Clustering techniques are used to group up the relevant data sets. Hierarchical and partitioned clustering techniques are used for the clustering process. The clustering process is the complex task with high process time. The pattern extraction scheme is applied to find frequent item sets. Association rule mining techniques are applied to carry out the pattern extraction process. The pattern extraction scheme and the clustering scheme are integrated in the simultaneous pattern extraction and clustering scheme. The clustering process is improved with pattern comparison and transaction transfer process. The simultaneous clustering scheme is implemented to analyze the cancer patient diagnosis reports. The system is implemented as four major modules data set management, pattern extraction, clustering process and performance analysis. The data sets are preprocessed before the pattern extraction process. The patterns are used in the simultaneous clustering process. The performance analysis is done with the comparison of the data clustering scheme and pattern clustering schemes. The process time and memory factors are used in the performance analysis process. The cluster accuracy is represented using the fitness values. The system is enhanced with the K-means clustering algorithm.

  12. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    Science.gov (United States)

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks. PMID:26357321

  13. Microcanonical thermostatistics analysis without histograms: cumulative distribution and Bayesian approaches

    CERN Document Server

    Alves, Nelson A; Rizzi, Leandro G

    2015-01-01

    Microcanonical thermostatistics analysis has become an important tool to reveal essential aspects of phase transitions in complex systems. An efficient way to estimate the microcanonical inverse temperature $\\beta(E)$ and the microcanonical entropy $S(E)$ is achieved with the statistical temperature weighted histogram analysis method (ST-WHAM). The strength of this method lies on its flexibility, as it can be used to analyse data produced by algorithms with generalised sampling weights. However, for any sampling weight, ST-WHAM requires the calculation of derivatives of energy histograms $H(E)$, which leads to non-trivial and tedious binning tasks for models with continuous energy spectrum such as those for biomolecular and colloidal systems. Here, we discuss two alternative methods that avoid the need for such energy binning to obtain continuous estimates for $H(E)$ in order to evaluate $\\beta(E)$ by using ST-WHAM: (i) a series expansion to estimate probability densities from the empirical cumulative distrib...

  14. Mapping Cigarettes Similarities using Cluster Analysis Methods

    Directory of Open Access Journals (Sweden)

    Lorentz Jäntschi

    2007-09-01

    Full Text Available The aim of the research was to investigate the relationship and/or occurrences in and between chemical composition information (tar, nicotine, carbon monoxide, market information (brand, manufacturer, price, and public health information (class, health warning as well as clustering of a sample of cigarette data. A number of thirty cigarette brands have been analyzed. Six categorical (cigarette brand, manufacturer, health warnings, class and four continuous (tar, nicotine, carbon monoxide concentrations and package price variables were collected for investigation of chemical composition, market information and public health information. Multiple linear regression and two clusterization techniques have been applied. The study revealed interesting remarks. The carbon monoxide concentration proved to be linked with tar and nicotine concentration. The applied clusterization methods identified groups of cigarette brands that shown similar characteristics. The tar and carbon monoxide concentrations were the main criteria used in clusterization. An analysis of a largest sample could reveal more relevant and useful information regarding the similarities between cigarette brands.

  15. Bayesian Analysis and Segmentation of Multichannel Image Sequences

    Science.gov (United States)

    Chang, Michael Ming Hsin

    This thesis is concerned with the segmentation and analysis of multichannel image sequence data. In particular, we use maximum a posteriori probability (MAP) criterion and Gibbs random fields (GRF) to formulate the problems. We start by reviewing the significance of MAP estimation with GRF priors and study the feasibility of various optimization methods for implementing the MAP estimator. We proceed to investigate three areas where image data and parameter estimates are present in multichannels, multiframes, and interrelated in complicated manners. These areas of study include color image segmentation, multislice MR image segmentation, and optical flow estimation and segmentation in multiframe temporal sequences. Besides developing novel algorithms in each of these areas, we demonstrate how to exploit the potential of MAP estimation and GRFs, and we propose practical and efficient implementations. Illustrative examples and relevant experimental results are included.

  16. Brane inflation and the WMAP data: a Bayesian analysis

    CERN Document Server

    Lorenz, Larissa; Ringeval, Christophe

    2007-01-01

    The Wilkinson Microwave Anisotropy Probe (WMAP) constraints on string inspired ``brane inflation'' are investigated. Here, the inflaton field is interpreted as the distance between two branes placed in a flux-enriched background geometry and has a Dirac-Born-Infeld (DBI) kinetic term. Our method relies on an exact numerical integration of the inflationary power spectra coupled to a Markov-Chain Monte-Carlo exploration of the parameter space. This analysis is valid for any perturbative value of the string coupling constant and of the string length, and includes a phenomenological modelling of the reheating era to describe the post-inflationary evolution. It is found that the data favour a scenario where inflation stops by violation of the slow-roll conditions well before brane annihilation, rather than by tachyonic instability. Concerning the background geometry, it is established that log(v) > -10 at 95% confidence level (CL), where "v" is the dimensionless ratio of the five-dimensional sub-manifold at the ba...

  17. OVERALL SENSITIVITY ANALYSIS UTILIZING BAYESIAN NETWORK FOR THE QUESTIONNAIRE INVESTIGATION ON SNS

    Directory of Open Access Journals (Sweden)

    Tsuyoshi Aburai

    2013-11-01

    Full Text Available Social Networking Service (SNS is prevailing rapidly in Japan in recent years. The most popular ones are Facebook, mixi, and Twitter, which are utilized in various fields of life together with the convenient tool such as smart-phone. In this work, a questionnaire investigation is carried out in order to clarify the current usage condition, issues and desired functions. More than 1,000 samples are gathered. Bayesian network is utilized for this analysis. Sensitivity analysis is carried out by setting evidence to all items. This enables overall analysis for each item. We analyzed them by sensitivity analysis and some useful results were obtained. We have presented the paper concerning this. But the volume becomes too large, therefore we have split them and this paper shows the latter half of the investigation result by setting evidence to Bayesian Network parameters. Differences in usage objectives and SNS sites are made clear by the attributes and preference of SNS users. They can be utilized effectively for marketing by clarifying the target customer through the sensitivity analysis.

  18. Bayesian analysis of the genetic structure of a Brazilian popcorn germplasm using data from simple sequence repeats (SSR)

    OpenAIRE

    Javier Saavedra; Tereza Aparecida Silva; Freddy Mora; Carlos Alberto Scapim

    2013-01-01

    Several studies have confirmed that popcorn (Zea mays L. var. everta) has a narrow genetic basis, which affects the quality of breeding programs. In this study, we present a genetic characterization of 420 individuals representing 28 popcorn populations from Brazilian germplasm banks. All individuals were genotyped using 11 microsatellite markers from the Maize Genetics and Genomics Database. A Bayesian clustering approach via Monte Carlo Markov chains was performed to examine the genetic dif...

  19. Chaotic map clustering algorithm for EEG analysis

    Science.gov (United States)

    Bellotti, R.; De Carlo, F.; Stramaglia, S.

    2004-03-01

    The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.

  20. BayesLCA: An R Package for Bayesian Latent Class Analysis

    Directory of Open Access Journals (Sweden)

    Arthur White

    2014-11-01

    Full Text Available The BayesLCA package for R provides tools for performing latent class analysis within a Bayesian setting. Three methods for fitting the model are provided, incorporating an expectation-maximization algorithm, Gibbs sampling and a variational Bayes approximation. The article briefly outlines the methodology behind each of these techniques and discusses some of the technical difficulties associated with them. Methods to remedy these problems are also described. Visualization methods for each of these techniques are included, as well as criteria to aid model selection.

  1. Crystalline nucleation in undercooled liquids: A Bayesian data-analysis approach for a nonhomogeneous Poisson process

    Science.gov (United States)

    Filipponi, A.; Di Cicco, A.; Principi, E.

    2012-12-01

    A Bayesian data-analysis approach to data sets of maximum undercooling temperatures recorded in repeated melting-cooling cycles of high-purity samples is proposed. The crystallization phenomenon is described in terms of a nonhomogeneous Poisson process driven by a temperature-dependent sample nucleation rate J(T). The method was extensively tested by computer simulations and applied to real data for undercooled liquid Ge. It proved to be particularly useful in the case of scarce data sets where the usage of binned data would degrade the available experimental information.

  2. Data Clustering Analysis Based on Wavelet Feature Extraction

    Institute of Scientific and Technical Information of China (English)

    QIANYuntao; TANGYuanyan

    2003-01-01

    A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and cluster growing algorithm. Wavelet transform can provide rich and diversified information for representing the global and local inherent structures of dataset. therefore, it is a very powerful tool for clustering feature extraction. As an unsupervised classification, the target of clustering analysis is dependent on the specific clustering criteria. Several criteria that should be con-sidered for general-purpose clustering algorithm are pro-posed. And the cluster growing algorithm is also con-structed to connect clustering criteria with wavelet fea-tures. Compared with other popular clustering methods,our clustering approach provides multi-resolution cluster-ing results,needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noises and outliers. As this wavelet-based clustering method isaimed at solving two-dimensional data clustering prob-lem, for high-dimensional datasets, self-organizing mapand U-matrlx method are applied to transform them intotwo-dimensional Euclidean space, so that high-dimensional data clustering analysis,Results on some sim-ulated data and standard test data are reported to illus-trate the power of our method.

  3. Fuzzy Bayesian Network-Bow-Tie Analysis of Gas Leakage during Biomass Gasification

    Science.gov (United States)

    Yan, Fang; Xu, Kaili; Yao, Xiwen; Li, Yang

    2016-01-01

    Biomass gasification technology has been rapidly developed recently. But fire and poisoning accidents caused by gas leakage restrict the development and promotion of biomass gasification. Therefore, probabilistic safety assessment (PSA) is necessary for biomass gasification system. Subsequently, Bayesian network-bow-tie (BN-bow-tie) analysis was proposed by mapping bow-tie analysis into Bayesian network (BN). Causes of gas leakage and the accidents triggered by gas leakage can be obtained by bow-tie analysis, and BN was used to confirm the critical nodes of accidents by introducing corresponding three importance measures. Meanwhile, certain occurrence probability of failure was needed in PSA. In view of the insufficient failure data of biomass gasification, the occurrence probability of failure which cannot be obtained from standard reliability data sources was confirmed by fuzzy methods based on expert judgment. An improved approach considered expert weighting to aggregate fuzzy numbers included triangular and trapezoidal numbers was proposed, and the occurrence probability of failure was obtained. Finally, safety measures were indicated based on the obtained critical nodes. The theoretical occurrence probabilities in one year of gas leakage and the accidents caused by it were reduced to 1/10.3 of the original values by these safety measures. PMID:27463975

  4. Fuzzy Bayesian Network-Bow-Tie Analysis of Gas Leakage during Biomass Gasification.

    Directory of Open Access Journals (Sweden)

    Fang Yan

    Full Text Available Biomass gasification technology has been rapidly developed recently. But fire and poisoning accidents caused by gas leakage restrict the development and promotion of biomass gasification. Therefore, probabilistic safety assessment (PSA is necessary for biomass gasification system. Subsequently, Bayesian network-bow-tie (BN-bow-tie analysis was proposed by mapping bow-tie analysis into Bayesian network (BN. Causes of gas leakage and the accidents triggered by gas leakage can be obtained by bow-tie analysis, and BN was used to confirm the critical nodes of accidents by introducing corresponding three importance measures. Meanwhile, certain occurrence probability of failure was needed in PSA. In view of the insufficient failure data of biomass gasification, the occurrence probability of failure which cannot be obtained from standard reliability data sources was confirmed by fuzzy methods based on expert judgment. An improved approach considered expert weighting to aggregate fuzzy numbers included triangular and trapezoidal numbers was proposed, and the occurrence probability of failure was obtained. Finally, safety measures were indicated based on the obtained critical nodes. The theoretical occurrence probabilities in one year of gas leakage and the accidents caused by it were reduced to 1/10.3 of the original values by these safety measures.

  5. Fuzzy Bayesian Network-Bow-Tie Analysis of Gas Leakage during Biomass Gasification.

    Science.gov (United States)

    Yan, Fang; Xu, Kaili; Yao, Xiwen; Li, Yang

    2016-01-01

    Biomass gasification technology has been rapidly developed recently. But fire and poisoning accidents caused by gas leakage restrict the development and promotion of biomass gasification. Therefore, probabilistic safety assessment (PSA) is necessary for biomass gasification system. Subsequently, Bayesian network-bow-tie (BN-bow-tie) analysis was proposed by mapping bow-tie analysis into Bayesian network (BN). Causes of gas leakage and the accidents triggered by gas leakage can be obtained by bow-tie analysis, and BN was used to confirm the critical nodes of accidents by introducing corresponding three importance measures. Meanwhile, certain occurrence probability of failure was needed in PSA. In view of the insufficient failure data of biomass gasification, the occurrence probability of failure which cannot be obtained from standard reliability data sources was confirmed by fuzzy methods based on expert judgment. An improved approach considered expert weighting to aggregate fuzzy numbers included triangular and trapezoidal numbers was proposed, and the occurrence probability of failure was obtained. Finally, safety measures were indicated based on the obtained critical nodes. The theoretical occurrence probabilities in one year of gas leakage and the accidents caused by it were reduced to 1/10.3 of the original values by these safety measures. PMID:27463975

  6. Bayesian analysis of the genetic structure of a Brazilian popcorn germplasm using data from simple sequence repeats (SSR

    Directory of Open Access Journals (Sweden)

    Javier Saavedra

    2013-06-01

    Full Text Available Several studies have confirmed that popcorn (Zea mays L. var. everta has a narrow genetic basis, which affects the quality of breeding programs. In this study, we present a genetic characterization of 420 individuals representing 28 popcorn populations from Brazilian germplasm banks. All individuals were genotyped using 11 microsatellite markers from the Maize Genetics and Genomics Database. A Bayesian clustering approach via Monte Carlo Markov chains was performed to examine the genetic differentiation (Fst values among different clusters. The results indicate the existence of three distinct and strongly differentiated genetic groups (K = 3. Moreover, the Fst values (calculated among clusters were significantly different according to Bayesian credible intervals of the posterior Fst values. The estimates of posterior mean (and 95% credible interval of the Fst values were 0.086 (0.04-0.14, 0.49 (0.376-0.624 and 0.243 (0.173-0.324 for clusters 1, 2, and 3, respectively. Clusters 1 and 3 showed a high level of genetic diversity in terms of expected heterozygosity and number of alleles, indicating their potential for broadening the genetic basis of popcorn in future breeding programs. Additionally, the 11 microsatellites were informative and presented a suitable number of alleles for determining parameters related to genetic diversity and genetic structure. This information is important for increasing our knowledge regarding genetic relationships, for the identification of heterotic groups, and for developing strategies of gene introgression in popcorn.

  7. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There are growing needs for quick preview of video contents for the purpose of improving accessibility of video archives as well as reducing network traffics. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoid a repetition of computationally-intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.

  8. A joint analysis of AMI and CARMA observations of the recently discovered SZ galaxy cluster system AMI-CL J0300+2613

    Science.gov (United States)

    AMI Consortium; Shimwell, Timothy W.; Carpenter, John M.; Feroz, Farhan; Grainge, Keith J. B.; Hobson, Michael P.; Hurley-Walker, Natasha; Lasenby, Anthony N.; Olamaie, Malak; Perrott, Yvette C.; Pooley, Guy G.; Rodríguez-Gonzálvez, Carmen; Rumsey, Clare; Saunders, Richard D. E.; Schammel, Michel P.; Scott, Paul F.; Titterington, David J.; Waldram, Elizabeth M.

    2013-08-01

    We present Combined Array for Research in Millimeter-wave Astronomy (CARMA) observations of a massive galaxy cluster discovered in the Arcminute Microkelvin Imager (AMI) blind Sunyaev-Zel'dovich (SZ) survey. Without knowledge of the cluster redshift a Bayesian analysis of the AMI, CARMA and joint AMI and CARMA uv-data is used to quantify the detection significance and parametrize both the physical and observational properties of the cluster whilst accounting for the statistics of primary cosmic microwave background anisotropies, receiver noise and radio sources. The joint analysis of the AMI and CARMA uv-data was performed with two parametric physical cluster models: the β-model; and the model described in Olamaie et al. with the pressure profile fixed according to Arnaud et al. The cluster mass derived from these different models is comparable but our Bayesian evidences indicate a preference for the β-profile which we, therefore, use throughout our analysis. From the CARMA data alone we obtain a formal Bayesian probability of detection ratio of 12.8:1 when assuming that a cluster exists within our search area; alternatively assuming that Jenkins et al. accurately predict the number of clusters as a function of mass and redshift, the formal Bayesian probability of detection is 0.29:1. From the Bayesian analysis of the AMI or AMI and CARMA data the probability of detection ratio exceeds 4.5 × 103:1. Performing a joint analysis of the AMI and CARMA data with a physical cluster model we derive the total mass internal to r200 as MT, 200 = 4.1 ± 1.1 × 1014 M⊙. Using a phenomenological β-model to quantify the temperature decrement as a function of angular distance we find a central SZ temperature decrement of 170 ± 24 μK in the AMI and CARMA data. The SZ decrement in the CARMA data is weaker than expected and we speculate that this is a consequence of the cluster morphology. In a forthcoming study the pipeline that we have developed for the analyses of these

  9. Estimating the number of clusters via system evolution for cluster analysis of gene expression data.

    Science.gov (United States)

    Wang, Kaijun; Zheng, Jie; Zhang, Junying; Dong, Jiyang

    2009-09-01

    The estimation of the number of clusters (NC) is one of crucial problems in the cluster analysis of gene expression data. Most approaches available give their answers without the intuitive information about separable degrees between clusters. However, this information is useful for understanding cluster structures. To provide this information, we propose system evolution (SE) method to estimate NC based on partitioning around medoids (PAM) clustering algorithm. SE analyzes cluster structures of a dataset from the viewpoint of a pseudothermodynamics system. The system will go to its stable equilibrium state, at which the optimal NC is found, via its partitioning process and merging process. The experimental results on simulated and real gene expression data demonstrate that the SE works well on the data with well-separated clusters and the one with slightly overlapping clusters. PMID:19527960

  10. Bayesian analysis of esophageal cancer mortality in the presence of misclassification

    Directory of Open Access Journals (Sweden)

    Mohamad Amin Pourhoseingholi

    2011-12-01

    Full Text Available

    Background: Esophageal cancer (EC is one of the most common cancers worldwide. Mortality is a familiar projection that addresses the burden of cancers. With regards to cancer mortality, data are important and used to monitor the effects of screening programs, earlier diagnosis and other prognostic factors. But according to the Iranian death registry, about 20% of death statistics are still recorded in misclassified categories. The aim of this study is to estimate EC mortality in the Iranian population, using a Bayesian approach in order to revise this misclassification.

    Methods: We analyzed National death Statistics reported by the Iranian Ministry of Health and Medical Education from 1995 to 2004. ECs [ICD-9; C15] were expressed as annual mortality rates/100,000, overall, by sex, by age group and age standardized rate (ASR. The Bayesian approach was used to correct and account for misclassification effects in Poisson count regression, with a beta prior employed to estimate the mortality rate of EC in age and sex groups.

    Results: According to the Bayesian analysis, there were between 20 to 30 percent underreported deaths in mortality records related to EC, and the rate of mortality from EC has increased through recent years.

    Conclusions: Our findings suggested a substantial undercount of EC mortality in the Iranian population. So
    policy makers who determine research and treatment priorities based on reported death rates should notice of this underreported data.

  11. BayGO: Bayesian analysis of ontology term enrichment in microarray data

    Directory of Open Access Journals (Sweden)

    Gomes Suely L

    2006-02-01

    Full Text Available Abstract Background The search for enriched (aka over-represented or enhanced ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for a system-level analysis. This procedure tries to summarize the information focussing on classification designs such as Gene Ontology, KEGG pathways, and so on, instead of focussing on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the former approach has been used to deal with the ontology term enrichment problem. Results BayGO implements a Bayesian approach to search for enriched terms from microarray data. The R source-code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: Linux, which can be easily incorporated into pre-existent pipelines; Windows, to be controlled interactively; and as a web-tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. Conclusion The Bayesian model accounts for the fact that, eventually, not all the genes from a given category are observable in microarray data due to low intensity signal, quality filters, genes that were not spotted and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of working only with the common significance analysis.

  12. Application of evidence theory in information fusion of multiple sources in bayesian analysis

    Institute of Scientific and Technical Information of China (English)

    周忠宝; 蒋平; 武小悦

    2004-01-01

    How to obtain proper prior distribution is one of the most critical problems in Bayesian analysis. In many practical cases, the prior information often comes from different sources, and the prior distribution form could be easily known in some certain way while the parameters are hard to determine. In this paper, based on the evidence theory, a new method is presented to fuse the information of multiple sources and determine the parameters of the prior distribution when the form is known. By taking the prior distributions which result from the information of multiple sources and converting them into corresponding mass functions which can be combined by Dempster-Shafer (D-S) method, we get the combined mass function and the representative points of the prior distribution. These points are used to fit with the given distribution form to determine the parameters of the prior distrbution. And then the fused prior distribution is obtained and Bayesian analysis can be performed.How to convert the prior distributions into mass functions properly and get the representative points of the fused prior distribution is the central question we address in this paper. The simulation example shows that the proposed method is effective.

  13. Integrative Bayesian analysis of neuroimaging-genetic data with application to cocaine dependence.

    Science.gov (United States)

    Azadeh, Shabnam; Hobbs, Brian P; Ma, Liangsuo; Nielsen, David A; Moeller, F Gerard; Baladandayuthapani, Veerabhadran

    2016-01-15

    Neuroimaging and genetic studies provide distinct and complementary information about the structural and biological aspects of a disease. Integrating the two sources of data facilitates the investigation of the links between genetic variability and brain mechanisms among different individuals for various medical disorders. This article presents a general statistical framework for integrative Bayesian analysis of neuroimaging-genetic (iBANG) data, which is motivated by a neuroimaging-genetic study in cocaine dependence. Statistical inference necessitated the integration of spatially dependent voxel-level measurements with various patient-level genetic and demographic characteristics under an appropriate probability model to account for the multiple inherent sources of variation. Our framework uses Bayesian model averaging to integrate genetic information into the analysis of voxel-wise neuroimaging data, accounting for spatial correlations in the voxels. Using multiplicity controls based on the false discovery rate, we delineate voxels associated with genetic and demographic features that may impact diffusion as measured by fractional anisotropy (FA) obtained from DTI images. We demonstrate the benefits of accounting for model uncertainties in both model fit and prediction. Our results suggest that cocaine consumption is associated with FA reduction in most white matter regions of interest in the brain. Additionally, gene polymorphisms associated with GABAergic, serotonergic and dopaminergic neurotransmitters and receptors were associated with FA. PMID:26484829

  14. Risk analysis of emergent water pollution accidents based on a Bayesian Network.

    Science.gov (United States)

    Tang, Caihong; Yi, Yujun; Yang, Zhifeng; Sun, Jie

    2016-01-01

    To guarantee the security of water quality in water transfer channels, especially in open channels, analysis of potential emergent pollution sources in the water transfer process is critical. It is also indispensable for forewarnings and protection from emergent pollution accidents. Bridges above open channels with large amounts of truck traffic are the main locations where emergent accidents could occur. A Bayesian Network model, which consists of six root nodes and three middle layer nodes, was developed in this paper, and was employed to identify the possibility of potential pollution risk. Dianbei Bridge is reviewed as a typical bridge on an open channel of the Middle Route of the South to North Water Transfer Project where emergent traffic accidents could occur. Risk of water pollutions caused by leakage of pollutants into water is focused in this study. The risk for potential traffic accidents at the Dianbei Bridge implies a risk for water pollution in the canal. Based on survey data, statistical analysis, and domain specialist knowledge, a Bayesian Network model was established. The human factor of emergent accidents has been considered in this model. Additionally, this model has been employed to describe the probability of accidents and the risk level. The sensitive reasons for pollution accidents have been deduced. The case has also been simulated that sensitive factors are in a state of most likely to lead to accidents.

  15. Bayesian flux balance analysis applied to a skeletal muscle metabolic model.

    Science.gov (United States)

    Heino, Jenni; Tunyan, Knarik; Calvetti, Daniela; Somersalo, Erkki

    2007-09-01

    In this article, the steady state condition for the multi-compartment models for cellular metabolism is considered. The problem is to estimate the reaction and transport fluxes, as well as the concentrations in venous blood when the stoichiometry and bound constraints for the fluxes and the concentrations are given. The problem has been addressed previously by a number of authors, and optimization-based approaches as well as extreme pathway analysis have been proposed. These approaches are briefly discussed here. The main emphasis of this work is a Bayesian statistical approach to the flux balance analysis (FBA). We show how the bound constraints and optimality conditions such as maximizing the oxidative phosphorylation flux can be incorporated into the model in the Bayesian framework by proper construction of the prior densities. We propose an effective Markov chain Monte Carlo (MCMC) scheme to explore the posterior densities, and compare the results with those obtained via the previously studied linear programming (LP) approach. The proposed methodology, which is applied here to a two-compartment model for skeletal muscle metabolism, can be extended to more complex models.

  16. Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling

    Directory of Open Access Journals (Sweden)

    Madsen Per

    2003-03-01

    Full Text Available Abstract A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed.

  17. Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling.

    Science.gov (United States)

    Korsgaard, Inge Riis; Lund, Mogens Sandø; Sorensen, Daniel; Gianola, Daniel; Madsen, Per; Jensen, Just

    2003-01-01

    A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories) or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed.

  18. Risk analysis of emergent water pollution accidents based on a Bayesian Network.

    Science.gov (United States)

    Tang, Caihong; Yi, Yujun; Yang, Zhifeng; Sun, Jie

    2016-01-01

    To guarantee the security of water quality in water transfer channels, especially in open channels, analysis of potential emergent pollution sources in the water transfer process is critical. It is also indispensable for forewarnings and protection from emergent pollution accidents. Bridges above open channels with large amounts of truck traffic are the main locations where emergent accidents could occur. A Bayesian Network model, which consists of six root nodes and three middle layer nodes, was developed in this paper, and was employed to identify the possibility of potential pollution risk. Dianbei Bridge is reviewed as a typical bridge on an open channel of the Middle Route of the South to North Water Transfer Project where emergent traffic accidents could occur. Risk of water pollutions caused by leakage of pollutants into water is focused in this study. The risk for potential traffic accidents at the Dianbei Bridge implies a risk for water pollution in the canal. Based on survey data, statistical analysis, and domain specialist knowledge, a Bayesian Network model was established. The human factor of emergent accidents has been considered in this model. Additionally, this model has been employed to describe the probability of accidents and the risk level. The sensitive reasons for pollution accidents have been deduced. The case has also been simulated that sensitive factors are in a state of most likely to lead to accidents. PMID:26433361

  19. Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data.

    Science.gov (United States)

    Yu, Zhiwen; Chen, Hantao; You, Jane; Liu, Jiming; Wong, Hau-San; Han, Guoqiang; Li, Le

    2015-01-01

    Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While there are quite a number of research works which perform tumor clustering, few of them considers how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and improve the performance of RDCFCE further by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE works well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms. PMID:26357330

  20. Multi-level Bayesian safety analysis with unprocessed Automatic Vehicle Identification data for an urban expressway.

    Science.gov (United States)

    Shi, Qi; Abdel-Aty, Mohamed; Yu, Rongjie

    2016-03-01

    In traffic safety studies, crash frequency modeling of total crashes is the cornerstone before proceeding to more detailed safety evaluation. The relationship between crash occurrence and factors such as traffic flow and roadway geometric characteristics has been extensively explored for a better understanding of crash mechanisms. In this study, a multi-level Bayesian framework has been developed in an effort to identify the crash contributing factors on an urban expressway in the Central Florida area. Two types of traffic data from the Automatic Vehicle Identification system, which are the processed data capped at speed limit and the unprocessed data retaining the original speed were incorporated in the analysis along with road geometric information. The model framework was proposed to account for the hierarchical data structure and the heterogeneity among the traffic and roadway geometric data. Multi-level and random parameters models were constructed and compared with the Negative Binomial model under the Bayesian inference framework. Results showed that the unprocessed traffic data was superior. Both multi-level models and random parameters models outperformed the Negative Binomial model and the models with random parameters achieved the best model fitting. The contributing factors identified imply that on the urban expressway lower speed and higher speed variation could significantly increase the crash likelihood. Other geometric factors were significant including auxiliary lanes and horizontal curvature. PMID:26722989

  1. Multi-level Bayesian safety analysis with unprocessed Automatic Vehicle Identification data for an urban expressway.

    Science.gov (United States)

    Shi, Qi; Abdel-Aty, Mohamed; Yu, Rongjie

    2016-03-01

    In traffic safety studies, crash frequency modeling of total crashes is the cornerstone before proceeding to more detailed safety evaluation. The relationship between crash occurrence and factors such as traffic flow and roadway geometric characteristics has been extensively explored for a better understanding of crash mechanisms. In this study, a multi-level Bayesian framework has been developed in an effort to identify the crash contributing factors on an urban expressway in the Central Florida area. Two types of traffic data from the Automatic Vehicle Identification system, which are the processed data capped at speed limit and the unprocessed data retaining the original speed were incorporated in the analysis along with road geometric information. The model framework was proposed to account for the hierarchical data structure and the heterogeneity among the traffic and roadway geometric data. Multi-level and random parameters models were constructed and compared with the Negative Binomial model under the Bayesian inference framework. Results showed that the unprocessed traffic data was superior. Both multi-level models and random parameters models outperformed the Negative Binomial model and the models with random parameters achieved the best model fitting. The contributing factors identified imply that on the urban expressway lower speed and higher speed variation could significantly increase the crash likelihood. Other geometric factors were significant including auxiliary lanes and horizontal curvature.

  2. A Bayesian analysis of the 69 highest energy cosmic rays detected by the Pierre Auger Observatory

    Science.gov (United States)

    Khanin, Alexander; Mortlock, Daniel J.

    2016-08-01

    The origins of ultrahigh energy cosmic rays (UHECRs) remain an open question. Several attempts have been made to cross-correlate the arrival directions of the UHECRs with catalogues of potential sources, but no definite conclusion has been reached. We report a Bayesian analysis of the 69 events, from the Pierre Auger Observatory (PAO), that aims to determine the fraction of the UHECRs that originate from known AGNs in the Veron-Cety & Verson (VCV) catalogue, as well as AGNs detected with the Swift Burst Alert Telescope (Swift-BAT), galaxies from the 2MASS Redshift Survey (2MRS), and an additional volume-limited sample of 17 nearby AGNs. The study makes use of a multilevel Bayesian model of UHECR injection, propagation and detection. We find that for reasonable ranges of prior parameters the Bayes factors disfavour a purely isotropic model. For fiducial values of the model parameters, we report 68 per cent credible intervals for the fraction of source originating UHECRs of 0.09^{+0.05}_{-0.04}, 0.25^{+0.09}_{-0.08}, 0.24^{+0.12}_{-0.10}, and 0.08^{+0.04}_{-0.03} for the VCV, Swift-BAT and 2MRS catalogues, and the sample of 17 AGNs, respectively.

  3. Critically evaluating the theory and performance of Bayesian analysis of macroevolutionary mixtures.

    Science.gov (United States)

    Moore, Brian R; Höhna, Sebastian; May, Michael R; Rannala, Bruce; Huelsenbeck, John P

    2016-08-23

    Bayesian analysis of macroevolutionary mixtures (BAMM) has recently taken the study of lineage diversification by storm. BAMM estimates the diversification-rate parameters (speciation and extinction) for every branch of a study phylogeny and infers the number and location of diversification-rate shifts across branches of a tree. Our evaluation of BAMM reveals two major theoretical errors: (i) the likelihood function (which estimates the model parameters from the data) is incorrect, and (ii) the compound Poisson process prior model (which describes the prior distribution of diversification-rate shifts across branches) is incoherent. Using simulation, we demonstrate that these theoretical issues cause statistical pathologies; posterior estimates of the number of diversification-rate shifts are strongly influenced by the assumed prior, and estimates of diversification-rate parameters are unreliable. Moreover, the inability to correctly compute the likelihood or to correctly specify the prior for rate-variable trees precludes the use of Bayesian approaches for testing hypotheses regarding the number and location of diversification-rate shifts using BAMM. PMID:27512038

  4. Spatial-Temporal Epidemiology of Tuberculosis in Mainland China: An Analysis Based on Bayesian Theory

    Directory of Open Access Journals (Sweden)

    Kai Cao

    2016-05-01

    Full Text Available Objective: To explore the spatial-temporal interaction effect within a Bayesian framework and to probe the ecological influential factors for tuberculosis. Methods: Six different statistical models containing parameters of time, space, spatial-temporal interaction and their combination were constructed based on a Bayesian framework. The optimum model was selected according to the deviance information criterion (DIC value. Coefficients of climate variables were then estimated using the best fitting model. Results: The model containing spatial-temporal interaction parameter was the best fitting one, with the smallest DIC value (−4,508,660. Ecological analysis results showed the relative risks (RRs of average temperature, rainfall, wind speed, humidity, and air pressure were 1.00324 (95% CI, 1.00150–1.00550, 1.01010 (95% CI, 1.01007–1.01013, 0.83518 (95% CI, 0.93732–0.96138, 0.97496 (95% CI, 0.97181–1.01386, and 1.01007 (95% CI, 1.01003–1.01011, respectively. Conclusions: The spatial-temporal interaction was statistically meaningful and the prevalence of tuberculosis was influenced by the time and space interaction effect. Average temperature, rainfall, wind speed, and air pressure influenced tuberculosis. Average humidity had no influence on tuberculosis.

  5. A Bayesian analysis of the 69 highest energy cosmic rays detected by the Pierre Auger Observatory

    CERN Document Server

    Khanin, Alexander

    2016-01-01

    The origins of ultra-high energy cosmic rays (UHECRs) remain an open question. Several attempts have been made to cross-correlate the arrival directions of the UHECRs with catalogs of potential sources, but no definite conclusion has been reached. We report a Bayesian analysis of the 69 events from the Pierre Auger Observatory (PAO), that aims to determine the fraction of the UHECRs that originate from known AGNs in the Veron-Cety & Veron (VCV) catalog, as well as AGNs detected with the Swift Burst Alert Telescope (Swift-BAT), galaxies from the 2MASS Redshift Survey (2MRS), and an additional volume-limited sample of 17 nearby AGNs. The study makes use of a multi-level Bayesian model of UHECR injection, propagation and detection. We find that for reasonable ranges of prior parameters, the Bayes factors disfavour a purely isotropic model. For fiducial values of the model parameters, we report 68% credible intervals for the fraction of source originating UHECRs of 0.09+0.05-0.04, 0.25+0.09-0.08, 0.24+0.12-0....

  6. Bayesian methods for meta-analysis of causal relationships estimated using genetic instrumental variables

    DEFF Research Database (Denmark)

    2010-01-01

    of multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable......Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context....... This can be implemented using a Bayesian approach, which is next extended to include multiple genetic markers. We then propose a hierarchical model for undertaking a meta-analysis of multiple studies, in which it is not necessary that the same genetic markers are measured in each study. This provides...

  7. Bayesian Analysis of Inertial Confinement Fusion Experiments at the National Ignition Facility

    CERN Document Server

    Gaffney, J A; Sonnad, V; Libby, S B

    2012-01-01

    We develop a Bayesian inference method that allows the efficient determination of several interesting parameters from complicated high-energy-density experiments performed on the National Ignition Facility (NIF). The model is based on an exploration of phase space using the hydrodynamic code HYDRA. A linear model is used to describe the effect of nuisance parameters on the analysis, allowing an analytic likelihood to be derived that can be determined from a small number of HYDRA runs and then used in existing advanced statistical analysis methods. This approach is applied to a recent experiment in order to determine the carbon opacity and X-ray drive; it is found that the inclusion of prior expert knowledge and fluctuations in capsule dimensions and chemical composition significantly improve the agreement between experiment and theoretical opacity calculations. A parameterisation of HYDRA results is used to test the application of both Markov chain Monte Carlo (MCMC) and genetic algorithm (GA) techniques to e...

  8. Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.

    Science.gov (United States)

    Jin, Ick Hoon; Yuan, Ying; Liang, Faming

    2013-10-01

    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency. PMID:24653788

  9. Bayesian analysis for exponential random graph models using the adaptive exchange sampler

    KAUST Repository

    Jin, Ick Hoon

    2013-01-01

    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the existence of intractable normalizing constants. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the issue of intractable normalizing constants encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.

  10. Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler*

    Science.gov (United States)

    Jin, Ick Hoon; Yuan, Ying; Liang, Faming

    2014-01-01

    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency. PMID:24653788

  11. Bayesian Analysis of $C_{x'}$ and $C_{z'}$ Double Polarizations in Kaon Photoproduction

    CERN Document Server

    Hutauruk, P T P

    2010-01-01

    Have been analyzed the latest experimental data for $\\gamma + p \\to K^{+} + \\Lambda$ reaction of $C_{x'}$ and $C_{z'}$ double polarizations. In theoretical calculation, all of these observables can be classified into four Legendre classes and represented by associated Legendre polynomial function itself \\cite{fasano92}. In this analysis we attempt to determine the best data model for both observables. We use the bayesian technique to select the best model by calculating the posterior probabilities and comparing the posterior among the models. The posteriors probabilities for each data model are computed using a Nested sampling integration. From this analysis we concluded that $C_{x'}$ and $C_{z'}$ double polarizations require two and three order of associated Legendre polynomials respectively to describe the data well. The extracted coefficients of each observable will also be presented. It shows the structure of baryon resonances qualitatively

  12. Bayesian Analysis of Hybrid EoS based on Astrophysical Observational Data

    CERN Document Server

    Alvarez-Castillo, David; Blaschke, David; Grigorian, Hovik

    2014-01-01

    The most basic features of a neutron star (NS) are its radius and mass which so far have not been well determined simultaneously for a single object. In some cases masses are precisely measured like in the case of binary systems but radii are quite uncertain. In the other hand, for isolated neutron stars some radius and mass measurements exist but lack the necessary precision to inquire into their interiors. However, the present observable data allows to make probabilistic estimation of the internal structure of the star. In this work preliminary probabilistic estimation of the super dense stellar matter equation of state using Bayesian Analysis and modelling of relativistic configurations of neutron stars is shown. This analysis is important for research of existence the quark-gluon plasma in massive (around 2 sun masses) neutron stars.

  13. Mass-radius constraints for the neutron star EoS - Bayesian analysis

    CERN Document Server

    Ayriyan, A; Blaschke, D; Grigorian, H

    2015-01-01

    We suggest a new Bayesian analysis (BA) using disjunct M-R constraints for extracting probability measures for cold, dense matter equations of state (EoS). One of the key issues of such an analysis is the question of a deconfinement transition in compact stars and whether it proceeds as a crossover or rather as a first order transition. We show by postulating results of not yet existing radius measurements for the known pulsars with a mass of $2M_\\odot$ that a radius gap of about 3 km would clearly select an EoS with a strong first order phase transition as the most probably one. This would support the existence of a critical endpoint in the QCD phase diagram under scrutiny in present and upcoming heavy-ion collision experiments.

  14. Antiplatelets versus anticoagulants for the treatment of cervical artery dissection: Bayesian meta-analysis.

    Directory of Open Access Journals (Sweden)

    Hakan Sarikaya

    Full Text Available OBJECTIVE: To compare the effects of antiplatelets and anticoagulants on stroke and death in patients with acute cervical artery dissection. DESIGN: Systematic review with Bayesian meta-analysis. DATA SOURCES: The reviewers searched MEDLINE and EMBASE from inception to November 2012, checked reference lists, and contacted authors. STUDY SELECTION: Studies were eligible if they were randomised, quasi-randomised or observational comparisons of antiplatelets and anticoagulants in patients with cervical artery dissection. DATA EXTRACTION: Data were extracted by one reviewer and checked by another. Bayesian techniques were used to appropriately account for studies with scarce event data and imbalances in the size of comparison groups. DATA SYNTHESIS: Thirty-seven studies (1991 patients were included. We found no randomised trial. The primary analysis revealed a large treatment effect in favour of antiplatelets for preventing the primary composite outcome of ischaemic stroke, intracranial haemorrhage or death within the first 3 months after treatment initiation (relative risk 0.32, 95% credibility interval 0.12 to 0.63, while the degree of between-study heterogeneity was moderate (τ(2 = 0.18. In an analysis restricted to studies of higher methodological quality, the possible advantage of antiplatelets over anticoagulants was less obvious than in the main analysis (relative risk 0.73, 95% credibility interval 0.17 to 2.30. CONCLUSION: In view of these results and the safety advantages, easier usage and lower cost of antiplatelets, we conclude that antiplatelets should be given precedence over anticoagulants as a first line treatment in patients with cervical artery dissection unless results of an adequately powered randomised trial suggest the opposite.

  15. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    CERN Document Server

    Emmons, Scott; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...

  16. The Rest-Frame Golenetskii Correlation via a Hierarchical Bayesian Analysis

    CERN Document Server

    Burgess, J Michael

    2015-01-01

    Gamma-ray bursts (GRBs) are characterised by a strong correlation between the instantaneous luminosity and the spectral peak energy within a burst. This correlation, which is known as the hardness-intensity correlation or the Golenetskii correlation, not only holds important clues to the physics of GRBs but is thought to have the potential to determine redshifts of bursts. In this paper, I use a hierarchical Bayesian model to study the universality of the rest-frame Golenetskii correlation and in particular I assess its use as a redshift estimator for GRBs. I find that, using a power-law prescription of the correlation, the power-law indices cluster near a common value, but have a broader variance than previously reported ($\\sim 1-2$). Furthermore, I find evidence that there is spread in intrinsic rest-frame correlation normalizations for the GRBs in our sample ($\\sim 10^{51}-10^{53}$ erg s$^{-1}$). This points towards variable physical settings of the emission (magnetic field strength, number of emitting ele...

  17. Cluster analysis of word frequency dynamics

    International Nuclear Information System (INIS)

    This paper describes the analysis and modelling of word usage frequency time series. During one of previous studies, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations

  18. Bayesian analysis of the effective charge from spectroscopic bremsstrahlung measurement in fusion plasmas

    Science.gov (United States)

    Krychowiak, M.; König, R.; Klinger, T.; Fischer, R.

    2004-11-01

    At the stellarator Wendelstein 7-AS (W7-AS) a spectrally resolving two channel system for the measurement of line-of-sight averaged Zeff values has been tested in preparation for its planned installation as a multichannel Zeff-profile measurement system on the stellarator Wendelstein 7-X (W7-X) which is presently under construction. The measurement is performed using the bremsstrahlung intensity in the wavelength region of ultraviolet to near infrared. The spectrally resolved measurement allows to eliminate signal contamination by line radiation. For statistical data analysis a procedure based on Bayesian probability theory has been developed. With this method it is possible to estimate the bremsstrahlung background in the measured signal and its error without the necessity to fit the spectral lines. For evaluation of the random error in Zeff the signal noise has been investigated. Furthermore, the linearity and behavior of the charge-coupled device detector at saturation has been analyzed.

  19. Bayesian analysis of flaw sizing data of the NESC III exercise

    International Nuclear Information System (INIS)

    Non-destructive inspections are performed to give confidence of the non-existence of flaws exceeding a certain safe limit in the inspected structural component. The principal uncertainties related to these inspections are the probability of not detecting an existing flaw larger than a given size, the probability of a false call, and the uncertainty related to the sizing of a flaw. Inspection reliability models aim to account for these uncertainties. This paper presents the analysis of sizing uncertainty of flaws for the results of the NESC III Round Robin Trials on defect-containing dissimilar metal welds. Model parameters are first estimated to characterize the sizing capabilities of various teams. A Bayesian updating of the flaw depth distribution is then demonstrated by combining information from measurement results and sizing performance

  20. Constraints on cosmic-ray propagation models from a global Bayesian analysis

    CERN Document Server

    Trotta, R; Moskalenko, I V; Porter, T A; de Austri, R Ruiz; Strong, A W

    2010-01-01

    Research in many areas of modern physics such as, e.g., indirect searches for dark matter and particle acceleration in SNR shocks, rely heavily on studies of cosmic rays (CRs) and associated diffuse emissions (radio, microwave, X-rays, gamma rays). While very detailed numerical models of CR propagation exist, a quantitative statistical analysis of such models has been so far hampered by the large computational effort that those models require. Although statistical analyses have been carried out before using semi-analytical models (where the computation is much faster), the evaluation of the results obtained from such models is difficult, as they necessarily suffer from many simplifying assumptions, The main objective of this paper is to present a working method for a full Bayesian parameter estimation for a numerical CR propagation model. For this study, we use the GALPROP code, the most advanced of its kind, that uses astrophysical information, nuclear and particle data as input to self-consistently predict ...

  1. Intuitive logic revisited: new data and a Bayesian mixed model meta-analysis.

    Directory of Open Access Journals (Sweden)

    Henrik Singmann

    Full Text Available Recent research on syllogistic reasoning suggests that the logical status (valid vs. invalid of even difficult syllogisms can be intuitively detected via differences in conceptual fluency between logically valid and invalid syllogisms when participants are asked to rate how much they like a conclusion following from a syllogism (Morsanyi & Handley, 2012. These claims of an intuitive logic are at odds with most theories on syllogistic reasoning which posit that detecting the logical status of difficult syllogisms requires effortful and deliberate cognitive processes. We present new data replicating the effects reported by Morsanyi and Handley, but show that this effect is eliminated when controlling for a possible confound in terms of conclusion content. Additionally, we reanalyze three studies (n = 287 without this confound with a Bayesian mixed model meta-analysis (i.e., controlling for participant and item effects which provides evidence for the null-hypothesis and against Morsanyi and Handley's claim.

  2. Bayesian semiparametric power spectral density estimation in gravitational wave data analysis

    CERN Document Server

    Edwards, Matthew C; Christensen, Nelson

    2015-01-01

    The standard noise model in gravitational wave (GW) data analysis assumes detector noise is stationary and Gaussian distributed, with a known power spectral density (PSD) that is usually estimated using clean off-source data. Real GW data often depart from these assumptions, and misspecified parametric models of the PSD could result in misleading inferences. We propose a Bayesian semiparametric approach to improve this. We use a nonparametric Bernstein polynomial prior on the PSD, with weights attained via a Dirichlet process distribution, and update this using the Whittle likelihood. Posterior samples are obtained using a Metropolis-within-Gibbs sampler. We simultaneously estimate the reconstruction parameters of a rotating core collapse supernova GW burst that has been embedded in simulated Advanced LIGO noise. We also discuss an approach to deal with non-stationary data by breaking longer data streams into smaller and locally stationary components.

  3. Bayesian semiparametric power spectral density estimation with applications in gravitational wave data analysis

    Science.gov (United States)

    Edwards, Matthew C.; Meyer, Renate; Christensen, Nelson

    2015-09-01

    The standard noise model in gravitational wave (GW) data analysis assumes detector noise is stationary and Gaussian distributed, with a known power spectral density (PSD) that is usually estimated using clean off-source data. Real GW data often depart from these assumptions, and misspecified parametric models of the PSD could result in misleading inferences. We propose a Bayesian semiparametric approach to improve this. We use a nonparametric Bernstein polynomial prior on the PSD, with weights attained via a Dirichlet process distribution, and update this using the Whittle likelihood. Posterior samples are obtained using a blocked Metropolis-within-Gibbs sampler. We simultaneously estimate the reconstruction parameters of a rotating core collapse supernova GW burst that has been embedded in simulated Advanced LIGO noise. We also discuss an approach to deal with nonstationary data by breaking longer data streams into smaller and locally stationary components.

  4. Bayesian analysis of general failure data from an ageing distribution: advances in numerical methods

    Energy Technology Data Exchange (ETDEWEB)

    Procaccia, H.; Villain, B. [Electricite de France (EDF), 93 - Saint-Denis (France); Clarotti, C.A. [ENEA, Casaccia (Italy)

    1996-12-31

    EDF and ENEA carried out a joint research program for developing the numerical methods and computer codes needed for Bayesian analysis of component-lives in the case of ageing. Early results of this study were presented at ESREL`94. Since then the following further steps have been gone: input data have been generalized to the case that observed lives are censored both on the right and on the left; allowable life distributions are Weibull and gamma - their parameters are both unknown and can be statistically dependent; allowable priors are histograms relative to different parametrizations of the life distribution of concern; first-and-second-order-moments of the posterior distributions can be computed. In particular the covariance will give some important information about the degree of the statistical dependence between the parameters of interest. An application of the code to the appearance of a stress corrosion cracking in a tube of the PWR Steam Generator system is presented. (authors). 10 refs.

  5. Bayesian Reliability Analysis of Non-Stationarity in Multi-agent Systems

    Directory of Open Access Journals (Sweden)

    TONT Gabriela

    2013-05-01

    Full Text Available The Bayesian methods provide information about the meaningful parameters in a statistical analysis obtained by combining the prior and sampling distributions to form the posterior distribution of theparameters. The desired inferences are obtained from this joint posterior. An estimation strategy for hierarchical models, where the resulting joint distribution of the associated model parameters cannotbe evaluated analytically, is to use sampling algorithms, known as Markov Chain Monte Carlo (MCMC methods, from which approximate solutions can be obtained. Both serial and parallel configurations of subcomponents are permitted. The capability of time-dependent method to describe a multi-state system is based on a case study, assessingthe operatial situation of studied system. The rationality and validity of the presented model are demonstrated via a case of study. The effect of randomness of the structural parameters is alsoexamined.

  6. Selection of Trusted Service Providers by Enforcing Bayesian Analysis in iVCE

    Institute of Scientific and Technical Information of China (English)

    GU Bao-jun; LI Xiao-yong; WANG Wei-nong

    2008-01-01

    The initiative of internet-based virtual computing environment (iVCE) aims to provide the end users and applications With a harmonious, trustworthy and transparent integrated computing environment which will facilitate sharing and collaborating of network resources between applications. Trust management is an elementary component for iVCE. The uncertain and dynamic characteristics of iVCE necessitate the requirement for the trust management to be subjective, historical evidence based and context dependent. This paper presents a Bayesian analysis-based trust model, which aims to secure the active agents for selecting appropriate trustod services in iVCE. Simulations are made to analyze the properties of the trust model which show that the subjective prior information influences trust evaluation a lot and the model stimulates positive interactions.

  7. Exposure models for the prior distribution in bayesian decision analysis for occupational hygiene decision making.

    Science.gov (United States)

    Lee, Eun Gyung; Kim, Seung Won; Feigley, Charles E; Harper, Martin

    2013-01-01

    This study introduces two semi-quantitative methods, Structured Subjective Assessment (SSA) and Control of Substances Hazardous to Health (COSHH) Essentials, in conjunction with two-dimensional Monte Carlo simulations for determining prior probabilities. Prior distribution using expert judgment was included for comparison. Practical applications of the proposed methods were demonstrated using personal exposure measurements of isoamyl acetate in an electronics manufacturing facility and of isopropanol in a printing shop. Applicability of these methods in real workplaces was discussed based on the advantages and disadvantages of each method. Although these methods could not be completely independent of expert judgments, this study demonstrated a methodological improvement in the estimation of the prior distribution for the Bayesian decision analysis tool. The proposed methods provide a logical basis for the decision process by considering determinants of worker exposure.

  8. Fermi's paradox, extraterrestrial life and the future of humanity: a Bayesian analysis

    CERN Document Server

    Verendel, Vilhelm

    2015-01-01

    The Great Filter interpretation of Fermi's great silence asserts that $Npq$ is not a very large number, where $N$ is the number of potentially life-supporting planets in the observable universe, $p$ is the probability that a randomly chosen such planet develops intelligent life to the level of present-day human civilization, and $q$ is the conditional probability that it then goes on to develop a technological supercivilization visible all over the observable universe. Evidence suggests that $N$ is huge, which implies that $pq$ is very small. Hanson (1998) and Bostrom (2008) have argued that the discovery of extraterrestrial life would point towards $p$ not being small and therefore a very small $q$, which can be seen as bad news for humanity's prospects of colonizing the universe. Here we investigate whether a Bayesian analysis supports their argument, and the answer turns out to depend critically on the choice of prior distribution.

  9. Intuitive logic revisited: new data and a Bayesian mixed model meta-analysis.

    Science.gov (United States)

    Singmann, Henrik; Klauer, Karl Christoph; Kellen, David

    2014-01-01

    Recent research on syllogistic reasoning suggests that the logical status (valid vs. invalid) of even difficult syllogisms can be intuitively detected via differences in conceptual fluency between logically valid and invalid syllogisms when participants are asked to rate how much they like a conclusion following from a syllogism (Morsanyi & Handley, 2012). These claims of an intuitive logic are at odds with most theories on syllogistic reasoning which posit that detecting the logical status of difficult syllogisms requires effortful and deliberate cognitive processes. We present new data replicating the effects reported by Morsanyi and Handley, but show that this effect is eliminated when controlling for a possible confound in terms of conclusion content. Additionally, we reanalyze three studies (n = 287) without this confound with a Bayesian mixed model meta-analysis (i.e., controlling for participant and item effects) which provides evidence for the null-hypothesis and against Morsanyi and Handley's claim.

  10. Lung scintigraphy clustering by texture analysis

    International Nuclear Information System (INIS)

    The efficiency of texture analysis parameters, describing the organization of grey level variations of an image, was studied for lung scintigraphic data classification. Twenty one patients received a99mTC-MAA perfusion scan and 81mKr and 127Xe ventilation scans. Scans were scaled to 64 grey levels and 100 k events for inter subject comparison. The texture index was the average of the absolute difference between a pixel and its neighbors. Energy, entropy, correlation, local homogeneity and inertia were computed using co-occurrence matrices. A principal component analysis was carried out on each parameter for each type of scan and the first principal components were selected as clustering indices. Validation was achieved by simulating 2 series of 20 increasingly heterogenous perfusion and ventilation scans. For most of the texture parameters, one principal component could summarize the patients data since it corresponded to the relative variances of 67%-88% for perfusion scans, 53%-99% for 81mKr scans and 38%-97% for 127Xe scans. The simulated series demonstrated a linear relationship between the heterogeneity and the first principal component for texture index, energy, entropy and inertia. This was not the case for correlation and local Homogeneity. We conclude that heterogeneity of lung scans may be quantified by texture analysis. The texture index is the easiest to compute and provides the most efficient results for clinical purpose. (orig.)

  11. Application of Bayesian Approach to Cost-Effectiveness Analysis of Antiviral Treatments in Chronic Hepatitis B

    Science.gov (United States)

    Zhang, Hua; Huo, Mingdong; Chao, Jianqian; Liu, Pei

    2016-01-01

    Background Hepatitis B virus (HBV) infection is a major problem for public health; timely antiviral treatment can significantly prevent the progression of liver damage from HBV by slowing down or stopping the virus from reproducing. In the study we applied Bayesian approach to cost-effectiveness analysis, using Markov Chain Monte Carlo (MCMC) simulation methods for the relevant evidence input into the model to evaluate cost-effectiveness of entecavir (ETV) and lamivudine (LVD) therapy for chronic hepatitis B (CHB) in Jiangsu, China, thus providing information to the public health system in the CHB therapy. Methods Eight-stage Markov model was developed, a hypothetical cohort of 35-year-old HBeAg-positive patients with CHB was entered into the model. Treatment regimens were LVD100mg daily and ETV 0.5 mg daily. The transition parameters were derived either from systematic reviews of the literature or from previous economic studies. The outcome measures were life-years, quality-adjusted lifeyears (QALYs), and expected costs associated with the treatments and disease progression. For the Bayesian models all the analysis was implemented by using WinBUGS version 1.4. Results Expected cost, life expectancy, QALYs decreased with age. Cost-effectiveness increased with age. Expected cost of ETV was less than LVD, while life expectancy and QALYs were higher than that of LVD, ETV strategy was more cost-effective. Costs and benefits of the Monte Carlo simulation were very close to the results of exact form among the group, but standard deviation of each group indicated there was a big difference between individual patients. Conclusions Compared with lamivudine, entecavir is the more cost-effective option. CHB patients should accept antiviral treatment as soon as possible as the lower age the more cost-effective. Monte Carlo simulation obtained costs and effectiveness distribution, indicate our Markov model is of good robustness. PMID:27574976

  12. From "weight of evidence" to quantitative data integration using multicriteria decision analysis and Bayesian methods.

    Science.gov (United States)

    Linkov, Igor; Massey, Olivia; Keisler, Jeff; Rusyn, Ivan; Hartung, Thomas

    2015-01-01

    "Weighing" available evidence in the process of decision-making is unavoidable, yet it is one step that routinely raises suspicions: what evidence should be used, how much does it weigh, and whose thumb may be tipping the scales? This commentary aims to evaluate the current state and future roles of various types of evidence for hazard assessment as it applies to environmental health. In its recent evaluation of the US Environmental Protection Agency's Integrated Risk Information System assessment process, the National Research Council committee singled out the term "weight of evidence" (WoE) for critique, deeming the process too vague and detractive to the practice of evaluating human health risks of chemicals. Moving the methodology away from qualitative, vague and controversial methods towards generalizable, quantitative and transparent methods for appropriately managing diverse lines of evidence is paramount for both regulatory and public acceptance of the hazard assessments. The choice of terminology notwithstanding, a number of recent Bayesian WoE-based methods, the emergence of multi criteria decision analysis for WoE applications, as well as the general principles behind the foundational concepts of WoE, show promise in how to move forward and regain trust in the data integration step of the assessments. We offer our thoughts on the current state of WoE as a whole and while we acknowledge that many WoE applications have been largely qualitative and subjective in nature, we see this as an opportunity to turn WoE towards a quantitative direction that includes Bayesian and multi criteria decision analysis.

  13. BGX: a Bioconductor package for the Bayesian integrated analysis of Affymetrix GeneChips

    Directory of Open Access Journals (Sweden)

    Hein Anne-Mette K

    2007-11-01

    Full Text Available Abstract Background Affymetrix 3' GeneChip microarrays are widely used to profile the expression of thousands of genes simultaneously. They differ from many other microarray types in that GeneChips are hybridised using a single labelled extract and because they contain multiple 'match' and 'mismatch' sequences for each transcript. Most algorithms extract the signal from GeneChip experiments in a sequence of separate steps, including background correction and normalisation, which inhibits the simultaneous use of all available information. They principally provide a point estimate of gene expression and, in contrast to BGX, do not fully integrate the uncertainty arising from potentially heterogeneous responses of the probes. Results BGX is a new Bioconductor R package that implements an integrated Bayesian approach to the analysis of 3' GeneChip data. The software takes into account additive and multiplicative error, non-specific hybridisation and replicate summarisation in the spirit of the model outlined in 1. It also provides a posterior distribution for the expression of each gene. Moreover, BGX can take into account probe affinity effects from probe sequence information where available. The package employs a novel adaptive Markov chain Monte Carlo (MCMC algorithm that raises considerably the efficiency with which the posterior distributions are sampled from. Finally, BGX incorporates various ways to analyse the results, such as ranking genes by expression level as well as statistically based methods for estimating the amount of up and down regulated genes between two conditions. Conclusion BGX performs well relative to other widely used methods at estimating expression levels and fold changes. It has the advantage that it provides a statistically sound measure of uncertainty for its estimates. BGX includes various analysis functions to visualise and exploit the rich output that is produced by the Bayesian model.

  14. From "weight of evidence" to quantitative data integration using multicriteria decision analysis and Bayesian methods.

    Science.gov (United States)

    Linkov, Igor; Massey, Olivia; Keisler, Jeff; Rusyn, Ivan; Hartung, Thomas

    2015-01-01

    "Weighing" available evidence in the process of decision-making is unavoidable, yet it is one step that routinely raises suspicions: what evidence should be used, how much does it weigh, and whose thumb may be tipping the scales? This commentary aims to evaluate the current state and future roles of various types of evidence for hazard assessment as it applies to environmental health. In its recent evaluation of the US Environmental Protection Agency's Integrated Risk Information System assessment process, the National Research Council committee singled out the term "weight of evidence" (WoE) for critique, deeming the process too vague and detractive to the practice of evaluating human health risks of chemicals. Moving the methodology away from qualitative, vague and controversial methods towards generalizable, quantitative and transparent methods for appropriately managing diverse lines of evidence is paramount for both regulatory and public acceptance of the hazard assessments. The choice of terminology notwithstanding, a number of recent Bayesian WoE-based methods, the emergence of multi criteria decision analysis for WoE applications, as well as the general principles behind the foundational concepts of WoE, show promise in how to move forward and regain trust in the data integration step of the assessments. We offer our thoughts on the current state of WoE as a whole and while we acknowledge that many WoE applications have been largely qualitative and subjective in nature, we see this as an opportunity to turn WoE towards a quantitative direction that includes Bayesian and multi criteria decision analysis. PMID:25592482

  15. A parametric Bayesian combination of local and regional information in flood frequency analysis

    Science.gov (United States)

    Seidou, O.; Ouarda, T. B. M. J.; Barbet, M.; Bruneau, P.; BobéE, B.

    2006-11-01

    Because of their impact on hydraulic structure design as well as on floodplain management, flood quantiles must be estimated with the highest precision given available information. If the site of interest has been monitored for a sufficiently long period (more than 30-40 years), at-site frequency analysis can be used to estimate flood quantiles with a fair precision. Otherwise, regional estimation may be used to mitigate the lack of data, but local information is then ignored. A commonly used approach to combine at-site and regional information is the linear empirical Bayes estimation: Under the assumption that both local and regional flood quantile estimators have a normal distribution, the empirical Bayesian estimator of the true quantile is the weighted average of both estimations. The weighting factor for each estimator is conversely proportional to its variance. We propose in this paper an alternative Bayesian method for combining local and regional information which provides the full probability density of quantiles and parameters. The application of the method is made with the generalized extreme values (GEV) distribution, but it can be extended to other types of extreme value distributions. In this method the prior distributions are obtained using a regional log linear regression model, and then local observations are used within a Markov chain Monte Carlo algorithm to infer the posterior distributions of parameters and quantiles. Unlike the empirical Bayesian approach the proposed method works even with a single local observation. It also relaxes the hypothesis of normality of the local quantiles probability distribution. The performance of the proposed methodology is compared to that of local, regional, and empirical Bayes estimators on three generated regional data sets with different statistical characteristics. The results show that (1) when the regional log linear model is unbiased, the proposed method gives better estimations of the GEV quantiles and

  16. Sea-level variability in tide-gauge and geological records: An empirical Bayesian analysis (Invited)

    Science.gov (United States)

    Kopp, R. E.; Hay, C.; Morrow, E.; Mitrovica, J. X.; Horton, B.; Kemp, A.

    2013-12-01

    Sea level varies at a range of temporal and spatial scales, and understanding all its significant sources of variability is crucial to building sea-level rise projections relevant to local decision-making. In the twentieth-century record, sites along the U.S. east coast have exhibited typical year-to-year variability of several centimeters. A faster-than-global increase in sea-level rise in the northeastern United States since about 1990 has led some to hypothesize a 'sea-level rise hot spot' in this region, perhaps driven by a trend in the Atlantic Meridional Overturning Circulation related to anthropogenic climate change [1]. However, such hypotheses must be evaluated in the context of natural variability, as revealed by observational and paleo-records. Bayesian and empirical Bayesian statistical approaches are well suited for assimilating data from diverse sources, such as tide-gauges and peats, with differing data availability and uncertainties, and for identifying regionally covarying patterns within these data. We present empirical Bayesian analyses of twentieth-century tide gauge data [2]. We find that the mid-Atlantic region of the United States has experienced a clear acceleration of sea level relative to the global average since about 1990, but this acceleration does not appear to be unprecedented in the twentieth-century record. The rate and extent of this acceleration instead appears comparable to an acceleration observed in the 1930s and 1940s. Both during the earlier episode of acceleration and today, the effect appears to be significantly positively correlated with the Atlantic Multidecadal Oscillation and likely negatively correlated with the North Atlantic Oscillation [2]. The Holocene and Common Era database of geological sea-level rise proxies [3,4] may allow these relationships to be assessed beyond the span of the direct observational record. At a global scale, similar approaches can be employed to look for the spatial fingerprints of land ice

  17. Hip fracture in the elderly: a re-analysis of the EPIDOS study with causal Bayesian networks.

    Directory of Open Access Journals (Sweden)

    Pascal Caillet

    Full Text Available Hip fractures commonly result in permanent disability, institutionalization or death in elderly. Existing hip-fracture predicting tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By use of a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach.EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in which fractures were assessed quarterly during 4 years. A causal Bayesian network and a logistic regression were performed on EPIDOS data to describe major variables involved in hip fractures occurrences.Both models had similar association estimations and predictive performances. They detected gait speed and mineral bone density as variables the most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and seem to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age and bone mineral density.Both approaches retrieved similar variables as predictors of hip fractures. However, Bayesian network highlighted the whole web of relation between the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, intervention focusing concomitantly on gait speed and bone mineral density may be necessary for an optimal prevention of hip fracture occurrence in elderly people.

  18. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Full Text Available Smart cities have been recently recognized as the most pleasing and attractive places to live in; due to this, both scholars and policy-makers pay close attention to this topic. Specifically, urban “smartness” has been identified by plenty of characteristics that can be grouped into six dimensions (Giffinger et al. 2007: smart Economy (competitiveness, smart People (social and human capital, smart Governance (participation, smart Mobility (both ICTs and transport, smart Environment (natural resources, and smart Living (quality of life. According to this analytical framework, in the present paper the relation between urban attractiveness and the “smart” characteristics has been investigated in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, a descriptive statistics has been followed by a regression analysis (OLS, where the dependent variable measuring the urban attractiveness has been proxied by housing market prices. Besides, a Cluster Analysis (CA has been developed in order to find differences and commonalities among the province capitals.The OLS results indicate that living, people and economy are the key drivers for achieving a better urban attractiveness. Environment, instead, keeps on playing a minor role. Besides, the CA groups the province capitals a

  19. Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

    Directory of Open Access Journals (Sweden)

    Briggs William M

    2010-02-01

    Full Text Available Abstract Background Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow for multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3' most tag but will discard a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used leading to increased statistical power. Results Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression. A description is given as to how these algorithms can be applied in that context. Conclusions Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and negative findings due to variation in tag formation probabilities across samples when testing for differential expression.

  20. Gamma prior distribution selection for Bayesian analysis of failure rate and reliability

    International Nuclear Information System (INIS)

    It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure rate lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this report is to present a methodology that can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate lambda simultaneously satisfies the probability statements, P(lambda less than 1.0 x 10-3) equals 0.50 and P(lambda less than 1.0 x 10-5) equals 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure rate percentiles illustrated above, it is possible to use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t0) less than 0.99) equals 0.50 and P(R(t0) less than 0.99999) equals 0.95, for some operating time t0. The report also includes graphs for selected percentiles which assist an engineer in applying the procedure. 28 figures, 16 tables

  1. Reliability Analysis of I and C Architecture of Research Reactors Using Bayesian Networks

    International Nuclear Information System (INIS)

    The objective of this research project is to identify a configuration of architecture which gives highest availability with maintaining low cost of manufacturing. In this regard, two configurations of a single channel of RPS are formulated in the current article and BN models were constructed. Bayesian network analysis was performed to find the reliability features. This is a continuation of study towards the standardization of I and C architecture for low and medium power research reactors. This research is the continuation of study to analyze the reliability of single channel of Reactor Protection System (RPS) using Bayesian networks. The focus of research was on the development of architecture for low power research reactors. What level of reliability is sufficient for protection, safety and control systems in case of low power research reactors? There should be a level which should satisfy all the regulatory requirements as well as operational demands with optimized cost of construction. Scholars, researchers and material investigators from educational and research institutes are demanding for construction of more research reactors. In order to meet this demand and construct more units, it is necessary to do more research in various areas. The research is also needed to make a standardization of research reactor I and C architectures on the same lines of commercial power plants. The research reactors are categorized into two broad categories, Low power research reactors and medium to high power research reactors. According to IAEA TECDOC-1234, Research reactors with 0.250-2.0 MW power rating or 2.5-10 Χ 1011 n/cm2.s. flux are termed low power reactor whereas research reactors ranging from 2-10 MW power rating or 0.1-10 Χ 1013 n/cm2.s. are considered as Medium to High power research reactors. Some other standards (IAEA NP-T-5.1) define multipurpose research reactor ranging from power few hundred KW to 10 MW as low power research reactor

  2. Gamma prior distribution selection for Bayesian analysis of failure rate and reliability

    Energy Technology Data Exchange (ETDEWEB)

    Waller, R.A.; Johnson, M.M.; Waterman, M.S.; Martz, H.F. Jr.

    1976-07-01

    It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure rate lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this report is to present a methodology that can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate lambda simultaneously satisfies the probability statements, P(lambda less than 1.0 x 10/sup -3/) equals 0.50 and P(lambda less than 1.0 x 10/sup -5/) equals 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure rate percentiles illustrated above, it is possible to use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t/sub 0/) less than 0.99) equals 0.50 and P(R(t/sub 0/) less than 0.99999) equals 0.95, for some operating time t/sub 0/. The report also includes graphs for selected percentiles which assist an engineer in applying the procedure. 28 figures, 16 tables.

  3. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  4. Bayesian reliability analysis for non-periodic inspection with estimation of uncertain parameters; Bayesian shinraisei kaiseki wo tekiyoshita hiteiki kozo kensa ni kansuru kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    Itagaki, H. [Yokohama National University, Yokohama (Japan). Faculty of Engineering; Asada, H.; Ito, S. [National Aerospace Laboratory, Tokyo (Japan); Shinozuka, M.

    1996-12-31

    Risk assessed structural positions in a pressurized fuselage of a transport-type aircraft applied with damage tolerance design are taken up as the subject of discussion. A small number of data obtained from inspections on the positions was used to discuss the Bayesian reliability analysis that can estimate also a proper non-periodic inspection schedule, while estimating proper values for uncertain factors. As a result, time period of generating fatigue cracks was determined according to procedure of detailed visual inspections. The analysis method was found capable of estimating values that are thought reasonable and the proper inspection schedule using these values, in spite of placing the fatigue crack progress expression in a very simple form and estimating both factors as the uncertain factors. Thus, the present analysis method was verified of its effectiveness. This study has discussed at the same time the structural positions, modeling of fatigue cracks generated and develop in the positions, conditions for destruction, damage factors, and capability of the inspection from different viewpoints. This reliability analysis method is thought effective also on such other structures as offshore structures. 18 refs., 8 figs., 1 tab.

  5. Application of Bayesian and cost benefit risk analysis in water resources management

    Science.gov (United States)

    Varouchakis, E. A.; Palogos, I.; Karatzas, G. P.

    2016-03-01

    Decision making is a significant tool in water resources management applications. This technical note approaches a decision dilemma that has not yet been considered for the water resources management of a watershed. A common cost-benefit analysis approach, which is novel in the risk analysis of hydrologic/hydraulic applications, and a Bayesian decision analysis are applied to aid the decision making on whether or not to construct a water reservoir for irrigation purposes. The alternative option examined is a scaled parabolic fine variation in terms of over-pumping violations in contrast to common practices that usually consider short-term fines. The methodological steps are analytically presented associated with originally developed code. Such an application, and in such detail, represents new feedback. The results indicate that the probability uncertainty is the driving issue that determines the optimal decision with each methodology, and depending on the unknown probability handling, each methodology may lead to a different optimal decision. Thus, the proposed tool can help decision makers to examine and compare different scenarios using two different approaches before making a decision considering the cost of a hydrologic/hydraulic project and the varied economic charges that water table limit violations can cause inside an audit interval. In contrast to practices that assess the effect of each proposed action separately considering only current knowledge of the examined issue, this tool aids decision making by considering prior information and the sampling distribution of future successful audits.

  6. Efficient Methods for Bayesian Uncertainty Analysis and Global Optimization of Computationally Expensive Environmental Models

    Science.gov (United States)

    Shoemaker, Christine; Espinet, Antoine; Pang, Min

    2015-04-01

    Models of complex environmental systems can be computationally expensive in order to describe the dynamic interactions of the many components over a sizeable time period. Diagnostics of these systems can include forward simulations of calibrated models under uncertainty and analysis of alternatives of systems management. This discussion will focus on applications of new surrogate optimization and uncertainty analysis methods to environmental models that can enhance our ability to extract information and understanding. For complex models, optimization and especially uncertainty analysis can require a large number of model simulations, which is not feasible for computationally expensive models. Surrogate response surfaces can be used in Global Optimization and Uncertainty methods to obtain accurate answers with far fewer model evaluations, which made the methods practical for computationally expensive models for which conventional methods are not feasible. In this paper we will discuss the application of the SOARS surrogate method for estimating Bayesian posterior density functions for model parameters for a TOUGH2 model of geologic carbon sequestration. We will also briefly discuss new parallel surrogate global optimization algorithm applied to two groundwater remediation sites that was implemented on a supercomputer with up to 64 processors. The applications will illustrate the use of these methods to predict the impact of monitoring and management on subsurface contaminants.

  7. Bayesian statistical modeling of spatially correlated error structure in atmospheric tracer inverse analysis

    Directory of Open Access Journals (Sweden)

    C. Mukherjee

    2011-01-01

    Full Text Available Inverse modeling applications in atmospheric chemistry are increasingly addressing the challenging statistical issues of data synthesis by adopting refined statistical analysis methods. This paper advances this line of research by addressing several central questions in inverse modeling, focusing specifically on Bayesian statistical computation. Motivated by problems of refining bottom-up estimates of source/sink fluxes of trace gas and aerosols based on increasingly high-resolution satellite retrievals of atmospheric chemical concentrations, we address head-on the need for integrating formal spatial statistical methods of residual error structure in global scale inversion models. We do this using analytically and computationally tractable spatial statistical models, know as conditional autoregressive spatial models, as components of a global inversion framework. We develop Markov chain Monte Carlo methods to explore and fit these spatial structures in an overall statistical framework that simultaneously estimates source fluxes. Additional aspects of the study extend the statistical framework to utilize priors in a more physically realistic manner, and to formally address and deal with missing data in satellite retrievals. We demonstrate the analysis in the context of inferring carbon monoxide (CO sources constrained by satellite retrievals of column CO from the Measurement of Pollution in the Troposphere (MOPITT instrument on the TERRA satellite, paying special attention to evaluating performance of the inverse approach using various statistical diagnostic metrics. This is developed using synthetic data generated to resemble MOPITT data to define a~proof-of-concept and model assessment, and then in analysis of real MOPITT data.

  8. Bayesian Inference for Neural Electromagnetic Source Localization: Analysis of MEG Visual Evoked Activity

    International Nuclear Information System (INIS)

    We have developed a Bayesian approach to the analysis of neural electromagnetic (MEG/EEG) data that can incorporate or fuse information from other imaging modalities and addresses the ill-posed inverse problem by sarnpliig the many different solutions which could have produced the given data. From these samples one can draw probabilistic inferences about regions of activation. Our source model assumes a variable number of variable size cortical regions of stimulus-correlated activity. An active region consists of locations on the cortical surf ace, within a sphere centered on some location in cortex. The number and radi of active regions can vary to defined maximum values. The goal of the analysis is to determine the posterior probability distribution for the set of parameters that govern the number, location, and extent of active regions. Markov Chain Monte Carlo is used to generate a large sample of sets of parameters distributed according to the posterior distribution. This sample is representative of the many different source distributions that could account for given data, and allows identification of probable (i.e. consistent) features across solutions. Examples of the use of this analysis technique with both simulated and empirical MEG data are presented

  9. Estimation of protein folding free energy barriers from calorimetric data by multi-model Bayesian analysis.

    Science.gov (United States)

    Naganathan, Athi N; Perez-Jimenez, Raul; Muñoz, Victor; Sanchez-Ruiz, Jose M

    2011-10-14

    The realization that folding free energy barriers can be small enough to result in significant population of the species at the barrier top has sprouted in several methods to estimate folding barriers from equilibrium experiments. Some of these approaches are based on fitting the experimental thermogram measured by differential scanning calorimetry (DSC) to a one-dimensional representation of the folding free-energy surface (FES). Different physical models have been used to represent the FES: (1) a Landau quartic polynomial as a function of the total enthalpy, which acts as an order parameter; (2) the projection onto a structural order parameter (i.e. number of native residues or native contacts) of the free energy of all the conformations generated by Ising-like statistical mechanical models; and (3) mean-field models that define conformational entropy and stabilization energy as functions of a continuous local order parameter. The fundamental question that emerges is how can we obtain robust, model-independent estimates of the thermodynamic folding barrier from the analysis of DSC experiments. Here we address this issue by comparing the performance of various FES models in interpreting the thermogram of a protein with a marginal folding barrier. We chose the small α-helical protein PDD, which folds-unfolds in microseconds crossing a free energy barrier previously estimated as ~1 RT. The fits of the PDD thermogram to the various models and assumptions produce FES with a consistently small free energy barrier separating the folded and unfolded ensembles. However, the fits vary in quality as well as in the estimated barrier. Applying Bayesian probabilistic analysis we rank the fit performance using a statistically rigorous criterion that leads to a global estimate of the folding barrier and its precision, which for PDD is 1.3 ± 0.4 kJ mol(-1). This result confirms that PDD folds over a minor barrier consistent with the downhill folding regime. We have further

  10. Bayesian Analysis Made Simple An Excel GUI for WinBUGS

    CERN Document Server

    Woodward, Philip

    2011-01-01

    From simple NLMs to complex GLMMs, this book describes how to use the GUI for WinBUGS - BugsXLA - an Excel add-in written by the author that allows a range of Bayesian models to be easily specified. With case studies throughout, the text shows how to routinely apply even the more complex aspects of model specification, such as GLMMs, outlier robust models, random effects Emax models, auto-regressive errors, and Bayesian variable selection. It provides brief, up-to-date discussions of current issues in the practical application of Bayesian methods. The author also explains how to obtain free so

  11. Monte-Carlo and Bayesian techniques in gravitational wave burst data analysis

    CERN Document Server

    Searle, Antony C

    2008-01-01

    Monte-Carlo simulations are used in the gravitational wave burst detection community to demonstrate and compare the properties of different search techniques. We note that every Monte-Carlo simulation has a corresponding optimal search technique according to both the non-Bayesian Neyman-Pearson criterion and the Bayesian approach, and that this optimal search technique is the Bayesian statistic. When practical, we recommend deriving the optimal statistic for a credible Monte-Carlo simulation, rather than testing ad hoc statistics against that simulation.

  12. EM Clustering Analysis of Diabetes Patients Basic Diagnosis Index

    OpenAIRE

    Wu, Cai; Steinbauer, Jeffrey R.; Kuo, Grace M

    2005-01-01

    Cluster analysis can group similar instances into same group. Partitioning cluster assigns classes to samples without known the classes in advance. Most common algorithms are K-means and Expectation Maximization (EM). EM clustering algorithm can find number of distributions of generating data and build “mixture models”. It identifies groups that are either overlapping or varying sizes and shapes. In this project, by using EM in Machine Learning Algorithm in JAVA (WEKA) syste...

  13. Clustering and Feature Selection using Sparse Principal Component Analysis

    OpenAIRE

    Luss, Ronny; d'Aspremont, Alexandre

    2007-01-01

    In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief int...

  14. Computational model of an infant brain subjected to periodic motion simplified modelling and Bayesian sensitivity analysis.

    Science.gov (United States)

    Batterbee, D C; Sims, N D; Becker, W; Worden, K; Rowson, J

    2011-11-01

    Non-accidental head injury in infants, or shaken baby syndrome, is a highly controversial and disputed topic. Biomechanical studies often suggest that shaking alone cannot cause the classical symptoms, yet many medical experts believe the contrary. Researchers have turned to finite element modelling for a more detailed understanding of the interactions between the brain, skull, cerebrospinal fluid (CSF), and surrounding tissues. However, the uncertainties in such models are significant; these can arise from theoretical approximations, lack of information, and inherent variability. Consequently, this study presents an uncertainty analysis of a finite element model of a human head subject to shaking. Although the model geometry was greatly simplified, fluid-structure-interaction techniques were used to model the brain, skull, and CSF using a Eulerian mesh formulation with penalty-based coupling. Uncertainty and sensitivity measurements were obtained using Bayesian sensitivity analysis, which is a technique that is relatively new to the engineering community. Uncertainty in nine different model parameters was investigated for two different shaking excitations: sinusoidal translation only, and sinusoidal translation plus rotation about the base of the head. The level and type of sensitivity in the results was found to be highly dependent on the excitation type.

  15. Bayesian Analysis for Dynamic Generalized Linear Latent Model with Application to Tree Survival Rate

    Directory of Open Access Journals (Sweden)

    Yu-sheng Cheng

    2014-01-01

    Full Text Available Logistic regression model is the most popular regression technique, available for modeling categorical data especially for dichotomous variables. Classic logistic regression model is typically used to interpret relationship between response variables and explanatory variables. However, in real applications, most data sets are collected in follow-up, which leads to the temporal correlation among the data. In order to characterize the different variables correlations, a new method about the latent variables is introduced in this study. At the same time, the latent variables about AR (1 model are used to depict time dependence. In the framework of Bayesian analysis, parameters estimates and statistical inferences are carried out via Gibbs sampler with Metropolis-Hastings (MH algorithm. Model comparison, based on the Bayes factor, and forecasting/smoothing of the survival rate of the tree are established. A simulation study is conducted to assess the performance of the proposed method and a pika data set is analyzed to illustrate the real application. Since Bayes factor approaches vary significantly, efficiency tests have been performed in order to decide which solution provides a better tool for the analysis of real relational data sets.

  16. Bayesian analysis of uncertainty in predisposing and triggering factors for landslides hazard assessment

    Science.gov (United States)

    Sandric, I.; Petropoulos, Y.; Chitu, Z.; Mihai, B.

    2012-04-01

    The landslide hazard analysis models takes into consideration both predisposing and triggering factors combined into a Bayesian temporal network with uncertainty propagation. The model uses as predisposing factors the first and second derivatives from DEM, the effective precipitations, runoff, lithology and land use. The latter is expressed not as land use classes, as for example CORINE, but as leaf area index. The LAI offers the advantage of modelling not just the changes from different time periods expressed in years, but also the seasonal changes in land use throughout a year. The LAI index was derived from Landsat time series images, starting from 1984 and up to 2011. All the images available for the Panatau administrative unit in Buzau County, Romania, have been downloaded from http://earthexplorer.usgs.gov, including the images with cloud cover. The model is run in a monthly time step and for each time step all the parameters values, a-priory, conditional and posterior probability are obtained and stored in a log file. The validation process uses landslides that have occurred during the period up to the active time step and checks the records of the probabilities and parameters values for those times steps with the values of the active time step. Each time a landslide has been positive identified new a-priory probabilities are recorded for each parameter. A complete log for the entire model is saved and used for statistical analysis and a NETCDF file is created

  17. Bayesian analysis of anisotropic cosmologies: Bianchi VII_h and WMAP

    CERN Document Server

    McEwen, J D; Feeney, S M; Peiris, H V; Lasenby, A N

    2013-01-01

    We perform a definitive analysis of Bianchi VII_h cosmologies with WMAP observations of the cosmic microwave background (CMB) temperature anisotropies. Bayesian analysis techniques are developed to study anisotropic cosmologies using full-sky and partial-sky, masked CMB temperature data. We apply these techniques to analyse the full-sky internal linear combination (ILC) map and a partial-sky, masked W-band map of WMAP 9-year observations. In addition to the physically motivated Bianchi VII_h model, we examine phenomenological models considered in previous studies, in which the Bianchi VII_h parameters are decoupled from the standard cosmological parameters. In the two phenomenological models considered, Bayes factors of 1.7 and 1.1 units of log-evidence favouring a Bianchi component are found in full-sky ILC data. The corresponding best-fit Bianchi maps recovered are similar for both phenomenological models and are very close to those found in previous studies using earlier WMAP data releases. However, no evi...

  18. BayesPeak: Bayesian analysis of ChIP-seq data

    Directory of Open Access Journals (Sweden)

    Stark Rory

    2009-09-01

    Full Text Available Abstract Background High-throughput sequencing technology has become popular and widely used to study protein and DNA interactions. Chromatin immunoprecipitation, followed by sequencing of the resulting samples, produces large amounts of data that can be used to map genomic features such as transcription factor binding sites and histone modifications. Methods Our proposed statistical algorithm, BayesPeak, uses a fully Bayesian hidden Markov model to detect enriched locations in the genome. The structure accommodates the natural features of the Solexa/Illumina sequencing data and allows for overdispersion in the abundance of reads in different regions. Moreover, a control sample can be incorporated in the analysis to account for experimental and sequence biases. Markov chain Monte Carlo algorithms are applied to estimate the posterior distributions of the model parameters, and posterior probabilities are used to detect the sites of interest. Conclusion We have presented a flexible approach for identifying peaks from ChIP-seq reads, suitable for use on both transcription factor binding and histone modification data. Our method estimates probabilities of enrichment that can be used in downstream analysis. The method is assessed using experimentally verified data and is shown to provide high-confidence calls with low false positive rates.

  19. Updating reliability data using feedback analysis: feasibility of a Bayesian subjective method

    International Nuclear Information System (INIS)

    For years, EDF has used Probabilistic Safety Assessment to evaluate a global indicator of the safety of its nuclear power plants and to optimize the performance while ensuring a certain safety level. Therefore, robustness and relevancy of PSA are very important. That is the reason why EDF wants to improve the relevancy of the reliability parameters used in these models. This article aims to propose a Bayesian approach to build PSA parameters when feedback data is not large enough to use the frequentist method. Our method is called subjective because its purpose is to give engineers pragmatic criteria to apply Bayesian in a controlled and consistent way. Using Bayesian is quite common for example in the United States, because the nuclear power plants are less standardized. Bayesian is often used with generic data as prior. So we have to adapt the general methodology within EDF context. (authors)

  20. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    Full Text Available In this paper, we propose an hybrid clustering based classification algorithm based on mean approach to effectively classify to mine the ordered sequences (paths from weblog data in order to perform social network analysis. In the system proposed in this work for social pattern analysis, the sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. In this proposed system, a robust Modified Boosting algorithm is proposed to hybrid clustering based classification for clustering the data. This work is useful to provide connection between the aggregated features from the network data and traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm and hence it is proved that of provides better classification accuracy when tested with Weblog dataset. In addition, this algorithm improves the predictive performance especially for multiclass datasets which can increases the accuracy.

  1. Development of Hierarchical Bayesian Model Based on Regional Frequency Analysis and Its Application to Estimate Areal Rainfall in South Korea

    Science.gov (United States)

    Kim, J.; Kwon, H. H.

    2014-12-01

    The existing regional frequency analysis has disadvantages in that it is difficult to consider geographical characteristics in estimating areal rainfall. In this regard, This study aims to develop a hierarchical Bayesian model based regional frequency analysis in that spatial patterns of the design rainfall with geographical information are explicitly incorporated. This study assumes that the parameters of Gumbel distribution are a function of geographical characteristics (e.g. altitude, latitude and longitude) within a general linear regression framework. Posterior distributions of the regression parameters are estimated by Bayesian Markov Chain Monte Calro (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the Gumbel distribution by using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. It was found that the proposed Bayesian regional frequency analysis model showed similar results compared to L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying uncertainty of the design rainfall and estimating the area rainfall considering geographical information. Acknowledgement: This research was supported by a grant (14AWMP-B079364-01) from Water Management Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.

  2. Hierarchical Cluster Analysis – Various Approaches to Data Preparation

    Directory of Open Access Journals (Sweden)

    Z. Pacáková

    2013-09-01

    Full Text Available The article deals with two various approaches to data preparation to avoid multicollinearity. The aim of the article is to find similarities among the e-communication level of EU states using hierarchical cluster analysis. The original set of fourteen indicators was first reduced on the basis of correlation analysis while in case of high correlation indicator of higher variability was included in further analysis. Secondly the data were transformed using principal component analysis while the principal components are poorly correlated. For further analysis five principal components explaining about 92% of variance were selected. Hierarchical cluster analysis was performed both based on the reduced data set and the principal component scores. Both times three clusters were assumed following Pseudo t-Squared and Pseudo F Statistic, but the final clusters were not identical. An important characteristic to compare the two results found was to look at the proportion of variance accounted for by the clusters which was about ten percent higher for the principal component scores (57.8% compared to 47%. Therefore it can be stated, that in case of using principal component scores as an input variables for cluster analysis with explained proportion high enough (about 92% for in our analysis, the loss of information is lower compared to data reduction on the basis of correlation analysis.

  3. No control genes required: Bayesian analysis of qRT-PCR data.

    Directory of Open Access Journals (Sweden)

    Mikhail V Matz

    Full Text Available BACKGROUND: Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. RESULTS: In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts. Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. CONCLUSIONS: Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been

  4. Cyclist activity and injury risk analysis at signalized intersections: a Bayesian modelling approach.

    Science.gov (United States)

    Strauss, Jillian; Miranda-Moreno, Luis F; Morency, Patrick

    2013-10-01

    This study proposes a two-equation Bayesian modelling approach to simultaneously study cyclist injury occurrence and bicycle activity at signalized intersections as joint outcomes. This approach deals with the potential presence of endogeneity and unobserved heterogeneities and is used to identify factors associated with both cyclist injuries and volumes. Its application to identify high-risk corridors is also illustrated. Montreal, Quebec, Canada is the application environment, using an extensive inventory of a large sample of signalized intersections containing disaggregate motor-vehicle traffic volumes and bicycle flows, geometric design, traffic control and built environment characteristics in the vicinity of the intersections. Cyclist injury data for the period of 2003-2008 is used in this study. Also, manual bicycle counts were standardized using temporal and weather adjustment factors to obtain average annual daily volumes. Results confirm and quantify the effects of both bicycle and motor-vehicle flows on cyclist injury occurrence. Accordingly, more cyclists at an intersection translate into more cyclist injuries but lower injury rates due to the non-linear association between bicycle volume and injury occurrence. Furthermore, the results emphasize the importance of turning motor-vehicle movements. The presence of bus stops and total crosswalk length increase cyclist injury occurrence whereas the presence of a raised median has the opposite effect. Bicycle activity through intersections was found to increase as employment, number of metro stations, land use mix, area of commercial land use type, length of bicycle facilities and the presence of schools within 50-800 m of the intersection increase. Intersections with three approaches are expected to have fewer cyclists than those with four. Using Bayesian analysis, expected injury frequency and injury rates were estimated for each intersection and used to rank corridors. Corridors with high bicycle volumes

  5. Genotype-Based Bayesian Analysis of Gene-Environment Interactions with Multiple Genetic Markers and Misclassification in Environmental Factors

    OpenAIRE

    Iryna Lobach; Ruzong Fan

    2012-01-01

    A key component to understanding etiology of complex diseases, such as cancer, diabetes, alcohol dependence, is to investigate gene-environment interactions. This work is motivated by the following two concerns in the analysis of gene-environment interactions. First, multiple genetic markers in moderate linkage disequilibrium may be involved in susceptibility to a complex disease. Second, environmental factors may be subject to misclassification. We develop a genotype based Bayesian pseudolik...

  6. Bayesian Word Sense Induction

    OpenAIRE

    Brody, Samuel; Lapata, Mirella

    2009-01-01

    Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word’s contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samp...

  7. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis.

    Science.gov (United States)

    Rubio, Francisco J; Genton, Marc G

    2016-06-30

    We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26856806

  8. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan

    2010-09-28

    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  9. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

    KAUST Repository

    Rubio, Francisco J.

    2016-02-09

    We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.

  10. Bayesian time series analysis of segments of the Rocky Mountain trumpeter swan population

    Science.gov (United States)

    Wright, Christopher K.; Sojda, Richard S.; Goodman, Daniel

    2002-01-01

    A Bayesian time series analysis technique, the dynamic linear model, was used to analyze counts of Trumpeter Swans (Cygnus buccinator) summering in Idaho, Montana, and Wyoming from 1931 to 2000. For the Yellowstone National Park segment of white birds (sub-adults and adults combined) the estimated probability of a positive growth rate is 0.01. The estimated probability of achieving the Subcommittee on Rocky Mountain Trumpeter Swans 2002 population goal of 40 white birds for the Yellowstone segment is less than 0.01. Outside of Yellowstone National Park, Wyoming white birds are estimated to have a 0.79 probability of a positive growth rate with a 0.05 probability of achieving the 2002 objective of 120 white birds. In the Centennial Valley in southwest Montana, results indicate a probability of 0.87 that the white bird population is growing at a positive rate with considerable uncertainty. The estimated probability of achieving the 2002 Centennial Valley objective of 160 white birds is 0.14 but under an alternative model falls to 0.04. The estimated probability that the Targhee National Forest segment of white birds has a positive growth rate is 0.03. In Idaho outside of the Targhee National Forest, white birds are estimated to have a 0.97 probability of a positive growth rate with a 0.18 probability of attaining the 2002 goal of 150 white birds.

  11. Bayesian analysis of diagnostic test accuracy when disease state is unverified for some subjects.

    Science.gov (United States)

    Pennello, Gene A

    2011-09-01

    Studies of the accuracy of medical tests to diagnose the presence or absence of disease can suffer from an inability to verify the true disease state in everyone. When verification is missing at random (MAR), the missing data mechanism can be ignored in likelihood-based inference. However, this assumption may not hold even approximately. When verification is nonignorably missing, the most general model of the distribution of disease state, test result, and verification indicator is overparameterized. Parameters are only partially identified, creating regions of ignorance for maximum likelihood estimators. For studies of a single test, we use Bayesian analysis to implement the most general nonignorable model, a reduced nonignorable model with identifiable parameters, and the MAR model. Simple Gibbs sampling algorithms are derived that enable computation of the posterior distribution of test accuracy parameters. In particular, the posterior distribution is easily obtained for the most general nonignorable model, which makes relatively weak assumptions about the missing data mechanism. For this model, the posterior distribution combines two sources of uncertainty: ignorance in the estimation of partially identified parameters, and imprecision due to finite sampling variability. We compare the three models on data from a study of the accuracy of scintigraphy to diagnose liver disease.

  12. Composite behavior analysis for video surveillance using hierarchical dynamic Bayesian networks

    Science.gov (United States)

    Cheng, Huanhuan; Shan, Yong; Wang, Runsheng

    2011-03-01

    Analyzing composite behaviors involving objects from multiple categories in surveillance videos is a challenging task due to the complicated relationships among human and objects. This paper presents a novel behavior analysis framework using a hierarchical dynamic Bayesian network (DBN) for video surveillance systems. The model is built for extracting objects' behaviors and their relationships by representing behaviors using spatial-temporal characteristics. The recognition of object behaviors is processed by the DBN at multiple levels: features of objects at low level, objects and their relationships at middle level, and event at high level, where event refers to behaviors of a single type object as well as behaviors consisting of several types of objects such as ``a person getting in a car.'' Furthermore, to reduce the complexity, a simple model selection criterion is addressed, by which the appropriated model is picked out from a pool of candidate models. Experiments are shown to demonstrate that the proposed framework could efficiently recognize and semantically describe composite object and human activities in surveillance videos.

  13. Bayesian analysis of the kinetics of quantal transmitter secretion at the neuromuscular junction.

    Science.gov (United States)

    Saveliev, Anatoly; Khuzakhmetova, Venera; Samigullin, Dmitry; Skorinkin, Andrey; Kovyazina, Irina; Nikolsky, Eugeny; Bukharaeva, Ellya

    2015-10-01

    The timing of transmitter release from nerve endings is considered nowadays as one of the factors determining the plasticity and efficacy of synaptic transmission. In the neuromuscular junction, the moments of release of individual acetylcholine quanta are related to the synaptic delays of uniquantal endplate currents recorded under conditions of lowered extracellular calcium. Using Bayesian modelling, we performed a statistical analysis of synaptic delays in mouse neuromuscular junction with different patterns of rhythmic nerve stimulation and when the entry of calcium ions into the nerve terminal was modified. We have obtained a statistical model of the release timing which is represented as the summation of two independent statistical distributions. The first of these is the exponentially modified Gaussian distribution. The mixture of normal and exponential components in this distribution can be interpreted as a two-stage mechanism of early and late periods of phasic synchronous secretion. The parameters of this distribution depend on both the stimulation frequency of the motor nerve and the calcium ions' entry conditions. The second distribution was modelled as quasi-uniform, with parameters independent of nerve stimulation frequency and calcium entry. Two different probability density functions for the distribution of synaptic delays suggest at least two independent processes controlling the time course of secretion, one of them potentially involving two stages. The relative contribution of these processes to the total number of mediator quanta released depends differently on the motor nerve stimulation pattern and on calcium ion entry into nerve endings. PMID:26129670

  14. A virtual experimental technique for data collection for a Bayesian network approach to human reliability analysis

    International Nuclear Information System (INIS)

    Bayesian network (BN) is a powerful tool for human reliability analysis (HRA) as it can characterize the dependency among different human performance shaping factors (PSFs) and associated actions. It can also quantify the importance of different PSFs that may cause a human error. Data required to fully quantify BN for HRA in offshore emergency situations are not readily available. For many situations, there is little or no appropriate data. This presents significant challenges to assign the prior and conditional probabilities that are required by the BN approach. To handle the data scarcity problem, this paper presents a data collection methodology using a virtual environment for a simplified BN model of offshore emergency evacuation. A two-level, three-factor experiment is used to collect human performance data under different mustering conditions. Collected data are integrated in the BN model and results are compared with a previous study. The work demonstrates that the BN model can assess the human failure likelihood effectively. Besides, the BN model provides the opportunities to incorporate new evidence and handle complex interactions among PSFs and associated actions

  15. Bayesian analysis of sparse anisotropic universe models and application to the 5-yr WMAP data

    CERN Document Server

    Groeneboom, Nicolaas E

    2008-01-01

    We extend the previously described CMB Gibbs sampling framework to allow for exact Bayesian analysis of anisotropic universe models, and apply this method to the 5-year WMAP temperature observations. This involves adding support for non-diagonal signal covariance matrices, and implementing a general spectral parameter MCMC sampler. As a worked example we apply these techniques to the model recently introduced by Ackerman et al., describing for instance violations of rotational invariance during the inflationary epoch. After verifying the code with simulated data, we analyze the foreground-reduced 5-year WMAP temperature sky maps. For l < 400 and the W-band data, we find tentative evidence for a preferred direction pointing towards (l,b) = (110 deg, 10 deg) with an anisotropy amplitude of g* = 0.15 +- 0.039, nominally equivalent to a 3.8 sigma detection. Similar results are obtained from the V-band data [g* = 0.11 +- 0.039; (l,b) = (130 deg, 20 deg)]. Further, the preferred direction is stable with respect ...

  16. Bayesian analysis of heavy-tailed and long-range dependent Processes

    Science.gov (United States)

    Graves, Timothy; Watkins, Nick; Gramacy, Robert; Franzke, Christian

    2014-05-01

    We have used MCMC algorithms to perform a Bayesian analysis of Auto-Regressive Fractionally-Integrated Moving-Average ARFIMA(p,d,q) processes, which are capable of modelling long range dependence (e.g. Beran et al, 2013). Our principal aim is to obtain inference about the long memory parameter, d, with secondary interest in the scale and location parameters. We have developed a reversible-jump method enabling us to integrate over different model forms for the short memory component. We initially assume Gaussianity, and have tested the method on both synthetic and physical time series. We have extended the ARFIMA model by weakening the Gaussianity assumption, assuming an alpha-stable, heavy tailed, distribution for the innovations, and performing joint inference on d and alpha. We will present a study of the dependence of the posterior variance of the memory parameter d on the length of the time series considered. This will be compared with equivalent error diagnostics for other popular measures of d.

  17. Bayesian analysis of the linear reaction norm model with unknown covariates.

    Science.gov (United States)

    Su, G; Madsen, P; Lund, M S; Sorensen, D; Korsgaard, I R; Jensen, J

    2006-07-01

    The reaction norm model is becoming a popular approach for the analysis of genotype x environment interactions. In a classical reaction norm model, the expression of a genotype in different environments is described as a linear function (a reaction norm) of an environmental gradient or value. An environmental value is typically defined as the mean performance of all genotypes in the environment, which is usually unknown. One approximation is to estimate the mean phenotypic performance in each environment and then treat these estimates as known covariates in the model. However, a more satisfactory alternative is to infer environmental values simultaneously with the other parameters of the model. This study describes a method and its Bayesian Markov Chain Monte Carlo implementation that makes this possible. Frequentist properties of the proposed method are tested in a simulation study. Estimates of parameters of interest agree well with the true values. Further, inferences about genetic parameters from the proposed method are similar to those derived from a reaction norm model using true environmental values. On the other hand, using phenotypic means as proxies for environmental values results in poor inferences.

  18. Adverse and Advantageous Selection in the Medicare Supplemental Market: A Bayesian Analysis of Prescription drug Expenditure.

    Science.gov (United States)

    Li, Qian; Trivedi, Pravin K

    2016-02-01

    This paper develops an extended specification of the two-part model, which controls for unobservable self-selection and heterogeneity of health insurance, and analyzes the impact of Medicare supplemental plans on the prescription drug expenditure of the elderly, using a linked data set based on the Medicare Current Beneficiary Survey data for 2003-2004. The econometric analysis is conducted using a Bayesian econometric framework. We estimate the treatment effects for different counterfactuals and find significant evidence of endogeneity in plan choice and the presence of both adverse and advantageous selections in the supplemental insurance market. The average incentive effect is estimated to be $757 (2004 value) or 41% increase per person per year for the elderly enrolled in supplemental plans with drug coverage against the Medicare fee-for-service counterfactual and is $350 or 21% against the supplemental plans without drug coverage counterfactual. The incentive effect varies by different sources of drug coverage: highest for employer-sponsored insurance plans, followed by Medigap and managed medicare plans.

  19. Assessment of occupational safety risks in Floridian solid waste systems using Bayesian analysis.

    Science.gov (United States)

    Bastani, Mehrad; Celik, Nurcin

    2015-10-01

    Safety risks embedded within solid waste management systems continue to be a significant issue and are prevalent at every step in the solid waste management process. To recognise and address these occupational hazards, it is necessary to discover the potential safety concerns that cause them, as well as their direct and/or indirect impacts on the different types of solid waste workers. In this research, our goal is to statistically assess occupational safety risks to solid waste workers in the state of Florida. Here, we first review the related standard industrial codes to major solid waste management methods including recycling, incineration, landfilling, and composting. Then, a quantitative assessment of major risks is conducted based on the data collected using a Bayesian data analysis and predictive methods. The risks estimated in this study for the period of 2005-2012 are then compared with historical statistics (1993-1997) from previous assessment studies. The results have shown that the injury rates among refuse collectors in both musculoskeletal and dermal injuries have decreased from 88 and 15 to 16 and three injuries per 1000 workers, respectively. However, a contrasting trend is observed for the injury rates among recycling workers, for whom musculoskeletal and dermal injuries have increased from 13 and four injuries to 14 and six injuries per 1000 workers, respectively. Lastly, a linear regression model has been proposed to identify major elements of the high number of musculoskeletal and dermal injuries. PMID:26219294

  20. Paired Comparison Analysis of the van Baaren Model Using Bayesian Approach with Noninformative Prior

    Directory of Open Access Journals (Sweden)

    Saima Altaf

    2012-03-01

    Full Text Available 800x600 Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif";} One technique being commonly studied these days because of its attractive applications for the comparison of several objects is the method of paired comparisons. This technique permits the ranking of the objects by means of a score, which reflects the merit of the items on a linear scale. The present study is concerned with the Bayesian analysis of a paired comparison model, namely the van Baaren model VI using noninformative uniform prior. For this purpose, the joint posterior distribution for the parameters of the model, their marginal distributions, posterior estimates (means and modes, the posterior probabilities for comparing the two treatment parameters and the predictive probabilities are obtained.

  1. Adverse and Advantageous Selection in the Medicare Supplemental Market: A Bayesian Analysis of Prescription drug Expenditure.

    Science.gov (United States)

    Li, Qian; Trivedi, Pravin K

    2016-02-01

    This paper develops an extended specification of the two-part model, which controls for unobservable self-selection and heterogeneity of health insurance, and analyzes the impact of Medicare supplemental plans on the prescription drug expenditure of the elderly, using a linked data set based on the Medicare Current Beneficiary Survey data for 2003-2004. The econometric analysis is conducted using a Bayesian econometric framework. We estimate the treatment effects for different counterfactuals and find significant evidence of endogeneity in plan choice and the presence of both adverse and advantageous selections in the supplemental insurance market. The average incentive effect is estimated to be $757 (2004 value) or 41% increase per person per year for the elderly enrolled in supplemental plans with drug coverage against the Medicare fee-for-service counterfactual and is $350 or 21% against the supplemental plans without drug coverage counterfactual. The incentive effect varies by different sources of drug coverage: highest for employer-sponsored insurance plans, followed by Medigap and managed medicare plans. PMID:25504934

  2. Bayesian oligogenic analysis of quantitative and qualitative traits in general pedigrees.

    Science.gov (United States)

    Uimari, P; Sillanpää, M J

    2001-11-01

    A Bayesian method for multipoint oligogenic analysis of quantitative and qualitative traits is presented. This method can be applied to general pedigrees, which do not necessarily have to be "peelable" and can have large numbers of markers. The number of quantitative/qualitative trait loci (QTL), their map positions in the genome, and phenotypic effects (mode of inheritances) are all estimated simultaneously within the same framework. The summaries of the estimated parameters are based on the marginal posterior distributions that are obtained through Markov chain Monte Carlo (MCMC) methods. The method uses founder alleles together with segregation indicators in order to determine the genotypes of the trait loci of all individuals in the pedigree. To improve mixing properties of the sampler, we propose (1) joint sampling of map position and segregation indicators, (2) omitting data augmentation for untyped or uninformative markers (homozygous parent), and (3) updating several markers jointly within a single block. The performance of the method was tested with two replicate GAW10 data sets (considering two levels of available marker information). The results were concordant and similar to those presented earlier with other methods. These analyses clearly illustrate the utility and wide applicability of the method. PMID:11668579

  3. Bayesian analysis of spatial-dependent cosmic-ray propagation: astrophysical background of antiprotons and positrons

    CERN Document Server

    Feng, Jie; Oliva, Alberto

    2016-01-01

    The AMS-02 experiment has reported a new measurement of the antiproton/proton ratio in Galactic cosmic rays (CRs). In the energy range $E\\sim\\,$60-450 GeV, this ratio is found to be remarkably constant. Using recent data on CR proton, helium, carbon, 10Be/9Be, and B/C ratio, we have performed a global Bayesian analysis based on a Markov-Chain Monte-Carlo sampling algorithm under a "two halo model" of CR propagation. In this model, CRs are allowed to experience a different type of diffusion when they propagate in the region close of the Galactic disk. We found that the vertical extent of this region is about 900 pc above and below the disk, and the corresponding diffusion coefficient scales with energy as $D\\sim\\,E^{0.15}$, describing well the observations on primary CR spectra, secondary/primary ratios and anisotropy. Under this model we have carried out improved calculations of antiparticle spectra arising from secondary CR production and their corresponding uncertainties. We made use of Monte-Carlo generato...

  4. Bayesian biostatistics

    CERN Document Server

    Lesaffre, Emmanuel

    2012-01-01

    The growth of biostatistics has been phenomenal in recent years and has been marked by considerable technical innovation in both methodology and computational practicality. One area that has experienced significant growth is Bayesian methods. The growing use of Bayesian methodology has taken place partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. In addition, computational advances have allowed for more complex models to be fitted routinely to realistic data sets. Through examples, exercises and a combination of introd

  5. Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

    Science.gov (United States)

    Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

    2016-01-01

    Background The classification of obstructive sleep apnea is on the basis of sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230

  6. Bayesian belief networks for human reliability analysis: A review of applications and gaps

    International Nuclear Information System (INIS)

    The use of Bayesian Belief Networks (BBNs) in risk analysis (and in particular Human Reliability Analysis, HRA) is fostered by a number of features, attractive in fields with shortage of data and consequent reliance on subjective judgments: the intuitive graphical representation, the possibility of combining diverse sources of information, the use the probabilistic framework to characterize uncertainties. In HRA, BBN applications are steadily increasing, each emphasizing a different BBN feature or a different HRA aspect to improve. This paper aims at a critical review of these features as well as at suggesting research needs. Five groups of BBN applications are analysed: modelling of organizational factors, analysis of the relationships among failure influencing factors, BBN-based extensions of existing HRA methods, dependency assessment among human failure events, assessment of situation awareness. Further, the paper analyses the process for building BBNs and in particular how expert judgment is used in the assessment of the BBN conditional probability distributions. The gaps identified in the review suggest the need for establishing more systematic frameworks to integrate the different sources of information relevant for HRA (cognitive models, empirical data, and expert judgment) and to investigate algorithms to avoid elicitation of many relationships via expert judgment. - Highlights: • We analyze BBN uses for HRA applications; but some conclusions can be generalized. • Special review focus on BBN building approaches, key for model acceptance. • Gaps relate to the transparency of the BBN building and quantification phases. • Need for more systematic framework to integrate different sources of information. • Need of ways to avoid elicitation of many relationships via expert judgment

  7. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

    Directory of Open Access Journals (Sweden)

    Hero Alfred

    2010-11-01

    Full Text Available Abstract Background Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP, the Indian Buffet Process (IBP, and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB analysis. Results Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV, Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD, closely related non-Bayesian approaches. Conclusions Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  8. Bayesian statistics

    OpenAIRE

    Draper, D.

    2001-01-01

    © 2012 Springer Science+Business Media, LLC. All rights reserved. Article Outline: Glossary Definition of the Subject and Introduction The Bayesian Statistical Paradigm Three Examples Comparison with the Frequentist Statistical Paradigm Future Directions Bibliography

  9. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    Science.gov (United States)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduced an evaluation model based on fuzzy cluster algorithm of logistics enterprises. First of all,we present the evaluation index system which contains basic information, management level, technical strength, transport capacity,informatization level, market competition and customer service. We decided the index weight according to the grades, and evaluated integrate ability of the logistics enterprises using fuzzy cluster analysis method. In this thesis, we introduced the system evaluation module and cluster analysis module in detail and described how we achieved these two modules. At last, we gave the result of the system.

  10. Cancer incidence in men: a cluster analysis of spatial patterns

    Directory of Open Access Journals (Sweden)

    D'Alò Daniela

    2008-11-01

    Full Text Available Abstract Background Spatial clustering of different diseases has received much less attention than single disease mapping. Besides chance or artifact, clustering of different cancers in a given area may depend on exposure to a shared risk factor or to multiple correlated factors (e.g. cigarette smoking and obesity in a deprived area. Models developed so far to investigate co-occurrence of diseases are not well-suited for analyzing many cancers simultaneously. In this paper we propose a simple two-step exploratory method for screening clusters of different cancers in a population. Methods Cancer incidence data were derived from the regional cancer registry of Umbria, Italy. A cluster analysis was performed on smoothed and non-smoothed standardized incidence ratios (SIRs of the 13 most frequent cancers in males. The Besag, York and Mollie model (BYM and Poisson kriging were used to produce smoothed SIRs. Results Cluster analysis on non-smoothed SIRs was poorly informative in terms of clustering of different cancers, as only larynx and oral cavity were grouped, and of characteristic patterns of cancer incidence in specific geographical areas. On the other hand BYM and Poisson kriging gave similar results, showing cancers of the oral cavity, larynx, esophagus, stomach and liver formed a main cluster. Lung and urinary bladder cancers clustered together but not with the cancers mentioned above. Both methods, particularly the BYM model, identified distinct geographic clusters of adjacent areas. Conclusion As in single disease mapping, non-smoothed SIRs do not provide reliable estimates of cancer risks because of small area variability. The BYM model produces smooth risk surfaces which, when entered into a cluster analysis, identify well-defined geographical clusters of adjacent areas. It probably enhances or amplifies the signal arising from exposure of more areas (statistical units to shared risk factors that are associated with different cancers. In

  11. Cluster analysis for anomaly detection in accounting data : an audit approach

    OpenAIRE

    Thiprungsri, Sutapat; Vasarhelyi, Miklos A.

    2011-01-01

    This study examines the application of cluster analysis in the accounting domain, particularly discrepancy detection in audit. Cluster analysis groups data so that points within a single group or cluster are similar to one another and distinct from points in other clusters. Clustering has been shown to be a good candidate for anomaly detection. The purpose of this study is to examine the use of clustering technology to automate fraud filtering during an audit. We use cluster analysis to help ...

  12. Bayesian hierarchical multi-subject multiscale analysis of functional MRI data.

    Science.gov (United States)

    Sanyal, Nilotpal; Ferreira, Marco A R

    2012-11-15

    We develop a methodology for Bayesian hierarchical multi-subject multiscale analysis of functional Magnetic Resonance Imaging (fMRI) data. We begin by modeling the brain images temporally with a standard general linear model. After that, we transform the resulting estimated standardized regression coefficient maps through a discrete wavelet transformation to obtain a sparse representation in the wavelet space. Subsequently, we assign to the wavelet coefficients a prior that is a mixture of a point mass at zero and a Gaussian white noise. In this mixture prior for the wavelet coefficients, the mixture probabilities are related to the pattern of brain activity across different resolutions. To incorporate this information, we assume that the mixture probabilities for wavelet coefficients at the same location and level are common across subjects. Furthermore, we assign for the mixture probabilities a prior that depends on a few hyperparameters. We develop an empirical Bayes methodology to estimate the hyperparameters and, as these hyperparameters are shared by all subjects, we obtain precise estimated values. Then we carry out inference in the wavelet space and obtain smoothed images of the regression coefficients by applying the inverse wavelet transform to the posterior means of the wavelet coefficients. An application to computer simulated synthetic data has shown that, when compared to single-subject analysis, our multi-subject methodology performs better in terms of mean squared error. Finally, we illustrate the utility and flexibility of our multi-subject methodology with an application to an event-related fMRI dataset generated by Postle (2005) through a multi-subject fMRI study of working memory related brain activation. PMID:22951257

  13. A Bayesian Network Approach for Offshore Risk Analysis Through Linguistic Variables

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    This paper presents a new approach for offshore risk analysis that is capable of dealing with linguistic probabilities in Bayesian networks (BNs). In this paper, linguistic probabilities are used to describe occurrence likelihood of hazardous events that may cause possible accidents in offshore operations. In order to use fuzzy information, an f-weighted valuation function is proposed to transform linguistic judgements into crisp probability distributions which can be easily put into a BN to model causal relationships among risk factors. The use of linguistic variables makes it easier for human experts to express their knowledge, and the transformation of linguistic judgements into crisp probabilities can significantly save the cost of computation, modifying and maintaining a BN model. The flexibility of the method allows for multiple forms of information to be used to quantify model relationships, including formally assessed expert opinion when quantitative data are lacking, or when only qualitative or vague statements can be made. The model is a modular representation of uncertain knowledge caused due to randomness, vagueness and ignorance. This makes the risk analysis of offshore engineering systems more functional and easier in many assessment contexts. Specifically, the proposed f-weighted valuation function takes into account not only the dominating values, but also the α-level values that are ignored by conventional valuation methods. A case study of the collision risk between a Floating Production, Storage and Off-loading (FPSO) unit and the authorised vessels due to human elements during operation is used to illustrate the application of the proposed model.

  14. Reduction of a Whole-Body Physiologically Based Pharmacokinetic Model to Stabilise the Bayesian Analysis of Clinical Data.

    Science.gov (United States)

    Wendling, Thierry; Tsamandouras, Nikolaos; Dumitras, Swati; Pigeolet, Etienne; Ogungbenro, Kayode; Aarons, Leon

    2016-01-01

    Whole-body physiologically based pharmacokinetic (PBPK) models are increasingly used in drug development for their ability to predict drug concentrations in clinically relevant tissues and to extrapolate across species, experimental conditions and sub-populations. A whole-body PBPK model can be fitted to clinical data using a Bayesian population approach. However, the analysis might be time consuming and numerically unstable if prior information on the model parameters is too vague given the complexity of the system. We suggest an approach where (i) a whole-body PBPK model is formally reduced using a Bayesian proper lumping method to retain the mechanistic interpretation of the system and account for parameter uncertainty, (ii) the simplified model is fitted to clinical data using Markov Chain Monte Carlo techniques and (iii) the optimised reduced PBPK model is used for extrapolation. A previously developed 16-compartment whole-body PBPK model for mavoglurant was reduced to 7 compartments while preserving plasma concentration-time profiles (median and variance) and giving emphasis to the brain (target site) and the liver (elimination site). The reduced model was numerically more stable than the whole-body model for the Bayesian analysis of mavoglurant pharmacokinetic data in healthy adult volunteers. Finally, the reduced yet mechanistic model could easily be scaled from adults to children and predict mavoglurant pharmacokinetics in children aged from 3 to 11 years with similar performance compared with the whole-body model. This study is a first example of the practicality of formal reduction of complex mechanistic models for Bayesian inference in drug development.

  15. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan

    2004-01-01

    We describe a system for exact inference with relational Bayesian networks as defined in the publicly available \\primula\\ tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...... and differentiating these circuits in time linear in their size. We report on experimental results showing the successful compilation, and efficient inference, on relational Bayesian networks whose {\\primula}--generated propositional instances have thousands of variables, and whose jointrees have clusters...

  16. Comparative analysis of genomic signal processing for microarray data clustering.

    Science.gov (United States)

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  17. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  18. Bayesian analysis of rotating machines - A statistical approach to estimate and track the fundamental frequency

    DEFF Research Database (Denmark)

    Pedersen, Thorkild Find

    2003-01-01

    frequency estimation techniques are considered for predicting the true fundamental frequency from measured acoustic noise or vibration signal. Among the methods are auto-correlation based methods, subspace methods, interpolated Fourier transform methods, and adaptive filters. A modified version...... of an adaptive comb filter is derived for tracking non-stationary signals. The estimation problem is then rephrased in terms of the Bayesian statistical framework. In the Bayesian framework both parameters and observations are considered stochastic processes. The result of the estimation is an expression...... for the probability density function (PDF) of the parameters conditioned on observation. Considering the fundamental frequency as a parameter and the acoustic and vibration signals as observations, a novel Bayesian frequency estimator is developed. With simulations the new estimator is shown to be superior to any...

  19. Cluster Analysis of Gene Expression Data

    CERN Document Server

    Domany, E

    2002-01-01

    The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample - such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, that come in the form of a table, of several thousand rows (one for each gene) and 50 - 100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what is gene expression and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from a...

  20. Statistical fractal analysis of 25 young star clusters

    CERN Document Server

    Gregorio-Hetem, J; Santos-Silva, T; Fernandes, B

    2015-01-01

    A large sample of young stellar groups is analysed aiming to investigate their clustering properties and dynamical evolution. A comparison of the Q statistical parameter, measured for the clusters, with the fractal dimension estimated for the projected clouds shows that 52% of the sample has substructures and tends to follow the theoretically expected relation between clusters and clouds, according to calculations for artificial distribution of points. The fractal statistics was also compared to structural parameters revealing that clusters having radial density profile show a trend of parameter s increasing with mean surface stellar density. The core radius of the sample, as a function of age, follows a distribution similar to that observed in stellar groups of Milky Way and other galaxies. They also have dynamical age, indicated by their crossing time that is similar to unbound associations. The statistical analysis allowed us to separate the sample into two groups showing different clustering characteristi...

  1. Hierarchical Bayesian Spatio-Temporal Analysis of Climatic and Socio-Economic Determinants of Rocky Mountain Spotted Fever.

    Science.gov (United States)

    Raghavan, Ram K; Goodin, Douglas G; Neises, Daniel; Anderson, Gary A; Ganta, Roman R

    2016-01-01

    This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF) prevalence in four contiguous states of Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space and time only trends and spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence levels of RMSF is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C) in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases are discussed. PMID:26942604

  2. Hierarchical Bayesian Spatio-Temporal Analysis of Climatic and Socio-Economic Determinants of Rocky Mountain Spotted Fever.

    Directory of Open Access Journals (Sweden)

    Ram K Raghavan

    Full Text Available This study aims to examine the spatio-temporal dynamics of Rocky Mountain spotted fever (RMSF prevalence in four contiguous states of Midwestern United States, and to determine the impact of environmental and socio-economic factors associated with this disease. Bayesian hierarchical models were used to quantify space and time only trends and spatio-temporal interaction effect in the case reports submitted to the state health departments in the region. Various socio-economic, environmental and climatic covariates screened a priori in a bivariate procedure were added to a main-effects Bayesian model in progressive steps to evaluate important drivers of RMSF space-time patterns in the region. Our results show a steady increase in RMSF incidence over the study period to newer geographic areas, and the posterior probabilities of county-specific trends indicate clustering of high risk counties in the central and southern parts of the study region. At the spatial scale of a county, the prevalence levels of RMSF is influenced by poverty status, average relative humidity, and average land surface temperature (>35°C in the region, and the relevance of these factors in the context of climate-change impacts on tick-borne diseases are discussed.

  3. Bayesian Analysis of Linear Inverse Problems with Applications in Economics and Finance

    OpenAIRE

    De Simoni, Anna

    2009-01-01

    In my PhD thesis I propose a Bayesian nonparametric estimation method for structural econometric models where the functional parameter of interest describes the economic agent's behavior. The structural parameter is characterized as the solution of a functional equation, or by using more technical words, as the solution of an inverse problem that can be either ill-posed or well-posed. From a Bayesian point of view, the parameter of interest is a random function and the solution to the infe...

  4. Use of Bayesian Inference in Crystallographic Structure Refinement via Full Diffraction Profile Analysis

    Science.gov (United States)

    Fancher, Chris M.; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J.; Smith, Ralph C.; Wilson, Alyson G.; Jones, Jacob L.

    2016-01-01

    A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method. PMID:27550221

  5. Bayesian Analysis for Risk Assessment of Selected Medical Events in Support of the Integrated Medical Model Effort

    Science.gov (United States)

    Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.

    2012-01-01

    The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.

  6. New class of hybrid EoS and Bayesian M - R data analysis

    Energy Technology Data Exchange (ETDEWEB)

    Alvarez-Castillo, D. [JINR Dubna, Bogoliubov Laboratory of Theoretical Physics, Dubna (Russian Federation); Ayriyan, A.; Grigorian, H. [JINR Dubna, Laboratory of Information Technologies, Dubna (Russian Federation); Benic, S. [University of Zagreb, Department of Physics, Zagreb (Croatia); Blaschke, D. [JINR Dubna, Bogoliubov Laboratory of Theoretical Physics, Dubna (Russian Federation); National Research Nuclear University (MEPhI), Moscow (Russian Federation); Typel, S. [GSI Helmholtzzentrum fuer Schwerionenforschung GmbH, Darmstadt (Germany)

    2016-03-15

    We explore systematically a new class of two-phase equations of state (EoS) for hybrid stars that is characterized by three main features: (1) stiffening of the nuclear EoS at supersaturation densities due to quark exchange effects (Pauli blocking) between hadrons, modelled by an excluded volume correction; (2) stiffening of the quark matter EoS at high densities due to multiquark interactions; and (3) possibility for a strong first-order phase transition with an early onset and large density jump. The third feature results from a Maxwell construction for the possible transition from the nuclear to a quark matter phase and its properties depend on the two parameters used for (1) and (2), respectively. Varying these two parameters, one obtains a class of hybrid EoS that yields solutions of the Tolman-Oppenheimer-Volkoff (TOV) equations for sequences of hadronic and hybrid stars in the mass-radius diagram which cover the full range of patterns according to the Alford-Han-Prakash classification following which a hybrid star branch can be either absent, connected or disconnected with the hadronic one. The latter case often includes a tiny connected branch. The disconnected hybrid star branch, also called ''third family'', corresponds to high-mass twin stars characterized by the same gravitational mass but different radii. We perform a Bayesian analysis and demonstrate that the observation of such a pair of high-mass twin stars would have a sufficient discriminating power to favor hybrid EoS with a strong first-order phase transition over alternative EoS. (orig.)

  7. Bayesian approach to the analysis of neutron Brillouin scattering data on liquid metals.

    Science.gov (United States)

    De Francesco, A; Guarini, E; Bafile, U; Formisano, F; Scaccia, L

    2016-08-01

    When the dynamics of liquids and disordered systems at mesoscopic level is investigated by means of inelastic scattering (e.g., neutron or x ray), spectra are often characterized by a poor definition of the excitation lines and spectroscopic features in general and one important issue is to establish how many of these lines need to be included in the modeling function and to estimate their parameters. Furthermore, when strongly damped excitations are present, commonly used and widespread fitting algorithms are particularly affected by the choice of initial values of the parameters. An inadequate choice may lead to an inefficient exploration of the parameter space, resulting in the algorithm getting stuck in a local minimum. In this paper, we present a Bayesian approach to the analysis of neutron Brillouin scattering data in which the number of excitation lines is treated as unknown and estimated along with the other model parameters. We propose a joint estimation procedure based on a reversible-jump Markov chain Monte Carlo algorithm, which efficiently explores the parameter space, producing a probabilistic measure to quantify the uncertainty on the number of excitation lines as well as reliable parameter estimates. The method proposed could turn out of great importance in extracting physical information from experimental data, especially when the detection of spectral features is complicated not only because of the properties of the sample, but also because of the limited instrumental resolution and count statistics. The approach is tested on generated data set and then applied to real experimental spectra of neutron Brillouin scattering from a liquid metal, previously analyzed in a more traditional way. PMID:27627410

  8. A method of spherical harmonic analysis in the geosciences via hierarchical Bayesian inference

    Science.gov (United States)

    Muir, J. B.; Tkalčić, H.

    2015-11-01

    The problem of decomposing irregular data on the sphere into a set of spherical harmonics is common in many fields of geosciences where it is necessary to build a quantitative understanding of a globally varying field. For example, in global seismology, a compressional or shear wave speed that emerges from tomographic images is used to interpret current state and composition of the mantle, and in geomagnetism, secular variation of magnetic field intensity measured at the surface is studied to better understand the changes in the Earth's core. Optimization methods are widely used for spherical harmonic analysis of irregular data, but they typically do not treat the dependence of the uncertainty estimates on the imposed regularization. This can cause significant difficulties in interpretation, especially when the best-fit model requires more variables as a result of underestimating data noise. Here, with the above limitations in mind, the problem of spherical harmonic expansion of irregular data is treated within the hierarchical Bayesian framework. The hierarchical approach significantly simplifies the problem by removing the need for regularization terms and user-supplied noise estimates. The use of the corrected Akaike Information Criterion for picking the optimal maximum degree of spherical harmonic expansion and the resulting spherical harmonic analyses are first illustrated on a noisy synthetic data set. Subsequently, the method is applied to two global data sets sensitive to the Earth's inner core and lowermost mantle, consisting of PKPab-df and PcP-P differential traveltime residuals relative to a spherically symmetric Earth model. The posterior probability distributions for each spherical harmonic coefficient are calculated via Markov Chain Monte Carlo sampling; the uncertainty obtained for the coefficients thus reflects the noise present in the real data and the imperfections in the spherical harmonic expansion.

  9. Bayesian Analysis of Non-Gaussian Long-Range Dependent Processes

    Science.gov (United States)

    Graves, Timothy; Watkins, Nicholas; Franzke, Christian; Gramacy, Robert

    2013-04-01

    Recent studies [e.g. the Antarctic study of Franzke, J. Climate, 2010] have strongly suggested that surface temperatures exhibit long-range dependence (LRD). The presence of LRD would hamper the identification of deterministic trends and the quantification of their significance. It is well established that LRD processes exhibit stochastic trends over rather long periods of time. Thus, accurate methods for discriminating between physical processes that possess long memory and those that do not are an important adjunct to climate modeling. As we briefly review, the LRD idea originated at the same time as H-selfsimilarity, so it is often not realised that a model does not have to be H-self similar to show LRD [e.g. Watkins, GRL Frontiers, 2013]. We have used Markov Chain Monte Carlo algorithms to perform a Bayesian analysis of Auto-Regressive Fractionally-Integrated Moving-Average ARFIMA(p,d,q) processes, which are capable of modeling LRD. Our principal aim is to obtain inference about the long memory parameter, d, with secondary interest in the scale and location parameters. We have developed a reversible-jump method enabling us to integrate over different model forms for the short memory component. We initially assume Gaussianity, and have tested the method on both synthetic and physical time series. Many physical processes, for example the Faraday Antarctic time series, are significantly non-Gaussian. We have therefore extended this work by weakening the Gaussianity assumption, assuming an alpha-stable distribution for the innovations, and performing joint inference on d and alpha. Such a modified FARIMA(p,d,q) process is a flexible, initial model for non-Gaussian processes with long memory. We will present a study of the dependence of the posterior variance of the memory parameter d on the length of the time series considered. This will be compared with equivalent error diagnostics for other measures of d.

  10. Cancer mortality inequalities in urban areas: a Bayesian small area analysis in Spanish cities

    Directory of Open Access Journals (Sweden)

    Martos Carmen M

    2011-01-01

    Full Text Available Abstract Background Intra-urban inequalities in mortality have been infrequently analysed in European contexts. The aim of the present study was to analyse patterns of cancer mortality and their relationship with socioeconomic deprivation in small areas in 11 Spanish cities. Methods It is a cross-sectional ecological design using mortality data (years 1996-2003. Units of analysis were the census tracts. A deprivation index was calculated for each census tract. In order to control the variability in estimating the risk of dying we used Bayesian models. We present the RR of the census tract with the highest deprivation vs. the census tract with the lowest deprivation. Results In the case of men, socioeconomic inequalities are observed in total cancer mortality in all cities, except in Castellon, Cordoba and Vigo, while Barcelona (RR = 1.53 95%CI 1.42-1.67, Madrid (RR = 1.57 95%CI 1.49-1.65 and Seville (RR = 1.53 95%CI 1.36-1.74 present the greatest inequalities. In general Barcelona and Madrid, present inequalities for most types of cancer. Among women for total cancer mortality, inequalities have only been found in Barcelona and Zaragoza. The excess number of cancer deaths due to socioeconomic deprivation was 16,413 for men and 1,142 for women. Conclusion This study has analysed inequalities in cancer mortality in small areas of cities in Spain, not only relating this mortality with socioeconomic deprivation, but also calculating the excess mortality which may be attributed to such deprivation. This knowledge is particularly useful to determine which geographical areas in each city need intersectorial policies in order to promote a healthy environment.

  11. Mapping, Bayesian Geostatistical Analysis and Spatial Prediction of Lymphatic Filariasis Prevalence in Africa

    Science.gov (United States)

    Slater, Hannah; Michael, Edwin

    2013-01-01

    There is increasing interest to control or eradicate the major neglected tropical diseases. Accurate modelling of the geographic distributions of parasitic infections will be crucial to this endeavour. We used 664 community level infection prevalence data collated from the published literature in conjunction with eight environmental variables, altitude and population density, and a multivariate Bayesian generalized linear spatial model that allows explicit accounting for spatial autocorrelation and incorporation of uncertainty in input data and model parameters, to construct the first spatially-explicit map describing LF prevalence distribution in Africa. We also ran the best-fit model against predictions made by the HADCM3 and CCCMA climate models for 2050 to predict the likely distributions of LF under future climate and population changes. We show that LF prevalence is strongly influenced by spatial autocorrelation between locations but is only weakly associated with environmental covariates. Infection prevalence, however, is found to be related to variations in population density. All associations with key environmental/demographic variables appear to be complex and non-linear. LF prevalence is predicted to be highly heterogenous across Africa, with high prevalences (>20%) estimated to occur primarily along coastal West and East Africa, and lowest prevalences predicted for the central part of the continent. Error maps, however, indicate a need for further surveys to overcome problems with data scarcity in the latter and other regions. Analysis of future changes in prevalence indicates that population growth rather than climate change per se will represent the dominant factor in the predicted increase/decrease and spread of LF on the continent. We indicate that these results could play an important role in aiding the development of strategies that are best able to achieve the goals of parasite elimination locally and globally in a manner that may also account

  12. Bayesian approach to the analysis of neutron Brillouin scattering data on liquid metals

    Science.gov (United States)

    De Francesco, A.; Guarini, E.; Bafile, U.; Formisano, F.; Scaccia, L.

    2016-08-01

    When the dynamics of liquids and disordered systems at mesoscopic level is investigated by means of inelastic scattering (e.g., neutron or x ray), spectra are often characterized by a poor definition of the excitation lines and spectroscopic features in general and one important issue is to establish how many of these lines need to be included in the modeling function and to estimate their parameters. Furthermore, when strongly damped excitations are present, commonly used and widespread fitting algorithms are particularly affected by the choice of initial values of the parameters. An inadequate choice may lead to an inefficient exploration of the parameter space, resulting in the algorithm getting stuck in a local minimum. In this paper, we present a Bayesian approach to the analysis of neutron Brillouin scattering data in which the number of excitation lines is treated as unknown and estimated along with the other model parameters. We propose a joint estimation procedure based on a reversible-jump Markov chain Monte Carlo algorithm, which efficiently explores the parameter space, producing a probabilistic measure to quantify the uncertainty on the number of excitation lines as well as reliable parameter estimates. The method proposed could turn out of great importance in extracting physical information from experimental data, especially when the detection of spectral features is complicated not only because of the properties of the sample, but also because of the limited instrumental resolution and count statistics. The approach is tested on generated data set and then applied to real experimental spectra of neutron Brillouin scattering from a liquid metal, previously analyzed in a more traditional way.

  13. New class of hybrid EoS and Bayesian M - R data analysis

    International Nuclear Information System (INIS)

    We explore systematically a new class of two-phase equations of state (EoS) for hybrid stars that is characterized by three main features: (1) stiffening of the nuclear EoS at supersaturation densities due to quark exchange effects (Pauli blocking) between hadrons, modelled by an excluded volume correction; (2) stiffening of the quark matter EoS at high densities due to multiquark interactions; and (3) possibility for a strong first-order phase transition with an early onset and large density jump. The third feature results from a Maxwell construction for the possible transition from the nuclear to a quark matter phase and its properties depend on the two parameters used for (1) and (2), respectively. Varying these two parameters, one obtains a class of hybrid EoS that yields solutions of the Tolman-Oppenheimer-Volkoff (TOV) equations for sequences of hadronic and hybrid stars in the mass-radius diagram which cover the full range of patterns according to the Alford-Han-Prakash classification following which a hybrid star branch can be either absent, connected or disconnected with the hadronic one. The latter case often includes a tiny connected branch. The disconnected hybrid star branch, also called ''third family'', corresponds to high-mass twin stars characterized by the same gravitational mass but different radii. We perform a Bayesian analysis and demonstrate that the observation of such a pair of high-mass twin stars would have a sufficient discriminating power to favor hybrid EoS with a strong first-order phase transition over alternative EoS. (orig.)

  14. New class of hybrid EoS and Bayesian M - R data analysis

    Science.gov (United States)

    Alvarez-Castillo, D.; Ayriyan, A.; Benic, S.; Blaschke, D.; Grigorian, H.; Typel, S.

    2016-03-01

    We explore systematically a new class of two-phase equations of state (EoS) for hybrid stars that is characterized by three main features: 1) stiffening of the nuclear EoS at supersaturation densities due to quark exchange effects (Pauli blocking) between hadrons, modelled by an excluded volume correction; 2) stiffening of the quark matter EoS at high densities due to multiquark interactions; and 3) possibility for a strong first-order phase transition with an early onset and large density jump. The third feature results from a Maxwell construction for the possible transition from the nuclear to a quark matter phase and its properties depend on the two parameters used for 1) and 2), respectively. Varying these two parameters, one obtains a class of hybrid EoS that yields solutions of the Tolman-Oppenheimer-Volkoff (TOV) equations for sequences of hadronic and hybrid stars in the mass-radius diagram which cover the full range of patterns according to the Alford-Han-Prakash classification following which a hybrid star branch can be either absent, connected or disconnected with the hadronic one. The latter case often includes a tiny connected branch. The disconnected hybrid star branch, also called "third family", corresponds to high-mass twin stars characterized by the same gravitational mass but different radii. We perform a Bayesian analysis and demonstrate that the observation of such a pair of high-mass twin stars would have a sufficient discriminating power to favor hybrid EoS with a strong first-order phase transition over alternative EoS.

  15. Sentiment analysis. An example of application and evaluation of RID dictionary and Bayesian classification methods in qualitative data analysis approach

    Directory of Open Access Journals (Sweden)

    Krzysztof Tomanek

    2014-05-01

    Full Text Available The purpose of this article is to present the basic methods for classifying text data. These methods make use of achievements earned in areas such as: natural language processing, the analysis of unstructured data. I introduce and compare two analytical techniques applied to text data. The first analysis makes use of thematic vocabulary tool (sentiment analysis. The second technique uses the idea of Bayesian classification and applies, so-called, naive Bayes algorithm. My comparison goes towards grading the efficiency of use of these two analytical techniques. I emphasize solutions that are to be used to build dictionary accurate for the task of text classification. Then, I compare supervised classification to automated unsupervised analysis’ effectiveness. These results reinforce the conclusion that a dictionary which has received good evaluation as a tool for classification should be subjected to review and modification procedures if is to be applied to new empirical material. Adaptation procedures used for analytical dictionary become, in my proposed approach, the basic step in the methodology of textual data analysis.

  16. Cluster analysis of WIBS single particle bioaerosol data

    Directory of Open Access Journals (Sweden)

    N. H. Robinson

    2012-09-01

    Full Text Available Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensor (WIBS. The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics is improved.

  17. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: VARIABLES CLUSTERED INTO THREE INDEPENDENT DIMENSIONS: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1, high symptomatology and comorbidities (Cluster 2, and high symptomatology but low comorbidities (Cluster 3, showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  18. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  19. Performance Analysis of Enhanced Clustering Algorithm for Gene Expression Data

    CERN Document Server

    Chandrasekhar, T; Elayaraja, E

    2011-01-01

    Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this paper we applied K-Means with Automatic Generations of Merge Factor for ISODATA- AGMFI. Though AGMFI has been applied for clustering of Gene Expression Data, this proposed Enhanced Automatic Generations of Merge Factor for ISODATA- EAGMFI Algorithms overcome the drawbacks of AGMFI in terms of specifying the optimal number of clusters and initialization of good cluster centroids. Experimental results on Gene Expression Data show that the proposed EAGMFI algorithms could identify compact clus...

  20. Variable cluster analysis method for building neural network model

    Institute of Scientific and Technical Information of China (English)

    王海东; 刘元东

    2004-01-01

    To address the problems that input variables should be reduced as much as possible and explain output variables fully in building neural network model of complicated system, a variable selection method based on cluster analysis was investigated. Similarity coefficient which describes the mutual relation of variables was defined. The methods of the highest contribution rate, part replacing whole and variable replacement are put forwarded and deduced by information theory. The software of the neural network based on cluster analysis, which can provide many kinds of methods for defining variable similarity coefficient, clustering system variable and evaluating variable cluster, was developed and applied to build neural network forecast model of cement clinker quality. The results show that all the network scale, training time and prediction accuracy are perfect. The practical application demonstrates that the method of selecting variables for neural network is feasible and effective.

  1. Bayesian analysis of spatial point processes in the neighbourhood of Voronoi networks

    DEFF Research Database (Denmark)

    Skare, Øivind; Møller, Jesper; Vedel Jensen, Eva B.

    A model for an inhomogeneous Poisson process with high intensity near the edges of a Voronoi tessellation in 2D or 3D is proposed. The model is analysed in a Bayesian setting with priors on nuclei of the Voronoi tessellation and other model parameters. An MCMC algorithm is constructed to sample...

  2. Bayesian analysis of spatial point processes in the neighbourhood of Voronoi networks

    DEFF Research Database (Denmark)

    Skare, Øivind; Møller, Jesper; Jensen, Eva B. Vedel

    2007-01-01

    A model for an inhomogeneous Poisson process with high intensity near the edges of a Voronoi tessellation in 2D or 3D is proposed. The model is analysed in a Bayesian setting with priors on nuclei of the Voronoi tessellation and other model parameters. An MCMC algorithm is constructed to sample...

  3. Another look at Bayesian analysis of AMMI models for genotype-environment data

    NARCIS (Netherlands)

    Josse, J.; Eeuwijk, van F.A.; Piepho, H.P.; Denis, J.B.

    2014-01-01

    Linear–bilinear models are frequently used to analyze two-way data such as genotype-by-environment data. A well-known example of this class of models is the additive main effects and multiplicative interaction effects model (AMMI). We propose a new Bayesian treatment of such models offering a proper

  4. Bayesian analysis of censored response data in family-based genetic association studies.

    Science.gov (United States)

    Del Greco M, Fabiola; Pattaro, Cristian; Minelli, Cosetta; Thompson, John R

    2016-09-01

    Biomarkers are subject to censoring whenever some measurements are not quantifiable given a laboratory detection limit. Methods for handling censoring have received less attention in genetic epidemiology, and censored data are still often replaced with a fixed value. We compared different strategies for handling a left-censored continuous biomarker in a family-based study, where the biomarker is tested for association with a genetic variant, S, adjusting for a covariate, X. Allowing different correlations between X and S, we compared simple substitution of censored observations with the detection limit followed by a linear mixed effect model (LMM), Bayesian model with noninformative priors, Tobit model with robust standard errors, the multiple imputation (MI) with and without S in the imputation followed by a LMM. Our comparison was based on real and simulated data in which 20% and 40% censoring were artificially induced. The complete data were also analyzed with a LMM. In the MICROS study, the Bayesian model gave results closer to those obtained with the complete data. In the simulations, simple substitution was always the most biased method, the Tobit approach gave the least biased estimates at all censoring levels and correlation values, the Bayesian model and both MI approaches gave slightly biased estimates but smaller root mean square errors. On the basis of these results the Bayesian approach is highly recommended for candidate gene studies; however, the computationally simpler Tobit and the MI without S are both good options for genome-wide studies.

  5. How few countries will do? Comparative survey analysis from a Bayesian perspective

    NARCIS (Netherlands)

    Hox, Joop; van de Schoot, Rens; Matthijsse, Suzette

    2012-01-01

    Meuleman and Billiet (2009) have carried out a simulation study aimed at the question how many countries are needed for accurate multilevel SEM estimation in comparative studies. The authors concluded that a sample of 50 to 100 countries is needed for accurate estimation. Recently, Bayesian estimati

  6. Hierarchical Bayesian Analysis of Biased Beliefs and Distributional Other-Regarding Preferences

    Directory of Open Access Journals (Sweden)

    Jeroen Weesie

    2013-02-01

    Full Text Available This study investigates the relationship between an actor’s beliefs about others’ other-regarding (social preferences and her own other-regarding preferences, using an “avant-garde” hierarchical Bayesian method. We estimate two distributional other-regarding preference parameters, α and β, of actors using incentivized choice data in binary Dictator Games. Simultaneously, we estimate the distribution of actors’ beliefs about others α and β, conditional on actors’ own α and β, with incentivized belief elicitation. We demonstrate the benefits of the Bayesian method compared to it’s hierarchical frequentist counterparts. Results show a positive association between an actor’s own (α; β and her beliefs about average(α; β in the population. The association between own preferences and the variance in beliefs about others’ preferences in the population, however, is curvilinear for α and insignificant for β. These results are partially consistent with the cone effect [1,2] which is described in detail below. Because in the Bayesian-Nash equilibrium concept, beliefs and own preferences are assumed to be independent, these results cast doubt on the application of the Bayesian-Nash equilibrium concept to experimental data.

  7. Spatial Data Mining using Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ch.N.Santhosh Kumar

    2012-09-01

    Full Text Available Data mining, which is refers to as Knowledge Discovery in Databases(KDD, means a process of nontrivialexaction of implicit, previously useful and unknown information such as knowledge rules, descriptions,regularities, and major trends from large databases. Data mining is evolved in a multidisciplinary field ,including database technology, machine learning, artificial intelligence, neural network, informationretrieval, and so on. In principle data mining should be applicable to the different kind of data and databasesused in many different applications, including relational databases, transactional databases, datawarehouses, object- oriented databases, and special application- oriented databases such as spatialdatabases, temporal databases, multimedia databases, and time- series databases. Spatial data mining, alsocalled spatial mining, is data mining as applied to the spatial data or spatial databases. Spatial data are thedata that have spatial or location component, and they show the information, which is more complex thanclassical data. A spatial database stores spatial data represents by spatial data types and spatialrelationships and among data. Spatial data mining encompasses various tasks. These include spatialclassification, spatial association rule mining, spatial clustering, characteristic rules, discriminant rules,trend detection. This paper presents how spatial data mining is achieved using clustering.

  8. Advanced Heat Map and Clustering Analysis Using Heatmap3

    OpenAIRE

    Shilin Zhao; Yan Guo; Quanhu Sheng; Yu Shyr

    2014-01-01

    Heat maps and clustering are used frequently in expression analysis studies for data visualization and quality control. Simple clustering and heat maps can be produced from the “heatmap” function in R. However, the “heatmap” function lacks certain functionalities and customizability, preventing it from generating advanced heat maps and dendrograms. To tackle the limitations of the “heatmap” function, we have developed an R package “heatmap3” which significantly improves the original “heatmap”...

  9. Fuzzy clustering analysis to study geomagnetic coastal effects

    Directory of Open Access Journals (Sweden)

    M. Sridharan

    2005-06-01

    Full Text Available The utility of fuzzy set theory in cluster analysis and pattern recognition has been evolving since the mid 1960s, in conjunction with the emergence and evolution of computer technology. The classification of objects into categories is the subject of cluster analysis. The aim of this paper is to employ Fuzzy-clustering technique to examine the interrelationship of geomagnetic coastal and other effects at Indian observatories. Data from the observatories used for the present studies are from Alibag on the West Coast, Visakhapatnam and Pondicherry on the East Coast, Hyderabad and Nagpur as central inland stations which are located far from either of the coasts; all the above stations are free from the influence of the daytime equatorial electrojet. It has been found that Alibag and Pondicherry Observatories form a separate cluster showing anomalous variations in the vertical (Z-component. H- and D-components form different clusters. The results are compared with the graphical method. Analytical technique and the results of Fuzzy-clustering analysis are discussed here.

  10. Application of Subspace Clustering in DNA Sequence Analysis.

    Science.gov (United States)

    Wallace, Tim; Sekmen, Ali; Wang, Xiaofei

    2015-10-01

    Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis. PMID:26162018

  11. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    2001-01-01

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposit with different origins. The method deviates from standard methods of following dispersal of rad

  12. Bayesian Inference on Gravitational Waves

    Directory of Open Access Journals (Sweden)

    Asad Ali

    2015-12-01

    Full Text Available The Bayesian approach is increasingly becoming popular among the astrophysics data analysis communities. However, the Pakistan statistics communities are unaware of this fertile interaction between the two disciplines. Bayesian methods have been in use to address astronomical problems since the very birth of the Bayes probability in eighteenth century. Today the Bayesian methods for the detection and parameter estimation of gravitational waves have solid theoretical grounds with a strong promise for the realistic applications. This article aims to introduce the Pakistan statistics communities to the applications of Bayesian Monte Carlo methods in the analysis of gravitational wave data with an  overview of the Bayesian signal detection and estimation methods and demonstration by a couple of simplified examples.

  13. Bayesian community detection

    DEFF Research Database (Denmark)

    Mørup, Morten; Schmidt, Mikkel N

    2012-01-01

    Many networks of scientific interest naturally decompose into clusters or communities with comparatively fewer external than internal links; however, current Bayesian models of network communities do not exert this intuitive notion of communities. We formulate a nonparametric Bayesian model...... for community detection consistent with an intuitive definition of communities and present a Markov chain Monte Carlo procedure for inferring the community structure. A Matlab toolbox with the proposed inference procedure is available for download. On synthetic and real networks, our model detects communities...... consistent with ground truth, and on real networks, it outperforms existing approaches in predicting missing links. This suggests that community structure is an important structural property of networks that should be explicitly modeled....

  14. Performance Analysis of Enhanced Clustering Algorithm for Gene Expression Data

    Directory of Open Access Journals (Sweden)

    T. Chandrasekhar

    2011-11-01

    Full Text Available Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this paper we applied K-Means with Automatic Generations of Merge Factor for ISODATA- AGMFI. Though AGMFI has been applied for clustering of Gene Expression Data, this proposed Enhanced Automatic Generations of Merge Factor for ISODATA- EAGMFI Algorithms overcome the drawbacks of AGMFI in terms of specifying the optimal number of clusters and initialization of good cluster centroids. Experimental results on Gene Expression Data show that the proposed EAGMFI algorithms could identify compact clusters with perform well in terms of the Silhouette Coefficients cluster measure.

  15. Clustering and Feature Selection using Sparse Principal Component Analysis

    CERN Document Server

    Luss, Ronny

    2007-01-01

    In this paper, we use sparse principal component analysis (PCA) to solve clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief introduction and motivation on sparse PCA and detail our implementation of the algorithm in d'Aspremont et al. (2005). We finish by describing the application of sparse PCA to clustering and by a brief description of DSPCA, the numerical package used in these experiments.

  16. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-01-01

    Full Text Available Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord. Myelin provides insulation for nerve cells improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. Ms is characterized by life threatening symptoms such as; loss of balance, hearing problem and depression. The application of Fuzzy Cluster Means (FCM or Fuzzy CMean analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Application of cluster analysis involves a sequence of methodological and analytical decision steps that enhances the quality and meaning of the clusters produced. Uncertainties associated with analysis of multiple sclerosis test data are eliminated by the system

  17. A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.

    Science.gov (United States)

    Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung

    2016-03-01

    With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the

  18. A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.

    Science.gov (United States)

    Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung

    2016-03-01

    With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the

  19. Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis

    Directory of Open Access Journals (Sweden)

    Posch Stefan

    2010-03-01

    Full Text Available Abstract Background One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared with the goal of improving their performance and to deepen our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While in many comparative studies different learning principles or different statistical models have been compared, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked, and possibly lead to questionable conclusions. Results With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly-used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites. Conclusions We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different

  20. Bayesian Analysis of Hmi Images and Comparison to Tsi Variations and MWO Image Observables

    Science.gov (United States)

    Parker, D. G.; Ulrich, R. K.; Beck, J.; Tran, T. V.

    2015-12-01

    We have previously applied the Bayesian automatic classification system AutoClass to solar magnetogram and intensity images from the 150 Foot Solar Tower at Mount Wilson to identify classes of solar surface features associated with variations in total solar irradiance (TSI) and, using those identifications, modeled TSI time series with improved accuracy (r > 0.96). (Ulrich, et al, 2010) AutoClass identifies classes by a two-step process in which it: (1) finds, without human supervision, a set of class definitions based on specified attributes of a sample of the image data pixels, such as magnetic field and intensity in the case of MWO images, and (2) applies the class definitions thus found to new data sets to identify automatically in them the classes found in the sample set. HMI high resolution images capture four observables-magnetic field, continuum intensity, line depth and line width-in contrast to MWO's two observables-magnetic field and intensity. In this study, we apply AutoClass to the HMI observables for images from June, 2010 to December, 2014 to identify solar surface feature classes. We use contemporaneous TSI measurements to determine whether and how variations in the HMI classes are related to TSI variations and compare the characteristic statistics of the HMI classes to those found from MWO images. We also attempt to derive scale factors between the HMI and MWO magnetic and intensity observables.The ability to categorize automatically surface features in the HMI images holds out the promise of consistent, relatively quick and manageable analysis of the large quantity of data available in these images. Given that the classes found in MWO images using AutoClass have been found to improve modeling of TSI, application of AutoClass to the more complex HMI images should enhance understanding of the physical processes at work in solar surface features and their implications for the solar-terrestrial environment.Ulrich, R.K., Parker, D, Bertello, L. and

  1. Bayesian simultaneous equation models for the analysis of energy intake and partitioning in growing pigs

    DEFF Research Database (Denmark)

    Strathe, Anders Bjerring; Jørgensen, Henry; Kebreab, E;

    2012-01-01

    ABSTRACT SUMMARY The objective of the current study was to develop Bayesian simultaneous equation models for modelling energy intake and partitioning in growing pigs. A key feature of the Bayesian approach is that parameters are assigned prior distributions, which may reflect the current state...... of nature. In the models, rates of metabolizable energy (ME) intake, protein deposition (PD) and lipid deposition (LD) were treated as dependent variables accounting for residuals being correlated. Two complementary equation systems were used to model ME intake (MEI), PD and LD. Informative priors were...... genders (barrows, boars and gilts) selected on the basis of similar birth weight. The pigs were fed four diets based on barley, wheat and soybean meal supplemented with crystalline amino acids to meet or exceed Danish nutrient requirement standards. Nutrient balances and gas exchanges were measured at c...

  2. Stellar magnetic field parameters from a Bayesian analysis of high-resolution spectropolarimetric observations

    CERN Document Server

    Petit, V

    2011-01-01

    In this paper we describe a Bayesian statistical method designed to infer the magnetic properties of stars observed using high-resolution circular spectropolarimetry in the context of large surveys. This approach is well suited for analysing stars for which the stellar rotation period is not known, and therefore the rotational phases of the observations are ambiguous. The model assumes that the magnetic observations correspond to a dipole oblique rotator, a situation commonly encountered in intermediate and high-mass stars. Using reasonable assumptions regarding the model parameter prior probability density distributions, the Bayesian algorithm determines the posterior probability densities corresponding to the surface magnetic field geometry and strength by performing a comparison between the observed and computed Stokes V profiles. Based on the results of numerical simulations, we conclude that this method yields a useful estimate of the surface dipole field strength based on a small number (i.e. 1 or 2) of...

  3. A Bayesian stochastic frontier analysis of Chinese fossil-fuel electricity generation companies

    International Nuclear Information System (INIS)

    This paper analyses the technical efficiency of Chinese fossil-fuel electricity generation companies from 1999 to 2011, using a Bayesian stochastic frontier model. The results reveal that efficiency varies among the fossil-fuel electricity generation companies that were analysed. We also focus on the factors of size, location, government ownership and mixed sources of electricity generation for the fossil-fuel electricity generation companies, and also examine their effects on the efficiency of these companies. Policy implications are derived. - Highlights: • We analyze the efficiency of 27 quoted Chinese fossil-fuel electricity generation companies during 1999–2011. • We adopt a Bayesian stochastic frontier model taking into consideration the identified heterogeneity. • With reform background in Chinese energy industry, we propose four hypotheses and check their influence on efficiency. • Big size, coastal location, government control and hydro energy sources all have increased costs

  4. PARALLEL ADAPTIVE MULTILEVEL SAMPLING ALGORITHMS FOR THE BAYESIAN ANALYSIS OF MATHEMATICAL MODELS

    KAUST Repository

    Prudencio, Ernesto

    2012-01-01

    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.

  5. Traffic Accident, System Model and Cluster Analysis in GIS

    Directory of Open Access Journals (Sweden)

    Veronika Vlčková

    2015-07-01

    Full Text Available One of the many often frequented topics as normal journalism, so the professional public, is the problem of traffic accidents. This article illustrates the orientation of considerations to a less known context of accidents, with the help of constructive systems theory and its methods, cluster analysis and geoinformation engineering. Traffic accident is reframing the space-time, and therefore it can be to study with tools of technology of geographic information systems. The application of system approach enabling the formulation of the system model, grabbed by tools of geoinformation engineering and multicriterial and cluster analysis.

  6. A HYBRID APPROACH FOR RELIABILITY ANALYSIS BASED ON ANALYTIC HIERARCHY PROCESS (AHP) AND BAYESIAN NETWORK (BN)

    OpenAIRE

    Muhammad eZubair

    2014-01-01

    The investigation of the nuclear accidents reveals that the accumulation of various technical and nontechnical lapses compounded the nuclear disaster. By using Analytic Hierarchy Process (AHP) and Bayesian Network (BN) the present research signifies the technical and nontechnical issues of nuclear accidents. The study exposed that besides technical fixes such as enhanced engineering safety features and better siting choices, the critical ingredient for safe operation of nuclear reactors lie i...

  7. Stellar magnetic field parameters from a Bayesian analysis of high-resolution spectropolarimetric observations

    OpenAIRE

    Petit, V.; Wade, G. A.

    2011-01-01

    In this paper we describe a Bayesian statistical method designed to infer the magnetic properties of stars observed using high-resolution circular spectropolarimetry in the context of large surveys. This approach is well suited for analysing stars for which the stellar rotation period is not known, and therefore the rotational phases of the observations are ambiguous. The model assumes that the magnetic observations correspond to a dipole oblique rotator, a situation commonly encountered in i...

  8. A Bayesian Analysis of GPS Guidance System in Precision Agriculture: The Role of Expectations

    OpenAIRE

    Khanal, Aditya R; Mishra, Ashok K.; Lambert, Dayton M.; Paudel, Krishna P.

    2013-01-01

    Farmer’s post adoption responses about technology are important in continuation and diffusion of a technology in precision agriculture. We studied farmer’s frequency of application decisions of GPS guidance system, after adoption. Using a Cotton grower’s precision farming survey in the U.S. and Bayesian approaches, our study suggests that ‘meeting expectation’ plays an important positive role. Farmer’s income level, farm size, and farming occupation are other important factors in modeling GPS...

  9. Uncertainty Reduction using Bayesian Inference and Sensitivity Analysis: A Sequential Approach to the NASA Langley Uncertainty Quantification Challenge

    Science.gov (United States)

    Sankararaman, Shankar

    2016-01-01

    This paper presents a computational framework for uncertainty characterization and propagation, and sensitivity analysis under the presence of aleatory and epistemic un- certainty, and develops a rigorous methodology for efficient refinement of epistemic un- certainty by identifying important epistemic variables that significantly affect the overall performance of an engineering system. The proposed methodology is illustrated using the NASA Langley Uncertainty Quantification Challenge (NASA-LUQC) problem that deals with uncertainty analysis of a generic transport model (GTM). First, Bayesian inference is used to infer subsystem-level epistemic quantities using the subsystem-level model and corresponding data. Second, tools of variance-based global sensitivity analysis are used to identify four important epistemic variables (this limitation specified in the NASA-LUQC is reflective of practical engineering situations where not all epistemic variables can be refined due to time/budget constraints) that significantly affect system-level performance. The most significant contribution of this paper is the development of the sequential refine- ment methodology, where epistemic variables for refinement are not identified all-at-once. Instead, only one variable is first identified, and then, Bayesian inference and global sensi- tivity calculations are repeated to identify the next important variable. This procedure is continued until all 4 variables are identified and the refinement in the system-level perfor- mance is computed. The advantages of the proposed sequential refinement methodology over the all-at-once uncertainty refinement approach are explained, and then applied to the NASA Langley Uncertainty Quantification Challenge problem.

  10. Examination of European Union economic cohesion: A cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Jiri Mazurek

    2014-01-01

    Full Text Available In the past years majority of EU members experienced the highest economic decline in their modern history, but impacts of the global financial crisis were not distributed homogeneously across the continent. The aim of the paper is to examine a cohesion of European Union (plus Norway and Iceland in terms of an economic development of its members from the 1st of January 2008 to the 31st of December 2012. For the study five economic indicators were selected: GDP growth, unemployment, inflation, labour productivity and government debt. Annual data from Eurostat databases were averaged over the whole period and then used as an input for a cluster analysis. It was found that EU countries were divided into six different clusters. The most populated cluster with 14 countries covered Central and West Europe and reflected relative homogeneity of this part of Europe. Countries of Southern Europe (Greece, Portugal and Spain shared their own cluster of the most affected countries by the recent crisis as well as the Baltics and the Balkans states in another cluster. On the other hand Slovakia and Poland, only two countries that escaped a recession, were classified in their own cluster of the most successful countries

  11. A Geometric Analysis of Subspace Clustering with Outliers

    CERN Document Server

    Soltanolkotabi, Mahdi

    2011-01-01

    This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named {\\em sparse subspace clustering} (SSC) \\cite{Elhamifar09}, which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretica...

  12. Bayesian analysis of congruence of core genes in Prochlorococcus and Synechococcus and implications on horizontal gene transfer.

    Directory of Open Access Journals (Sweden)

    Nicholas J Matzke

    Full Text Available It is often suggested that horizontal gene transfer is so ubiquitous in microbes that the concept of a phylogenetic tree representing the pattern of vertical inheritance is oversimplified or even positively misleading. "Universal proteins" have been used to infer the organismal phylogeny, but have been criticized as being only the "tree of one percent." Currently, few options exist for those wishing to rigorously assess how well a universal protein phylogeny, based on a relative handful of well-conserved genes, represents the phylogenetic histories of hundreds of genes. Here, we address this problem by proposing a visualization method and a statistical test within a Bayesian framework. We use the genomes of marine cyanobacteria, a group thought to exhibit substantial amounts of HGT, as a test case. We take 379 orthologous gene families from 28 cyanobacteria genomes and estimate the Bayesian posterior distributions of trees - a "treecloud" - for each, as well as for a concatenated dataset based on putative "universal proteins." We then calculate the average distance between trees within and between all treeclouds on various metrics and visualize this high-dimensional space with non-metric multidimensional scaling (NMMDS. We show that the tree space is strongly clustered and that the universal protein treecloud is statistically significantly closer to the center of this tree space than any individual gene treecloud. We apply several commonly-used tests for incongruence/HGT and show that they agree HGT is rare in this dataset, but make different choices about which genes were subject to HGT. Our results show that the question of the representativeness of the "tree of one percent" is a quantitative empirical question, and that the phylogenetic central tendency is a meaningful observation even if many individual genes disagree due to the various sources of incongruence.

  13. Characterization of Ground Displacement Sources from Variational Bayesian Independent Component Analysis of Space Geodetic Time Series

    Science.gov (United States)

    Gualandi, Adriano; Serpelloni, Enrico; Elina Belardinelli, Maria; Bonafede, Maurizio; Pezzo, Giuseppe; Tolomei, Cristiano

    2015-04-01

    A critical point in the analysis of ground displacement time series, as those measured by modern space geodetic techniques (primarly continuous GPS/GNSS and InSAR) is the development of data driven methods that allow to discern and characterize the different sources that generate the observed displacements. A widely used multivariate statistical technique is the Principal Component Analysis (PCA), which allows to reduce the dimensionality of the data space maintaining most of the variance of the dataset explained. It reproduces the original data using a limited number of Principal Components, but it also shows some deficiencies, since PCA does not perform well in finding the solution to the so-called Blind Source Separation (BSS) problem. The recovering and separation of the different sources that generate the observed ground deformation is a fundamental task in order to provide a physical meaning to the possible different sources. PCA fails in the BSS problem since it looks for a new Euclidean space where the projected data are uncorrelated. Usually, the uncorrelation condition is not strong enough and it has been proven that the BSS problem can be tackled imposing on the components to be independent. The Independent Component Analysis (ICA) is, in fact, another popular technique adopted to approach this problem, and it can be used in all those fields where PCA is also applied. An ICA approach enables us to explain the displacement time series imposing a fewer number of constraints on the model, and to reveal anomalies in the data such as transient deformation signals. However, the independence condition is not easy to impose, and it is often necessary to introduce some approximations. To work around this problem, we use a variational bayesian ICA (vbICA) method, which models the probability density function (pdf) of each source signal using a mix of Gaussian distributions. This technique allows for more flexibility in the description of the pdf of the sources

  14. Triaxial strong-lensing analysis of the z > 0.5 MACS clusters: the mass-concentration relation

    CERN Document Server

    Sereno, M

    2011-01-01

    The high concentrations derived for several strong-lensing clusters present a major inconsistency between theoretical LambdaCDM expectations and measurements. Triaxiality and orientation biases might be at the origin of this disagreement, as clusters elongated along the line-of-sight would have a relatively higher projected mass density, boosting the resulting lensing properties. Analyses of statistical samples can probe further these effects and crucially reduce biases. In this work we perform a fully triaxial strong-lensing analysis of the 12 MACS clusters at z > 0.5, a complete X-ray selected sample, and fully account for the impact of the intrinsic 3D shapes on their strong lensing properties. We first construct strong-lensing mass models for each cluster based on multiple-images, and fit projected ellipsoidal Navarro-Frenk-White halos with arbitrary orientations to each mass distribution. We then invert the measured surface mass densities using Bayesian statistics. Although the Einstein radii of this sam...

  15. CLASH: Joint Analysis of Strong-Lensing, Weak-Lensing Shear and Magnification Data for 20 Galaxy Clusters

    CERN Document Server

    Umetsu, Keiichi; Gruen, Daniel; Merten, Julian; Donahue, Megan; Postman, Marc

    2015-01-01

    We present a comprehensive analysis of strong-lensing, weak-lensing shear and magnification data for a sample of 16 X-ray-regular and 4 high-magnification galaxy clusters selected from the CLASH survey. Our analysis combines constraints from 16-band HST observations and wide-field multi-color imaging taken primarily with Subaru/Suprime-Cam. We reconstruct surface mass density profiles of individual clusters from a joint analysis of the full lensing constraints, and determine masses and concentrations for all clusters. We find internal consistency of the ensemble mass calibration to be $\\le 5\\% \\pm 6\\%$ by comparison with the CLASH weak-lensing-only measurements of Umetsu et al. For the X-ray-selected subsample, we examine the concentration-mass relation and its intrinsic scatter using a Bayesian regression approach. Our model yields a mean concentration of $c|_{z=0.34} = 3.95 \\pm 0.35$ at $M_{200c} \\simeq 14\\times 10^{14}M_\\odot$ and an intrinsic scatter of $\\sigma(\\ln c_{200c}) = 0.13 \\pm 0.06$, in excellent...

  16. Order-Constrained Reference Priors with Implications for Bayesian Isotonic Regression, Analysis of Covariance and Spatial Models

    Science.gov (United States)

    Gong, Maozhen

    Selecting an appropriate prior distribution is a fundamental issue in Bayesian Statistics. In this dissertation, under the framework provided by Berger and Bernardo, I derive the reference priors for several models which include: Analysis of Variance (ANOVA)/Analysis of Covariance (ANCOVA) models with a categorical variable under common ordering constraints, the conditionally autoregressive (CAR) models and the simultaneous autoregressive (SAR) models with a spatial autoregression parameter rho considered. The performances of reference priors for ANOVA/ANCOVA models are evaluated by simulation studies with comparisons to Jeffreys' prior and Least Squares Estimation (LSE). The priors are then illustrated in a Bayesian model of the "Risk of Type 2 Diabetes in New Mexico" data, where the relationship between the type 2 diabetes risk (through Hemoglobin A1c) and different smoking levels is investigated. In both simulation studies and real data set modeling, the reference priors that incorporate internal order information show good performances and can be used as default priors. The reference priors for the CAR and SAR models are also illustrated in the "1999 SAT State Average Verbal Scores" data with a comparison to a Uniform prior distribution. Due to the complexity of the reference priors for both CAR and SAR models, only a portion (12 states in the Midwest) of the original data set is considered. The reference priors can give a different marginal posterior distribution compared to a Uniform prior, which provides an alternative for prior specifications for areal data in Spatial statistics.

  17. Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis.

    Science.gov (United States)

    Bhadra, Anindya; Mallick, Bani K

    2013-06-01

    We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. PMID:23607608

  18. Joint High-Dimensional Bayesian Variable and Covariance Selection with an Application to eQTL Analysis

    KAUST Repository

    Bhadra, Anindya

    2013-04-22

    We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.

  19. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  20. Frailty phenotypes in the elderly based on cluster analysis

    DEFF Research Database (Denmark)

    Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo;

    2012-01-01

    genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...

  1. Topics in Bayesian statistics and maximum entropy

    International Nuclear Information System (INIS)

    Notions of Bayesian decision theory and maximum entropy methods are reviewed with particular emphasis on probabilistic inference and Bayesian modeling. The axiomatic approach is considered as the best justification of Bayesian analysis and maximum entropy principle applied in natural sciences. Particular emphasis is put on solving the inverse problem in digital image restoration and Bayesian modeling of neural networks. Further topics addressed briefly include language modeling, neutron scattering, multiuser detection and channel equalization in digital communications, genetic information, and Bayesian court decision-making. (author)

  2. K-means cluster analysis and seismicity partitioning for Pakistan

    Science.gov (United States)

    Rehman, Khaista; Burton, Paul W.; Weatherill, Graeme A.

    2014-07-01

    Pakistan and the western Himalaya is a region of high seismic activity located at the triple junction between the Arabian, Eurasian and Indian plates. Four devastating earthquakes have resulted in significant numbers of fatalities in Pakistan and the surrounding region in the past century (Quetta, 1935; Makran, 1945; Pattan, 1974 and the recent 2005 Kashmir earthquake). It is therefore necessary to develop an understanding of the spatial distribution of seismicity and the potential seismogenic sources across the region. This forms an important basis for the calculation of seismic hazard; a crucial input in seismic design codes needed to begin to effectively mitigate the high earthquake risk in Pakistan. The development of seismogenic source zones for seismic hazard analysis is driven by both geological and seismotectonic inputs. Despite the many developments in seismic hazard in recent decades, the manner in which seismotectonic information feeds the definition of the seismic source can, in many parts of the world including Pakistan and the surrounding regions, remain a subjective process driven primarily by expert judgment. Whilst much research is ongoing to map and characterise active faults in Pakistan, knowledge of the seismogenic properties of the active faults is still incomplete in much of the region. Consequently, seismicity, both historical and instrumental, remains a primary guide to the seismogenic sources of Pakistan. This study utilises a cluster analysis approach for the purposes of identifying spatial differences in seismicity, which can be utilised to form a basis for delineating seismogenic source regions. An effort is made to examine seismicity partitioning for Pakistan with respect to earthquake database, seismic cluster analysis and seismic partitions in a seismic hazard context. A magnitude homogenous earthquake catalogue has been compiled using various available earthquake data. The earthquake catalogue covers a time span from 1930 to 2007 and

  3. Cluster analysis of movement patterns in multiarticular actions: a tutorial.

    Science.gov (United States)

    Rein, Robert; Button, Chris; Davids, Keith; Summers, Jeffery

    2010-04-01

    The present paper proposes a technical analysis method for extracting information about movement patterning in studies of motor control, based on a cluster analysis of movement kinematics. In a tutorial fashion, data from three different experiments are presented to exemplify and validate the technical method. When applied to three different basketball-shooting techniques, the method clearly distinguished between the different patterns. When applied to a cyclical wrist supination-pronation task, the cluster analysis provided the same results as an analysis using the conventional discrete relative phase measure. Finally, when analyzing throwing performance constrained by distance to target, the method grouped movement patterns together according to throwing distance. In conclusion, the proposed technical method provides a valuable tool to improve understanding of coordination and control in different movement models, including multiarticular actions. PMID:20484771

  4. Bayesian survival analysis modeling applied to sensory shelf life of foods

    OpenAIRE

    Calle, M. Luz; Hough, Guillermo; Curia, Ana; Gómez, Guadalupe

    2006-01-01

    Data from sensory shelf-life studies can be analyzed using survival statistical methods. The objective of this research was to introduce Bayesian methodology to sensory shelf-life studies and discuss its advantages in relation to classical (frequentist) methods. A specific algorithm which incorporates the interval censored data from shelf-life studies is presented. Calculations were applied to whole-fat and fat-free yogurt, each tasted by 80 consumers who answered ‘‘yes’’ or ‘‘no’’ t...

  5. Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means

    Directory of Open Access Journals (Sweden)

    Imianvan Anthony Agboizebeta

    2012-02-01

    Full Text Available Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain andspinal cord. Myelin provides insulation for nerve cells improves the conduction of impulses along thenerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammationcauses the myelin to disappear. Genetic factors, environmental issues and viral infection may alsoplay a role in developing the disease. Ms is characterized by life threatening symptoms such as; loss ofbalance, hearing problem and depression. The application of Fuzzy Cluster Means (FCM or Fuzzy CMeananalysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper.Application of cluster analysis involves a sequence of methodological and analytical decision stepsthat enhances the quality and meaning of the clusters produced. Uncertainties associated withanalysis of multiple sclerosis test data are eliminated by the system

  6. Bayesian Multi-Trait Analysis Reveals a Useful Tool to Increase Oil Concentration and to Decrease Toxicity in Jatropha curcas L.

    Science.gov (United States)

    Silva Junqueira, Vinícius; de Azevedo Peixoto, Leonardo; Galvêas Laviola, Bruno; Lopes Bhering, Leonardo; Mendonça, Simone; Agostini Costa, Tania da Silveira; Antoniassi, Rosemar

    2016-01-01

    The biggest challenge for jatropha breeding is to identify superior genotypes that present high seed yield and seed oil content with reduced toxicity levels. Therefore, the objective of this study was to estimate genetic parameters for three important traits (weight of 100 seed, oil seed content, and phorbol ester concentration), and to select superior genotypes to be used as progenitors in jatropha breeding. Additionally, the genotypic values and the genetic parameters estimated under the Bayesian multi-trait approach were used to evaluate different selection indices scenarios of 179 half-sib families. Three different scenarios and economic weights were considered. It was possible to simultaneously reduce toxicity and increase seed oil content and weight of 100 seed by using index selection based on genotypic value estimated by the Bayesian multi-trait approach. Indeed, we identified two families that present these characteristics by evaluating genetic diversity using the Ward clustering method, which suggested nine homogenous clusters. Future researches must integrate the Bayesian multi-trait methods with realized relationship matrix, aiming to build accurate selection indices models. PMID:27281340

  7. Bayesian Approaches to Gamma Ray Bursts Data Analysis%伽玛暴数据处理中的贝叶斯方法

    Institute of Scientific and Technical Information of China (English)

    卜庆翠; 陈黎; 李兆升; 王德华

    2013-01-01

    Bayesian method is based on Bayes’ Theorem in which one deems it sensible to consider a probability distribution function (pdf) for the unknown parameterθ, p(θ), called the“prior”pdf forθ. p(θ) reflects our knowledge before observation. We let p(θ|D) be the“posterior”pdf of θ(given the data D), which reflects our modified beliefs after incorporating the results of the observation. All Bayesian inferences are based on the posterior distribution. Bayesian approach is a powerful tool for the gamma ray bursts (GRBs) data analysis, such as analyzing structure in photon counting data, combining different lightcurves, determining various parameters’ distributions, testing if there is spectral line, and so on. During the last decade, more and more astronomers realized the potential of the Bayesian ap-proach. Our aim here is not to provide a detailed explanation of Bayesian theory but rather to show what we have learned from GRBs data analysis by using Bayesian approach. As a comparison, we first give a brief introduction to frequentist approach and Bayesian approach. Then, we present spe-cific examples to study various characteristics of GRBs by using Bayesian method. We emphasize the following aspects: using the BIC (Bayesian information criterion) to select model, using BB (Bayesian block) analysis to find optimal change-points in GRBs light curves,using Bayesian fit method to estimate GRBs peak energy, using this method to extend the Hubble diagram to a very high redshift. Bayesian approach has two mainly usages in GRBs data analysis:model selection and parameter estimation. To some degrees, Bayesian approach requires a great deal of thought about the given situation to apply sensibly, therefore, it seemed to need more techniques and experiences.%贝叶斯推断是建立在贝叶斯定理上的一种参数估计方法。根据贝叶斯定理,当根据经验,对待估计的参量θ的分布密度p(θ)(称为“验前分布”)有所了解时,在

  8. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Their study allows to test cosmological scenarios of structure formation with precision, bringing constraints complementary to those stemming from the cosmological background radiation, supernovae or galaxies. They are identified through the X-ray emission of their heated gas, thus facilitating their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and after one decade of expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed to the purpose of characterizing the detected objects. A particular emphasis is placed on the most distant ones (z ≥ 1) through the complementarity of observations in X-ray, optical and infrared bands. Then the X-CLASS survey is fully described. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed thanks to 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Propositions are made for applying this method to future surveys as XMM-XXL and eRosita. (author)

  9. An Optical Analysis of the Merging Cluster Abell 3888

    CERN Document Server

    Shakouri, S; Dehghan, S

    2016-01-01

    In this paper we present new AAOmega spectroscopy of 254 galaxies within a 30' radius around Abell 3888. We combine these data with the existing redshifts measured in a one degree radius around the cluster and performed a substructure analysis. We confirm 71 member galaxies within the core of A3888 and determine a new average redshift and velocity dispersion for the cluster of 0.1535 +\\- 0.0009 and 1181 +\\- 197 km/s, respectively. The cluster is elongated along an East-West axis and we find the core is bimodal along this axis with two sub-groups of 26 and 41 members detected. Our results suggest that A3888 is a merging system putting to rest the previous conjecture about the morphological status of the cluster derived from X-ray observations. In addition to the results on A3888 we also present six newly detected galaxy over-densities in the field, three of which we classify as new galaxy clusters.

  10. Analysis of axle and vehicle load properties through Bayesian Networks based on Weigh-in-Motion data

    International Nuclear Information System (INIS)

    Weigh-in-Motion (WIM) systems are used, among other applications, in pavement and bridge reliability. The system measures quantities such as individual axle load, vehicular loads, vehicle speed, vehicle length and number of axles. Because of the nature of traffic configuration, the quantities measured are evidently regarded as random variables. The dependence structure of the data of such complex systems as the traffic systems is also very complex. It is desirable to be able to represent the complex multidimensional-distribution with models where the dependence may be explained in a clear way and different locations where the system operates may be treated simultaneously. Bayesian Networks (BNs) are models that comply with the characteristics listed above. In this paper we discuss BN models and results concerning their ability to adequately represent the data. The paper places attention on the construction and use of the models. We discuss applications of the proposed BNs in reliability analysis. In particular we show how the proposed BNs may be used for computing design values for individual axles, vehicle weight and maximum bending moments of bridges in certain time intervals. These estimates have been used to advise authorities with respect to bridge reliability. Directions as to how the model may be extended to include locations where the WIM system does not operate are given whenever possible. These ideas benefit from structured expert judgment techniques previously used to quantify Hybrid Bayesian Networks (HBNs) with success

  11. Bayesian probabilistic sensitivity analysis of Markov models for natural history of a disease: an application for cervical cancer

    Directory of Open Access Journals (Sweden)

    Giulia Carreras

    2012-09-01

    Full Text Available

    Background: parameter uncertainty in the Markov model’s description of a disease course was addressed. Probabilistic sensitivity analysis (PSA is now considered the only tool that properly permits parameter uncertainty’s examination. This consists in sampling values from the parameter’s probability distributions.

    Methods: Markov models fitted with microsimulation were considered and methods for carrying out a PSA on transition probabilities were studied. Two Bayesian solutions were developed: for each row of the modeled transition matrix the prior distribution was assumed as a product of Beta or a Dirichlet. The two solutions differ in the source of information: several different sources for each transition in the Beta approach and a single source for each transition from a given health state in the Dirichlet. The two methods were applied to a simple cervical cancer’s model.

    Results : differences between posterior estimates from the two methods were negligible. Results showed that the prior variability highly influence the posterior distribution.

    Conclusions: the novelty of this work is the Bayesian approach that integrates the two distributions with a product of Binomial distributions likelihood. Such methods could be also applied to cohort data and their application to more complex models could be useful and unique in the cervical cancer context, as well as in other disease modeling.

  12. A Bayesian Analysis for Identifying DNA Copy Number Variations Using a Compound Poisson Process

    Directory of Open Access Journals (Sweden)

    Yiğiter Ayten

    2010-01-01

    Full Text Available To study chromosomal aberrations that may lead to cancer formation or genetic diseases, the array-based Comparative Genomic Hybridization (aCGH technique is often used for detecting DNA copy number variants (CNVs. Various methods have been developed for gaining CNVs information based on aCGH data. However, most of these methods make use of the log-intensity ratios in aCGH data without taking advantage of other information such as the DNA probe (e.g., biomarker positions/distances contained in the data. Motivated by the specific features of aCGH data, we developed a novel method that takes into account the estimation of a change point or locus of the CNV in aCGH data with its associated biomarker position on the chromosome using a compound Poisson process. We used a Bayesian approach to derive the posterior probability for the estimation of the CNV locus. To detect loci of multiple CNVs in the data, a sliding window process combined with our derived Bayesian posterior probability was proposed. To evaluate the performance of the method in the estimation of the CNV locus, we first performed simulation studies. Finally, we applied our approach to real data from aCGH experiments, demonstrating its applicability.

  13. Joint Bayesian Stochastic Inversion of Well Logs and Seismic Data for Volumetric Uncertainty Analysis

    Directory of Open Access Journals (Sweden)

    Moslem Moradi

    2015-06-01

    Full Text Available Here in, an application of a new seismic inversion algorithm in one of Iran’s oilfields is described. Stochastic (geostatistical seismic inversion, as a complementary method to deterministic inversion, is perceived as contribution combination of geostatistics and seismic inversion algorithm. This method integrates information from different data sources with different scales, as prior information in Bayesian statistics. Data integration leads to a probability density function (named as a posteriori probability that can yield a model of subsurface. The Markov Chain Monte Carlo (MCMC method is used to sample the posterior probability distribution, and the subsurface model characteristics can be extracted by analyzing a set of the samples. In this study, the theory of stochastic seismic inversion in a Bayesian framework was described and applied to infer P-impedance and porosity models. The comparison between the stochastic seismic inversion and the deterministic model based seismic inversion indicates that the stochastic seismic inversion can provide more detailed information of subsurface character. Since multiple realizations are extracted by this method, an estimation of pore volume and uncertainty in the estimation were analyzed.

  14. A Bayesian analysis of HAT-P-7b using the EXONEST algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Placek, Ben [Department of Physics, University at Albany (SUNY), Albany NY (United States); Knuth, Kevin H. [Department of Physics, University at Albany (SUNY), Albany NY, USA and Department of Informatics, University at Albany (SUNY), Albany NY (United States)

    2015-01-13

    The study of exoplanets (planets orbiting other stars) is revolutionizing the way we view our universe. High-precision photometric data provided by the Kepler Space Telescope (Kepler) enables not only the detection of such planets, but also their characterization. This presents a unique opportunity to apply Bayesian methods to better characterize the multitude of previously confirmed exoplanets. This paper focuses on applying the EXONEST algorithm to characterize the transiting short-period-hot-Jupiter, HAT-P-7b (also referred to as Kepler-2b). EXONEST evaluates a suite of exoplanet photometric models by applying Bayesian Model Selection, which is implemented with the MultiNest algorithm. These models take into account planetary effects, such as reflected light and thermal emissions, as well as the effect of the planetary motion on the host star, such as Doppler beaming, or boosting, of light from the reflex motion of the host star, and photometric variations due to the planet-induced ellipsoidal shape of the host star. By calculating model evidences, one can determine which model best describes the observed data, thus identifying which effects dominate the planetary system. Presented are parameter estimates and model evidences for HAT-P-7b.

  15. Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model.

    Directory of Open Access Journals (Sweden)

    Gerhard Moser

    2015-04-01

    Full Text Available Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Welcome Trust Case Control Consortium (WTCCC data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96% had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

  16. Semiparametric Bayesian Analysis of Nutritional Epidemiology Data in the Presence of Measurement Error

    KAUST Repository

    Sinha, Samiran

    2009-08-10

    We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data, we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study.

  17. OBJECTIVE BAYESIAN ANALYSIS OF ''ON/OFF'' MEASUREMENTS

    Energy Technology Data Exchange (ETDEWEB)

    Casadei, Diego, E-mail: diego.casadei@fhnw.ch [Visiting Scientist, Department of Physics and Astronomy, UCL, Gower Street, London WC1E 6BT (United Kingdom)

    2015-01-01

    In high-energy astrophysics, it is common practice to account for the background overlaid with counts from the source of interest with the help of auxiliary measurements carried out by pointing off-source. In this ''on/off'' measurement, one knows the number of photons detected while pointing toward the source, the number of photons collected while pointing away from the source, and how to estimate the background counts in the source region from the flux observed in the auxiliary measurements. For very faint sources, the number of photons detected is so low that the approximations that hold asymptotically are not valid. On the other hand, an analytical solution exists for the Bayesian statistical inference, which is valid at low and high counts. Here we illustrate the objective Bayesian solution based on the reference posterior and compare the result with the approach very recently proposed by Knoetig, and discuss its most delicate points. In addition, we propose to compute the significance of the excess with respect to the background-only expectation with a method that is able to account for any uncertainty on the background and is valid for any photon count. This method is compared to the widely used significance formula by Li and Ma, which is based on asymptotic properties.

  18. Bayesian analysis for two-parameter hybrid EoS with high-mass compact star twins

    CERN Document Server

    Alvarez-Castillo, David E; Blaschke, David; Grigorian, Hovik

    2015-01-01

    We perform a Bayesian analysis in the basis of a recently developed two-parameter class of hybrid equations of state that allow for high-mass compact star twins. While recently a wide range of radii, from 9 - 15 km, has been inferred for different neutron stars using different techniques, we perform our analysis under the supposition that the radii are towards the large end ($13-15$ km). We use this radius constraint together with the undebated statistically independent constraint for high masses ($\\sim 2~M_\\odot$) as priors in selecting the most probable hybrid equations of state from a family with two free parameters: the baryon excluded volume in the hadronic phase and the 8-quark vector channel interaction in the quark matter phase.

  19. Classification of Micro Array Gene Expression Data using Factor Analysis Approach with Naïve Bayesian Classifier

    Directory of Open Access Journals (Sweden)

    Tamilselvi Madeswaran

    2013-10-01

    Full Text Available Microarray data studies produce large number of data and in order to analyze such large micro array data lies on Data mining or Statistical Analysis. Our objective is to classify the micro arraygene expression data. Usually before going for the classification the dimensionality reduction will be performed on the micro array gene expression dataset. A statistical approach for the extraction of thegene has been proposed. The drawback in the statistical analysis is that, it doesn’t identify the important genes. Here for the classification process we use k-nearest neighbor and SVM and Naïve Bayesian classifiers. From the experimental result our proposed classifiers show increase in the efficiency and accuracy.

  20. Latent features in similarity judgments: a nonparametric bayesian approach.

    Science.gov (United States)

    Navarro, Daniel J; Griffiths, Thomas L

    2008-11-01

    One of the central problems in cognitive science is determining the mental representations that underlie human inferences. Solutions to this problem often rely on the analysis of subjective similarity judgments, on the assumption that recognizing likenesses between people, objects, and events is crucial to everyday inference. One such solution is provided by the additive clustering model, which is widely used to infer the features of a set of stimuli from their similarities, on the assumption that similarity is a weighted linear function of common features. Existing approaches for implementing additive clustering often lack a complete framework for statistical inference, particularly with respect to choosing the number of features. To address these problems, this article develops a fully Bayesian formulation of the additive clustering model, using methods from nonparametric Bayesian statistics to allow the number of features to vary. We use this to explore several approaches to parameter estimation, showing that the nonparametric Bayesian approach provides a straightforward way to obtain estimates of both the number of features and their importance. PMID:18533818

  1. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Full Text Available Domain Generation Algorithms (DGA is a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate DGA feed using reversed DGAs, third-party feeds, and a smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of the whole malware family, specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  2. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    Full Text Available BACKGROUND: Amyotrophic lateral sclerosis (ALS is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. METHODS: Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. RESULTS: The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001. Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb and time from symptom onset to diagnosis (p<0.00001. CONCLUSION: The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and generating more homogeneous disease groups for genetic, proteomic and risk factor research.

  3. Reconstruction of a windborne insect invasion using a particle dispersal model, historical wind data, and Bayesian analysis of genetic data.

    Science.gov (United States)

    Lander, Tonya A; Klein, Etienne K; Oddou-Muratorio, Sylvie; Candau, Jean-Noël; Gidoin, Cindy; Chalon, Alain; Roig, Anne; Fallour, Delphine; Auger-Rozenberg, Marie-Anne; Boivin, Thomas

    2014-12-01

    Understanding how invasive species establish and spread is vital for developing effective management strategies for invaded areas and identifying new areas where the risk of invasion is highest. We investigated the explanatory power of dispersal histories reconstructed based on local-scale wind data and a regional-scale wind-dispersed particle trajectory model for the invasive seed chalcid wasp Megastigmus schimitscheki (Hymenoptera: Torymidae) in France. The explanatory power was tested by: (1) survival analysis of empirical data on M. schimitscheki presence, absence and year of arrival at 52 stands of the wasp's obligate hosts, Cedrus (true cedar trees); and (2) Approximate Bayesian analysis of M. schimitscheki genetic data using a coalescence model. The Bayesian demographic modeling and traditional population genetic analysis suggested that initial invasion across the range was the result of long-distance dispersal from the longest established sites. The survival analyses of the windborne expansion patterns derived from a particle dispersal model indicated that there was an informative correlation between the M. schimitscheki presence/absence data from the annual surveys and the scenarios based on regional-scale wind data. These three very different analyses produced highly congruent results supporting our proposal that wind is the most probable vector for passive long-distance dispersal of this invasive seed wasp. This result confirms that long-distance dispersal from introduction areas is a likely driver of secondary expansion of alien invasive species. Based on our results, management programs for this and other windborne invasive species may consider (1) focusing effort at the longest established sites and (2) monitoring outlying populations remains critically important due to their influence on rates of spread. We also suggest that there is a distinct need for new analysis methods that have the capacity to combine empirical spatiotemporal field data

  4. How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity

    OpenAIRE

    Leal, Wilmer; Llanos, Eugenio J.; RESTREPO Guillermo; Carlos F Suárez; Patarroyo, Manuel Elkin

    2016-01-01

    Background Hierarchical cluster analysis (HCA) is a widely used classificatory technique in many areas of scientific knowledge. Applications usually yield a dendrogram from an HCA run over a given data set, using a grouping algorithm and a similarity measure. However, even when such parameters are fixed, ties in proximity (i.e. two equidistant clusters from a third one) may produce several different dendrograms, having different possible clustering patterns (different classifications). This s...

  5. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  6. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Full Text Available Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  7. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    Science.gov (United States)

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-01

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (Pgait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners.

  8. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai, also called as Detroit of India due to presence of Automotive Industry producing over 40 % of the India's vehicle and components. During 2001-2002, the Automotive Component Industries (ACI) in Ambattur, Thirumalizai and Thirumudivakkam Industrial Estate, Chennai has faced problems on infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using quantitative questionnaire and analyzing using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskall Wallis Test (KWT).The CA computed for the different set of variables reveals that there is high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. KWT proves, there is no significant difference between three locations clusters with respect to: Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed for development of automobile component cluster uniformly. The FMT proves, there is no significant difference between industrial units in respect of cost like Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  9. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-05-01

    Chennai, also called as Detroit of India due to presence of Automotive Industry producing over 40 % of the India's vehicle and components. During 2001-2002, the Automotive Component Industries (ACI) in Ambattur, Thirumalizai and Thirumudivakkam Industrial Estate, Chennai has faced problems on infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using quantitative questionnaire and analyzing using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskall Wallis Test (KWT).The CA computed for the different set of variables reveals that there is high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. KWT proves, there is no significant difference between three locations clusters with respect to: Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed for development of automobile component cluster uniformly. The FMT proves, there is no significant difference between industrial units in respect of cost like Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  10. Bayesian methods for measures of agreement

    CERN Document Server

    Broemeling, Lyle D

    2009-01-01

    Using WinBUGS to implement Bayesian inferences of estimation and testing hypotheses, Bayesian Methods for Measures of Agreement presents useful methods for the design and analysis of agreement studies. It focuses on agreement among the various players in the diagnostic process.The author employs a Bayesian approach to provide statistical inferences based on various models of intra- and interrater agreement. He presents many examples that illustrate the Bayesian mode of reasoning and explains elements of a Bayesian application, including prior information, experimental information, the likelihood function, posterior distribution, and predictive distribution. The appendices provide the necessary theoretical foundation to understand Bayesian methods as well as introduce the fundamentals of programming and executing the WinBUGS software.Taking a Bayesian approach to inference, this hands-on book explores numerous measures of agreement, including the Kappa coefficient, the G coefficient, and intraclass correlation...

  11. Training image analysis for model error assessment and dimension reduction in Bayesian-MCMC solutions to inverse problems

    Science.gov (United States)

    Koepke, C.; Irving, J.

    2015-12-01

    Bayesian solutions to inverse problems in near-surface geophysics and hydrology have gained increasing popularity as a means of estimating not only subsurface model parameters, but also their corresponding uncertainties that can be used in probabilistic forecasting and risk analysis. In particular, Markov-chain-Monte-Carlo (MCMC) methods have attracted much recent attention as a means of statistically sampling from the Bayesian posterior distribution. In this regard, two approaches are commonly used to improve the computational tractability of the Bayesian-MCMC approach: (i) Forward models involving a simplification of the underlying physics are employed, which offer a significant reduction in the time required to calculate data, but generally at the expense of model accuracy, and (ii) the model parameter space is represented using a limited set of spatially correlated basis functions as opposed to a more intuitive high-dimensional pixel-based parameterization. It has become well understood that model inaccuracies resulting from (i) can lead to posterior parameter distributions that are highly biased and overly confident. Further, when performing model reduction as described in (ii), it is not clear how the prior distribution for the basis weights should be defined because simple (e.g., Gaussian or uniform) priors that may be suitable for a pixel-based parameterization may result in a strong prior bias when used for the weights. To address the issue of model error resulting from known forward model approximations, we generate a set of error training realizations and analyze them with principal component analysis (PCA) in order to generate a sparse basis. The latter is used in the MCMC inversion to remove the main model-error component from the residuals. To improve issues related to prior bias when performing model reduction, we also use a training realization approach, but this time models are simulated from the prior distribution and analyzed using independent

  12. Analysis of gene expression data of the NCl 60 cancer cell lines using Bayesian hierarchical effects model

    Science.gov (United States)

    Lee, Jae K.; Scherf, Uwe; Smith, Lawrence H.; Tanabe, Lorraine; Weinstein, John N.

    2001-06-01

    From the end of the last decade, NCI has been performing large screening of anticancer drug compounds and molecular targets on a pool of 60 cell lines of various types of cancer. In particular, a complete set of cDNA expression array data on the 60 cell lines are now available. To discover differentially-expressed genes in each type of cancer cell lines, we need to estimate a large number of genetic parameters, especially interaction effects for all combinations of cancer types and genes, by decomposing the total variance into biological and array instrumental components. This error decomposition is important to identify subtle genes with low biological variability. An innovative statistical method is required for simultaneously estimating more than 100,000 parameters of interaction effects and error components. We propose a Bayesian statistical approach based on the construction of a hierarchical model adopting parameterization of a liner effects model. The estimation of the model parameters is performed by Markov Chain Monte Carlo, a recent computer- intensive statistical resampling technique. We have identified novel genes whose effects have not been revealed by the previous clustering approaches to the gene expression data.

  13. Bayesian programming

    CERN Document Server

    Bessiere, Pierre; Ahuactzin, Juan Manuel; Mekhnacha, Kamel

    2013-01-01

    Probability as an Alternative to Boolean LogicWhile logic is the mathematical foundation of rational reasoning and the fundamental principle of computing, it is restricted to problems where information is both complete and certain. However, many real-world problems, from financial investments to email filtering, are incomplete or uncertain in nature. Probability theory and Bayesian computing together provide an alternative framework to deal with incomplete and uncertain data. Decision-Making Tools and Methods for Incomplete and Uncertain DataEmphasizing probability as an alternative to Boolean

  14. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Full Text Available Although traditional clustering methods (e.g., K-means have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  15. Team performance and collective efficacy in the dynamic psychology of competitive team: a Bayesian network analysis.

    Science.gov (United States)

    Fuster-Parra, P; García-Mas, A; Ponseti, F J; Leo, F M

    2015-04-01

    The purpose of this paper was to discover the relationships among 22 relevant psychological features in semi-professional football players in order to study team's performance and collective efficacy via a Bayesian network (BN). The paper includes optimization of team's performance and collective efficacy using intercausal reasoning pattern which constitutes a very common pattern in human reasoning. The BN is used to make inferences regarding our problem, and therefore we obtain some conclusions; among them: maximizing the team's performance causes a decrease in collective efficacy and when team's performance achieves the minimum value it causes an increase in moderate/high values of collective efficacy. Similarly, we may reason optimizing team collective efficacy instead. It also allows us to determine the features that have the strongest influence on performance and which on collective efficacy. From the BN two different coaching styles were differentiated taking into account the local Markov property: training leadership and autocratic leadership.

  16. Bayesian networks precipitation model based on hidden Markov analysis and its application

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Surface precipitation estimation is very important in hydrologic forecast. To account for the influence of the neighbors on the precipitation of an arbitrary grid in the network, Bayesian networks and Markov random field were adopted to estimate surface precipitation. Spherical coordinates and the expectation-maximization (EM) algorithm were used for region interpolation, and for estimation of the precipitation of arbitrary point in the region. Surface precipitation estimation of seven precipitation stations in Qinghai Lake region was performed. By comparing with other surface precipitation methods such as Thiessen polygon method, distance weighted mean method and arithmetic mean method, it is shown that the proposed method can judge the relationship of precipitation among different points in the area under complicated circumstances and the simulation results are more accurate and rational.

  17. A Bayesian Analysis of a Random Effects Small Business Loan Credit Scoring Model

    Directory of Open Access Journals (Sweden)

    Patrick J. Farrell

    2011-09-01

    Full Text Available One of the most important aspects of credit scoring is constructing a model that has low misclassification rates and is also flexible enough to allow for random variation. It is also well known that, when there are a large number of highly correlated variables as is typical in studies involving questionnaire data, a method must be found to reduce the number of variables to those that have high predictive power. Here we propose a Bayesian multivariate logistic regression model with both fixed and random effects for small business loan credit scoring and a variable reduction method using Bayes factors. The method is illustrated on an interesting data set based on questionnaires sent to loan officers in Canadian banks and venture capital companies

  18. Team performance and collective efficacy in the dynamic psychology of competitive team: a Bayesian network analysis.

    Science.gov (United States)

    Fuster-Parra, P; García-Mas, A; Ponseti, F J; Leo, F M

    2015-04-01

    The purpose of this paper was to discover the relationships among 22 relevant psychological features in semi-professional football players in order to study team's performance and collective efficacy via a Bayesian network (BN). The paper includes optimization of team's performance and collective efficacy using intercausal reasoning pattern which constitutes a very common pattern in human reasoning. The BN is used to make inferences regarding our problem, and therefore we obtain some conclusions; among them: maximizing the team's performance causes a decrease in collective efficacy and when team's performance achieves the minimum value it causes an increase in moderate/high values of collective efficacy. Similarly, we may reason optimizing team collective efficacy instead. It also allows us to determine the features that have the strongest influence on performance and which on collective efficacy. From the BN two different coaching styles were differentiated taking into account the local Markov property: training leadership and autocratic leadership. PMID:25546263

  19. Default Bayesian analysis for multi-way tables: a data-augmentation approach

    CERN Document Server

    Polson, Nicholas G

    2011-01-01

    This paper proposes a strategy for regularized estimation in multi-way contingency tables, which are common in meta-analyses and multi-center clinical trials. Our approach is based on data augmentation, and appeals heavily to a novel class of Polya-Gamma distributions. Our main contributions are to build up the relevant distributional theory and to demonstrate three useful features of this data-augmentation scheme. First, it leads to simple EM and Gibbs-sampling algorithms for posterior inference, circumventing the need for analytic approximations, numerical integration, Metropolis--Hastings, or variational methods. Second, it allows modelers much more flexibility when choosing priors, which have traditionally come from the Dirichlet or logistic-normal family. For example, our approach allows users to incorporate Bayesian analogues of classical penalized-likelihood techniques (e.g. the lasso or bridge) in computing regularized estimates for log-odds ratios. Finally, our data-augmentation scheme naturally sugg...

  20. Analysis and assessment of injury risk in female gymnastics:Bayesian Network approach

    Directory of Open Access Journals (Sweden)

    Lyudmila Dimitrova

    2015-02-01

    Full Text Available This paper presents a Bayesian network (BN model for estimating injury risk in female artistic gymnastics. The model illustrates the connections betweenunderlying injury risk factorsthrough a series ofcausal dependencies. The quantitativepart of the model – the conditional probability tables, are determined using ТNormal distribution with parameters, derived by experts. The injury rates calculated by the network are in an agreement with injury statistic data and correctly reports the impact of various risk factors on injury rates. The model is designed to assist coaches and supporting teams in planning the training activity so that injuries are minimized. This study provides important background for further data collection and research necessary to improve the precision of the quantitative predictions of the model.

  1. Insurance penetration and economic growth in Africa: Dynamic effects analysis using Bayesian TVP-VAR approach

    Directory of Open Access Journals (Sweden)

    D.O. Olayungbo

    2016-12-01

    Full Text Available This paper examines the dynamic interactions between insurance and economic growth in eight African countries for the period of 1970–2013. Insurance demand is measured by insurance penetration which accounts for income differences across the sample countries. A Bayesian Time Varying Parameter Vector Auto regression (TVP-VAR model with stochastic volatility is used to analyze the short run and the long run among the variables of interest. Using insurance penetration as a measure of insurance to economic growth, we find positive relationship for Egypt, while short-run negative and long-run positive effects are found for Kenya, Mauritius, and South Africa. On the contrary, negative effects are found for Algeria, Nigeria, Tunisia, and Zimbabwe. Implementation of sound financial reforms and wide insurance coverage are proposed recommendations for insurance development in the selected African countries.

  2. Computation of posterior distribution in Bayesian analysis – application in an intermittently used reliability system

    Directory of Open Access Journals (Sweden)

    V. S.S. Yadavalli

    2002-09-01

    Full Text Available Bayesian estimation is presented for the stationary rate of disappointments, D∞, for two models (with different specifications of intermittently used systems. The random variables in the system are considered to be independently exponentially distributed. Jeffreys’ prior is assumed for the unknown parameters in the system. Inference about D∞ is being restrained in both models by the complex and non-linear definition of D∞. Monte Carlo simulation is used to derive the posterior distribution of D∞ and subsequently the highest posterior density (HPD intervals. A numerical example where Bayes estimates and the HPD intervals are determined illustrates these results. This illustration is extended to determine the frequentistical properties of this Bayes procedure, by calculating covering proportions for each of these HPD intervals, assuming fixed values for the parameters.

  3. Improving PWR core simulations by Monte Carlo uncertainty analysis and Bayesian inference

    CERN Document Server

    Castro, Emilio; Buss, Oliver; Garcia-Herranz, Nuria; Hoefer, Axel; Porsch, Dieter

    2016-01-01

    A Monte Carlo-based Bayesian inference model is applied to the prediction of reactor operation parameters of a PWR nuclear power plant. In this non-perturbative framework, high-dimensional covariance information describing the uncertainty of microscopic nuclear data is combined with measured reactor operation data in order to provide statistically sound, well founded uncertainty estimates of integral parameters, such as the boron letdown curve and the burnup-dependent reactor power distribution. The performance of this methodology is assessed in a blind test approach, where we use measurements of a given reactor cycle to improve the prediction of the subsequent cycle. As it turns out, the resulting improvement of the prediction quality is impressive. In particular, the prediction uncertainty of the boron letdown curve, which is of utmost importance for the planning of the reactor cycle length, can be reduced by one order of magnitude by including the boron concentration measurement information of the previous...

  4. Segment clustering methodology for unsupervised Holter recordings analysis

    Science.gov (United States)

    Rodríguez-Sotelo, Jose Luis; Peluffo-Ordoñez, Diego; Castellanos Dominguez, German

    2015-01-01

    Cardiac arrhythmia analysis on Holter recordings is an important issue in clinical settings, however such issue implicitly involves attending other problems related to the large amount of unlabelled data which means a high computational cost. In this work an unsupervised methodology based in a segment framework is presented, which consists of dividing the raw data into a balanced number of segments in order to identify fiducial points, characterize and cluster the heartbeats in each segment separately. The resulting clusters are merged or split according to an assumed criterion of homogeneity. This framework compensates the high computational cost employed in Holter analysis, being possible its implementation for further real time applications. The performance of the method is measure over the records from the MIT/BIH arrhythmia database and achieves high values of sensibility and specificity, taking advantage of database labels, for a broad kind of heartbeats types recommended by the AAMI.

  5. Data Preprocessing in Cluster Analysis of Gene Expression

    Institute of Scientific and Technical Information of China (English)

    杨春梅; 万柏坤; 高晓峰

    2003-01-01

    Considering that the DNA microarray technology has generated explosive gene expression data and that it is urgent to analyse and to visualize such massive datasets with efficient methods, we investigate the data preprocessing methods used in cluster analysis, normalization or logarithm of the matrix, by using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that when using the Euclidean distance as measuring metrics, logarithm of relative expression level is the best preprocessing method, while data preprocessed by normalization cannot attain the expected results because the data structure is ruined. If there are only a few principal components, the PCA is an effective method to extract the frame structure, while SOMs are more suitable for a specific structure.

  6. Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors.

    Science.gov (United States)

    Van De Wiel, Mark A; Leday, Gwenaël G R; Pardo, Luba; Rue, Håvard; Van Der Vaart, Aad W; Van Wieringen, Wessel N

    2013-01-01

    Next generation sequencing is quickly replacing microarrays as a technique to probe different molecular levels of the cell, such as DNA or RNA. The technology provides higher resolution, while reducing bias. RNA sequencing results in counts of RNA strands. This type of data imposes new statistical challenges. We present a novel, generic approach to model and analyze such data. Our approach aims at large flexibility of the likelihood (count) model and the regression model alike. Hence, a variety of count models is supported, such as the popular NB model, which accounts for overdispersion. In addition, complex, non-balanced designs and random effects are accommodated. Like some other methods, our method provides shrinkage of dispersion-related parameters. However, we extend it by enabling joint shrinkage of parameters, including those for which inference is desired. We argue that this is essential for Bayesian multiplicity correction. Shrinkage is effectuated by empirically estimating priors. We discuss several parametric (mixture) and non-parametric priors and develop procedures to estimate (parameters of) those. Inference is provided by means of local and Bayesian false discovery rates. We illustrate our method on several simulations and two data sets, also to compare it with other methods. Model- and data-based simulations show substantial improvements in the sensitivity at the given specificity. The data motivate the use of the ZI-NB as a powerful alternative to the NB, which results in higher detection rates for low-count data. Finally, compared with other methods, the results on small sample subsets are more reproducible when validated on their large sample complements, illustrating the importance of the type of shrinkage. PMID:22988280

  7. Bayesian analysis of multi-state data with individual covariates for estimating genetic effects on demography

    Science.gov (United States)

    Converse, Sarah J.; Royle, J. Andrew; Urbanek, Richard P.

    2012-01-01

    Inbreeding depression is frequently a concern of managers interested in restoring endangered species. Decisions to reduce the potential for inbreeding depression by balancing genotypic contributions to reintroduced populations may exact a cost on long-term demographic performance of the population if those decisions result in reduced numbers of animals released and/or restriction of particularly successful genotypes (i.e., heritable traits of particular family lines). As part of an effort to restore a migratory flock of Whooping Cranes (Grus americana) to eastern North America using the offspring of captive breeders, we obtained a unique dataset which includes post-release mark-recapture data, as well as the pedigree of each released individual. We developed a Bayesian formulation of a multi-state model to analyze radio-telemetry, band-resight, and dead recovery data on reintroduced individuals, in order to track survival and breeding state transitions. We used studbook-based individual covariates to examine the comparative evidence for and degree of effects of inbreeding, genotype, and genotype quality on post-release survival of reintroduced individuals. We demonstrate implementation of the Bayesian multi-state model, which allows for the integration of imperfect detection, multiple data types, random effects, and individual- and time-dependent covariates. Our results provide only weak evidence for an effect of the quality of an individual's genotype in captivity on post-release survival as well as for an effect of inbreeding on post-release survival. We plan to integrate our results into a decision-analytic modeling framework that can explicitly examine tradeoffs between the effects of inbreeding and the effects of genotype and demographic stochasticity on population establishment.

  8. Rapid sphere sizing using a Bayesian analysis of reciprocal space imaging data.

    Science.gov (United States)

    Ziovas, K; Sederman, A J; Gehin-Delval, C; Gunes, D Z; Hughes, E; Mantle, M D

    2016-01-15

    Dispersed systems are important in many applications in a wide range of industries such as the petroleum, pharmaceutical and food industries. Therefore the ability to control and non-invasively measure the physical properties of these systems, such as the dispersed phase size distribution, is of significant interest, in particular for concentrated systems, where microscopy or scattering techniques may not apply or with very limited output quality. In this paper we show how reciprocal space data acquired using both 1D magnetic resonance imaging (MRI) and 2D X-ray micro-tomographic (X-ray μCT) data can be analysed, using a Bayesian statistical model, to extract the sphere size distribution (SSD) from model sphere systems and dispersed food foam samples. Glass spheres-in-xanthan gels were used as model samples with sphere diameters (D) in the range of 45μm⩽D⩽850μm. The results show that the SSD was successfully estimated from both the NMR and X-ray μCT with a good degree of accuracy for the entire range of glass spheres in times as short as two seconds. After validating the technique using model samples, the Bayesian sphere sizing method was successfully applied to air/water foam samples generated using a microfluidics apparatus with 160μm⩽D⩽400μm. The effect of different experimental parameters such as the standard deviation of the bubble size distribution and the volume fraction of the dispersed phase is discussed. PMID:26439290

  9. Euro area structural convergence? A multi-criterion cluster analysis.

    OpenAIRE

    Irac, D.; Lopez, J.

    2013-01-01

    This paper proposes a classification of the old member countries of the euro area in a structural data rich environment and run a convergence analysis using the same framework. First, we use a clustering approach and identify two structurally distinct groups of countries that are not modified between 1995 and 2007: the South Countries Group (SCG) – composed of Greece, Italy, Portugal and Spain – and the Other Countries Group (OCG). Second, we propose a convergence metrics and reach three key ...

  10. Customer-Classified Algorithm Based onFuzzy Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    郭蕴华; 祖巧红; 陈定方

    2004-01-01

    A customer-classified evaluation system is described with the customization-supporting tree of evaluation indexes, in which users can determine any evaluation index independently. Based on this system, a customer-classified algorithm based on fuzzy clustering analysis is proposed to implement the customer-classified management. A numerical example is presented, which provides correct results,indicating that the algorithm can be used in the decision support system of CRM.

  11. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis.

    Directory of Open Access Journals (Sweden)

    Nan Lin

    Full Text Available Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis.

  12. Clustering analysis of ancient celadon based on SOM neural network

    Institute of Scientific and Technical Information of China (English)

    ZHOU ShaoHuai; FU Lue; LIANG BaoLiu

    2008-01-01

    In the study,chemical compositions of 48 fragments of ancient ceramics excavated in 4 archaeological kiln sites which were located in 3 cities (Hangzhou,Cixi and Longquan in Zhejiang Province,China) have been examined by energy-dispersive X-ray fluorescence (EDXRF) technique.Then the method of SOM was introduced into the clustering analysis based on the major and minor element compositions of the bodies,the results manifested that 48 samples could be perfectly distributed into 3 locations,Hangzhou,Cixi and Longquan.Because the major and minor ele-ment compositions of two Royal Kilns were similar to each other,the classification accuracy over them was merely 76.92%.In view of this,the authors have made a SOM clustering analysis again based on the trace element compositions of the bodies,the classification accuracy rose to 84.61%.These results indicated that discrepancies in the trace element compositions of the bodies of the ancient ce-ramics excavated in two Royal Kiln sites were more distinct than those in the major and minor element compositions,which was in accordance with the fact.We ar-gued that SOM could be employed in the clustering analysis of ancient ceramics.

  13. Clustering analysis of ancient celadon based on SOM neural network

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    In the study, chemical compositions of 48 fragments of ancient ceramics excavated in 4 archaeological kiln sites which were located in 3 cities (Hangzhou, Cixi and Longquan in Zhejiang Province, China) have been examined by energy-dispersive X-ray fluorescence (EDXRF) technique. Then the method of SOM was introduced into the clustering analysis based on the major and minor element compositions of the bodies, the results manifested that 48 samples could be perfectly distributed into 3 locations, Hangzhou, Cixi and Longquan. Because the major and minor element compositions of two Royal Kilns were similar to each other, the classification accuracy over them was merely 76.92%. In view of this, the authors have made a SOM clustering analysis again based on the trace element compositions of the bodies, the classification accuracy rose to 84.61%. These results indicated that discrepancies in the trace element compositions of the bodies of the ancient ceramics excavated in two Royal Kiln sites were more distinct than those in the major and minor element compositions, which was in accordance with the fact. We argued that SOM could be employed in the clustering analysis of ancient ceramics.

  14. Coupled Two-Way Clustering Analysis of Gene Microarray Data

    CERN Document Server

    Getz, G; Domany, E

    2000-01-01

    We present a novel coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task: we present an algorithm, based on iterative clustering, which performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  15. A Review on Clustering and Outlier Analysis Techniques in Datamining

    Directory of Open Access Journals (Sweden)

    S. Koteeswaran

    2012-01-01

    Full Text Available Problem statement: The modern world is based on using physical, biological and social systems more effectively using advanced computerized techniques. A great amount of data being generated by such systems; it leads to a paradigm shift from classical modeling and analyses based on basic principles to developing models and the corresponding analyses directly from data. The ability to extract useful hidden knowledge in these data and to act on that knowledge is becoming increasingly important in today's competitive world. Approach: The entire process of applying a computer-based methodology, including new techniques, for discovering knowledge from data is called data mining. There are two primary goals in the data mining which are prediction and classification. The larger data involved in the data mining requires clustering and outlier analysis for reducing as well as collecting only useful data set. Results: This study is focusing the review of implementation techniques, recent research on clustering and outlier analysis. Conclusion: The study aims for providing the review of clustering and outlier analysis technique and the discussion on the study will guide the researcher for improving their research direction.

  16. Linking bovine tuberculosis on cattle farms to white-tailed deer and environmental variables using Bayesian hierarchical analysis.

    Directory of Open Access Journals (Sweden)

    W David Walter

    Full Text Available Bovine tuberculosis is a bacterial disease caused by Mycobacterium bovis in livestock and wildlife with hosts that include Eurasian badgers (Meles meles, brushtail possum (Trichosurus vulpecula, and white-tailed deer (Odocoileus virginianus. Risk-assessment efforts in Michigan have been initiated on farms to minimize interactions of cattle with wildlife hosts but research on M. bovis on cattle farms has not investigated the spatial context of disease epidemiology. To incorporate spatially explicit data, initial likelihood of infection probabilities for cattle farms tested for M. bovis, prevalence of M. bovis in white-tailed deer, deer density, and environmental variables for each farm were modeled in a Bayesian hierarchical framework. We used geo-referenced locations of 762 cattle farms that have been tested for M. bovis, white-tailed deer prevalence, and several environmental variables that may lead to long-term survival and viability of M. bovis on farms and surrounding habitats (i.e., soil type, habitat type. Bayesian hierarchical analyses identified deer prevalence and proportion of sandy soil within our sampling grid as the most supported model. Analysis of cattle farms tested for M. bovis identified that for every 1% increase in sandy soil resulted in an increase in odds of infection by 4%. Our analysis revealed that the influence of prevalence of M. bovis in white-tailed deer was still a concern even after considerable efforts to prevent cattle interactions with white-tailed deer through on-farm mitigation and reduction in the deer population. Cattle farms test positive for M. bovis annually in our study area suggesting that the potential for an environmental source either on farms or in the surrounding landscape may contributing to new or re-infections with M. bovis. Our research provides an initial assessment of potential environmental factors that could be incorporated into additional modeling efforts as more knowledge of deer herd

  17. Using a Simple Binomial Model to Assess Improvement in Predictive Capability: Sequential Bayesian Inference, Hypothesis Testing, and Power Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Sigeti, David E. [Los Alamos National Laboratory; Pelak, Robert A. [Los Alamos National Laboratory

    2012-09-11

    We present a Bayesian statistical methodology for identifying improvement in predictive simulations, including an analysis of the number of (presumably expensive) simulations that will need to be made in order to establish with a given level of confidence that an improvement has been observed. Our analysis assumes the ability to predict (or postdict) the same experiments with legacy and new simulation codes and uses a simple binomial model for the probability, {theta}, that, in an experiment chosen at random, the new code will provide a better prediction than the old. This model makes it possible to do statistical analysis with an absolute minimum of assumptions about the statistics of the quantities involved, at the price of discarding some potentially important information in the data. In particular, the analysis depends only on whether or not the new code predicts better than the old in any given experiment, and not on the magnitude of the improvement. We show how the posterior distribution for {theta} may be used, in a kind of Bayesian hypothesis testing, both to decide if an improvement has been observed and to quantify our confidence in that decision. We quantify the predictive probability that should be assigned, prior to taking any data, to the possibility of achieving a given level of confidence, as a function of sample size. We show how this predictive probability depends on the true value of {theta} and, in particular, how there will always be a region around {theta} = 1/2 where it is highly improbable that we will be able to identify an improvement in predictive capability, although the width of this region will shrink to zero as the sample size goes to infinity. We show how the posterior standard deviation may be used, as a kind of 'plan B metric' in the case that the analysis shows that {theta} is close to 1/2 and argue that such a plan B should generally be part of hypothesis testing. All the analysis presented in the paper is done with a

  18. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    Full Text Available The article presents an application example of statistical methods for data analysis on diagnosis of the adaptive capacity of subtropical plants varieties. We depicted selection indicators and basic physiological parameters that were defined as diagnostic. We used evaluation on a set of parameters of water regime, there are: determination of water deficit of the leaves, determining the fractional composition of water and detection parameters of the concentration of cell sap (CCS (for tea culture flushes. These settings are characterized by high liability and high responsiveness to the effects of many abiotic factors that determined the particular care in the selection of plant material for analysis and consideration of the impact on sustainability. On the basis of the experimental data calculated the coefficients of pair correlation between climatic factors and used physiological indicators. The result was a selection of physiological and biochemical indicators proposed to assess the adaptability and included in the basis of methodical recommendations on diagnostics of the functional state of the studied cultures. Analysis of complex studies involving a large number of indicators is quite difficult, especially does not allow to quickly identify the similarity of new varieties for their adaptive responses to adverse factors, and, therefore, to set general requirements to conditions of cultivation. Use of cluster analysis suggests that in the analysis of only quantitative data; define a set of variables used to assess varieties (and the more sampling, the more accurate the clustering will happen, be sure to ascertain the measure of similarity (or difference between objects. It is shown that the identification of diagnostic features, which are subjected to statistical processing, impact the accuracy of the varieties classification. Selection in result of the mono-clusters analysis (variety tea Kolhida; hazelnut Lombardsky red; variety kiwi Monty

  19. Genomic cluster and network analysis for predictive screening for hepatotoxicity.

    Science.gov (United States)

    Fukushima, Tamio; Kikkawa, Rie; Hamada, Yoshimasa; Horii, Ikuo

    2006-12-01

    The present study was undertaken to estimate the usefulness of genomic approaches to predict hepatotoxicity. Male rats were treated with acetaminophen (APAP), carbon tetrachloride (CCL), amiodarone (AD) or tetracycline (TC) at toxic doses. Their livers were extracted 6 or 24 hr after the dosings and were used for subsequent examinations. At 6 hr there were no histological changes noted in any of the groups except for the CCL group, but at 24 hr, such changes were noted in all but the AD group. Regarding genomic analysis, we performed hierarchical cluster analysis using S-plus software. The individual microarray data were clearly classified into 5 treatment-related clusters at 24 hr as well as at 6 hr, even though no morphological changes were noted at 6 hr. In the gene expression analysis using GeneSpring, transcription factor and oxidative stress- and lipid metabolism-related genes were markedly affected in all treatment groups at both time points when compared with the corresponding control values. Finally, we investigated gene networks in the above-affected genes by using Ingenuity Pathway Analysis software. Down-regulation of lipid metabolism-related genes regulated by SREBP1 was observed in all treatment groups at both time points, and up-regulation of oxidative stress-related genes regulated by Nrf2 was observed in the APAP and CCL treatment groups. From the above findings, for the application of genomic approaches to predict hepatotoxicity, we considered that cluster analysis for classification and early prediction of hepatotoxicity and network analysis for investigation of toxicological biomarkers would be useful. PMID:17202758

  20. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth

    2012-10-01

    Full Text Available One of the key performance indicators of quality management system of an organization is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways how to analyse and monitor customer satisfaction using the analysis of data containing how the customers use the organisation services and customer leaving rates. The article used cluster analysis in this process for segmentation of customers with the aim to increase the accuracy of the results and on these results based decisions. The aplication example was created as a part of bachelor thesis.