Bayesian variable selection with spherically symmetric priors
De Kock, M. B.; Eggers, H. C.
2014-01-01
We propose that Bayesian variable selection for linear parametrisations with Gaussian iid likelihoods be based on the spherical symmetry of the diagonalised parameter space. Our r-prior results in closed forms for the evidence for four examples, including the hyper-g prior and the Zellner-Siow prior, which are shown to be special cases. Scenarios of a single variable dispersion parameter and of fixed dispersion are studied, and asymptotic forms comparable to the traditional information criter...
Bayesian Variable Selection via Particle Stochastic Search.
Shi, Minghui; Dunson, David B
2011-02-01
We focus on Bayesian variable selection in regression models. One challenge is to search the huge model space adequately, while identifying high posterior probability regions. In the past decades, the main focus has been on the use of Markov chain Monte Carlo (MCMC) algorithms for these purposes. In this article, we propose a new computational approach based on sequential Monte Carlo (SMC), which we refer to as particle stochastic search (PSS). We illustrate PSS through applications to linear regression and probit models.
Bayesian variable selection for latent class models.
Ghosh, Joyee; Herring, Amy H; Siega-Riz, Anna Maria
2011-09-01
In this article, we develop a latent class model with class probabilities that depend on subject-specific covariates. One of our major goals is to identify important predictors of latent classes. We consider methodology that allows estimation of latent classes while allowing for variable selection uncertainty. We propose a Bayesian variable selection approach and implement a stochastic search Gibbs sampler for posterior computation to obtain model-averaged estimates of quantities of interest such as marginal inclusion probabilities of predictors. Our methods are illustrated through simulation studies and application to data on weight gain during pregnancy, where it is of interest to identify important predictors of latent weight gain classes.
A Bayesian variable selection procedure to rank overlapping gene sets
Directory of Open Access Journals (Sweden)
Skarman Axel
2012-05-01
Full Text Available Abstract Background Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations. Conclusions Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize.
Bayesian Variable Selection in Cost-Effectiveness Analysis
Directory of Open Access Journals (Sweden)
Miguel A. Negrín
2010-04-01
Full Text Available Linear regression models are often used to represent the cost and effectiveness of medical treatment. The covariates used may include sociodemographic variables, such as age, gender or race; clinical variables, such as initial health status, years of treatment or the existence of concomitant illnesses; and a binary variable indicating the treatment received. However, most studies estimate only one model, which usually includes all the covariates. This procedure ignores the question of uncertainty in model selection. In this paper, we examine four alternative Bayesian variable selection methods that have been proposed. In this analysis, we estimate the inclusion probability of each covariate in the real model conditional on the data. Variable selection can be useful for estimating incremental effectiveness and incremental cost, through Bayesian model averaging, as well as for subgroup analysis.
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach.
Peterson, Christine B; Stingo, Francesco C; Vannucci, Marina
2016-03-30
In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival.
Locating disease genes using Bayesian variable selection with the Haseman-Elston method
Directory of Open Access Journals (Sweden)
He Qimei
2003-12-01
Full Text Available Abstract Background We applied stochastic search variable selection (SSVS, a Bayesian model selection method, to the simulated data of Genetic Analysis Workshop 13. We used SSVS with the revisited Haseman-Elston method to find the markers linked to the loci determining change in cholesterol over time. To study gene-gene interaction (epistasis and gene-environment interaction, we adopted prior structures, which incorporate the relationship among the predictors. This allows SSVS to search in the model space more efficiently and avoid the less likely models. Results In applying SSVS, instead of looking at the posterior distribution of each of the candidate models, which is sensitive to the setting of the prior, we ranked the candidate variables (markers according to their marginal posterior probability, which was shown to be more robust to the prior. Compared with traditional methods that consider one marker at a time, our method considers all markers simultaneously and obtains more favorable results. Conclusions We showed that SSVS is a powerful method for identifying linked markers using the Haseman-Elston method, even for weak effects. SSVS is very effective because it does a smart search over the entire model space.
Energy Technology Data Exchange (ETDEWEB)
Zhao, Kaiguang; Valle, Denis; Popescu, Sorin; Zhang, Xuesong; Malick, Bani
2013-05-15
Model specification remains challenging in spectroscopy of plant biochemistry, as exemplified by the availability of various spectral indices or band combinations for estimating the same biochemical. This lack of consensus in model choice across applications argues for a paradigm shift in hyperspectral methods to address model uncertainty and misspecification. We demonstrated one such method using Bayesian model averaging (BMA), which performs variable/band selection and quantifies the relative merits of many candidate models to synthesize a weighted average model with improved predictive performances. The utility of BMA was examined using a portfolio of 27 foliage spectral–chemical datasets representing over 80 species across the globe to estimate multiple biochemical properties, including nitrogen, hydrogen, carbon, cellulose, lignin, chlorophyll (a or b), carotenoid, polar and nonpolar extractives, leaf mass per area, and equivalent water thickness. We also compared BMA with partial least squares (PLS) and stepwise multiple regression (SMR). Results showed that all the biochemicals except carotenoid were accurately estimated from hyerspectral data with R2 values > 0.80.
Bhadra, Anindya
2013-04-22
We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.
Dose-Response Modeling Under Simple Order Restrictions Using Bayesian Variable Selection Methods
Otava, Martin; Shkedy, Ziv; Lin, Dan; Goehlmann, Hinrich W. H.; Bijnens, Luc; Talloen, Willem; Kasim, Adetayo
2014-01-01
Bayesian modeling of dose–response data offers the possibility to establish the relationship between a clinical or a genomic response and increasing doses of a therapeutic compound and to determine the nature of the relationship wherever it exists. In this article, we focus on an order-restricted one-way ANOVA model which can be used to test the null hypothesis of no dose effect against an ordered alternative. Within the framework of the dose–response modeling, a model uncertainty can be addr...
Bayesian Evidence and Model Selection
Knuth, Kevin H; Malakar, Nabin K; Mubeen, Asim M; Placek, Ben
2014-01-01
In this paper we review the concept of the Bayesian evidence and its application to model selection. The theory is presented along with a discussion of analytic, approximate and numerical techniques. Application to several practical examples within the context of signal processing are discussed.
Learning dynamic Bayesian networks with mixed variables
DEFF Research Database (Denmark)
Bøttcher, Susanne Gammelgaard
This paper considers dynamic Bayesian networks for discrete and continuous variables. We only treat the case, where the distribution of the variables is conditional Gaussian. We show how to learn the parameters and structure of a dynamic Bayesian network and also how the Markov order can be learned....... An automated procedure for specifying prior distributions for the parameters in a dynamic Bayesian network is presented. It is a simple extension of the procedure for the ordinary Bayesian networks. Finally the W¨olfer?s sunspot numbers are analyzed....
Jacobi, Liana; Wagner, Helga; Frühwirth-Schnatter, Sylvia
2014-01-01
Child birth leads to a break in a woman's employment history and is considered one reason for the relatively poor labor market outcomes observed for women compared to men. However, the time spent at home after child birth varies significantly across mothers and is likely driven by observed and, more importantly, unobserved factors that also affect labor market outcomes directly. In this paper we propose two alternative Bayesian treatment modeling and inferential frameworks for panel outcomes ...
Bayesian model selection in Gaussian regression
Abramovich, Felix
2009-01-01
We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.
Random Effect and Latent Variable Model Selection
Dunson, David B
2008-01-01
Presents various methods for accommodating model uncertainty in random effects and latent variable models. This book focuses on frequentist likelihood ratio and score tests for zero variance components. It also focuses on Bayesian methods for random effects selection in linear mixed effects and generalized linear mixed models
Bayesian site selection for fast Gaussian process regression
Pourhabib, Arash
2014-02-05
Gaussian Process (GP) regression is a popular method in the field of machine learning and computer experiment designs; however, its ability to handle large data sets is hindered by the computational difficulty in inverting a large covariance matrix. Likelihood approximation methods were developed as a fast GP approximation, thereby reducing the computation cost of GP regression by utilizing a much smaller set of unobserved latent variables called pseudo points. This article reports a further improvement to the likelihood approximation methods by simultaneously deciding both the number and locations of the pseudo points. The proposed approach is a Bayesian site selection method where both the number and locations of the pseudo inputs are parameters in the model, and the Bayesian model is solved using a reversible jump Markov chain Monte Carlo technique. Through a number of simulated and real data sets, it is demonstrated that with appropriate priors chosen, the Bayesian site selection method can produce a good balance between computation time and prediction accuracy: it is fast enough to handle large data sets that a full GP is unable to handle, and it improves, quite often remarkably, the prediction accuracy, compared with the existing likelihood approximations. © 2014 Taylor and Francis Group, LLC.
Improving randomness characterization through Bayesian model selection
R., Rafael Díaz-H; Martínez, Alí M Angulo; U'Ren, Alfred B; Hirsch, Jorge G; Marsili, Matteo; Castillo, Isaac Pérez
2016-01-01
Nowadays random number generation plays an essential role in technology with important applications in areas ranging from cryptography, which lies at the core of current communication protocols, to Monte Carlo methods, and other probabilistic algorithms. In this context, a crucial scientific endeavour is to develop effective methods that allow the characterization of random number generators. However, commonly employed methods either lack formality (e.g. the NIST test suite), or are inapplicable in principle (e.g. the characterization derived from the Algorithmic Theory of Information (ATI)). In this letter we present a novel method based on Bayesian model selection, which is both rigorous and effective, for characterizing randomness in a bit sequence. We derive analytic expressions for a model's likelihood which is then used to compute its posterior probability distribution. Our method proves to be more rigorous than NIST's suite and the Borel-Normality criterion and its implementation is straightforward. We...
Entropic Priors and Bayesian Model Selection
Brewer, Brendon J
2009-01-01
We demonstrate that the principle of maximum relative entropy (ME), used judiciously, can ease the specification of priors in model selection problems. The resulting effect is that models that make sharp predictions are disfavoured, weakening the usual Bayesian "Occam's Razor". This is illustrated with a simple example involving what Jaynes called a "sure thing" hypothesis. Jaynes' resolution of the situation involved introducing a large number of alternative "sure thing" hypotheses that were possible before we observed the data. However, in more complex situations, it may not be possible to explicitly enumerate large numbers of alternatives. The entropic priors formalism produces the desired result without modifying the hypothesis space or requiring explicit enumeration of alternatives; all that is required is a good model for the prior predictive distribution for the data. This idea is illustrated with a simple rigged-lottery example, and we outline how this idea may help to resolve a recent debate amongst ...
Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests
Veldkamp, Bernard P.
2010-01-01
Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…
Estimation and variable selection with exponential weights
Arias-Castro, Ery; Lounici, Karim
2014-01-01
In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to be achieve oracle inequalities for denoising/prediction. We show that such methods also succeed at variable selection and estimation under the near minimum condition on the design matrix, instead of much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variable s...
Bayesian Network Models for Local Dependence among Observable Outcome Variables
Almond, Russell G.; Mulder, Joris; Hemat, Lisa A.; Yan, Duanli
2009-01-01
Bayesian network models offer a large degree of flexibility for modeling dependence among observables (item outcome variables) from the same task, which may be dependent. This article explores four design patterns for modeling locally dependent observations: (a) no context--ignores dependence among observables; (b) compensatory context--introduces…
Model Criticism of Bayesian Networks with Latent Variables.
Williamson, David M.; Mislevy, Robert J.; Almond, Russell G.
This study investigated statistical methods for identifying errors in Bayesian networks (BN) with latent variables, as found in intelligent cognitive assessments. BN, commonly used in artificial intelligence systems, are promising mechanisms for scoring constructed-response examinations. The success of an intelligent assessment or tutoring system…
A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Variable selection through CART
Sauvé, Marie
2011-01-01
This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure which relies on the use of the CART algorithm and on model selection via penalization. This work, of theoretical nature, aims at determining adequate penalties, i.e. penalties which allow to get oracle type inequalities justifying the performance of the proposed procedure. Since the exhaustive procedure can not be executed when the number of variables is too big, a more practical procedure is also proposed and still theoretically validated. A simulation study completes the theoretical results.
Lin, Lin; Chan, Cliburn; West, Mike
2016-01-01
We discuss the evaluation of subsets of variables for the discriminative evidence they provide in multivariate mixture modeling for classification. The novel development of Bayesian classification analysis presented is partly motivated by problems of design and selection of variables in biomolecular studies, particularly involving widely used assays of large-scale single-cell data generated using flow cytometry technology. For such studies and for mixture modeling generally, we define discriminative analysis that overlays fitted mixture models using a natural measure of concordance between mixture component densities, and define an effective and computationally feasible method for assessing and prioritizing subsets of variables according to their roles in discrimination of one or more mixture components. We relate the new discriminative information measures to Bayesian classification probabilities and error rates, and exemplify their use in Bayesian analysis of Dirichlet process mixture models fitted via Markov chain Monte Carlo methods as well as using a novel Bayesian expectation-maximization algorithm. We present a series of theoretical and simulated data examples to fix concepts and exhibit the utility of the approach, and compare with prior approaches. We demonstrate application in the context of automatic classification and discriminative variable selection in high-throughput systems biology using large flow cytometry datasets.
Bayesian Model Selection with Network Based Diffusion Analysis.
Whalen, Andrew; Hoppitt, William J E
2016-01-01
A number of recent studies have used Network Based Diffusion Analysis (NBDA) to detect the role of social transmission in the spread of a novel behavior through a population. In this paper we present a unified framework for performing NBDA in a Bayesian setting, and demonstrate how the Watanabe Akaike Information Criteria (WAIC) can be used for model selection. We present a specific example of applying this method to Time to Acquisition Diffusion Analysis (TADA). To examine the robustness of this technique, we performed a large scale simulation study and found that NBDA using WAIC could recover the correct model of social transmission under a wide range of cases, including under the presence of random effects, individual level variables, and alternative models of social transmission. This work suggests that NBDA is an effective and widely applicable tool for uncovering whether social transmission underpins the spread of a novel behavior, and may still provide accurate results even when key model assumptions are relaxed.
Dissecting magnetar variability with Bayesian hierarchical models
Huppenkothen, D; Hogg, D W; Murray, I; Frean, M; Elenbaas, C; Watts, A L; Levin, Y; van der Horst, A J; Kouveliotou, C
2015-01-01
Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behaviour, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favoured models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture afte...
Bayesian feature selection to estimate customer survival
Figini, Silvia; Giudici, Paolo; Brooks, S P
2006-01-01
We consider the problem of estimating the lifetime value of customers, when a large number of features are present in the data. In order to measure lifetime value we use survival analysis models to estimate customer tenure. In such a context, a number of classical modelling challenges arise. We will show how our proposed Bayesian methods perform, and compare it with classical churn models on a real case study. More specifically, based on data from a media service company, our aim will be to p...
Bayesian genomic selection: the effect of haplotype lenghts and priors
DEFF Research Database (Denmark)
Villumsen, Trine Michelle; Janss, Luc
2009-01-01
Breeding values for animals with marker data are estimated using a genomic selection approach where data is analyzed using Bayesian multi-marker association models. Fourteen model scenarios with varying haplotype lengths, hyper parameter and prior distributions were compared to find the scenario ...
Bayesian Model Selection for LISA Pathfinder
Karnesis, Nikolaos; Sopuerta, Carlos F; Gibert, Ferran; Armano, Michele; Audley, Heather; Congedo, Giuseppe; Diepholz, Ingo; Ferraioli, Luigi; Hewitson, Martin; Hueller, Mauro; Korsakova, Natalia; Plagnol, Eric; Vitale, and Stefano
2013-01-01
The main goal of the LISA Pathfinder (LPF) mission is to fully characterize the acceleration noise models and to test key technologies for future space-based gravitational-wave observatories similar to the LISA/eLISA concept. The Data Analysis (DA) team has developed complex three-dimensional models of the LISA Technology Package (LTP) experiment on-board LPF. These models are used for simulations, but more importantly, they will be used for parameter estimation purposes during flight operations. One of the tasks of the DA team is to identify the physical effects that contribute significantly to the properties of the instrument noise. A way of approaching to this problem is to recover the essential parameters of the LTP which describe the data. Thus, we want to define the simplest model that efficiently explains the observations. To do so, adopting a Bayesian framework, one has to estimate the so-called Bayes Factor between two competing models. In our analysis, we use three main different methods to estimate...
Empirical evaluation of scoring functions for Bayesian network model selection.
Liu, Zhifa; Malone, Brandon; Yuan, Changhe
2012-01-01
In this work, we empirically evaluate the capability of various scoring functions of Bayesian networks for recovering true underlying structures. Similar investigations have been carried out before, but they typically relied on approximate learning algorithms to learn the network structures. The suboptimal structures found by the approximation methods have unknown quality and may affect the reliability of their conclusions. Our study uses an optimal algorithm to learn Bayesian network structures from datasets generated from a set of gold standard Bayesian networks. Because all optimal algorithms always learn equivalent networks, this ensures that only the choice of scoring function affects the learned networks. Another shortcoming of the previous studies stems from their use of random synthetic networks as test cases. There is no guarantee that these networks reflect real-world data. We use real-world data to generate our gold-standard structures, so our experimental design more closely approximates real-world situations. A major finding of our study suggests that, in contrast to results reported by several prior works, the Minimum Description Length (MDL) (or equivalently, Bayesian information criterion (BIC)) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), Bayesian Dirichlet equivalence score (BDeu), and factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures. We believe this finding is a result of using both datasets generated from real-world applications rather than from random processes used in previous studies and learning algorithms to select high-scoring structures rather than selecting random models. Other findings of our study support existing work, e.g., large sample sizes result in learning structures closer to the true underlying structure; the BDeu score is sensitive to the parameter settings; and the fNML performs pretty well on small datasets. We also
Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection
Richards, Gordon T; Lacy, Mark; Myers, Adam D; Nichol, Robert C; Zakamska, Nadia L; Brunner, Robert J; Brandt, W N; Gray, Alexander G; Parejko, John K; Ptak, Andrew; Schneider, Donald P; Storrie-Lombardi, Lisa J; Szalay, Alexander S
2008-01-01
We explore the multidimensional, multiwavelength selection of quasars from mid-IR (MIR) plus optical data, specifically from Spitzer-IRAC and the Sloan Digital Sky Survey (SDSS). We apply modern statistical techniques to combined Spitzer MIR and SDSS optical data, allowing up to 8-D color selection of quasars. Using a Bayesian selection method, we catalog 5546 quasar candidates to an 8.0um depth of 56uJy over an area of ~24 sq. deg; ~70% of these candidates are not identified by applying the same Bayesian algorithm to 4-color SDSS optical data alone. Our selection recovers 97.7% of known type 1 quasars in this area and greatly improves the effectiveness of identifying 3.5
BASE-9: Bayesian Analysis for Stellar Evolution with nine variables
Robinson, Elliot; von Hippel, Ted; Stein, Nathan; Stenning, David; Wagner-Kaiser, Rachel; Si, Shijing; van Dyk, David
2016-08-01
The BASE-9 (Bayesian Analysis for Stellar Evolution with nine variables) software suite recovers star cluster and stellar parameters from photometry and is useful for analyzing single-age, single-metallicity star clusters, binaries, or single stars, and for simulating such systems. BASE-9 uses a Markov chain Monte Carlo (MCMC) technique along with brute force numerical integration to estimate the posterior probability distribution for the age, metallicity, helium abundance, distance modulus, line-of-sight absorption, and parameters of the initial-final mass relation (IFMR) for a cluster, and for the primary mass, secondary mass (if a binary), and cluster probability for every potential cluster member. The MCMC technique is used for the cluster quantities (the first six items listed above) and numerical integration is used for the stellar quantities (the last three items in the above list).
Lee, Sik-Yum; Song, Xin-Yuan; Tang, Nian-Sheng
2007-01-01
The analysis of interaction among latent variables has received much attention. This article introduces a Bayesian approach to analyze a general structural equation model that accommodates the general nonlinear terms of latent variables and covariates. This approach produces a Bayesian estimate that has the same statistical optimal properties as a…
Benchmarking Variable Selection in QSAR.
Eklund, Martin; Norinder, Ulf; Boyer, Scott; Carlsson, Lars
2012-02-01
Variable selection is important in QSAR modeling since it can improve model performance and transparency, as well as reduce the computational cost of model fitting and predictions. Which variable selection methods that perform well in QSAR settings is largely unknown. To address this question we, in a total of 1728 benchmarking experiments, rigorously investigated how eight variable selection methods affect the predictive performance and transparency of random forest models fitted to seven QSAR datasets covering different endpoints, descriptors sets, types of response variables, and number of chemical compounds. The results show that univariate variable selection methods are suboptimal and that the number of variables in the benchmarked datasets can be reduced with about 60 % without significant loss in model performance when using multivariate adaptive regression splines MARS and forward selection.
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
Bayesian accounts of covert selective attention: A tutorial review.
Vincent, Benjamin T
2015-05-01
Decision making and optimal observer models offer an important theoretical approach to the study of covert selective attention. While their probabilistic formulation allows quantitative comparison to human performance, the models can be complex and their insights are not always immediately apparent. Part 1 establishes the theoretical appeal of the Bayesian approach, and introduces the way in which probabilistic approaches can be applied to covert search paradigms. Part 2 presents novel formulations of Bayesian models of 4 important covert attention paradigms, illustrating optimal observer predictions over a range of experimental manipulations. Graphical model notation is used to present models in an accessible way and Supplementary Code is provided to help bridge the gap between model theory and practical implementation. Part 3 reviews a large body of empirical and modelling evidence showing that many experimental phenomena in the domain of covert selective attention are a set of by-products. These effects emerge as the result of observers conducting Bayesian inference with noisy sensory observations, prior expectations, and knowledge of the generative structure of the stimulus environment.
Variable Selection with Exponential Weights and $l_0$-Penalization
Arias-Castro, Ery; Lounici, Karim
2012-01-01
In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to be achieve oracle inequalities for prediction. We show that such methods also succeed at variable selection and estimation under the necessary identifiability condition on the design matrix, instead of much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variabl...
Adaptive Robust Variable Selection
Fan, Jianqing; Barut, Emre
2012-01-01
Heavy-tailed high-dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. A natural procedure to address this problem is to use penalized least absolute deviation (LAD) method with weighted $L_1$-penalty, called weighted robust Lasso (WR-Lasso), in which weights are introduced to ameliorate the bias problem induced by the $L_1$-penalty. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these nice asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is c...
Variable Selection in Discriminant Analysis.
Huberty, Carl J.; Mourad, Salah A.
Methods for ordering and selecting variables for discriminant analysis in multiple group comparison or group prediction studies include: univariate Fs, stepwise analysis, learning discriminant function (LDF) variable correlations, communalities, LDF standardized coefficients, and weighted standardized coefficients. Five indices based on distance,…
A hierarchical Bayesian framework for force field selection in molecular dynamics simulations.
Wu, S; Angelikopoulos, P; Papadimitriou, C; Moser, R; Koumoutsakos, P
2016-02-13
We present a hierarchical Bayesian framework for the selection of force fields in molecular dynamics (MD) simulations. The framework associates the variability of the optimal parameters of the MD potentials under different environmental conditions with the corresponding variability in experimental data. The high computational cost associated with the hierarchical Bayesian framework is reduced by orders of magnitude through a parallelized Transitional Markov Chain Monte Carlo method combined with the Laplace Asymptotic Approximation. The suitability of the hierarchical approach is demonstrated by performing MD simulations with prescribed parameters to obtain data for transport coefficients under different conditions, which are then used to infer and evaluate the parameters of the MD model. We demonstrate the selection of MD models based on experimental data and verify that the hierarchical model can accurately quantify the uncertainty across experiments; improve the posterior probability density function estimation of the parameters, thus, improve predictions on future experiments; identify the most plausible force field to describe the underlying structure of a given dataset. The framework and associated software are applicable to a wide range of nanoscale simulations associated with experimental data with a hierarchical structure.
Colombani, C; Legarra, A; Fritz, S; Guillaume, F; Croiseau, P; Ducrocq, V; Robert-Granié, C
2013-01-01
Recently, the amount of available single nucleotide polymorphism (SNP) marker data has considerably increased in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 bayesian methods, BayesCπ and bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar in BayesCπ (including or not pedigree information) and bayesian LASSO for most of the traits and whatever the breed. In the Holstein breed, bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for the conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performances of the bayesian methods for fat content in Holstein and Montbéliarde breeds are probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same position on the genome. As the bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, bayesian
Bayesian modeling of ChIP-chip data using latent variables.
Wu, Mingqi
2009-10-26
BACKGROUND: The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although, due to the integrated consideration of uncertainty of the models and model parameters, Bayesian methods can potentially work better than the other two classes of methods, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU time due to involving of MCMC simulations. RESULTS: In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to the outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length. CONCLUSION: The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results indicate that the
Bayesian selection of nucleotide substitution models and their site assignments.
Wu, Chieh-Hsi; Suchard, Marc A; Drummond, Alexei J
2013-03-01
Probabilistic inference of a phylogenetic tree from molecular sequence data is predicated on a substitution model describing the relative rates of change between character states along the tree for each site in the multiple sequence alignment. Commonly, one assumes that the substitution model is homogeneous across sites within large partitions of the alignment, assigns these partitions a priori, and then fixes their underlying substitution model to the best-fitting model from a hierarchy of named models. Here, we introduce an automatic model selection and model averaging approach within a Bayesian framework that simultaneously estimates the number of partitions, the assignment of sites to partitions, the substitution model for each partition, and the uncertainty in these selections. This new approach is implemented as an add-on to the BEAST 2 software platform. We find that this approach dramatically improves the fit of the nucleotide substitution model compared with existing approaches, and we show, using a number of example data sets, that as many as nine partitions are required to explain the heterogeneity in nucleotide substitution process across sites in a single gene analysis. In some instances, this improved modeling of the substitution process can have a measurable effect on downstream inference, including the estimated phylogeny, relative divergence times, and effective population size histories.
Feature Selection for Bayesian Evaluation of Trauma Death Risk
Jakaite, L
2008-01-01
In the last year more than 70,000 people have been brought to the UK hospitals with serious injuries. Each time a clinician has to urgently take a patient through a screening procedure to make a reliable decision on the trauma treatment. Typically, such procedure comprises around 20 tests; however the condition of a trauma patient remains very difficult to be tested properly. What happens if these tests are ambiguously interpreted, and information about the severity of the injury will come misleading? The mistake in a decision can be fatal: using a mild treatment can put a patient at risk of dying from posttraumatic shock, while using an overtreatment can also cause death. How can we reduce the risk of the death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screeni...
Family background variables as instruments for education in income regressions: A Bayesian analysis
L.F. Hoogerheide (Lennart); J.H. Block (Jörn); A.R. Thurik (Roy)
2012-01-01
textabstractThe validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the
Asymptotic accuracy of Bayesian estimation for a single latent variable.
Yamazaki, Keisuke
2015-09-01
In data science and machine learning, hierarchical parametric models, such as mixture models, are often used. They contain two kinds of variables: observable variables, which represent the parts of the data that can be directly measured, and latent variables, which represent the underlying processes that generate the data. Although there has been an increase in research on the estimation accuracy for observable variables, the theoretical analysis of estimating latent variables has not been thoroughly investigated. In a previous study, we determined the accuracy of a Bayes estimation for the joint probability of the latent variables in a dataset, and we proved that the Bayes method is asymptotically more accurate than the maximum-likelihood method. However, the accuracy of the Bayes estimation for a single latent variable remains unknown. In the present paper, we derive the asymptotic expansions of the error functions, which are defined by the Kullback-Leibler divergence, for two types of single-variable estimations when the statistical regularity is satisfied. Our results indicate that the accuracies of the Bayes and maximum-likelihood methods are asymptotically equivalent and clarify that the Bayes method is only advantageous for multivariable estimations.
Directory of Open Access Journals (Sweden)
Guillaume Marrelec
Full Text Available The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity, provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
Energy Technology Data Exchange (ETDEWEB)
Elsheikh, Ahmed H., E-mail: aelsheikh@ices.utexas.edu [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom); Wheeler, Mary F. [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Hoteit, Ibrahim [Department of Earth Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal (Saudi Arabia)
2014-02-01
A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines, Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.
Bayesian inference of selection in a heterogeneous environment from genetic time-series data.
Gompert, Zachariah
2016-01-01
Evolutionary geneticists have sought to characterize the causes and molecular targets of selection in natural populations for many years. Although this research programme has been somewhat successful, most statistical methods employed were designed to detect consistent, weak to moderate selection. In contrast, phenotypic studies in nature show that selection varies in time and that individual bouts of selection can be strong. Measurements of the genomic consequences of such fluctuating selection could help test and refine hypotheses concerning the causes of ecological specialization and the maintenance of genetic variation in populations. Herein, I proposed a Bayesian nonhomogeneous hidden Markov model to estimate effective population sizes and quantify variable selection in heterogeneous environments from genetic time-series data. The model is described and then evaluated using a series of simulated data, including cases where selection occurs on a trait with a simple or polygenic molecular basis. The proposed method accurately distinguished neutral loci from non-neutral loci under strong selection, but not from those under weak selection. Selection coefficients were accurately estimated when selection was constant or when the fitness values of genotypes varied linearly with the environment, but these estimates were less accurate when fitness was polygenic or the relationship between the environment and the fitness of genotypes was nonlinear. Past studies of temporal evolutionary dynamics in laboratory populations have been remarkably successful. The proposed method makes similar analyses of genetic time-series data from natural populations more feasible and thereby could help answer fundamental questions about the causes and consequences of evolution in the wild.
Bayesian parameter inference and model selection by population annealing in systems biology.
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. Especially, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods needs to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named "posterior parameter ensemble". We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor.
Instrumental Variable Bayesian Model Averaging via Conditional Bayes Factors
Karl, Anna; Lenkoski, Alex
2012-01-01
We develop a method to perform model averaging in two-stage linear regression systems subject to endogeneity. Our method extends an existing Gibbs sampler for instrumental variables to incorporate a component of model uncertainty. Direct evaluation of model probabilities is intractable in this setting. We show that by nesting model moves inside the Gibbs sampler, model comparison can be performed via conditional Bayes factors, leading to straightforward calculations. This new Gibbs sampler is...
Directory of Open Access Journals (Sweden)
Abel Palafox
2014-01-01
Full Text Available We address a prototype inverse scattering problem in the interface of applied mathematics, statistics, and scientific computing. We pose the acoustic inverse scattering problem in a Bayesian inference perspective and simulate from the posterior distribution using MCMC. The PDE forward map is implemented using high performance computing methods. We implement a standard Bayesian model selection method to estimate an effective number of Fourier coefficients that may be retrieved from noisy data within a standard formulation.
Directory of Open Access Journals (Sweden)
D. Das
2014-04-01
Full Text Available Climate projections simulated by Global Climate Models (GCM are often used for assessing the impacts of climate change. However, the relatively coarse resolutions of GCM outputs often precludes their application towards accurately assessing the effects of climate change on finer regional scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods are often performed for regional climate projections. Statistical downscaling (SD is based on the understanding that the regional climate is influenced by two factors – the large scale climatic state and the regional or local features. A transfer function approach of SD involves learning a regression model which relates these features (predictors to a climatic variable of interest (predictand based on the past observations. However, often a single regression model is not sufficient to describe complex dynamic relationships between the predictors and predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on Dirichlet Process (DP, for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lends to domain relevant interpretation. Applications to synthetic data demonstrate the value of the new approach and preliminary results related to feature selection for statistical downscaling shows our method can lead to new insights.
Bayesian data fusion for spatial prediction of categorical variables in environmental sciences
Energy Technology Data Exchange (ETDEWEB)
Gengler, Sarah, E-mail: sarahgengler@gmail.com; Bogaert, Patrick, E-mail: sarahgengler@gmail.com [Earth and Life Institute, Environmental Sciences. Université catholique de Louvain, Croix du Sud 2/L7.05.16, B-1348 Louvain-la-Neuve (Belgium)
2014-12-05
First developed to predict continuous variables, Bayesian Maximum Entropy (BME) has become a complete framework in the context of space-time prediction since it has been extended to predict categorical variables and mixed random fields. This method proposes solutions to combine several sources of data whatever the nature of the information. However, the various attempts that were made for adapting the BME methodology to categorical variables and mixed random fields faced some limitations, as a high computational burden. The main objective of this paper is to overcome this limitation by generalizing the Bayesian Data Fusion (BDF) theoretical framework to categorical variables, which is somehow a simplification of the BME method through the convenient conditional independence hypothesis. The BDF methodology for categorical variables is first described and then applied to a practical case study: the estimation of soil drainage classes using a soil map and point observations in the sandy area of Flanders around the city of Mechelen (Belgium). The BDF approach is compared to BME along with more classical approaches, as Indicator CoKringing (ICK) and logistic regression. Estimators are compared using various indicators, namely the Percentage of Correctly Classified locations (PCC) and the Average Highest Probability (AHP). Although BDF methodology for categorical variables is somehow a simplification of BME approach, both methods lead to similar results and have strong advantages compared to ICK and logistic regression.
Improving water quality assessments through a hierarchical Bayesian analysis of variability.
Gronewold, Andrew D; Borsuk, Mark E
2010-10-15
Water quality measurement error and variability, while well-documented in laboratory-scale studies, is rarely acknowledged or explicitly resolved in most model-based water body assessments, including those conducted in compliance with the United States Environmental Protection Agency (USEPA) Total Maximum Daily Load (TMDL) program. Consequently, proposed pollutant loading reductions in TMDLs and similar water quality management programs may be biased, resulting in either slower-than-expected rates of water quality restoration and designated use reinstatement or, in some cases, overly conservative management decisions. To address this problem, we present a hierarchical Bayesian approach for relating actual in situ or model-predicted pollutant concentrations to multiple sampling and analysis procedures, each with distinct sources of variability. We apply this method to recently approved TMDLs to investigate whether appropriate accounting for measurement error and variability will lead to different management decisions. We find that required pollutant loading reductions may in fact vary depending not only on how measurement variability is addressed but also on which water quality analysis procedure is used to assess standard compliance. As a general strategy, our Bayesian approach to quantifying variability may represent an alternative to the common practice of addressing all forms of uncertainty through an arbitrary margin of safety (MOS).
Directory of Open Access Journals (Sweden)
Ali Ahmed
2011-01-01
Full Text Available Problem statement: Similarity based Virtual Screening (VS deals with a large amount of data containing irrelevant and/or redundant fragments or features. Recent use of Bayesian network as an alternative for existing tools for similarity based VS has received noticeable attention of the researchers in the field of chemoinformatics. Approach: To this end, different models of Bayesian network have been developed. In this study, we enhance the Bayesian Inference Network (BIN using a subset of selected molecules features. Results: In this approach, a few features were filtered from the molecular fingerprint features based on a features selection approach. Conclusion: Simulated virtual screening experiments with MDL Drug Data Report (MDDR data sets showed that the proposed method provides simple ways of enhancing the cost effectiveness of ligand-based virtual screening searches, especially for higher diversity data set.
Bayesian estimation in IRT models with missing values in background variables
Directory of Open Access Journals (Sweden)
Christian Aßmann
2015-12-01
Full Text Available Large scale assessment studies typically aim at investigating the relationship between persons competencies and explaining variables. Individual competencies are often estimated by explicitly including explaining background variables into corresponding Item Response Theory models. Since missing values in background variables inevitably occur, strategies to handle the uncertainty related to missing values in parameter estimation are required. We propose to adapt a Bayesian estimation strategy based on Markov Chain Monte Carlo techniques. Sampling from the posterior distribution of parameters is thereby enriched by sampling from the full conditional distribution of the missing values. We consider non-parametric as well as parametric approximations for the full conditional distributions of missing values, thus allowing for a flexible incorporation of metric as well as categorical background variables. We evaluate the validity of our approach with respect to statistical accuracy by a simulation study controlling the missing values generating mechanism. We show that the proposed Bayesian strategy allows for effective comparison of nested model specifications via gauging highest posterior density intervals of all involved model parameters. An illustration of the suggested approach uses data from the National Educational Panel Study on mathematical competencies of fifth grade students.
Variable and subset selection in PLS regression
DEFF Research Database (Denmark)
Høskuldsson, Agnar
2001-01-01
The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
Variable selection by lasso-type methods
Directory of Open Access Journals (Sweden)
Sohail Chand
2011-09-01
Full Text Available Variable selection is an important property of shrinkage methods. The adaptive lasso is an oracle procedure and can do consistent variable selection. In this paper, we provide an explanation that how use of adaptive weights make it possible for the adaptive lasso to satisfy the necessary and almost sufcient condition for consistent variable selection. We suggest a novel algorithm and give an important result that for the adaptive lasso if predictors are normalised after the introduction of adaptive weights, it makes the adaptive lasso performance identical to the lasso.
A Bayesian variable selection procedure for ranking overlapping gene sets
DEFF Research Database (Denmark)
Skarman, Axel; Mahdi Shariati, Mohammad; Janss, Luc;
2012-01-01
Background Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been de...
A Bayesian outlier criterion to detect SNPs under selection in large data sets.
Directory of Open Access Journals (Sweden)
Mathieu Gautier
Full Text Available BACKGROUND: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged. METHODOLOGY/PRINCIPAL FINDINGS: The purpose of this study is to develop an efficient model-based approach to perform bayesian exploratory analyses for adaptive differentiation in very large SNP data sets. The basic idea is to start with a very simple model for neutral loci that is easy to implement under a bayesian framework and to identify selected loci as outliers via Posterior Predictive P-values (PPP-values. Applications of this strategy are considered using two different statistical models. The first one was initially interpreted in the context of populations evolving respectively under pure genetic drift from a common ancestral population while the second one relies on populations under migration-drift equilibrium. Robustness and power of the two resulting bayesian model-based approaches to detect SNP under selection are further evaluated through extensive simulations. An application to a cattle data set is also provided. CONCLUSIONS/SIGNIFICANCE: The procedure described turns out to be much faster than former bayesian approaches and also reasonably efficient especially to detect loci under positive selection.
Variable selection with error control: Another look at Stability Selection
Shah, Rajen
2011-01-01
Stability Selection was recently introduced by Meinshausen and Buhlmann (2010) as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called Complementary Pairs Stability Selection (CPSS), and derive bounds both on the expected number of variables included by CPSS that have low selection probability under the original procedure, and on the expected number of high selection probability variables that are excluded. These results require no (e.g. exchangeability) assumptions on the underlying model or on the quality of the original selection procedure. Under reasonable shape restrictions, the bounds can be further tightened, yielding improved error control, and therefore increasing the applicability of the methodology.
Diagnosing Hybrid Systems: a Bayesian Model Selection Approach
McIlraith, Sheila A.
2005-01-01
In this paper we examine the problem of monitoring and diagnosing noisy complex dynamical systems that are modeled as hybrid systems-models of continuous behavior, interleaved by discrete transitions. In particular, we examine continuous systems with embedded supervisory controllers that experience abrupt, partial or full failure of component devices. Building on our previous work in this area (MBCG99;MBCG00), our specific focus in this paper ins on the mathematical formulation of the hybrid monitoring and diagnosis task as a Bayesian model tracking algorithm. The nonlinear dynamics of many hybrid systems present challenges to probabilistic tracking. Further, probabilistic tracking of a system for the purposes of diagnosis is problematic because the models of the system corresponding to failure modes are numerous and generally very unlikely. To focus tracking on these unlikely models and to reduce the number of potential models under consideration, we exploit logic-based techniques for qualitative model-based diagnosis to conjecture a limited initial set of consistent candidate models. In this paper we discuss alternative tracking techniques that are relevant to different classes of hybrid systems, focusing specifically on a method for tracking multiple models of nonlinear behavior simultaneously using factored sampling and conditional density propagation. To illustrate and motivate the approach described in this paper we examine the problem of monitoring and diganosing NASA's Sprint AERCam, a small spherical robotic camera unit with 12 thrusters that enable both linear and rotational motion.
Energy Technology Data Exchange (ETDEWEB)
Placek, Ben; Knuth, Kevin H. [Physics Department, University at Albany (SUNY), Albany, NY 12222 (United States); Angerhausen, Daniel, E-mail: bplacek@albany.edu, E-mail: kknuth@albany.edu, E-mail: daniel.angerhausen@gmail.com [Department of Physics, Applied Physics, and Astronomy, Rensselear Polytechnic Institute, Troy, NY 12180 (United States)
2014-11-10
EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian model selection, a unique aspect of EXONEST is the potential capability to distinguish between reflective and thermal contributions to the light curve. A case study is presented using Kepler data recorded from the transiting planet KOI-13b. By considering only the nontransiting portions of the light curve, we demonstrate that it is possible to estimate the photometrically relevant model parameters of KOI-13b. Furthermore, Bayesian model testing confirms that the orbit of KOI-13b has a detectable eccentricity.
Placek, Ben; Angerhausen, Daniel
2013-01-01
EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian Inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian Model Selection, a unique aspect of EXONEST is the capability to distinguish between reflective and thermal contributions to the light curve. A case-study is presented using Kepler data recorded from the transiting planet KOI-13b. By considering only the non-transiting portions of the light curve, we demonstrate that it is possible to estimate the photometrically-relevant model parameters of KOI-13b. Furthermore, Bayesian model testing confirms that the orbit of KOI-13b has a detectable eccentric...
Placek, Ben; Knuth, Kevin H.; Angerhausen, Daniel
2014-11-01
EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian model selection, a unique aspect of EXONEST is the potential capability to distinguish between reflective and thermal contributions to the light curve. A case study is presented using Kepler data recorded from the transiting planet KOI-13b. By considering only the nontransiting portions of the light curve, we demonstrate that it is possible to estimate the photometrically relevant model parameters of KOI-13b. Furthermore, Bayesian model testing confirms that the orbit of KOI-13b has a detectable eccentricity.
Elsheikh, Ahmed H.
2014-02-01
A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines, Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems. © 2013 Elsevier Inc.
Variable Selection in Logistic Regression Mo del
Institute of Scientific and Technical Information of China (English)
ZHANG Shangli; ZHANG Lili; QIU Kuanmin; LU Ying; CAI Baigen
2015-01-01
Variable selection is one of the most impor-tant problems in pattern recognition. In linear regression model, there are many methods can solve this problem, such as Least absolute shrinkage and selection operator (LASSO) and many improved LASSO methods, but there are few variable selection methods in generalized linear models. We study the variable selection problem in logis-tic regression model. We propose a new variable selection method–the logistic elastic net, prove that it has grouping eff ect which means that the strongly correlated predictors tend to be in or out of the model together. The logistic elastic net is particularly useful when the number of pre-dictors (p) is much bigger than the number of observations (n). By contrast, the LASSO is not a very satisfactory vari-able selection method in the case when p is more larger than n. The advantage and eff ectiveness of this method are demonstrated by real leukemia data and a simulation study.
Xu, Lizhen; Paterson, Andrew D; Xu, Wei
2017-04-01
Motivated by the multivariate nature of microbiome data with hierarchical taxonomic clusters, counts that are often skewed and zero inflated, and repeated measures, we propose a Bayesian latent variable methodology to jointly model multiple operational taxonomic units within a single taxonomic cluster. This novel method can incorporate both negative binomial and zero-inflated negative binomial responses, and can account for serial and familial correlations. We develop a Markov chain Monte Carlo algorithm that is built on a data augmentation scheme using Pólya-Gamma random variables. Hierarchical centering and parameter expansion techniques are also used to improve the convergence of the Markov chain. We evaluate the performance of our proposed method through extensive simulations. We also apply our method to a human microbiome study.
Bayesian model selection in complex linear systems, as illustrated in genetic association studies.
Wen, Xiaoquan
2014-03-01
Motivated by examples from genetic association studies, this article considers the model selection problem in a general complex linear model system and in a Bayesian framework. We discuss formulating model selection problems and incorporating context-dependent a priori information through different levels of prior specifications. We also derive analytic Bayes factors and their approximations to facilitate model selection and discuss their theoretical and computational properties. We demonstrate our Bayesian approach based on an implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real data application of mapping tissue-specific eQTLs. Our novel results on Bayes factors provide a general framework to perform efficient model comparisons in complex linear model systems.
A biological mechanism for Bayesian feature selection: Weight decay and raising the LASSO.
Connor, Patrick; Hollensen, Paul; Krigolson, Olav; Trappenberg, Thomas
2015-07-01
Biological systems are capable of learning that certain stimuli are valuable while ignoring the many that are not, and thus perform feature selection. In machine learning, one effective feature selection approach is the least absolute shrinkage and selection operator (LASSO) form of regularization, which is equivalent to assuming a Laplacian prior distribution on the parameters. We review how such Bayesian priors can be implemented in gradient descent as a form of weight decay, which is a biologically plausible mechanism for Bayesian feature selection. In particular, we describe a new prior that offsets or "raises" the Laplacian prior distribution. We evaluate this alongside the Gaussian and Cauchy priors in gradient descent using a generic regression task where there are few relevant and many irrelevant features. We find that raising the Laplacian leads to less prediction error because it is a better model of the underlying distribution. We also consider two biologically relevant online learning tasks, one synthetic and one modeled after the perceptual expertise task of Krigolson et al. (2009). Here, raising the Laplacian prior avoids the fast erosion of relevant parameters over the period following training because it only allows small weights to decay. This better matches the limited loss of association seen between days in the human data of the perceptual expertise task. Raising the Laplacian prior thus results in a biologically plausible form of Bayesian feature selection that is effective in biologically relevant contexts.
Directory of Open Access Journals (Sweden)
Brentani Helena
2004-08-01
Full Text Available Abstract Background An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE, "Digital Northern" or Massively Parallel Signature Sequencing (MPSS, is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results We introduce a Bayesian model that accounts for the within-class variability by means of mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries" and the Beta-Binomial model, are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression to a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user friendly web-based on-line tool or as R language scripts at supplemental web-site.
A default Bayesian hypothesis test for ANOVA designs
Wetzels, R.; Grasman, R.P.P.P.; Wagenmakers, E.J.
2012-01-01
This article presents a Bayesian hypothesis test for analysis of variance (ANOVA) designs. The test is an application of standard Bayesian methods for variable selection in regression models. We illustrate the effect of various g-priors on the ANOVA hypothesis test. The Bayesian test for ANOVA desig
Sea-level variability in tide-gauge and geological records: An empirical Bayesian analysis (Invited)
Kopp, R. E.; Hay, C.; Morrow, E.; Mitrovica, J. X.; Horton, B.; Kemp, A.
2013-12-01
Sea level varies at a range of temporal and spatial scales, and understanding all its significant sources of variability is crucial to building sea-level rise projections relevant to local decision-making. In the twentieth-century record, sites along the U.S. east coast have exhibited typical year-to-year variability of several centimeters. A faster-than-global increase in sea-level rise in the northeastern United States since about 1990 has led some to hypothesize a 'sea-level rise hot spot' in this region, perhaps driven by a trend in the Atlantic Meridional Overturning Circulation related to anthropogenic climate change [1]. However, such hypotheses must be evaluated in the context of natural variability, as revealed by observational and paleo-records. Bayesian and empirical Bayesian statistical approaches are well suited for assimilating data from diverse sources, such as tide-gauges and peats, with differing data availability and uncertainties, and for identifying regionally covarying patterns within these data. We present empirical Bayesian analyses of twentieth-century tide gauge data [2]. We find that the mid-Atlantic region of the United States has experienced a clear acceleration of sea level relative to the global average since about 1990, but this acceleration does not appear to be unprecedented in the twentieth-century record. The rate and extent of this acceleration instead appears comparable to an acceleration observed in the 1930s and 1940s. Both during the earlier episode of acceleration and today, the effect appears to be significantly positively correlated with the Atlantic Multidecadal Oscillation and likely negatively correlated with the North Atlantic Oscillation [2]. The Holocene and Common Era database of geological sea-level rise proxies [3,4] may allow these relationships to be assessed beyond the span of the direct observational record. At a global scale, similar approaches can be employed to look for the spatial fingerprints of land ice
Robust cluster analysis and variable selection
Ritter, Gunter
2014-01-01
Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot
Barroso, L M A; Teodoro, P E; Nascimento, M; Torres, F E; Dos Santos, A; Corrêa, A M; Sagrilo, E; Corrêa, C C G; Silva, F A; Ceccon, G
2016-03-11
This study aimed to verify that a Bayesian approach could be used for the selection of upright cowpea genotypes with high adaptability and phenotypic stability, and the study also evaluated the efficiency of using informative and minimally informative a priori distributions. Six trials were conducted in randomized blocks, and the grain yield of 17 upright cowpea genotypes was assessed. To represent the minimally informative a priori distributions, a probability distribution with high variance was used, and a meta-analysis concept was adopted to represent the informative a priori distributions. Bayes factors were used to conduct comparisons between the a priori distributions. The Bayesian approach was effective for selection of upright cowpea genotypes with high adaptability and phenotypic stability using the Eberhart and Russell method. Bayes factors indicated that the use of informative a priori distributions provided more accurate results than minimally informative a priori distributions.
Heart rate variability estimation in photoplethysmography signals using Bayesian learning approach
Alwosheel, Ahmad; Alasaad, Amr
2016-01-01
Heart rate variability (HRV) has become a marker for various health and disease conditions. Photoplethysmography (PPG) sensors integrated in wearable devices such as smart watches and phones are widely used to measure heart activities. HRV requires accurate estimation of time interval between consecutive peaks in the PPG signal. However, PPG signal is very sensitive to motion artefact which may lead to poor HRV estimation if false peaks are detected. In this Letter, the authors propose a probabilistic approach based on Bayesian learning to better estimate HRV from PPG signal recorded by wearable devices and enhance the performance of the automatic multi scale-based peak detection (AMPD) algorithm used for peak detection. The authors’ experiments show that their approach enhances the performance of the AMPD algorithm in terms of number of HRV related metrics such as sensitivity, positive predictive value, and average temporal resolution. PMID:27382483
Almond, Russell G.; Mulder, Joris; Hemat, Lisa A.; Yan, Duanli
2006-01-01
Bayesian network models offer a large degree of flexibility for modeling dependence among observables (item outcome variables) from the same task that may be dependent. This paper explores four design patterns for modeling locally dependent observations from the same task: (1) No context--Ignore dependence among observables; (2) Compensatory…
Quasar Selection Based on Photometric Variability
MacLeod, C L; Ivezic, Z; Kochanek, C S; Gibson, R; Meisner, A; Kozlowski, S; Sesar, B; Becker, A C; de Vries, W
2010-01-01
We develop a method for separating quasars from other variable point sources using SDSS Stripe 82 light curve data for ~10,000 variable objects. To statistically describe quasar variability, we use a damped random walk model parametrized by a damping time scale, tau, and an asymptotic amplitude (structure function), SF_inf. With the aid of an SDSS spectroscopically confirmed quasar sample, we demonstrate that variability selection in typical extragalactic fields with low stellar density can deliver complete samples with reasonable purity (or efficiency, E). Compared to a selection method based solely on the slope of the structure function, the inclusion of the tau information boosts E from 60% to 75% while maintaining a highly complete sample (98%) even in the absence of color information. For a completeness of C=90%, E is boosted from 80% to 85%. Conversely, C improves from 90% to 97% while maintaining E=80% when imposing a lower limit on tau. With the aid of color selection, the purity can be further booste...
Xu, Lei; Johnson, Timothy D.; Nichols, Thomas E.; Nee, Derek E.
2010-01-01
Summary The aim of this work is to develop a spatial model for multi-subject fMRI data. There has been extensive work on univariate modeling of each voxel for single and multi-subject data, some work on spatial modeling of single-subject data, and some recent work on spatial modeling of multi-subject data. However, there has been no work on spatial models that explicitly account for inter-subject variability in activation locations. In this work, we use the idea of activation centers and model the inter-subject variability in activation locations directly. Our model is specified in a Bayesian hierarchical frame work which allows us to draw inferences at all levels: the population level, the individual level and the voxel level. We use Gaussian mixtures for the probability that an individual has a particular activation. This helps answer an important question which is not addressed by any of the previous methods: What proportion of subjects had a significant activity in a given region. Our approach incorporates the unknown number of mixture components into the model as a parameter whose posterior distribution is estimated by reversible jump Markov Chain Monte Carlo. We demonstrate our method with a fMRI study of resolving proactive interference and show dramatically better precision of localization with our method relative to the standard mass-univariate method. Although we are motivated by fMRI data, this model could easily be modified to handle other types of imaging data. PMID:19210732
Xu, Zhiqiang
2017-02-16
Attributed graph clustering, also known as community detection on attributed graphs, attracts much interests recently due to the ubiquity of attributed graphs in real life. Many existing algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed, that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.
Simultaneous estimation and variable selection in median regression using Lasso-type penalty.
Xu, Jinfeng; Ying, Zhiliang
2010-06-01
We consider the median regression with a LASSO-type penalty term for variable selection. With the fixed number of variables in regression model, a two-stage method is proposed for simultaneous estimation and variable selection where the degree of penalty is adaptively chosen. A Bayesian information criterion type approach is proposed and used to obtain a data-driven procedure which is proved to automatically select asymptotically optimal tuning parameters. It is shown that the resultant estimator achieves the so-called oracle property. The combination of the median regression and LASSO penalty is computationally easy to implement via the standard linear programming. A random perturbation scheme can be made use of to get simple estimator of the standard error. Simulation studies are conducted to assess the finite-sample performance of the proposed method. We illustrate the methodology with a real example.
2016-01-01
DNA double-strand breaks are lesions that form during metabolism, DNA replication and exposure to mutagens. When a double-strand break occurs one of a number of repair mechanisms is recruited, all of which have differing propensities for mutational events. Despite DNA repair being of crucial importance, the relative contribution of these mechanisms and their regulatory interactions remain to be fully elucidated. Understanding these mutational processes will have a profound impact on our knowledge of genomic instability, with implications across health, disease and evolution. Here we present a new method to model the combined activation of non-homologous end joining, single strand annealing and alternative end joining, following exposure to ionising radiation. We use Bayesian statistics to integrate eight biological data sets of double-strand break repair curves under varying genetic knockouts and confirm that our model is predictive by re-simulating and comparing to additional data. Analysis of the model suggests that there are at least three disjoint modes of repair, which we assign as fast, slow and intermediate. Our results show that when multiple data sets are combined, the rate for intermediate repair is variable amongst genetic knockouts. Further analysis suggests that the ratio between slow and intermediate repair depends on the presence or absence of DNA-PKcs and Ku70, which implies that non-homologous end joining and alternative end joining are not independent. Finally, we consider the proportion of double-strand breaks within each mechanism as a time series and predict activity as a function of repair rate. We outline how our insights can be directly tested using imaging and sequencing techniques and conclude that there is evidence of variable dynamics in alternative repair pathways. Our approach is an important step towards providing a unifying theoretical framework for the dynamics of DNA repair processes. PMID:27741226
Strom, Daniel J; Joyce, Kevin E; MacLellan, Jay A; Watson, David J; Lynch, Timothy P; Antonio, Cheryl L; Birchall, Alan; Anderson, Kevin K; Zharov, Peter A
2012-04-01
In making low-level radioactivity measurements of populations, it is commonly observed that a substantial portion of net results is negative. Furthermore, the observed variance of the measurement results arises from a combination of measurement uncertainty and population variability. This paper presents a method for disaggregating measurement uncertainty from population variability to produce a probability density function (PDF) of possibly true results. To do this, simple, justifiable and reasonable assumptions are made about the relationship of the measurements to the measurands (the 'true values'). The measurements are assumed to be unbiased, that is, that their average value is the average of the measurands. Using traditional estimates of each measurement's uncertainty, a likelihood PDF for each individual's measurand is produced. Then using the same assumptions and all the data from the population of individuals, a prior PDF of measurands for the population is produced. The prior PDF is non-negative, and the average is equal to the average of the measurement results for the population. Using Bayes's theorem, posterior PDFs of each individual measurand are calculated. The uncertainty in these bayesian posterior PDFs appears to be all Berkson with no remaining classical component. The method is applied to baseline bioassay data from the Hanford site. The data include (90)Sr urinalysis measurements of 128 people, (137)Cs in vivo measurements of 5337 people and (239)Pu urinalysis measurements of 3270 people. The method produces excellent results for the (90)Sr and (137)Cs measurements, since there are non-zero concentrations of these global fallout radionuclides in people who have not been occupationally exposed. The method does not work for the (239)Pu measurements in non-occupationally exposed people because the population average is essentially zero relative to the sensitivity of the measurement technique. The method is shown to give results similar to
NetDiff - Bayesian model selection for differential gene regulatory network inference.
Thorne, Thomas
2016-12-16
Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, that enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov Chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.
Link, William; Sauer, John R.
2016-01-01
The analysis of ecological data has changed in two important ways over the last 15 years. The development and easy availability of Bayesian computational methods has allowed and encouraged the fitting of complex hierarchical models. At the same time, there has been increasing emphasis on acknowledging and accounting for model uncertainty. Unfortunately, the ability to fit complex models has outstripped the development of tools for model selection and model evaluation: familiar model selection tools such as Akaike's information criterion and the deviance information criterion are widely known to be inadequate for hierarchical models. In addition, little attention has been paid to the evaluation of model adequacy in context of hierarchical modeling, i.e., to the evaluation of fit for a single model. In this paper, we describe Bayesian cross-validation, which provides tools for model selection and evaluation. We describe the Bayesian predictive information criterion and a Bayesian approximation to the BPIC known as the Watanabe-Akaike information criterion. We illustrate the use of these tools for model selection, and the use of Bayesian cross-validation as a tool for model evaluation, using three large data sets from the North American Breeding Bird Survey.
Link, William A; Sauer, John R
2016-07-01
The analysis of ecological data has changed in two important ways over the last 15 years. The development and easy availability of Bayesian computational methods has allowed and encouraged the fitting of complex hierarchical models. At the same time, there has been increasing emphasis on acknowledging and accounting for model uncertainty. Unfortunately, the ability to fit complex models has outstripped the development of tools for model selection and model evaluation: familiar model selection tools such as Akaike's information criterion and the deviance information criterion are widely known to be inadequate for hierarchical models. In addition, little attention has been paid to the evaluation of model adequacy in context of hierarchical modeling, i.e., to the evaluation of fit for a single model. In this paper, we describe Bayesian cross-validation, which provides tools for model selection and evaluation. We describe the Bayesian predictive information criterion and a Bayesian approximation to the BPIC known as the Watanabe-Akaike information criterion. We illustrate the use of these tools for model selection, and the use of Bayesian cross-validation as a tool for model evaluation, using three large data sets from the North American Breeding Bird Survey.
A Bayesian Network Approach for Offshore Risk Analysis Through Linguistic Variables
Institute of Scientific and Technical Information of China (English)
无
2007-01-01
This paper presents a new approach for offshore risk analysis that is capable of dealing with linguistic probabilities in Bayesian networks (BNs). In this paper, linguistic probabilities are used to describe occurrence likelihood of hazardous events that may cause possible accidents in offshore operations. In order to use fuzzy information, an f-weighted valuation function is proposed to transform linguistic judgements into crisp probability distributions which can be easily put into a BN to model causal relationships among risk factors. The use of linguistic variables makes it easier for human experts to express their knowledge, and the transformation of linguistic judgements into crisp probabilities can significantly save the cost of computation, modifying and maintaining a BN model. The flexibility of the method allows for multiple forms of information to be used to quantify model relationships, including formally assessed expert opinion when quantitative data are lacking, or when only qualitative or vague statements can be made. The model is a modular representation of uncertain knowledge caused due to randomness, vagueness and ignorance. This makes the risk analysis of offshore engineering systems more functional and easier in many assessment contexts. Specifically, the proposed f-weighted valuation function takes into account not only the dominating values, but also the α-level values that are ignored by conventional valuation methods. A case study of the collision risk between a Floating Production, Storage and Off-loading (FPSO) unit and the authorised vessels due to human elements during operation is used to illustrate the application of the proposed model.
Variables influencing victim selection in genocide.
Komar, Debra A
2008-01-01
While victims of racially motivated violence may be identified through observation of morphological features, those targeted because of their ethnic, religious, or national identity are not easily recognized. This study examines how perpetrators of genocide recognize their victims. Court documents, including indictments, witness statements, and testimony from the International Criminal Tribunals for Rwanda and the former Yugoslavia (FY) detail the interactions between victim and assailant. A total of 6012 decedents were included in the study; only 20.8% had been positively identified. Variables influencing victim selection in Rwanda included location, segregation, incitement, and prior relationship, while significant factors in FY were segregation, location, age/gender, and social data. Additional contributing factors in both countries included self-identification, victim behavior, linguistic or clothing evidence, and morphological features. Understanding the system of recognition used by perpetrators aids investigators tasked with establishing victim identity in such prosecutions.
Liepe, Juliane; Kirk, Paul; Filippi, Sarah; Toni, Tina; Barnes, Chris P.; Stumpf, Michael P.H.
2016-01-01
As modeling becomes a more widespread practice in the life- and biomedical sciences, we require reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation framework and software environment, ABC-SysBio, which enables parameter estimation and model selection in the Bayesian formalism using Sequential Monte-Carlo approaches. We outline the underlying rationale, discuss the computational and practical issues, and provide detailed guidance as to how the important tasks of parameter inference and model selection can be carried out in practice. Unlike other available packages, ABC-SysBio is highly suited for investigating in particular the challenging problem of fitting stochastic models to data. Although computationally expensive, the additional insights gained in the Bayesian formalism more than make up for this cost, especially in complex problems. PMID:24457334
Selection of Trusted Service Providers by Enforcing Bayesian Analysis in iVCE
Institute of Scientific and Technical Information of China (English)
GU Bao-jun; LI Xiao-yong; WANG Wei-nong
2008-01-01
The initiative of internet-based virtual computing environment (iVCE) aims to provide the end users and applications With a harmonious, trustworthy and transparent integrated computing environment which will facilitate sharing and collaborating of network resources between applications. Trust management is an elementary component for iVCE. The uncertain and dynamic characteristics of iVCE necessitate the requirement for the trust management to be subjective, historical evidence based and context dependent. This paper presents a Bayesian analysis-based trust model, which aims to secure the active agents for selecting appropriate trustod services in iVCE. Simulations are made to analyze the properties of the trust model which show that the subjective prior information influences trust evaluation a lot and the model stimulates positive interactions.
Bayesian sensitivity analysis of incomplete data: bridging pattern-mixture and selection models.
Kaciroti, Niko A; Raghunathan, Trivellore
2014-11-30
Pattern-mixture models (PMM) and selection models (SM) are alternative approaches for statistical analysis when faced with incomplete data and a nonignorable missing-data mechanism. Both models make empirically unverifiable assumptions and need additional constraints to identify the parameters. Here, we first introduce intuitive parameterizations to identify PMM for different types of outcome with distribution in the exponential family; then we translate these to their equivalent SM approach. This provides a unified framework for performing sensitivity analysis under either setting. These new parameterizations are transparent, easy-to-use, and provide dual interpretation from both the PMM and SM perspectives. A Bayesian approach is used to perform sensitivity analysis, deriving inferences using informative prior distributions on the sensitivity parameters. These models can be fitted using software that implements Gibbs sampling.
Introduction to Bayesian statistics
Bolstad, William M
2017-01-01
There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian staistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inferenfe cfor discrete random variables, bionomial proprotion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear RegressionModel; and Computati...
Lee, Sik-Yum; Song, Xin-Yuan; Cai, Jing-Heng
2010-01-01
Analysis of ordered binary and unordered binary data has received considerable attention in social and psychological research. This article introduces a Bayesian approach, which has several nice features in practical applications, for analyzing nonlinear structural equation models with dichotomous data. We demonstrate how to use the software…
Melucci, L M; Birchmeier, A N; Cappa, E P; Cantet, R J C
2009-10-01
An experimental Hereford herd established in 1960 was used from 1986 to 2006 to select for increased weaning weight (W) without increasing birth weight (B). Data were B and W collected over the 47 yr from 2,124 calves. Including ancestors, the pedigree file had 2,369 animals. Selection was practiced only in males. In the first stage (1986 to 1993), mass-selected bulls were chosen with the index I = B + 9374.76 RDG (relative daily gain). From 1994 to 2006, the selection criterion for bull i was I(i) = BLUP(i)(WD) - 2.33 BLUP(i)(BD), where the BLUP were for the direct BV of B (BD) and W (WD), respectively. Predictions were obtained from a 2-trait animal model with B having only BD, and W with WD and WM (maternal additive effects). Selection response was estimated using a Bayesian approach by means of the Gibbs sampler for a 2-trait animal model including BD, BM (maternal BV for B), WD, and WM. Estimated heritabilities for BD, BM, WD, and WM were 0.40, 0.23, 0.05, and 0.23, respectively. The correlation between BD and BM was close to zero (0.01), and between WD and WM was positive (0.37). The correlation between BD and WD was 0.07, and between BM and WM was 0.58. The 2 methods used to estimate selection response gave similar results. In both periods BD decreased, whereas BM increased. The reduction of BD due to selection was slightly larger in the second period than in the first one. The regression of BV for W increased due to selection in both stages, but selection response was 21.6% larger from 1986 to 1992 than from 1993 to 2006. The maternal effect, WM increased more than 3 times compared with WD in the first period, but ended up being almost the same value as WD in period 2. The Bulmer effect was manifested by the decrease in magnitude of all (co)variance components during selection. It is concluded that selection to increase BW at weaning in beef cattle, although not increasing BW at birth, was moderately effective.
Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E
2010-03-01
Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.
Energy Technology Data Exchange (ETDEWEB)
Farrell, Kathryn, E-mail: kfarrell@ices.utexas.edu; Oden, J. Tinsley, E-mail: oden@ices.utexas.edu; Faghihi, Danial, E-mail: danial@ices.utexas.edu
2015-08-15
A general adaptive modeling algorithm for selection and validation of coarse-grained models of atomistic systems is presented. A Bayesian framework is developed to address uncertainties in parameters, data, and model selection. Algorithms for computing output sensitivities to parameter variances, model evidence and posterior model plausibilities for given data, and for computing what are referred to as Occam Categories in reference to a rough measure of model simplicity, make up components of the overall approach. Computational results are provided for representative applications.
Comparison of Two Gas Selection Methodologies: An Application of Bayesian Model Averaging
Energy Technology Data Exchange (ETDEWEB)
Renholds, Andrea S.; Thompson, Sandra E.; Anderson, Kevin K.; Chilton, Lawrence K.
2006-03-31
One goal of hyperspectral imagery analysis is the detection and characterization of plumes. Characterization includes identifying the gases in the plumes, which is a model selection problem. Two gas selection methods compared in this report are Bayesian model averaging (BMA) and minimum Akaike information criterion (AIC) stepwise regression (SR). Simulated spectral data from a three-layer radiance transfer model were used to compare the two methods. Test gases were chosen to span the types of spectra observed, which exhibit peaks ranging from broad to sharp. The size and complexity of the search libraries were varied. Background materials were chosen to either replicate a remote area of eastern Washington or feature many common background materials. For many cases, BMA and SR performed the detection task comparably in terms of the receiver operating characteristic curves. For some gases, BMA performed better than SR when the size and complexity of the search library increased. This is encouraging because we expect improved BMA performance upon incorporation of prior information on background materials and gases.
Variable Selection of Partially Linear Single-index Mo dels
Institute of Scientific and Technical Information of China (English)
LU Yi-qiang; HU Bin
2014-01-01
In this article, we study the variable selection of partially linear single-index model(PLSIM). Based on the minimized average variance estimation, the variable selection of PLSIM is done by minimizing average variance with adaptive l1 penalty. Implementation algorithm is given. Under some regular conditions, we demonstrate the oracle properties of aLASSO procedure for PLSIM. Simulations are used to investigate the effectiveness of the proposed method for variable selection of PLSIM.
Fast Selection of Spectral Variables with B-Spline Compression
Rossi, Fabrice; Wertz, Vincent; Meurens, Marc; Verleysen, Michel
2007-01-01
The large number of spectral variables in most data sets encountered in spectral chemometrics often renders the prediction of a dependent variable uneasy. The number of variables hopefully can be reduced, by using either projection techniques or selection methods; the latter allow for the interpretation of the selected variables. Since the optimal approach of testing all possible subsets of variables with the prediction model is intractable, an incremental selection approach using a nonparametric statistics is a good option, as it avoids the computationally intensive use of the model itself. It has two drawbacks however: the number of groups of variables to test is still huge, and colinearities can make the results unstable. To overcome these limitations, this paper presents a method to select groups of spectral variables. It consists in a forward-backward procedure applied to the coefficients of a B-Spline representation of the spectra. The criterion used in the forward-backward procedure is the mutual infor...
Bayesian model selection for testing the no-hair theorem with black hole ringdowns
Gossan, S; Sathyaprakash, B S
2011-01-01
General relativity predicts that a black hole that results from the merger of two compact stars (either black holes or neutron stars) is initially highly deformed but soon settles down to a quiescent state by emitting a superposition of quasi-normal modes (QNMs). The QNMs are damped sinusoids with characteristic frequencies and decay times that depend only on the mass and spin of the black hole and no other parameter - a statement of the no-hair theorem. In this paper we have examined the extent to which QNMs could be used to test the no-hair theorem with future ground- and space-based gravitational-wave detectors. We model departures from general relativity (GR) by introducing extra parameters which change the mode frequencies or decay times from their general relativistic values. With the aid of numerical simulations and Bayesian model selection, we assess the extent to which the presence of such a parameter could be inferred, and its value estimated. We find that it is harder to decipher the departure of d...
Schöniger, Anneli; Illman, Walter A.; Wöhling, Thomas; Nowak, Wolfgang
2015-12-01
Groundwater modelers face the challenge of how to assign representative parameter values to the studied aquifer. Several approaches are available to parameterize spatial heterogeneity in aquifer parameters. They differ in their conceptualization and complexity, ranging from homogeneous models to heterogeneous random fields. While it is common practice to invest more effort into data collection for models with a finer resolution of heterogeneities, there is a lack of advice which amount of data is required to justify a certain level of model complexity. In this study, we propose to use concepts related to Bayesian model selection to identify this balance. We demonstrate our approach on the characterization of a heterogeneous aquifer via hydraulic tomography in a sandbox experiment (Illman et al., 2010). We consider four increasingly complex parameterizations of hydraulic conductivity: (1) Effective homogeneous medium, (2) geology-based zonation, (3) interpolation by pilot points, and (4) geostatistical random fields. First, we investigate the shift in justified complexity with increasing amount of available data by constructing a model confusion matrix. This matrix indicates the maximum level of complexity that can be justified given a specific experimental setup. Second, we determine which parameterization is most adequate given the observed drawdown data. Third, we test how the different parameterizations perform in a validation setup. The results of our test case indicate that aquifer characterization via hydraulic tomography does not necessarily require (or justify) a geostatistical description. Instead, a zonation-based model might be a more robust choice, but only if the zonation is geologically adequate.
Cabras, Stefano; Castellanos, Maria Eugenia; Perra, Silvia
2014-11-20
This paper considers the problem of selecting a set of regressors when the response variable is distributed according to a specified parametric model and observations are censored. Under a Bayesian perspective, the most widely used tools are Bayes factors (BFs), which are undefined when improper priors are used. In order to overcome this issue, fractional (FBF) and intrinsic (IBF) BFs have become common tools for model selection. Both depend on the size, Nt , of a minimal training sample (MTS), while the IBF also depends on the specific MTS used. In the case of regression with censored data, the definition of an MTS is problematic because only uncensored data allow to turn the improper prior into a proper posterior and also because full exploration of the space of the MTSs, which includes also censored observations, is needed to avoid bias in model selection. To address this concern, a sequential MTS was proposed, but it has the drawback of an increase of the number of possible MTSs as Nt becomes random. For this reason, we explore the behaviour of the FBF, contextualizing its definition to censored data. We show that these are consistent, providing also the corresponding fractional prior. Finally, a large simulation study and an application to real data are used to compare IBF, FBF and the well-known Bayesian information criterion.
Bayesian latent variable models for the analysis of experimental psychology data.
Merkle, Edgar C; Wang, Ting
2016-03-18
In this paper, we address the use of Bayesian factor analysis and structural equation models to draw inferences from experimental psychology data. While such application is non-standard, the models are generally useful for the unified analysis of multivariate data that stem from, e.g., subjects' responses to multiple experimental stimuli. We first review the models and the parameter identification issues inherent in the models. We then provide details on model estimation via JAGS and on Bayes factor estimation. Finally, we use the models to re-analyze experimental data on risky choice, comparing the approach to simpler, alternative methods.
Langmore, Ian; Davis, Anthony B.; Bal, Guillaume; Marzouk, Youssef M.
2012-01-01
We describe a method for accelerating a 3D Monte Carlo forward radiative transfer model to the point where it can be used in a new kind of Bayesian retrieval framework. The remote sensing challenge is to detect and quantify a chemical effluent of a known absorbing gas produced by an industrial facility in a deep valley. The available data is a single low resolution noisy image of the scene in the near IR at an absorbing wavelength for the gas of interest. The detected sunlight has been multiply reflected by the variable terrain and/or scattered by an aerosol that is assumed partially known and partially unknown. We thus introduce a new class of remote sensing algorithms best described as "multi-pixel" techniques that call necessarily for a 3D radaitive transfer model (but demonstrated here in 2D); they can be added to conventional ones that exploit typically multi- or hyper-spectral data, sometimes with multi-angle capability, with or without information about polarization. The novel Bayesian inference methodology uses adaptively, with efficiency in mind, the fact that a Monte Carlo forward model has a known and controllable uncertainty depending on the number of sun-to-detector paths used.
ANALYSIS OF BAYESIAN CLASSIFIER ACCURACY
Directory of Open Access Journals (Sweden)
Felipe Schneider Costa
2013-01-01
Full Text Available The naÃ¯ve Bayes classifier is considered one of the most effective classification algorithms today, competing with more modern and sophisticated classifiers. Despite being based on unrealistic (naÃ¯ve assumption that all variables are independent, given the output class, the classifier provides proper results. However, depending on the scenario utilized (network structure, number of samples or training cases, number of variables, the network may not provide appropriate results. This study uses a process variable selection, using the chi-squared test to verify the existence of dependence between variables in the data model in order to identify the reasons which prevent a Bayesian network to provide good performance. A detailed analysis of the data is also proposed, unlike other existing work, as well as adjustments in case of limit values between two adjacent classes. Furthermore, variable weights are used in the calculation of a posteriori probabilities, calculated with mutual information function. Tests were applied in both a naÃ¯ve Bayesian network and a hierarchical Bayesian network. After testing, a significant reduction in error rate has been observed. The naÃ¯ve Bayesian network presented a drop in error rates from twenty five percent to five percent, considering the initial results of the classification process. In the hierarchical network, there was not only a drop in fifteen percent error rate, but also the final result came to zero.
THE TIME DOMAIN SPECTROSCOPIC SURVEY: VARIABLE SELECTION AND ANTICIPATED RESULTS
Energy Technology Data Exchange (ETDEWEB)
Morganson, Eric; Green, Paul J. [Harvard Smithsonian Center for Astrophysics, 60 Garden St, Cambridge, MA 02138 (United States); Anderson, Scott F.; Ruan, John J. [Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195 (United States); Myers, Adam D. [Department of Physics and Astronomy, University of Wyoming, Laramie, WY 82071 (United States); Eracleous, Michael; Brandt, William Nielsen [Department of Astronomy and Astrophysics, 525 Davey Laboratory, The Pennsylvania State University, University Park, PA 16802 (United States); Kelly, Brandon [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106-9530 (United States); Badenes, Carlos [Department of Physics and Astronomy and Pittsburgh Particle Physics, Astrophysics and Cosmology Center (PITT PACC), University of Pittsburgh, 3941 O’Hara St, Pittsburgh, PA 15260 (United States); Bañados, Eduardo [Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117 Heidelberg (Germany); Blanton, Michael R. [Center for Cosmology and Particle Physics, Department of Physics, New York University, 4 Washington Place, New York, NY 10003 (United States); Bershady, Matthew A. [Department of Astronomy, University of Wisconsin, 475 N. Charter St., Madison, WI 53706 (United States); Borissova, Jura [Instituto de Física y Astronomía, Universidad de Valparaíso, Av. Gran Bretaña 1111, Playa Ancha, Casilla 5030, and Millennium Institute of Astrophysics (MAS), Santiago (Chile); Burgett, William S. [GMTO Corp, Suite 300, 251 S. Lake Ave, Pasadena, CA 91101 (United States); Chambers, Kenneth, E-mail: emorganson@cfa.harvard.edu [Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822 (United States); and others
2015-06-20
We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is an Sloan Digital Sky Survey (SDSS)-IV Extended Baryon Oscillation Spectroscopic Survey (eBOSS) subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and active galactic nuclei across 7500 deg{sup 2} selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4000 of which will be RR Lyrae stars which may be used as outer Milky Way probes. The variability-selected quasar population has a smoother redshift distribution than a color-selected sample, and variability measurements similar to those we develop here may be used to make more uniform quasar samples in large surveys. The stellar variable targets are distributed fairly uniformly across color space, indicating that TDSS will obtain spectra for a wide variety of stellar variables including pulsating variables, stars with significant chromospheric activity, cataclysmic variables, and eclipsing binaries. TDSS will serve as a pathfinder mission to identify and characterize the multitude of variable objects that will be detected photometrically in even larger variability surveys such as Large Synoptic Survey Telescope.
Wentworth, Mami Tonoe
Uncertainty quantification plays an important role when making predictive estimates of model responses. In this context, uncertainty quantification is defined as quantifying and reducing uncertainties, and the objective is to quantify uncertainties in parameter, model and measurements, and propagate the uncertainties through the model, so that one can make a predictive estimate with quantified uncertainties. Two of the aspects of uncertainty quantification that must be performed prior to propagating uncertainties are model calibration and parameter selection. There are several efficient techniques for these processes; however, the accuracy of these methods are often not verified. This is the motivation for our work, and in this dissertation, we present and illustrate verification frameworks for model calibration and parameter selection in the context of biological and physical models. First, HIV models, developed and improved by [2, 3, 8], describe the viral infection dynamics of an HIV disease. These are also used to make predictive estimates of viral loads and T-cell counts and to construct an optimal control for drug therapy. Estimating input parameters is an essential step prior to uncertainty quantification. However, not all the parameters are identifiable, implying that they cannot be uniquely determined by the observations. These unidentifiable parameters can be partially removed by performing parameter selection, a process in which parameters that have minimal impacts on the model response are determined. We provide verification techniques for Bayesian model calibration and parameter selection for an HIV model. As an example of a physical model, we employ a heat model with experimental measurements presented in [10]. A steady-state heat model represents a prototypical behavior for heat conduction and diffusion process involved in a thermal-hydraulic model, which is a part of nuclear reactor models. We employ this simple heat model to illustrate verification
Bayesian modeling of measurement error in predictor variables using item response theory
Fox, Jean-Paul; Glas, Cees A.W.
2003-01-01
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, that may be defined at any level of an hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between t
Research on Some Questions About Selection of Independent Variables
Institute of Scientific and Technical Information of China (English)
TAO Jing-xuan
2002-01-01
The paper studies four methods about selection of independent variables in multivariate analysis. In general condition, advanced statistical method and backward statistical method could not obtain the best subset of independent variables. It is possibly affected by the orders of variables or associations among variables. When multicollinearity is presented in a set of explanatory variables-abnormal state, it is not effective to use the method, although stepwise regression and optimum selecting method of total subsets is widely used.According to this case, the paper proposes a new method which combines deleting variables with ingredient analysis and is used in research and science practically.The important characteristic of this paper is that it gives some examples to support each conclusion.
DEFF Research Database (Denmark)
Burgess, Stephen; Thompson, Simon G; Thompson, Grahame
2010-01-01
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context...... of multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable...
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
ANALYSIS WITH AUTOMATIC VARIABLE SELECTION BASED ON TREES by Anton D. Orr December 2014 Thesis Advisor: Samuel E. Buttrey Second Reader...DATES COVERED Master’s Thesis 4. TITLE AND SUBTITLE IMPROVING CLUSTER ANALYSIS WITH AUTOMATIC VARIABLE SELECTION BASED ON TREES 5. FUNDING NUMBERS 6...2006 based on classification and regression trees to address problems with determining dissimilarity. Current algorithms do not simultaneously address
Nearly unbiased variable selection under minimax concave penalty
Zhang, Cun-Hui
2010-01-01
We propose MC+, a fast, continuous, nearly unbiased and accurate method of penalized variable selection in high-dimensional linear regression. The LASSO is fast and continuous, but biased. The bias of the LASSO may prevent consistent variable selection. Subset selection is unbiased but computationally costly. The MC+ has two elements: a minimax concave penalty (MCP) and a penalized linear unbiased selection (PLUS) algorithm. The MCP provides the convexity of the penalized loss in sparse regions to the greatest extent given certain thresholds for variable selection and unbiasedness. The PLUS computes multiple exact local minimizers of a possibly nonconvex penalized loss function in a certain main branch of the graph of critical points of the penalized loss. Its output is a continuous piecewise linear path encompassing from the origin for infinite penalty to a least squares solution for zero penalty. We prove that at a universal penalty level, the MC+ has high probability of matching the signs of the unknowns, ...
Directory of Open Access Journals (Sweden)
Mihaela SIMIONESCU
2016-03-01
Full Text Available Inflation rate determinants for the USA have been analyzed in this study starting with 2008, when the American economy was already in crisis. This research brings, as a novelty, the use of Bayesian Econometrics methods to identify the monthly inflation rate in the USA. The Stochastic Search Variable Selection (SSVS has been applied for a subjective probability acceptance of 0.3. The results are validated also by economic theory. The monthly inflation rate was influenced during 2008-2015 by: the unemployment rate, the exchange rate, crude oil prices, the trade weighted U.S. Dollar Index and the M2 Money Stock. The study might be continued by considering other potential determinants of the inflation rate.
Semmens, Brice X; Ward, Eric J; Moore, Jonathan W; Darimont, Chris T
2009-07-09
Variability in resource use defines the width of a trophic niche occupied by a population. Intra-population variability in resource use may occur across hierarchical levels of population structure from individuals to subpopulations. Understanding how levels of population organization contribute to population niche width is critical to ecology and evolution. Here we describe a hierarchical stable isotope mixing model that can simultaneously estimate both the prey composition of a consumer diet and the diet variability among individuals and across levels of population organization. By explicitly estimating variance components for multiple scales, the model can deconstruct the niche width of a consumer population into relevant levels of population structure. We apply this new approach to stable isotope data from a population of gray wolves from coastal British Columbia, and show support for extensive intra-population niche variability among individuals, social groups, and geographically isolated subpopulations. The analytic method we describe improves mixing models by accounting for diet variability, and improves isotope niche width analysis by quantitatively assessing the contribution of levels of organization to the niche width of a population.
Financial applications of a Tabu search variable selection model
Directory of Open Access Journals (Sweden)
Zvi Drezner
2001-01-01
Full Text Available We illustrate how a comparatively new technique, a Tabu search variable selection model [Drezner, Marcoulides and Salhi (1999], can be applied efficiently within finance when the researcher must select a subset of variables from among the whole set of explanatory variables under consideration. Several types of problems in finance, including corporate and personal bankruptcy prediction, mortgage and credit scoring, and the selection of variables for the Arbitrage Pricing Model, require the researcher to select a subset of variables from a larger set. In order to demonstrate the usefulness of the Tabu search variable selection model, we: (1 illustrate its efficiency in comparison to the main alternative search procedures, such as stepwise regression and the Maximum R2 procedure, and (2 show how a version of the Tabu search procedure may be implemented when attempting to predict corporate bankruptcy. We accomplish (2 by indicating that a Tabu Search procedure increases the predictability of corporate bankruptcy by up to 10 percentage points in comparison to Altman's (1968 Z-Score model.
Goury, Olivier; Amsallem, David; Bordas, Stéphane Pierre Alain; Liu, Wing Kam; Kerfriden, Pierre
2016-08-01
In this paper, we present new reliable model order reduction strategies for computational micromechanics. The difficulties rely mainly upon the high dimensionality of the parameter space represented by any load path applied onto the representative volume element. We take special care of the challenge of selecting an exhaustive snapshot set. This is treated by first using a random sampling of energy dissipating load paths and then in a more advanced way using Bayesian optimization associated with an interlocked division of the parameter space. Results show that we can insure the selection of an exhaustive snapshot set from which a reliable reduced-order model can be built.
Variable selection and estimation for longitudinal survey data
Wang, Li
2014-09-01
There is wide interest in studying longitudinal surveys where sample subjects are observed successively over time. Longitudinal surveys have been used in many areas today, for example, in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure when the correct submodel was known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples are illustrated to show the usefulness of the proposed methodology under various model settings and sampling designs. © 2014 Elsevier Inc.
Bayesian modeling of measurement error in predictor variables using item response theory
Fox, Jean-Paul; Glas, Cees A.W.
2000-01-01
This paper focuses on handling measurement error in predictor variables using item response theory (IRT). Measurement error is of great important in assessment of theoretical constructs, such as intelligence or the school climate. Measurement error is modeled by treating the predictors as unobserved
Can, Seda; van de Schoot, Rens; Hox, Joop
2015-01-01
Because variables may be correlated in the social and behavioral sciences, multicollinearity might be problematic. This study investigates the effect of collinearity manipulated in within and between levels of a two-level confirmatory factor analysis by Monte Carlo simulation. Furthermore, the influence of the size of the intraclass correlation…
Can, Seda; van de Schoot, Rens; Hox, Joop
2014-01-01
Because variables may be correlated in the social and behavioral sciences, multicollinearity might be problematic. This study investigates the effect of collinearity manipulated in within and between levels of a two-level confirmatory factor analysis by Monte Carlo simulation. Furthermore, the influ
Yun, Yong-Huan; Wang, Wei-Ting; Tan, Min-Li; Liang, Yi-Zeng; Li, Hong-Dong; Cao, Dong-Sheng; Lu, Hong-Mei; Xu, Qing-Song
2014-01-07
Nowadays, with a high dimensionality of dataset, it faces a great challenge in the creation of effective methods which can select an optimal variables subset. In this study, a strategy that considers the possible interaction effect among variables through random combinations was proposed, called iteratively retaining informative variables (IRIV). Moreover, the variables are classified into four categories as strongly informative, weakly informative, uninformative and interfering variables. On this basis, IRIV retains both the strongly and weakly informative variables in every iterative round until no uninformative and interfering variables exist. Three datasets were employed to investigate the performance of IRIV coupled with partial least squares (PLS). The results show that IRIV is a good alternative for variable selection strategy when compared with three outstanding and frequently used variable selection methods such as genetic algorithm-PLS, Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) and competitive adaptive reweighted sampling (CARS). The MATLAB source code of IRIV can be freely downloaded for academy research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
Walter, William D.; Smith, Rick; Vanderklok, Mike; VerCauterren, Kurt C.
2014-01-01
Bovine tuberculosis is a bacterial disease caused by Mycobacterium bovis in livestock and wildlife with hosts that include Eurasian badgers (Meles meles), brushtail possum (Trichosurus vulpecula), and white-tailed deer (Odocoileus virginianus). Risk-assessment efforts in Michigan have been initiated on farms to minimize interactions of cattle with wildlife hosts but research onM. bovis on cattle farms has not investigated the spatial context of disease epidemiology. To incorporate spatially explicit data, initial likelihood of infection probabilities for cattle farms tested for M. bovis, prevalence of M. bovis in white-tailed deer, deer density, and environmental variables for each farm were modeled in a Bayesian hierarchical framework. We used geo-referenced locations of 762 cattle farms that have been tested for M. bovis, white-tailed deer prevalence, and several environmental variables that may lead to long-term survival and viability of M. bovis on farms and surrounding habitats (i.e., soil type, habitat type). Bayesian hierarchical analyses identified deer prevalence and proportion of sandy soil within our sampling grid as the most supported model. Analysis of cattle farms tested for M. bovisidentified that for every 1% increase in sandy soil resulted in an increase in odds of infection by 4%. Our analysis revealed that the influence of prevalence of M. bovis in white-tailed deer was still a concern even after considerable efforts to prevent cattle interactions with white-tailed deer through on-farm mitigation and reduction in the deer population. Cattle farms test positive for M. bovis annually in our study area suggesting that the potential for an environmental source either on farms or in the surrounding landscape may contributing to new or re-infections with M. bovis. Our research provides an initial assessment of potential environmental factors that could be incorporated into additional modeling efforts as more knowledge of deer herd
Davies, Andrew J; Hope, Max J
2015-07-15
Contingency plans are essential in guiding the response to marine oil spills. However, they are written before the pollution event occurs so must contain some degree of assumption and prediction and hence may be unsuitable for a real incident when it occurs. The use of Bayesian networks in ecology, environmental management, oil spill contingency planning and post-incident analysis is reviewed and analysed to establish their suitability for use as real-time environmental decision support systems during an oil spill response. It is demonstrated that Bayesian networks are appropriate for facilitating the re-assessment and re-validation of contingency plans following pollutant release, thus helping ensure that the optimum response strategy is adopted. This can minimise the possibility of sub-optimal response strategies causing additional environmental and socioeconomic damage beyond the original pollution event.
Directory of Open Access Journals (Sweden)
Rodríguez-Prieto Víctor
2012-08-01
Full Text Available Abstract Background Bovine tuberculosis (bTB is a chronic infectious disease mainly caused by Mycobacterium bovis. Although eradication is a priority for the European authorities, bTB remains active or even increasing in many countries, causing significant economic losses. The integral consideration of epidemiological factors is crucial to more cost-effectively allocate control measures. The aim of this study was to identify the nature and extent of the association between TB distribution and a list of potential risk factors regarding cattle, wild ungulates and environmental aspects in Ciudad Real, a Spanish province with one of the highest TB herd prevalences. Results We used a Bayesian mixed effects multivariable logistic regression model to predict TB occurrence in either domestic or wild mammals per municipality in 2007 by using information from the previous year. The municipal TB distribution and endemicity was clustered in the western part of the region and clearly overlapped with the explanatory variables identified in the final model: (1 incident cattle farms, (2 number of years of veterinary inspection of big game hunting events, (3 prevalence in wild boar, (4 number of sampled cattle, (5 persistent bTB-infected cattle farms, (6 prevalence in red deer, (7 proportion of beef farms, and (8 farms devoted to bullfighting cattle. Conclusions The combination of these eight variables in the final model highlights the importance of the persistence of the infection in the hosts, surveillance efforts and some cattle management choices in the circulation of M. bovis in the region. The spatial distribution of these variables, together with particular Mediterranean features that favour the wildlife-livestock interface may explain the M. bovis persistence in this region. Sanitary authorities should allocate efforts towards specific areas and epidemiological situations where the wildlife-livestock interface seems to critically hamper the definitive b
DEFF Research Database (Denmark)
2010-01-01
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context...... an overall estimate of the causal relationship between the phenotype and the outcome, and an assessment of its heterogeneity across studies. As an example, we estimate the causal relationship of blood concentrations of C-reactive protein on fibrinogen levels using data from 11 studies. These methods provide...... a flexible framework for efficient estimation of causal relationships derived from multiple studies. Issues discussed include weak instrument bias, analysis of binary outcome data such as disease risk, missing genetic data, and the use of haplotypes....
Energy Technology Data Exchange (ETDEWEB)
Karagiannis, Georgios; Lin, Guang
2014-02-15
Generalized polynomial chaos (gPC) expansions allow the representation of the solution of a stochastic system as a series of polynomial terms. The number of gPC terms increases dramatically with the dimension of the random input variables. When the number of the gPC terms is larger than that of the available samples, a scenario that often occurs if the evaluations of the system are expensive, the evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solution, both in spacial and random domains, by coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spacial points via (1) Bayesian model average or (2) medial probability model, and their construction as functions on the spacial domain via spline interpolation. The former accounts the model uncertainty and provides Bayes-optimal predictions; while the latter, additionally, provides a sparse representation of the solution by evaluating the expansion on a subset of dominating gPC bases when represented as a gPC expansion. Moreover, the method quantifies the importance of the gPC bases through inclusion probabilities. We design an MCMC sampler that evaluates all the unknown quantities without the need of ad-hoc techniques. The proposed method is suitable for, but not restricted to, problems whose stochastic solution is sparse at the stochastic level with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the good performance of the proposed method and make comparisons with others on 1D, 14D and 40D in random space elliptic stochastic partial differential equations.
Brunetti, Carlotta; Linde, Niklas; Vrugt, Jasper A.
2017-04-01
Geophysical data can help to discriminate among multiple competing subsurface hypotheses (conceptual models). Here, we explore the merits of Bayesian model selection in hydrogeophysics using crosshole ground-penetrating radar data from the South Oyster Bacterial Transport Site in Virginia, USA. Implementation of Bayesian model selection requires computation of the marginal likelihood of the measured data, or evidence, for each conceptual model being used. In this paper, we compare three different evidence estimators, including (1) the brute force Monte Carlo method, (2) the Laplace-Metropolis method, and (3) the numerical integration method proposed by Volpi et al. (2016). The three types of subsurface models that we consider differ in their treatment of the porosity distribution and use (a) horizontal layering with fixed layer thicknesses, (b) vertical layering with fixed layer thicknesses and (c) a multi-Gaussian field. Our results demonstrate that all three estimators provide equivalent results in low parameter dimensions, yet in higher dimensions the brute force Monte Carlo method is inefficient. The isotropic multi-Gaussian model is most supported by the travel time data with Bayes factors that are larger than 10100 compared to conceptual models that assume horizontal or vertical layering of the porosity field.
Variable Selection in the Partially Linear Errors-in-Variables Models for Longitudinal Data
Institute of Scientific and Technical Information of China (English)
Yi-ping YANG; Liu-gen XUE; Wei-hu CHENG
2012-01-01
This paper proposes a new approach for variable selection in partially linear errors-in-variables (EV) models for longitudinal data by penalizing appropriate estimating functions.We apply the SCAD penalty to simultaneously select significant variables and estimate unknown parameters.The rate of convergence and the asymptotic normality of the resulting estimators are established.Furthermore,with proper choice of regularization parameters,we show that the proposed estimators perform as well as the oracle procedure.A new algorithm is proposed for solving penalized estimating equation.The asymptotic results are augmented by a simulation study.
Directory of Open Access Journals (Sweden)
Mabaso Musawenkosi LH
2007-09-01
Full Text Available Abstract Background Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana. Results Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Those correlated with higher-ranking relatives of the same environmental theme, were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country. Conclusion We have
The Properties of Model Selection when Retaining Theory Variables
DEFF Research Database (Denmark)
Hendry, David F.; Johansen, Søren
Economic theories are often fitted directly to data to avoid possible model selection biases. We show that embedding a theory model that specifies the correct set of m relevant exogenous variables, x{t}, within the larger set of m+k candidate variables, (x{t},w{t}), then selection over the second...... set by their statistical significance can be undertaken without affecting the estimator distribution of the theory parameters. This strategy returns the theory-parameter estimates when the theory is correct, yet protects against the theory being under-specified because some w{t} are relevant....
Gerretzen, Jan; Szymańska, Ewa; Bart, Jacob; Davies, Antony N; van Manen, Henk-Jan; van den Heuvel, Edwin R; Jansen, Jeroen J; Buydens, Lutgarde M C
2016-09-28
The aim of data preprocessing is to remove data artifacts-such as a baseline, scatter effects or noise-and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but which method or combination of methods should be used for the specific data being analyzed is difficult to select. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the true relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection not only improves the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets of which the true relevant variables are available as prior knowledge. We show that a selection of variables is provided that complies more with the true informative variables compared to individual optimization of both model aspects. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of
用贝叶斯网络进行因果分析%Bayesian Causal Analysis
Institute of Scientific and Technical Information of China (English)
王双成; 林士敏; 陆玉昌
2000-01-01
The Bayesian causal analysis includes two techniques, one of which takes advantage of Bayesian network structure learning under the Causal Markov assumption and the presupposition that hidden variables are absent, and the other uses canonical form influence diagram. The two techniques possess their distinctive characteristics,and ought to be selected and put to use in the light of specific conditions.
A New Statistic for Variable Selection in Questionnaire Analysis
Institute of Scientific and Technical Information of China (English)
ZHANG Jun-hua; FANG Wei-wu
2001-01-01
In this paper, a new statistic is proposed for variable selection which is one of the important problems in analysis of questionnaire data. Contrasting to other methods, the approach introduced here can be used not only for two groups of samples but can also be easily generalized to the multi-group case.
Noncausal Bayesian Vector Autoregression
DEFF Research Database (Denmark)
Lanne, Markku; Luoto, Jani
We propose a Bayesian inferential procedure for the noncausal vector autoregressive (VAR) model that is capable of capturing nonlinearities and incorporating effects of missing variables. In particular, we devise a fast and reliable posterior simulator that yields the predictive distribution...
Energy Technology Data Exchange (ETDEWEB)
Ruan, John J.; Anderson, Scott F.; MacLeod, Chelsea L.; Becker, Andrew C.; Davenport, James R. A.; Ivezic, Zeljko [Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195 (United States); Burnett, T. H. [Department of Physics, University of Washington, Seattle, WA 98195-1560 (United States); Kochanek, Christopher S. [Department of Astronomy, Ohio State University, 140 West 18th Avenue, Columbus, OH 43210 (United States); Plotkin, Richard M. [Department of Astronomy, University of Michigan, 500 Church Street, Ann Arbor, MI 48109 (United States); Sesar, Branimir [Division of Physics, Mathematics and Astronomy, Caltech, Pasadena, CA 91125 (United States); Stuart, J. Scott, E-mail: jruan@astro.washington.edu [Lincoln Laboratory, Massachusetts Institute of Technology, 244 Wood Street, Lexington, MA 02420-9108 (United States)
2012-11-20
We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the {approx}30% of {gamma}-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability {tau}, and driving amplitudes on short timescales {sigma}-circumflex. Imposing cuts on minimum {tau} and {sigma}-circumflex allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of {gamma}-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E {>=} 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other {gamma}-ray blazars and is likely to be the {gamma}-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is {approx}3 years in the rest frame of the jet, in contrast with the {approx}320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.
Bayesian Analysis of Inflation III: Slow Roll Reconstruction Using Model Selection
Noreña, Jorge; Verde, Licia; Peiris, Hiranya V; Easther, Richard
2012-01-01
We implement Slow Roll Reconstruction -- an optimal solution to the inverse problem for inflationary cosmology -- within ModeCode, a publicly available solver for the inflationary dynamics. We obtain up-to-date constraints on the reconstructed inflationary potential, derived from the WMAP 7-year dataset and South Pole Telescope observations, combined with large scale structure data derived from SDSS Data Release 7. Using ModeCode in conjunction with the MultiNest sampler, we compute Bayesian evidence for the reconstructed potential at each order in the truncated slow roll hierarchy. We find that the data are well-described by the first two slow roll parameters, \\epsilon and \\eta, and that there is no need to include a nontrivial \\xi parameter.
Meta-analysis based variable selection for gene expression data.
Li, Quefeng; Wang, Sijian; Huang, Chiang-Ching; Yu, Menggang; Shao, Jun
2014-12-01
Recent advance in biotechnology and its wide applications have led to the generation of many high-dimensional gene expression data sets that can be used to address similar biological questions. Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. When the dimensions of datasets are high, it is desirable to incorporate variable selection into meta-analysis to improve model interpretation and prediction. According to our knowledge, all existing methods conduct variable selection with meta-analyzed data in an "all-in-or-all-out" fashion, that is, a gene is either selected in all of studies or not selected in any study. However, due to data heterogeneity commonly exist in meta-analyzed data, including choices of biospecimens, study population, and measurement sensitivity, it is possible that a gene is important in some studies while unimportant in others. In this article, we propose a novel method called meta-lasso for variable selection with high-dimensional meta-analyzed data. Through a hierarchical decomposition on regression coefficients, our method not only borrows strength across multiple data sets to boost the power to identify important genes, but also keeps the selection flexibility among data sets to take into account data heterogeneity. We show that our method possesses the gene selection consistency, that is, when sample size of each data set is large, with high probability, our method can identify all important genes and remove all unimportant genes. Simulation studies demonstrate a good performance of our method. We applied our meta-lasso method to a meta-analysis of five cardiovascular studies. The analysis results are clinically meaningful.
Variable selection based cotton bollworm odor spectroscopic detection
Lü, Chengxu; Gai, Shasha; Luo, Min; Zhao, Bo
2016-10-01
Aiming at rapid automatic pest detection based efficient and targeting pesticide application and shooting the trouble of reflectance spectral signal covered and attenuated by the solid plant, the possibility of near infrared spectroscopy (NIRS) detection on cotton bollworm odor is studied. Three cotton bollworm odor samples and 3 blank air gas samples were prepared. Different concentrations of cotton bollworm odor were prepared by mixing the above gas samples, resulting a calibration group of 62 samples and a validation group of 31 samples. Spectral collection system includes light source, optical fiber, sample chamber, spectrometer. Spectra were pretreated by baseline correction, modeled with partial least squares (PLS), and optimized by genetic algorithm (GA) and competitive adaptive reweighted sampling (CARS). Minor counts differences are found among spectra of different cotton bollworm odor concentrations. PLS model of all the variables was built presenting RMSEV of 14 and RV2 of 0.89, its theory basis is insect volatilizes specific odor, including pheromone and allelochemics, which are used for intra-specific and inter-specific communication and could be detected by NIR spectroscopy. 28 sensitive variables are selected by GA, presenting the model performance of RMSEV of 14 and RV2 of 0.90. Comparably, 8 sensitive variables are selected by CARS, presenting the model performance of RMSEV of 13 and RV2 of 0.92. CARS model employs only 1.5% variables presenting smaller error than that of all variable. Odor gas based NIR technique shows the potential for cotton bollworm detection.
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
Portfolio Selection Based on Distance between Fuzzy Variables
Directory of Open Access Journals (Sweden)
Weiyi Qian
2014-01-01
Full Text Available This paper researches portfolio selection problem in fuzzy environment. We introduce a new simple method in which the distance between fuzzy variables is used to measure the divergence of fuzzy investment return from a prior one. Firstly, two new mathematical models are proposed by expressing divergence as distance, investment return as expected value, and risk as variance and semivariance, respectively. Secondly, the crisp forms of the new models are also provided for different types of fuzzy variables. Finally, several numerical examples are given to illustrate the effectiveness of the proposed approach.
Ruan, J.J.; Anderson, S.F.; MacLeod, C.L.; Becker, A.C.; Burnett, T.H.; Davenport, J.R.A.; Ivezić, Z.; Kochanek, C.S.; Plotkin, R.M.; Sesar, B.; Stuart, J.C.
2012-01-01
We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the opt
The group exponential lasso for bi-level variable selection.
Breheny, Patrick
2015-09-01
In many applications, covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. One important example arises in genetic association studies, where genes may have several variants capable of contributing to disease. An ideal penalized regression approach would select variables by balancing both the direct evidence of a feature's importance as well as the indirect evidence offered by the grouping structure. This work proposes a new approach we call the group exponential lasso (GEL) which features a decay parameter controlling the degree to which feature selection is coupled together within groups. We demonstrate that the GEL has a number of statistical and computational advantages over previously proposed group penalties such as the group lasso, group bridge, and composite MCP. Finally, we apply these methods to the problem of detecting rare variants in a genetic association study.
Two-step variable selection in quantile regression models
Directory of Open Access Journals (Sweden)
FAN Yali
2015-06-01
Full Text Available We propose a two-step variable selection procedure for high dimensional quantile regressions,in which the dimension of the covariates, pn is much larger than the sample size n. In the first step, we perform l1 penalty, and we demonstrate that the first step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional to a model whose size has the same order as that of the true model, and the selected model can cover the true model. The second step excludes the remained irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys the model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.
A Bayesian Optimisation Algorithm for the Nurse Scheduling Problem
Jingpeng, Li
2008-01-01
A Bayesian optimization algorithm for the nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurses assignment. Unlike our previous work that used Gas to implement implicit learning, the learning in the proposed algorithm is explicit, ie. Eventually, we will be able to identify and mix building blocks directly. The Bayesian optimization algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated, ie in our case, a new rule string has been obtained. Another set of rule strings will be generated in this way, some of which will replace previous strings based on fitness selection. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again usin...
Multi-scale inference of interaction rules in animal groups using Bayesian model selection.
Mann, Richard P; Perna, Andrea; Strömbom, Daniel; Garnett, Roman; Herbert-Read, James E; Sumpter, David J T; Ward, Ashley J W
2012-01-01
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture fine scale rules of interaction, which are primarily mediated by physical contact. Conversely, the Markovian self-propelled particle model captures the fine scale rules of interaction but fails to reproduce global dynamics. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
Multi-scale inference of interaction rules in animal groups using Bayesian model selection.
Directory of Open Access Journals (Sweden)
Richard P Mann
2012-01-01
Full Text Available Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis. We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture fine scale rules of interaction, which are primarily mediated by physical contact. Conversely, the Markovian self-propelled particle model captures the fine scale rules of interaction but fails to reproduce global dynamics. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
Bayesian Variable Selection to identify QTL affecting a simulated quantitative trait
Schurink, A.; Janss, L.L.G.; Heuven, H.C.M.
2012-01-01
Abstract Background: Recent developments in genetic technology and methodology enable accurate detection of QTL and estimation of breeding values, even in individuals without phenotypes. The QTL-MAS workshop offers the opportunity to test different methods to perform a genome-wide association study
Isoenzymatic variability in tropical maize populations under reciprocal recurrent selection
Directory of Open Access Journals (Sweden)
Pinto Luciana Rossini
2003-01-01
Full Text Available Maize (Zea mays L. is one of the crops in which the genetic variability has been extensively studied at isoenzymatic loci. The genetic variability of the maize populations BR-105 and BR-106, and the synthetics IG-3 and IG-4, obtained after one cycle of a high-intensity reciprocal recurrent selection (RRS, was investigated at seven isoenzymatic loci. A total of twenty alleles were identified, and most of the private alleles were found in the BR-106 population. One cycle of reciprocal recurrent selection (RRS caused reductions of 12% in the number of alleles in both populations. Changes in allele frequencies were also observed between populations and synthetics, mainly for the Est 2 locus. Populations presented similar values for the number of alleles per locus, percentage of polymorphic loci, and observed and expected heterozygosities. A decrease of the genetic variation values was observed for the synthetics as a consequence of genetic drift effects and reduction of the effective population sizes. The distribution of the genetic diversity within and between populations revealed that most of the diversity was maintained within them, i.e. BR-105 x BR-106 (G ST = 3.5% and IG-3 x IG-4 (G ST = 4.0%. The genetic distances between populations and synthetics increased approximately 21%. An increase in the genetic divergence between the populations occurred without limiting new selection procedures.
Robust nonlinear variable selective control for networked systems
Rahmani, Behrooz
2016-10-01
This paper is concerned with the networked control of a class of uncertain nonlinear systems. In this way, Takagi-Sugeno (T-S) fuzzy modelling is used to extend the previously proposed variable selective control (VSC) methodology to nonlinear systems. This extension is based upon the decomposition of the nonlinear system to a set of fuzzy-blended locally linearised subsystems and further application of the VSC methodology to each subsystem. To increase the applicability of the T-S approach for uncertain nonlinear networked control systems, this study considers the asynchronous premise variables in the plant and the controller, and then introduces a robust stability analysis and control synthesis. The resulting optimal switching-fuzzy controller provides a minimum guaranteed cost on an H2 performance index. Simulation studies on three nonlinear benchmark problems demonstrate the effectiveness of the proposed method.
von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo
2014-06-01
Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatrics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.
Characterizing the Optical Variability of Bright Blazars: Variability-Based Selection of Fermi AGN
Ruan, John J; MacLeod, Chelsea L; Becker, Andrew C; Burnett, T H; Davenport, James R A; Ivezic, Zeljko; Kochanek, Christopher S; Plotkin, Richard M; Sesar, Branimir; Stuart, J Scott
2012-01-01
We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of gamma-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability (tau), and driving amplitudes on short timescales (sigma). Imposing cuts on minimum tau and sigma allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of gamma-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously-associated ...
Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romanach, Stephanie; Watling, James I.; Mazzotti, Frank J.
2017-01-01
Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable
Estimation and variable selection for generalized additive partial linear models
Wang, Li
2011-08-01
We study generalized additive partial linear models, proposing the use of polynomial spline smoothing for estimation of nonparametric functions, and deriving quasi-likelihood based estimators for the linear parameters. We establish asymptotic normality for the estimators of the parametric components. The procedure avoids solving large systems of equations as in kernel-based procedures and thus results in gains in computational simplicity. We further develop a class of variable selection procedures for the linear parameters by employing a nonconcave penalized quasi-likelihood, which is shown to have an asymptotic oracle property. Monte Carlo simulations and an empirical example are presented for illustration. © Institute of Mathematical Statistics, 2011.
Prediction and variable selection with the adaptive Lasso
van de Geer, Sara; Zhou, Shuheng
2010-01-01
We revisit the adaptive Lasso in a high-dimensional linear model, and provide bounds for its prediction error and for its number of false positive selections. We compare the adaptive Lasso with an "oracle" that trades off approximation error against an l_0-penalty. Considering prediction error and false positives simultaneously is a way to study variable selection performance in settings where non-zero regression coefficients can be smaller than the detection limit. We show that an appropriate choice of the tuning parameter yields a prediction error of the same order as that of the least squares refitted initial Lasso after thresholding, while the number of false positives is small, depending on the size of the trimmed harmonic mean of the oracle coefficients.
Penalized maximum likelihood estimation and variable selection in geostatistics
Chu, Tingjin; Wang, Haonan; 10.1214/11-AOS919
2012-01-01
We consider the problem of selecting covariates in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE) that enables simultaneous variable selection and parameter estimation is developed and, for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample sizes, we propose penalized maximum covariance-tapered likelihood estimation (PMLE$_{\\mathrm{T}}$) and its one-step sparse estimation (OSE$_{\\mathrm{T}}$). General forms of penalty functions with an emphasis on smoothly clipped absolute deviation are used for penalized maximum likelihood. Theoretical properties of PMLE and OSE, as well as their approximations PMLE$_{\\mathrm{T}}$ and OSE$_{\\mathrm{T}}$ using covariance tapering, are derived, including consistency, sparsity, asymptotic normality and the oracle properties. For covariance tapering, a by-product of our theoretical results is consistency and asymptotic normal...
Eigenvector Subset Selection Using Bayesian Optimization Algorithm%基于贝叶斯优化算法的脸面特征向量子集选择
Institute of Scientific and Technical Information of China (English)
郭卫锋; 林亚平; 罗光平
2002-01-01
Eigenvector subset selection is the key to face recognition. In this paper ,we propose ESS-BOA, a newrandomized, population-based evolutionary algorithm which deals with the Eigenvector Subset Selection (ESS)prob-lem on face recognition application. In ESS-BOA ,the ESS problem, stated as a search problem ,uses the BayesianOptimization Algorithm (BOA) as searching engine and the distance degree as the object function to select eigenvec-tor. Experimental results show that ESS-BOA outperforms the traditional the eigenface selection algorithm.
Directory of Open Access Journals (Sweden)
Kaski Kimmo
2007-05-01
Full Text Available Abstract Background A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum. Results A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG, 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively. The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra. Conclusion The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics.
UPS Delivers Optimal Phase Diagram in High Dimensional Variable Selection
Ji, Pengsheng
2010-01-01
Consider linear regression in the so-called regime of p much larger than n. We propose the UPS as a new variable selection method. This is a Screen and Clean procedure [Wasserman and Roeder (2009)], in which we screen with the Univariate thresholding, and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. We measure the performance of variable selection procedure by the Hamming distance. In many situations, we find that the UPS achieves the optimal rate of convergence, and also yields an optimal partition of the so-called phase diagram. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, there is a three-phase diagram shared by many choices of design matrices. In the first phase, it is possible to recover all signals. In the second phase, exa...
Yue, Yu Ryan; Wang, Xiao-Feng
2016-05-10
This paper is motivated from a retrospective study of the impact of vitamin D deficiency on the clinical outcomes for critically ill patients in multi-center critical care units. The primary predictors of interest, vitamin D2 and D3 levels, are censored at a known detection limit. Within the context of generalized linear mixed models, we investigate statistical methods to handle multiple censored predictors in the presence of auxiliary variables. A Bayesian joint modeling approach is proposed to fit the complex heterogeneous multi-center data, in which the data information is fully used to estimate parameters of interest. Efficient Monte Carlo Markov chain algorithms are specifically developed depending on the nature of the response. Simulation studies demonstrate the outperformance of the proposed Bayesian approach over other existing methods. An application to the data set from the vitamin D deficiency study is presented. Possible extensions of the method regarding the absence of auxiliary variables, semiparametric models, as well as the type of censoring are also discussed.
Secondary eclipses in the CoRoT light curves: A homogeneous search based on Bayesian model selection
Parviainen, Hannu; Belmonte, Juan Antonio
2012-01-01
We aim to identify and characterize secondary eclipses in the original light curves of all published CoRoT planets using uniform detection and evaluation critetia. Our analysis is based on a Bayesian model selection between two competing models: one with and one without an eclipse signal. The search is carried out by mapping the Bayes factor in favor of the eclipse model as a function of the eclipse center time, after which the characterization of plausible eclipse candidates is done by estimating the posterior distributions of the eclipse model parameters using Markov Chain Monte Carlo. We discover statistically significant eclipse events for two planets, CoRoT-6b and CoRoT-11b, and for one brown dwarf, CoRoT-15b. We also find marginally significant eclipse events passing our plausibility criteria for CoRoT-3b, 13b, 18b, and 21b. The previously published CoRoT-1b and CoRoT-2b eclipses are also confirmed.
Energy Technology Data Exchange (ETDEWEB)
Balabin, Roman M., E-mail: balabin@org.chem.ethz.ch [Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich (Switzerland); Smirnov, Sergey V. [Unimilk Joint Stock Co., 143421 Moscow Region (Russian Federation)
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm{sup -1}) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic
Relationship of adolescent self-esteem to selected academic variables.
Filozof, E M; Albertin, H K; Jones, C R; Steme, S S; Myers, L; McDermott, R J
1998-02-01
This study investigated whether self-esteem precedes various academic behaviors and beliefs among 593 high school students (63.7% female, 60.9% African American). Measures of home and school self-esteem, grade point average, perceived academic standing and progress, and educational plans were collected by survey and archival review of grade and attendance records at the beginning (pre-test) and end of the school year (post-test). Self-esteem and academic variables differed by gender, race, and guardianship. Self-esteem related significantly to academics and absenteeism. Results suggest selected academic variables predict self-esteem even when the effects of gender, race, and guardianship are removed and pretest self-esteem scores are controlled. In conclusion, student academic performance influences subsequent academic and home self-esteem. Creation of positive academic experiences for youth may be a critical activity, since experts contend that low self-esteem is associated with subsequent behavioral problems. The markedly lower self-esteem of Native American and Hispanic youth warrants further investigation.
Robust Bayesian Regularized Estimation Based on t Regression Model
Directory of Open Access Journals (Sweden)
Zean Li
2015-01-01
Full Text Available The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails, and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and put different penalization parameters for different regression coefficients. We also consider the Bayesian t regression with adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.
Institute of Scientific and Technical Information of China (English)
Peixin ZHAO
2013-01-01
In this paper,we consider the variable selection for the parametric components of varying coefficient partially linear models with censored data.By constructing a penalized auxiliary vector ingeniously,we propose an empirical likelihood based variable selection procedure,and show that it is consistent and satisfies the sparsity.The simulation studies show that the proposed variable selection method is workable.
Optimality of Graphlet Screening in High Dimensional Variable Selection
Jin, Jiashun; Zhang, Qi
2012-01-01
Consider a linear regression model where the design matrix X has n rows and p columns. We assume (a) p is much large than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates is nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (diagonals of G are normalized to 1). The sparsity in G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). We find an interesting interplay between the signal sparsity and the graph sparsity, which ensures that in a broad context, the set of true signals decompose into many different small-size components of GOSD, where different components are disconnected. We propose Graphlet Screening (GS) as a new approach to variable selection, which is a two-stage Screen and Clean method. The key methodological innovation of GS is to use GOSD to guide both the screening and cleaning. Compared to m-variate brute-forth screening that has a computational...
Birth order and selected work-related personality variables.
Phillips, A S; Bedeian, A G; Mossholder, K W; Touliatos, J
1988-12-01
A possible link between birth order and various individual characteristics (e. g., intelligence, potential eminence, need for achievement, sociability) has been suggested by personality theorists such as Adler for over a century. The present study examines whether birth order is associated with selected personality variables that may be related to various work outcomes. 3 of 7 hypotheses were supported and the effect sizes for these were small. Firstborns scored significantly higher than later borns on measures of dominance, good impression, and achievement via conformity. No differences between firstborns and later borns were found in managerial potential, work orientation, achievement via independence, and sociability. The study's sample consisted of 835 public, government, and industrial accountants responding to a national US survey of accounting professionals. The nature of the sample may have been partially responsible for the results obtained. Its homogeneity may have caused any birth order effects to wash out. It can be argued that successful membership in the accountancy profession requires internalization of a set of prescribed rules and standards. It may be that accountants as a group are locked in to a behavioral framework. Any differentiation would result from spurious interpersonal differences, not from predictable birth-order related characteristics. A final interpretation is that birth order effects are nonexistent or statistical artifacts. Given the present data and particularistic sample, however, the authors have insufficient information from which to draw such a conclusion.
Selective IgA Deficiency and Common Variable Immunodeficiency
Directory of Open Access Journals (Sweden)
Kadri Kamber
2009-09-01
Full Text Available Selective IgA deficiency (sIgAD, using 5 mg/dl of serum IgA as the upper limit for diagnosis and concomitant lack of secretory IgA, is the most common form of primary immunodeficiency. The pathogenesis of IgA deficiency is not known, although abnormalities in Ig class switching and the cytokines involved in isotype switching have been implicated. Common Variable Immunodeficiency (CVID is a heterogenous group of B cell deficiency syndromes characterized by hypogammaglobulinemia, impaired antibody production and recurrent bacterial infections. Defective T-cell activation may lead to an impairment in cognate T-B-cell interaction due to impaired expression of CD40 ligand and/or abnormalities in the production of T-cell-derived cytokines required for fully functional B-cell activation, proliferation and/or differentiation which could indeed explain the impairment in antibody production present in CVID patients. It has been found that cytokines are produced in low levels due to the decreased T cell function which occurs as a result of the defect in CD40L expression in CVID patients. (Journal of Current Pediatrics 2009; 7: 90-5
Institute of Scientific and Technical Information of China (English)
Zheng-yan Lin; Yu-ze Yuan
2012-01-01
Semiparametric models with diverging number of predictors arise in many contemporary scientific areas. Variable selection for these models consists of two components: model selection for non-parametric components and selection of significant variables for the parametric portion.In this paper,we consider a variable selection procedure by combining basis function approximation with SCAD penalty.The proposed procedure simultaneously selects significant variables in the parametric components and the nonparametric components.With appropriate selection of tuning parameters,we establish the consistency and sparseness of this procedure.
Variable Selection for Varying-Coefficient Models with Missing Response at Random
Institute of Scientific and Technical Information of China (English)
Pei Xin ZHAO; Liu Gen XUE
2011-01-01
In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for varying-coefficient models with missing response at random. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the optimal convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.
DEFF Research Database (Denmark)
Jensen, Finn Verner; Nielsen, Thomas Dyhre
2016-01-01
and edges. The nodes represent variables, which may be either discrete or continuous. An edge between two nodes A and B indicates a direct influence between the state of A and the state of B, which in some domains can also be interpreted as a causal relation. The wide-spread use of Bayesian networks...
Hybrid Batch Bayesian Optimization
Azimi, Javad; Fern, Xiaoli
2012-01-01
Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose to either sequentially evaluate the function, one input at a time and wait for the output of the function before making the next selection, or evaluate the function at a batch of multiple inputs at once. These two different settings are commonly referred to as the sequential and batch settings of Bayesian Optimization. In general, the sequential setting leads to better optimization performance as each function evaluation is selected with more information, whereas the batch setting has an advantage in terms of the total experimental time (the number of iterations). In this work, our goal is to combine the strength of both settings. Specifically, we systematically analyze Bayesian optimization using Gaussian process as the posterior estimator and provide a hybrid algorithm t...
Lesaffre, Emmanuel
2012-01-01
The growth of biostatistics has been phenomenal in recent years and has been marked by considerable technical innovation in both methodology and computational practicality. One area that has experienced significant growth is Bayesian methods. The growing use of Bayesian methodology has taken place partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. In addition, computational advances have allowed for more complex models to be fitted routinely to realistic data sets. Through examples, exercises and a combination of introd
Variability survey of brightest stars in selected OB associations
Laur, Jaan; Eenmäe, Tõnis; Tuvikene, Taavi; Leedjärv, Laurits
2016-01-01
The stellar evolution theory of massive stars remains uncalibrated with high-precision photometric observational data mainly due to a small number of luminous stars that are monitored from space. Automated all-sky surveys have revealed numerous variable stars but most of the luminous stars are often overexposed. Targeted campaigns can improve the time base of photometric data for those objects. The aim of this investigation is to study the variability of luminous stars at different timescales in young open clusters and OB associations. We monitored 22 open clusters and associations from 2011 to 2013 using a 0.25-m telescope. Variable stars were detected by comparing the overall light-curve scatter with measurement uncertainties. Variability was analysed by the light curve feature extraction tool FATS. Periods of pulsating stars were determined using the discrete Fourier transform code SigSpec. We then classified the variable stars based on their pulsation periods and available spectral information. We obtaine...
Variability survey of brightest stars in selected OB associations
Laur, Jaan; Kolka, Indrek; Eenmäe, Tõnis; Tuvikene, Taavi; Leedjärv, Laurits
2017-02-01
Context. The stellar evolution theory of massive stars remains uncalibrated with high-precision photometric observational data mainly due to a small number of luminous stars that are monitored from space. Automated all-sky surveys have revealed numerous variable stars but most of the luminous stars are often overexposed. Targeted campaigns can improve the time base of photometric data for those objects. Aims: The aim of this investigation is to study the variability of luminous stars at different timescales in young open clusters and OB associations. Methods: We monitored 22 open clusters and associations from 2011 to 2013 using a 0.25-m telescope. Variable stars were detected by comparing the overall light-curve scatter with measurement uncertainties. Variability was analysed by the light curve feature extraction tool FATS. Periods of pulsating stars were determined using the discrete Fourier transform code SigSpec. We then classified the variable stars based on their pulsation periods and available spectral information. Results: We obtained light curves for more than 20 000 sources of which 354 were found to be variable. Amongst them we find 80 eclipsing binaries, 31 α Cyg, 13 β Cep, 62 Be, 16 slowly pulsating B, 7 Cepheid, 1 γ Doradus, 3 Wolf-Rayet and 63 late-type variable stars. Up to 55% of these stars are potential new discoveries as they are not present in the Variable Star Index (VSX) database. We find the cluster membership fraction for variable stars to be 13% with an upper limit of 35%. Variable star catalogue (Tables A.1-A.10) and light curves are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A108
Bayesian Face Sketch Synthesis.
Wang, Nannan; Gao, Xinbo; Sun, Leiyu; Li, Jie
2017-03-01
Exemplar-based face sketch synthesis has been widely applied to both digital entertainment and law enforcement. In this paper, we propose a Bayesian framework for face sketch synthesis, which provides a systematic interpretation for understanding the common properties and intrinsic difference in different methods from the perspective of probabilistic graphical models. The proposed Bayesian framework consists of two parts: the neighbor selection model and the weight computation model. Within the proposed framework, we further propose a Bayesian face sketch synthesis method. The essential rationale behind the proposed Bayesian method is that we take the spatial neighboring constraint between adjacent image patches into consideration for both aforementioned models, while the state-of-the-art methods neglect the constraint either in the neighbor selection model or in the weight computation model. Extensive experiments on the Chinese University of Hong Kong face sketch database demonstrate that the proposed Bayesian method could achieve superior performance compared with the state-of-the-art methods in terms of both subjective perceptions and objective evaluations.
Bayesian Analysis Made Simple An Excel GUI for WinBUGS
Woodward, Philip
2011-01-01
From simple NLMs to complex GLMMs, this book describes how to use the GUI for WinBUGS - BugsXLA - an Excel add-in written by the author that allows a range of Bayesian models to be easily specified. With case studies throughout, the text shows how to routinely apply even the more complex aspects of model specification, such as GLMMs, outlier robust models, random effects Emax models, auto-regressive errors, and Bayesian variable selection. It provides brief, up-to-date discussions of current issues in the practical application of Bayesian methods. The author also explains how to obtain free so
Bayesian modeling using WinBUGS
Ntzoufras, Ioannis
2009-01-01
A hands-on introduction to the principles of Bayesian modeling using WinBUGS Bayesian Modeling Using WinBUGS provides an easily accessible introduction to the use of WinBUGS programming techniques in a variety of Bayesian modeling settings. The author provides an accessible treatment of the topic, offering readers a smooth introduction to the principles of Bayesian modeling with detailed guidance on the practical implementation of key principles. The book begins with a basic introduction to Bayesian inference and the WinBUGS software and goes on to cover key topics, including: Markov Chain Monte Carlo algorithms in Bayesian inference Generalized linear models Bayesian hierarchical models Predictive distribution and model checking Bayesian model and variable evaluation Computational notes and screen captures illustrate the use of both WinBUGS as well as R software to apply the discussed techniques. Exercises at the end of each chapter allow readers to test their understanding of the presented concepts and all ...
新家, 健精
2013-01-01
© 2012 Springer Science+Business Media, LLC. All rights reserved. Article Outline: Glossary Definition of the Subject and Introduction The Bayesian Statistical Paradigm Three Examples Comparison with the Frequentist Statistical Paradigm Future Directions Bibliography
Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.
2012-01-01
The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.
Ciarleglio, Maria M; Arendt, Christopher D; Makuch, Robert W; Peduzzi, Peter N
2015-03-01
Specification of the treatment effect that a clinical trial is designed to detect (θA) plays a critical role in sample size and power calculations. However, no formal method exists for using prior information to guide the choice of θA. This paper presents a hybrid classical and Bayesian procedure for choosing an estimate of the treatment effect to be detected in a clinical trial that formally integrates prior information into this aspect of trial design. The value of θA is found that equates the pre-specified frequentist power and the conditional expected power of the trial. The conditional expected power averages the traditional frequentist power curve using the conditional prior distribution of the true unknown treatment effect θ as the averaging weight. The Bayesian prior distribution summarizes current knowledge of both the magnitude of the treatment effect and the strength of the prior information through the assumed spread of the distribution. By using a hybrid classical and Bayesian approach, we are able to formally integrate prior information on the uncertainty and variability of the treatment effect into the design of the study, mitigating the risk that the power calculation will be overly optimistic while maintaining a frequentist framework for the final analysis. The value of θA found using this method may be written as a function of the prior mean μ0 and standard deviation τ0, with a unique relationship for a given ratio of μ0/τ0. Results are presented for Normal, Uniform, and Gamma priors for θ.
Fluctuating selection: the perpetual renewal of adaptation in variable environments.
Bell, Graham
2010-01-12
Darwin insisted that evolutionary change occurs very slowly over long periods of time, and this gradualist view was accepted by his supporters and incorporated into the infinitesimal model of quantitative genetics developed by R. A. Fisher and others. It dominated the first century of evolutionary biology, but has been challenged in more recent years both by field surveys demonstrating strong selection in natural populations and by quantitative trait loci and genomic studies, indicating that adaptation is often attributable to mutations in a few genes. The prevalence of strong selection seems inconsistent, however, with the high heritability often observed in natural populations, and with the claim that the amount of morphological change in contemporary and fossil lineages is independent of elapsed time. I argue that these discrepancies are resolved by realistic accounts of environmental and evolutionary changes. First, the physical and biotic environment varies on all time-scales, leading to an indefinite increase in environmental variance over time. Secondly, the intensity and direction of natural selection are also likely to fluctuate over time, leading to an indefinite increase in phenotypic variance in any given evolving lineage. Finally, detailed long-term studies of selection in natural populations demonstrate that selection often changes in direction. I conclude that the traditional gradualist scheme of weak selection acting on polygenic variation should be supplemented by the view that adaptation is often based on oligogenic variation exposed to commonplace, strong, fluctuating natural selection.
Cross Validation of Selection of Variables in Multiple Regression.
1979-12-01
Bomber IBMNAV * BOMNAV Navigation-Cargo * * CARNAV Sensory-Fighter * SF FGTSEN Sensory - Bomber * SB BOMSEN Communication - Fighter IFGCOM CF FGTCOM...of Variables Variable No. Recode FGTNAV 1 0 LESS THAN 1 1 OR OVER BONNAV 2 0 LESS THAN S1 OR OVER CARNAV 3 0 LESS THAN S1 OR OVER FGTSEN 4 0 LESS THAN...cc x x x x x x x CARNAV X X X X X X x XMTR x X X X X x PD X x X X X X UP x- *Those which AID determined. 44 This value was lowered to 3 in the
Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.
2013-01-01
In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.
Bayesian peak picking for NMR spectra.
Cheng, Yichen; Gao, Xin; Liang, Faming
2014-02-01
Protein structure determination is a very important topic in structural genomics, which helps people to understand varieties of biological functions such as protein-protein interactions, protein-DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) has often been used to determine the three-dimensional structures of protein in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is casted as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using Bayesian method.
Bayesian Peak Picking for NMR Spectra
Cheng, Yichen
2014-02-01
Protein structure determination is a very important topic in structural genomics, which helps people to understand varieties of biological functions such as protein-protein interactions, protein–DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) has often been used to determine the three-dimensional structures of protein in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is casted as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using Bayesian method.
Implementing Bayesian Vector Autoregressions Implementing Bayesian Vector Autoregressions
Directory of Open Access Journals (Sweden)
Richard M. Todd
1988-03-01
Full Text Available Implementing Bayesian Vector Autoregressions This paper discusses how the Bayesian approach can be used to construct a type of multivariate forecasting model known as a Bayesian vector autoregression (BVAR. In doing so, we mainly explain Doan, Littermann, and Sims (1984 propositions on how to estimate a BVAR based on a certain family of prior probability distributions. indexed by a fairly small set of hyperparameters. There is also a discussion on how to specify a BVAR and set up a BVAR database. A 4-variable model is used to iliustrate the BVAR approach.
Venkat, Priya S; Narayanan, Krishna R; Datta, Aniruddha
2017-04-01
An important problem in computational biology is the identification of potential points of intervention that can lead to modified network behavior in a genetic regulatory network. We consider the problem of deducing the effect of individual genes on the behavior of the network in a statistical framework. In this article, we make use of biological information from the literature to develop a Bayesian network and introduce a method to estimate parameters of this network using data that are relevant to the biological phenomena under study. Then, we give a novel approach to select significant nodes in the network using a decision-theoretic approach. The proposed method is applied to the analysis of the mitogen-activated protein kinase pathway in the plant defense response to pathogens. Results from applying the method to experimental data show that the proposed approach is effective in selecting genes that play crucial roles in the biological phenomenon being studied.
Implementations of tests on the exogeneity of selected variables and their performance in practice
Pleus, M.
2015-01-01
In order to consistently estimate a causal economic relationship at least as many exogenous non-explanatory instrumental variables are required as there are endogenous explanatory variables. This thesis studies various techniques that can be used to classify selected variables as either exogenous or
Recursive Feature Selection with Significant Variables of Support Vectors
Directory of Open Access Journals (Sweden)
Chen-An Tsai
2012-01-01
Full Text Available The development of DNA microarray makes researchers screen thousands of genes simultaneously and it also helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most of the gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to choose genes. However, the parameter setting may not be compatible to the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t based on the use of t-statistics embedded in support vector machine. We compared the performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE and recursive support vector machine (RSVM. The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable to attain good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Bayesian phylogeography finds its roots.
Directory of Open Access Journals (Sweden)
Philippe Lemey
2009-09-01
Full Text Available As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in a Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will make an important asset in molecular epidemiology that can be easily generalized to infer biogeogeography from genetic data for many organisms.
Anthropogenic environments exert variable selection on cranial capacity in mammals.
Snell-Rood, Emilie C; Wick, Naomi
2013-10-22
It is thought that behaviourally flexible species will be able to cope with novel and rapidly changing environments associated with human activity. However, it is unclear whether such environments are selecting for increases in behavioural plasticity, and whether some species show more pronounced evolutionary changes in plasticity. To test whether anthropogenic environments are selecting for increased behavioural plasticity within species, we measured variation in relative cranial capacity over time and space in 10 species of mammals. We predicted that urban populations would show greater cranial capacity than rural populations and that cranial capacity would increase over time in urban populations. Based on relevant theory, we also predicted that species capable of rapid population growth would show more pronounced evolutionary responses. We found that urban populations of two small mammal species had significantly greater cranial capacity than rural populations. In addition, species with higher fecundity showed more pronounced differentiation between urban and rural populations. Contrary to expectations, we found no increases in cranial capacity over time in urban populations-indeed, two species tended to have a decrease in cranial capacity over time in urban populations. Furthermore, rural populations of all insectivorous species measured showed significant increases in relative cranial capacity over time. Our results provide partial support for the hypothesis that urban environments select for increased behavioural plasticity, although this selection may be most pronounced early during the urban colonization process. Furthermore, these data also suggest that behavioural plasticity may be simultaneously favoured in rural environments, which are also changing because of human activity.
Bernardo, Jose M
2000-01-01
This highly acclaimed text, now available in paperback, provides a thorough account of key concepts and theoretical results, with particular emphasis on viewing statistical inference as a special case of decision theory. Information-theoretic concepts play a central role in the development of the theory, which provides, in particular, a detailed discussion of the problem of specification of so-called prior ignorance . The work is written from the authors s committed Bayesian perspective, but an overview of non-Bayesian theories is also provided, and each chapter contains a wide-ranging critica
Villalba, Jesús
2015-01-01
In this document we are going to derive the equations needed to implement a Variational Bayes estimation of the parameters of the simplified probabilistic linear discriminant analysis (SPLDA) model. This can be used to adapt SPLDA from one database to another with few development data or to implement the fully Bayesian recipe. Our approach is similar to Bishop's VB PPCA.
Thresholded Lasso for high dimensional variable selection and statistical estimation
Zhou, Shuheng
2010-01-01
Given $n$ noisy samples with $p$ dimensions, where $n \\ll p$, we show that the multi-step thresholding procedure based on the Lasso -- we call it the {\\it Thresholded Lasso}, can accurately estimate a sparse vector $\\beta \\in \\R^p$ in a linear model $Y = X \\beta + \\epsilon$, where $X$ is an $n \\times p$ design matrix, and $\\epsilon \\sim N(0, \\sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition (Bickel-Ritov-Tsybakov 09), it is possible to achieve the $\\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an {\\em oracle} while selecting a sufficiently sparse model -- hence achieving {\\it sparse oracle inequalities}; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. In some sense, the Thresholded Lasso recovers the choices that would have been made by the $\\ell_0$ penalized least squares estimators, in that it selects a sufficiently sparse model without sacrificing the accuracy in ...
Cameron, Ewan
2013-01-01
In the second paper of this series we extend our Bayesian reanalysis of the evidence for a cosmic variation of the fine structure constant to the semi-parametric modelling regime. By adopting a mixture of Dirichlet processes prior for the unexplained errors in each instrumental subgroup of the benchmark quasar dataset we go some way towards freeing our model selection procedure from the apparent subjectivity of a fixed distributional form. Despite the infinite-dimensional domain of the error hierarchy so constructed we are able to demonstrate a recursive scheme for marginal likelihood estimation with prior-sensitivity analysis directly analogous to that presented in Paper I, thereby allowing the robustness of our posterior Bayes factors to hyper-parameter choice and model specification to be readily verified. In the course of this work we elucidate various similarities between unexplained error problems in the seemingly disparate fields of astronomy and clinical meta-analysis, and we highlight a number of sop...
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Ramman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
Variability-based Active Galactic Nucleus Selection Using Image Subtraction in the SDSS and LSST Era
Choi, Yumi; Gibson, Robert R.; Becker, Andrew C.; Ivezić, Željko; Connolly, Andrew J.; MacLeod, Chelsea L.; Ruan, John J.; Anderson, Scott F.
2014-02-01
With upcoming all-sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based active galactic nucleus (AGN) selection will enable the construction of highly complete catalogs with minimum contamination. In this study, we generate g-band difference images and construct light curves (LCs) for QSO/AGN candidates listed in Sloan Digital Sky Survey Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels at identifying variable sources embedded in complex or blended emission regions such as Type II AGNs and other low-luminosity AGNs that may be omitted from traditional photometric or spectroscopic catalogs. To separate QSOs/AGNs from other sources using our difference image LCs, we explore several LC statistics and parameterize optical variability by the characteristic damping timescale (τ) and variability amplitude. By virtue of distinguishable variability parameters of AGNs, we are able to select them with high completeness of 93.4% and efficiency (i.e., purity) of 71.3%. Based on optical variability, we also select highly variable blazar candidates, whose infrared colors are consistent with known blazars. One-third of them are also radio detected. With the X-ray selected AGN candidates, we probe the optical variability of X-ray detected optically extended sources using their difference image LCs for the first time. A combination of optical variability and X-ray detection enables us to select various types of host-dominated AGNs. Contrary to the AGN unification model prediction, two Type II AGN candidates (out of six) show detectable variability on long-term timescales like typical Type I AGNs. This study will provide a baseline for future optical variability studies of extended sources.
Directory of Open Access Journals (Sweden)
Keeeun Lee
2016-01-01
Full Text Available The enhanced R&D cooperative efforts between large firms and small and medium-sized enterprises (SMEs have been emphasized to perform innovation projects and succeed in deploying profitable businesses. In order to promote such win-win alliances, it is necessary to consider the capabilities of large firms and SMEs, respectively. Thus, this paper proposes a new approach of partner selection when a large firm assesses SMEs as potential candidates for R&D collaboration. The first step of the suggested approach is to define the necessary technology for a firm by referring to a structured technology roadmap, which is a useful technique in the partner selection from the perspectives of a large firm. Second, a list of appropriate SME candidates is generated by patent information. Finally, a Bayesian network model is formulated to select an SME as an R&D collaboration partner which fits in the industry and the large firm by utilizing a bibliography with United States patents. This paper applies the proposed approach to the semiconductor industry and selects potential R&D partners for a large firm. This paper will explain how to use the model as a systematic and analytic approach for creating effective partnerships between large firms and SMEs.
Variable selection for multiply-imputed data with application to dioxin exposure study.
Chen, Qixuan; Wang, Sijian
2013-09-20
Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. However, variable selection on multiply-imputed data remains an important and longstanding statistical problem. If a variable selection method is applied to each imputed dataset separately, it may select different variables for different imputed datasets, which makes it difficult to interpret the final model or draw scientific conclusions. In this paper, we propose a novel multiple imputation-least absolute shrinkage and selection operator (MI-LASSO) variable selection method as an extension of the least absolute shrinkage and selection operator (LASSO) method to multiply-imputed data. The MI-LASSO method treats the estimated regression coefficients of the same variable across all imputed datasets as a group and applies the group LASSO penalty to yield a consistent variable selection across multiple-imputed datasets. We use a simulation study to demonstrate the advantage of the MI-LASSO method compared with the alternatives. We also apply the MI-LASSO method to the University of Michigan Dioxin Exposure Study to identify important circumstances and exposure factors that are associated with human serum dioxin concentration in Midland, Michigan.
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan
2004-01-01
We describe a system for exact inference with relational Bayesian networks as defined in the publicly available \\primula\\ tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...... and differentiating these circuits in time linear in their size. We report on experimental results showing the successful compilation, and efficient inference, on relational Bayesian networks whose {\\primula}--generated propositional instances have thousands of variables, and whose jointrees have clusters...
Bayesian LASSO, scale space and decision making in association genetics.
Directory of Open Access Journals (Sweden)
Leena Pasanen
Full Text Available LASSO is a penalized regression method that facilitates model fitting in situations where there are as many, or even more explanatory variables than observations, and only a few variables are relevant in explaining the data. We focus on the Bayesian version of LASSO and consider four problems that need special attention: (i controlling false positives, (ii multiple comparisons, (iii collinearity among explanatory variables, and (iv the choice of the tuning parameter that controls the amount of shrinkage and the sparsity of the estimates. The particular application considered is association genetics, where LASSO regression can be used to find links between chromosome locations and phenotypic traits in a biological organism. However, the proposed techniques are relevant also in other contexts where LASSO is used for variable selection.We separate the true associations from false positives using the posterior distribution of the effects (regression coefficients provided by Bayesian LASSO. We propose to solve the multiple comparisons problem by using simultaneous inference based on the joint posterior distribution of the effects. Bayesian LASSO also tends to distribute an effect among collinear variables, making detection of an association difficult. We propose to solve this problem by considering not only individual effects but also their functionals (i.e. sums and differences. Finally, whereas in Bayesian LASSO the tuning parameter is often regarded as a random variable, we adopt a scale space view and consider a whole range of fixed tuning parameters, instead. The effect estimates and the associated inference are considered for all tuning parameters in the selected range and the results are visualized with color maps that provide useful insights into data and the association problem considered. The methods are illustrated using two sets of artificial data and one real data set, all representing typical settings in association genetics.
Discriminative variable selection for clustering with the sparse Fisher-EM algorithm
Bouveyron, Charles
2012-01-01
The interest in variable selection for clustering has increased recently due to the growing need in clustering high-dimensional data. Variable selection allows in particular to ease both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through $\\ell_{1}$-type penalizations. Experimental comparisons with existing approach...
Variability-based AGN selection using image subtraction in the SDSS and LSST era
Choi, Yumi; Becker, Andrew C; Ivezić, \\vZeljko; Connolly, Andrew J; MacLeod, Chelsea L; Ruan, John J; Anderson, Scott F
2013-01-01
With upcoming all sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based AGN selection will enable the construction of highly-complete catalogs with minimum contamination. In this study, we generate $g$-band difference images and construct light curves for QSO/AGN candidates listed in SDSS Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels at identifying variable sources embedded in complex or blended emission regions such as Type II AGNs and other low-luminosity AGNs that may be omitted from traditional photometric or spectroscopic catalogs. To separate QSOs/AGNs from other sources using our difference image light curves, we explore several light curve statistics and parameterize optical variability by the characteristic damping timescale ($\\tau$) and variability amplitude. By virtue of distinguishable variability parameters of AGNs, we are able to select...
Variable selection in identification of a high dimensional nonlinear non-parametric system
Institute of Scientific and Technical Information of China (English)
Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG
2015-01-01
The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.
Variable selection in multiple linear regression: The influence of individual cases
Directory of Open Access Journals (Sweden)
SJ Steel
2007-12-01
Full Text Available The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calculated. It is argued that the selection procedure may be improved by taking the selection influence of individual data cases into account.
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection
DEFF Research Database (Denmark)
Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald
2013-01-01
The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PL...
Beecher, Clarence
1978-01-01
Attempts to determine significant differences between selected variables and attitudes toward participation in curriculum planning, curriculum use, adaptation of curriculum content, and curriculum role patterns and to verify significant relationships between the dependent variables and teaching experience, grade level, and education. Data were…
Hedlund, Jonas
2014-01-01
This paper introduces private sender information into a sender-receiver game of Bayesian persuasion with monotonic sender preferences. I derive properties of increasing differences related to the precision of signals and use these to fully characterize the set of equilibria robust to the intuitive criterion. In particular, all such equilibria are either separating, i.e., the sender's choice of signal reveals his private information to the receiver, or fully disclosing, i.e., the outcome of th...
Kirstein, Roland
2005-01-01
This paper presents a modification of the inspection game: The ?Bayesian Monitoring? model rests on the assumption that judges are interested in enforcing compliant behavior and making correct decisions. They may base their judgements on an informative but imperfect signal which can be generated costlessly. In the original inspection game, monitoring is costly and generates a perfectly informative signal. While the inspection game has only one mixed strategy equilibrium, three Perfect Bayesia...
3D Bayesian contextual classifiers
DEFF Research Database (Denmark)
Larsen, Rasmus
2000-01-01
We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.......We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours....
The Time Domain Spectroscopic Survey: Variable Object Selection and Anticipated Results
Morganson, Eric; Anderson, Scott F; Ruan, John J; Myers, Adam D; Eracleous, Michael; Kelly, Brandon; Badenes, Carlos; Banados, Eduardo; Blanton, Michael R; Bershady, Matthew A; Borissova, Jura; Brandt, William Nielsen; Burgett, William S; Chambers, Kenneth; Draper, Peter W; Davenport, James R A; Flewelling, Heather; Garnavich, Peter; Hawley, Suzanne L; Hodapp, Klaus W; Isler, Jedidah C; Kaiser, Nick; Kinemuchi, Karen; Kudritzki, Rolf P; Metcalfe, Nigel; Morgan, Jeffrey S; Paris, Isabelle; Parvizi, Mahmoud; Poleski, Radoslaw; Price, Paul A; Salvato, Mara; Shanks, Tom; Schlafly, Eddie F; Schneider, Donald P; Shen, Yue; Stassun, Keivan; Tonry, John T; Walter, Fabian; Waters, Chris Z
2015-01-01
We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is an SDSS-IV eBOSS subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and AGN) across 7,500 square degrees selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate (KDE) analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4,000 of which will be RR Lyrae stars which may be used as outer Milky Way probes. The variability-sele...
Wachter, Jenny; Hill, Stuart
2016-01-01
Pathogenic species of Neisseria utilize variable outer membrane proteins to facilitate infection and proliferation within the human host. However, the mechanisms behind the evolution of these variable alleles remain largely unknown due to analysis of previously limited datasets. In this study, we have expanded upon the previous analyses to substantially increase the number of analyzed sequences by including multiple diverse strains, from various geographic locations, to determine whether positive selective pressure is exerted on the evolution of these variable genes. Although Neisseria are naturally competent, this analysis indicates that only intrastrain horizontal gene transfer among the pathogenic Neisseria principally account for these genes exhibiting linkage equilibrium which drives the polymorphisms evidenced within these alleles. As the majority of polymorphisms occur across species, the divergence of these variable genes is dependent upon the species and is independent of geographical location, disease severity, or serogroup. Tests of neutrality were able to detect strong selection pressures acting upon both the opa and pil gene families, and were able to locate the majority of these sites within the exposed variable regions of the encoded proteins. Evidence of positive selection acting upon the hypervariable domains of Opa contradicts previous beliefs and provides evidence for selection of receptor binding. As the pathogenic Neisseria reside exclusively within the human host, the strong selection pressures acting upon both the opa and pil gene families provide support for host immune system pressure driving sequence polymorphisms within these variable genes.
Variable selection methods in PLS regression - a comparison study on metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach
for the purpose of reducing over-fitting problems and providing useful interpretation tools. It has excellent possibilities for giving a graphical overview of sample and variation patterns. It can handle co-linearity in an efficient way and make it possible to use different highly correlated data sets in one...... different strategies for variable selection on PLSR method were considered and compared with respect to selected subset of variables and the possibility for biological validation. Sparse PLSR [1] as well as PLSR with Jack-knifing [2] was applied to data in order to achieve variable selection prior...... to comparison. Sparse PLSR is based on penalization of the loading weights (by elastic net, soft/hard thresholding etc.) on a PLSR model. In PLSR with Jack-knifing, significance of variables are calculated by uncertainty test. The data set used in this study is LC-MS data from an animal intervention study...
Novel Harmonic Regularization Approach for Variable Selection in Cox’s Proportional Hazards Model
Directory of Open Access Journals (Sweden)
Ge-Jin Chu
2014-01-01
Full Text Available Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2select key risk factors in the Cox’s proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on the artificial datasets and four real microarray gene expression datasets, such as real diffuse large B-cell lymphoma (DCBCL, the lung cancer, and the AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods.
A survey of variable selection methods in two Chinese epidemiology journals
Directory of Open Access Journals (Sweden)
Lynn Henry S
2010-09-01
Full Text Available Abstract Background Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis; e.g. stepwise or individual significance testing of model coefficients; C - first bivariate analyses, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria like prior knowledge or personal judgment. Results Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163 via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44% used stepwise procedures, 89 (41% tested individual regression coefficients, but 33 (15% did not mention how variables were selected. Sixty percent (58/97 of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals.
Probability and Bayesian statistics
1987-01-01
This book contains selected and refereed contributions to the "Inter national Symposium on Probability and Bayesian Statistics" which was orga nized to celebrate the 80th birthday of Professor Bruno de Finetti at his birthplace Innsbruck in Austria. Since Professor de Finetti died in 1985 the symposium was dedicated to the memory of Bruno de Finetti and took place at Igls near Innsbruck from 23 to 26 September 1986. Some of the pa pers are published especially by the relationship to Bruno de Finetti's scientific work. The evolution of stochastics shows growing importance of probability as coherent assessment of numerical values as degrees of believe in certain events. This is the basis for Bayesian inference in the sense of modern statistics. The contributions in this volume cover a broad spectrum ranging from foundations of probability across psychological aspects of formulating sub jective probability statements, abstract measure theoretical considerations, contributions to theoretical statistics an...
Bessiere, Pierre; Ahuactzin, Juan Manuel; Mekhnacha, Kamel
2013-01-01
Probability as an Alternative to Boolean LogicWhile logic is the mathematical foundation of rational reasoning and the fundamental principle of computing, it is restricted to problems where information is both complete and certain. However, many real-world problems, from financial investments to email filtering, are incomplete or uncertain in nature. Probability theory and Bayesian computing together provide an alternative framework to deal with incomplete and uncertain data. Decision-Making Tools and Methods for Incomplete and Uncertain DataEmphasizing probability as an alternative to Boolean
Kim, Dae-Won; Byun, Yong-Ik; Alcock, Charles; Khardon, Roni
2011-01-01
We present a new QSO selection algorithm using a Support Vector Machine (SVM), a supervised classification method, on a set of extracted times series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars and microlensing events using 58 known QSOs, 1,629 variable stars and 4,288 non-variables using the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false po...
Studies of an x ray selected sample of cataclysmic variables. Ph.D. Thesis
Silber, Andrew D.
1986-01-01
Just prior to the thesis research, an all-sky survey in hard x rays with the HEAO-1 satellite and further observations in the optical resulted in a catalog of about 700 x-ray sources with known optical counterparts. This sample includes 43 cataclysmic variables, which are binaries consisting of a detached white-dwarf and a Roche lobe filling companion star. This thesis consists of studies of the x-ray selected sample of catalcysmic variables.
Low rank updated LS-SVM classifiers for fast variable selection.
Ojeda, Fabian; Suykens, Johan A K; De Moor, Bart
2008-01-01
Least squares support vector machine (LS-SVM) classifiers are a class of kernel methods whose solution follows from a set of linear equations. In this work we present low rank modifications to the LS-SVM classifiers that are useful for fast and efficient variable selection. The inclusion or removal of a candidate variable can be represented as a low rank modification to the kernel matrix (linear kernel) of the LS-SVM classifier. In this way, the LS-SVM solution can be updated rather than being recomputed, which improves the efficiency of the overall variable selection process. Relevant variables are selected according to a closed form of the leave-one-out (LOO) error estimator, which is obtained as a by-product of the low rank modifications. The proposed approach is applied to several benchmark data sets as well as two microarray data sets. When compared to other related algorithms used for variable selection, simulations applying our approach clearly show a lower computational complexity together with good stability on the generalization error.
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection
Chen, Lisha
2012-12-01
The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
Seleção de variáveis em QSAR Variable selection in QSAR
Directory of Open Access Journals (Sweden)
Márcia Miguel Castro Ferreira
2002-05-01
Full Text Available The process of building mathematical models in quantitative structure-activity relationship (QSAR studies is generally limited by the size of the dataset used to select variables from. For huge datasets, the task of selecting a given number of variables that produces the best linear model can be enormous, if not unfeasible. In this case, some methods can be used to separate good parameter combinations from the bad ones. In this paper three methodologies are analyzed: systematic search, genetic algorithm and chemometric methods. These methods have been exposed and discussed through practical examples.
The use of vector bootstrapping to improve variable selection precision in Lasso models.
Laurin, Charles; Boomsma, Dorret; Lubke, Gitta
2016-08-01
The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.
Current Debates on Variability in Child Welfare Decision-Making: A Selected Literature Review
Directory of Open Access Journals (Sweden)
Emily Keddell
2014-11-01
Full Text Available This article considers selected drivers of decision variability in child welfare decision-making and explores current debates in relation to these drivers. Covering the related influences of national orientation, risk and responsibility, inequality and poverty, evidence-based practice, constructions of abuse and its causes, domestic violence and cognitive processes, it discusses the literature in regards to how each of these influences decision variability. It situates these debates in relation to the ethical issue of variability and the equity issues that variability raises. I propose that despite the ecological complexity that drives decision variability, that improving internal (within-country decision consistency is still a valid goal. It may be that the use of annotated case examples, kind learning systems, and continued commitments to the social justice issues of inequality and individualisation can contribute to this goal.
Bayesian Methods and Universal Darwinism
Campbell, John
2010-01-01
Bayesian methods since the time of Laplace have been understood by their practitioners as closely aligned to the scientific method. Indeed a recent champion of Bayesian methods, E. T. Jaynes, titled his textbook on the subject Probability Theory: the Logic of Science. Many philosophers of science including Karl Popper and Donald Campbell have interpreted the evolution of Science as a Darwinian process consisting of a 'copy with selective retention' algorithm abstracted from Darwin's theory of Natural Selection. Arguments are presented for an isomorphism between Bayesian Methods and Darwinian processes. Universal Darwinism, as the term has been developed by Richard Dawkins, Daniel Dennett and Susan Blackmore, is the collection of scientific theories which explain the creation and evolution of their subject matter as due to the operation of Darwinian processes. These subject matters span the fields of atomic physics, chemistry, biology and the social sciences. The principle of Maximum Entropy states that system...
Universal Darwinism as a process of Bayesian inference
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment". Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description clo...
Bayesian inference for OPC modeling
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semiindependently explore the space. The convergence of these walkers to global maxima of the likelihood volume determine the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
Wang, T; Diamandis, E P; Lane, A; Baines, A D
1994-02-01
Chloride measurements by ion-selective electrodes are vulnerable to interference by anions such as iodide, thiocyanate, nitrate, and bromide. We have found that the degree of interference of these anions on the Hitachi chemistry analyzer chloride electrode varies from electrode to electrode and this variation can even occur within the same lot of membrane. This variation is not dependent upon the length of time the cartridge has been in the analyzer because no correlation existed between the usage time and the electrode response to interfering ions. Neither is this variation due to the deterioration of the electrode because all electrodes tested had calibration slopes within the manufacturer's specification. Our study, however, showed that even after repeated exposure to a plasma sample containing 2 mM thiocyanate, the chloride electrode was still able to accurately measure the chloride in plasma without thiocyanate, thus confirming that a carryover effect does not exist from a previous thiocyanate-containing sample.
Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romanach, Stephanie; Watling, James I.; Mazzotti, Frank J.
2017-01-01
The data we used for this study include species occurrence data (n=15 species), climate data and predictions, an expert opinion questionnaire, and species masks that represented the model domain for each species. For this data release, we include the results of the expert opinion questionnaire and the species model domains (or masks). We developed an expert opinion questionnaire to gather information on expert opinion regarding the importance of climate variables in determining a species geographic range. The species masks, or model domains, were defined separately for each species using a variation of the “target-group” approach (Phillips et al. 2009), where the domain was determined using convex polygons including occurrence data for at least three phylogenetically related and similar species (Watling et al. 2012). The species occurrence data, climate data, and climate predictions are freely available online, and therefore not included in this data release. The species occurrence data were obtained from the online database Global Biodiversity Information Facility (GBIF; http://www.gbif.org/), and from scientific literature (Watling et al. 2011). Climate data were obtained from the WorldClim database (Hijmans et al. 2005) and climate predictions were obtained from the Center for Ocean-Atmosphere Prediction Studies (COAPS) at Florida State University (https://floridaclimateinstitute.org/resources/data-sets/regional-downscaling). See metadata for references.
Boehm Vock, Laura F; Reich, Brian J; Fuentes, Montserrat; Dominici, Francesca
2015-03-01
Multi-site time series studies have reported evidence of an association between short term exposure to particulate matter (PM) and adverse health effects, but the effect size varies across the United States. Variability in the effect may partially be due to differing community level exposure and health characteristics, but also due to the chemical composition of PM which is known to vary greatly by location and time. The objective of this article is to identify particularly harmful components of this chemical mixture. Because of the large number of highly-correlated components, we must incorporate some regularization into a statistical model. We assume that, at each spatial location, the regression coefficients come from a mixture model with the flavor of stochastic search variable selection, but utilize a copula to share information about variable inclusion and effect magnitude across locations. The model differs from current spatial variable selection techniques by accommodating both local and global variable selection. The model is used to study the association between fine PM (PM <2.5μm) components, measured at 115 counties nationally over the period 2000-2008, and cardiovascular emergency room admissions among Medicare patients.
Variable selection in PLSR and extensions to a multi-block setting for metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach
When applying LC-MS or NMR spectroscopy in metabolomics studies, high-dimensional data are generated and effective tools for variable selection are needed in order to detect the important metabolites. Methods based on sparsity combined with PLSR have recently attracted attention in the field...
The Effect of Listening to Specific Musical Genre Selections on Measures of Heart Rate Variability
Orman, Evelyn K.
2011-01-01
University students (N = 30) individually listened to the Billboard 100 top-ranked musical selection for their most and least liked musical genre. Two minutes of silence preceded each musical listening condition, and heart rate variability (HRV) was recorded throughout. All HRV measures decreased during music listening as compared with silence.…
Meta-Statistics for Variable Selection: The R Package BioMark
Directory of Open Access Journals (Sweden)
Ron Wehrens
2012-11-01
Full Text Available Biomarker identification is an ever more important topic in the life sciences. With the advent of measurement methodologies based on microarrays and mass spectrometry, thousands of variables are routinely being measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for this often leads to a lack of power. In addition, choosing α cutoff levels remains somewhat arbitrary. Also in a regression context, a model depending on few but relevant variables will be more accurate and precise, and easier to interpret biologically.We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, presents a data-dependent selection threshold for significance, instead of a cookbook value of α = 0.05. It is applicable in all cases where two groups are compared. The second, stability selection, is more general, and can also be applied in a regression context. This approach uses repeated subsampling of the data in order to assess the variability of the model coefficients and selects those that remain consistently important. It is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
A QSAR Study of Environmental Estrogens Based on a Novel Variable Selection Method
Directory of Open Access Journals (Sweden)
Aiqian Zhang
2012-05-01
Full Text Available A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals which are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR model with a novel variable selection method has been proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI with leave-multiple-out cross validation (LMOCV to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD principles for regulation of QSAR acceptability were fully considered, such as using an unambiguous multiple-linear regression (MLR algorithm to build the model, using several validation methods to assessment the performance of the model, giving the define of applicability domain and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that the VSMVI is an effective, feasible and practical tool for rapid screening of the best subset from large molecular descriptors.
ESS++: a C++ objected-oriented algorithm for Bayesian stochastic search model exploration
Bottolo, L; Chadeau-Hyam, M.; Hastie, D. I.; Langley, S. R.; Petretto, E.; Tiret, L.; Tregouet, D.; Richardson, S
2011-01-01
Summary: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the ‘large p, small n’ case. In the current version, ESS++ can handle several hundred observations, thousands of predictors and a few responses simultaneously. The core engine of ESS++ for the selection of relevant predictors is based on Evolutionary Monte ...
A note on the robustness of a full Bayesian method for nonignorable missing data analysis
Zhang, Zhiyong; Wang,Lijuan
2012-01-01
A full Bayesian method utilizing data augmentation and Gibbs sampling algorithms is presented for analyzing nonignorable missing data. The discussion focuses on a simplified selection model for regression analysis. Regardless of missing mechanisms, it is assumed that missingness only depends on the missing variable itself. Simulation results demonstrate that the simplified selection model can recover regression model parameters under both correctly specified situations and many misspecified s...
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Variable selection in the explorative analysis of several data blocks in metabolomics
DEFF Research Database (Denmark)
Karaman, İbrahim; Nørskov, Natalja; Yde, Christian Clement
highly correlated data sets in one integrated approach. Due to the high number of variables in data sets from metabolomics (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need...... to be related. Tools for the handling of mental overflow minimising false discovery rates both by using statistical and biological validation in an integrative approach are needed. In this paper different strategies for variable selection were considered with respect to false discovery and the possibility...... for biological validation. The data set used in this study is metabolomics data from an animal intervention study. The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using NMR and LC-MS based...
Knowledge-based variable selection for learning rules from proteomic data
Directory of Open Access Journals (Sweden)
Hogan William R
2009-09-01
Full Text Available Abstract Background The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In particular, we use the Empirical Proteomics Ontology Knowledge Base (EPO-KB that contains previously identified and validated proteomic biomarkers to select m/zs in a proteomic dataset prior to analysis to increase performance. Results We show that using EPO-KB as a pre-processing method, specifically selecting all biomarkers found only in the biofluid of the proteomic dataset, reduces the dimensionality by 95% and provides a statistically significantly greater increase in performance over no variable selection and random variable selection. Conclusion Knowledge-based variable selection even with a sparsely-populated resource such as the EPO-KB increases overall performance of rule-learning for disease classification from high-dimensional proteomic mass spectra.
Penalized variable selection procedure for Cox models with semiparametric relative risk
Du, Pang; Liang, Hua; 10.1214/09-AOS780
2010-01-01
We study the Cox models with semiparametric relative risk, which can be partially linear with one nonparametric component, or multiple additive or nonadditive nonparametric components. A penalized partial likelihood procedure is proposed to simultaneously estimate the parameters and select variables for both the parametric and the nonparametric parts. Two penalties are applied sequentially. The first penalty, governing the smoothness of the multivariate nonlinear covariate effect function, provides a smoothing spline ANOVA framework that is exploited to derive an empirical model selection tool for the nonparametric part. The second penalty, either the smoothly-clipped-absolute-deviation (SCAD) penalty or the adaptive LASSO penalty, achieves variable selection in the parametric part. We show that the resulting estimator of the parametric part possesses the oracle property, and that the estimator of the nonparametric part achieves the optimal rate of convergence. The proposed procedures are shown to work well i...
Bayesian inference in geomagnetism
Backus, George E.
1988-01-01
The inverse problem in empirical geomagnetic modeling is investigated, with critical examination of recently published studies. Particular attention is given to the use of Bayesian inference (BI) to select the damping parameter lambda in the uniqueness portion of the inverse problem. The mathematical bases of BI and stochastic inversion are explored, with consideration of bound-softening problems and resolution in linear Gaussian BI. The problem of estimating the radial magnetic field B(r) at the earth core-mantle boundary from surface and satellite measurements is then analyzed in detail, with specific attention to the selection of lambda in the studies of Gubbins (1983) and Gubbins and Bloxham (1985). It is argued that the selection method is inappropriate and leads to lambda values much larger than those that would result if a reasonable bound on the heat flow at the CMB were assumed.
Friedel, Matthias; Patz, Claus-Dieter; Dietrich, Helmut
2013-12-15
For more than a decade, Fourier-transform infrared (FTIR) spectroscopy combined with partial least squares (PLS) regression has been used as a fast and reliable method for simultaneous estimation of multiple parameters in wine. In this study, different FTIR instruments (single bounce attenuated total reflection, transmission with variable and defined pathlength) and different variable selection techniques (full spectrum PLS, genetic algorithm PLS, interval PLS, principal variable PLS) were compared on an identical sample set of international wines and ten wine parameters. Results suggest that the single bounce attenuated total reflection technique is well suited for the analysis of ethanol, relative density and sugars, but less accurate in the analysis of organic acid content. The transmission instrument with variable pathlength shows good validation results for the analysis of organic acids, but less accurate results for the analysis of ethanol and relative density as compared to the other instruments. The transmission instrument with defined pathlength was well suited for the analysis for all parameters investigated in this study. Variable selection improved model robustness and calibration results, with genetic algorithm PLS being the most effective technique.
Attention in a bayesian framework
DEFF Research Database (Denmark)
Whiteley, Louise Emma; Sahani, Maneesh
2012-01-01
The behavioral phenomena of sensory attention are thought to reflect the allocation of a limited processing resource, but there is little consensus on the nature of the resource or why it should be limited. Here we argue that a fundamental bottleneck emerges naturally within Bayesian models...... of perception, and use this observation to frame a new computational account of the need for, and action of, attention - unifying diverse attentional phenomena in a way that goes beyond previous inferential, probabilistic and Bayesian models. Attentional effects are most evident in cluttered environments......, and include both selective phenomena, where attention is invoked by cues that point to particular stimuli, and integrative phenomena, where attention is invoked dynamically by endogenous processing. However, most previous Bayesian accounts of attention have focused on describing relatively simple experimental...
Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao
2017-03-01
Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. Through input variable selection to eliminate the irrelevant or redundant variables, a suitable subset of variables is identified as the input of a model. Meanwhile, through input variable selection the complexity of the model structure is simplified and the computational efficiency is improved. This paper describes the procedures of the input variable selection for the data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS) are applied in this study. Typical data-driven models incorporating support vector machine (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected from the PMI algorithm provide more effective information for the models to measure liquid mass flowrate while the IIS algorithm provides a fewer but more effective variables for the models to predict gas volume fraction.
Bayesian artificial intelligence
Korb, Kevin B
2003-01-01
As the power of Bayesian techniques has become more fully realized, the field of artificial intelligence has embraced Bayesian methodology and integrated it to the point where an introduction to Bayesian techniques is now a core course in many computer science programs. Unlike other books on the subject, Bayesian Artificial Intelligence keeps mathematical detail to a minimum and covers a broad range of topics. The authors integrate all of Bayesian net technology and learning Bayesian net technology and apply them both to knowledge engineering. They emphasize understanding and intuition but also provide the algorithms and technical background needed for applications. Software, exercises, and solutions are available on the authors' website.
Bayesian artificial intelligence
Korb, Kevin B
2010-01-01
Updated and expanded, Bayesian Artificial Intelligence, Second Edition provides a practical and accessible introduction to the main concepts, foundation, and applications of Bayesian networks. It focuses on both the causal discovery of networks and Bayesian inference procedures. Adopting a causal interpretation of Bayesian networks, the authors discuss the use of Bayesian networks for causal modeling. They also draw on their own applied research to illustrate various applications of the technology.New to the Second EditionNew chapter on Bayesian network classifiersNew section on object-oriente
Genuer, Robin; Toussile, Wilson
2011-01-01
Malaria control strategies aiming at reducing disease transmission intensity may impact both oocyst intensity and infection prevalence in the mosquito vector. Thus far, mathematical models failed to identify a clear relationship between Plasmodium falciparum gametocytes and their infectiousness to mosquitoes. Natural isolates of gametocytes are genetically diverse and biologically complex. Infectiousness to mosquitoes relies on multiple parameters such as density, sex-ratio, maturity, parasite genotypes and host immune factors. In this article, we investigated how density and genetic diversity of gametocytes impact on the success of transmission in the mosquito vector. We analyzed data for which the number of covariates plus attendant interactions is at least of order of the sample size, precluding usage of classical models such as general linear models. We then considered the variable importance from random forests to address the problem of selecting the most influent variables. The selected covariates were ...
Fernandez-Lozano, C; Canto, C; Gestal, M; Andrade-Garda, J M; Rabuñal, J R; Dorado, J; Pazos, A
2013-01-01
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aim at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.
Alpkaya, Ufuk
2013-01-01
The purpose of this study is to determine the influence of gymnastics training integrated with physical education courses on selected motor performance variables in seven year old girls. Subjects were divided into two groups: (1) control group (N=15, X=7.56 plus or minus 0.46 year old); (2) gymnastics group (N=16, X=7.60 plus or minus 0.50 year…
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark
2006-01-01
We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference...... by evaluating and differentiating these circuits in time linear in their size. We report on experimental results showing successful compilation and efficient inference on relational Bayesian networks, whose PRIMULA--generated propositional instances have thousands of variables, and whose jointrees have clusters...
Selected Macroeconomic Variables and Stock Market Movements: Empirical evidence from Thailand
Directory of Open Access Journals (Sweden)
Joseph Ato Forson
2014-06-01
Full Text Available This paper investigates and analyzes the long-run equilibrium relationship between the Thai stock Exchange Index (SETI and selected macroeconomic variables using monthly time series data that cover a 20-year period from January 1990 to December 2009. The following macroeconomic variables are included in our analysis: money supply (MS, the consumer price index (CPI, interest rate (IR and the industrial production index (IP (as a proxy for GDP. Our findings prove that the SET Index and the selected macroeconomic variables are cointegrated at I (1 and have a significant equilibrium relationship over the long run. Money supply demonstrates a strong positive relationship with the SET Index over the long run, whereas the industrial production index and consumer price index show negative long-run relationships with the SET Index. Furthermore, in non-equilibrium situations, the error correction mechanism suggests that the consumer price index, industrial production index and money supply each contribute in some way to restore equilibrium. In addition, using Toda and Yamamoto’s augmented Granger causality test, we identify a bi-causal relationship between industrial production and money supply and unilateral causal relationships between CPI and IR, IP and CPI, MS and CPI, and IP and SETI, indicating that all of these variables are sensitive to Thai stock market movements. The policy implications of these findings are also discussed.
Applied Bayesian Hierarchical Methods
Congdon, Peter D
2010-01-01
Bayesian methods facilitate the analysis of complex models and data structures. Emphasizing data applications, alternative modeling specifications, and computer implementation, this book provides a practical overview of methods for Bayesian analysis of hierarchical models.
Gelman, Andrew; Stern, Hal S; Dunson, David B; Vehtari, Aki; Rubin, Donald B
2013-01-01
FUNDAMENTALS OF BAYESIAN INFERENCEProbability and InferenceSingle-Parameter Models Introduction to Multiparameter Models Asymptotics and Connections to Non-Bayesian ApproachesHierarchical ModelsFUNDAMENTALS OF BAYESIAN DATA ANALYSISModel Checking Evaluating, Comparing, and Expanding ModelsModeling Accounting for Data Collection Decision AnalysisADVANCED COMPUTATION Introduction to Bayesian Computation Basics of Markov Chain Simulation Computationally Efficient Markov Chain Simulation Modal and Distributional ApproximationsREGRESSION MODELS Introduction to Regression Models Hierarchical Linear
Directory of Open Access Journals (Sweden)
Guillermo A. Cañadas
2010-12-01
Full Text Available Burnout syndrome has a high incidence among professional healthcare and social workers. This leads to deterioration in the quality of their working life and affects their health, the organization where they work and, via their clients, society itself. Given these serious effects, many studies have investigated this construct and identified groups at increased risk of the syndrome. The present work has 2 main aims: to compare burnout levels in potential risk groups among professional healthcare workers; and to compare them using standard and Bayesian statistical analysis. The sample consisted of 108 psycho-social care workers based at 2 centers run by the Granada Council in Spain. All participants, anonymously and individually, filled in a booklet that included questions on personal information and the Spanish adaptation of the Maslach Burnout Inventory (MBI. Standard and Bayesian analysis of variance were used to identify the risk factors associated with different levels of burnout. It was found that the information provided by the Bayesian procedure complemented that provided by the standard procedure.
Variational Bayesian Approximation methods for inverse problems
Mohammad-Djafari, Ali
2012-09-01
Variational Bayesian Approximation (VBA) methods are recent tools for effective Bayesian computations. In this paper, these tools are used for inverse problems where the prior models include hidden variables and where where the estimation of the hyper parameters has also to be addressed. In particular two specific prior models (Student-t and mixture of Gaussian models) are considered and details of the algorithms are given.
Bayesian Analysis of High Dimensional Classification
Mukhopadhyay, Subhadeep; Liang, Faming
2009-12-01
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases , there is a lot of interest in searching for sparse model in High Dimensional regression(/classification) setup. we first discuss two common challenges for analyzing high dimensional data. The first one is the curse of dimensionality. The complexity of many existing algorithms scale exponentially with the dimensionality of the space and by virtue of that algorithms soon become computationally intractable and therefore inapplicable in many real applications. secondly, multicollinearities among the predictors which severely slowdown the algorithm. In order to make Bayesian analysis operational in high dimension we propose a novel 'Hierarchical stochastic approximation monte carlo algorithm' (HSAMC), which overcomes the curse of dimensionality, multicollinearity of predictors in high dimension and also it possesses the self-adjusting mechanism to avoid the local minima separated by high energy barriers. Models and methods are illustrated by simulation inspired from from the feild of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high dimensional complex model space.
The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition.
Zhang, Ying; Bilbao, Aivett; Bruderer, Tobias; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard; Varesio, Emmanuel
2015-10-02
As tryptic peptides and metabolites are not equally distributed along the mass range, the probability of cross fragment ion interference is higher in certain windows when fixed Q1 SWATH windows are applied. We evaluated the benefits of utilizing variable Q1 SWATH windows with regards to selectivity improvement. Variable windows based on equalizing the distribution of either the precursor ion population (PIP) or the total ion current (TIC) within each window were generated by an in-house software, swathTUNER. These two variable Q1 SWATH window strategies outperformed, with respect to quantification and identification, the basic approach using a fixed window width (FIX) for proteomic profiling of human monocyte-derived dendritic cells (MDDCs). Thus, 13.8 and 8.4% additional peptide precursors, which resulted in 13.1 and 10.0% more proteins, were confidently identified by SWATH using the strategy PIP and TIC, respectively, in the MDDC proteomic sample. On the basis of the spectral library purity score, some improvement warranted by variable Q1 windows was also observed, albeit to a lesser extent, in the metabolomic profiling of human urine. We show that the novel concept of "scheduled SWATH" proposed here, which incorporates (i) variable isolation windows and (ii) precursor retention time segmentation further improves both peptide and metabolite identifications.
DEFF Research Database (Denmark)
Christensen, Lars P.B.; Larsen, Jan
2006-01-01
A general Variational Bayesian framework for iterative data and parameter estimation for coherent detection is introduced as a generalization of the EM-algorithm. Explicit solutions are given for MIMO channel estimation with Gaussian prior and noise covariance estimation with inverse-Wishart prio...
Social variables exert selective pressures in the evolution and form of primate mimetic musculature.
Burrows, Anne M; Li, Ly; Waller, Bridget M; Micheletta, Jerome
2016-04-01
Mammals use their faces in social interactions more so than any other vertebrates. Primates are an extreme among most mammals in their complex, direct, lifelong social interactions and their frequent use of facial displays is a means of proximate visual communication with conspecifics. The available repertoire of facial displays is primarily controlled by mimetic musculature, the muscles that move the face. The form of these muscles is, in turn, limited by and influenced by phylogenetic inertia but here we use examples, both morphological and physiological, to illustrate the influence that social variables may exert on the evolution and form of mimetic musculature among primates. Ecomorphology is concerned with the adaptive responses of morphology to various ecological variables such as diet, foliage density, predation pressures, and time of day activity. We present evidence that social variables also exert selective pressures on morphology, specifically using mimetic muscles among primates as an example. Social variables include group size, dominance 'style', and mating systems. We present two case studies to illustrate the potential influence of social behavior on adaptive morphology of mimetic musculature in primates: (1) gross morphology of the mimetic muscles around the external ear in closely related species of macaque (Macaca mulatta and Macaca nigra) characterized by varying dominance styles and (2) comparative physiology of the orbicularis oris muscle among select ape species. This muscle is used in both facial displays/expressions and in vocalizations/human speech. We present qualitative observations of myosin fiber-type distribution in this muscle of siamang (Symphalangus syndactylus), chimpanzee (Pan troglodytes), and human to demonstrate the potential influence of visual and auditory communication on muscle physiology. In sum, ecomorphologists should be aware of social selective pressures as well as ecological ones, and that observed morphology might
Approximate Bayesian computation.
Directory of Open Access Journals (Sweden)
Mikael Sunnåker
Full Text Available Approximate Bayesian computation (ABC constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology.
Variable selection for distribution-free models for longitudinal zero-inflated count responses.
Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M
2016-07-20
Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather the exception, in the era of patent-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing this important and timely issues in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
Letícia da Silveira Pinheiro
2012-06-01
Full Text Available The objective of this work was to determine the effect of male sterility or manual recombination on genetic variability of rice recurrent selection populations. The populations CNA-IRAT 4, with a gene for male sterility, and CNA 12, which was manually recombined, were evaluated. Genetic variability among selection cycles was estimated using14 simple sequence repeat (SSR markers. A total of 926 plants were analyzed, including ten genitors and 180 individuals from each of the evaluated cycles (1, 2 and 5 of the population CNA-IRAT 4, and 16 genitors and 180 individuals from each of the cycles (1 and 2 of CNA 12. The analysis allowed the identification of alleles not present among the genitors for both populations, in all cycles, especially for the CNA-IRAT 4 population. These alleles resulted from unwanted fertilization with genotypes that were not originally part of the populations. The parameters of Wright's F-statistic (F IS and F IT indicated that the manual recombination expands the genetic variability of the CNA 12 population, whereas male sterility reduces the one of CNA-IRAT 4.
Most frugal explanations in Bayesian networks
Kwisthout, J.H.P.
2015-01-01
Inferring the most probable explanation to a set of variables, given a partial observation of the remaining variables, is one of the canonical computational problems in Bayesian networks, with widespread applications in AI and beyond. This problem, known as MAP, is computationally intractable (NP-ha
Selection of Variable in Sampling Investigation%抽样调查中变量选择
Institute of Scientific and Technical Information of China (English)
陶凤梅; 杨启昌; 胡锡衡
2002-01-01
在抽样调查中,问卷的设计者常常尽可能多地设计变量,以保证不丢失有用的信息.但是,问卷中含有太多变量会减少问卷的回收率,进而导致分析结果.本文通过对应分析的方法介绍了幼儿主体性发展的变量选择,并分析了其合理性.%In sampling investigation,the designer of questionnaire usually attempt to collect the questions as many as possible,so as to avoid losing some useful information.Whereas,the questionnaire including too many questions might reduce the ratio of receiving answer and make trouble in analysing the investigation results.In this paper,we select the variables of questionnaire for infant activity development by the method of variable selection in correspondence analysis and analyze the rationality of the selection.
Nikoloulopoulos, Aristidis K
2016-06-30
The method of generalized estimating equations (GEE) is popular in the biostatistics literature for analyzing longitudinal binary and count data. It assumes a generalized linear model for the outcome variable, and a working correlation among repeated measurements. In this paper, we introduce a viable competitor: the weighted scores method for generalized linear model margins. We weight the univariate score equations using a working discretized multivariate normal model that is a proper multivariate model. Because the weighted scores method is a parametric method based on likelihood, we propose composite likelihood information criteria as an intermediate step for model selection. The same criteria can be used for both correlation structure and variable selection. Simulations studies and the application example show that our method outperforms other existing model selection methods in GEE. From the example, it can be seen that our methods not only improve on GEE in terms of interpretability and efficiency but also can change the inferential conclusions with respect to GEE. Copyright © 2016 John Wiley & Sons, Ltd.
Bayesian Calibration of Microsimulation Models.
Rutter, Carolyn M; Miglioretti, Diana L; Savarino, James E
2009-12-01
Microsimulation models that describe disease processes synthesize information from multiple sources and can be used to estimate the effects of screening and treatment on cancer incidence and mortality at a population level. These models are characterized by simulation of individual event histories for an idealized population of interest. Microsimulation models are complex and invariably include parameters that are not well informed by existing data. Therefore, a key component of model development is the choice of parameter values. Microsimulation model parameter values are selected to reproduce expected or known results though the process of model calibration. Calibration may be done by perturbing model parameters one at a time or by using a search algorithm. As an alternative, we propose a Bayesian method to calibrate microsimulation models that uses Markov chain Monte Carlo. We show that this approach converges to the target distribution and use a simulation study to demonstrate its finite-sample performance. Although computationally intensive, this approach has several advantages over previously proposed methods, including the use of statistical criteria to select parameter values, simultaneous calibration of multiple parameters to multiple data sources, incorporation of information via prior distributions, description of parameter identifiability, and the ability to obtain interval estimates of model parameters. We develop a microsimulation model for colorectal cancer and use our proposed method to calibrate model parameters. The microsimulation model provides a good fit to the calibration data. We find evidence that some parameters are identified primarily through prior distributions. Our results underscore the need to incorporate multiple sources of variability (i.e., due to calibration data, unknown parameters, and estimated parameters and predicted values) when calibrating and applying microsimulation models.
Effect of Selected Organic Acids on Cadmium Sorption by Variable-and Permanent-Charge Soils
Institute of Scientific and Technical Information of China (English)
HU Hong-Qing; LIU Hua-Liang; HE Ji-Zheng; HUANG Qiao-Yun
2007-01-01
Batch equilibrium experiments were conducted to investigate cadmium (Cd) sorption by two permanent-charge soils, a yellow-cinnamon soil and a yellow-brown soil, and two variable-charge soils, a red soil and a latosol, with addition of selected organic acids (acetate, tartrate, and citrate). Results showed that with an increase in acetate concentrations from 0 to 3.0 mmol L-1, Cd sorption percentage by the yellow-cinnamon soil, the yellow-brown soil, and the latosol decreased. The sorption percentage of Cd by the yellow-cinnamon soil and generally the yellow-brown soil (permanent-charge soils)decreased with an increase in tartrate concentration, but increased at low tartrate concentrations for the red soil and the latosol. Curves of percentage of Cd sorption for citrate were similar to those for tartrate. For the variable-charge soils with tartrate and citrate, there were obvious peaks in Cd sorption percentage. These peaks, where organic acids had maximum influence, changed with soil type, and were at a higher organic acid concentration for the variable-charge soils than for the permanent charge soils. Addition of cadmium after tartrate adsorption resulted in higher sorption increase for the variable-charge soils than permanent-charge soils. When tartrate and Cd solution were added together, sorption of Cd decreased with tartrate concentration for the yellow-brown soil, but increased at low tartrate concentrations and then decreased with tartrate concentration for the red soil and the latosol.
Implementation of Phonetic Context Variable Length Unit Selection Module for Malay Text to Speech
Directory of Open Access Journals (Sweden)
Tian-Swee Tan
2008-01-01
Full Text Available Problem statement: The main problem with current Malay Text-To-Speech (MTTS synthesis system is the poor quality of the generated speech sound due to the inability of traditional TTS system to provide multiple choices of unit for generating more accurate synthesized speech. Approach: This study proposes a phonetic context variable length unit selection MTTS system that is capable of providing more natural and accurate unit selection for synthesized speech. It implemented a phonetic context algorithm for unit selection for MTTS. The unit selection method (without phonetic context may encounter the problem of selecting the speech unit from different sources and affect the quality of concatenation. This study proposes the design of speech corpus and unit selection method according to phonetic context so that it can select a string of continuous phoneme from same source instead of individual phoneme from different sources. This can further reduce the concatenation point and increase the quality of concatenation. The speech corpus was transcribed according to phonetic context to preserve the phonetic information. This method utilizes word base concatenation method. Firstly it will search through the speech corpus for the target word, if the target is found; it will be used for concatenation. If the word does not exist, then it will construct the words from phoneme sequence. Results: This system had been tested with 40 participants in Mean Opinion Score (MOS listening test with the average rates for naturalness, pronunciation and intelligibility are 3.9, 4.1 and 3.9. Conclusion/Recommendation: Through this study, a very first version of Corpus-based MTTS has been designed; it has improved the naturalness, pronunciation and intelligibility of synthetic speech. But it still has some lacking that need to be perfected such as the prosody module to support the phrasing analysis and intonation of input text to match with the waveform modifier.
Bayesian Modeling of a Human MMORPG Player
Synnaeve, Gabriel
2010-01-01
This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and to select which target in a situation where allies and foes are present. We explain the model in Bayesian programming and show how we could learn the conditional probabilities from data gathered during human-played sessions.
Bayesian Modeling of a Human MMORPG Player
Synnaeve, Gabriel; Bessière, Pierre
2011-03-01
This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and to select which target in a situation where allies and foes are present. We explain the model in Bayesian programming and show how we could learn the conditional probabilities from data gathered during human-played sessions.
Sparse Event Modeling with Hierarchical Bayesian Kernel Methods
2016-01-05
SECURITY CLASSIFICATION OF: The research objective of this proposal was to develop a predictive Bayesian kernel approach to model count data based on...several predictive variables. Such an approach, which we refer to as the Poisson Bayesian kernel model, is able to model the rate of occurrence of... kernel methods made use of: (i) the Bayesian property of improving predictive accuracy as data are dynamically obtained, and (ii) the kernel function
An Approach with Support Vector Machine using Variable Features Selection on Breast Cancer Prognosis
Directory of Open Access Journals (Sweden)
Sandeep Chaurasia
2013-09-01
Full Text Available Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of machine learning. In this paper we have used an approach by using support vector machine classifier to construct a model that is useful for the breast cancer survivability prediction. We have used both 5 cross and 10 cross validation of variable selection on input feature vectors and the performance measurement through bio-learning class performance while measuring AUC, specificity and sensitivity. The performance of the SVM is much better than the other machine learning classifier.
Adaptive variable selection for extended Nijboer-Zernike aberration retrieval via lasso
Wang, Bin; Diao, Huai-An; Guo, Jianhua; Liu, Xiyang; Wu, Yuanhao
2017-02-01
In this paper, we propose extended Nijboer-Zernike (ENZ) method for aberration retrieval by incorporating lasso variable selection method which can improve the accuracy of aberration retrieval. The proposed model is computed by the state-of-art algorithm of the Bregman iterative algorithm (Bregman, 1967 [1]; Cai et al., 2008 [2]; Yin et al., 2008 [3]) for L1 minimization problem with adaptive regularized parameter choice based on the strategy (Ito et al., 2011 [4]). Numerical simulations for real world and simulated phase data validate the effectiveness of the proposed ENZ AR via lasso.
Soft Sensing Modelling Based on Optimal Selection of Secondary Variables and Its Application
Institute of Scientific and Technical Information of China (English)
Qi Li; Cheng Shao
2009-01-01
The composition of the distillation column is a very important quality value in refineries, unfortunately, few hardware sensors are available on-line to measure the distillation compositions. In this paper, a novel method using sensitivity matrix analysis and kernel ridge regression (KRR) to implement on-line soft sensing of distillation compositions is proposed. In this approach, the sensitivity matrix analysis is presented to select the most suitable secondary variables to be used as the soft sensor's input. The KRR is used to build the composition soft sensor. Application to a simulated distillation column demonstrates the effectiveness of the method.
Sun, Jianan; Chen, Yunxiao; Liu, Jingchen; Ying, Zhiliang; Xin, Tao
2016-12-01
We develop a latent variable selection method for multidimensional item response theory models. The proposed method identifies latent traits probed by items of a multidimensional test. Its basic strategy is to impose an [Formula: see text] penalty term to the log-likelihood. The computation is carried out by the expectation-maximization algorithm combined with the coordinate descent algorithm. Simulation studies show that the resulting estimator provides an effective way in correctly identifying the latent structures. The method is applied to a real dataset involving the Eysenck Personality Questionnaire.
Pan, Yangbin; Mao, Weiping; Liu, Xuanxuan; Xu, Chong; He, Zhijuan; Wang, Wenqian; Yan, Hao
2012-11-01
We applied a ribosome display technique to a mouse single chain variable fragment (scFv) library to select scFvs specific for the inducible costimulator (ICOS). mRNA was isolated from the spleens of BALB/c mice immunized with ICOS protein. Heavy and κ chain genes (VH and κ) were amplified separately by reverse transcriptase polymerase chain reaction, and the anti-ICOS VH/κ chain ribosome display library was constructed with a special flexible linker by overlap extension PCR. The VH/κ chain library was transcribed and translated in vitro using a rabbit reticulocyte lysate system. Then, antibody-ribosome-mRNA complexes were produced and panned against ICOS protein under appropriate conditions. However, in order to isolate specific scFvs for ICOS, negative selection using CD28 was carried out before three rounds of positive selection on ICOS. After three rounds of panning, the selected scFv DNAs were cloned into pET43.1a and detected by SDS-PAGE. Then, enzyme-linked immunosorbent assay showed that we successfully constructed a native ribosome display library, and among seven clones, clone 5 had the highest affinity for the ICOS and low for the CD28. Anti-ICOS scFvs are assessed for binding specificity and affinity and may provide the potential for development of the humanized and acute and chronic allograft rejection.
Space Shuttle RTOS Bayesian Network
Morris, A. Terry; Beling, Peter A.
2001-01-01
With shrinking budgets and the requirements to increase reliability and operational life of the existing orbiter fleet, NASA has proposed various upgrades for the Space Shuttle that are consistent with national space policy. The cockpit avionics upgrade (CAU), a high priority item, has been selected as the next major upgrade. The primary functions of cockpit avionics include flight control, guidance and navigation, communication, and orbiter landing support. Secondary functions include the provision of operational services for non-avionics systems such as data handling for the payloads and caution and warning alerts to the crew. Recently, a process to selection the optimal commercial-off-the-shelf (COTS) real-time operating system (RTOS) for the CAU was conducted by United Space Alliance (USA) Corporation, which is a joint venture between Boeing and Lockheed Martin, the prime contractor for space shuttle operations. In order to independently assess the RTOS selection, NASA has used the Bayesian network-based scoring methodology described in this paper. Our two-stage methodology addresses the issue of RTOS acceptability by incorporating functional, performance and non-functional software measures related to reliability, interoperability, certifiability, efficiency, correctness, business, legal, product history, cost and life cycle. The first stage of the methodology involves obtaining scores for the various measures using a Bayesian network. The Bayesian network incorporates the causal relationships between the various and often competing measures of interest while also assisting the inherently complex decision analysis process with its ability to reason under uncertainty. The structure and selection of prior probabilities for the network is extracted from experts in the field of real-time operating systems. Scores for the various measures are computed using Bayesian probability. In the second stage, multi-criteria trade-off analyses are performed between the scores
DEFF Research Database (Denmark)
Dashab, Golam Reza; Kadri, Naveen Kumar; Mahdi Shariati, Mohammad;
2012-01-01
) Mixed model analysis (MMA), 2) Random haplotype model (RHM), 3) Genealogy-based mixed model (GENMIX), and 4) Bayesian variable selection (BVS). The data consisted of phenotypes of 2000 animals from 20 sire families and were genotyped with 9990 SNPs on five chromosomes. Results: Out of the eight...
A Bayesian approach to linear regression in astronomy
Sereno, Mauro
2015-01-01
Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical modeling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modeling. The R-package LIRA (LInear Regression in Astronomy) is made available to perform the regression.
Directory of Open Access Journals (Sweden)
Pawłowski Dominik
2014-12-01
Full Text Available Geostatistical methods for 2D and 3D modelling spatial variability of selected physicochemical properties of biogenic sediments were applied to a small valley mire in order to identify the processes that lead to the formation of various types of peat. A sequential Gaussian simulation was performed to reproduce the statistical distribution of the input data (pH and organic matter and their semivariances, as well as to honouring of data values, yielding more ‘realistic’ models that show microscale spatial variability, despite the fact that the input sample cores were sparsely distributed in the X-Y space of the study area. The stratigraphy of peat deposits in the Ldzań mire shows a record of long-term evolution of water conditions, which is associated with the variability in water supply over time. Ldzań is a fen (a rheotrophic mire with a through-flow of groundwater. Additionally, the vicinity of the Grabia River is marked by seasonal inundations of the southwest part of the mire and increased participation of mineral matter in the peat. In turn, the upper peat layers of some of the central part of Ldzań mire are rather spongy, and these peat-forming phytocoenoses probably formed during permanent waterlogging.
Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed
2008-12-01
A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables are fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
A variant of sparse partial least squares for variable selection and data exploration
Directory of Open Access Journals (Sweden)
Megan Jodene Olson Hunt
2014-03-01
Full Text Available When data are sparse and/or predictors multicollinear, current implementation of sparse partial least squares (SPLS does not give estimates for non-selected predictors nor provide a measure of inference. In response, an approach termed all-possible SPLS is proposed, which fits a SPLS model for all tuning parameter values across a set grid. Noted is the percentage of time a given predictor is chosen, as well as the average non-zero parameter estimate. Using a large number of multicollinear predictors, simulation confirmed variables not associated with the outcome were least likely to be chosen as sparsity increased across the grid of tuning parameters, while the opposite was true for those strongly associated. Lastly, variables with a weak association were chosen more often than those with no association, but less often than those with a strong relationship to the outcome. Similarly, predictors most strongly related to the outcome had the largest average parameter estimate magnitude, followed by those with a weak relationship, followed by those with no relationship. Across two independent studies regarding the relationship between volumetric MRI measures and a cognitive test score, this method confirmed a priori hypotheses about which brain regions would be selected most often and have the largest average parameter estimates. In conclusion, the percentage of time a predictor is chosen is a useful measure for ordering the strength of the relationship between the independent and dependent variables, serving as a form of inference. The average parameter estimates give further insight regarding the direction and strength of association. As a result, all-possible SPLS gives more information than the dichotomous output of traditional SPLS, making it useful when undertaking data exploration and hypothesis generation for a large number of potential predictors.
A variant of sparse partial least squares for variable selection and data exploration.
Olson Hunt, Megan J; Weissfeld, Lisa; Boudreau, Robert M; Aizenstein, Howard; Newman, Anne B; Simonsick, Eleanor M; Van Domelen, Dane R; Thomas, Fridtjof; Yaffe, Kristine; Rosano, Caterina
2014-01-01
When data are sparse and/or predictors multicollinear, current implementation of sparse partial least squares (SPLS) does not give estimates for non-selected predictors nor provide a measure of inference. In response, an approach termed "all-possible" SPLS is proposed, which fits a SPLS model for all tuning parameter values across a set grid. Noted is the percentage of time a given predictor is chosen, as well as the average non-zero parameter estimate. Using a "large" number of multicollinear predictors, simulation confirmed variables not associated with the outcome were least likely to be chosen as sparsity increased across the grid of tuning parameters, while the opposite was true for those strongly associated. Lastly, variables with a weak association were chosen more often than those with no association, but less often than those with a strong relationship to the outcome. Similarly, predictors most strongly related to the outcome had the largest average parameter estimate magnitude, followed by those with a weak relationship, followed by those with no relationship. Across two independent studies regarding the relationship between volumetric MRI measures and a cognitive test score, this method confirmed a priori hypotheses about which brain regions would be selected most often and have the largest average parameter estimates. In conclusion, the percentage of time a predictor is chosen is a useful measure for ordering the strength of the relationship between the independent and dependent variables, serving as a form of inference. The average parameter estimates give further insight regarding the direction and strength of association. As a result, all-possible SPLS gives more information than the dichotomous output of traditional SPLS, making it useful when undertaking data exploration and hypothesis generation for a large number of potential predictors.
Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis
Directory of Open Access Journals (Sweden)
Ueki Masao
2012-05-01
Full Text Available Abstract Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs is an attractive way for identification of genetic components that confers susceptibility of human complex diseases. Individual hypothesis testing for SNP-SNP pairs as in common genome-wide association study (GWAS however involves difficulty in setting overall p-value due to complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. A large number of SNP-SNP pairs than sample size, so-called the large p small n problem, precludes simultaneous analysis using multiple regression. The method that overcomes above issues is thus needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection termed the sure independence screening (SIS for appropriate handling of numerous number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose ranking strategy using promising dummy coding methods and following variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (General-purpose computing on graphics processing units technology. EPISIS can complete exhaustive search for SNP-SNP interactions in standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium data. Conclusions Based on the machine-learning principle, the proposed method gives powerful and flexible genome-wide search for various patterns of gene-gene interaction.
Selection and affinity maturation of IgNAR variable domains targeting Plasmodium falciparum AMA1.
Nuttall, Stewart D; Humberstone, Karen S; Krishnan, Usha V; Carmichael, Jennifer A; Doughty, Larissa; Hattarki, Meghan; Coley, Andrew M; Casey, Joanne L; Anders, Robin F; Foley, Michael; Irving, Robert A; Hudson, Peter J
2004-04-01
The new antigen receptor (IgNAR) is an antibody unique to sharks and consists of a disulphide-bonded dimer of two protein chains, each containing a single variable and five constant domains. The individual variable (V(NAR)) domains bind antigen independently, and are candidates for the smallest antibody-based immune recognition units. We have previously produced a library of V(NAR) domains with extensive variability in the CDR1 and CDR3 loops displayed on the surface of bacteriophage. Now, to test the efficacy of this library, and further explore the dynamics of V(NAR) antigen binding we have performed selection experiments against an infectious disease target, the malarial Apical Membrane Antigen-1 (AMA1) from Plasmodium falciparum. Two related V(NAR) clones were selected, characterized by long (16- and 18-residue) CDR3 loops. These recombinant V(NAR)s could be harvested at yields approaching 5mg/L of monomeric protein from the E. coli periplasm, and bound AMA1 with nanomolar affinities (K(D)= approximately 2 x 10(-7) M). One clone, designated 12Y-2, was affinity-matured by error prone PCR, resulting in several variants with mutations mapping to the CDR1 and CDR3 loops. The best of these variants showed approximately 10-fold enhanced affinity over 12Y-2 and was Plasmodium falciparum strain-specific. Importantly, we demonstrated that this monovalent V(NAR) co-localized with rabbit anti-AMA1 antisera on the surface of malarial parasites and thus may have utility in diagnostic applications.
Bayesian geostatistical modeling of leishmaniasis incidence in Brazil.
Directory of Open Access Journals (Sweden)
Dimitrios-Alexios Karagiannis-Voules
Full Text Available BACKGROUND: Leishmaniasis is endemic in 98 countries with an estimated 350 million people at risk and approximately 2 million cases annually. Brazil is one of the most severely affected countries. METHODOLOGY: We applied Bayesian geostatistical negative binomial models to analyze reported incidence data of cutaneous and visceral leishmaniasis in Brazil covering a 10-year period (2001-2010. Particular emphasis was placed on spatial and temporal patterns. The models were fitted using integrated nested Laplace approximations to perform fast approximate Bayesian inference. Bayesian variable selection was employed to determine the most important climatic, environmental, and socioeconomic predictors of cutaneous and visceral leishmaniasis. PRINCIPAL FINDINGS: For both types of leishmaniasis, precipitation and socioeconomic proxies were identified as important risk factors. The predicted number of cases in 2010 were 30,189 (standard deviation [SD]: 7,676 for cutaneous leishmaniasis and 4,889 (SD: 288 for visceral leishmaniasis. Our risk maps predicted the highest numbers of infected people in the states of Minas Gerais and Pará for visceral and cutaneous leishmaniasis, respectively. CONCLUSIONS/SIGNIFICANCE: Our spatially explicit, high-resolution incidence maps identified priority areas where leishmaniasis control efforts should be targeted with the ultimate goal to reduce disease incidence.
Bayesian Games with Intentions
Directory of Open Access Journals (Sweden)
Adam Bjorndahl
2016-06-01
Full Text Available We show that standard Bayesian games cannot represent the full spectrum of belief-dependent preferences. However, by introducing a fundamental distinction between intended and actual strategies, we remove this limitation. We define Bayesian games with intentions, generalizing both Bayesian games and psychological games, and prove that Nash equilibria in psychological games correspond to a special class of equilibria as defined in our setting.
Bayesian Data-Model Fit Assessment for Structural Equation Modeling
Levy, Roy
2011-01-01
Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…
Directory of Open Access Journals (Sweden)
Shuwei Zhang
2011-02-01
Full Text Available Experimental pEC50s for 216 selective respiratory syncytial virus (RSV inhibitors are used to develop classification models as a potential screening tool for a large library of target compounds. Variable selection algorithm coupled with random forests (VS-RF is used to extract the physicochemical features most relevant to the RSV inhibition. Based on the selected small set of descriptors, four other widely used approaches, i.e., support vector machine (SVM, Gaussian process (GP, linear discriminant analysis (LDA and k nearest neighbors (kNN routines are also employed and compared with the VS-RF method in terms of several of rigorous evaluation criteria. The obtained results indicate that the VS-RF model is a powerful tool for classification of RSV inhibitors, producing the highest overall accuracy of 94.34% for the external prediction set, which significantly outperforms the other four methods with the average accuracy of 80.66%. The proposed model with excellent prediction capacity from internal to external quality should be important for screening and optimization of potential RSV inhibitors prior to chemical synthesis in drug development.
Bayesian statistics an introduction
Lee, Peter M
2012-01-01
Bayesian Statistics is the school of thought that combines prior beliefs with the likelihood of a hypothesis to arrive at posterior beliefs. The first edition of Peter Lee’s book appeared in 1989, but the subject has moved ever onwards, with increasing emphasis on Monte Carlo based techniques. This new fourth edition looks at recent techniques such as variational methods, Bayesian importance sampling, approximate Bayesian computation and Reversible Jump Markov Chain Monte Carlo (RJMCMC), providing a concise account of the way in which the Bayesian approach to statistics develops as wel
Understanding Computational Bayesian Statistics
Bolstad, William M
2011-01-01
A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic
The bugs book a practical introduction to Bayesian analysis
Lunn, David; Best, Nicky; Thomas, Andrew; Spiegelhalter, David
2012-01-01
Introduction: Probability and ParametersProbabilityProbability distributionsCalculating properties of probability distributionsMonte Carlo integrationMonte Carlo Simulations Using BUGSIntroduction to BUGSDoodleBUGSUsing BUGS to simulate from distributionsTransformations of random variablesComplex calculations using Monte CarloMultivariate Monte Carlo analysisPredictions with unknown parametersIntroduction to Bayesian InferenceBayesian learningPosterior predictive distributionsConjugate Bayesian inferenceInference about a discrete parameterCombinations of conjugate analysesBayesian and classica
Fan, Shu-Xiang; Huang, Wen-Qian; Li, Jiang-Bo; Guo, Zhi-Ming; Zhaq, Chun-Jiang
2014-10-01
In order to detect the soluble solids content(SSC)of apple conveniently and rapidly, a ring fiber probe and a portable spectrometer were applied to obtain the spectroscopy of apple. Different wavelength variable selection methods, including unin- formative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm (GA) were pro- posed to select effective wavelength variables of the NIR spectroscopy of the SSC in apple based on PLS. The back interval LS- SVM (BiLS-SVM) and GA were used to select effective wavelength variables based on LS-SVM. Selected wavelength variables and full wavelength range were set as input variables of PLS model and LS-SVM model, respectively. The results indicated that PLS model built using GA-CARS on 50 characteristic variables selected from full-spectrum which had 1512 wavelengths achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for prediction sets were 0.962, 0.403°Brix respectively for SSC. The proposed method of GA-CARS could effectively simplify the portable detection model of SSC in apple based on near infrared spectroscopy and enhance the predictive precision. The study can provide a reference for the development of portable apple soluble solids content spectrometer.
Directory of Open Access Journals (Sweden)
Adrion Christine
2012-09-01
provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
Directory of Open Access Journals (Sweden)
Crowcroft Natasha S
2010-12-01
Full Text Available Abstract Background Encephalitis is an acute clinical syndrome of the central nervous system (CNS, often associated with fatal outcome or permanent damage, including cognitive and behavioural impairment, affective disorders and epileptic seizures. Infection of the central nervous system is considered to be a major cause of encephalitis and more than 100 different pathogens have been recognized as causative agents. However, a large proportion of cases have unknown disease etiology. Methods We perform hierarchical cluster analysis on a multicenter England encephalitis data set with the aim of identifying sub-groups in human encephalitis. We use the simple matching similarity measure which is appropriate for binary data sets and performed variable selection using cluster heatmaps. We also use heatmaps to visually assess underlying patterns in the data, identify the main clinical and laboratory features and identify potential risk factors associated with encephalitis. Results Our results identified fever, personality and behavioural change, headache and lethargy as the main characteristics of encephalitis. Diagnostic variables such as brain scan and measurements from cerebrospinal fluids are also identified as main indicators of encephalitis. Our analysis revealed six major clusters in the England encephalitis data set. However, marked within-cluster heterogeneity is observed in some of the big clusters indicating possible sub-groups. Overall, the results show that patients are clustered according to symptom and diagnostic variables rather than causal agents. Exposure variables such as recent infection, sick person contact and animal contact have been identified as potential risk factors. Conclusions It is in general assumed and is a common practice to group encephalitis cases according to disease etiology. However, our results indicate that patients are clustered with respect to mainly symptom and diagnostic variables rather than causal agents
Approach to the Correlation Discovery of Chinese Linguistic Parameters Based on Bayesian Method
Institute of Scientific and Technical Information of China (English)
WANG Wei(王玮); CAI LianHong(蔡莲红)
2003-01-01
Bayesian approach is an important method in statistics. The Bayesian belief network is a powerful knowledge representation and reasoning tool under the conditions of uncertainty.It is a graphics model that encodes probabilistic relationships among variables of interest. In this paper, an approach to Bayesian network construction is given for discovering the Chinese linguistic parameter relationship in the corpus.
Fiurasek, Jaromir
2012-01-01
The noiseless amplification or attenuation are two heralded filtering operations that enable respectively to increase or decrease the mean field of any quantum state of light with no added noise, at the cost of a small success probability. We show that inserting such noiseless operations in a transmission line improves the performance of continuous-variable quantum key distribution over this line. Remarkably, these noiseless operations do not need to be physically implemented but can simply be simulated in the data post-processing stage. Hence, virtual noiseless amplification or attenuation amounts to perform a Gaussian post-selection, which enhances the secure range or tolerable excess noise while keeping the benefits of Gaussian security proofs.
The role of the c-statistic in variable selection for propensity score models.
Westreich, Daniel; Cole, Stephen R; Funk, Michele Jonsson; Brookhart, M Alan; Stürmer, Til
2011-03-01
The applied literature on propensity scores has often cited the c-statistic as a measure of the ability of the propensity score to control confounding. However, a high c-statistic in the propensity model is neither necessary nor sufficient for control of confounding. Moreover, use of the c-statistic as a guide in constructing propensity scores may result in less overlap in propensity scores between treated and untreated subjects; this may require the analyst to restrict populations for inference. Such restrictions may reduce precision of estimates and change the population to which the estimate applies. Variable selection based on prior subject matter knowledge, empirical observation, and sensitivity analysis is preferable and avoids many of these problems.
Bayesian priors for transiting planets
Kipping, David M
2016-01-01
As astronomers push towards discovering ever-smaller transiting planets, it is increasingly common to deal with low signal-to-noise ratio (SNR) events, where the choice of priors plays an influential role in Bayesian inference. In the analysis of exoplanet data, the selection of priors is often treated as a nuisance, with observers typically defaulting to uninformative distributions. Such treatments miss a key strength of the Bayesian framework, especially in the low SNR regime, where even weak a priori information is valuable. When estimating the parameters of a low-SNR transit, two key pieces of information are known: (i) the planet has the correct geometric alignment to transit and (ii) the transit event exhibits sufficient signal-to-noise to have been detected. These represent two forms of observational bias. Accordingly, when fitting transits, the model parameter priors should not follow the intrinsic distributions of said terms, but rather those of both the intrinsic distributions and the observational ...
A Bayesian Hierarchical Model for Relating Multiple SNPs within Multiple Genes to Disease Risk
Directory of Open Access Journals (Sweden)
Lewei Duan
2013-01-01
Full Text Available A variety of methods have been proposed for studying the association of multiple genes thought to be involved in a common pathway for a particular disease. Here, we present an extension of a Bayesian hierarchical modeling strategy that allows for multiple SNPs within each gene, with external prior information at either the SNP or gene level. The model involves variable selection at the SNP level through latent indicator variables and Bayesian shrinkage at the gene level towards a prior mean vector and covariance matrix that depend on external information. The entire model is fitted using Markov chain Monte Carlo methods. Simulation studies show that the approach is capable of recovering many of the truly causal SNPs and genes, depending upon their frequency and size of their effects. The method is applied to data on 504 SNPs in 38 candidate genes involved in DNA damage response in the WECARE study of second breast cancers in relation to radiotherapy exposure.
Universal Darwinism As a Process of Bayesian Inference.
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
Universal Darwinism as a process of Bayesian inference
Directory of Open Access Journals (Sweden)
John Oberon Campbell
2016-06-01
Full Text Available Many of the mathematical frameworks describing natural selection are equivalent to Bayes’ Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians. As Bayesian inference can always be cast in terms of (variational free energy minimization, natural selection can be viewed as comprising two components: a generative model of an ‘experiment’ in the external world environment, and the results of that 'experiment' or the 'surprise' entailed by predicted and actual outcomes of the ‘experiment’. Minimization of free energy implies that the implicit measure of 'surprise' experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
DeMaris, Alfred
2014-09-01
Unmeasured confounding is the principal threat to unbiased estimation of treatment "effects" (i.e., regression parameters for binary regressors) in nonexperimental research. It refers to unmeasured characteristics of individuals that lead them both to be in a particular "treatment" category and to register higher or lower values than others on a response variable. In this article, I introduce readers to 2 econometric techniques designed to control the problem, with a particular emphasis on the Heckman selection model (HSM). Both techniques can be used with only cross-sectional data. Using a Monte Carlo experiment, I compare the performance of instrumental-variable regression (IVR) and HSM to that of ordinary least squares (OLS) under conditions with treatment and unmeasured confounding both present and absent. I find HSM generally to outperform IVR with respect to mean-square-error of treatment estimates, as well as power for detecting either a treatment effect or unobserved confounding. However, both HSM and IVR require a large sample to be fully effective. The use of HSM and IVR in tandem with OLS to untangle unobserved confounding bias in cross-sectional data is further demonstrated with an empirical application. Using data from the 2006-2010 General Social Survey (National Opinion Research Center, 2014), I examine the association between being married and subjective well-being.
Strong Variability of Overlapping Iron Broad Absorption Lines in five Radio-selected Quasars
Zhang, Shaohua; Wang, Tinggui; Wang, Huiyuan; Shi, Xiheng; Liu, Bo; Liu, Wenjuan; Li, Zhenzhen; Wang, Shufen
2015-01-01
We present the results of a variability study of broad absorption lines (BALs) in a uniformly radio-selected sample of 28 BAL quasars using the archival data from the first bright quasar survey (FBQS) and the Sloan Digital Sky Survey (SDSS), as well as those obtained by ourselves, covering time scales $\\sim 1-10$ years in the quasar's rest-frame. The variable absorption troughs are detected in 12 BAL quasars. Among them, five cases showed strong spectral variations and are all belong to a special subclass of overlapping iron low ionization BALs (OFeLoBALs). The absorbers of \\ion{Fe}{2} are estimated to be formed by a relative dense (\\mbox{$n\\rm _{e} > 10^6~cm^{-3}$}) gas at a distance from the subparsec scale to the dozens of parsec-scale from the continuum source. They differ from those of invariable non-overlapping FeLoBALs (non-OFeLoBALs), which are the low-density gas and locate at the distance of hundreds to thousands parsecs. OFeLoBALs and non-OFeLoBALs, i.e., FeLoBALs with/without strong BAL variations...
DEFF Research Database (Denmark)
Mauricio Iglesias, Miguel; Valverde Perez, Borja; Sin, Gürkan
Selecting the right controlled variables in a bioprocess is challenging since the objectives of the process (yields, product or substrate concentration) are difficult to relate with a given actuator. We apply here process control tools that can be used to assist in the selection of controlled var...
Smith, Stanley Jeffery
This study investigated the relationship between organizational climate and selected organizational variables--reading achievement, teacher experience, and teacher attrition. The study sample consisted of the total teaching staffs and 642 randomly selected students from five elementary schools in a metropolitan school district. Data were collected…
Dynamic Batch Bayesian Optimization
Azimi, Javad; Fern, Xiaoli
2011-01-01
Bayesian optimization (BO) algorithms try to optimize an unknown function that is expensive to evaluate using minimum number of evaluations/experiments. Most of the proposed algorithms in BO are sequential, where only one experiment is selected at each iteration. This method can be time inefficient when each experiment takes a long time and more than one experiment can be ran concurrently. On the other hand, requesting a fix-sized batch of experiments at each iteration causes performance inefficiency in BO compared to the sequential policies. In this paper, we present an algorithm that asks a batch of experiments at each time step t where the batch size p_t is dynamically determined in each step. Our algorithm is based on the observation that the sequence of experiments selected by the sequential policy can sometimes be almost independent from each other. Our algorithm identifies such scenarios and request those experiments at the same time without degrading the performance. We evaluate our proposed method us...
Directory of Open Access Journals (Sweden)
C. Roger James
2007-03-01
Full Text Available The objectives were to determine the number of trials necessary to achieve performance stability of selected ground reaction force (GRF variables during landing and to compare two methods of determining stability. Ten subjects divided into two groups each completed a minimum of 20 drop or step-off landings from 0.60 or 0.61 m onto a force platform (1000 Hz. Five vertical GRF variables (first and second peaks, average loading rates to these peaks, and impulse were quantified during the initial 100 ms post-contact period. Test-retest reliability (stability was determined using two methods: (1 intra-class correlation coefficient (ICC analysis, and (2 sequential averaging analysis. Results of the ICC analysis indicated that an average of four trials (mean 3.8 ± 2.7 Group 1; 3.6 ± 1.7 Group 2 were necessary to achieve maximum ICC values. Maximum ICC values ranged from 0.55 to 0.99 and all were significantly (p < 0. 05 different from zero. Results of the sequential averaging analysis revealed that an average of 12 trials (mean 11.7 ± 3.1 Group 1; 11.5 ± 4.5 Group 2 were necessary to achieve performance stability using criteria previously reported in the literature. Using 10 reference trials, the sequential averaging technique required standard deviation criterion values of 0.60 and 0.49 for Groups 1 and 2, respectively, in order to approximate the ICC results. The results of the study suggest that the ICC might be a less conservative, but more objective method for determining stability, especially when compared to previous applications of the sequential averaging technique. Moreover, criteria for implementing the sequential averaging technique can be adjusted so that results closely approximate the results from ICC. In conclusion, subjects in landing experiments should perform a minimum of four and possibly as many as eight trials to achieve performance stability of selected GRF variables. Researchers should use this information to plan future
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
Predictive validity of variables used to select students for postgraduate management courses.
Lane, John; Lane, Andrew M
2002-06-01
The present study set in the United Kingdom examined the predictive validity of variables used to select graduate students into postgraduate management programs at a UK business school. 303 postgraduate students completed a cognitive ability test (MD5, Mental Ability Test), a questionnaire to assess perceptions of self-efficacy to succeed on the program, and reported their performance on their first (undergraduate) degree. Students completed these measures at the start of the programs. Each program comprised 12 modules, which all students were required to complete successfully. Students' performance was measured by the average grade obtained over the 12 modules. Multiple regression indicated that only 22% of the variance (Adjusted R2 = .22, p<.001) in students' performance was predicted significantly by cognitive ability scores. Results show that neither performance on first degree nor scores for self-efficacy showed a significant relationship to the criterion measure. Findings from the present study suggest that in the UK, the use of cognitive ability tests may play a significant role in the selection of students into postgraduate programs. Nonsignificant self-efficacy and performance relationships are ascribed to unclear knowledge of the demands of the program. We suggest that there is need for further research to examine factors related to performance.
Selection of AGN candidates in the GOODS-South Field through SPITZER/MIPS 24 microns variability
García-González, Judit; Pérez-González, Pablo G; Hernán-Caballero, Antonio; Sarajedini, Vicki L; Villar, Víctor
2014-01-01
We present a study of galaxies showing mid-infrared variability in the deepest Spitzer/MIPS 24 $\\mu$m surveys in the GOODS-South field. We divide the dataset in epochs and subepochs to study the long-term (months-years) and the short-term (days) variability. We use a $\\chi^2$-statistics method to select AGN candidates with a probability $\\leq$ 1% that the observed variability is due to statistical errors alone. We find 39 (1.7% of the parent sample) sources that show long-term variability and 55 (2.2% of the parent sample) showing short-term variability. We compare our candidates with AGN selected in the X-ray and radio bands, and AGN candidates selected by their IR emission. Approximately, 50% of the MIPS 24 $\\mu$m variable sources would be identified as AGN with these other methods. Therefore, MIPS 24 $\\mu$m variability is a new method to identify AGN candidates, possibly dust obscured and low luminosity AGN that might be missed by other methods. However, the contribution of the MIPS 24 $\\mu$m variable iden...
Directory of Open Access Journals (Sweden)
Ildikó Ungvári
Full Text Available Genetic studies indicate high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls. The results were evaluated with traditional frequentist methods and we applied a new statistical method, called bayesian network based bayesian multilevel analysis of relevance (BN-BMLA. This method uses bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the bayesian statistical framework, in order to estimate the posteriors whether a variable is directly relevant or its association is only mediated.With frequentist methods one SNP (rs3751464 in the FRMD6 gene provided evidence for an association with asthma (OR = 1.43(1.2-1.8; p = 3×10(-4. The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics.In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance.
FCERI AND HISTAMINE METABOLISM GENE VARIABILITY IN SELECTIVE RESPONDERS TO NSAIDS
Directory of Open Access Journals (Sweden)
Gemma Amo
2016-09-01
Full Text Available The high-affinity IgE receptor (Fcε RI is a heterotetramer of three subunits: Fcε RIα, Fcε RIβ and Fcε RIγ (αβγ2 encoded by three genes designated as FCER1A, FCER1B (MS4A2 and FCER1G, respectively. Recent evidence points to FCERI gene variability as a relevant factor in the risk of developing allergic diseases. Because Fcε RI plays a key role in the events downstream of the triggering factors in immunological response, we hypothesized that FCERI gene variants might be related with the risk of, or with the clinical response to, selective (IgE mediated non-steroidal anti-inflammatory (NSAID hypersensitivity.From a cohort of 314 patients suffering from selective hypersensitivity to metamizole, ibuprofen, diclofenac, paracetamol, acetylsalicylic acid (ASA, propifenazone, naproxen, ketoprofen, dexketoprofen, etofenamate, aceclofenac, etoricoxib, dexibuprofen, indomethacin, oxyphenylbutazone or piroxicam, and 585 unrelated healthy controls that tolerated these NSAIDs, we analyzed the putative effects of the FCERI SNPs FCER1A rs2494262, rs2427837 and rs2251746; FCER1B rs1441586, rs569108 and rs512555; FCER1G rs11587213, rs2070901 and rs11421. Furthermore, in order to identify additional genetic markers which might be associated with the risk of developing selective NSAID hypersensitivity, or which may modify the putative association of FCERI gene variations with risk, we analyzed polymorphisms known to affect histamine synthesis or metabolism, such as rs17740607, rs2073440, rs1801105, rs2052129, rs10156191, rs1049742 and rs1049793 in the HDC, HNMT and DAO genes.No major genetic associations with risk or with clinical presentation, and no gene-gene interactions, or gene-phenotype interactions (including age, gender, IgE concentration, antecedents of atopy, culprit drug or clinical presentation were identified in patients. However, logistic regression analyses indicated that the presence of antecedents of atopy and the DAO SNP rs2052129 (GG
FCERI and Histamine Metabolism Gene Variability in Selective Responders to NSAIDS
Amo, Gemma; Cornejo-García, José A.; García-Menaya, Jesus M.; Cordobes, Concepcion; Torres, M. J.; Esguevillas, Gara; Mayorga, Cristobalina; Martinez, Carmen; Blanca-Lopez, Natalia; Canto, Gabriela; Ramos, Alfonso; Blanca, Miguel; Agúndez, José A. G.; García-Martín, Elena
2016-01-01
The high-affinity IgE receptor (Fcε RI) is a heterotetramer of three subunits: Fcε RIα, Fcε RIβ, and Fcε RIγ (αβγ2) encoded by three genes designated as FCER1A, FCER1B (MS4A2), and FCER1G, respectively. Recent evidence points to FCERI gene variability as a relevant factor in the risk of developing allergic diseases. Because Fcε RI plays a key role in the events downstream of the triggering factors in immunological response, we hypothesized that FCERI gene variants might be related with the risk of, or with the clinical response to, selective (IgE mediated) non-steroidal anti-inflammatory (NSAID) hypersensitivity. From a cohort of 314 patients suffering from selective hypersensitivity to metamizole, ibuprofen, diclofenac, paracetamol, acetylsalicylic acid (ASA), propifenazone, naproxen, ketoprofen, dexketoprofen, etofenamate, aceclofenac, etoricoxib, dexibuprofen, indomethacin, oxyphenylbutazone, or piroxicam, and 585 unrelated healthy controls that tolerated these NSAIDs, we analyzed the putative effects of the FCERI SNPs FCER1A rs2494262, rs2427837, and rs2251746; FCER1B rs1441586, rs569108, and rs512555; FCER1G rs11587213, rs2070901, and rs11421. Furthermore, in order to identify additional genetic markers which might be associated with the risk of developing selective NSAID hypersensitivity, or which may modify the putative association of FCERI gene variations with risk, we analyzed polymorphisms known to affect histamine synthesis or metabolism, such as rs17740607, rs2073440, rs1801105, rs2052129, rs10156191, rs1049742, and rs1049793 in the HDC, HNMT, and DAO genes. No major genetic associations with risk or with clinical presentation, and no gene-gene interactions, or gene-phenotype interactions (including age, gender, IgE concentration, antecedents of atopy, culprit drug, or clinical presentation) were identified in patients. However, logistic regression analyses indicated that the presence of antecedents of atopy and the DAO SNP rs2052129 (GG
Effects of Parceling on Model Selection: Parcel-Allocation Variability in Model Ranking.
Sterba, Sonya K; Rights, Jason D
2016-01-25
Research interest often lies in comparing structural model specifications implying different relationships among latent factors. In this context parceling is commonly accepted, assuming the item-level measurement structure is well known and, conservatively, assuming items are unidimensional in the population. Under these assumptions, researchers compare competing structural models, each specified using the same parcel-level measurement model. However, little is known about consequences of parceling for model selection in this context-including whether and when model ranking could vary across alternative item-to-parcel allocations within-sample. This article first provides a theoretical framework that predicts the occurrence of parcel-allocation variability (PAV) in model selection index values and its consequences for PAV in ranking of competing structural models. These predictions are then investigated via simulation. We show that conditions known to manifest PAV in absolute fit of a single model may or may not manifest PAV in model ranking. Thus, one cannot assume that low PAV in absolute fit implies a lack of PAV in ranking, and vice versa. PAV in ranking is shown to occur under a variety of conditions, including large samples. To provide an empirically supported strategy for selecting a model when PAV in ranking exists, we draw on relationships between structural model rankings in parcel- versus item-solutions. This strategy employs the across-allocation modal ranking. We developed software tools for implementing this strategy in practice, and illustrate them with an example. Even if a researcher has substantive reason to prefer one particular allocation, investigating PAV in ranking within-sample still provides an informative sensitivity analysis.
Smail, Linda
2016-06-01
The basic task of any probabilistic inference system in Bayesian networks is computing the posterior probability distribution for a subset or subsets of random variables, given values or evidence for some other variables from the same Bayesian network. Many methods and algorithms have been developed to exact and approximate inference in Bayesian networks. This work compares two exact inference methods in Bayesian networks-Lauritzen-Spiegelhalter and the successive restrictions algorithm-from the perspective of computational efficiency. The two methods were applied for comparison to a Chest Clinic Bayesian Network. Results indicate that the successive restrictions algorithm shows more computational efficiency than the Lauritzen-Spiegelhalter algorithm.
PAC-Bayesian Policy Evaluation for Reinforcement Learning
Fard, Mahdi MIlani; Szepesvari, Csaba
2012-01-01
Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model-selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignore them when they are misleading.
Konstruksi Bayesian Network Dengan Algoritma Bayesian Association Rule Mining Network
Octavian
2015-01-01
Beberapa tahun terakhir, Bayesian Network telah menjadi konsep yang populer digunakan dalam berbagai bidang kehidupan seperti dalam pengambilan sebuah keputusan dan menentukan peluang suatu kejadian dapat terjadi. Sayangnya, pengkonstruksian struktur dari Bayesian Network itu sendiri bukanlah hal yang sederhana. Oleh sebab itu, penelitian ini mencoba memperkenalkan algoritma Bayesian Association Rule Mining Network untuk memudahkan kita dalam mengkonstruksi Bayesian Network berdasarkan data ...
Bayesian Networks: Aspects of Approximate Inference
Bolt, J.H.
2008-01-01
A Bayesian network can be used to model consisely the probabilistic knowledge with respect to a given problem domain. Such a network consists of an acyclic directed graph in which the nodes represent stochastic variables, supplemented with probabilities indicating the strength of the influences betw
Bayesian analysis of factors associated with fibromyalgia syndrome subjects
Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie
2015-01-01
Factors contributing to movement-related fear were assessed by Russek, et al. 2014 for subjects with Fibromyalgia (FM) based on the collected data by a national internet survey of community-based individuals. The study focused on the variables, Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), Pain, work status and physical activity dependent from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits same data with a Bayesian analysis where appropriate priors were introduced for variables selected in the Russek's paper.
Balasundaram, B; Harrison, S T L
2006-06-01
Intracellular products, not secreted from the microbial cell, are released by breaking the cell envelope consisting of cytoplasmic membrane and an outer cell wall. Hydrodynamic cavitation has been reported to cause microbial cell disruption. By manipulating the operating variables involved, a wide range of intensity of cavitation can be achieved resulting in a varying extent of disruption. The effect of the process variables including cavitation number, initial cell concentration of the suspension and the number of passes across the cavitation zone on the release of enzymes from various locations of the Brewers' yeast was studied. The release profile of the enzymes studied include alpha-glucosidase (periplasmic), invertase (cell wall bound), alcohol dehydrogenase (ADH; cytoplasmic) and glucose-6-phosphate dehydrogenase (G6PDH; cytoplasmic). An optimum cavitation number Cv of 0.13 for maximum disruption was observed across the range Cv 0.09-0.99. The optimum cell concentration was found to be 0.5% (w/v, wet wt) when varying over the range 0.1%-5%. The sustained effect of cavitation on the yeast cell wall when re-circulating the suspension across the cavitation zone was found to release the cell wall bound enzyme invertase (86%) to a greater extent than the enzymes from other locations of the cell (e.g. periplasmic alpha-glucosidase at 17%). Localised damage to the cell wall could be observed using transmission electron microscopy (TEM) of cells subjected to less intense cavitation conditions. Absence of the release of cytoplasmic enzymes to a significant extent, absence of micronisation as observed by TEM and presence of a lower number of proteins bands in the culture supernatant on SDS-PAGE analysis following hydrodynamic cavitation compared to disruption by high-pressure homogenisation confirmed the selective release offered by hydrodynamic cavitation.
Chen, Xiaoxiao; Mukkamala, Ramakrishna
2008-01-01
Heart rate (HR) power spectral indexes are limited as measures of the cardiac autonomic nervous systems (CANS) in that they neither offer an effective marker of the beta-sympathetic nervous system (SNS) due to its overlap with the parasympathetic nervous system (PNS) in the low-frequency (LF) band nor afford specific measures of the CANS due to input contributions to HR [e.g., arterial blood pressure (ABP) and instantaneous lung volume (ILV)]. We derived new PNS and SNS indexes by multisignal analysis of cardiorespiratory variability. The basic idea was to identify the autonomically mediated transfer functions relating fluctuations in ILV to HR (ILV-->HR) and fluctuations in ABP to HR (ABP-->HR) so as to eliminate the input contributions to HR and then separate each estimated transfer function in the time domain into PNS and SNS indexes using physiological knowledge. We evaluated these indexes with respect to selective pharmacological autonomic nervous blockade in 14 humans. Our results showed that the PNS index derived from the ABP-->HR transfer function was correctly decreased after vagal and double (vagal + beta-sympathetic) blockade (P < 0.01) and did not change after beta-sympathetic blockade, whereas the SNS index derived from the same transfer function was correctly reduced after beta-sympathetic blockade in the standing posture and double blockade (P < 0.05) and remained the same after vagal blockade. However, this SNS index did not significantly decrease after beta-sympathetic blockade in the supine posture. Overall, these predictions were better than those provided by the traditional high-frequency (HF) power, LF-to-HF ratio, and normalized LF power of HR variability.
Model Diagnostics for Bayesian Networks
Sinharay, Sandip
2006-01-01
Bayesian networks are frequently used in educational assessments primarily for learning about students' knowledge and skills. There is a lack of works on assessing fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess fit of simple Bayesian networks. A…
DEFF Research Database (Denmark)
Omidikia, Nematollah; Kompany-Zareh, Mohsen
2013-01-01
as collinearity reliability of the regression coefficient's magnitude is suspicious. Successive Projection Algorithm (SPA) and Gram-Schmidt Orthogonalization (GSO) were implemented as pre-selection technique for removing collinearity and redundancy among variables in the model. Uninformative variable elimination......-partial least squares (UVE-PLS) was performed on the pre-selected data set and C-value's were calculated for each descriptor. In this case the C-value's of LIVE assisted by SPA or GSO could be used in order to rank the variables according to their importance. Leave-many-out cross-validation (LMO-CV) was applied...... applying SPA-UVE-PLS on the anti-HIV data, nine descriptors were selected out of 160 with q(2) = 0.81, R-2 = 0.84 and Q(F3)(2) = 0.8. (C) 2013 Elsevier B.V. All rights reserved....
Variable selection in monotone single-index models via the adaptive LASSO.
Foster, Jared C; Taylor, Jeremy M G; Nan, Bin
2013-09-30
We consider the problem of variable selection for monotone single-index models. A single-index model assumes that the expectation of the outcome is an unknown function of a linear combination of covariates. Assuming monotonicity of the unknown function is often reasonable and allows for more straightforward inference. We present an adaptive LASSO penalized least squares approach to estimating the index parameter and the unknown function in these models for continuous outcome. Monotone function estimates are achieved using the pooled adjacent violators algorithm, followed by kernel regression. In the iterative estimation process, a linear approximation to the unknown function is used, therefore reducing the situation to that of linear regression and allowing for the use of standard LASSO algorithms, such as coordinate descent. Results of a simulation study indicate that the proposed methods perform well under a variety of circumstances and that an assumption of monotonicity, when appropriate, noticeably improves performance. The proposed methods are applied to data from a randomized clinical trial for the treatment of a critical illness in the intensive care unit.
Variable Selection for Functional Logistic Regression in fMRI Data Analysis
Directory of Open Access Journals (Sweden)
Nedret BILLOR
2015-03-01
Full Text Available This study was motivated by classification problem in Functional Magnetic Resonance Imaging (fMRI, a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects and we assume that there is a smooth, underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the best things about fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization-group lasso for the functional logistic regression model where the response is binary and represent two separate classes; the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
Variation in Age and Training on Selected Biochemical Variables of Indian Hockey Players
Directory of Open Access Journals (Sweden)
I. Manna
2010-04-01
Full Text Available The present study was aimed to find out the variation of age and training on biochemical variables of Indian elite hockey players. A total of 120 hockey players who volunteered for the present study, were equally divided (n=30 into 4 groups: under 16 years (14-15 yrs; under 19 years (16-18 yrs; under 23 years (19-22 yrs; and senior (23-30 yrs. The training sessions were divided into 3 phases: Transition Phase (TP, Preparatory Phase (PP, and Competitive Phase (CP. The training programme consisted of aerobic, anaerobic and skill training; and completed 4 hours in morning and evening sessions, 5 days/week. Selected biochemical parameters were measured and data were analyzed by applying Two-way ANOVA and Post hoc test. The mean values of haemoglobin (Hb, total cholesterol (TC, triglyceride (TG, high density lipoprotein cholesterol (HDL-C and low density lipoprotein cholesterol (LDL-C have been increased significantly (P<0.05 with the advancement of age of players. A significant increase (P<0.05 in serum urea, uric acid and HDL-C and a significant decrease (P<0.05 in Hb, TC, TG and LDL-C have been noted in PP and CP when compared to that of TP. The present study would provide useful information for biochemical monitoring of training of hockey players.
Prioritizing individual genetic variants after kernel machine testing using variable selection.
He, Qianchuan; Cai, Tianxi; Liu, Yang; Zhao, Ni; Harmon, Quaker E; Almli, Lynn M; Binder, Elisabeth B; Engel, Stephanie M; Ressler, Kerry J; Conneely, Karen N; Lin, Xihong; Wu, Michael C
2016-12-01
Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusion only at the SNP-set level, and does not directly inform on which one(s) of the identified SNP set is actually driving the associations. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, and adapt the KNIFE procedure to genetic association studies and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and real data application are used to demonstrate the proposed approach.
Resiliency and subjective health assessment. Moderating role of selected psychosocial variables
Directory of Open Access Journals (Sweden)
Michalina Sołtys
2015-12-01
Full Text Available Background Resiliency is defined as a relatively permanent personality trait, which may be assigned to the category of health resources. The aim of this study was to determine conditions in which resiliency poses a significant health resource (moderation, thereby broadening knowledge of the specifics of the relationship between resiliency and subjective health assessment. Participants and procedure The study included 142 individuals. In order to examine the level of resiliency, the Assessment Resiliency Scale (SPP-25 by N. Ogińska-Bulik and Z. Juczyński was used. Participants evaluated subjective health state by means of an analogue-visual scale. Additionally, in the research the following moderating variables were controlled: sex, objective health status, having a partner, professional activity and age. These data were obtained by personal survey. Results The results confirmed the relationship between resiliency and subjective health assessment. Multiple regression analysis revealed that sex, having a partner and professional activity are significant moderators of associations between level of resiliency and subjective health evaluation. However, statistically significant interaction effects for health status and age as a moderator were not observed. Conclusions Resiliency is associated with subjective health assessment among adults, and selected socio-demographic features (such as sex, having a partner, professional activity moderate this relationship. This confirms the significant role of resiliency as a health resource and a reason to emphasize the benefits of enhancing the potential of individuals for their psychophysical wellbeing. However, the research requires replication in a more homogeneous sample.
Institute of Scientific and Technical Information of China (English)
朱波; 王延晖; 牛红; 陈燕; 张路培; 高会江; 高雪; 李俊雅; 孙少华
2014-01-01
Variety selection in livestock breeding occupies an important position. Genomic selection, as a novel technology in livestock breeding, has raised considerable concern. It can shorten the generation interval, speed up the genetic progress, and it can select the candidate individuals as breeding stock without phenotypic data. In 2001, Meuviwisen proposed the concept of genomic selection, which was first applied in dairy cattle. Until August 2014, there were 34 member countries of Interbull organization that had applicated genomic selection in their national dairy cattle breeding group. With the popularization and continuous promotion of genomic selection, some problems of the accuracy of genomic estimated breeding value need to be solved. Various methods of genomic selection have been proposed and more efficient models are being developed. So it has great practical significance to exploit better models and algorithm to improve the accuracy of genomic estimated breeding value. So far, there were 17 Bayesian methods that have been successively proposed. This thesis briefly introduced the classical BayesA and BayesB methods for genomic selection. BayesA assumed that all loca have effect, while BayesB supposed that a small part of locus have effect, and the percentage was extremely small. Therefore, BayesA and BayesB had different models and algorithms. After Meuviwisen proposed classic Bayesian methods, other methods were like mushrooms springing up. New Bayesian methods were based on the classical Bayesian methods, which was optimized by improving the hypothetical model and algorithm. For example, BayesC method, which was based on BayesB, optimized theπvalue in the model. BayesCπ and BayesDπ were the improvement of BayesC, and these two approaches assumed that marker effect variance of each locus had the same value, whereas BayesC assumed that its marker effect variance of each locus was different. BayesDπ, which was based on BayesCπ, optimized the scale
An experiment on selecting most informative variables in socio-economic data
Directory of Open Access Journals (Sweden)
L. Jenkins
2014-01-01
Full Text Available In many studies where data are collected on several variables, there is a motivation to find if fewer variables would provide almost as much information. Variance of a variable about its mean is the common statistical measure of information content, and that is used here. We are interested whether the variability in one variable is sufficiently correlated with that in one or more of the other variables that the first variable is redundant. We wish to find one or more ‘principal variables’ that sufficiently reflect the information content in all the original variables. The paper explains the method of principal variables and reports experiments using the technique to see if just a few variables are sufficient to reflect the information in 11 socioeconomic variables on 130 countries from a World Bank (WB database. While the method of principal variables is highly successful in a statistical sense, the WB data varies greatly from year to year, demonstrating that fewer variables wo uld be inadequate for this data.
Comparative performance of selected variability detection techniques in photometric time series data
Sokolovsky, K V; Karampelas, A; Antipin, S V; Bellas-Velidis, I; Benni, P; Bonanos, A Z; Burdanov, A Y; Derlopa, S; Hatzidimitriou, D; Khokhryakova, A D; Kolesnikova, D M; Korotkiy, S A; Lapukhin, E G; Moretti, M I; Popov, A A; Pouliasis, E; Samus, N N; Spetsieri, Z; Veselkov, S A; Volkov, K V; Yang, M; Zubareva, A M
2016-01-01
Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampli...
Selection of AGN candidates in the GOODS-South Field through SPITZER/MIPS 24 $\\mu$m variability
García-González, Judit; Pérez-González, Pablo G; Hernán-Caballero, Antonio; Sarajedini, Vicki L; Villar, Víctor
2014-01-01
We present a study of galaxies showing mid-infrared variability in data taken in the deepest Spitzer/MIPS 24 $\\mu$m surveys in the GOODS-South field. We divide the dataset in epochs and subepochs to study the long-term (months-years) and the short-term (days) variability. We use a $\\chi^2$-statistics method to select AGN candidates with a probability $\\leq$ 1% that the observed variability is due to statistical errors alone. We find 39 (1.7% of the parent sample) sources that show long-term variability and 55 (2.2% of the parent sample) showing short-term variability. That is, 0.03 sources $\\times$ arcmin$^{-2}$ for both, long-term and short-term variable sources. After removing the expected number of false positives inherent to the method, the estimated percentages are 1.0% and 1.4% of the parent sample for the long-term and short-term respectively. We compare our candidates with AGN selected in the X-ray and radio bands, and AGN candidates selected by their IR emission. Approximately, 50% of the MIPS 24 $\\m...
Bayesian networks in neuroscience: a survey.
Bielza, Concha; Larrañaga, Pedro
2014-01-01
Bayesian networks are a type of probabilistic graphical models lie at the intersection between statistics and machine learning. They have been shown to be powerful tools to encode dependence relationships among the variables of a domain under uncertainty. Thanks to their generality, Bayesian networks can accommodate continuous and discrete variables, as well as temporal processes. In this paper we review Bayesian networks and how they can be learned automatically from data by means of structure learning algorithms. Also, we examine how a user can take advantage of these networks for reasoning by exact or approximate inference algorithms that propagate the given evidence through the graphical structure. Despite their applicability in many fields, they have been little used in neuroscience, where they have focused on specific problems, like functional connectivity analysis from neuroimaging data. Here we survey key research in neuroscience where Bayesian networks have been used with different aims: discover associations between variables, perform probabilistic reasoning over the model, and classify new observations with and without supervision. The networks are learned from data of any kind-morphological, electrophysiological, -omics and neuroimaging-, thereby broadening the scope-molecular, cellular, structural, functional, cognitive and medical- of the brain aspects to be studied.
Bayesian Inference Methods for Sparse Channel Estimation
DEFF Research Database (Denmark)
Pedersen, Niels Lovmand
2013-01-01
This thesis deals with sparse Bayesian learning (SBL) with application to radio channel estimation. As opposed to the classical approach for sparse signal representation, we focus on the problem of inferring complex signals. Our investigations within SBL constitute the basis for the development...... of Bayesian inference algorithms for sparse channel estimation. Sparse inference methods aim at finding the sparse representation of a signal given in some overcomplete dictionary of basis vectors. Within this context, one of our main contributions to the field of SBL is a hierarchical representation...... analysis of the complex prior representation, where we show that the ability to induce sparse estimates of a given prior heavily depends on the inference method used and, interestingly, whether real or complex variables are inferred. We also show that the Bayesian estimators derived from the proposed...
Bayesian Image Reconstruction Based on Voronoi Diagrams
Cabrera, G F; Hitschfeld, N
2007-01-01
We present a Bayesian Voronoi image reconstruction technique (VIR) for interferometric data. Bayesian analysis applied to the inverse problem allows us to derive the a-posteriori probability of a novel parameterization of interferometric images. We use a variable Voronoi diagram as our model in place of the usual fixed pixel grid. A quantization of the intensity field allows us to calculate the likelihood function and a-priori probabilities. The Voronoi image is optimized including the number of polygons as free parameters. We apply our algorithm to deconvolve simulated interferometric data. Residuals, restored images and chi^2 values are used to compare our reconstructions with fixed grid models. VIR has the advantage of modeling the image with few parameters, obtaining a better image from a Bayesian point of view.
Bayesian Lensing Shear Measurement
Bernstein, Gary M
2013-01-01
We derive an estimator of weak gravitational lensing shear from background galaxy images that avoids noise-induced biases through a rigorous Bayesian treatment of the measurement. The Bayesian formalism requires a prior describing the (noiseless) distribution of the target galaxy population over some parameter space; this prior can be constructed from low-noise images of a subsample of the target population, attainable from long integrations of a fraction of the survey field. We find two ways to combine this exact treatment of noise with rigorous treatment of the effects of the instrumental point-spread function and sampling. The Bayesian model fitting (BMF) method assigns a likelihood of the pixel data to galaxy models (e.g. Sersic ellipses), and requires the unlensed distribution of galaxies over the model parameters as a prior. The Bayesian Fourier domain (BFD) method compresses galaxies to a small set of weighted moments calculated after PSF correction in Fourier space. It requires the unlensed distributi...
Fox, G.J.A.; Berg, van den S.M.; Veldkamp, B.P.; Irwing, P.; Booth, T.; Hughes, D.
2015-01-01
In educational and psychological studies, psychometric methods are involved in the measurement of constructs, and in constructing and validating measurement instruments. Assessment results are typically used to measure student proficiency levels and test characteristics. Recently, Bayesian item resp
Granade, Christopher; Cory, D G
2015-01-01
In recent years, Bayesian methods have been proposed as a solution to a wide range of issues in quantum state and process tomography. State-of- the-art Bayesian tomography solutions suffer from three problems: numerical intractability, a lack of informative prior distributions, and an inability to track time-dependent processes. Here, we solve all three problems. First, we use modern statistical methods, as pioneered by Husz\\'ar and Houlsby and by Ferrie, to make Bayesian tomography numerically tractable. Our approach allows for practical computation of Bayesian point and region estimators for quantum states and channels. Second, we propose the first informative priors on quantum states and channels. Finally, we develop a method that allows online tracking of time-dependent states and estimates the drift and diffusion processes affecting a state. We provide source code and animated visual examples for our methods.
Comparative performance of selected variability detection techniques in photometric time series data
Sokolovsky, K. V.; Gavras, P.; Karampelas, A.; Antipin, S. V.; Bellas-Velidis, I.; Benni, P.; Bonanos, A. Z.; Burdanov, A. Y.; Derlopa, S.; Hatzidimitriou, D.; Khokhryakova, A. D.; Kolesnikova, D. M.; Korotkiy, S. A.; Lapukhin, E. G.; Moretti, M. I.; Popov, A. A.; Pouliasis, E.; Samus, N. N.; Spetsieri, Z.; Veselkov, S. A.; Volkov, K. V.; Yang, M.; Zubareva, A. M.
2016-09-01
Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1 m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampling patterns, photometric accuracies, and percentages of outlier measurements. The first index is the interquartile range (IQR) of magnitude measurements, sensitive to variability irrespective of a timescale and resistant to outliers. It can be complemented by the ratio of the lightcurve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on timescales longer than the typical time interval between observations. Variable objects have larger 1/η and/or IQR values than non-variable objects of similar brightness. Another approach to variability detection is to combine many variability indices using principal component analysis. We present 124 previously unknown variable stars found in the test data.
Comparative performance of selected variability detection techniques in photometric time series data
Sokolovsky, K. V.; Gavras, P.; Karampelas, A.; Antipin, S. V.; Bellas-Velidis, I.; Benni, P.; Bonanos, A. Z.; Burdanov, A. Y.; Derlopa, S.; Hatzidimitriou, D.; Khokhryakova, A. D.; Kolesnikova, D. M.; Korotkiy, S. A.; Lapukhin, E. G.; Moretti, M. I.; Popov, A. A.; Pouliasis, E.; Samus, N. N.; Spetsieri, Z.; Veselkov, S. A.; Volkov, K. V.; Yang, M.; Zubareva, A. M.
2017-01-01
Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time series data sets obtained with telescopes ranging in size from a telephoto lens to 1 m-class and probing variability on time-scales from minutes to decades. The test data sets together include light curves of 127 539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampling patterns, photometric accuracies and percentages of outlier measurements. The first index is the interquartile range (IQR) of magnitude measurements, sensitive to variability irrespective of a time-scale and resistant to outliers. It can be complemented by the ratio of the light-curve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on time-scales longer than the typical time interval between observations. Variable objects have larger 1/η and/or IQR values than non-variable objects of similar brightness. Another approach to variability detection is to combine many variability indices using principal component analysis. We present 124 previously unknown variable stars found in the test data.
Variability Selected Low-Luminosity Active Galactic Nuclei in the 4 Ms Chandra Deep Field-South
Young, M.; Brandt, W. N.; Xue, Y. Q.; Paolillo, D. M.; Alexander, F. E.; Bauer, F. E.; Lehmer, B. D.; Luo, B.; Shemmer, O.; Schneider, D. P.; Vignail, C.
2012-01-01
The 4 Ms Chandra Deep Field-South (CDF-S) and other deep X-ray surveys have been highly effective at selecting active galactic nuclei (AGN). However, cosmologically distant low-luminosity AGN (LLAGN) have remained a challenge to identify due to significant contribution from the host galaxy. We identify long-term X ray variability (approx. month years, observed frame) in 20 of 92 CDF-S galaxies spanning redshifts approx equals 00.8 - 1.02 that do not meet other AGN selection criteria. We show that the observed variability cannot be explained by X-ray binary populations or ultraluminous X-ray sources, so the variability is most likely caused by accretion onto a supermassive black hole. The variable galaxies are not heavily obscured in general, with a stacked effective power-law photon index of Gamma(sub Stack) approx equals 1.93 +/- 0.13, and arc therefore likely LLAGN. The LLAGN tend to lie it factor of approx equal 6-89 below the extrapolated linear variability-luminosity relation measured for luminous AGN. This may he explained by their lower accretion rates. Variability-independent black-hole mass and accretion-rate estimates for variable galaxies show that they sample a significantly different black hole mass-accretion-rate space, with masses a factor of 2.4 lower and accretion rates a factor of 22.5 lower than variable luminous AGNs at the same redshift. We find that an empirical model based on a universal broken power-law power spectral density function, where the break frequency depends on SMBH mass and accretion rate, roughly reproduces the shape, but not the normalization, of the variability-luminosity trends measured for variable galaxies and more luminous AGNs.
Dimensionality reduction in Bayesian estimation algorithms
Directory of Open Access Journals (Sweden)
G. W. Petty
2013-03-01
Full Text Available An idealized synthetic database loosely resembling 3-channel passive microwave observations of precipitation against a variable background is employed to examine the performance of a conventional Bayesian retrieval algorithm. For this dataset, algorithm performance is found to be poor owing to an irreconcilable conflict between the need to find matches in the dependent database versus the need to exclude inappropriate matches. It is argued that the likelihood of such conflicts increases sharply with the dimensionality of the observation space of real satellite sensors, which may utilize 9 to 13 channels to retrieve precipitation, for example. An objective method is described for distilling the relevant information content from N real channels into a much smaller number (M of pseudochannels while also regularizing the background (geophysical plus instrument noise component. The pseudochannels are linear combinations of the original N channels obtained via a two-stage principal component analysis of the dependent dataset. Bayesian retrievals based on a single pseudochannel applied to the independent dataset yield striking improvements in overall performance. The differences between the conventional Bayesian retrieval and reduced-dimensional Bayesian retrieval suggest that a major potential problem with conventional multichannel retrievals – whether Bayesian or not – lies in the common but often inappropriate assumption of diagonal error covariance. The dimensional reduction technique described herein avoids this problem by, in effect, recasting the retrieval problem in a coordinate system in which the desired covariance is lower-dimensional, diagonal, and unit magnitude.
Bayesian modeling of flexible cognitive control.
Jiang, Jiefeng; Heller, Katherine; Egner, Tobias
2014-10-01
"Cognitive control" describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation.
Multi-Fraction Bayesian Sediment Transport Model
Directory of Open Access Journals (Sweden)
Mark L. Schmelter
2015-09-01
Full Text Available A Bayesian approach to sediment transport modeling can provide a strong basis for evaluating and propagating model uncertainty, which can be useful in transport applications. Previous work in developing and applying Bayesian sediment transport models used a single grain size fraction or characterized the transport of mixed-size sediment with a single characteristic grain size. Although this approach is common in sediment transport modeling, it precludes the possibility of capturing processes that cause mixed-size sediments to sort and, thereby, alter the grain size available for transport and the transport rates themselves. This paper extends development of a Bayesian transport model from one to k fractional dimensions. The model uses an existing transport function as its deterministic core and is applied to the dataset used to originally develop the function. The Bayesian multi-fraction model is able to infer the posterior distributions for essential model parameters and replicates predictive distributions of both bulk and fractional transport. Further, the inferred posterior distributions are used to evaluate parametric and other sources of variability in relations representing mixed-size interactions in the original model. Successful OPEN ACCESS J. Mar. Sci. Eng. 2015, 3 1067 development of the model demonstrates that Bayesian methods can be used to provide a robust and rigorous basis for quantifying uncertainty in mixed-size sediment transport. Such a method has heretofore been unavailable and allows for the propagation of uncertainty in sediment transport applications.
Bayesian modeling of flexible cognitive control
Jiang, Jiefeng; Heller, Katherine; Egner, Tobias
2014-01-01
“Cognitive control” describes endogenous guidance of behavior in situations where routine stimulus-response associations are suboptimal for achieving a desired goal. The computational and neural mechanisms underlying this capacity remain poorly understood. We examine recent advances stemming from the application of a Bayesian learner perspective that provides optimal prediction for control processes. In reviewing the application of Bayesian models to cognitive control, we note that an important limitation in current models is a lack of a plausible mechanism for the flexible adjustment of control over conflict levels changing at varying temporal scales. We then show that flexible cognitive control can be achieved by a Bayesian model with a volatility-driven learning mechanism that modulates dynamically the relative dependence on recent and remote experiences in its prediction of future control demand. We conclude that the emergent Bayesian perspective on computational mechanisms of cognitive control holds considerable promise, especially if future studies can identify neural substrates of the variables encoded by these models, and determine the nature (Bayesian or otherwise) of their neural implementation. PMID:24929218
Bayesian Inference in Queueing Networks
Sutton, Charles
2010-01-01
Modern Web services, such as those at Google, Yahoo!, and Amazon, handle billions of requests per day on clusters of thousands of computers. Because these services operate under strict performance requirements, a statistical understanding of their performance is of great practical interest. Such services are modeled by networks of queues, where one queue models each of the individual computers in the system. A key challenge is that the data is incomplete, because recording detailed information about every request to a heavily used system can require unacceptable overhead. In this paper we develop a Bayesian perspective on queueing models in which the arrival and departure times that are not observed are treated as latent variables. Underlying this viewpoint is the observation that a queueing model defines a deterministic transformation between the data and a set of independent variables called the service times. With this viewpoint in hand, we sample from the posterior distribution over missing data and model...
Directory of Open Access Journals (Sweden)
C. Reche
2011-03-01
Full Text Available In many large cities of Europe standard air quality limit values of particulate matter (PM are exceeded. Emissions from road traffic and biomass burning are frequently reported to be the major causes. As a consequence of these exceedances a large number of air quality plans, most of them focusing on traffic emissions reductions, have been implemented in the last decade. In spite of this implementation, a number of cities did not record a decrease of PM levels. Thus, is the efficiency of air quality plans overestimated? Or do we need a more specific metric to evaluate the impact of the above emissions on the levels of urban aerosols?
This study shows the results of the interpretation of the 2009 variability of levels of PM, black carbon (BC, aerosol number concentration (N and a number of gaseous pollutants in seven selected urban areas covering road traffic, urban background, urban-industrial, and urban-shipping environments from southern, central and northern Europe.
The results showed that variations of PM and N levels do not always reflect the variation of the impact of road traffic emissions on urban aerosols. However, BC levels vary proportionally with those of traffic related gaseous pollutants, such as CO, NO_{2} and NO. Due to this high correlation, one may suppose that monitoring the levels of these gaseous pollutants would be enough to extrapolate exposure to traffic-derived BC levels. However, the BC/CO, BC/NO_{2} and BC/NO ratios vary widely among the cities studied, as a function of distance to traffic emissions, vehicle fleet composition and the influence of other emission sources such as biomass burning. Thus, levels of BC should be measured at air quality monitoring sites.
During traffic rush hours, a narrow variation in the N/BC ratio was evidenced, but a wide variation of this ratio was determined for the noon period. Although in central and northern Europe N and BC levels tend to vary
Network-based group variable selection for detecting expression quantitative trait loci (eQTL
Directory of Open Access Journals (Sweden)
Zhang Xuegong
2011-06-01
Full Text Available Abstract Background Analysis of expression quantitative trait loci (eQTL aims to identify the genetic loci associated with the expression level of genes. Penalized regression with a proper penalty is suitable for the high-dimensional biological data. Its performance should be enhanced when we incorporate biological knowledge of gene expression network and linkage disequilibrium (LD structure between loci in high-noise background. Results We propose a network-based group variable selection (NGVS method for QTL detection. Our method simultaneously maps highly correlated expression traits sharing the same biological function to marker sets formed by LD. By grouping markers, complex joint activity of multiple SNPs can be considered and the dimensionality of eQTL problem is reduced dramatically. In order to demonstrate the power and flexibility of our method, we used it to analyze two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets and treated the additive and dominant effect of each locus as a group: as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions of multiple SNPs. Conclusions The proposed NGVS method is appropriate for problems with high-dimensional data and high-noise background. On eQTL problem it outperforms the classical Lasso method, which does not consider biological knowledge. Introduction of proper gene expression and loci correlation information makes detecting causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.
Ershadi, Ali
2013-05-01
The influence of uncertainty in land surface temperature, air temperature, and wind speed on the estimation of sensible heat flux is analyzed using a Bayesian inference technique applied to the Surface Energy Balance System (SEBS) model. The Bayesian approach allows for an explicit quantification of the uncertainties in input variables: a source of error generally ignored in surface heat flux estimation. An application using field measurements from the Soil Moisture Experiment 2002 is presented. The spatial variability of selected input meteorological variables in a multitower site is used to formulate the prior estimates for the sampling uncertainties, and the likelihood function is formulated assuming Gaussian errors in the SEBS model. Land surface temperature, air temperature, and wind speed were estimated by sampling their posterior distribution using a Markov chain Monte Carlo algorithm. Results verify that Bayesian-inferred air temperature and wind speed were generally consistent with those observed at the towers, suggesting that local observations of these variables were spatially representative. Uncertainties in the land surface temperature appear to have the strongest effect on the estimated sensible heat flux, with Bayesian-inferred values differing by up to ±5°C from the observed data. These differences suggest that the footprint of the in situ measured land surface temperature is not representative of the larger-scale variability. As such, these measurements should be used with caution in the calculation of surface heat fluxes and highlight the importance of capturing the spatial variability in the land surface temperature: particularly, for remote sensing retrieval algorithms that use this variable for flux estimation.
Kim, Dae-Won; Protopapas, Pavlos; Byun, Yong-Ik; Alcock, Charles; Khardon, Roni; Trichas, Markos
2011-07-01
We present a new quasi-stellar object (QSO) selection algorithm using a Support Vector Machine, a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events using 58 known QSOs, 1629 variable stars, and 4288 non-variables in the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false-positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) data set, which consists of 40 million light curves, and found 1620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false-positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
BAYESIAN DEMONSTRATION TEST METHOD WITH MIXED BETA DISTRIBUTION
Institute of Scientific and Technical Information of China (English)
MING Zhimao; TAO Junyong; CHEN Xun; ZHANG Yunan
2008-01-01
A complex mechatronics system Bayesian plan of demonstration test is studied based on the mixed beta distribution. During product design and improvement various information is appropriately considered by introducing inheritance factor, moreover, the inheritance factor is thought as a random variable, and the Bayesian decision of the qualification test plan is obtained, and the correctness of a Bayesian model presented is verified. The results show that the quantity of the test is too conservative according to classical methods under small binomial samples. Although traditional Bayesian analysis can consider test information of related or similar products, it ignores differences between such products. The method has solved the above problem, furthermore, considering the requirement in many practical projects, the differences among this method, the classical method and Bayesian with beta distribution are compared according to the plan of reliability acceptance test.
A Bayesian Concept Learning Approach to Crowdsourcing
DEFF Research Database (Denmark)
Viappiani, Paolo Renato; Zilles, Sandra; Hamilton, Howard J.;
2011-01-01
techniques, inference methods, and query selection strategies to assist a user charged with choosing a configuration that satisfies some (partially known) concept. Our model is able to simultaneously learn the concept definition and the types of the experts. We evaluate our model with simulations, showing......We develop a Bayesian approach to concept learning for crowdsourcing applications. A probabilistic belief over possible concept definitions is maintained and updated according to (noisy) observations from experts, whose behaviors are modeled using discrete types. We propose recommendation...... that our Bayesian strategies are effective even in large concept spaces with many uninformative experts....
Bayesian Methods and Universal Darwinism
Campbell, John
2009-12-01
Bayesian methods since the time of Laplace have been understood by their practitioners as closely aligned to the scientific method. Indeed a recent Champion of Bayesian methods, E. T. Jaynes, titled his textbook on the subject Probability Theory: the Logic of Science. Many philosophers of science including Karl Popper and Donald Campbell have interpreted the evolution of Science as a Darwinian process consisting of a `copy with selective retention' algorithm abstracted from Darwin's theory of Natural Selection. Arguments are presented for an isomorphism between Bayesian Methods and Darwinian processes. Universal Darwinism, as the term has been developed by Richard Dawkins, Daniel Dennett and Susan Blackmore, is the collection of scientific theories which explain the creation and evolution of their subject matter as due to the Operation of Darwinian processes. These subject matters span the fields of atomic physics, chemistry, biology and the social sciences. The principle of Maximum Entropy states that Systems will evolve to states of highest entropy subject to the constraints of scientific law. This principle may be inverted to provide illumination as to the nature of scientific law. Our best cosmological theories suggest the universe contained much less complexity during the period shortly after the Big Bang than it does at present. The scientific subject matter of atomic physics, chemistry, biology and the social sciences has been created since that time. An explanation is proposed for the existence of this subject matter as due to the evolution of constraints in the form of adaptations imposed on Maximum Entropy. It is argued these adaptations were discovered and instantiated through the Operations of a succession of Darwinian processes.
Bayesian microsaccade detection
Mihali, Andra; van Opheusden, Bas; Ma, Wei Ji
2017-01-01
Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the “true” microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise—although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the “true” microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package. PMID:28114483
Subgroup finding via Bayesian additive regression trees.
Sivaganesan, Siva; Müller, Peter; Huang, Bin
2017-03-09
We provide a Bayesian decision theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows a flexible modeling of the response variable irrespective of potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility where the expectation is taken with respect to the posterior predictive distribution of the response, and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using simulated data set study and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.
Bieniek, A.; Graba, M.; Prażnowski, K.
2016-09-01
The paper presents results of research on the effect of frequency control signal on the course selected operating parameters of the continuously variable transmission CVT. The study used a gear Fuji Hyper M6 with electro-hydraulic control system and proprietary software for control and data acquisition developed in LabView environment.
Strauser, David R.; Lustig, Daniel C.; Uruk, Aye Ciftci
2006-01-01
In the current study, the authors examined whether the influence of trauma symptomatology on select career variables differs based on disability status. A total of 131 college students and 81 individuals with disabilities completed the "Career Thoughts Inventory," "My Vocational Situation," "Developmental Work Personality…
Variational Bayesian Inference of Line Spectra
DEFF Research Database (Denmark)
Badiu, Mihai Alin; Hansen, Thomas Lundgaard; Fleury, Bernard Henri
2016-01-01
In this paper, we address the fundamental problem of line spectral estimation in a Bayesian framework. We target model order and parameter estimation via variational inference in a probabilistic model in which the frequencies are continuous-valued, i.e., not restricted to a grid; and the coeffici......In this paper, we address the fundamental problem of line spectral estimation in a Bayesian framework. We target model order and parameter estimation via variational inference in a probabilistic model in which the frequencies are continuous-valued, i.e., not restricted to a grid......; and the coefficients are governed by a Bernoulli-Gaussian prior model turning model order selection into binary sequence detection. Unlike earlier works which retain only point estimates of the frequencies, we undertake a more complete Bayesian treatment by estimating the posterior probability density functions (pdfs...
Directory of Open Access Journals (Sweden)
Renata Bujak
2016-07-01
Full Text Available Non-targeted metabolomics constitutes a part of systems biology and aims to determine many metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are multivariate and high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as complexity of biological matrices. Proper selection of variables which contribute into group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study. Orthogonal projections to latent structures-discriminant analysis (OPLS-DA without and with multiple testing correction as well as least absolute shrinkage and selection operator (LASSO were tested and compared. For the RH study, OPLS-DA model built without multiple testing correction, selected 46 and 218 variables based on VIP criteria using Pareto and UV scaling, respectively. In the case of the PH study, 217 and 320 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. In the RH study, OPLS-DA model built with multiple testing correction, selected 4 and 19 variables as statistically significant in terms of Pareto and UV scaling, respectively. For PH study, 14 and 18 variables were selected based on VIP criteria in terms of Pareto and UV scaling, respectively. Additionally, the concept and fundaments of the least absolute shrinkage and selection operator (LASSO with bootstrap procedure evaluating reproducibility of results, was demonstrated. In the RH and PH study, the LASSO selected 14 and 4 variables with reproducibility between 99.3% and 100%. However, apart from the popularity of PLS-DA and OPLS-DA methods in metabolomics, it should be highlighted that they do not control type I or type II error, but only arbitrarily establish a cut-off value for PLS-DA loadings
Loturco, Irineu; Artioli, Guilherme Giannini; Kobal, Ronaldo; Gil, Saulo; Franchini, Emerson
2014-07-01
This study investigated the relationship between punching acceleration and selected strength and power variables in 19 professional karate athletes from the Brazilian National Team (9 men and 10 women; age, 23 ± 3 years; height, 1.71 ± 0.09 m; and body mass [BM], 67.34 ± 13.44 kg). Punching acceleration was assessed under 4 different conditions in a randomized order: (a) fixed distance aiming to attain maximum speed (FS), (b) fixed distance aiming to attain maximum impact (FI), (c) self-selected distance aiming to attain maximum speed, and (d) self-selected distance aiming to attain maximum impact. The selected strength and power variables were as follows: maximal dynamic strength in bench press and squat-machine, squat and countermovement jump height, mean propulsive power in bench throw and jump squat, and mean propulsive velocity in jump squat with 40% of BM. Upper- and lower-body power and maximal dynamic strength variables were positively correlated to punch acceleration in all conditions. Multiple regression analysis also revealed predictive variables: relative mean propulsive power in squat jump (W·kg-1), and maximal dynamic strength 1 repetition maximum in both bench press and squat-machine exercises. An impact-oriented instruction and a self-selected distance to start the movement seem to be crucial to reach the highest acceleration during punching execution. This investigation, while demonstrating strong correlations between punching acceleration and strength-power variables, also provides important information for coaches, especially for designing better training strategies to improve punching speed.
Bayesian experimental design for the active nitridation of graphite by atomic nitrogen
Terejanu, Gabriel; Miki, Kenji
2011-01-01
The problem of optimal data collection to efficiently learn the model parameters of a graphite nitridation experiment is studied in the context of Bayesian analysis using both synthetic and real experimental data. The paper emphasizes that the optimal design can be obtained as a result of an information theoretic sensitivity analysis. Thus, the preferred design is where the statistical dependence between the model parameters and observables is the highest possible. In this paper, the statistical dependence between random variables is quantified by mutual information and estimated using a k-nearest neighbor based approximation. It is shown, that by monitoring the inference process via measures such as entropy or Kullback-Leibler divergence, one can determine when to stop the data collection process. The methodology is applied to select the most informative designs on both a simulated data set and on an experimental data set, previously published in the literature. It is also shown that the sequential Bayesian ...
DEFF Research Database (Denmark)
Ehsani, Alireza; Sørensen, Peter; Pomp, Daniel;
2012-01-01
Background To understand the genetic architecture of complex traits and bridge the genotype-phenotype gap, it is useful to study intermediate -omics data, e.g. the transcriptome. The present study introduces a method for simultaneous quantification of the contributions from single nucleotide...... polymorphisms (SNPs) and transcript abundances in explaining phenotypic variance, using Bayesian whole-omics models. Bayesian mixed models and variable selection models were used and, based on parameter samples from the model posterior distributions, explained variances were further partitioned at the level......-modal distribution of genomic values collapses, when gene expressions are added to the model Conclusions With increased availability of various -omics data, integrative approaches are promising tools for understanding the genetic architecture of complex traits. Partitioning of explained variances at the chromosome...
Bayesian synthetic evaluation of multistage reliability growth with instant and delayed fix modes
Institute of Scientific and Technical Information of China (English)
无
2008-01-01
In the multistage reliability growth tests with instant and delayed fix modes, the failure data can be assumed to follow Weibull processes with different parameters at different stages. For the Weibull process within a stage, by the proper selection of prior distribution form and the parameters, a concise posterior distribution form is obtained, thus simplifying the Bayesian analysis. In the multistage tests, the improvement factor is used to convert the posterior of one stage to the prior of the subsequent stage. The conversion criterion is carefully analyzed to determine the distribution parameters of the subsequent stage's variable reasonably. Based on the mentioned results, a new synthetic Bayesian evaluation program and algorithm framework is put forward to evaluate the multistage reliability growth tests with instant and delayed fix modes. The example shows the effectiveness and flexibility of this method.
Graham, Carroll M.; Nafukho, Fredrick Muyia
2007-01-01
Purpose: The purpose of this study is to determine the relationship between four independent variables educational level, longevity, type of enterprise, and gender and the dependent variable culture, as a dimension that explains organizational learning readiness in seven small-size business enterprises. Design/methodology/approach: An exploratory…
Applied Music Teaching Behavior as a Function of Selected Personality Variables.
Schmidt, Charles P.
1989-01-01
Investigates the relationships among applied music teaching behaviors and personality variables as measured by the Myers-Briggs Type Indicator (MBTI). Suggests that personality variables may be important factors underlying four applied music teaching behaviors: approvals, rate of reinforcement, teacher model/performance, and pace. (LS)
Godsey, Brian
2013-01-01
Inferring gene regulatory networks from expression data is difficult, but it is common and often useful. Most network problems are under-determined--there are more parameters than data points--and therefore data or parameter set reduction is often necessary. Correlation between variables in the model also contributes to confound network coefficient inference. In this paper, we present an algorithm that uses integrated, probabilistic clustering to ease the problems of under-determination and correlated variables within a fully Bayesian framework. Specifically, ours is a dynamic Bayesian network with integrated Gaussian mixture clustering, which we fit using variational Bayesian methods. We show, using public, simulated time-course data sets from the DREAM4 Challenge, that our algorithm outperforms non-clustering methods in many cases (7 out of 25) with fewer samples, rarely underperforming (1 out of 25), and often selects a non-clustering model if it better describes the data. Source code (GNU Octave) for BAyesian Clustering Over Networks (BACON) and sample data are available at: http://code.google.com/p/bacon-for-genetic-networks.
Directory of Open Access Journals (Sweden)
Giuseppe Palermo
2009-05-01
Full Text Available Giuseppe Palermo1, Paolo Piraino2, Hans-Dieter Zucht31Digilab BioVision GmbH, Hannover, Germany; 2Dr Paolo Piraino Statistical Consulting, Rende (CS, Italy; 3Proteome Sciences R&D GmbH and C. KG, Frankfurt am Main, GermanyAbstract: Multivariate partial least square (PLS regression allows the modeling of complex biological events, by considering different factors at the same time. It is unaffected by data collinearity, representing a valuable method for modeling high-dimensional biological data (as derived from genomics, proteomics and peptidomics. In presence of multiple responses, it is of particular interest how to appropriately “dissect” the model, to reveal the importance of single attributes with regard to individual responses (for example, variable selection. In this paper, performances of multivariate PLS regression coefficients, in selecting relevant predictors for different responses in omics-type of data, were investigated by means of a receiver operating characteristic (ROC analysis. For this purpose, simulated data, mimicking the covariance structures of microarray and liquid chromatography mass spectrometric data, were used to generate matrices of predictors and responses. The relevant predictors were set a priori. The influences of noise, the source of data with different covariance structure and the size of relevant predictors were investigated. Results demonstrate the applicability of PLS regression coeffi cients in selecting variables for each response of a multivariate PLS, in omics-type of data. Comparisons with other feature selection methods, such as variable importance in the projection scores, principal component regression, and least absolute shrinkage and selection operator regression were also provided.Keywords: partial least square regression, regression coefficients, variable selection, biomarker discovery, omics-data
The role of protozoa-driven selection in shaping human genetic variability.
Pozzoli, Uberto; Fumagalli, Matteo; Cagliani, Rachele; Comi, Giacomo P; Bresolin, Nereo; Clerici, Mario; Sironi, Manuela
2010-03-01
Protozoa exert a strong selective pressure in humans. The selection signatures left by these pathogens can be exploited to identify genetic modulators of infection susceptibility. We show that protozoa diversity in different geographic locations is a good measure of protozoa-driven selective pressure; protozoa diversity captured selection signatures at known malaria resistance loci and identified several selected single nucleotide polymorphisms in immune and hemolytic anemia genes. A genome-wide search enabled us to identify 5180 variants mapping to 1145 genes that are subjected to protozoa-driven selective pressure. We provide a genome-wide estimate of protozoa-driven selective pressure and identify candidate susceptibility genes for protozoa-borne diseases.
Oracle Efficient Variable Selection in Random and Fixed Effects Panel Data Models
DEFF Research Database (Denmark)
Kock, Anders Bredahl
This paper generalizes the results for the Bridge estimator of Huang et al. (2008) to linear random and fixed effects panel data models which are allowed to grow in both dimensions. In particular we show that the Bridge estimator is oracle efficient. It can correctly distinguish between relevant...... and irrelevant variables and the asymptotic distribution of the estimators of the coefficients of the relevant variables is the same as if only these had been included in the model, i.e. as if an oracle had revealed the true model prior to estimation. In the case of more explanatory variables than observations...
Learning Bayesian Networks from Data by Particle Swarm Optimization
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
Learning Bayesian network is an NP-hard problem. When the number of variables is large, the process of searching optimal network structure could be very time consuming and tends to return a structure which is local optimal. The particle swarm optimization (PSO) was introduced to the problem of learning Bayesian networks and a novel structure learning algorithm using PSO was proposed. To search in directed acyclic graphs spaces efficiently, a discrete PSO algorithm especially for structure learning was proposed based on the characteristics of Bayesian networks. The results of experiments show that our PSO based algorithm is fast for convergence and can obtain better structures compared with genetic algorithm based algorithms.
Directory of Open Access Journals (Sweden)
Legarra Andrés
2006-09-01
Full Text Available Abstract Breeding sheep populations for scrapie resistance could result in a loss of genetic variability. In this study, the effect on genetic variability of selection for increasing the ARR allele frequency was estimated in the Latxa breed. Two sources of information were used, pedigree and genetic polymorphisms (fifteen microsatellites. The results based on the genealogical information were conditioned by a low pedigree completeness level that revealed the interest of also using the information provided by the molecular markers. The overall results suggest that no great negative effect on genetic variability can be expected in the short time in the population analysed by selection of only ARR/ARR males. The estimated average relationship of ARR/ARR males with reproductive females was similar to that of all available males whatever its genotype: 0.010 vs. 0.012 for a genealogical relationship and 0.257 vs. 0.296 for molecular coancestry, respectively. However, selection of only ARR/ARR males implied important losses in founder animals (87 percent and low frequency alleles (30 percent in the ram population. The evaluation of mild selection strategies against scrapie susceptibility based on the use of some ARR heterozygous males was difficult because the genetic relationships estimated among animals differed when pedigree or molecular information was used, and the use of more molecular markers should be evaluated.
Variable selection under multiple imputation using the bootstrap in a prognostic study
Heymans, M.W.; Buuren, S. van; Knol, D.L.; Mechelen, W. van; Vet, H.C.W. de
2007-01-01
Background. Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable s
Bayesian least squares deconvolution
Asensio Ramos, A.; Petit, P.
2015-11-01
Aims: We develop a fully Bayesian least squares deconvolution (LSD) that can be applied to the reliable detection of magnetic signals in noise-limited stellar spectropolarimetric observations using multiline techniques. Methods: We consider LSD under the Bayesian framework and we introduce a flexible Gaussian process (GP) prior for the LSD profile. This prior allows the result to automatically adapt to the presence of signal. We exploit several linear algebra identities to accelerate the calculations. The final algorithm can deal with thousands of spectral lines in a few seconds. Results: We demonstrate the reliability of the method with synthetic experiments and we apply it to real spectropolarimetric observations of magnetic stars. We are able to recover the magnetic signals using a small number of spectral lines, together with the uncertainty at each velocity bin. This allows the user to consider if the detected signal is reliable. The code to compute the Bayesian LSD profile is freely available.
Bayesian least squares deconvolution
Ramos, A Asensio
2015-01-01
Aims. To develop a fully Bayesian least squares deconvolution (LSD) that can be applied to the reliable detection of magnetic signals in noise-limited stellar spectropolarimetric observations using multiline techniques. Methods. We consider LSD under the Bayesian framework and we introduce a flexible Gaussian Process (GP) prior for the LSD profile. This prior allows the result to automatically adapt to the presence of signal. We exploit several linear algebra identities to accelerate the calculations. The final algorithm can deal with thousands of spectral lines in a few seconds. Results. We demonstrate the reliability of the method with synthetic experiments and we apply it to real spectropolarimetric observations of magnetic stars. We are able to recover the magnetic signals using a small number of spectral lines, together with the uncertainty at each velocity bin. This allows the user to consider if the detected signal is reliable. The code to compute the Bayesian LSD profile is freely available.
Bayesian Exploratory Factor Analysis
DEFF Research Database (Denmark)
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.;
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corr......This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor......, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...
Center, Julian L.; Knuth, Kevin H.
2011-03-01
Visual odometry refers to tracking the motion of a body using an onboard vision system. Practical visual odometry systems combine the complementary accuracy characteristics of vision and inertial measurement units. The Mars Exploration Rovers, Spirit and Opportunity, used this type of visual odometry. The visual odometry algorithms in Spirit and Opportunity were based on Bayesian methods, but a number of simplifying approximations were needed to deal with onboard computer limitations. Furthermore, the allowable motion of the rover had to be severely limited so that computations could keep up. Recent advances in computer technology make it feasible to implement a fully Bayesian approach to visual odometry. This approach combines dense stereo vision, dense optical flow, and inertial measurements. As with all true Bayesian methods, it also determines error bars for all estimates. This approach also offers the possibility of using Micro-Electro Mechanical Systems (MEMS) inertial components, which are more economical, weigh less, and consume less power than conventional inertial components.
Adamczyk, Michał
2016-01-01
Synchronic variability in the area of phonetics, phonology, vocabulary, morphology and syntax is a natural feature of any language, including English. The existence of competing variants is in itself a fascinating phenomenon, but it is also a prerequisite for diachronic changes. This volume is a collection of studies which investigate variability from a contemporary and historical perspective, in both native and non-native varieties of English. The topics include Middle English spelling varia...
Pan, Indranil; Ghosh, Soumyajit; Gupta, Amitava; 10.1109/PACC.2011.5978958
2012-01-01
Networked Control Systems (NCSs) are often associated with problems like random data losses which might lead to system instability. This paper proposes a method based on the use of variable controller gains to achieve maximum parametric robustness of the plant controlled over a network. Stability using variable controller gains under data loss conditions is analyzed using a suitable Linear Matrix Inequality (LMI) formulation. Also, a Particle Swarm Optimization (PSO) based technique is used to maximize parametric robustness of the plant.
Probabilistic Inferences in Bayesian Networks
Ding, Jianguo
2010-01-01
This chapter summarizes the popular inferences methods in Bayesian networks. The results demonstrates that the evidence can propagated across the Bayesian networks by any links, whatever it is forward or backward or intercausal style. The belief updating of Bayesian networks can be obtained by various available inference techniques. Theoretically, exact inferences in Bayesian networks is feasible and manageable. However, the computing and inference is NP-hard. That means, in applications, in ...
Extended Bayesian Information Criteria for Gaussian Graphical Models
Foygel, Rina
2010-01-01
Gaussian graphical models with sparsity in the inverse covariance matrix are of significant interest in many modern applications. For the problem of recovering the graphical structure, information criteria provide useful optimization objectives for algorithms searching through sets of graphs or for selection of tuning parameters of other methods such as the graphical lasso, which is a likelihood penalization technique. In this paper we establish the consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow. Compared to earlier work on the regression case, our treatment allows for growth in the number of non-zero parameters in the true model, which is necessary in order to cover connected graphs. We demonstrate the performance of this criterion on simulated data when used in conjunction with the graphical lasso, and verify that the criterion indeed performs better than either cross-validation or the ordi...
A Bayesian subgroup analysis using collections of ANOVA models.
Liu, Jinzhong; Sivaganesan, Siva; Laud, Purushottam W; Müller, Peter
2017-03-20
We develop a Bayesian approach to subgroup analysis using ANOVA models with multiple covariates, extending an earlier work. We assume a two-arm clinical trial with normally distributed response variable. We also assume that the covariates for subgroup finding are categorical and are a priori specified, and parsimonious easy-to-interpret subgroups are preferable. We represent the subgroups of interest by a collection of models and use a model selection approach to finding subgroups with heterogeneous effects. We develop suitable priors for the model space and use an objective Bayesian approach that yields multiplicity adjusted posterior probabilities for the models. We use a structured algorithm based on the posterior probabilities of the models to determine which subgroup effects to report. Frequentist operating characteristics of the approach are evaluated using simulation. While our approach is applicable in more general cases, we mainly focus on the 2 × 2 case of two covariates each at two levels for ease of presentation. The approach is illustrated using a real data example.
Bayesian Subset Modeling for High-Dimensional Generalized Linear Models
Liang, Faming
2013-06-01
This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online. © 2013 American Statistical Association.
Bayesian multiple target tracking
Streit, Roy L
2013-01-01
This second edition has undergone substantial revision from the 1999 first edition, recognizing that a lot has changed in the multiple target tracking field. One of the most dramatic changes is in the widespread use of particle filters to implement nonlinear, non-Gaussian Bayesian trackers. This book views multiple target tracking as a Bayesian inference problem. Within this framework it develops the theory of single target tracking, multiple target tracking, and likelihood ratio detection and tracking. In addition to providing a detailed description of a basic particle filter that implements
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Kim, Won Hwa; Kim, Hyunwoo J; Adluru, Nagesh; Singh, Vikas
2016-06-01
A major goal of imaging studies such as the (ongoing) Human Connectome Project (HCP) is to characterize the structural network map of the human brain and identify its associations with covariates such as genotype, risk factors, and so on that correspond to an individual. But the set of image derived measures and the set of covariates are both large, so we must first estimate a 'parsimonious' set of relations between the measurements. For instance, a Gaussian graphical model will show conditional independences between the random variables, which can then be used to setup specific downstream analyses. But most such data involve a large list of 'latent' variables that remain unobserved, yet affect the 'observed' variables sustantially. Accounting for such latent variables is not directly addressed by standard precision matrix estimation, and is tackled via highly specialized optimization methods. This paper offers a unique harmonic analysis view of this problem. By casting the estimation of the precision matrix in terms of a composition of low-frequency latent variables and high-frequency sparse terms, we show how the problem can be formulated using a new wavelet-type expansion in non-Euclidean spaces. Our formulation poses the estimation problem in the frequency space and shows how it can be solved by a simple sub-gradient scheme. We provide a set of scientific results on ~500 scans from the recently released HCP data where our algorithm recovers highly interpretable and sparse conditional dependencies between brain connectivity pathways and well-known covariates.
Variable Selection for Modeling the Absolute Magnitude at Maximum of Type Ia Supernovae
Uemura, Makoto; Kawabata, S; Ikeda, Shiro; Maeda, Keiichi
2015-01-01
We discuss what is an appropriate set of explanatory variables in order to predict the absolute magnitude at the maximum of Type Ia supernovae. In order to have a good prediction, the error for future data, which is called the "generalization error," should be small. We use cross-validation in order to control the generalization error and LASSO-type estimator in order to choose the set of variables. This approach can be used even in the case that the number of samples is smaller than the number of candidate variables. We studied the Berkeley supernova database with our approach. Candidates of the explanatory variables include normalized spectral data, variables about lines, and previously proposed flux-ratios, as well as the color and light-curve widths. As a result, we confirmed the past understanding about Type Ia supernova: i) The absolute magnitude at maximum depends on the color and light-curve width. ii) The light-curve width depends on the strength of Si II. Recent studies have suggested to add more va...
Graham, Matthew J; Drake, Andrew J; Mahabal, Ashish A; Chang, Melissa; Stern, Daniel; Donalek, Ciro; Glikman, Eilat
2014-01-01
We compare quasar selection techniques based on their optical variability using data from the Catalina Real-time Transient Survey (CRTS). We introduce a new technique based on Slepian wavelet variance (SWV) that shows comparable or better performance to structure functions and damped random walk models but with fewer assumptions. Combining these methods with WISE mid-IR colors produces a highly efficient quasar selection technique which we have validated spectroscopically. The SWV technique also identifies characteristic timescales in a time series and we find a characteristic rest frame timescale of ~54 days, confirmed in the light curves of ~18000 quasars from CRTS, SDSS and MACHO data, and anticorrelated with absolute magnitude. This indicates a transition between a damped random walk and $P(f) \\propto f^{-1/3}$ behaviours and is the first strong indication that a damped random walk model may be too simplistic to describe optical quasar variability.
A stochastic analysis of terrain evaluation variables for path selection. [roving vehicle navigation
Donohue, J. G.; Shen, C. N.
1978-01-01
A stochastic analysis was performed on the variables associated with the characteristics of the terrain encountered by a roving system with an autonomous navigation system. A laser rangefinder is employed to detect terrain features at ranges up to 75 m. Analytic expressions and a numerical scheme were developed to calculate the variance of data on these four variables: (1) body clearance, (2) in-path slope, (3) tilt slope, and (4) wheel deviation. The variance is due to noise in the range data. It was found that the standard deviation of these terrain variables is large enough to warrant the use of a safety margin to aid the roving vehicle in avoiding high risk areas.
Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E.
2010-01-01
Food authenticity studies are concerned with determining if food samples have been correctly labeled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give ...
Multiview Bayesian Correlated Component Analysis
DEFF Research Database (Denmark)
Kamronn, Simon Due; Poulsen, Andreas Trier; Hansen, Lars Kai
2015-01-01
Correlated component analysis as proposed by Dmochowski, Sajda, Dias, and Parra (2012) is a tool for investigating brain process similarity in the responses to multiple views of a given stimulus. Correlated components are identified under the assumption that the involved spatial networks are iden......Correlated component analysis as proposed by Dmochowski, Sajda, Dias, and Parra (2012) is a tool for investigating brain process similarity in the responses to multiple views of a given stimulus. Correlated components are identified under the assumption that the involved spatial networks...... we denote Bayesian correlated component analysis, evaluates favorably against three relevant algorithms in simulated data. A well-established benchmark EEG data set is used to further validate the new model and infer the variability of spatial representations across multiple subjects....
Bayesian Particle Tracking of Traffic Flows
2014-01-01
We develop a Bayesian particle filter for tracking traffic flows that is capable of capturing non-linearities and discontinuities present in flow dynamics. Our model includes a hidden state variable that captures sudden regime shifts between traffic free flow, breakdown and recovery. We develop an efficient particle learning algorithm for real time on-line inference of states and parameters. This requires a two step approach, first, resampling the current particles, with a mixture predictive ...
Control of Complex Systems Using Bayesian Networks and Genetic Algorithm
Marwala, Tshilidzi
2007-01-01
A method based on Bayesian neural networks and genetic algorithm is proposed to control the fermentation process. The relationship between input and output variables is modelled using Bayesian neural network that is trained using hybrid Monte Carlo method. A feedback loop based on genetic algorithm is used to change input variables so that the output variables are as close to the desired target as possible without the loss of confidence level on the prediction that the neural network gives. The proposed procedure is found to reduce the distance between the desired target and measured outputs significantly.
Bayesian methods for hackers probabilistic programming and Bayesian inference
Davidson-Pilon, Cameron
2016-01-01
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power. Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples a...
Bayesian network learning for natural hazard assessments
Vogel, Kristin
2016-04-01
Even though quite different in occurrence and consequences, from a modelling perspective many natural hazards share similar properties and challenges. Their complex nature as well as lacking knowledge about their driving forces and potential effects make their analysis demanding. On top of the uncertainty about the modelling framework, inaccurate or incomplete event observations and the intrinsic randomness of the natural phenomenon add up to different interacting layers of uncertainty, which require a careful handling. Thus, for reliable natural hazard assessments it is crucial not only to capture and quantify involved uncertainties, but also to express and communicate uncertainties in an intuitive way. Decision-makers, who often find it difficult to deal with uncertainties, might otherwise return to familiar (mostly deterministic) proceedings. In the scope of the DFG research training group „NatRiskChange" we apply the probabilistic framework of Bayesian networks for diverse natural hazard and vulnerability studies. The great potential of Bayesian networks was already shown in previous natural hazard assessments. Treating each model component as random variable, Bayesian networks aim at capturing the joint distribution of all considered variables. Hence, each conditional distribution of interest (e.g. the effect of precautionary measures on damage reduction) can be inferred. The (in-)dependencies between the considered variables can be learned purely data driven or be given by experts. Even a combination of both is possible. By translating the (in-)dependences into a graph structure, Bayesian networks provide direct insights into the workings of the system and allow to learn about the underlying processes. Besides numerous studies on the topic, learning Bayesian networks from real-world data remains challenging. In previous studies, e.g. on earthquake induced ground motion and flood damage assessments, we tackled the problems arising with continuous variables
Pereira, Hebert Vinicius; Amador, Victória Silva; Sena, Marcelo Martins; Augusti, Rodinei; Piccin, Evandro
2016-10-12
Paper spray mass spectrometry (PS-MS) combined with partial least squares discriminant analysis (PLS-DA) was applied for the first time in a forensic context to a fast and effective differentiation of beers. Eight different brands of American standard lager beers produced by four different breweries (141 samples from 55 batches) were studied with the aim at performing a differentiation according to their market prices. The three leader brands in the Brazilian beer market, which have been subject to fraud, were modeled as the higher-price class, while the five brands most used for counterfeiting were modeled as the lower-price class. Parameters affecting the paper spray ionization were examined and optimized. The best MS signal stability and intensity was obtained while using the positive ion mode, with PS(+) mass spectra characterized by intense pairs of signals corresponding to sodium and potassium adducts of malto-oligosaccharides. Discrimination was not apparent neither by using visual inspection nor principal component analysis (PCA). However, supervised classification models provided high rates of sensitivity and specificity. A PLS-DA model using full scan mass spectra were improved by variable selection with ordered predictors selection (OPS), providing 100% of reliability rate and reducing the number of variables from 1701 to 60. This model was interpreted by detecting fifteen variables as the most significant VIP (variable importance in projection) scores, which were therefore considered diagnostic ions for this type of beer counterfeit.
Directory of Open Access Journals (Sweden)
S. Koley
2017-01-01
Full Text Available : The purpose of this study was of two-fold: first, to estimate the back strength of Indian inter-university male field hockey players and, second, to search the correlations of it with selected anthropometric variables and performance tests. To serve this purpose, a total of nine anthropometric variables, such as height, weight, body mass index, percent body fat, knee height, length of femur, femur biepicondylar diameter, skeletal mass and back strength, and two performance tests, such as sit and reach test and Slalom sprint and dribble test were measured on purposely selected 120 Indian inter-university male hockey players aged 18–25 years collected from the inter-university competition held in Guru Nanak Dev University, Amritsar, India during March, 2014. An adequate number of controls (n=119 were also taken from the same place for comparison. The results showed that the hockey players had the higher mean values in all the variables, except percent body fat and slalom sprint and dribble test than their control counterparts, showing statistically significant differences (p ≤ 0.003 – 0.001 between them. No significant correlations of back strength were found with any of the variables in Indian inter-university male field hockey players. In conclusion, it may be stated that back strength may not be used as one of the indicating factors for the performance of the field hockey players.
Pérez‐Rodríguez, Paulino; Veturi, Yogasudha; Simianer, Henner; de los Campos, Gustavo
2015-01-01
Summary Genome‐wide association studies (GWAS) have detected large numbers of variants associated with complex human traits and diseases. However, the proportion of variance explained by GWAS‐significant single nucleotide polymorphisms has been usually small. This brought interest in the use of whole‐genome regression (WGR) methods. However, there has been limited research on the factors that affect prediction accuracy (PA) of WGRs when applied to human data of distantly related individuals. Here, we examine, using real human genotypes and simulated phenotypes, how trait complexity, marker‐quantitative trait loci (QTL) linkage disequilibrium (LD), and the model used affect the performance of WGRs. Our results indicated that the estimated rate of missing heritability is dependent on the extent of marker‐QTL LD. However, this parameter was not greatly affected by trait complexity. Regarding PA our results indicated that: (a) under perfect marker‐QTL LD WGR can achieve moderately high prediction accuracy, and with simple genetic architectures variable selection methods outperform shrinkage procedures and (b) under imperfect marker‐QTL LD, variable selection methods can achieved reasonably good PA with simple or moderately complex genetic architectures; however, the PA of these methods deteriorated as trait complexity increases and with highly complex traits variable selection and shrinkage methods both performed poorly. This was confirmed with an analysis of human height. PMID:25600682
DEFF Research Database (Denmark)
Antoniou, Constantinos; Harrison, Glenn W.; Lau, Morten I.;
2015-01-01
A large literature suggests that many individuals do not apply Bayes’ Rule when making decisions that depend on them correctly pooling prior information and sample data. We replicate and extend a classic experimental study of Bayesian updating from psychology, employing the methods of experimental...
Henderson, N. B.; And Others
Perinatal variables were used to predict 7-year outcome for 538 children, 32% Negro and 68% white. Mother's age, birthplace, education, occupation, marital status, neuropsychiatric status, family income, number supported, birth weight, one- and five-minute Apgar scores were regressed on 7-year Verbal, Performance and Full Scale IQ, Bender, Wide…
Cortical Response Variability as a Developmental Index of Selective Auditory Attention
Strait, Dana L.; Slater, Jessica; Abecassis, Victor; Kraus, Nina
2014-01-01
Attention induces synchronicity in neuronal firing for the encoding of a given stimulus at the exclusion of others. Recently, we reported decreased variability in scalp-recorded cortical evoked potentials to attended compared with ignored speech in adults. Here we aimed to determine the developmental time course for this neural index of auditory…
Directory of Open Access Journals (Sweden)
JORGE E. CÓRDOBA MAQUILÓN
2011-01-01
Full Text Available Aplicando encuestas de preferencias reveladas y cuestionarios psicológicos se realizó un estudio detectando variables psicológicas claves de la conducta que intervienen en la elección de un modo de transporte en un grupo de habitantes del Área Metropolitana del Valle de Aburrá. Se tuvo en cuenta la teoría de la utilidad aleatoriapara los modelos de elección discreta y la acción razonada para evaluar las creencias y se utilice como herramienta de análisis de las variables psicológicas el cuestionario de factor de personalidad (16PF. Además de las encuestas de preferencias reveladas, se aplicaron otras dos encuestas: una de categorías socioeconómicas, y otra con indicadores latentes. Esta metodología permite una integración de modelos de elección discreta y de variables latentes, que lo hace operativo y cuantifica las variables psicológicas inobservables. El resultado más relevante que se obtuvo fue que la ansiedad incide en la elección de un modo de transporte urbano y se muestra que una alteración fisiológica, problemas en la percepción, y las creencias pueden afectar el proceso de toma de decisiones.
Fault Diagnosis for Fuel Cell Based on Naive Bayesian Classification
Directory of Open Access Journals (Sweden)
Liping Fan
2013-07-01
Full Text Available Many kinds of uncertain factors may exist in the process of fault diagnosis and affect diagnostic results. Bayesian network is one of the most effective theoretical models for uncertain knowledge expression and reasoning. The method of naive Bayesian classification is used in this paper in fault diagnosis of a proton exchange membrane fuel cell (PEMFC system. Based on the model of PEMFC, fault data are obtained through simulation experiment, learning and training of the naive Bayesian classification are finished, and some testing samples are selected to validate this method. Simulation results demonstrate that the method is feasible.
Firefly as a novel swarm intelligence variable selection method in spectroscopy.
Goodarzi, Mohammad; dos Santos Coelho, Leandro
2014-12-10
A critical step in multivariate calibration is wavelength selection, which is used to build models with better prediction performance when applied to spectral data. Up to now, many feature selection techniques have been developed. Among all different types of feature selection techniques, those based on swarm intelligence optimization methodologies are more interesting since they are usually simulated based on animal and insect life behavior to, e.g., find the shortest path between a food source and their nests. This decision is made by a crowd, leading to a more robust model with less falling in local minima during the optimization cycle. This paper represents a novel feature selection approach to the selection of spectroscopic data, leading to more robust calibration models. The performance of the firefly algorithm, a swarm intelligence paradigm, was evaluated and compared with genetic algorithm and particle swarm optimization. All three techniques were coupled with partial least squares (PLS) and applied to three spectroscopic data sets. They demonstrate improved prediction results in comparison to when only a PLS model was built using all wavelengths. Results show that firefly algorithm as a novel swarm paradigm leads to a lower number of selected wavelengths while the prediction performance of built PLS stays the same.
Automated classification of Hipparcos unsolved variables
Rimoldini, L; Süveges, M; López, M; Sarro, L M; Blomme, J; De Ridder, J; Cuypers, J; Guy, L; Mowlavi, N; Lecoeur-Taïbi, I; Beck, M; Jan, A; Nienartowicz, K; Ordóñez-Blanco, D; Lebzelter, T; Eyer, L; 10.1111/j.1365-2966.2012.21752.x
2013-01-01
We present an automated classification of stars exhibiting periodic, non-periodic and irregular light variations. The Hipparcos catalogue of unsolved variables is employed to complement the training set of periodic variables of Dubath et al. with irregular and non-periodic representatives, leading to 3881 sources in total which describe 24 variability types. The attributes employed to characterize light-curve features are selected according to their relevance for classification. Classifier models are produced with random forests and a multistage methodology based on Bayesian networks, achieving overall misclassification rates under 12 per cent. Both classifiers are applied to predict variability types for 6051 Hipparcos variables associated with uncertain or missing types in the literature.
Bayesian multimodel inference for dose-response studies
Link, W.A.; Albers, P.H.
2007-01-01
Statistical inference in dose?response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose?response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparveriuss) exposed to various sublethal dietary concentrations of methylmercury.
Richards, Joseph W; Brink, Henrik; Miller, Adam A; Bloom, Joshua S; Butler, Nathaniel R; James, J Berian; Long, James P; Rice, John
2011-01-01
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL---where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up---i...
Identifying market segments in consumer markets: variable selection and data interpretation
Tonks, D G
2004-01-01
Market segmentation is often articulated as being a process which displays the recognised features of classical rationalism but in part; convention, convenience, prior experience and the overarching impact of rhetoric will influence if not determine the outcomes of a segmentation exercise. Particular examples of this process are addressed critically in this paper which concentrates on the issues of variable choice for multivariate approaches to market segmentation and also the methods used fo...
Directory of Open Access Journals (Sweden)
Boney Wera
2015-07-01
Full Text Available Successful crop breeding program incorporating agronomic and consumer preferred traits can be achieved by recognizing the existence and degree of variability among sweetpotato (Ipomoea batatas, (L. Lam. genotypes. Understanding genetic variability, genotypic and phenotypic correlation and inheritance among agronomic traits is fundamental to improvement of any crop. The study was carried out with the objective to estimate the genotypic variability and other yield related traits of highlands sweetpotato in Papua New Guinea in a polycross population. A total of 8 genotypes of sweetpotato derived from the polycross were considered in two cycles of replicated field experiments. Analysis of Variance was computed to contrast the variability within the selected genotypes based on high yielding β-carotene rich orange-fleshed sweetpotato. The results revealed significant differences among the genotypes. Genotypic coefficient of variation (GCV % was lower than phenotypic coefficient of variation (PCV % for all traits studied. Relatively high genetic variance, along with high heritability and expected genetic advances were observed in NMTN and ABYield. Harvest index (HI, scab and gall mite damage scores had heritability of 67%, 66% and 37% respectively. Marketable tuber yield (MTYield and total tuber yield (TTYield had lower genetic variance, low heritability and low genetic advance. There is need to investigate correlated inheritance among these traits. Selecting directly for yield improvement in polycross population may not be very efficient as indicated by the results. Therefore, it can be conclude that the variability within sweetpotato genotypes collected from polycross population in Aiyura Research Station for tuber yield is low and the extent of its yield improvement is narrow.
X-ray spectral variability of LINERs selected from the Palomar sample
Hernández-García, L; Masegosa, J; Márquez, I
2014-01-01
Variability is a general property of active galactic nuclei (AGN). At X-rays, the way in which these changes occur is not yet clear. In the particular case of low ionisation nuclear emission line region (LINER) nuclei, variations on months/years timescales have been found for some objects, but the main driver of these changes is still an open question. The main purpose of this work is to investigate the X-ray variability in LINERs, including the main driver of such variations, and to search for eventual differences between type 1 and 2 objects. We use the 18 LINERs in the Palomar sample with data retrieved from Chandra and/or XMM-Newton archives corresponding to observations gathered at different epochs. All the spectra for the same object are simultaneously fitted in order to study long term variations. The nature of the variability patterns are studied allowing different parameters to vary during the spectral fit. Whenever possible, short term variations from the analysis of the light curves and UV variabil...
Target selection of classical pulsating variables for space-based photometry
Plachy, E.; Molnar, L.; Szabo, R.; Kolenberg, K.; Banyai, E.
2016-05-01
In a few years the Kepler and TESS missions will provide ultra- precise photometry for thousands of RR Lyrae and hundreds of Cepheid stars. In the extended Kepler mission all targets are proposed in the Guest Observer (GO) Program, while the TESS space telescope will work with full frame images and a ~15-16th mag brightness limit with the possibility of short cadence measurements for a limited number of pre-selected objects. This paper highlights some details of the enormous and important work of the target selection process made by the members of Working Group 7 (WG#7) of the Kepler and TESS Asteroseismic Science Consortium.
Target selection of classical pulsating variables for space-based photometry
Plachy, E; Szabó, R; Kolenberg, K; Bányai, E
2016-01-01
In a few years the Kepler and TESS missions will provide ultra-precise photometry for thousands of RR Lyrae and hundreds of Cepheid stars. In the extended Kepler mission all targets are proposed in the Guest Observer (GO) Program, while the TESS space telescope will work with full frame images and a ~15-16th mag brightness limit with the possibility of short cadence measurements for a limited number of pre-selected objects. This paper highlights some details of the enormous and important work of the target selection process made by the members of Working Group 7 (WG#7) of the Kepler and TESS Asteroseismic Science Consortium.
Directory of Open Access Journals (Sweden)
da Silva SMD
2016-03-01
Full Text Available Silvia Maria Doria da Silva, Ilma Aparecida Paschoal, Eduardo Mello De Capitani, Marcos Mello Moreira, Luciana Campanatti Palhares, Mônica Corso PereiraPneumology Service, Department of Internal Medicine, School of Medical Sciences, State University of Campinas (UNICAMP, Campinas, São Paulo, BrazilBackground: Computed tomography (CT phenotypic characterization helps in understanding the clinical diversity of chronic obstructive pulmonary disease (COPD patients, but its clinical relevance and its relationship with functional features are not clarified. Volumetric capnography (VC uses the principle of gas washout and analyzes the pattern of CO2 elimination as a function of expired volume. The main variables analyzed were end-tidal concentration of carbon dioxide (ETCO2, Slope of phase 2 (Slp2, and Slope of phase 3 (Slp3 of capnogram, the curve which represents the total amount of CO2 eliminated by the lungs during each breath.Objective: To investigate, in a group of patients with severe COPD, if the phenotypic analysis by CT could identify different subsets of patients, and if there was an association of CT findings and functional variables.Subjects and methods: Sixty-five patients with COPD Gold III–IV were admitted for clinical evaluation, high-resolution CT, and functional evaluation (spirometry, 6-minute walk test [6MWT], and VC. The presence and profusion of tomography findings were evaluated, and later, the patients were identified as having emphysema (EMP or airway disease (AWD phenotype. EMP and AWD groups were compared; tomography findings scores were evaluated versus spirometric, 6MWT, and VC variables.Results: Bronchiectasis was found in 33.8% and peribronchial thickening in 69.2% of the 65 patients. Structural findings of airways had no significant correlation with spirometric variables. Air trapping and EMP were strongly correlated with VC variables, but in opposite directions. There was some overlap between the EMP and AWD
Prater, Tracie; Tilson, Will; Jones, Zack
2015-01-01
The absence of an economy of scale in spaceflight hardware makes additive manufacturing an immensely attractive option for propulsion components. As additive manufacturing techniques are increasingly adopted by government and industry to produce propulsion hardware in human-rated systems, significant development efforts are needed to establish these methods as reliable alternatives to conventional subtractive manufacturing. One of the critical challenges facing powder bed fusion techniques in this application is variability between machines used to perform builds. Even with implementation of robust process controls, it is possible for two machines operating at identical parameters with equivalent base materials to produce specimens with slightly different material properties. The machine variability study presented here evaluates 60 specimens of identical geometry built using the same parameters. 30 samples were produced on machine 1 (M1) and the other 30 samples were built on machine 2 (M2). Each of the 30-sample sets were further subdivided into three subsets (with 10 specimens in each subset) to assess the effect of progressive heat treatment on machine variability. The three categories for post-processing were: stress relief, stress relief followed by hot isostatic press (HIP), and stress relief followed by HIP followed by heat treatment per AMS 5664. Each specimen (a round, smooth tensile) was mechanically tested per ASTM E8. Two formal statistical techniques, hypothesis testing for equivalency of means and one-way analysis of variance (ANOVA), were applied to characterize the impact of machine variability and heat treatment on six material properties: tensile stress, yield stress, modulus of elasticity, fracture elongation, and reduction of area. This work represents the type of development effort that is critical as NASA, academia, and the industrial base work collaboratively to establish a path to certification for additively manufactured parts. For future
Flugsrud, Marcia R.
This study is designed to determine whether data obtained cross-sectionally from a sample of subjects in the middle childhood range on selected personality characteristics could be well described by a concave parabolic curve and thus linked to the closure behaviour elicited from the subjects. Specifically, the investigation seeks to determine if…
The Relationship between Selected Body Composition Variables and Muscular Endurance in Women
Esco, Michael R.; Olson, Michele S.; Williford, Henry N.
2010-01-01
The primary purpose of this study was to determine if muscular endurance is affected by referenced waist circumference groupings, independent of body mass and subcutaneous abdominal fat, in women. This study also explored whether selected body composition measures were associated with muscular endurance. Eighty-four women were measured for height,…
Heterogeneous selection on a heritable temperament trait in a variable environment
Quinn, John L.; Patrick, Samantha C.; Bouwhuis, Sandra; Wilkin, Teddy A.; Sheldon, Ben C.
2009-01-01
P> Temperament traits increasingly provide a focus for investigating the evolutionary ecology of behavioural variation. Here, we examine the underlying causes and selective consequences of individual variation in the temperament trait 'exploration behaviour in a novel environment' (EB, based on an 8
Vázquez-Polo, Francisco-Jose; Moreno, Elías; Negrín, Miguel A; Martel, Maria
2015-04-01
In most cases, including those of discrete random variables, statistical meta-analysis is carried out using the normal random effect model. The authors argue that normal approximation does not always properly reflect the underlying uncertainty of the original discrete data. Furthermore, in the presence of rare events the results from this approximation can be very poor. This review proposes a Bayesian meta-analysis to address binary outcomes from sparse data and also introduces a simple way to examine the sensitivity of the quantities of interest in the meta-analysis with respect to the structure dependence selected. The findings suggest that for binary outcomes data it is possible to develop a Bayesian procedure, which can be directly applied to sparse data without ad hoc corrections. By choosing a suitable class of linking distributions, the authors found that a Bayesian robustness study can be easily implemented. For illustrative purposes, an example with real data is analyzed using the proposed Bayesian meta-analysis for binomial sparse data.
Du, Wen; Gu, Ting; Tang, Li-Juan; Jiang, Jian-Hui; Wu, Hai-Long; Shen, Guo-Li; Yu, Ru-Qin
2011-09-15
As a greedy search algorithm, classification and regression tree (CART) is easily relapsing into overfitting while modeling microarray gene expression data. A straightforward solution is to filter irrelevant genes via identifying significant ones. Considering some significant genes with multi-modal expression patterns exhibiting systematic difference in within-class samples are difficult to be identified by existing methods, a strategy that unimodal transform of variables selected by interval segmentation purity (UTISP) for CART modeling is proposed. First, significant genes exhibiting varied expression patterns can be properly identified by a variable selection method based on interval segmentation purity. Then, unimodal transform is implemented to offer unimodal featured variables for CART modeling via feature extraction. Because significant genes with complex expression patterns can be properly identified and unimodal feature extracted in advance, this developed strategy potentially improves the performance of CART in combating overfitting or underfitting while modeling microarray data. The developed strategy is demonstrated using two microarray data sets. The results reveal that UTISP-based CART provides superior performance to k-nearest neighbors or CARTs coupled with other gene identifying strategies, indicating UTISP-based CART holds great promise for microarray data analysis.
Directory of Open Access Journals (Sweden)
K. KAMALAKKANNAN
2011-06-01
Full Text Available The purpose of this study is to analyze the effect of aquatic plyometric training with and without the use ofweights on selected physical fitness variables among volleyball players. To achieve the purpose of these study 36physically active undergraduate volleyball players between 18 and 20 years of age volunteered as participants.The participants were randomly categorized into three groups of 12 each: a control group (CG, an aquaticPlyometric training with weight group (APTWG, and an aquatic Plyometric training without weight group(APTWOG. The subjects of the control group were not exposed to any training. Both experimental groupsunderwent their respective experimental treatment for 12 weeks, 3 days per week and a single session on eachday. Speed, endurance, and explosive power were measured as the dependent variables for this study. 36 days ofexperimental treatment was conducted for all the groups and pre and post data was collected. The collected datawere analyzed using an analysis of covariance (ANCOVA and followed by a Scheffé’s post hoc test. The resultsrevealed significant differences between groups on all the selected dependent variables. This study demonstratedthat aquatic plyometric training can be one effective means for improving speed, endurance, and explosivepower in volley ball players
Bayesian approach to decompression sickness model parameter estimation.
Howle, L E; Weber, P W; Nichols, J M
2017-03-01
We examine both maximum likelihood and Bayesian approaches for estimating probabilistic decompression sickness model parameters. Maximum likelihood estimation treats parameters as fixed values and determines the best estimate through repeated trials, whereas the Bayesian approach treats parameters as random variables and determines the parameter probability distributions. We would ultimately like to know the probability that a parameter lies in a certain range rather than simply make statements about the repeatability of our estimator. Although both represent powerful methods of inference, for models with complex or multi-peaked likelihoods, maximum likelihood parameter estimates can prove more difficult to interpret than the estimates of the parameter distributions provided by the Bayesian approach. For models of decompression sickness, we show that while these two estimation methods are complementary, the credible intervals generated by the Bayesian approach are more naturally suited to quantifying uncertainty in the model parameters.
Bayesian network approach to spatial data mining: a case study
Huang, Jiejun; Wan, Youchuan
2006-10-01
Spatial data mining is a process of discovering interesting, novel, and potentially useful information or knowledge hidden in spatial data sets. It involves different techniques and different methods from various areas of research. A Bayesian network is a graphical model that encodes causal probabilistic relationships among variables of interest, which has a powerful ability for representing and reasoning and provides an effective way to spatial data mining. In this paper we give an introduction to Bayesian networks, and discuss using Bayesian networks for spatial data mining. We propose a framework of spatial data mining based on Bayesian networks. Then we show a case study and use the experimental results to validate the practical viability of the proposed approach to spatial data mining. Finally, the paper gives a summary and some remarks.
Stochastic back analysis of permeability coefficient using generalized Bayesian method
Institute of Scientific and Technical Information of China (English)
Zheng Guilan; Wang Yuan; Wang Fei; Yang Jian
2008-01-01
Owing to the fact that the conventional deterministic back analysis of the permeability coefficient cannot reflect the uncertainties of parameters, including the hydraulic head at the boundary, the permeability coefficient and measured hydraulic head, a stochastic back analysis taking consideration of uncertainties of parameters was performed using the generalized Bayesian method. Based on the stochastic finite element method (SFEM) for a seepage field, the variable metric algorithm and the generalized Bayesian method, formulas for stochastic back analysis of the permeability coefficient were derived. A case study of seepage analysis of a sluice foundation was performed to illustrate the proposed method. The results indicate that, with the generalized Bayesian method that considers the uncertainties of measured hydraulic head, the permeability coefficient and the hydraulic head at the boundary, both the mean and standard deviation of the permeability coefficient can be obtained and the standard deviation is less than that obtained by the conventional Bayesian method. Therefore, the present method is valid and applicable.
Bayesian classifiers applied to the Tennessee Eastman process.
Dos Santos, Edimilson Batista; Ebecken, Nelson F F; Hruschka, Estevam R; Elkamel, Ali; Madhuranthakam, Chandra M R
2014-03-01
Fault diagnosis includes the main task of classification. Bayesian networks (BNs) present several advantages in the classification task, and previous works have suggested their use as classifiers. Because a classifier is often only one part of a larger decision process, this article proposes, for industrial process diagnosis, the use of a Bayesian method called dynamic Markov blanket classifier that has as its main goal the induction of accurate Bayesian classifiers having dependable probability estimates and revealing actual relationships among the most relevant variables. In addition, a new method, named variable ordering multiple offspring sampling capable of inducing a BN to be used as a classifier, is presented. The performance of these methods is assessed on the data of a benchmark problem known as the Tennessee Eastman process. The obtained results are compared with naive Bayes and tree augmented network classifiers, and confirm that both proposed algorithms can provide good classification accuracies as well as knowledge about relevant variables.
Selected topics in the classical theory of functions of a complex variable
Heins, Maurice
2014-01-01
Elegant and concise, this text is geared toward advanced undergraduate students acquainted with the theory of functions of a complex variable. The treatment presents such students with a number of important topics from the theory of analytic functions that may be addressed without erecting an elaborate superstructure. These include some of the theory's most celebrated results, which seldom find their way into a first course. After a series of preliminaries, the text discusses properties of meromorphic functions, the Picard theorem, and harmonic and subharmonic functions. Subsequent topics incl
Parthasarathy, S; Jaiganesh, K; Duraisamy
2014-01-01
The implementation of yogic practices has proven benefits in both organic and psychological diseases. Forty-five women with anxiety selected by a random sampling method were divided into three groups. Experimental group I was subjected to asanas, relaxation and pranayama while Experimental group II was subjected to an integrated yoga module. The control group did not receive any intervention. Anxiety was measured by Taylor's Manifest Anxiety Scale before and after treatment. Frustration was measured through Reaction to Frustration Scale. All data were spread in an Excel sheet to be analysed with SPSS 16 software using analysis of covariance (ANCOVA). Selected yoga and asanas decreased anxiety and frustration scores but treatment with an integrated yoga module resulted in significant reduction of anxiety and frustration. To conclude, the practice of asanas and yoga decreased anxiety in women, and yoga as an integrated module significantly improved anxiety scores in young women with proven anxiety without any ill effects.
DEFF Research Database (Denmark)
Mørup, Morten; Schmidt, Mikkel N
2012-01-01
Many networks of scientific interest naturally decompose into clusters or communities with comparatively fewer external than internal links; however, current Bayesian models of network communities do not exert this intuitive notion of communities. We formulate a nonparametric Bayesian model...... consistent with ground truth, and on real networks, it outperforms existing approaches in predicting missing links. This suggests that community structure is an important structural property of networks that should be explicitly modeled....... for community detection consistent with an intuitive definition of communities and present a Markov chain Monte Carlo procedure for inferring the community structure. A Matlab toolbox with the proposed inference procedure is available for download. On synthetic and real networks, our model detects communities...
Bayesian Independent Component Analysis
DEFF Research Database (Denmark)
Winther, Ole; Petersen, Kaare Brandt
2007-01-01
In this paper we present an empirical Bayesian framework for independent component analysis. The framework provides estimates of the sources, the mixing matrix and the noise parameters, and is flexible with respect to choice of source prior and the number of sources and sensors. Inside the engine...... in a Matlab toolbox, is demonstrated for non-negative decompositions and compared with non-negative matrix factorization....
Bayesian theory and applications
Dellaportas, Petros; Polson, Nicholas G; Stephens, David A
2013-01-01
The development of hierarchical models and Markov chain Monte Carlo (MCMC) techniques forms one of the most profound advances in Bayesian analysis since the 1970s and provides the basis for advances in virtually all areas of applied and theoretical Bayesian statistics. This volume guides the reader along a statistical journey that begins with the basic structure of Bayesian theory, and then provides details on most of the past and present advances in this field. The book has a unique format. There is an explanatory chapter devoted to each conceptual advance followed by journal-style chapters that provide applications or further advances on the concept. Thus, the volume is both a textbook and a compendium of papers covering a vast range of topics. It is appropriate for a well-informed novice interested in understanding the basic approach, methods and recent applications. Because of its advanced chapters and recent work, it is also appropriate for a more mature reader interested in recent applications and devel...
Bayesian system reliability assessment under fuzzy environments
Energy Technology Data Exchange (ETDEWEB)
Wu, H.-C
2004-03-01
The Bayesian system reliability assessment under fuzzy environments is proposed in this paper. In order to apply the Bayesian approach, the fuzzy parameters are assumed as fuzzy random variables with fuzzy prior distributions. The (conventional) Bayes estimation method will be used to create the fuzzy Bayes point estimator of system reliability by invoking the well-known theorem called 'Resolution Identity' in fuzzy sets theory. On the other hand, we also provide the computational procedures to evaluate the membership degree of any given Bayes point estimate of system reliability. In order to achieve this purpose, we transform the original problem into a nonlinear programming problem. This nonlinear programming problem is then divided into four subproblems for the purpose of simplifying computation. Finally, the subproblems can be solved by using any commercial optimizers, e.g. GAMS or LINGO.
Bayesian network modelling of upper gastrointestinal bleeding
Aisha, Nazziwa; Shohaimi, Shamarina; Adam, Mohd Bakri
2013-09-01
Bayesian networks are graphical probabilistic models that represent causal and other relationships between domain variables. In the context of medical decision making, these models have been explored to help in medical diagnosis and prognosis. In this paper, we discuss the Bayesian network formalism in building medical support systems and we learn a tree augmented naive Bayes Network (TAN) from gastrointestinal bleeding data. The accuracy of the TAN in classifying the source of gastrointestinal bleeding into upper or lower source is obtained. The TAN achieves a high classification accuracy of 86% and an area under curve of 92%. A sensitivity analysis of the model shows relatively high levels of entropy reduction for color of the stool, history of gastrointestinal bleeding, consistency and the ratio of blood urea nitrogen to creatinine. The TAN facilitates the identification of the source of GIB and requires further validation.
Learning Bayesian networks using genetic algorithm
Institute of Scientific and Technical Information of China (English)
Chen Fei; Wang Xiufeng; Rao Yimei
2007-01-01
A new method to evaluate the fitness of the Bayesian networks according to the observed data is provided. The main advantage of this criterion is that it is suitable for both the complete and incomplete cases while the others not.Moreover it facilitates the computation greatly. In order to reduce the search space, the notation of equivalent class proposed by David Chickering is adopted. Instead of using the method directly, the novel criterion, variable ordering, and equivalent class are combined,moreover the proposed mthod avoids some problems caused by the previous one. Later, the genetic algorithm which allows global convergence, lack in the most of the methods searching for Bayesian network is applied to search for a good model in thisspace. To speed up the convergence, the genetic algorithm is combined with the greedy algorithm. Finally, the simulation shows the validity of the proposed approach.
Directory of Open Access Journals (Sweden)
Concepcion Martires
1990-12-01
Full Text Available This study sought to determine the relationship between the conflict management styles of managers and certain management and organization factors. A total of 462 top, middle, and lower managers from 72 companies participated in the study which utilized the Thomas-Killman Conflict Mode Instrument. To facilitate the computation of the statistical data, a microcomputer and a software package was used.The majority of the managers of the 17 types of organization included in the study use collaborative mode of managing conflict. This finding is congruent with the findings of past studies conducted on managers of commercial banks, service, manufacturing, trading advertising, appliance, investment houses, and overseas recruitment industries showing their high degree of objectivity and assertiveness of their own personal goals and of other people's concerns. The second dominant style, which is compromising, indicates their desire in sharing and searching for solutions that result in satisfaction among conflicting parties. This finding is highly consistent with the strong Filipino value of smooth interpersonal relationships (SIR as reflected and discussed in the numerous researches on Filipino values.The chi-square tests generated by the computer package in statistics showed independence between the manager's conflict management styles and each of the variables of sex, civil status, position level at work, work experience, type of corporation, and number of subordinates. This result is again congruent with those of past studies conducted in the Philippines. The past and present findings may imply that conflict management mode may be a highly personal style that is not dependent on any of these variables included in the study. However, the chi-square tests show that management style is dependent on the manager's age and educational attainment.
A DNA-based system for selecting and displaying the combined result of two input variables
DEFF Research Database (Denmark)
Liu, Huajie; Wang, Jianbang; Song, S
2015-01-01
demonstrate this capability in a DNA-based system that takes two input numbers, represented in DNA strands, and returns the result of their multiplication, writing this as a number in a display. Unlike a conventional calculator, this system operates by selecting the result from a library of solutions rather...... than through logic operations. The multiplicative example demonstrated here illustrates a much more general capability—to generate a unique output for any distinct pair of DNA inputs. The system thereby functions as a lookup table and could be a key component in future, more powerful data......-processing systems for diagnostics and sensing....
Genetic variability and selection for laticiferous system characters in Hevea brasiliensis
Directory of Open Access Journals (Sweden)
Paulo de Souza Gonçalves
2005-09-01
Full Text Available Six laticiferous system characters were investigated in 22 three-year-old, half-sib rubber tree [Hevea brasiliensis (Willd. ex Adr. de Juss. Muell.-Arg.] progenies, evaluated at three sites (Votuporanga, Pindorama and Jaú, all in the São Paulo State, Brazil. The traits examined were: average rubber yield (Pp, average bark thickness (Bt, number of latex vessel rings (Lv, average distance between consecutive latex vessel rings (Dc, density of latex vessels per 5 mm per ring averaged over all rings (Dd and the diameter of the latex vessels (Di. The joint analysis showed that site effect and progeny x sites interaction were significant for all traits, except Lv. Estimates of individual heritabilities across the three sites were high for Bt; moderate for Lv, Pp and Dc; low for Dd and very low for Di. Genetic correlations in the joint analysis showed high positive correlations between Pp and the other traits. Selecting the best five progenies would result in genetic gains of 24.91% for Pp while selecting best two plants within a progeny would result in a Pp genetic gain of 30.98%.
Energy Technology Data Exchange (ETDEWEB)
Nitta, Yuriko; Hiramatsu, Yoshifumi; Imanaka, Toshinobu (Osaka Univ., Toyonaka (Japan))
1990-11-01
The effects of starting salts, supports, added amount of Na{sub 2}CO{sub 3}, and other precipitation variables on catalytic properties of supported cobalt catalysts were studied for the hydrogenation of cinnamaldehyde and crotonaldehyde by using TGA, XRD, and XPS. The catalysts prepared from cobalt chloride always exhibited high selectivities to unsaturated alcohols irrespective of the support employed. The amount of surface chlorine remaining after H{sub 2}-reduction of the Co/SiO{sub 2} precursors prepared from cobalt chloride decreased with increasing amount of Na{sub 2}CO{sub 3} added as the precipitant, and both activity and selectivity reached maxima at around Cl/Co = 0.2 in the catalyst surface. The enhanced selectivity of the catalyst prepared from cobalt chloride was explained by the effects of residual chlorine both in the H{sub 2}-reduction stage and in the reaction stage; the former leads to a favorable crystallite size distribution (CDS) of cobalt and the latter depresses the hydrogenation of C{double bond}C double bond. The difference in activities and selectivities of various supported catalysts prepared from cobalt nitrate was discussed based on the difference in the strength of metal-support interaction which leads to different CDSs of cobalt in theses catalysts.
SOMBI: Bayesian identification of parameter relations in unstructured cosmological data
Frank, Philipp; Enßlin, Torsten A
2016-01-01
This work describes the implementation and application of a correlation determination method based on Self Organizing Maps and Bayesian Inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the Self Organizing Map neural network algorithm. Parameter relations are then revealed by means of a Bayesian inference within respective identified data clusters. Specifically such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for respective polynomial coefficients. To decide which polynomial order suffices to describe correlation structures in data, we include a method for model selection, the Bayesian Information Criterion, to the analysis. The performance of the SOMBI algorithm is tested with mock data. As illustration we also provide ...
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
Seldin, Yevgeny; Shawe-Taylor, John; Peters, Jan; Auer, Peter
2011-01-01
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to Hoeffding-Azuma inequality to bound concentration of martingale values. Our second approach is based on integration of Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situation of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many ...
Accurate characterization of weak neutron fields by using a Bayesian approach.
Medkour Ishak-Boushaki, G; Allab, M
2017-04-01
A Bayesian analysis of data derived from neutron spectrometric measurements provides the advantage of determining rigorously integral physical quantities characterizing the neutron field and their respective related uncertainties. The first and essential step in a Bayesian approach is the parameterization of the investigated neutron spectrum. The aim of this paper is to investigate the sensitivity of the Bayesian results, mainly the neutron dose H(*)(10) required for radiation protection purposes and its correlated uncertainty, to the selected neutron spectrum parameterization.
Xu, Yunfei; Dass, Sarat; Maiti, Tapabrata
2016-01-01
This brief introduces a class of problems and models for the prediction of the scalar field of interest from noisy observations collected by mobile sensor networks. It also introduces the problem of optimal coordination of robotic sensors to maximize the prediction quality subject to communication and mobility constraints either in a centralized or distributed manner. To solve such problems, fully Bayesian approaches are adopted, allowing various sources of uncertainties to be integrated into an inferential framework effectively capturing all aspects of variability involved. The fully Bayesian approach also allows the most appropriate values for additional model parameters to be selected automatically by data, and the optimal inference and prediction for the underlying scalar field to be achieved. In particular, spatio-temporal Gaussian process regression is formulated for robotic sensors to fuse multifactorial effects of observations, measurement noise, and prior distributions for obtaining the predictive di...
Joint variable and rank selection for parsimonious estimation of high dimensional matrices
Bunea, Florentina; Wegkamp, Marten
2011-01-01
This article is devoted to optimal dimension reduction methods for sparse, high dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by considering them jointly. For this, we first motivate a new class of sparse multivariate regression models, in which the coefficient matrix has low rank {\\bf and} zero rows or can be well approximated by such a matrix. Then, we introduce estimators that are based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators indeed adapt to the unknown matrix sparsity and have fast rates of convergence. We support our theoretica...
Directory of Open Access Journals (Sweden)
Anna Zbierska
2016-09-01
Full Text Available The paper presents the evaluation of seasonal and long-term changes in selected nutrients of three lakes of the Poznań Lakeland. The lakes were selected due to the high risk of pollution from agricultural and residential areas. Water samples were taken in 6 control points in the spring, summer and autumn, from 2004 to 2014. Trophic status of the lakes was evaluated based on the concentration of nutrients (nitrates, nitrites, ammonium, nitrogen and phosphorus and indicators of eutrophication. Studies have shown that the concentration of nutrients varied greatly both in individual years and seasons of the analyzed decades, especially in Lakes Niepruszewskie and Pamiątkowskie. The main problem is the high concentration of nitrates. In general, it showed an upward trend until 2013, especially in the spring. This may indicate that actions restricting runoff pollution from agricultural sources have not been fully effective. On the other hand, a marked downward trend in the concentrations of NH4 over the years from 2004 to 2014, especially after 2007, indicates a gradual improvement of wastewater management. Moreover, seasonal variation in NH4 concentrations differed from those of NO3 and NO2. The highest values were reported in the autumn season, the lowest in the summer. Concentrations of nutrients and eutrophication indexes reached high values in all analysed lakes, indicating a eutrophic or hypertrophic state of the lakes. The high value of the N:P ratio indicates that the lakes had a huge surplus of nitrogen, and phosphorus is a productivity limiting factor.
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-11-01
Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices.
Yao, Engui
1998-06-01
ATM (Asynchronous Transfer Mode) is an emerging technology in computer networking, which, in turn, is the physical media of information systems and networking/telecommunication systems. The technology provides potentiality for universities to build their networks based on the future vision of uniting voice, data, and video communications on ATM-technology-based equipment. A review of the literature revealed that minimal evidence exists to indicate whether the size, type, financial factors, and information processing maturity of a university affect a university's high-tech innovation adoptions. No research of this nature has been undertaken in the study of ATM adoption in any institutions of higher learning, nor has any research of this nature been found in other organizations, either. Such evidence is needed by university administrators, information systems managers, and LAN managers to understand their universities better, whether they have or have not adopted ATM, and to evaluate their current administrative, academic, and financial situations and current campus networking situations. The purpose of this study was to determine the relationships between ATM adoption and four organizational variables: university size, type, finances, and information processing maturity. Another purpose of the study was to identify the current status of ATM adoption in campus networking in the United States. Logistic regression was used as the statistical data analysis method. The results of the study provided evidence to show that ATM adoption in campus networking is significantly related to university size, university type, university finances, and university information processing maturity.
Spatial and temporal variability of microbes in selected soils at the Nevada Test Site
Energy Technology Data Exchange (ETDEWEB)
Angerer, J.P.; Winkel, V.K.; Ostler, W.K.; Hall, P.F.
1993-12-31
Large areas encompassing almost 800 hectares on the Nevada Test Site, Nellis Air Force Range and the Tonopah Test Range are contaminated with plutonium. Decontamination of plutonium from these sites may involve removal of plants and almost 370,000 cubic meters of soil. The soil may be subjected to a series of processes to remove plutonium. After decontamination, the soils will be returned to the site and revegetated. There is a paucity of information on the spatial and temporal distribution of microbes in soils of the Mojave and Great Basin Deserts. Therefore, this study was initiated to determine the biomass and diversity of microbes in soils prior to decontamination. Soils were collected to a depth of 10 cm along each of five randomly located 30-m transects at each of four sites. To ascertain spatial differences, soils were collected from beneath major shrubs and from associated interspaces. Soils were collected every three to four months to determine temporal (seasonal) differences in microbial parameters. Soils from beneath shrubs generally had greater active fungi and bacteria, and greater non-amended respiration than soils from interspaces. Temporal variability also was found; total and active fungi, and non-amended respiration were correlated with soil moisture at the time of sampling. Information from this study will aid in determining the effects of plutonium decontamination on soil microorganisms, and what measures, if any, will be required to restore microbial populations during revegetation of these sites.
Directory of Open Access Journals (Sweden)
BOROKINI T.I
2014-09-01
Full Text Available The adverse effects of climate variability and extremities on agriculture in Africa have been widely reported. This calls for adaptive strategies in farming so as to reduce vulnerability and ensure food security. This study was therefore conducted to evaluate the awareness of farmers to climate variability and their adaptation strategies in four selected farm settlements in Oyo State, Nigeria. . Structured questionnaires were administered to 120 farmers using a stratified random sampling method. The results showed very high awareness of climate variability among the farmers. However, majority of the farmers acquired their land by lease, while local farm tools are still used by most of the farmers. Sole cropping, mixed cropping and crop rotation were mostly practiced by the farmers. The farmers reported prevalence of crops pests and diseases, flooding, disappearance of bi-modal rainfall, increased temperature and drought in their farmlands, leading to increase in poverty, higher production costs and poor crop harvests as evidences of harsh climatic conditions. Adaptation strategies used by the farmers were changing planting dates, planting new varieties, intercropping and alternative income generating activities. The farmers are encouraged to acquire more efficient farming system and equipment, while they should strongly consider other adaptation strategies such as agricultural insurance, agroforestry, water conservation methods, soil conservation farming, irrigation farming, organic farming and mechanized farming. Furthermore, land tenure policy that could constrain the farmers should be reviewed, while they should be given proper training.
Moghimi, Saba; Schudlo, Larissa; Chau, Tom; Guerguerian, Anne-Marie
2015-01-01
Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using prefrontal cortex near infrared spectroscopy and represented by instantaneous phase values. Based on spatial and temporal characteristics of these observed hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared to the consistency index corresponding to prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes were significantly different for identical versus non-identical musical excerpts when comparing a subset of repetitions. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and the consistency index corresponding to repetitions of the same musical excerpts. This observation suggests the existence of only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes.
Afzali, Hossein Haji Ali; Moss, John R; Mahmood, Mohammad Afzal
2009-05-01
Over the past few decades, there has been an increasing interest in the measurement of hospital efficiency in developing countries and in Iran. While the choice of measurement methods in hospital efficiency assessment has been widely argued in the literature, few authors have offered a framework to specify variables that reflect different hospital functions, the quality of the process of care and the effectiveness of hospital services. However, without the knowledge of hospital objectives and all relevant functions, efficiency studies run the risk of making biased comparisons, particularly against hospitals that provide higher quality services requiring the use of more resources. Undertaking an in-depth investigation regarding the multi-product nature of hospitals, various hospital functions and the values of various stakeholders (patient, staff and community) with a focus on the Iranian public hospitals, this study has proposed a conceptual framework to select the most appropriate variables for measuring hospital efficiency using frontier-based techniques. This paper contributes to hospital efficiency studies by proposing a conceptual framework and incorporating a broader set of variables in Iran. This can enhance the validity of hospital efficiency studies using frontier-based methods in developing countries.
Muposhi, Victor K.; Gandiwa, Edson; Chemura, Abel; Bartels, Paul; Makuza, Stanley M.; Madiri, Tinaapi H.
2016-01-01
An understanding of the habitat selection patterns by wild herbivores is critical for adaptive management, particularly towards ecosystem management and wildlife conservation in semi arid savanna ecosystems. We tested the following predictions: (i) surface water availability, habitat quality and human presence have a strong influence on the spatial distribution of wild herbivores in the dry season, (ii) habitat suitability for large herbivores would be higher compared to medium-sized herbivores in the dry season, and (iii) spatial extent of suitable habitats for wild herbivores will be different between years, i.e., 2006 and 2010, in Matetsi Safari Area, Zimbabwe. MaxEnt modeling was done to determine the habitat suitability of large herbivores and medium-sized herbivores. MaxEnt modeling of habitat suitability for large herbivores using the environmental variables was successful for the selected species in 2006 and 2010, except for elephant (Loxodonta africana) for the year 2010. Overall, large herbivores probability of occurrence was mostly influenced by distance from rivers. Distance from roads influenced much of the variability in the probability of occurrence of medium-sized herbivores. The overall predicted area for large and medium-sized herbivores was not different. Large herbivores may not necessarily utilize larger habitat patches over medium-sized herbivores due to the habitat homogenizing effect of water provisioning. Effect of surface water availability, proximity to riverine ecosystems and roads on habitat suitability of large and medium-sized herbivores in the dry season was highly variable thus could change from one year to another. We recommend adaptive management initiatives aimed at ensuring dynamic water supply in protected areas through temporal closure and or opening of water points to promote heterogeneity of wildlife habitats. PMID:27680673
The influence of selected socio-demographic variables on symptoms occurring during the menopause
Directory of Open Access Journals (Sweden)
Marta Makara-Studzińska
2015-02-01
Full Text Available Introduction: It is considered that the lifestyle conditioned by socio-demographic or socio-economic factors determines the health condition of people to the greatest extent. The aim of this study is to evaluate the influence of selected socio-demographic factors on the kinds of symptoms occurring during menopause. Material and methods : The study group consisted of 210 women aged 45 to 65, not using hormone replacement therapy, staying at healthcare centers for rehabilitation treatment. The study was carried out in 2013-2014 in the Silesian, Podlaskie and Lesser Poland voivodeships. The set of tools consisted of the authors’ own survey questionnaire and the Menopause Rating Scale (MRS. Results : The most commonly occurring symptom in the group of studied women was a depressive mood, from the group of psychological symptoms, followed by physical and mental fatigue, and discomfort connected with muscle and joint pain. The greatest intensity of symptoms was observed in the group of women with the lowest level of education, reporting an average or bad material situation, and unemployed women. Conclusions : An alarmingly high number of reported psychological symptoms in the group of menopausal women was observed, and in particular among the group of low socio-economic status. Career seems to be a factor reducing the risk of occurrence of psychological symptoms. There is an urgent need for health promotion and prophylaxis in the group of menopausal women, and in many cases for implementation of specialist psychological assistance.
Stamford, B A; Weltman, A; Moffatt, R J; Fulco, C
1978-01-01
Physical performance and body composition characteristics of members (n = 75) and recruits (n = 61) of the Louisville Police Department (total n = 136) were assessed. Members were randomly selected males and ranged in age from 20 to 55 years and were ranked from the newest inductee through and including the Chief of Police. Members between the ages of 20 and 29 years assigned to active duty possessed average cardio-respiratory fitness (Vo2max). With age, cardio-respiratory fitness decreased and body weight and body fatness progressively increased. Male and female recruits entering basic training also demonstrated average cardio-respiratory fitness. Significant (P less than .05) increases for males and females in Vo2max and decreases in body fatness (males) were found following 4 months of physically rigorous recruit training. Fifteen of the male recruits who completed training were retested following 1 year of active duty. During active duty, physical activity involvement was limited to job requirements with no additional physical training imposed. Cardio-respiratory fitness and body fatness reverted to pre-training levels. It was concluded that the physical demands associated with police work are too low to permit maintenance of physical fitness.
Measure Transformer Semantics for Bayesian Machine Learning
Borgström, Johannes; Gordon, Andrew D.; Greenberg, Michael; Margetson, James; van Gael, Jurgen
The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
Bayesian networks for maritime traffic accident prevention: benefits and challenges.
Hänninen, Maria
2014-12-01
Bayesian networks are quantitative modeling tools whose applications to the maritime traffic safety context are becoming more popular. This paper discusses the utilization of Bayesian networks in maritime safety modeling. Based on literature and the author's own experiences, the paper studies what Bayesian networks can offer to maritime accident prevention and safety modeling and discusses a few challenges in their application to this context. It is argued that the capability of representing rather complex, not necessarily causal but uncertain relationships makes Bayesian networks an attractive modeling tool for the maritime safety and accidents. Furthermore, as the maritime accident and safety data is still rather scarce and has some quality problems, the possibility to combine data with expert knowledge and the easy way of updating the model after acquiring more evidence further enhance their feasibility. However, eliciting the probabilities from the maritime experts might be challenging and the model validation can be tricky. It is concluded that with the utilization of several data sources, Bayesian updating, dynamic modeling, and hidden nodes for latent variables, Bayesian networks are rather well-suited tools for the maritime safety management and decision-making.
Technical note: Bayesian calibration of dynamic ruminant nutrition models.
Reed, K F; Arhonditsis, G B; France, J; Kebreab, E
2016-08-01
Mechanistic models of ruminant digestion and metabolism have advanced our understanding of the processes underlying ruminant animal physiology. Deterministic modeling practices ignore the inherent variation within and among individual animals and thus have no way to assess how sources of error influence model outputs. We introduce Bayesian calibration of mathematical models to address the need for robust mechanistic modeling tools that can accommodate error analysis by remaining within the bounds of data-based parameter estimation. For the purpose of prediction, the Bayesian approach generates a posterior predictive distribution that represents the current estimate of the value of the response variable, taking into account both the uncertainty about the parameters and model residual variability. Predictions are expressed as probability distributions, thereby conveying significantly more information than point estimates in regard to uncertainty. Our study illustrates some of the technical advantages of Bayesian calibration and discusses the future perspectives in the context of animal nutrition modeling.
Prediction of road accidents: A Bayesian hierarchical approach
DEFF Research Database (Denmark)
Deublein, Markus; Schubert, Matthias; Adey, Bryan T.;
2013-01-01
In this paper a novel methodology for the prediction of the occurrence of road accidents is presented. The methodology utilizes a combination of three statistical methods: (1) gamma-updating of the occurrence rates of injury accidents and injured road users, (2) hierarchical multivariate Poisson......-lognormal regression analysis taking into account correlations amongst multiple dependent model response variables and effects of discrete accident count data e.g. over-dispersion, and (3) Bayesian inference algorithms, which are applied by means of data mining techniques supported by Bayesian Probabilistic Networks...... in order to represent non-linearity between risk indicating and model response variables, as well as different types of uncertainties which might be present in the development of the specific models.Prior Bayesian Probabilistic Networks are first established by means of multivariate regression analysis...
Giuliani, Claudia; Lazzaro, Lorenzo; Calamassi, Roberto; Calamai, Luca; Romoli, Riccardo; Fico, Gelsomina; Foggi, Bruno; Mariotti Lippi, Marta
2016-10-01
The species of Helichrysum sect. Stoechadina (Asteraceae) are well-known for their secondary metabolite content and the characteristic aromatic bouquets. In the wild, populations exhibit a wide phenotypic plasticity which makes critical the circumscription of species and infraspecific ranks. Previous investigations on Helichrysum italicum complex focused on a possible phytochemical typification based on hydrodistilled essential oils. Aims of this paper are three-fold: (i) characterizing the volatile profiles of different populations, testing (ii) how these profiles vary across populations and (iii) how the phytochemical diversity may contribute in solving taxonomic problems. Nine selected Helichrysum populations, included within the H. italicum complex, Helichrysum litoreum and Helichrysum stoechas, were investigated. H. stoechas was chosen as outgroup for validating the method. After collection in the wild, plants were cultivated in standard growing conditions for over one year. Annual leafy shoots were screened in the post-blooming period for the emissions of volatile organic compounds (VOCs) by means of headspace solid phase microextraction coupled with gas-chromatography and mass spectrometry (HS-SPME-GC/MS). The VOC composition analysis revealed the production of overall 386 different compounds, with terpenes being the most represented compound class. Statistical data processing allowed the identification of the indicator compounds that differentiate the single populations, revealing the influence of the geographical provenance area in determining the volatile profiles. These results suggested the potential use of VOCs as valuable diacritical characters in discriminating the Helichrysum populations. In addition, the cross-validation analysis hinted the potentiality of this volatolomic study in the discrimination of the Helichrysum species and subspecies, highlighting a general congruence with the current taxonomic treatment of the genus. The consistency
Li, Hong-Dong; Xu, Qing-Song; Liang, Yi-Zeng
2012-08-31
The identification of disease-relevant genes represents a challenge in microarray-based disease diagnosis where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven to be quite promising for variable selection. However, the design and application of an RJMCMC algorithm requires, for example, special criteria for prior distributions. Also, the simulation from joint posterior distributions of models is computationally extensive, and may even be mathematically intractable. These disadvantages may limit the applications of RJMCMC algorithms. Therefore, the development of algorithms that possess the advantages of RJMCMC methods and are also efficient and easy to follow for selecting disease-associated genes is required. Here we report a RJMCMC-like method, called random frog that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and the estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top 2 ranked genes for colon and estrogen are Z50753, U00968, and Y10871_at, Z22536_at, respectively. (The source codes with GNU General Public License Version 2.0 are freely available to non-commercial users at: http://code.google.com/p/randomfrog/.).
DEFF Research Database (Denmark)
Hartelius, Karsten; Carstensen, Jens Michael
2003-01-01
A method for locating distorted grid structures in images is presented. The method is based on the theories of template matching and Bayesian image restoration. The grid is modeled as a deformable template. Prior knowledge of the grid is described through a Markov random field (MRF) model which...... nodes and the arc prior models variations in row and column spacing across the grid. Grid matching is done by placing an initial rough grid over the image and applying an ensemble annealing scheme to maximize the posterior distribution of the grid. The method can be applied to noisy images with missing...
Congdon, Peter
2014-01-01
This book provides an accessible approach to Bayesian computing and data analysis, with an emphasis on the interpretation of real data sets. Following in the tradition of the successful first edition, this book aims to make a wide range of statistical modeling applications accessible using tested code that can be readily adapted to the reader's own applications. The second edition has been thoroughly reworked and updated to take account of advances in the field. A new set of worked examples is included. The novel aspect of the first edition was the coverage of statistical modeling using WinBU
Singh, Amritpal; Saini, Barjinder Singh; Singh, Dilbag
2016-06-01
Multiscale approximate entropy (MAE) is used to quantify the complexity of a time series as a function of time scale τ. Approximate entropy (ApEn) tolerance threshold selection 'r' is based on either: (1) arbitrary selection in the recommended range (0.1-0.25) times standard deviation of time series (2) or finding maximum ApEn (ApEnmax) i.e., the point where self-matches start to prevail over other matches and choosing the corresponding 'r' (rmax) as threshold (3) or computing rchon by empirically finding the relation between rmax, SD1/SD2 ratio and N using curve fitting, where, SD1 and SD2 are short-term and long-term variability of a time series respectively. None of these methods is gold standard for selection of 'r'. In our previous study [1], an adaptive procedure for selection of 'r' is proposed for approximate entropy (ApEn). In this paper, this is extended to multiple time scales using MAEbin and multiscale cross-MAEbin (XMAEbin). We applied this to simulations i.e. 50 realizations (n = 50) of random number series, fractional Brownian motion (fBm) and MIX (P) [1] series of data length of N = 300 and short term recordings of HRV and SBPV performed under postural stress from supine to standing. MAEbin and XMAEbin analysis was performed on laboratory recorded data of 50 healthy young subjects experiencing postural stress from supine to upright. The study showed that (i) ApEnbin of HRV is more than SBPV in supine position but is lower than SBPV in upright position (ii) ApEnbin of HRV decreases from supine i.e. 1.7324 ± 0.112 (mean ± SD) to upright 1.4916 ± 0.108 due to vagal inhibition (iii) ApEnbin of SBPV increases from supine i.e. 1.5535 ± 0.098 to upright i.e. 1.6241 ± 0.101 due sympathetic activation (iv) individual and cross complexities of RRi and systolic blood pressure (SBP) series depend on time scale under consideration (v) XMAEbin calculated using ApEnmax is correlated with cross-MAE calculated using ApEn (0.1-0.26) in steps of 0
Classification using Bayesian neural nets
J.C. Bioch (Cor); O. van der Meer; R. Potharst (Rob)
1995-01-01
textabstractRecently, Bayesian methods have been proposed for neural networks to solve regression and classification problems. These methods claim to overcome some difficulties encountered in the standard approach such as overfitting. However, an implementation of the full Bayesian approach to neura
Bayesian Intersubjectivity and Quantum Theory
Pérez-Suárez, Marcos; Santos, David J.
2005-02-01
Two of the major approaches to probability, namely, frequentism and (subjectivistic) Bayesian theory, are discussed, together with the replacement of frequentist objectivity for Bayesian intersubjectivity. This discussion is then expanded to Quantum Theory, as quantum states and operations can be seen as structural elements of a subjective nature.
Bayesian Approach for Inconsistent Information.
Stein, M; Beer, M; Kreinovich, V
2013-10-01
In engineering situations, we usually have a large amount of prior knowledge that needs to be taken into account when processing data. Traditionally, the Bayesian approach is used to process data in the presence of prior knowledge. Sometimes, when we apply the traditional Bayesian techniques to engineering data, we get inconsistencies between the data and prior knowledge. These inconsistencies are usually caused by the fact that in the traditional approach, we assume that we know the exact sample values, that the prior distribution is exactly known, etc. In reality, the data is imprecise due to measurement errors, the prior knowledge is only approximately known, etc. So, a natural way to deal with the seemingly inconsistent information is to take this imprecision into account in the Bayesian approach - e.g., by using fuzzy techniques. In this paper, we describe several possible scenarios for fuzzifying the Bayesian approach. Particular attention is paid to the interaction between the estimated imprecise parameters. In this paper, to implement the corresponding fuzzy versions of the Bayesian formulas, we use straightforward computations of the related expression - which makes our computations reasonably time-consuming. Computations in the traditional (non-fuzzy) Bayesian approach are much faster - because they use algorithmically efficient reformulations of the Bayesian formulas. We expect that similar reformulations of the fuzzy Bayesian formulas will also drastically decrease the computation time and thus, enhance the practical use of the proposed methods.
Inference in hybrid Bayesian networks
DEFF Research Database (Denmark)
Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael
2009-01-01
Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
bias of A1DE with minimal computational overhead. We here generalize that strategy to MI-weighted AnDE, using ws = MI(S, Y ), MI(s, Y ) = ∑ y∈Y ∑ xs...Operations and Reports (0704-0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any...efficient in learning the model. WNANJE can model higher-order attribute interdependencies. 15. SUBJECT TERMS Big data, Low bias
Bayesian Inference on Gravitational Waves
Directory of Open Access Journals (Sweden)
Asad Ali
2015-12-01
Full Text Available The Bayesian approach is increasingly becoming popular among the astrophysics data analysis communities. However, the Pakistan statistics communities are unaware of this fertile interaction between the two disciplines. Bayesian methods have been in use to address astronomical problems since the very birth of the Bayes probability in eighteenth century. Today the Bayesian methods for the detection and parameter estimation of gravitational waves have solid theoretical grounds with a strong promise for the realistic applications. This article aims to introduce the Pakistan statistics communities to the applications of Bayesian Monte Carlo methods in the analysis of gravitational wave data with an overview of the Bayesian signal detection and estimation methods and demonstration by a couple of simplified examples.
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
A study of finite mixture model: Bayesian approach on financial time series data
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-07-01
Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.
DEFF Research Database (Denmark)
Najjar, Mohammad; Moeini, Amirhossein; Dowlatabadi, Mohammadkazem Bakhshizadeh
2016-01-01
In this paper, the power quality standards such as IEC 61000-3-6, IEC 61000-2-12, EN 50160, and CIGRE WG 36-05 are fulfilled for single- and three-phase medium voltage applications by using Selective Harmonic Mitigation-PWM (SHM-PWM) in a Cascaded H-Bridge (CHB) converter. Furthermore, the ER G5....../4 power quality standard, which has the strictest grid codes at medium voltage level, will also be met in the whole range of the modulation indices. In order to achieve this goal, symmetrical CHB with variable DC Links are employed, while the converter has a low number of switching transitions. In other...
Borsboom, D.; Haig, B.D.
2013-01-01
Unlike most other statistical frameworks, Bayesian statistical inference is wedded to a particular approach in the philosophy of science (see Howson & Urbach, 2006); this approach is called Bayesianism. Rather than being concerned with model fitting, this position in the philosophy of science primar
A Bayesian Approach for Analyzing Longitudinal Structural Equation Models
Song, Xin-Yuan; Lu, Zhao-Hua; Hser, Yih-Ing; Lee, Sik-Yum
2011-01-01
This article considers a Bayesian approach for analyzing a longitudinal 2-level nonlinear structural equation model with covariates, and mixed continuous and ordered categorical variables. The first-level model is formulated for measures taken at each time point nested within individuals for investigating their characteristics that are dynamically…
Bayesian model discrimination for glucose-insulin homeostasis
DEFF Research Database (Denmark)
Andersen, Kim Emil; Brooks, Stephen P.; Højbjerre, Malene
the reformulation of existing deterministic models as stochastic state space models which properly accounts for both measurement and process variability. The analysis is further enhanced by Bayesian model discrimination techniques and model averaged parameter estimation which fully accounts for model as well...
Martin, B.A.; Saiki, M.K.
2005-01-01
We assessed the relation between abundance of desert pupfish, Cyprinodon macularius, and selected biological and physicochemical variables in natural and manmade habitats within the Salton Sea Basin. Field sampling in a natural tributary, Salt Creek, and three agricultural drains captured eight species including pupfish (1.1% of the total catch), the only native species encountered. According to Bray-Curtis resemblance functions, fish species assemblages differed mostly between Salt Creek and the drains (i.e., the three drains had relatively similar species assemblages). Pupfish numbers and environmental variables varied among sites and sample periods. Canonical correlation showed that pupfish abundance was positively correlated with abundance of western mosquitofish, Gambusia affinis, and negatively correlated with abundance of porthole livebearers, Poeciliopsis gracilis, tilapias (Sarotherodon mossambica and Tilapia zillii), longjaw mudsuckers, Gillichthys mirabilis, and mollies (Poecilia latipinnaandPoecilia mexicana). In addition, pupfish abundance was positively correlated with cover, pH, and salinity, and negatively correlated with sediment factor (a measure of sediment grain size) and dissolved oxygen. Pupfish abundance was generally highest in habitats where water quality extremes (especially high pH and salinity, and low dissolved oxygen) seemingly limited the occurrence of nonnative fishes. This study also documented evidence of predation by mudsuckers on pupfish. These findings support the contention of many resource managers that pupfish populations are adversely influenced by ecological interactions with nonnative fishes. ?? Springer 2005.
Directory of Open Access Journals (Sweden)
Saba Moghimi
Full Text Available Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using prefrontal cortex near infrared spectroscopy and represented by instantaneous phase values. Based on spatial and temporal characteristics of these observed hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared to the consistency index corresponding to prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes were significantly different for identical versus non-identical musical excerpts when comparing a subset of repetitions. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and the consistency index corresponding to repetitions of the same musical excerpts. This observation suggests the existence of only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes.
Bayesian Integration of multiscale environmental data
Energy Technology Data Exchange (ETDEWEB)
2016-08-22
The software is designed for efficiently integrating large-size of multi-scale environmental data using the Bayesian framework. Suppose we need to estimate the spatial distribution of variable X with high spatial resolution. The available data include (1) direct measurements Z of the unknowns with high resolution in a subset of the spatial domain (small spatial coverage), (2) measurements C of the unknowns at the median scale, and (3) measurements A of the unknowns at the coarsest scale but with large spatial coverage. The goal is to estimate the unknowns at the fine grids by conditioning to all the available data. We first consider all the unknowns as random variables and estimate conditional probability distribution of those variables by conditioning to the limited high-resolution observations (Z). We then treat the estimated probability distribution as the prior distribution. Within the Bayesian framework, we combine the median and large-scale measurements (C and A) through likelihood functions. Since we assume that all the relevant multivariate distributions are Gaussian, the resulting posterior distribution is a multivariate Gaussian distribution. The developed software provides numerical solutions of the posterior probability distribution. The software can be extended in several different ways to solve more general multi-scale data integration problems.
Mapping malaria risk in Bangladesh using Bayesian geostatistical models.
Reid, Heidi; Haque, Ubydul; Clements, Archie C A; Tatem, Andrew J; Vallely, Andrew; Ahmed, Syed Masud; Islam, Akramul; Haque, Rashidul
2010-10-01
Background malaria-control programs are increasingly dependent on accurate risk maps to effectively guide the allocation of interventions and resources. Advances in model-based geostatistics and geographical information systems (GIS) have enabled researchers to better understand factors affecting malaria transmission and thus, more accurately determine the limits of malaria transmission globally and nationally. Here, we construct Plasmodium falciparum risk maps for Bangladesh for 2007 at a scale enabling the malaria-control bodies to more accurately define the needs of the program. A comprehensive malaria-prevalence survey (N = 9,750 individuals; N = 354 communities) was carried out in 2007 across the regions of Bangladesh known to be endemic for malaria. Data were corrected to a standard age range of 2 to less than 10 years. Bayesian geostatistical logistic regression models with environmental covariates were used to predict P. falciparum prevalence for 2- to 10-year-old children (PfPR(2-10)) across the endemic areas of Bangladesh. The predictions were combined with gridded population data to estimate the number of individuals living in different endemicity classes. Across the endemic areas, the average PfPR(2-10) was 3.8%. Environmental variables selected for prediction were vegetation cover, minimum temperature, and elevation. Model validation statistics revealed that the final Bayesian geostatistical model had good predictive ability. Risk maps generated from the model showed a heterogeneous distribution of PfPR(2-10) ranging from 0.5% to 50%; 3.1 million people were estimated to be living in areas with a PfPR(2-10) greater than 1%. Contemporary GIS and model-based geostatistics can be used to interpolate malaria risk in Bangladesh. Importantly, malaria risk was found to be highly varied across the endemic regions, necessitating the targeting of resources to reduce the burden in these areas.
Road network safety evaluation using Bayesian hierarchical joint model.
Wang, Jie; Huang, Helai
2016-05-01
Safety and efficiency are commonly regarded as two significant performance indicators of transportation systems. In practice, road network planning has focused on road capacity and transport efficiency whereas the safety level of a road network has received little attention in the planning stage. This study develops a Bayesian hierarchical joint model for road network safety evaluation to help planners take traffic safety into account when planning a road network. The proposed model establishes relationships between road network risk and micro-level variables related to road entities and traffic volume, as well as socioeconomic, trip generation and network density variables at macro level which are generally used for long term transportation plans. In addition, network spatial correlation between intersections and their connected road segments is also considered in the model. A road network is elaborately selected in order to compare the proposed hierarchical joint model with a previous joint model and a negative binomial model. According to the results of the model comparison, the hierarchical joint model outperforms the joint model and negative binomial model in terms of the goodness-of-fit and predictive performance, which indicates the reasonableness of considering the hierarchical data structure in crash prediction and analysis. Moreover, both random effects at the TAZ level and the spatial correlation between intersections and their adjacent segments are found to be significant, supporting the employment of the hierarchical joint model as an alternative in road-network-level safety modeling as well.
Bayesian Recurrent Neural Network for Language Modeling.
Chien, Jen-Tzung; Ku, Yuan-Chu
2016-02-01
A language model (LM) is calculated as the probability of a word sequence that provides the solution to word prediction for a variety of information systems. A recurrent neural network (RNN) is powerful to learn the large-span dynamics of a word sequence in the continuous space. However, the training of the RNN-LM is an ill-posed problem because of too many parameters from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularize the RNN-LM and apply it for continuous speech recognition. We aim to penalize the too complicated RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in a Bayesian classification network is formed as the regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter by maximizing the marginal likelihood. A rapid approximation to a Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer-products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance by applying the rapid BRNN-LM under different conditions.
An Integrated Procedure for Bayesian Reliability Inference Using MCMC
Directory of Open Access Journals (Sweden)
Jing Lin
2014-01-01
Full Text Available The recent proliferation of Markov chain Monte Carlo (MCMC approaches has led to the use of the Bayesian inference in a wide variety of fields. To facilitate MCMC applications, this paper proposes an integrated procedure for Bayesian inference using MCMC methods, from a reliability perspective. The goal is to build a framework for related academic research and engineering applications to implement modern computational-based Bayesian approaches, especially for reliability inferences. The procedure developed here is a continuous improvement process with four stages (Plan, Do, Study, and Action and 11 steps, including: (1 data preparation; (2 prior inspection and integration; (3 prior selection; (4 model selection; (5 posterior sampling; (6 MCMC convergence diagnostic; (7 Monte Carlo error diagnostic; (8 model improvement; (9 model comparison; (10 inference making; (11 data updating and inference improvement. The paper illustrates the proposed procedure using a case study.
Book review: Bayesian analysis for population ecology
Link, William A.
2011-01-01
Brian Dennis described the field of ecology as “fertile, uncolonized ground for Bayesian ideas.” He continued: “The Bayesian propagule has arrived at the shore. Ecologists need to think long and hard about the consequences of a Bayesian ecology. The Bayesian outlook is a successful competitor, but is it a weed? I think so.” (Dennis 2004)
Lü, Chengxu; Chen, Longjian; Yang, Zengling; Liu, Xian; Han, Lujia
2014-01-01
This article presents a novel method for combining auto-peak and cross-peak information for sensitive variable selection in synchronous two-dimensional correlation spectroscopy (2D-COS). This variable selection method is then applied to the case of near-infrared (NIR) microscopy discrimination of meat and bone meal (MBM). This is of important practical value because MBM is currently banned in ruminate animal compound feed. For the 2D-COS analysis, a set of NIR spectroscopy data of compound feed samples (adulterated with varying concentrations of MBM) was pretreated using standard normal variate and detrending (SNVD) and then mapped to the 2D-COS synchronous matrix. For the auto-peak analysis, 12 main sensitive variables were identified at 6852, 6388, 6320, 5788, 5600, 5244, 4900, 4768, 4572, 4336, 4256, and 4192 cm(-1). All these variables were assigned their specific spectral structure and chemical component. For the cross-peak analysis, these variables were divided into two groups, each group containing the six sensitive variables. This grouping resulted in a correlation between the spectral variables that was in accordance with the chemical-component content of the MBM and compound feed. These sensitive variables were then used to build a NIR microscopy discrimination model, which yielded a 97% correct classification. Moreover, this method detected the presence of MBM when its concentration was less than 1% in an adulterated compound feed sample. The concentration-dependent 2D-COS-based variable selection method developed in this study has the unique advantages of (1) introducing an interpretive aspect into variable selection, (2) substantially reducing the complexity of the computations, (3) enabling the transferability of the results to discriminant analysis, and (4) enabling the efficient compression of spectral data.
Institute of Scientific and Technical Information of China (English)
沈旭慧; 王珍
2011-01-01
BACKGROUND: The comparative study of groups design data is to compare the difference of response variable measurements on two or more groups of respondents. The Meta-analysis on the studies of this kind of design information is theoretically comparative maturity and consummate, but the researchers and system evaluators still face many difficulties during the Meta-analysis of groups design data. The Meta-analysis on group comparison study requires selecting carefully different effect sizes based on the structure of response variable.OBJECTIVE: To explore the effect size selection and announcements of response variable of different structure in Meta-analysis of groups design data.METHODS: Articles related to Meta-analysis or systematic review methodology literatu re about group comparison studies for the continuous, dichotomous, and combined outcomes in CNKI database, VIP database, Wanfang Chinese Doctoral database (1990/2009), and Pubmed database (1979/2009) were retrieved by computer. Outdated and repetitive researches were excluded .RESULTS AND CONCLUSION: Totally 30 literatu res were involved for summarization according to inclusion criteria. It is relatively common that outcome variables were presented in continuous or dichotomous forms in the research literature, in some situations, it may be more applicable, but when continuous outcome variables were converted to percentage, the determination of cut points may be too arbitrary after converted to percentage analysis, and some information were lost after the treatment of continuous variable dichotomy. It is still facing a huge methodological challenge for meta-analysis. In this review, we introduced a approach based on several treatments commonly used, namely, bayesian reconstruction method, which is based on bayesian model class (Hierarchical Bayesian Models) will overcome the shortcomings of arbitrary determination of cut points and information default.%背景:成组设计资料的比较研究,比较的是两
Ortega, Pedro A
2011-01-01
Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans use strong prior knowledge; and rather than encoding hard causal relationships, they encode beliefs over causal structures, allowing for sound generalization from the observations they obtain from directly acting in the world. In this work we propose a Bayesian approach to causal induction which allows modeling beliefs over multiple causal hypotheses and predicting the behavior of the world under causal interventions. We then illustrate how this method extracts causal information from data containing interventions and observations.
Blundell, Charles; Heller, Katherine A
2012-01-01
Hierarchical structure is ubiquitous in data across many domains. There are many hier- archical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these meth- ods limit discoverable hierarchies to those with binary branching structure. This lim- itation, while computationally convenient, is often undesirable. In this paper we ex- plore a Bayesian hierarchical clustering algo- rithm that can produce trees with arbitrary branching structure at each node, known as rose trees. We interpret these trees as mixtures over partitions of a data set, and use a computationally efficient, greedy ag- glomerative algorithm to find the rose trees which have high marginal likelihood given the data. Lastly, we perform experiments which demonstrate that rose trees are better models of data than the typical binary trees returned by other hierarchical clustering algorithms.
Andrade, Daniel
2012-01-01
We present a new method to propagate lower bounds on conditional probability distributions in conventional Bayesian networks. Our method guarantees to provide outer approximations of the exact lower bounds. A key advantage is that we can use any available algorithms and tools for Bayesian networks in order to represent and infer lower bounds. This new method yields results that are provable exact for trees with binary variables, and results which are competitive to existing approximations in credal networks for all other network structures. Our method is not limited to a specific kind of network structure. Basically, it is also not restricted to a specific kind of inference, but we restrict our analysis to prognostic inference in this article. The computational complexity is superior to that of other existing approaches.
Bayesian Unsupervised Learning of DNA Regulatory Binding Regions
Directory of Open Access Journals (Sweden)
Jukka Corander
2009-01-01
positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that targets to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which equals a variable length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy combined with recently introduced nonstandard stochastic computation tools yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.
Bayesian Inference for Functional Dynamics Exploring in fMRI Data.
Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing
2016-01-01
This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
Bayesian Inference for Functional Dynamics Exploring in fMRI Data
Directory of Open Access Journals (Sweden)
Xuan Guo
2016-01-01
Full Text Available This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM, Bayesian Connectivity Change Point Model (BCCPM, and Dynamic Bayesian Variable Partition Model (DBVPM, and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
Current trends in Bayesian methodology with applications
Upadhyay, Satyanshu K; Dey, Dipak K; Loganathan, Appaia
2015-01-01
Collecting Bayesian material scattered throughout the literature, Current Trends in Bayesian Methodology with Applications examines the latest methodological and applied aspects of Bayesian statistics. The book covers biostatistics, econometrics, reliability and risk analysis, spatial statistics, image analysis, shape analysis, Bayesian computation, clustering, uncertainty assessment, high-energy astrophysics, neural networking, fuzzy information, objective Bayesian methodologies, empirical Bayes methods, small area estimation, and many more topics.Each chapter is self-contained and focuses on
Multisnapshot Sparse Bayesian Learning for DOA
Gerstoft, Peter; Mecklenbrauker, Christoph F.; Xenaki, Angeliki; Nannuru, Santosh
2016-10-01
The directions of arrival (DOA) of plane waves are estimated from multi-snapshot sensor array data using Sparse Bayesian Learning (SBL). The prior source amplitudes is assumed independent zero-mean complex Gaussian distributed with hyperparameters the unknown variances (i.e. the source powers). For a complex Gaussian likelihood with hyperparameter the unknown noise variance, the corresponding Gaussian posterior distribution is derived. For a given number of DOAs, the hyperparameters are automatically selected by maximizing the evidence and promote sparse DOA estimates. The SBL scheme for DOA estimation is discussed and evaluated competitively against LASSO ($\\ell_1$-regularization), conventional beamforming, and MUSIC
A Bayesian Framework for Combining Valuation Estimates
Yee, Kenton K
2007-01-01
Obtaining more accurate equity value estimates is the starting point for stock selection, value-based indexing in a noisy market, and beating benchmark indices through tactical style rotation. Unfortunately, discounted cash flow, method of comparables, and fundamental analysis typically yield discrepant valuation estimates. Moreover, the valuation estimates typically disagree with market price. Can one form a superior valuation estimate by averaging over the individual estimates, including market price? This article suggests a Bayesian framework for combining two or more estimates into a superior valuation estimate. The framework justifies the common practice of averaging over several estimates to arrive at a final point estimate.
A COMPOUND POISSON MODEL FOR LEARNING DISCRETE BAYESIAN NETWORKS
Institute of Scientific and Technical Information of China (English)
Abdelaziz GHRIBI; Afif MASMOUDI
2013-01-01
We introduce here the concept of Bayesian networks, in compound Poisson model, which provides a graphical modeling framework that encodes the joint probability distribution for a set of random variables within a directed acyclic graph. We suggest an approach proposal which offers a new mixed implicit estimator. We show that the implicit approach applied in compound Poisson model is very attractive for its ability to understand data and does not require any prior information. A comparative study between learned estimates given by implicit and by standard Bayesian approaches is established. Under some conditions and based on minimal squared error calculations, we show that the mixed implicit estimator is better than the standard Bayesian and the maximum likelihood estimators. We illustrate our approach by considering a simulation study in the context of mobile communication networks.
Irregular-Time Bayesian Networks
Ramati, Michael
2012-01-01
In many fields observations are performed irregularly along time, due to either measurement limitations or lack of a constant immanent rate. While discrete-time Markov models (as Dynamic Bayesian Networks) introduce either inefficient computation or an information loss to reasoning about such processes, continuous-time Markov models assume either a discrete state space (as Continuous-Time Bayesian Networks), or a flat continuous state space (as stochastic dif- ferential equations). To address these problems, we present a new modeling class called Irregular-Time Bayesian Networks (ITBNs), generalizing Dynamic Bayesian Networks, allowing substantially more compact representations, and increasing the expressivity of the temporal dynamics. In addition, a globally optimal solution is guaranteed when learning temporal systems, provided that they are fully observed at the same irregularly spaced time-points, and a semiparametric subclass of ITBNs is introduced to allow further adaptation to the irregular nature of t...
Profile-Based LC-MS data alignment--a Bayesian approach.
Tsai, Tsung-Heng; Tadesse, Mahlet G; Wang, Yue; Ressom, Habtom W
2013-01-01
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets.