Bayesian variable selection in regression
Energy Technology Data Exchange (ETDEWEB)
Mitchell, T.J.; Beauchamp, J.J.
1987-01-01
This paper is concerned with the selection of subsets of "predictor" variables in a linear regression model for the prediction of a "dependent" variable. We take a Bayesian approach and assign a probability distribution to the dependent variable through a specification of prior distributions for the unknown parameters in the regression model. The appropriate posterior probabilities are derived for each submodel and methods are proposed for evaluating the family of prior distributions. Examples are given that show the application of the Bayesian methodology. 23 refs., 3 figs.
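As a rough illustration of what "posterior probabilities for each submodel" can look like in practice, the sketch below enumerates every subset of predictors and scores it with the closed-form Bayes factor that arises under Zellner's g-prior. This is an assumed stand-in chosen for its simple formula, not the prior family studied in the paper; the function name `submodel_posteriors`, the default `g = n`, and the uniform prior over submodels are all illustrative choices.

```python
import itertools
import numpy as np

def submodel_posteriors(X, y, g=None):
    """Posterior probability of every predictor subset (hypothetical helper).

    Assumes Zellner's g-prior on the coefficients and a uniform prior over
    submodels; enumeration is only feasible for a small number of predictors.
    """
    n, p = X.shape
    g = float(n) if g is None else float(g)  # unit-information choice (assumption)
    yc = y - y.mean()                        # centering absorbs the intercept
    Xc = X - X.mean(axis=0)
    tss = yc @ yc
    models, log_bf = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            if k == 0:
                r2 = 0.0
            else:
                Xs = Xc[:, list(subset)]
                beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
                resid = yc - Xs @ beta
                r2 = 1.0 - (resid @ resid) / tss
            # log Bayes factor of this submodel against the intercept-only model
            lb = 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))
            models.append(subset)
            log_bf.append(lb)
    log_bf = np.array(log_bf)
    weights = np.exp(log_bf - log_bf.max())  # subtract the max for numerical stability
    probs = weights / weights.sum()
    return dict(zip(models, probs))
```

The returned dictionary maps each subset (a tuple of column indices) to its posterior probability. With more than about 20 predictors the enumeration becomes impractical, which is one reason stochastic search methods appear in several of the records listed here.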
Bayesian Group Bridge for Bi-level Variable Selection.
Mallick, Himel; Yi, Nengjun
2017-06-01
A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
Bayesian Variable Selection on Model Spaces Constrained by Heredity Conditions.
Taylor-Rodriguez, Daniel; Womack, Andrew; Bliznyuk, Nikolay
2016-01-01
This paper investigates Bayesian variable selection when there is a hierarchical dependence structure on the inclusion of predictors in the model. In particular, we study the type of dependence found in polynomial response surfaces of orders two and higher, whose model spaces are required to satisfy weak or strong heredity conditions. These conditions restrict the inclusion of higher-order terms depending upon the inclusion of lower-order parent terms. We develop classes of priors on the model space, investigate their theoretical and finite sample properties, and provide a Metropolis-Hastings algorithm for searching the space of models. The tools proposed allow fast and thorough exploration of model spaces that account for hierarchical polynomial structure in the predictors and provide control of the inclusion of false positives in high posterior probability models.
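The heredity conditions mentioned above can be made concrete with a small check. In this sketch a polynomial term is represented as a tuple of variable indices (so `(1, 2)` stands for x1*x2 and `(1, 1)` for x1 squared), and a term's parents are the terms obtained by dropping one factor. Both the representation and the function names are assumptions made for illustration, not the authors' code.

```python
def parents(term):
    """Immediate lower-order parents of a term: drop one factor at a time."""
    return {term[:i] + term[i + 1:] for i in range(len(term))} - {()}

def satisfies_heredity(model, strong=True):
    """Check whether a set of terms obeys strong or weak heredity.

    Strong heredity: every parent of each included term is also included.
    Weak heredity: at least one parent of each included term is included.
    Main effects have no parents here (the intercept is treated as implicit).
    """
    model = {tuple(sorted(t)) for t in model}
    for term in model:
        ps = parents(term)
        if not ps:
            continue
        present = [p in model for p in ps]
        if strong and not all(present):
            return False
        if not strong and not any(present):
            return False
    return True
```

For example, the model with terms x1 and x1*x2 fails strong heredity (x2 is missing) but passes weak heredity. A search algorithm over the model space could use such a check to reject proposals that leave the constrained space.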
Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach.
Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data.
Zhao, Yize; Kang, Jian; Long, Qi
2018-01-01
Ultra-high dimensional variable selection has become increasingly important in the analysis of neuroimaging data. For example, in the Autism Brain Imaging Data Exchange (ABIDE) study, neuroscientists are interested in identifying important biomarkers for early detection of autism spectrum disorder (ASD) using high resolution brain images that include hundreds of thousands of voxels. However, most existing methods are not feasible for solving this problem due to their extensive computational costs. In this work, we propose a novel multiresolution variable selection procedure under a Bayesian probit regression framework. It recursively uses posterior samples for coarser-scale variable selection to guide the posterior inference on finer-scale variable selection, leading to very efficient Markov chain Monte Carlo (MCMC) algorithms. The proposed algorithms are computationally feasible for ultra-high dimensional data. Also, our model incorporates two levels of structural information into variable selection using Ising priors: the spatial dependence between voxels and the functional connectivity between anatomical brain regions. Applied to the resting state functional magnetic resonance imaging (R-fMRI) data in the ABIDE study, our methods identify voxel-level imaging biomarkers highly predictive of ASD, which are biologically meaningful and interpretable. Extensive simulations also show that our methods achieve better performance in variable selection compared to existing methods.
Joint Bayesian variable and graph selection for regression models with network-structured predictors
Peterson, C. B.; Stingo, F. C.; Vannucci, M.
2015-01-01
In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications since it allows the identification of pathways of functionally related genes or proteins which impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings, and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival. PMID:26514925
Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.
Locating disease genes using Bayesian variable selection with the Haseman-Elston method
Directory of Open Access Journals (Sweden)
He Qimei
2003-12-01
Background: We applied stochastic search variable selection (SSVS), a Bayesian model selection method, to the simulated data of Genetic Analysis Workshop 13. We used SSVS with the revisited Haseman-Elston method to find the markers linked to the loci determining change in cholesterol over time. To study gene-gene interaction (epistasis) and gene-environment interaction, we adopted prior structures that incorporate the relationship among the predictors. This allows SSVS to search the model space more efficiently and avoid the less likely models. Results: In applying SSVS, instead of looking at the posterior distribution of each of the candidate models, which is sensitive to the setting of the prior, we ranked the candidate variables (markers) according to their marginal posterior probability, which was shown to be more robust to the prior. Compared with traditional methods that consider one marker at a time, our method considers all markers simultaneously and obtains more favorable results. Conclusions: We showed that SSVS is a powerful method for identifying linked markers using the Haseman-Elston method, even for weak effects. SSVS is very effective because it does a smart search over the entire model space.
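For orientation, one common formulation of SSVS (the mixture-of-normals prior of George and McCulloch, assumed here because the abstract does not spell out its prior) attaches a binary inclusion indicator to each coefficient. The symbols below are illustrative notation rather than the paper's: a small variance gives the "spike", a large multiplier gives the "slab", and the ranking quantity is the marginal posterior probability that the indicator equals one, estimated from T MCMC draws.

```latex
\[
\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\mathcal{N}\!\left(0,\tau_j^{2}\right)
  + \gamma_j\,\mathcal{N}\!\left(0,c_j^{2}\tau_j^{2}\right),
\qquad \gamma_j \in \{0,1\},\quad c_j > 1,
\]
\[
\Pr(\gamma_j = 1 \mid y)
  \;=\; \sum_{\gamma} \gamma_j \,\Pr(\gamma \mid y)
  \;\approx\; \frac{1}{T}\sum_{t=1}^{T} \gamma_j^{(t)} .
\]
```

Because the second expression sums over all models that contain variable j, it pools evidence across models, which is consistent with the abstract's remark that the marginal ranking is less sensitive to the prior than the probability of any single model.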
Santra, Tapesh; Kolch, Walter; Kholodenko, Boris N
2013-07-06
Recent advancements in genetics and proteomics have led to the acquisition of large quantitative data sets. However, the use of these data to reverse engineer biochemical networks has remained a challenging problem. Many methods have been proposed to infer biochemical network topologies from different types of biological data. Here, we focus on unraveling network topologies from steady state responses of biochemical networks to successive experimental perturbations. We propose a computational algorithm which combines a deterministic network inference method termed Modular Response Analysis (MRA) and a statistical model selection algorithm called Bayesian Variable Selection, to infer functional interactions in cellular signaling pathways and gene regulatory networks. It can be used to identify interactions among individual molecules involved in a biochemical pathway or reveal how different functional modules of a biological network interact with each other to exchange information. In cases where not all network components are known, our method reveals functional interactions which are not direct but correspond to the interaction routes through unknown elements. Using computer simulated perturbation responses of signaling pathways and gene regulatory networks from the DREAM challenge, we demonstrate that the proposed method is robust against noise and scalable to large networks. We also show that our method can infer network topologies using incomplete perturbation datasets. Consequently, we have used this algorithm to explore the ERBB regulated G1/S transition pathway in certain breast cancer cells to understand the molecular mechanisms which cause these cells to become drug resistant. The algorithm successfully inferred many well characterized interactions of this pathway by analyzing experimentally obtained perturbation data. Additionally, it identified some molecular interactions which promote drug resistance in breast cancer cells. The proposed algorithm
Shahbaba, Babak; Johnson, Wesley O
2013-05-30
High-throughput scientific studies involving no clear a priori hypothesis are common. For example, a large-scale genomic study of a disease may examine thousands of genes without hypothesizing that any specific gene is responsible for the disease. In these studies, the objective is to explore a large number of possible factors (e.g., genes) in order to identify a small number that will be considered in follow-up studies that tend to be more thorough and on smaller scales. A simple, hierarchical, linear regression model with random coefficients is assumed for case-control data that correspond to each gene. The specific model used will be seen to be related to a standard Bayesian variable selection model. Relatively large regression coefficients correspond to potential differences in responses for cases versus controls and thus to genes that might 'matter'. For large-scale studies, and using a Dirichlet process mixture model for the regression coefficients, we are able to find clusters of regression effects of genes with increasing potential effect or 'relevance', in relation to the outcome of interest. One cluster will always correspond to genes whose coefficients are in a neighborhood that is relatively close to zero and will be deemed least relevant. Other clusters will correspond to increasing magnitudes of the random/latent regression coefficients. Using simulated data, we demonstrate that our approach could be quite effective in finding relevant genes compared with several alternative methods. We apply our model to two large-scale studies. The first study involves transcriptome analysis of infection by human cytomegalovirus. The second study's objective is to identify differentially expressed genes between two types of leukemia. Copyright © 2012 John Wiley & Sons, Ltd.
Bayesian Variable Selection in Multilevel Item Response Theory Models with Application in Genomics.
Fragoso, Tiago M; de Andrade, Mariza; Pereira, Alexandre C; Rosa, Guilherme J M; Soler, Júlia M P
2016-04-01
The goal of this paper is to present an implementation of stochastic search variable selection (SSVS) for a multilevel model from item response theory (IRT). As experimental settings get more complex and models are required to integrate multiple (and sometimes massive) sources of information, a model that can jointly summarize and select the most relevant characteristics can provide better interpretation and a deeper insight into the problem. A multilevel IRT model recently proposed in the literature for modeling multifactorial diseases is extended to perform variable selection in the presence of thousands of covariates using SSVS. We derive conditional distributions required for such a task as well as an acceptance-rejection step that allows for SSVS in high-dimensional settings using a Markov Chain Monte Carlo algorithm. We validate the variable selection procedure through simulation studies, and illustrate its application on a study with genetic markers associated with the metabolic syndrome. © 2016 WILEY PERIODICALS, INC.
Bhadra, Anindya
2013-04-22
We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.
Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina
2014-07-15
In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows us to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, thereby capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data. Copyright © 2014 Elsevier Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
Benoît Liquet
2016-01-01
Technological advances in molecular biology over the past decade have given rise to high dimensional and complex datasets offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges which ultimately led to the definition and implementation of computationally efficient statistical models that were able to scale to genome-wide data, including Bayesian variable selection approaches. While extensive methodological work has been carried out in this area, only a few methods capable of handling hundreds of thousands of predictors were implemented and distributed. Among these we recently proposed GUESS, a computationally optimised algorithm making use of graphics processing unit capabilities, which can accommodate multiple outcomes. In this paper we propose R2GUESS, an R package wrapping the original C++ source code. In addition to providing a user-friendly interface to the original code that automates its parametrisation and data handling, R2GUESS also incorporates many features to explore the data, to extend statistical inferences from the native algorithm (e.g., effect size estimation, significance assessment), and to visualize outputs from the algorithm. We first detail the model and its parametrisation, and describe in detail its optimised implementation. Based on two examples we finally illustrate its statistical performance and flexibility.
Learning dynamic Bayesian networks with mixed variables
DEFF Research Database (Denmark)
Bøttcher, Susanne Gammelgaard
This paper considers dynamic Bayesian networks for discrete and continuous variables. We only treat the case where the distribution of the variables is conditional Gaussian. We show how to learn the parameters and structure of a dynamic Bayesian network and also how the Markov order can be learned. An automated procedure for specifying prior distributions for the parameters in a dynamic Bayesian network is presented. It is a simple extension of the procedure for ordinary Bayesian networks. Finally, Wölfer's sunspot numbers are analyzed.
Tsai, Miao-Yu
2015-03-01
The problem of variable selection in the generalized linear-mixed models (GLMMs) is pervasive in statistical practice. For the purpose of variable selection, many methodologies for determining the best subset of explanatory variables currently exist according to the model complexity and differences between applications. In this paper, we develop a "higher posterior probability model with bootstrap" (HPMB) approach to select explanatory variables without fitting all possible GLMMs involving a small or moderate number of explanatory variables. Furthermore, to save computational load, we propose an efficient approximation approach with Laplace's method and Taylor's expansion to approximate intractable integrals in GLMMs. Simulation studies and an application of HapMap data provide evidence that this selection approach is computationally feasible and reliable for exploring true candidate genes and gene-gene associations, after adjusting for complex structures among clusters. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
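Laplace's method, mentioned above as the tool for approximating intractable integrals, replaces the log-integrand with its second-order Taylor expansion around the mode, which turns the integral into a Gaussian one. The sketch below applies it to a one-dimensional toy integral whose exact value is known (the Gamma function), not to a GLMM likelihood; the function name and the choice of test integrand are assumptions made purely for illustration.

```python
import math

def laplace_log_integral(h, h2, x_hat):
    """Approximate log of the integral of exp(h(x)) over the real line.

    h     : log-integrand
    h2    : its second derivative
    x_hat : the mode of h (where h'(x_hat) = 0 and h2(x_hat) < 0)
    Uses h(x) ~ h(x_hat) + 0.5 * h2(x_hat) * (x - x_hat)**2.
    """
    return h(x_hat) + 0.5 * math.log(2.0 * math.pi / -h2(x_hat))

def log_factorial_laplace(n):
    """Laplace approximation to log(n!) = log of the integral of x**n * exp(-x)."""
    h = lambda x: n * math.log(x) - x   # log-integrand, maximised at x = n
    h2 = lambda x: -n / x ** 2          # second derivative of h
    return laplace_log_integral(h, h2, float(n))
```

For this integrand the approximation reduces to Stirling's formula, so `log_factorial_laplace(50)` can be compared directly against `math.lgamma(51)`. In a GLMM the same idea is applied to integrals over the random effects, where the mode must be found numerically and the second derivative becomes a Hessian matrix.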
Bayesian variable order Markov models: Towards Bayesian predictive state representations
Dimitrakakis, C.
2009-01-01
We present a Bayesian variable order Markov model that shares many similarities with predictive state representations. The resulting models are compact and much easier to specify and learn than classical predictive state representations. Moreover, we show that they significantly outperform a more
Can natural selection encode Bayesian priors?
Ramírez, Juan Camilo; Marshall, James A R
2017-08-07
The evolutionary success of many organisms depends on their ability to make decisions based on estimates of the state of their environment (e.g., predation risk) from uncertain information. These decision problems have optimal solutions and individuals in nature are expected to evolve the behavioural mechanisms to make decisions as if using the optimal solutions. Bayesian inference is the optimal method to produce estimates from uncertain data, thus natural selection is expected to favour individuals with the behavioural mechanisms to make decisions as if they were computing Bayesian estimates in typically-experienced environments, although this does not necessarily imply that favoured decision-makers do perform Bayesian computations exactly. Each individual should evolve to behave as if updating a prior estimate of the unknown environment variable to a posterior estimate as it collects evidence. The prior estimate represents the decision-maker's default belief regarding the environment variable, i.e., the individual's default 'worldview' of the environment. This default belief has been hypothesised to be shaped by natural selection and represent the environment experienced by the individual's ancestors. We present an evolutionary model to explore how accurately Bayesian prior estimates can be encoded genetically and shaped by natural selection when decision-makers learn from uncertain information. The model simulates the evolution of a population of individuals that are required to estimate the probability of an event. Every individual has a prior estimate of this probability and collects noisy cues from the environment in order to update its prior belief to a Bayesian posterior estimate with the evidence gained. The prior is inherited and passed on to offspring. Fitness increases with the accuracy of the posterior estimates produced. Simulations show that prior estimates become accurate over evolutionary time. In addition to these 'Bayesian' individuals, we also
Koslovsky, Matthew D; Swartz, Michael D; Chan, Wenyaw; Leon-Novelo, Luis; Wilkinson, Anna V; Kendzor, Darla E; Businelle, Michael S
2017-10-11
The application of sophisticated analytical methods to intensive longitudinal data, collected with ecological momentary assessments (EMA), has helped researchers better understand smoking behaviors after a quit attempt. Unfortunately, the wealth of information captured with EMAs is typically underutilized in practice. Thus, novel methods are needed to extract this information in exploratory research studies. One of the main objectives of intensive longitudinal data analysis is identifying relations between risk factors and outcomes of interest. Our goal is to develop and apply expectation maximization variable selection for Bayesian multistate Markov models with interval-censored data to generate new insights into the relation between potential risk factors and transitions between smoking states. Through simulation, we demonstrate the effectiveness of our method in identifying associated risk factors and its ability to outperform the LASSO in a special case. Additionally, we use the expectation conditional-maximization algorithm to simplify estimation, a deterministic annealing variant to reduce the algorithm's dependence on starting values, and Louis's method to estimate unknown parameter uncertainty. We then apply our method to intensive longitudinal data collected with EMA to identify risk factors associated with transitions between smoking states after a quit attempt in a cohort of socioeconomically disadvantaged smokers who were interested in quitting. © 2017, The International Biometric Society.
Zhang, Zhen; Sinha, Samiran; Maiti, Tapabrata; Shipp, Eva
2018-04-01
The accelerated failure time model is a popular model for analyzing censored time-to-event data. Analysis of this model without assuming any parametric distribution for the model error is challenging, and the model complexity increases in the presence of a large number of covariates. We developed a nonparametric Bayesian method for regularized estimation of the regression parameters in a flexible accelerated failure time model. The novelties of our method lie in modeling the error distribution of the accelerated failure time nonparametrically, modeling the variance as a function of the mean, and adopting a variable selection technique in modeling the mean. The proposed method allows for identifying a set of important regression parameters, estimating survival probabilities, and constructing credible intervals of the survival probabilities. We evaluated operating characteristics of the proposed method via simulation studies. Finally, we apply our new comprehensive method to analyze the motivating breast cancer data from the Surveillance, Epidemiology, and End Results Program, and estimate the five-year survival probabilities for women included in the Surveillance, Epidemiology, and End Results database who were diagnosed with breast cancer between 1990 and 2000.
Chemical identification using Bayesian model selection
Energy Technology Data Exchange (ETDEWEB)
Burr, Tom; Fry, H. A. (Herbert A.); McVey, B. D. (Brian D.); Sander, E. (Eric)
2002-01-01
Remote detection and identification of chemicals in a scene is a challenging problem. We introduce an approach that uses some of the image's pixels to establish the background characteristics while other pixels represent the target for which we seek to identify all chemical species present. This leads to a generalized least squares problem in which we focus on 'subset selection' to identify the chemicals thought to be present. Bayesian model selection allows us to approximate the posterior probability that each chemical in the library is present by adding the posterior probabilities of all the subsets which include the chemical. We present results using realistic simulated data for the case with 1 to 5 chemicals present in each target and compare performance to a hybrid of forward and backward stepwise selection procedure using the F statistic.
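The step of "adding the posterior probabilities of all the subsets which include the chemical" can be sketched in a few lines. The input format (a dictionary from subsets to possibly unnormalised posterior weights), the function name, and the chemical labels in the usage note are hypothetical; how the subset probabilities themselves are obtained is outside this sketch.

```python
def inclusion_probabilities(subset_probs):
    """Per-item posterior inclusion probabilities from subset probabilities.

    subset_probs maps each candidate subset (any iterable of item labels,
    e.g. a frozenset) to its posterior weight. Weights are normalised first,
    then each item's probability is the sum over subsets that contain it.
    """
    total = sum(subset_probs.values())
    incl = {}
    for subset, prob in subset_probs.items():
        for item in subset:
            incl[item] = incl.get(item, 0.0) + prob / total
    return incl
```

As a usage example, with weights 0.1 for the empty subset, 0.5 for {A}, 0.3 for {A, B} and 0.1 for {B}, chemical A receives 0.5 + 0.3 = 0.8 and chemical B receives 0.3 + 0.1 = 0.4.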
Bayesian site selection for fast Gaussian process regression
Pourhabib, Arash
2014-02-05
Gaussian Process (GP) regression is a popular method in the field of machine learning and computer experiment designs; however, its ability to handle large data sets is hindered by the computational difficulty in inverting a large covariance matrix. Likelihood approximation methods were developed as a fast GP approximation, thereby reducing the computation cost of GP regression by utilizing a much smaller set of unobserved latent variables called pseudo points. This article reports a further improvement to the likelihood approximation methods by simultaneously deciding both the number and locations of the pseudo points. The proposed approach is a Bayesian site selection method where both the number and locations of the pseudo inputs are parameters in the model, and the Bayesian model is solved using a reversible jump Markov chain Monte Carlo technique. Through a number of simulated and real data sets, it is demonstrated that with appropriate priors chosen, the Bayesian site selection method can produce a good balance between computation time and prediction accuracy: it is fast enough to handle large data sets that a full GP is unable to handle, and it improves, quite often remarkably, the prediction accuracy, compared with the existing likelihood approximations. © 2014 Taylor and Francis Group, LLC.
Smartphone technologies and Bayesian networks to assess shorebird habitat selection
Zeigler, Sara; Thieler, E. Robert; Gutierrez, Ben; Plant, Nathaniel G.; Hines, Megan K.; Fraser, James D.; Catlin, Daniel H.; Karpanty, Sarah M.
2017-01-01
Understanding patterns of habitat selection across a species’ geographic distribution can be critical for adequately managing populations and planning for habitat loss and related threats. However, studies of habitat selection can be time consuming and expensive over broad spatial scales, and a lack of standardized monitoring targets or methods can impede the generalization of site-based studies. Our objective was to collaborate with natural resource managers to define available nesting habitat for piping plovers (Charadrius melodus) throughout their U.S. Atlantic coast distribution from Maine to North Carolina, with a goal of providing science that could inform habitat management in response to sea-level rise. We characterized a data collection and analysis approach as being effective if it provided low-cost collection of standardized habitat-selection data across the species’ breeding range within 1–2 nesting seasons and accurate nesting location predictions. In the method developed, >30 managers and conservation practitioners from government agencies and private organizations used a smartphone application, “iPlover,” to collect data on landcover characteristics at piping plover nest locations and random points on 83 beaches and barrier islands in 2014 and 2015. We analyzed these data with a Bayesian network that predicted the probability a specific combination of landcover variables would be associated with a nesting site. Although we focused on a shorebird, our approach can be modified for other taxa. Results showed that the Bayesian network performed well in predicting habitat availability and confirmed predicted habitat preferences across the Atlantic coast breeding range of the piping plover. We used the Bayesian network to map areas with a high probability of containing nesting habitat on the Rockaway Peninsula in New York, USA, as an example application. Our approach facilitated the collation of evidence-based information on habitat selection
Model Criticism of Bayesian Networks with Latent Variables.
Williamson, David M.; Mislevy, Robert J.; Almond, Russell G.
This study investigated statistical methods for identifying errors in Bayesian networks (BN) with latent variables, as found in intelligent cognitive assessments. BN, commonly used in artificial intelligence systems, are promising mechanisms for scoring constructed-response examinations. The success of an intelligent assessment or tutoring system…
A guide to Bayesian model selection for ecologists
Hooten, Mevin B.; Hobbs, N.T.
2015-01-01
The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.
Bayesian Model Selection in Geophysics: The evidence
Vrugt, J. A.
2016-12-01
Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. Per Bayes' theorem, the posterior probability, P(H|D), of a hypothesis, H, given the data, D, is equivalent to the product of its prior probability, P(H), and likelihood, L(H|D), divided by a normalization constant, P(D). In geophysics, the hypothesis, H, often constitutes a description (parameterization) of the subsurface for some entity of interest (e.g. porosity, moisture content). The normalization constant, P(D), is not required for inference of the subsurface structure, yet is of great value for model selection. Unfortunately, it is not particularly easy to estimate P(D) in practice. Here, I will introduce the various building blocks of a general purpose method which provides robust and unbiased estimates of the evidence, P(D). This method uses multi-dimensional numerical integration of the posterior (parameter) distribution. I will then illustrate this new estimator by application to three competing subsurface models (hypotheses) using GPR travel time data from the South Oyster Bacterial Transport Site, in Virginia, USA. The three subsurface models differ in their treatment of the porosity distribution and use (a) horizontal layering with fixed layer thicknesses, (b) vertical layering with fixed layer thicknesses and (c) a multi-Gaussian field. The results of the new estimator are compared against the brute force Monte Carlo method, and the Laplace-Metropolis method.
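Written out, the relation described in this abstract could be stated as below. The integral form of P(D) is an assumption added here for illustration: it treats the hypothesis H as a continuous parameter vector \theta with prior density p(\theta), which the abstract itself does not spell out.

```latex
P(\theta \mid D) = \frac{p(\theta)\, L(\theta \mid D)}{P(D)},
\qquad
P(D) = \int p(\theta)\, L(\theta \mid D)\, \mathrm{d}\theta .
```

Because P(D) does not depend on \theta, it can be dropped when only the parameters of one model are inferred. It becomes essential when two models are compared, since the ratio of their evidences is the Bayes factor.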
Thomas, D.L.; Johnson, D.; Griffith, B.
2006-01-01
Modeling the probability of use of land units characterized by discrete and continuous measures, we present a Bayesian random-effects model to assess resource selection. This model provides simultaneous estimation of both individual- and population-level selection. Deviance information criterion (DIC), a Bayesian alternative to AIC that is sample-size specific, is used for model selection. Aerial radiolocation data from 76 adult female caribou (Rangifer tarandus) and calf pairs during 1 year on an Arctic coastal plain calving ground were used to illustrate models and assess population-level selection of landscape attributes, as well as individual heterogeneity of selection. Landscape attributes included elevation, NDVI (a measure of forage greenness), and land cover-type classification. Results from the first of a 2-stage model-selection procedure indicated that there is substantial heterogeneity among cow-calf pairs with respect to selection of the landscape attributes. In the second stage, selection of models with heterogeneity included indicated that at the population level, NDVI and land cover class were significant attributes for selection of different landscapes by pairs on the calving ground. Population-level selection coefficients indicate that the pairs generally select landscapes with higher levels of NDVI, but the relationship is quadratic. The highest rate of selection occurs at values of NDVI less than the maximum observed. Results for land cover-class selection coefficients indicate that wet sedge, moist sedge, herbaceous tussock tundra, and shrub tussock tundra are selected at approximately the same rate, while alpine and sparsely vegetated landscapes are selected at a lower rate. Furthermore, the variability in selection by individual caribou for moist sedge and sparsely vegetated landscapes is large relative to the variability in selection of other land cover types. The example analysis illustrates that, while sometimes computationally intense, a
DISSECTING MAGNETAR VARIABILITY WITH BAYESIAN HIERARCHICAL MODELS
Energy Technology Data Exchange (ETDEWEB)
Huppenkothen, Daniela; Elenbaas, Chris; Watts, Anna L.; Horst, Alexander J. van der [Anton Pannekoek Institute for Astronomy, University of Amsterdam, Postbus 94249, 1090 GE Amsterdam (Netherlands)]; Brewer, Brendon J. [Department of Statistics, The University of Auckland, Private Bag 92019, Auckland 1142 (New Zealand)]; Hogg, David W. [Center for Data Science, New York University, 726 Broadway, 7th Floor, New York, NY 10003 (United States)]; Murray, Iain [School of Informatics, University of Edinburgh, Edinburgh EH8 9AB (United Kingdom)]; Frean, Marcus [School of Engineering and Computer Science, Victoria University of Wellington (New Zealand)]; Levin, Yuri [Monash Center for Astrophysics and School of Physics, Monash University, Clayton, Victoria 3800 (Australia)]; Kouveliotou, Chryssa, E-mail: daniela.huppenkothen@nyu.edu [Astrophysics Office, ZP 12, NASA/Marshall Space Flight Center, Huntsville, AL 35812 (United States)]
2015-09-01
Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behavior, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favored models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture aftershocks. Using Markov Chain Monte Carlo sampling augmented with reversible jumps between models with different numbers of parameters, we characterize the posterior distributions of the model parameters and the number of components per burst. We relate these model parameters to physical quantities in the system, and show for the first time that the variability within a burst does not conform to predictions from ideas of self-organized criticality. We also examine how well the properties of the spikes fit the predictions of simplified cascade models for the different trigger mechanisms.
Bayesian assessment of the variability of reliability measures
Directory of Open Access Journals (Sweden)
Enrique López Droguett
2006-04-01
Population variability analysis, also known as the first stage in two-stage Bayesian updating, is an estimation procedure for the assessment of the variability of reliability measures among a group of sub-populations of similar systems. The estimated variability distributions are used as prior distributions in system-specific Bayesian updates. In this paper we present a Bayesian approach for population variability analysis involving the use of non-conjugate variability models that works over a continuous, rather than the discretized, variability model parameter space. The cases to be discussed are the ones typically encountered by the reliability practitioner: run-time data for failure rate assessment, demand-based data for failure probability assessment, and expert-based evidence for failure rate and failure probability analysis. We outline the estimation procedure itself as well as its link with conventional Bayesian updating procedures, describe the results generated by the procedures and their behavior under various data conditions, and provide numerical examples.
Bayesian approach to errors-in-variables in regression models
Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad
2017-05-01
In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, an Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, can lead to misleading statistical inferences and analysis. Therefore, our goal is to examine the relationship between the outcome variable and the unobserved exposure variable, given the observed mismeasured surrogate, by applying the Bayesian formulation to the EIV model. We extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model, the Poisson regression model. We then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.
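To make the effect of uncorrected measurement error concrete, here is a minimal simulation sketch. It is not the Bayesian Poisson EIV model of the abstract: it assumes a simple linear regression fitted by least squares, and the names `ols_slope` and `simulate`, the true slope of 2.0, and the error standard deviation of 1.0 are all hypothetical choices made for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=2.0, error_sd=1.0, seed=1):
    """Return (slope using the true covariate, slope using the noisy surrogate)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]          # unobserved exposure
    w = [xi + rng.gauss(0.0, error_sd) for xi in x]      # mismeasured surrogate
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]    # outcome
    return ols_slope(x, y), ols_slope(w, y)


slope_true, slope_naive = simulate()
print(f"slope with true covariate:  {slope_true:.3f}")
print(f"slope with noisy surrogate: {slope_naive:.3f}")
```

With unit variance for both the exposure and the measurement error, classical attenuation theory predicts that the naive slope shrinks toward beta / 2, which is the kind of bias an EIV model is meant to correct.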
Directory of Open Access Journals (Sweden)
Thanoon Y. Thanoon
2016-03-01
In this paper, ordered categorical variables are used to compare linear and nonlinear interactions of fixed covariates and latent variables in Bayesian structural equation models. The Gibbs sampling method is applied for estimation and model comparison. A hidden continuous normal distribution (censored normal distribution) is used to handle the problem of ordered categorical data. Statistical inferences, which involve estimation of parameters and their standard deviations, and residual analyses for testing the selected model, are discussed. The proposed procedure is illustrated with simulated data generated in R. Analyses are carried out using the OpenBUGS program.
Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling
Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.
2017-04-01
Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood), which is the normalizing constant in the denominator of Bayes' theorem; while it is fundamental for model selection, the evidence is not required for Bayesian inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as information criteria do; the larger a model's evidence, the more support it receives among a collection of hypotheses, as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models over the over-fitted ones selected by information criteria that incorporate only the likelihood maximum. Since it is not particularly easy to estimate the evidence in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009). Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with
Bayesian item selection criteria for adaptive testing
van der Linden, Willem J.
1996-01-01
R.J. Owen (1975) proposed an approximate empirical Bayes procedure for item selection in adaptive testing. The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational
An approximate method for Bayesian entropy estimation for a discrete random variable.
Yokota, Yasunari
2004-01-01
This article proposes an approximate Bayesian entropy estimator for a discrete random variable. An entropy estimator that achieves the least squared error is obtained through Bayesian estimation of the occurrence probabilities of each value taken by the discrete random variable. This Bayesian entropy estimator is computationally expensive if the random variable can take a large number of distinct values. Therefore, the present article proposes a practical method for calculating a Bayesian entropy estimate; the proposed method approximates the entropy function by a truncated Taylor series. Numerical experiments demonstrate that the proposed entropy estimation method improves the precision of entropy estimates remarkably in comparison to the conventional entropy estimation method.
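As a point of reference for what a Bayesian entropy estimate looks like, the sketch below computes the exact posterior mean of the Shannon entropy (in nats). It is not the truncated Taylor series method of the article. It assumes a uniform Dirichlet prior on the occurrence probabilities, in which case the posterior mean reduces to a difference of harmonic numbers; the function names are hypothetical.

```python
import math


def harmonic(k):
    """k-th harmonic number H_k = 1 + 1/2 + ... + 1/k (H_0 = 0)."""
    return sum(1.0 / j for j in range(1, k + 1))


def posterior_mean_entropy(counts):
    """Posterior mean of the entropy under a uniform Dirichlet prior.

    With posterior parameters a_i = n_i + 1 and A = sum(a_i), the mean is
    H_A - sum((a_i / A) * H_{a_i}); this is the digamma-based formula
    specialised to integer parameters.
    """
    alpha = [n + 1 for n in counts]
    total = sum(alpha)
    return harmonic(total) - sum(a / total * harmonic(a) for a in alpha)


def plugin_entropy(counts):
    """Conventional estimate: entropy of the observed relative frequencies."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


counts = [3, 1, 0, 0]  # hypothetical small sample over four possible values
print(posterior_mean_entropy(counts), plugin_entropy(counts))
```

The posterior mean is the estimator that minimises expected squared error under the assumed prior, and its cost grows with the number of possible values, which is the burden an approximation such as the one in the article is meant to reduce.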
Lee, Sik-Yum; Song, Xin-Yuan; Tang, Nian-Sheng
2007-01-01
The analysis of interaction among latent variables has received much attention. This article introduces a Bayesian approach to analyze a general structural equation model that accommodates the general nonlinear terms of latent variables and covariates. This approach produces a Bayesian estimate that has the same statistical optimal properties as a…
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Energy Technology Data Exchange (ETDEWEB)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios, E-mail: junhankim@email.arizona.edu [Department of Astronomy and Steward Observatory, University of Arizona, 933 N. Cherry Avenue, Tucson, AZ 85721 (United States)]
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data.
Mo, Qianxing; Shen, Ronglai; Guo, Cui; Vannucci, Marina; Chan, Keith S; Hilsenbeck, Susan G
2018-01-01
Identification of clinically relevant tumor subtypes and omics signatures is an important task in cancer translational research for precision medicine. Large-scale genomic profiling studies such as The Cancer Genome Atlas (TCGA) Research Network have generated vast amounts of genomic, transcriptomic, epigenomic, and proteomic data. While these studies have provided great resources for researchers to discover clinically relevant tumor subtypes and driver molecular alterations, there are few computationally efficient methods and tools for integrative clustering analysis of these multi-type omics data. Therefore, the aim of this article is to develop a fully Bayesian latent variable method (called iClusterBayes) that can jointly model omics data of continuous and discrete data types for identification of tumor subtypes and relevant omics features. Specifically, the proposed method uses a few latent variables to capture the inherent structure of multiple omics data sets to achieve joint dimension reduction. As a result, the tumor samples can be clustered in the latent variable space and relevant omics features that drive the sample clustering are identified through Bayesian variable selection. This method significantly improves on the existing integrative clustering method iClusterPlus in terms of statistical inference and computational speed. By analyzing TCGA and simulated data sets, we demonstrate the excellent performance of the proposed method in revealing clinically meaningful tumor subtypes and driver omics features. © The Author 2017. Published by Oxford University Press. All rights reserved.
Yau, Christopher; Holmes, Chris
2011-07-01
We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster specific variable selection. We demonstrate improved inference on a number of canonical problems.
Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models
DEFF Research Database (Denmark)
Vehtari, Aki; Mononen, Tommi; Tolvanen, Ville
2016-01-01
The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the ...
Variable Selection via Partial Correlation.
Li, Runze; Liu, Jingyuan; Lou, Lejia
2017-07-01
A partial correlation based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as a comparable alternative to regularization methods for variable selection. This paper addresses two important issues related to the partial correlation based variable selection method: (a) whether this method is sensitive to the normality assumption, and (b) whether this method is valid when the dimension of the predictors increases at an exponential rate of the sample size. To address issue (a), we systematically study this method for elliptical linear regression models. Our findings indicate that the original proposal may lead to inferior performance when the marginal kurtosis of the predictors is not close to that of the normal distribution. Our simulation results further confirm this finding. To ensure the superior performance of the partial correlation based variable selection procedure, we propose a thresholded partial correlation (TPC) approach to select significant variables in linear regression models. We establish the selection consistency of the TPC in the presence of ultrahigh dimensional predictors. Since the TPC procedure includes the original proposal as a special case, our theoretical results address issue (b) directly. As a by-product, the sure screening property of the first step of the TPC is obtained. The numerical examples also illustrate that the TPC is competitively comparable to the commonly used regularization methods for variable selection.
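The quantity that such a procedure thresholds can be illustrated with a small sketch. This is not the TPC algorithm of the paper: it only computes a first-order partial correlation from pairwise Pearson correlations, on simulated data in which a candidate predictor `x` is correlated with the response `y` solely through another predictor `z`. The threshold value of 0.1 and all names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(20000)]
x = [zi + rng.gauss(0.0, 1.0) for zi in z]   # related to y only through z
y = [zi + rng.gauss(0.0, 1.0) for zi in z]   # response depends on z alone

marginal = pearson(x, y)
partial = partial_correlation(x, y, z)
threshold = 0.1                              # hypothetical cut-off
keep_x = abs(partial) > threshold
print(f"marginal r = {marginal:.3f}, partial r = {partial:.3f}, keep x: {keep_x}")
```

The marginal correlation alone would suggest that `x` is informative, whereas the partial correlation given `z` is close to zero, so a thresholding rule of this kind would drop `x`.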
Variable Selection in ROC Regression
Directory of Open Access Journals (Sweden)
Binhuan Wang
2013-01-01
Regression models are introduced into receiver operating characteristic (ROC) analysis to accommodate effects of covariates, such as genes. If many covariates are available, the variable selection issue arises. The traditional induced methodology separately models outcomes of diseased and nondiseased groups; thus, applying variable selection separately to the two models creates barriers to interpretation, due to differences in the selected models. Furthermore, in ROC regression, the accuracy of the area under the curve (AUC) should be the focus, instead of the consistency of model selection or good prediction performance. In this paper, we obtain one single objective function with the group SCAD to select grouped variables, which adapts to popular criteria of model selection, and propose a two-stage framework to apply the focused information criterion (FIC). Some asymptotic properties of the proposed methods are derived. Simulation studies show that the grouped variable selection is superior to separate model selections. Furthermore, the FIC improves the accuracy of the estimated AUC compared with other criteria.
Bayesian genomic selection: the effect of haplotype lengths and priors
DEFF Research Database (Denmark)
Villumsen, Trine Michelle; Janss, Luc
2009-01-01
Breeding values for animals with marker data are estimated using a genomic selection approach where data are analyzed using Bayesian multi-marker association models. Fourteen model scenarios with varying haplotype lengths, hyperparameters and prior distributions were compared to find the scenario...... expected to give the most correct genomic estimated breeding values for animals with marker information only. Five-fold cross validation was performed to assess the ability of models to estimate breeding values for animals in generation 3. In each of the five subsets, 20% of phenotypic records...... well. Correlations were 0.77-0.89 and predicted breeding values were biased. In addition the models seemed to overfit the genomic part of the variation. Highest correlations and most unbiased results were obtained when SNP markers were joined into haplotypes. Especially the scenarios with 5-SNP...
A Bayesian Technique for Selecting a Linear Forecasting Model
Ramona L. Trader
1983-01-01
The specification of a forecasting model is considered in the context of linear multiple regression. Several potential predictor variables are available, but some of them convey little information about the dependent variable which is to be predicted. A technique for selecting the "best" set of predictors which takes into account the inherent uncertainty in prediction is detailed. In addition to current data, there is often substantial expert opinion available which is relevant to the forecas...
Bayesian modeling of ChIP-chip data using latent variables
Directory of Open Access Journals (Sweden)
Tian Yanan
2009-10-01
Background: The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although, due to the integrated consideration of uncertainty of the models and model parameters, Bayesian methods can potentially work better than the other two classes of methods, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU time because they involve MCMC simulations. Results: In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to the outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length. Conclusion: The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results
Bayesian modeling of ChIP-chip data using latent variables.
Wu, Mingqi
2009-10-26
BACKGROUND: The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although, due to the integrated consideration of uncertainty of the models and model parameters, Bayesian methods can potentially work better than the other two classes of methods, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU time because they involve MCMC simulations. RESULTS: In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to the outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length. CONCLUSION: The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results indicate that the
Bayesian modeling of measurement error in predictor variables
Fox, Gerardus J.A.; Glas, Cornelis A.W.
2003-01-01
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between
A Bayesian semiparametric latent variable approach to causal mediation.
Kim, Chanmin; Daniels, Michael; Li, Yisheng; Milbury, Kathrin; Cohen, Lorenzo
2018-03-30
In assessing causal mediation effects in randomized studies, a challenge is that the direct and indirect effects can vary across participants due to different measured and unmeasured characteristics. In that case, the population effect estimated from standard approaches implicitly averages over and does not estimate the heterogeneous direct and indirect effects. We propose a Bayesian semiparametric method to estimate heterogeneous direct and indirect effects via clusters, where the clusters are formed by both individual covariate profiles and individual effects due to unmeasured characteristics. These cluster-specific direct and indirect effects can be estimated through a set of regression models where specific coefficients are clustered by a stick-breaking prior. To let clustering be appropriately informed by individual direct and indirect effects, we specify a data-dependent prior. We conduct simulation studies to assess performance of the proposed method compared to other methods. We use this approach to estimate heterogeneous causal direct and indirect effects of an expressive writing intervention for patients with renal cell carcinoma. Copyright © 2017 John Wiley & Sons, Ltd.
Si, Yajuan; Reiter, Jerome P.
2013-01-01
In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…
International Nuclear Information System (INIS)
Elsheikh, Ahmed H.; Wheeler, Mary F.; Hoteit, Ibrahim
2014-01-01
A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and for estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM requires only forward model runs, so the simulator is used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.
Bayesian inference of selection in a heterogeneous environment from genetic time-series data.
Gompert, Zachariah
2016-01-01
Evolutionary geneticists have sought to characterize the causes and molecular targets of selection in natural populations for many years. Although this research programme has been somewhat successful, most statistical methods employed were designed to detect consistent, weak to moderate selection. In contrast, phenotypic studies in nature show that selection varies in time and that individual bouts of selection can be strong. Measurements of the genomic consequences of such fluctuating selection could help test and refine hypotheses concerning the causes of ecological specialization and the maintenance of genetic variation in populations. Herein, I propose a Bayesian nonhomogeneous hidden Markov model to estimate effective population sizes and quantify variable selection in heterogeneous environments from genetic time-series data. The model is described and then evaluated using a series of simulated data, including cases where selection occurs on a trait with a simple or polygenic molecular basis. The proposed method accurately distinguished neutral loci from non-neutral loci under strong selection, but not from those under weak selection. Selection coefficients were accurately estimated when selection was constant or when the fitness values of genotypes varied linearly with the environment, but these estimates were less accurate when fitness was polygenic or the relationship between the environment and the fitness of genotypes was nonlinear. Past studies of temporal evolutionary dynamics in laboratory populations have been remarkably successful. The proposed method makes similar analyses of genetic time-series data from natural populations more feasible and thereby could help answer fundamental questions about the causes and consequences of evolution in the wild. © 2015 John Wiley & Sons Ltd.
Yang, Ziheng; Zhu, Tianqi
2018-02-20
The Bayesian method is noted to produce spuriously high posterior probabilities for phylogenetic trees in analysis of large datasets, but the precise reasons for this overconfidence are unknown. In general, the performance of Bayesian selection of misspecified models is poorly understood, even though this is of great scientific interest since models are never true in real data analysis. Here we characterize the asymptotic behavior of Bayesian model selection and show that when the competing models are equally wrong, Bayesian model selection exhibits surprising and polarized behaviors in large datasets, supporting one model with full force while rejecting the others. If one model is slightly less wrong than the other, the less wrong model will eventually win when the amount of data increases, but the method may become overconfident before it becomes reliable. We suggest that this extreme behavior may be a major factor for the spuriously high posterior probabilities for evolutionary trees. The philosophical implications of our results to the application of Bayesian model selection to evaluate opposing scientific hypotheses are yet to be explored, as are the behaviors of non-Bayesian methods in similar situations.
Application of Bayesian Model Selection for Metal Yield Models using ALEGRA and Dakota.
Energy Technology Data Exchange (ETDEWEB)
Portone, Teresa; Niederhaus, John Henry; Sanchez, Jason James; Swiler, Laura Painton
2018-02-01
This report introduces the concepts of Bayesian model selection, which provides a systematic means of calibrating and selecting an optimal model to represent a phenomenon. This has many potential applications, including for comparing constitutive models. The ideas described herein are applied to a model selection problem between different yield models for hardened steel under extreme loading conditions.
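As a minimal sketch of the selection step (not the report's ALEGRA/Dakota workflow), posterior model probabilities follow from Bayes' rule applied to each model's marginal likelihood (evidence). The function name, the equal default priors, and the log-evidence values below are illustrative assumptions, not values from the report.

```python
import math

def posterior_model_probs(log_evidences, log_priors=None):
    """Return P(M_k | data) given log marginal likelihoods log p(data | M_k)."""
    k = len(log_evidences)
    if log_priors is None:
        # Assumption: equal prior probability for every candidate model.
        log_priors = [-math.log(k)] * k
    log_post = [le + lp for le, lp in zip(log_evidences, log_priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log evidences for three candidate yield models.
probs = posterior_model_probs([-104.2, -101.7, -109.9])
best = max(range(len(probs)), key=probs.__getitem__)
```

Here `best` is the index of the model with the highest posterior probability; in practice the log evidences would come from a calibration run for each model.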
International Nuclear Information System (INIS)
Huang, Xiaodong; Grace, Peter; Rowlings, David; Mengersen, Kerrie
2013-01-01
Soil-based emissions of nitrous oxide (N2O), a well-known greenhouse gas, have been associated with changes in soil water-filled pore space (WFPS) and soil temperature in many previous studies. However, it is acknowledged that the environment–N2O relationship is complex and still relatively poorly understood. In this article, we employed a Bayesian model selection approach (reversible jump Markov chain Monte Carlo) to develop a data-informed model of the relationship between daily N2O emissions and daily WFPS and soil temperature measurements between March 2007 and February 2009 from a soil under pasture in Queensland, Australia, taking seasonal factors and time-lagged effects into account. The model indicates a very strong relationship between a hybrid seasonal structure and daily N2O emission, with the latter substantially increased in summer. Given the other variables in the model, daily soil WFPS, lagged by a week, had a negative influence on daily N2O; there was evidence of a nonlinear positive relationship between daily soil WFPS and daily N2O emission; and daily soil temperature tended to have a linear positive relationship with daily N2O emission when daily soil temperature was above a threshold of approximately 19 °C. We suggest that this flexible Bayesian modeling approach could facilitate greater understanding of the shape of the covariate-N2O flux relation and detection of effect thresholds in the natural temporal variation of environmental variables on N2O emission. - Highlights: • A Bayesian model selection approach was used to develop a data-informed model. • Daily soil temperature influenced N2O flux above approximately 19 °C. • The effects of daily WFPS on N2O flux were complex and changeable. • Daily N2O flux was also significantly related to a complex seasonal pattern. • The approach facilitated understanding of the temporal variations of variables on N2O
Implementation of upper limit calculation for a Poisson variable by Bayesian approach
International Nuclear Information System (INIS)
Zhu Yongsheng
2008-01-01
The calculation of Bayesian confidence upper limit for a Poisson variable including both signal and background with and without systematic uncertainties has been formulated. A Fortran 77 routine, BPULE, has been developed to implement the calculation. The routine can account for systematic uncertainties in the background expectation and signal efficiency. The systematic uncertainties may be separately parameterized by a Gaussian, Log-Gaussian or flat probability density function (pdf). Some technical details of BPULE have been discussed. (authors)
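The simplest case described above (known background, no systematic uncertainties) can be sketched in a few lines. This is not the BPULE routine: it is a minimal Python illustration that assumes a flat prior on the signal s >= 0, for which the posterior tail probability has the closed form exp(-s) * sum_{k<=n} (s+b)^k/k! divided by sum_{k<=n} b^k/k!. The function name and defaults are hypothetical.

```python
import math

def poisson_upper_limit(n_obs, background=0.0, cl=0.90, tol=1e-10):
    """Bayesian upper limit on a Poisson signal (flat prior on s >= 0, known background)."""
    def partial_sum(mu):
        # sum_{k=0}^{n_obs} mu^k / k!, accumulated term by term
        term, total = 1.0, 1.0
        for k in range(1, n_obs + 1):
            term *= mu / k
            total += term
        return total

    denom = partial_sum(background)

    def tail(s):
        # Posterior probability that the true signal exceeds s
        return math.exp(-s) * partial_sum(s + background) / denom

    # Bracket the limit, then bisect on the monotonically decreasing tail.
    lo, hi = 0.0, 1.0
    while tail(hi) > 1.0 - cl:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail(mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

limit_zero_events = poisson_upper_limit(0)  # equals -ln(0.10) in this special case
```

Systematic uncertainties in the background or efficiency, which the abstract says BPULE handles, would require integrating this posterior over the corresponding nuisance distributions and are left out of the sketch.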
Shao, Kan; Allen, Bruce C; Wheeler, Matthew W
2017-10-01
Human variability is a very important factor considered in human health risk assessment for protecting sensitive populations from chemical exposure. Traditionally, to account for this variability, an interhuman uncertainty factor is applied to lower the exposure limit. However, using a fixed uncertainty factor rather than probabilistically accounting for human variability can hardly support probabilistic risk assessment advocated by a number of researchers; new methods are needed to probabilistically quantify human population variability. We propose a Bayesian hierarchical model to quantify variability among different populations. This approach jointly characterizes the distribution of risk at background exposure and the sensitivity of response to exposure, which are commonly represented by model parameters. We demonstrate, through both an application to real data and a simulation study, that using the proposed hierarchical structure adequately characterizes variability across different populations. © 2016 Society for Risk Analysis.
Feature selection for Bayesian network classifiers using the MDL-FS score
Drugan, Madalina M.; Wiering, Marco A.
When constructing a Bayesian network classifier from data, the more or less redundant features included in a dataset may bias the classifier and as a consequence may result in a relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features
DEFF Research Database (Denmark)
Burgess, Stephen; Thompson, Simon G; Thompson, Grahame
2010-01-01
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context of multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable. This can be implemented using a Bayesian approach, which is next extended to include multiple genetic markers. We then propose a hierarchical model for undertaking a meta-analysis of multiple studies, in which it is not necessary that the same genetic markers are measured in each study. This provides...
Elsheikh, Ahmed H.
2014-02-01
A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and for estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM requires only forward model runs, so the simulator is used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems. © 2013 Elsevier Inc.
Variable selection by lasso-type methods
Directory of Open Access Journals (Sweden)
Sohail Chand
2011-09-01
Variable selection is an important property of shrinkage methods. The adaptive lasso is an oracle procedure and can do consistent variable selection. In this paper, we explain how the use of adaptive weights makes it possible for the adaptive lasso to satisfy the necessary and almost sufficient condition for consistent variable selection. We suggest a novel algorithm and give an important result: if predictors are normalised after the introduction of adaptive weights, the performance of the adaptive lasso becomes identical to that of the lasso.
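For orientation, the adaptive lasso is usually written as a weighted lasso. The notation below (initial estimate \tilde{\beta}, exponent \gamma, rescaled design X^{*}) is the common textbook form and is an assumption here, since the abstract does not give its own notation.

```latex
\hat{\beta}^{\mathrm{AL}}
  = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} w_j\,|\beta_j|,
\qquad w_j = \frac{1}{|\tilde{\beta}_j|^{\gamma}},\quad \gamma > 0.
% Substituting x_j^{*} = x_j / w_j and \beta_j^{*} = w_j \beta_j gives an ordinary lasso:
\hat{\beta}^{*}
  = \arg\min_{\beta^{*}}\; \|y - X^{*}\beta^{*}\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j^{*}|.
```

In this form the weights act only through the column scaling of X^{*}, so re-normalising those columns to a common scale removes them, which is consistent with the result stated in the abstract.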
Variable and subset selection in PLS regression
DEFF Research Database (Denmark)
Høskuldsson, Agnar
2001-01-01
The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
Dondeynaz, C.; Lopez-Puga, J.; Carmona-Moreno, C.
2012-04-01
Improving Water and Sanitation Services (WSS), a complex and interdisciplinary issue, requires collaboration and coordination among different sectors (environment, health, economic activities, governance, and international cooperation). This interdependency has been recognised with the adoption of the "Integrated Water Resources Management" principles, which push for the integration of the various dimensions involved in WSS delivery to ensure efficient and sustainable management. Understanding these interrelations is crucial for decision makers in the water sector, particularly in developing countries, where WSS still represent an important lever for livelihood improvement. In this framework, the Joint Research Centre of the European Commission has developed a coherent database (WatSan4Dev database) containing 29 indicators from environmental, socio-economic, governance and financial aid flows data focusing on developing countries (Celine et al, 2011 under publication). The aim of this work is to model the WatSan4Dev dataset using probabilistic models to identify the key variables influencing or being influenced by the water supply and sanitation access levels. Bayesian network models are suitable for mapping the conditional dependencies between variables and also allow variables to be ordered by level of influence on the dependent variable. Separate models have been built for water supply and for sanitation because they behave differently. The models are validated if they comply with statistical criteria and also with scientific knowledge and the literature. A two-step approach has been adopted to build the structure of the model: a Bayesian network is first built for each thematic cluster of variables (e.g. governance, agricultural pressure, or human development), keeping a detailed level for later interpretation. A global model is then built from the significant indicators of each previously modelled cluster. The structure of the
Bayesian Optimization Under Mixed Constraints with A Slack-Variable Augmented Lagrangian
Energy Technology Data Exchange (ETDEWEB)
Picheny, Victor; Gramacy, Robert B.; Wild, Stefan M.; Le Digabel, Sebastien
2016-12-05
An augmented Lagrangian (AL) can convert a constrained optimization problem into a sequence of simpler (e.g., unconstrained) problems, which are then usually solved with local solvers. Recently, surrogate-based Bayesian optimization (BO) sub-solvers have been successfully deployed in the AL framework for a more global search in the presence of inequality constraints; however, a drawback was that expected improvement (EI) evaluations relied on Monte Carlo. Here we introduce an alternative slack variable AL, and show that in this formulation the EI may be evaluated with library routines. The slack variables furthermore facilitate equality as well as inequality constraints, and mixtures thereof. We show our new slack “ALBO” compares favorably to the original. Its superiority over conventional alternatives is reinforced on several mixed constraint examples.
Wang, L.; Davis, J. L.; Tamisiea, M. E.
2017-12-01
The Antarctic ice sheet (AIS) holds about 60% of all fresh water on the Earth, an amount equivalent to about 58 m of sea-level rise. Observation of AIS mass change is thus essential in determining and predicting its contribution to sea level. While the ice mass loss estimates for West Antarctica (WA) and the Antarctic Peninsula (AP) are in good agreement, what the mass balance over East Antarctica (EA) is, and whether or not it compensates for the mass loss is under debate. Besides the different error sources and sensitivities of different measurement types, complex spatial and temporal variabilities would be another factor complicating the accurate estimation of the AIS mass balance. Therefore, a model that allows for variabilities in both melting rate and seasonal signals would seem appropriate in the estimation of present-day AIS melting. We present a stochastic filter technique, which enables the Bayesian separation of the systematic stripe noise and mass signal in decade-length GRACE monthly gravity series, and allows the estimation of time-variable seasonal and inter-annual components in the signals. One of the primary advantages of this Bayesian method is that it yields statistically rigorous uncertainty estimates reflecting the inherent spatial resolution of the data. By applying the stochastic filter to the decade-long GRACE observations, we present the temporal variabilities of the AIS mass balance at basin scale, particularly over East Antarctica, and decipher the EA mass variations in the past decade, and their role in affecting overall AIS mass balance and sea level.
Kelava, Augustin; Nagengast, Benjamin
2012-09-01
Structural equation models with interaction and quadratic effects have become a standard tool for testing nonlinear hypotheses in the social sciences. Most of the current approaches assume normally distributed latent predictor variables. In this article, we present a Bayesian model for the estimation of latent nonlinear effects when the latent predictor variables are nonnormally distributed. The nonnormal predictor distribution is approximated by a finite mixture distribution. We conduct a simulation study that demonstrates the advantages of the proposed Bayesian model over contemporary approaches (Latent Moderated Structural Equations [LMS], Quasi-Maximum-Likelihood [QML], and the extended unconstrained approach) when the latent predictor variables follow a nonnormal distribution. The conventional approaches show biased estimates of the nonlinear effects; the proposed Bayesian model provides unbiased estimates. We present an empirical example from work and stress research and provide syntax for substantive researchers. Advantages and limitations of the new model are discussed.
Purposeful selection of variables in logistic regression
Directory of Open Access Journals (Sweden)
Williams David Keith
2008-12-01
Background: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. Several variable selection algorithms exist. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods: In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results: We show that this approach is advantageous when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion: If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones, they should consider this macro as an alternative tool.
Directory of Open Access Journals (Sweden)
Brentani Helena
2004-08-01
Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS), is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model, are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression to a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user friendly web-based on-line tool or as R language scripts at supplemental web-site.
Vêncio, Ricardo Z N; Brentani, Helena; Patrão, Diogo F C; Pereira, Carlos A B
2004-08-31
An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS), is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. We introduce a Bayesian model that accounts for the within-class variability by means of mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model, are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression to a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user friendly web-based on-line tool or as R language scripts at supplemental web-site.
Accuracy of latent-variable estimation in Bayesian semi-supervised learning.
Yamazaki, Keisuke
2015-09-01
Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable ones. The estimation of latent variables in semi-supervised learning, where some labels are observed, will be more precise than that in unsupervised, and one of the concerns is to clarify the effect of the labeled data. However, there has not been sufficient theoretical analysis of the accuracy of the estimation of latent variables. In a previous study, a distribution-based error function was formulated, and its asymptotic form was calculated for unsupervised learning with generative models. It has been shown that, for the estimation of latent variables, the Bayes method is more accurate than the maximum-likelihood method. The present paper reveals the asymptotic forms of the error function in Bayesian semi-supervised learning for both discriminative and generative models. The results show that the generative model, which uses all of the given data, performs better when the model is well specified. Copyright © 2015 Elsevier Ltd. All rights reserved.
High Dimensional Variable Selection with Error Control
Directory of Open Access Journals (Sweden)
Sangjin Kim
2016-01-01
Background. Iterative sure independence screening (ISIS) is a popular method for selecting important variables while retaining most of the informative variables relevant to the outcome in high-throughput data. However, it is not only computationally intensive but may also cause a high false discovery rate (FDR). We propose to use the FDR as a screening method to reduce the high dimension to a lower dimension as well as to control the FDR with three popular variable selection methods: LASSO, SCAD, and MCP. Method. The three methods with the proposed screenings were applied to prostate cancer data with presence of metastasis as the outcome. Results. Simulations showed that the three variable selection methods with the proposed screenings controlled the predefined FDR and produced high area under the receiver operating characteristic curve (AUROC) scores. In applying these methods to the prostate cancer example, LASSO and MCP selected 12 and 8 genes and produced AUROC scores of 0.746 and 0.764, respectively. Conclusions. We demonstrated that the variable selection methods with the sequential use of FDR and ISIS not only controlled the predefined FDR in the final models but also had relatively high AUROC scores.
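The abstract does not say which FDR procedure is used for screening. As one plausible illustration, the sketch below applies the Benjamini-Hochberg step-up rule to per-feature p-values and keeps the features that pass; the function name and the p-values are hypothetical.

```python
def bh_screen(p_values, fdr=0.05):
    """Return sorted indices of features kept by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Rank features from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its BH threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical marginal p-values for six candidate genes.
p = [0.001, 0.20, 0.012, 0.74, 0.030, 0.004]
kept = bh_screen(p, fdr=0.05)
```

The surviving indices would then be passed to a penalized selector such as LASSO, SCAD, or MCP, which is the second stage the abstract describes.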
Penalized variable selection in competing risks regression.
Fu, Zhixuan; Parikh, Chirag R; Zhou, Bingqing
2017-07-01
Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of multiple mutually exclusive events, known as competing risks. The proportional subdistribution hazard (PSH) model proposed by Fine and Gray (J Am Stat Assoc 94:496-509, 1999) has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish the asymptotic properties of the proposed penalized estimators and modify the coordinate descent algorithm for implementation. Simulation studies are conducted to demonstrate the good performance of the proposed method. Data from deceased donor kidney transplants from the United Network of Organ Sharing illustrate the utility of the proposed method.
Machine learning techniques to select variable stars
Directory of Open Access Journals (Sweden)
García-Varela Alejandro
2017-01-01
In order to perform a supervised classification of variable stars, we propose and evaluate a set of six features extracted from the magnitude density of the light curves. They are used to train automatic classification systems using state-of-the-art classifiers implemented in the R statistical computing environment. We find that random forests is the most successful method to select variables.
Evaluating experimental design for soil-plant model selection with Bayesian model averaging
Wöhling, Thomas; Geiges, Andreas; Nowak, Wolfgang; Gayler, Sebastian
2013-04-01
The objective selection of appropriate models for realistic simulations of coupled soil-plant processes is a challenging task since the processes are complex, not fully understood at larger scales, and highly non-linear. Also, comprehensive data sets are scarce, and measurements are uncertain. In the past decades, a variety of different models have been developed that exhibit a wide range of complexity regarding their approximation of processes in the coupled model compartments. We present a method for evaluating experimental design for maximum confidence in the model selection task. The method considers uncertainty in parameters, measurements and model structures. Advancing the ideas behind Bayesian Model Averaging (BMA), the model weights in BMA are perceived as uncertain quantities with assigned probability distributions that narrow down as more data are made available. This allows assessing the power of different data types, data densities and data locations in identifying the best model structure from among a suite of plausible models. The models considered in this study are the crop models CERES, SUCROS, GECROS and SPASS, which are coupled to identical routines for simulating soil processes within the modelling framework Expert-N. The four models considerably differ in the degree of detail at which crop growth and root water uptake are represented. Monte-Carlo simulations were conducted for each of these models considering their uncertainty in soil hydraulic properties and selected crop model parameters. The models were then conditioned on field measurements of soil moisture, leaf-area index (LAI), and evapotranspiration rates (from eddy-covariance measurements) during a vegetation period of winter wheat at the Nellingen site in Southwestern Germany. Following our new method, we derived the BMA model weights (and their distributions) when using all data or different subsets thereof. We discuss to which degree the posterior BMA mean outperformed the prior BMA
A new variable selection method for classification
Directory of Open Access Journals (Sweden)
Nuñez Letamendia,Laura
2007-01-01
This work proposes a new “ad hoc” method for variable selection in classification, specifically in Discriminant Analysis. The method is based on the metaheuristic strategy Tabu Search. From a computational point of view, variable selection is an NP-hard problem (NP = Nondeterministic Polynomial Time), and therefore there is no guarantee of finding the optimum solution. This means that when the problem is large, finding an optimum solution is infeasible in practice. As in other optimization problems, metaheuristic techniques have proved to be good at solving this type of problem. Although there are many references in the literature on selecting variables for use in classification, there are very few key references on the selection of variables for use in Discriminant Analysis. In fact, the most well-known statistical packages continue to use classic selection methods such as Stepwise, Backward or Forward. After performing some tests, it is found that Tabu Search obtains significantly better results than the Stepwise, Backward or Forward methods used by classic statistical packages.
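A generic Tabu Search over variable subsets can be sketched as follows. This is not the authors' algorithm: the neighbourhood (flip one variable in or out), the tabu tenure, the aspiration rule, and the toy score are all assumptions chosen for illustration. In a real application `score` would be something like cross-validated classification accuracy of the discriminant rule.

```python
def tabu_select(n_vars, score, n_iter=50, tenure=3):
    """Tabu search over subsets of range(n_vars); higher `score` is better."""
    current = frozenset()
    best, best_score = current, score(current)
    tabu = {}  # variable index -> last iteration at which flipping it is forbidden
    for it in range(n_iter):
        candidates = []
        for j in range(n_vars):
            neighbor = current ^ {j}  # flip variable j in or out
            s = score(neighbor)
            is_tabu = tabu.get(j, -1) >= it
            # Aspiration: a tabu move is allowed if it beats the best score so far.
            if not is_tabu or s > best_score:
                candidates.append((s, j, neighbor))
        if not candidates:
            continue
        # Best admissible move, even if it worsens the score; ties go to the lowest index.
        s, j, current = max(candidates, key=lambda c: (c[0], -c[1]))
        tabu[j] = it + tenure
        if s > best_score:
            best, best_score = current, s
    return best, best_score

# Toy objective (assumption): the "true" subset is {1, 3}; score counts mismatches.
target = frozenset({1, 3})
toy_score = lambda subset: -len(subset ^ target)
selected, value = tabu_select(5, toy_score)
```

Accepting the best non-tabu move even when it lowers the score is what lets the search leave local optima, which greedy Stepwise, Backward and Forward procedures cannot do.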
A BAYESIAN NONPARAMETRIC MIXTURE MODEL FOR SELECTING GENES AND GENE SUBNETWORKS.
Zhao, Yize; Kang, Jian; Yu, Tianwei
2014-06-01
It is very challenging to select informative features from tens of thousands of measured features in high-throughput data analysis. Recently, several parametric/regression models have been developed utilizing the gene network information to select genes or pathways strongly associated with a clinical/biological outcome. Alternatively, in this paper, we propose a nonparametric Bayesian model for gene selection incorporating network information. In addition to identifying genes that have a strong association with a clinical outcome, our model can select genes with particular expressional behavior, in which case the regression models are not directly applicable. We show that our proposed model is equivalent to an infinite mixture model for which we develop a posterior computation algorithm based on Markov chain Monte Carlo (MCMC) methods. We also propose two fast computing algorithms that approximate the posterior simulation with good accuracy but relatively low computational cost. We illustrate our methods on simulation studies and the analysis of Spellman yeast cell cycle microarray data.
Xu, Zhiqiang
2017-02-16
Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-12-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible.
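The contrast between the three classes of BME techniques can be illustrated on a toy problem. The following is a minimal sketch, not the study's setup: it assumes a hypothetical model with Gaussian data of known noise level and a Gaussian prior on the mean, for which the evidence has a closed form. It compares that exact value with a brute-force Monte Carlo average of the likelihood over prior draws and with a BIC-style approximation. All function names, the data and the prior settings are illustrative assumptions.

```python
import math
import random


def log_likelihood(mu, data, sigma):
    """Gaussian log-likelihood of the data for mean mu and known noise sd sigma."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2
        for y in data
    )


def log_bme_monte_carlo(data, sigma, prior_mean, prior_sd, n_draws=20000, seed=0):
    """Brute-force estimate: average the likelihood over draws from the prior."""
    rng = random.Random(seed)
    logs = [
        log_likelihood(rng.gauss(prior_mean, prior_sd), data, sigma)
        for _ in range(n_draws)
    ]
    peak = max(logs)  # log-sum-exp keeps the average numerically stable
    return peak + math.log(sum(math.exp(v - peak) for v in logs) / n_draws)


def log_bme_exact(data, sigma, prior_mean, prior_sd):
    """Closed-form evidence for the conjugate normal-normal toy model."""
    n = len(data)
    ybar = sum(data) / n
    spread = sum((y - ybar) ** 2 for y in data)
    var_mean = sigma ** 2 / n              # sampling variance of the data mean
    marg_var = prior_sd ** 2 + var_mean    # prior-predictive variance of the mean
    return (
        -0.5 * n * math.log(2 * math.pi * sigma ** 2)
        - spread / (2 * sigma ** 2)
        + 0.5 * math.log(2 * math.pi * var_mean)
        - 0.5 * math.log(2 * math.pi * marg_var)
        - 0.5 * (ybar - prior_mean) ** 2 / marg_var
    )


def log_bme_bic(data, sigma):
    """BIC-style approximation: maximised log-likelihood minus (k/2) log n, k = 1."""
    n = len(data)
    ybar = sum(data) / n
    return log_likelihood(ybar, data, sigma) - 0.5 * math.log(n)


# Hypothetical observations, for illustration only.
data = [0.9, -0.2, 1.4, 0.3, 0.8, -0.5, 1.1, 0.6, 0.1, 0.7]
exact = log_bme_exact(data, sigma=1.0, prior_mean=0.0, prior_sd=2.0)
mc = log_bme_monte_carlo(data, sigma=1.0, prior_mean=0.0, prior_sd=2.0)
bic = log_bme_bic(data, sigma=1.0)
print(f"exact={exact:.3f}  monte_carlo={mc:.3f}  bic={bic:.3f}")
```

The BIC-style value ignores the prior entirely, so its offset from the exact result changes with the prior width, which is one way an information criterion can become biased as an evidence estimate.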
Energy Technology Data Exchange (ETDEWEB)
Strom, Daniel J.; Joyce, Kevin E.; Maclellan, Jay A.; Watson, David J.; Lynch, Timothy P.; Antonio, Cheryl L.; Birchall, Alan; Anderson, Kevin K.; Zharov, Peter
2012-04-17
In making low-level radioactivity measurements of populations, it is commonly observed that a substantial portion of net results are negative. Furthermore, the observed variance of the measurement results arises from a combination of measurement uncertainty and population variability. This paper presents a method for disaggregating measurement uncertainty from population variability to produce a probability density function (PDF) of possibly true results. To do this, simple, justifiable, and reasonable assumptions are made about the relationship of the measurements to the measurands (the 'true values'). The measurements are assumed to be unbiased, that is, that their average value is the average of the measurands. Using traditional estimates of each measurement's uncertainty to disaggregate population variability from measurement uncertainty, a PDF of measurands for the population is produced. Then, using Bayes's theorem, the same assumptions, and all the data from the population of individuals, a prior PDF is computed for each individual's measurand. These PDFs are non-negative, and their average is equal to the average of the measurement results for the population. The uncertainty in these Bayesian posterior PDFs is all Berkson with no remaining classical component. The methods are applied to baseline bioassay data from the Hanford site. The data include 90Sr urinalysis measurements on 128 people, 137Cs in vivo measurements on 5,337 people, and 239Pu urinalysis measurements on 3,270 people. The method produces excellent results for the 90Sr and 137Cs measurements, since there are nonzero concentrations of these global fallout radionuclides in people who have not been occupationally exposed. The method does not work for the 239Pu measurements in non-occupationally exposed people because the population average is essentially zero.
Directory of Open Access Journals (Sweden)
Beatriz Molina Serrano
2018-01-01
In the current economic, social and political environment, society demands a greater variety of outcomes from the public logistics sector, such as efficiency, efficient use of managed resources, greater transparency and business performance. All of them are an indispensable counterpart for its recognition and support. Port planning and management involve many variables. Bayesian networks make it possible to classify, predict and diagnose these variables, and even to estimate the posterior probability of unknown variables based on the known ones. The research uses a database with more than 40 variables, which have been classified as smart port studies in Spain. A network was then generated using a directed acyclic graph, which shows the relationships among port variables. In conclusion, economic variables are the cause of the other categories and play a parent role in most cases. Furthermore, if environmental variables are known, the posterior probability of social variables can be estimated.
Foroughi Pour, Ali; Dalton, Lori A
2018-03-21
Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and Leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data these algorithms outputted many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, particularly useful for studying gene interactions and gene networks.
Ogle, Kiona; Pendall, Elise
2015-02-01
Isotopic methods offer great potential for partitioning trace gas fluxes such as soil respiration into their different source contributions. Traditional partitioning methods face challenges due to variability introduced by different measurement methods, fractionation effects, and end-member uncertainty. To address these challenges, we describe a hierarchical Bayesian (HB) approach for isotopic partitioning of soil respiration that directly accommodates such variability. We apply our HB method to data from an experiment conducted in a shortgrass steppe ecosystem, where decomposition was previously shown to be stimulated by elevated CO2. Our approach simultaneously fits Keeling plot (KP) models to observations of soil or soil-respired δ13C and [CO2] obtained via chambers and gas wells, corrects the KP intercepts for apparent fractionation (Δ) due to isotope-specific diffusion rates and/or method artifacts, estimates method- and treatment-specific values for Δ, propagates end-member uncertainty, and calculates proportional contributions from two distinct respiration sources ("old" and "new" carbon). The chamber KP intercepts were estimated with greater confidence than the well intercepts. Compared to the theoretical value of 4.4‰, our results suggest that Δ varies between 2 and 5.2‰ depending on method (chambers versus wells) and CO2 treatment. Because elevated CO2 plots were fumigated with 13C-depleted CO2, the source contributions were tightly constrained, and new C accounted for 64% (range = 55-73%) of soil respiration. The contributions were less constrained for the ambient CO2 treatments, but new C accounted for significantly less (47%, range = 15-82%) of soil respiration. Our new HB partitioning approach contrasts with our original analysis (higher contribution of old C under elevated CO2) because it uses additional data sources, accounts for end-member bias, and estimates apparent fractionation effects.
Spatial variability of coastal wetland resilience to sea-level rise using Bayesian inference
Hardy, T.; Wu, W.
2017-12-01
The coastal wetlands in the Northern Gulf of Mexico (NGOM) account for 40% of coastal wetland area in the United States and provide various ecosystem services to the region and broader areas. Increasing rates of relative sea-level rise (RSLR), and reduced sediment input have increased coastal wetland loss in the NGOM, accounting for 80% of coastal wetland loss in the nation. Traditional models for predicting the impact of RSLR on coastal wetlands in the NGOM have focused on coastal erosion driven by geophysical variables only, and/or at small spatial extents. Here we developed a model in Bayesian inference to make probabilistic prediction of wetland loss in the entire NGOM as a function of vegetation productivity and geophysical attributes. We also studied how restoration efforts help maintain the area of coastal wetlands. Vegetation productivity contributes organic matter to wetland sedimentation and was approximated using the remotely sensed normalized difference moisture index (NDMI). The geophysical variables include RSLR, tidal range, river discharge, coastal slope, and wave height. We found a significantly positive relation between wetland loss and RSLR, which varied significantly at different river discharge regimes. There also existed a significantly negative relation between wetland loss and NDMI, indicating that in-situ vegetation productivity contributed to wetland resilience to RSLR. This relation did not vary significantly between river discharge regimes. The spatial relation revealed three areas of high RSLR but relatively low wetland loss; these areas were associated with wetland restoration projects in coastal Louisiana. Two projects were breakwater projects, where hard materials were placed off-shore to reduce wave action and promote sedimentation. And one project was a vegetation planting project used to promote sedimentation and wetland stabilization. We further developed an interactive web tool that allows stakeholders to develop similar wetland
Uncovering state-dependent relationships in shallow lakes using Bayesian latent variable regression.
Vitense, Kelsey; Hanson, Mark A; Herwig, Brian R; Zimmer, Kyle D; Fieberg, John
2018-03-01
Ecosystems sometimes undergo dramatic shifts between contrasting regimes. Shallow lakes, for instance, can transition between two alternative stable states: a clear state dominated by submerged aquatic vegetation and a turbid state dominated by phytoplankton. Theoretical models suggest that critical nutrient thresholds differentiate three lake types: highly resilient clear lakes, lakes that may switch between clear and turbid states following perturbations, and highly resilient turbid lakes. For effective and efficient management of shallow lakes and other systems, managers need tools to identify critical thresholds and state-dependent relationships between driving variables and key system features. Using shallow lakes as a model system for which alternative stable states have been demonstrated, we developed an integrated framework using Bayesian latent variable regression (BLR) to classify lake states, identify critical total phosphorus (TP) thresholds, and estimate steady state relationships between TP and chlorophyll a (chl a) using cross-sectional data. We evaluated the method using data simulated from a stochastic differential equation model and compared its performance to k-means clustering with regression (KMR). We also applied the framework to data comprising 130 shallow lakes. For simulated data sets, BLR had high state classification rates (median/mean accuracy >97%) and accurately estimated TP thresholds and state-dependent TP-chl a relationships. Classification and estimation improved with increasing sample size and decreasing noise levels. Compared to KMR, BLR had higher classification rates and better approximated the TP-chl a steady state relationships and TP thresholds. We fit the BLR model to three different years of empirical shallow lake data, and managers can use the estimated bifurcation diagrams to prioritize lakes for management according to their proximity to thresholds and chance of successful rehabilitation. Our model improves upon
Bayesian model selection without evidences: application to the dark energy equation-of-state
Hee, S.; Handley, W. J.; Hobson, M. P.; Lasenby, A. N.
2016-01-01
A method is presented for Bayesian model selection without explicitly computing evidences, by using a combined likelihood and introducing an integer model selection parameter n so that Bayes factors, or more generally posterior odds ratios, may be read off directly from the posterior of n. If the total number of models under consideration is specified a priori, the full joint parameter space (θ, n) of the models is of fixed dimensionality and can be explored using standard Markov chain Monte Carlo (MCMC) or nested sampling methods, without the need for reversible jump MCMC techniques. The posterior on n is then obtained by straightforward marginalization. We demonstrate the efficacy of our approach by application to several toy models. We then apply it to constraining the dark energy equation of state using a free-form reconstruction technique. We show that Λ cold dark matter is significantly favoured over all extensions, including the simple w(z) = constant model.
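The idea of reading posterior odds directly off the posterior of the model index n can be sketched on a toy problem. The example below is only an illustration under stated assumptions, not the authors' implementation: it replaces MCMC or nested sampling with a deterministic grid over a single shared parameter θ, and it uses two hypothetical models that differ only in their assumed noise level. The function names, data and prior width are all invented for the sketch.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def posterior_of_n(data, noise_sds, prior_sd=5.0, half_width=20.0, step=0.01):
    """Return P(n | data) for models indexed by n, with a uniform prior on n.

    Model n assumes y_i ~ Normal(theta, noise_sds[n]); theta has the same
    Normal(0, prior_sd) prior in every model, so the joint space (theta, n)
    has fixed dimensionality.
    """
    n_grid = int(round(2 * half_width / step)) + 1
    thetas = [-half_width + i * step for i in range(n_grid)]
    log_mass = []
    for sd in noise_sds:
        # Unnormalised log joint posterior of (theta, n) on the theta grid.
        log_joint = [
            log_normal_pdf(t, 0.0, prior_sd)
            + sum(log_normal_pdf(y, t, sd) for y in data)
            for t in thetas
        ]
        peak = max(log_joint)
        # Marginalise theta with a Riemann sum, using log-sum-exp for stability.
        log_mass.append(
            peak + math.log(sum(math.exp(v - peak) for v in log_joint) * step)
        )
    top = max(log_mass)
    weights = [math.exp(v - top) for v in log_mass]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical observations, for illustration only.
data = [0.3, -1.1, 0.8, 1.9, -0.4, 0.2, -2.3, 1.0]
post = posterior_of_n(data, noise_sds=[1.0, 2.0])
odds = post[0] / post[1]  # posterior odds of model 0 against model 1
print(f"P(n=0|D)={post[0]:.3f}  P(n=1|D)={post[1]:.3f}  odds={odds:.3f}")
```

With a uniform prior on n, these posterior odds equal the Bayes factor, although no evidence was computed as a separate step.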
Directory of Open Access Journals (Sweden)
Gunal Bilek
2018-03-01
The aim of this paper is to investigate the factors influencing the Beck Depression Inventory score, the Beck Hopelessness Scale score and the Rosenberg Self-Esteem score and the relationships among the psychiatric, demographic and socio-economic variables with Bayesian network modeling. The data of 823 university students consist of 21 continuous and discrete relevant psychiatric, demographic and socio-economic variables. After the discretization of the continuous variables by two approaches, two Bayesian network models are constructed using the bnlearn package in R, and the results are presented via figures and probabilities. One of the most significant results is that in the first Bayesian network model, the gender of the students influences the level of depression, with female students being more depressive. In the second model, social activity directly influences the level of depression. In each model, depression influences both the level of hopelessness and self-esteem in students; additionally, as the level of depression increases, the level of hopelessness increases, but the level of self-esteem drops.
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Akkasi, Abbas; Varoglu, Ekrem
2017-01-01
Named Entity Recognition (NER) is a basic step for a large number of subsequent text mining tasks in the biochemical domain. Increasing the performance of such recognition systems is of high importance and always poses a challenge. In this study, a new community based decision making system is proposed which aims at increasing the efficiency of NER systems in the chemical/drug name context. The Particle Swarm Optimization (PSO) algorithm is chosen as the expert selection strategy along with the Bayesian combination method to merge the outputs of the selected classifiers as well as evaluate the fitness of the selected candidates. The proposed system performs in two steps. The first step focuses on creating various numbers of baseline classifiers for NER with different feature sets using Conditional Random Fields (CRFs). The second step involves the selection and efficient combination of the classifiers using PSO and Bayesian combination. Two comprehensive corpora from BioCreative events, namely ChemDNER and CEMP, are used for the experiments conducted. Results show that the ensemble of classifiers selected by means of the proposed approach performs better than the single best classifier as well as ensembles formed using other popular selection/combination strategies for both corpora. Furthermore, the proposed method outperforms the best performing system at the BioCreative IV ChemDNER track by achieving an F-score of 87.95 percent.
Gamma prior distribution selection for Bayesian analysis of failure rate and reliability
International Nuclear Information System (INIS)
Waler, R.A.; Johnson, M.M.; Waterman, M.S.; Martz, H.F. Jr.
1977-01-01
It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure-rate parameter, lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this paper is to present a methodology which can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate, lambda, simultaneously satisfies the probability statements, P(lambda < 1.0 x 10^-3) = 0.50 and P(lambda < 1.0 x 10^-5) = 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure-rate percentiles illustrated above, one can use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t_0) < 0.99) = 0.50 and P(R(t_0) < 0.99999) = 0.95 for some operating time t_0. Also, the paper includes graphs for selected percentiles which assist an engineer in applying the methodology.
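The percentile-matching step can be sketched numerically. This is a hedged illustration rather than the paper's procedure (which relies on graphs): it assumes a gamma prior parameterised by shape and rate, evaluates the gamma CDF with the standard power series for the incomplete gamma function, and solves for the two parameters by nested bisection. The function names and the search bounds for the shape are assumptions made for this sketch.

```python
import math


def gamma_cdf(shape, rate, x):
    """CDF of a gamma(shape, rate) variable, via the incomplete-gamma power series."""
    z = rate * x
    if z <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= z / (shape + k)
        total += term
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def rate_for_percentile(shape, x, p):
    """Rate that places probability p below x for a fixed shape (bisection)."""
    lo, hi = 0.0, 1.0 / x
    while gamma_cdf(shape, hi, x) < p:  # widen until the bracket holds the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(shape, mid, x) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_gamma_prior(x_upper, p_upper, x_lower, p_lower, shape_bounds=(0.1, 5.0)):
    """Find (shape, rate) with P(lambda < x_upper) = p_upper and
    P(lambda < x_lower) = p_lower.

    Assumes the solution's shape lies inside shape_bounds (an illustrative
    choice, not a value taken from the paper).
    """
    a_lo, a_hi = shape_bounds
    for _ in range(60):
        shape = 0.5 * (a_lo + a_hi)
        rate = rate_for_percentile(shape, x_upper, p_upper)
        if gamma_cdf(shape, rate, x_lower) > p_lower:
            a_lo = shape  # too much mass near zero: a larger shape is needed
        else:
            a_hi = shape
    shape = 0.5 * (a_lo + a_hi)
    return shape, rate_for_percentile(shape, x_upper, p_upper)


# The two failure-rate statements quoted in the abstract.
shape, rate = fit_gamma_prior(1.0e-3, 0.50, 1.0e-5, 0.05)
print(f"shape={shape:.4f}  rate={rate:.2f}")
print(gamma_cdf(shape, rate, 1.0e-3), gamma_cdf(shape, rate, 1.0e-5))
```

The inner bisection fixes the median for each trial shape, and the outer bisection adjusts the shape until the 5th percentile also matches.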
Directory of Open Access Journals (Sweden)
Markus Krauss
Interindividual variability in anatomical and physiological properties results in significant differences in drug pharmacokinetics. The consideration of such pharmacokinetic variability supports optimal drug efficacy and safety for each single individual, e.g. by identification of individual-specific dosings. One clear objective in clinical drug development is therefore a thorough characterization of the physiological sources of interindividual variability. In this work, we present a Bayesian population physiologically-based pharmacokinetic (PBPK) approach for the mechanistically and physiologically realistic identification of interindividual variability. The consideration of a generic and highly detailed mechanistic PBPK model structure enables the integration of large amounts of prior physiological knowledge, which is then updated with new experimental data in a Bayesian framework. A covariate model integrates known relationships of physiological parameters to age, gender and body height. We further provide a framework for estimation of the a posteriori parameter dependency structure at the population level. The approach is demonstrated considering a cohort of healthy individuals and theophylline as an application example. The variability and co-variability of physiological parameters are specified within the population. Significant correlations are identified between population parameters and are applied for individual- and population-specific visual predictive checks of the pharmacokinetic behavior, which leads to improved results compared to present population approaches. In the future, the integration of a generic PBPK model into a hierarchical approach allows for extrapolations to other populations or drugs, while the Bayesian paradigm allows for an iterative application of the approach and thereby a continuous updating of physiological knowledge with new data. This will facilitate decision making e.g. from preclinical to
Variable Selection in Model-based Clustering: A General Variable Role Modeling
Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure
2008-01-01
The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...
Effects of petrophysical uncertainty in Bayesian hydrogeophysical inversion and model selection
Brunetti, Carlotta; Linde, Niklas
2017-04-01
Hydrogeophysical studies rely on petrophysical relationships that link geophysical properties to hydrological properties and state variables of interest; these relationships are frequently assumed to be perfect (i.e., a one-to-one relation). Using first-arrival traveltime data from a synthetic crosshole ground-penetrating radar (GPR) experiment, we investigate the role of petrophysical uncertainty on porosity estimates from Markov chain Monte Carlo (MCMC) inversion and on Bayes factors (i.e., ratios of the evidences, or marginal likelihoods, of two competing models) used in Bayesian model selection. The petrophysical errors (PE) are conceptualized by a correlated zero-mean multi-Gaussian field with horizontal anisotropy with a resulting correlation coefficient of 0.8 between porosity and radar wave speed. We consider four different cases: (1) no PE are present (i.e., they are not used to generate the synthetic data) and they are not inferred in the MCMC inversion, (2) the PE are inferred but are not present in the data, (3) the PE are present in the data but not inferred, and (4) the PE are present in the data and inferred. To obtain appropriate acceptance ratios (i.e., between 35% and 45%), it is necessary to infer the PE as model parameters with a proper proposal distribution (simple Monte Carlo sampling of the petrophysical errors within Metropolis leads to very small acceptance rates). Case 4 provides consistent porosity field estimates (no bias) and the correlation coefficient between the "true" and posterior mean porosity field decreases from 0.9 for case 1 to 0.75. For case 2, we find that the variance of the posterior mean porosity field is too low and the porosity range is underestimated (i.e., some of the variance is accounted for by the inferred petrophysical uncertainty). Correspondingly, the porosity range is too wide for case 3 as it is used to account for petrophysical errors in the data. When comparing three different conceptual
Cawley, Gavin C; Talbot, Nicola L C
2006-10-01
Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than
The Continual Reassessment Method for Multiple Toxicity Grades: A Bayesian Model Selection Approach
Yuan, Ying; Zhang, Shemin; Zhang, Wenhong; Li, Chanjuan; Wang, Ling; Xia, Jielai
2014-01-01
Grade information has been considered in Yuan et al. (2007), who proposed a Quasi-CRM method to incorporate the grade toxicity information in phase I trials. A potential problem with the Quasi-CRM model is that the choice of skeleton may dramatically vary the performance of the CRM model, which results in similar consequences for the Quasi-CRM model. In this paper, we propose a new model, the Robust Quasi-CRM model, which uses a Bayesian model selection approach to tackle the above-mentioned pitfall of the Quasi-CRM model. The Robust Quasi-CRM model inherits the BMA-CRM model proposed by Yin and Yuan (2009) to consider multiple skeletons in parallel for Quasi-CRM. The superior performance of the Robust Quasi-CRM model was demonstrated by extensive simulation studies. We conclude that the proposed method can be freely used in real practice. PMID:24875783
Directory of Open Access Journals (Sweden)
E. Pishbahar
2015-05-01
There are different ideas and opinions about the effects of macroeconomic variables on real and nominal variables. To answer the question of whether changing macroeconomic variables as a policy tool is useful over a business cycle, it is important to understand the effect of macroeconomic variables on economic growth. In the present study, a Bayesian vector autoregressive model and seasonal data for the years 1991 to 2013 were used to determine the impact of monetary policy on agricultural value added. Predictions from a vector autoregressive model are usually distorted because of the large number of parameters in the model. A Bayesian vector autoregressive model yields more reliable predictions by reducing the number of included parameters and taking prior information into account. Compared to the vector autoregressive model, the coefficients are estimated more accurately. Based on the RMSE results in this study, the Normal-Wishart prior was identified as a suitable prior distribution. According to the impulse response functions, the effects of sudden shocks in macroeconomic variables on agricultural value added and domestic venture capital are stable. The effects on exchange rates, tax revenues and monetary variables are moderated after 7, 5 and 4 periods, respectively. Monetary policy shocks increased agricultural value added in the first half of the year, while in the second half of the year they had a depressing effect on value added.
Introduction to Bayesian statistics
Bolstad, William M
2017-01-01
There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear Regression Model; and Computati...
Upper limit for Poisson variable incorporating systematic uncertainties by Bayesian approach
International Nuclear Information System (INIS)
Zhu, Yongsheng
2007-01-01
To calculate the upper limit for a Poisson observable at a given confidence level, including systematic uncertainties in the background expectation and signal efficiency, formulations have been established along the lines of the Bayesian approach. A FORTRAN program, BPULE, has been developed to implement the upper limit calculation.
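As a rough illustration of the kind of calculation described in this record, the sketch below computes a Bayesian upper limit on a Poisson signal in the simplest setting only: a flat prior on the signal, a background that is known exactly, and no systematic uncertainties. These simplifications, and the function names `poisson_cdf` and `bayes_upper_limit`, are assumptions made for illustration; this is not the BPULE program or its formulation.

```python
import math

def poisson_cdf(n, mu):
    """Return P(N <= n) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def bayes_upper_limit(n_obs, background, cl=0.90, tol=1e-10):
    """Upper limit on the signal mean s at credibility level cl.

    Assumes a flat prior on s >= 0 and an exactly known background,
    so the posterior CDF is
        1 - P(N <= n_obs | s + b) / P(N <= n_obs | b).
    """
    denom = poisson_cdf(n_obs, background)

    def posterior_cdf(s):
        return 1.0 - poisson_cdf(n_obs, s + background) / denom

    # Bracket the root, then bisect (the posterior CDF is increasing in s).
    lo, hi = 0.0, 1.0
    while posterior_cdf(hi) < cl:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid) < cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With zero observed events the limit under these assumptions does not depend on the background, since the background factor cancels in the ratio. Including uncertainties in background and efficiency would require averaging the likelihood over their priors, which this sketch does not attempt.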
Bayesian Model Averaging for Propensity Score Analysis.
Kaplan, David; Chen, Jianshen
2014-01-01
This article considers Bayesian model averaging as a means of addressing uncertainty in the selection of variables in the propensity score equation. We investigate an approximate Bayesian model averaging approach based on the model-averaged propensity score estimates produced by the R package BMA but that ignores uncertainty in the propensity score. We also provide a fully Bayesian model averaging approach via Markov chain Monte Carlo sampling (MCMC) to account for uncertainty in both parameters and models. A detailed study of our approach examines the differences in the causal estimate when incorporating noninformative versus informative priors in the model averaging stage. We examine these approaches under common methods of propensity score implementation. In addition, we evaluate the impact of changing the size of Occam's window used to narrow down the range of possible models. We also assess the predictive performance of both Bayesian model averaging propensity score approaches and compare it with the case without Bayesian model averaging. Overall, results show that both Bayesian model averaging propensity score approaches recover the treatment effect estimates well and generally provide larger uncertainty estimates, as expected. Both Bayesian model averaging approaches offer slightly better prediction of the propensity score compared with the Bayesian approach with a single propensity score equation. Covariate balance checks for the case study show that both Bayesian model averaging approaches offer good balance. The fully Bayesian model averaging approach also provides posterior probability intervals of the balance indices.
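The role of Occam's window in approximate Bayesian model averaging can be sketched as follows. The example assumes that posterior model probabilities are approximated from BIC values with equal prior model probabilities, and that models whose approximate posterior probability falls below 1/`occam_ratio` of the best model are discarded. The function names, the default ratio of 20, and the BIC approximation itself are assumptions for illustration, not the authors' implementation.

```python
import math

def bma_weights(bics, occam_ratio=20.0):
    """Approximate posterior model probabilities from BIC values.

    Weights are proportional to exp(-BIC / 2); models outside
    Occam's window receive weight zero before renormalizing.
    """
    best = min(bics)
    # Relative weights; the best model has raw weight 1.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    kept = [w if w >= 1.0 / occam_ratio else 0.0 for w in raw]
    total = sum(kept)
    return [w / total for w in kept]

def averaged_estimate(estimates, weights):
    """Model-averaged point estimate (e.g. of a propensity score)."""
    return sum(e * w for e, w in zip(estimates, weights))
```

Widening the window (a larger `occam_ratio`) keeps more low-probability models in the average, which is the quantity whose effect the abstract reports evaluating.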
International Nuclear Information System (INIS)
Chagas Moura, Márcio das; Azevedo, Rafael Valença; Droguett, Enrique López; Chaves, Leandro Rego; Lins, Isis Didier
2016-01-01
Occupational accidents have several negative consequences for employees, employers, the environment and people in the vicinity of the accident. Some types of accidents are low-frequency, high-consequence events (long sick leaves), for which classical statistical approaches are ineffective because the available dataset is generally sparse and contains censored records. In this context, we propose a Bayesian population variability method for the estimation of the distributions of the rates of accident and recovery. Given these distributions, a Markov-based model will be used to estimate the uncertainty over the expected number of accidents and the work time loss. Thus, the use of Bayesian analysis along with the Markov approach aims at investigating future trends regarding occupational accidents in a workplace as well as enabling better management of the labor force and prevention efforts. One application example is presented in order to validate the proposed approach; this case uses available data gathered from a hydropower company in Brazil. - Highlights: • This paper proposes a Bayesian method to estimate rates of accident and recovery. • The model requires simple data likely to be available in the company database. • These results show the proposed model is not too sensitive to the prior estimates.
Variable wavelength selection devices: Physics and applications
Xianyu, Haiqing
Variable wavelength selection (VWS) achieved by implementing tunability in wavelength discriminating devices has generated great interest in basic science, applied physics, and technology. This thesis focuses on the underlying physics and application of several novel wavelength discriminating devices. Holographic polymer dispersed liquid crystals (HPDLCs) are switchable volume gratings formed by exposing a photopolymerizable monomer and liquid crystal mixture to interfering monochromatic light beams. An HPDLC's wavelength discriminating ability, along with its switchability, allows it to be utilized in VWS devices. A novel mode HPDLC, total internal reflection (TIR) HPDLC, has been developed as a wavelength selective filter. The grating planes in this device are tilted so that the diffracted light experiences total internal reflection at the glass-air interface and is trapped in the cell until it eventually escapes from an edge. A VWS device is demonstrated by stacking TIR HPDLCs operating at different wavelengths. Converging or diverging recording beams are employed to fabricate chirped reflection HPDLCs with a pitch gradient along the designated direction, creating chirped switchable reflection gratings (CSRGs). A pixelated version of the CSRG is developed herein, and a dynamic spectral equalizer is presented by combining the pixelated CSRG with a prism (for wavelength discrimination). A switchable circular to point converter (SCPC), which enables the random selection of the wavelength bands divided by the Fabry-Perot interferometer utilizing the controllable beam steering capability of transmission HPDLCs, is demonstrated. A random optical cross-switch (TIROL) can be created by integrating a Fabry-Perot interferometer with a stack of SCPC units. The in-plane electric field generated by the interdigitated electrodes is utilized to elongate the helical pitch of a cholesteric liquid crystal and thereby induces a red shift of the transmission reflection peak
Comparison of Two Gas Selection Methodologies: An Application of Bayesian Model Averaging
Energy Technology Data Exchange (ETDEWEB)
Renholds, Andrea S.; Thompson, Sandra E.; Anderson, Kevin K.; Chilton, Lawrence K.
2006-03-31
One goal of hyperspectral imagery analysis is the detection and characterization of plumes. Characterization includes identifying the gases in the plumes, which is a model selection problem. Two gas selection methods compared in this report are Bayesian model averaging (BMA) and minimum Akaike information criterion (AIC) stepwise regression (SR). Simulated spectral data from a three-layer radiance transfer model were used to compare the two methods. Test gases were chosen to span the types of spectra observed, which exhibit peaks ranging from broad to sharp. The size and complexity of the search libraries were varied. Background materials were chosen to either replicate a remote area of eastern Washington or feature many common background materials. For many cases, BMA and SR performed the detection task comparably in terms of the receiver operating characteristic curves. For some gases, BMA performed better than SR when the size and complexity of the search library increased. This is encouraging because we expect improved BMA performance upon incorporation of prior information on background materials and gases.
Mechanism of gram variability in select bacteria.
Beveridge, T J
1990-03-01
Gram stains were performed on strains of Actinomyces bovis, Actinomyces viscosus, Arthrobacter globiformis, Bacillus brevis, Butyrivibrio fibrisolvens, Clostridium tetani, Clostridium thermosaccharolyticum, Corynebacterium parvum, Mycobacterium phlei, and Propionibacterium acnes, using a modified Gram regimen that allowed the staining process to be observed by electron microscopy (J. A. Davies, G. K. Anderson, T. J. Beveridge, and H. C. Clark, J. Bacteriol. 156:837-845, 1983). Furthermore, since a platinum salt replaced the iodine mordant of the Gram stain, energy-dispersive X-ray spectroscopy could evaluate the stain intensity and location by monitoring the platinum signal. These gram-variable bacteria could be split into two groups on the basis of their staining responses. In the Actinomyces-Arthrobacter-Corynebacterium-Mycobacterium-Propionibacterium group, few cells became gram negative until the exponential growth phase; by mid-exponential phase, 10 to 30% of the cells were gram negative. The cells that became gram negative were a select population of the culture, had initiated septum formation, and were more fragile to the stress of the Gram stain at the division site. As cultures aged to stationary phase, there was a relatively slight increase toward gram negativity (now 15 to 40%) due to the increased lysis of nondividing cells by means of lesions in the side walls; these cells maintained their rod shape but stained gram negative. Those in the Bacillus-Butyrivibrio-Clostridium group also became gram negative as cultures aged but by a separate set of events. These bacteria possessed more complex walls, since they were covered by an S layer. They stained gram positive during lag and the initial exponential growth phases, but as doubling times increased, the wall fabric underlying the S layer became noticeably thinner and diffuse, and the cells became more fragile to the Gram stain. By stationary phase, these cultures were virtually gram negative.
Underwood, Kristen L.; Rizzo, Donna M.; Schroth, Andrew W.; Dewoolkar, Mandar M.
2017-12-01
Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge, which has been used previously as the default breakpoint in segmented regression models to discern differences between pre- and post-threshold export regimes. We then applied a Self-Organizing Map (SOM), which partitioned the watersheds into clusters of TSS, PP, and DP export regimes using watershed characteristics, as well as Bayesian regression intercepts and slopes. A SOM defined two clusters of high-flux basins, one where PP flux was predominantly episodic and hydrologically driven; and another in which the sediment and nutrient sourcing and mobilization were more bimodal, resulting from both hydrologic processes at post-threshold discharges and reactive processes (e.g., nutrient cycling or lateral/vertical exchanges of fine sediment) at pre-threshold discharges. A separate DP SOM defined two high-flux clusters exhibiting a bimodal concentration-discharge response, but driven by differing land use. Our novel framework shows promise as a tool with broad management application that provides insights into landscape drivers of riverine solute and sediment export.
A Bayesian predictive sample size selection design for single-arm exploratory clinical trials.
Teramukai, Satoshi; Daimon, Takashi; Zohar, Sarah
2012-12-30
The aim of an exploratory clinical trial is to determine whether a new intervention is promising for further testing in confirmatory clinical trials. Most exploratory clinical trials are designed as single-arm trials using a binary outcome with or without interim monitoring for early stopping. In this context, we propose a Bayesian adaptive design denoted as predictive sample size selection design (PSSD). The design allows for sample size selection following any planned interim analyses for early stopping of a trial, together with sample size determination before starting the trial. In the PSSD, we determine the sample size using the method proposed by Sambucini (Statistics in Medicine 2008; 27:1199-1224), which adopts a predictive probability criterion with two kinds of prior distributions, that is, an 'analysis prior' used to compute posterior probabilities and a 'design prior' used to obtain prior predictive distributions. In the sample size determination of the PSSD, we provide two sample sizes, that is, N and N(max), using two types of design priors. At each interim analysis, we calculate the predictive probabilities of achieving a successful result at the end of the trial using the analysis prior in order to stop the trial in case of low or high efficacy (Lee et al., Clinical Trials 2008; 5:93-106), and we select an optimal sample size, that is, either N or N(max) as needed, on the basis of the predictive probabilities. We investigate the operating characteristics through simulation studies, and the PSSD is applied retrospectively to a lung cancer clinical trial. Copyright © 2012 John Wiley & Sons, Ltd.
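The interim predictive probability mentioned in this record can be illustrated with a minimal beta-binomial sketch for a single-arm trial with a binary outcome. It assumes a single Beta(a, b) prior with integer parameters (so the Beta tail probability can be computed exactly from a binomial sum), a success rule of the form P(p > p0 | data) >= theta at the final analysis, and hypothetical function names. It does not reproduce the PSSD itself, which distinguishes analysis and design priors and selects between N and N(max).

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(y, m, a, b):
    """P(Y = y) for y responses among m future patients, p ~ Beta(a, b)."""
    return math.comb(m, y) * math.exp(log_beta(y + a, m - y + b) - log_beta(a, b))

def beta_tail(p0, a, b):
    """P(p > p0) for p ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity P(Beta(a, b) > p0) = P(Binomial(a + b - 1, p0) <= a - 1).
    """
    n = a + b - 1
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(a))

def predictive_probability(x, n, n_final, p0, theta, a=1, b=1):
    """Probability, given x responses in n patients so far, that the final
    analysis at n_final patients will satisfy P(p > p0 | data) >= theta."""
    m = n_final - n                      # patients still to be enrolled
    a_post, b_post = a + x, b + n - x    # posterior at the interim look
    pp = 0.0
    for y in range(m + 1):
        if beta_tail(p0, a_post + y, b_post + m - y) >= theta:
            pp += beta_binomial_pmf(y, m, a_post, b_post)
    return pp
```

A trial might be stopped early for futility when this probability is very low, or for efficacy when it is very high; the stopping thresholds themselves are design choices not given in the abstract.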
Bayesian latent variable models for the analysis of experimental psychology data.
Merkle, Edgar C; Wang, Ting
2018-02-01
In this paper, we address the use of Bayesian factor analysis and structural equation models to draw inferences from experimental psychology data. While such application is non-standard, the models are generally useful for the unified analysis of multivariate data that stem from, e.g., subjects' responses to multiple experimental stimuli. We first review the models and the parameter identification issues inherent in the models. We then provide details on model estimation via JAGS and on Bayes factor estimation. Finally, we use the models to re-analyze experimental data on risky choice, comparing the approach to simpler, alternative methods.
Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E.
2010-01-01
Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins. PMID:20936055
Variable selection in Logistic regression model with genetic algorithm.
Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi
2018-02-01
Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical decision-making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. Variable selection is thus the process of choosing the most relevant attributes from the database in order to build robust learning models and, thereby, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle, generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, as is common in present-day clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
Wentworth, Mami Tonoe
Uncertainty quantification plays an important role when making predictive estimates of model responses. In this context, uncertainty quantification is defined as quantifying and reducing uncertainties, and the objective is to quantify uncertainties in parameters, models and measurements, and propagate the uncertainties through the model, so that one can make a predictive estimate with quantified uncertainties. Two of the aspects of uncertainty quantification that must be performed prior to propagating uncertainties are model calibration and parameter selection. There are several efficient techniques for these processes; however, the accuracy of these methods is often not verified. This is the motivation for our work, and in this dissertation, we present and illustrate verification frameworks for model calibration and parameter selection in the context of biological and physical models. First, HIV models, developed and improved by [2, 3, 8], describe the viral infection dynamics of an HIV disease. These are also used to make predictive estimates of viral loads and T-cell counts and to construct an optimal control for drug therapy. Estimating input parameters is an essential step prior to uncertainty quantification. However, not all the parameters are identifiable, implying that they cannot be uniquely determined by the observations. These unidentifiable parameters can be partially removed by performing parameter selection, a process in which parameters that have minimal impacts on the model response are determined. We provide verification techniques for Bayesian model calibration and parameter selection for an HIV model. As an example of a physical model, we employ a heat model with experimental measurements presented in [10]. A steady-state heat model represents a prototypical behavior for the heat conduction and diffusion process involved in a thermal-hydraulic model, which is a part of nuclear reactor models. We employ this simple heat model to illustrate verification
Langmore, Ian; Davis, Anthony B.; Bal, Guillaume; Marzouk, Youssef M.
2012-01-01
We describe a method for accelerating a 3D Monte Carlo forward radiative transfer model to the point where it can be used in a new kind of Bayesian retrieval framework. The remote sensing challenge is to detect and quantify a chemical effluent of a known absorbing gas produced by an industrial facility in a deep valley. The available data is a single low resolution noisy image of the scene in the near IR at an absorbing wavelength for the gas of interest. The detected sunlight has been multiply reflected by the variable terrain and/or scattered by an aerosol that is assumed partially known and partially unknown. We thus introduce a new class of remote sensing algorithms best described as "multi-pixel" techniques that call necessarily for a 3D radiative transfer model (but demonstrated here in 2D); they can be added to conventional ones that exploit typically multi- or hyper-spectral data, sometimes with multi-angle capability, with or without information about polarization. The novel Bayesian inference methodology uses adaptively, with efficiency in mind, the fact that a Monte Carlo forward model has a known and controllable uncertainty depending on the number of sun-to-detector paths used.
Energy Technology Data Exchange (ETDEWEB)
Schoelzel, C. [Bonn Univ. (Germany). Meteorologisches Inst.
2006-07-01
This thesis presents the development of statistical climatological-botanical transfer functions in order to provide reconstructions of Holocene climate variability in the Near East region. Two classical concepts, the biomisation as well as the indicator taxa approach, are translated into a Bayesian network. Fossil pollen spectra of laminated sediments from the Ein Gedi location at the western shoreline of the Dead Sea and from the crater lake Birkat Ram in the northern Golan serve as proxy data, covering the past 10000 and 6500 years, respectively. The climatological variables are winter temperature, summer temperature, and annual precipitation, obtained from the 0.5 x 0.5 degree climatology CRU TS 1.0. The Bayesian biome model is based on the three main vegetation territories, the Mediterranean, the Irano-Turanian, and the Saharo-Arabian territory, which are digitized on the same grid as the climate data. From their spatial extent, a classification in the phase space is described by estimating the conditional probability for the existence of a certain biome given the climate. These biome specific likelihood functions are modelled by a generalised linear model, including second order monomials of the climate variables. A statistical mixture model is applied to the biome probabilities as estimated by the Ein Gedi data, resulting in a posterior probability density function for the three dimensional climate state vector. The indicator taxa model is based on the distribution of 15 Mediterranean taxa. Their spatial extent allows the taxon specific likelihood functions to be estimated. In this case, they are conditional probability density functions for the climate state vector given the existence of a certain taxon. In order to address the general problem of multivariate non-normally distributed populations, multivariate normal copulas are used, which allow the construction of distribution functions with gamma as well as normal marginal distributions. Applying the model to the Birkat
Li, Lianfa; Laurent, Olivier; Wu, Jun
2016-02-05
Epidemiological studies suggest that air pollution is adversely associated with pregnancy outcomes. Such associations may be modified by spatially-varying factors including socio-demographic characteristics, land-use patterns and unaccounted exposures. Yet, few studies have systematically investigated the impact of these factors on spatial variability of the air pollution's effects. This study aimed to examine spatial variability of the effects of air pollution on term birth weight across Census tracts and the influence of tract-level factors on such variability. We obtained over 900,000 birth records from 2001 to 2008 in Los Angeles County, California, USA. Air pollution exposure was modeled at individual level for nitrogen dioxide (NO2) and nitrogen oxides (NOx) using spatiotemporal models. Two-stage Bayesian hierarchical non-linear models were developed to (1) quantify the associations between air pollution exposure and term birth weight within each tract; and (2) examine the socio-demographic, land-use, and exposure-related factors contributing to the between-tract variability of the associations between air pollution and term birth weight. Higher air pollution exposure was associated with lower term birth weight (average posterior effects: -14.7 (95 % CI: -19.8, -9.7) g per 10 ppb increment in NO2 and -6.9 (95 % CI: -12.9, -0.9) g per 10 ppb increment in NOx). The variation of the association across Census tracts was significantly influenced by the tract-level socio-demographic, exposure-related and land-use factors. Our models captured the complex non-linear relationship between these factors and the associations between air pollution and term birth weight: we observed the thresholds from which the influence of the tract-level factors was markedly exacerbated or attenuated. Exacerbating factors might reflect additional exposure to environmental insults or lower socio-economic status with higher vulnerability, whereas attenuating factors might indicate reduced
Pang, Guofei; Perdikaris, Paris; Cai, Wei; Karniadakis, George Em
2017-11-01
The fractional advection-dispersion equation (FADE) can accurately describe solute transport in groundwater, but its fractional order has to be determined a priori. Here, we employ multi-fidelity Bayesian optimization to obtain the fractional order under various conditions, and we obtain more accurate results compared to previously published data. Moreover, the present method is very efficient as we use different levels of resolution to construct a stochastic surrogate model and quantify its uncertainty. We consider two different problem setups. In the first setup, we obtain variable fractional orders of one-dimensional FADE, considering both synthetic and field data. In the second setup, we identify constant fractional orders of two-dimensional FADE using synthetic data. We employ multi-resolution simulations using two-level and three-level Gaussian process regression models to construct the surrogates.
Directory of Open Access Journals (Sweden)
W David Walter
Bovine tuberculosis is a bacterial disease caused by Mycobacterium bovis in livestock and wildlife with hosts that include Eurasian badgers (Meles meles), brushtail possum (Trichosurus vulpecula), and white-tailed deer (Odocoileus virginianus). Risk-assessment efforts in Michigan have been initiated on farms to minimize interactions of cattle with wildlife hosts, but research on M. bovis on cattle farms has not investigated the spatial context of disease epidemiology. To incorporate spatially explicit data, initial likelihood of infection probabilities for cattle farms tested for M. bovis, prevalence of M. bovis in white-tailed deer, deer density, and environmental variables for each farm were modeled in a Bayesian hierarchical framework. We used geo-referenced locations of 762 cattle farms that have been tested for M. bovis, white-tailed deer prevalence, and several environmental variables that may lead to long-term survival and viability of M. bovis on farms and surrounding habitats (i.e., soil type, habitat type). Bayesian hierarchical analyses identified deer prevalence and proportion of sandy soil within our sampling grid as the most supported model. Analysis of cattle farms tested for M. bovis found that every 1% increase in sandy soil resulted in a 4% increase in the odds of infection. Our analysis revealed that the influence of prevalence of M. bovis in white-tailed deer was still a concern even after considerable efforts to prevent cattle interactions with white-tailed deer through on-farm mitigation and reduction in the deer population. Cattle farms test positive for M. bovis annually in our study area, suggesting that an environmental source either on farms or in the surrounding landscape may be contributing to new or re-infections with M. bovis. Our research provides an initial assessment of potential environmental factors that could be incorporated into additional modeling efforts as more knowledge of deer herd
THE TIME DOMAIN SPECTROSCOPIC SURVEY: VARIABLE SELECTION AND ANTICIPATED RESULTS
Energy Technology Data Exchange (ETDEWEB)
Morganson, Eric; Green, Paul J. [Harvard Smithsonian Center for Astrophysics, 60 Garden St, Cambridge, MA 02138 (United States); Anderson, Scott F.; Ruan, John J. [Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195 (United States); Myers, Adam D. [Department of Physics and Astronomy, University of Wyoming, Laramie, WY 82071 (United States); Eracleous, Michael; Brandt, William Nielsen [Department of Astronomy and Astrophysics, 525 Davey Laboratory, The Pennsylvania State University, University Park, PA 16802 (United States); Kelly, Brandon [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106-9530 (United States); Badenes, Carlos [Department of Physics and Astronomy and Pittsburgh Particle Physics, Astrophysics and Cosmology Center (PITT PACC), University of Pittsburgh, 3941 O’Hara St, Pittsburgh, PA 15260 (United States); Bañados, Eduardo [Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117 Heidelberg (Germany); Blanton, Michael R. [Center for Cosmology and Particle Physics, Department of Physics, New York University, 4 Washington Place, New York, NY 10003 (United States); Bershady, Matthew A. [Department of Astronomy, University of Wisconsin, 475 N. Charter St., Madison, WI 53706 (United States); Borissova, Jura [Instituto de Física y Astronomía, Universidad de Valparaíso, Av. Gran Bretaña 1111, Playa Ancha, Casilla 5030, and Millennium Institute of Astrophysics (MAS), Santiago (Chile); Burgett, William S. [GMTO Corp, Suite 300, 251 S. Lake Ave, Pasadena, CA 91101 (United States); Chambers, Kenneth, E-mail: emorganson@cfa.harvard.edu [Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822 (United States); and others
2015-06-20
We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is a Sloan Digital Sky Survey (SDSS)-IV Extended Baryon Oscillation Spectroscopic Survey (eBOSS) subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and active galactic nuclei) across 7500 deg² selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4000 of which will be RR Lyrae stars which may be used as outer Milky Way probes. The variability-selected quasar population has a smoother redshift distribution than a color-selected sample, and variability measurements similar to those we develop here may be used to make more uniform quasar samples in large surveys. The stellar variable targets are distributed fairly uniformly across color space, indicating that TDSS will obtain spectra for a wide variety of stellar variables including pulsating variables, stars with significant chromospheric activity, cataclysmic variables, and eclipsing binaries. TDSS will serve as a pathfinder mission to identify and characterize the multitude of variable objects that will be detected photometrically in even larger variability surveys such as the Large Synoptic Survey Telescope.
Directory of Open Access Journals (Sweden)
R. K. Tiwari
2011-08-01
A novel technique based on Bayesian neural network (BNN) theory is developed and employed to model the temperature variation record from the Western Himalayas. In order to estimate an a posteriori probability function, the BNN is trained with the Hybrid Monte Carlo (HMC)/Markov Chain Monte Carlo (MCMC) simulation algorithm. The efficacy of the new algorithm is tested on well-known chaotic, first order autoregressive (AR) and random models and then applied to model the temperature variation record decoded from the tree-ring widths of the Western Himalayas for the period 1226–2000 AD. For modeling the actual tree-ring temperature data, optimum network parameters are chosen appropriately and then a cross-validation test is performed to ensure the generalization skill of the network on the new data set. Finally, the prediction result based on the BNN model is compared with the results of the conventional artificial neural network (ANN) and the AR linear models. The comparative results show that the BNN based analysis makes better predictions than the ANN and the AR models. The new BNN modeling approach provides a viable tool for climate studies and could also be exploited for modeling other kinds of environmental data.
Directory of Open Access Journals (Sweden)
I. Furtado-Junior
Fishing selectivity of the mangrove crab Ucides cordatus on the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals of a certain size or sex (or a combination of these factors), which suggests an empirical selectivity. Considering this hypothesis, we calculated the selectivity curves for male and female crabs using the logit function of the logistic model in the formulation. The Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method in the software R using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated average carapace widths at selection for males and females, compared with previous studies reporting the average carapace width at sexual maturity, allow us to confirm the hypothesis that most mature individuals do not suffer from fishing pressure, thus ensuring their sustainability.
Huang, Xiaodong; Mengersen, Kerrie; Milinovich, Gabriel; Hu, Wenbiao
2017-06-01
The effects of weather variability on seasonal influenza among different age groups remain unclear. The comparative study aims to explore the differences in the associations between weather variability and seasonal influenza, and growth rates of seasonal influenza epidemics among different age groups in Queensland, Australia. Three Bayesian spatiotemporal conditional autoregressive models were fitted at the postal area level to quantify the relationships between seasonal influenza and monthly minimum temperature (MIT), monthly vapor pressure, school calendar pattern, and Index of Relative Socio-Economic Advantage and Disadvantage for 3 age groups (age, respectively, while the average increase in the monthly influenza cases was 14.6% (95% CI, 9.0%-21.0%), 12.1% (95% CI, 8.8%-16.1%), and 9.2% (95% CI, 1.4%-16.9%) for a 1-hPa increase in vapor pressure. Weather variability appears to be more influential on seasonal influenza transmission in younger (0-14) age groups. The growth rates of influenza at postal area level were relatively small for older (≥65) age groups in Queensland, Australia. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.
Austin, Peter C
2008-10-01
Researchers have proposed using bootstrap resampling in conjunction with automated variable selection methods to identify predictors of an outcome and to develop parsimonious regression models. Using this method, multiple bootstrap samples are drawn from the original data set. Traditional backward variable elimination is used in each bootstrap sample, and the proportion of bootstrap samples in which each candidate variable is identified as an independent predictor of the outcome is determined. The performance of this method for identifying predictor variables has not been examined. Monte Carlo simulation methods were used to determine the ability of bootstrap model selection methods to correctly identify predictors of an outcome when those variables that are selected for inclusion in at least 50% of the bootstrap samples are included in the final regression model. We compared the performance of the bootstrap model selection method to that of conventional backward variable elimination. Bootstrap model selection tended to result in an approximately equal proportion of selected models being equal to the true regression model compared with the use of conventional backward variable elimination. Bootstrap model selection performed comparatively to backward variable elimination for identifying the true predictors of a binary outcome.
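The bootstrap procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`bootstrap_inclusion`, `toy_selector`), the toy data and the number of resamples are all assumptions, and the selector is a simple correlation threshold that stands in for backward variable elimination. Only the resampling loop and the 50% inclusion rule come from the abstract.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def toy_selector(rows, y, threshold=0.5):
    """Hypothetical stand-in for backward elimination:
    keep every column whose |correlation| with y exceeds the threshold."""
    n_vars = len(rows[0])
    return {j for j in range(n_vars)
            if abs(pearson([r[j] for r in rows], y)) > threshold}


def bootstrap_inclusion(rows, y, select, n_boot=200, seed=0):
    """Proportion of bootstrap samples in which each variable is selected."""
    rng = random.Random(seed)
    n = len(rows)
    counts = [0] * len(rows[0])
    for _ in range(n_boot):
        # Draw n cases with replacement from the original data set.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = select([rows[i] for i in idx], [y[i] for i in idx])
        for j in chosen:
            counts[j] += 1
    return [c / n_boot for c in counts]


# Toy data (assumed): column 0 determines y exactly, column 1 is a nuisance pattern.
rows = [[float(i), float((i * 7) % 5)] for i in range(30)]
y = [2.0 * r[0] for r in rows]

freq = bootstrap_inclusion(rows, y, toy_selector)
# Keep variables selected in at least 50% of the bootstrap samples.
final_model = [j for j, f in enumerate(freq) if f >= 0.5]
```

Any selection routine with the same signature, including a real backward elimination, could be passed in place of `toy_selector`.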
Bayesian methods for data analysis
Carlin, Bradley P.
2009-01-01
Approaches for statistical inference: Introduction; Motivating Vignettes; Defining the Approaches; The Bayes-Frequentist Controversy; Some Basic Bayesian Models. The Bayes approach: Introduction; Prior Distributions; Bayesian Inference; Hierarchical Modeling; Model Assessment; Nonparametric Methods. Bayesian computation: Introduction; Asymptotic Methods; Noniterative Monte Carlo Methods; Markov Chain Monte Carlo Methods. Model criticism and selection: Bayesian Modeling; Bayesian Robustness; Model Assessment; Bayes Factors via Marginal Density Estimation; Bayes Factors
Ju, Hyunsu; Brasier, Allan R
2013-09-11
The choice of selection methods to identify important variables for binary classification modeling is critical to produce stable models that are interpretable, that generate accurate predictions and have minimum bias. This work is motivated by data on clinical and laboratory features of severe dengue infections (dengue hemorrhagic fever, DHF) obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections. We carry out a comprehensive performance comparison using several classification models for DHF over the dengue data set. We compared variable selection results by Multivariate Adaptive Regression Splines, Learning Ensemble, Random Forest, Bayesian Moving Averaging, Stochastic Search Variable Selection, and Generalized Regularized Logistics Regression. Model averaging methods (bagging, boosting and ensemble learners) have higher accuracy, but the generalized regularized regression model has the highest predictive power because the linearity assumptions of candidate predictors are strongly satisfied via deviance chi-square testing procedures. Bootstrapping applications for evaluating predictive regression coefficients in regularized regression model are performed. Feature reduction methods introduce inherent biases and therefore are data-type dependent. We propose that these limitations can be overcome using an exhaustive approach for searching feature space. Using this approach, our results suggest that IL-10, platelet and lymphocyte counts are the major features for predicting dengue DHF on the basis of blood chemistries and cytokine measurements.
Davies, Andrew J; Hope, Max J
2015-07-15
Contingency plans are essential in guiding the response to marine oil spills. However, they are written before the pollution event occurs so must contain some degree of assumption and prediction and hence may be unsuitable for a real incident when it occurs. The use of Bayesian networks in ecology, environmental management, oil spill contingency planning and post-incident analysis is reviewed and analysed to establish their suitability for use as real-time environmental decision support systems during an oil spill response. It is demonstrated that Bayesian networks are appropriate for facilitating the re-assessment and re-validation of contingency plans following pollutant release, thus helping ensure that the optimum response strategy is adopted. This can minimise the possibility of sub-optimal response strategies causing additional environmental and socioeconomic damage beyond the original pollution event. Copyright © 2015 Elsevier Ltd. All rights reserved.
Using variable combination population analysis for variable selection in multivariate calibration.
Yun, Yong-Huan; Wang, Wei-Ting; Deng, Bai-Chuan; Lai, Guang-Bi; Liu, Xin-bo; Ren, Da-Bing; Liang, Yi-Zeng; Fan, Wei; Xu, Qing-Song
2015-03-03
Variable (wavelength or feature) selection techniques have become a critical step for the analysis of datasets with high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), was proposed. This strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which is the simple and effective principle of 'survival of the fittest' from Darwin's natural evolution theory, is employed to determine the number of variables to keep and continuously shrink the variable space. Second, in each EDF run, binary matrix sampling (BMS) strategy that gives each variable the same chance to be selected and generates different variable combinations, is used to produce a population of subsets to construct a population of sub-models. Then, model population analysis (MPA) is employed to find the variable subsets with the lower root mean squares error of cross validation (RMSECV). The frequency of each variable appearing in the best 10% sub-models is computed. The higher the frequency is, the more important the variable is. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high performing variable selection methods: genetic algorithm-partial least squares (GA-PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retains informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750. Copyright © 2015 Elsevier B.V. All rights reserved.
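The two sampling ingredients named in this abstract, EDF and BMS, can be sketched as below. This is a hedged illustration only: the exponential form `P * exp(-theta * i)` with `theta = ln(P / L) / N` is an assumed parameterisation that the abstract does not state, and the function names and the sizes `k`, `p` and `ratio` are hypothetical. The property taken from the abstract is that BMS gives every variable the same chance of being selected, which is enforced here by placing the same number of ones in every column.

```python
import math
import random


def edf_keep_count(p, l_final, n_runs, i):
    """Number of variables retained after the i-th EDF run (assumed form).

    Starts at p variables for i = 0 and decays exponentially
    to l_final variables at i = n_runs.
    """
    theta = math.log(p / l_final) / n_runs
    return round(p * math.exp(-theta * i))


def binary_matrix_sampling(k, p, ratio, seed=0):
    """Build a k x p binary matrix; row = sub-model, column = variable.

    Every column receives the same number of ones, shuffled independently,
    so each variable appears in the same number of sub-models.
    """
    rng = random.Random(seed)
    ones = round(k * ratio)
    columns = []
    for _ in range(p):
        col = [1] * ones + [0] * (k - ones)
        rng.shuffle(col)
        columns.append(col)
    # Transpose the column list into k rows of length p.
    return [[columns[j][i] for j in range(p)] for i in range(k)]


matrix = binary_matrix_sampling(k=10, p=6, ratio=0.5)
column_sums = [sum(row[j] for row in matrix) for j in range(6)]
```

Each row would then be used to fit one sub-model, and the variables appearing most often in the best-scoring rows would be carried into the next EDF run.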
Bayesian modeling of measurement error in predictor variables using item response theory
Fox, Gerardus J.A.; Glas, Cornelis A.W.
2000-01-01
This paper focuses on handling measurement error in predictor variables using item response theory (IRT). Measurement error is of great importance in the assessment of theoretical constructs, such as intelligence or the school climate. Measurement error is modeled by treating the predictors as unobserved
Ensembling Variable Selectors by Stability Selection for the Cox Model
Directory of Open Access Journals (Sweden)
Qing-Yan Yin
2017-01-01
As a pivotal tool to build interpretive models, variable selection plays an increasingly important role in high-dimensional data analysis. In recent years, variable selection ensembles (VSEs) have gained much interest due to their many advantages. Stability selection (Meinshausen and Bühlmann, 2010), a VSE technique based on subsampling in combination with a base algorithm like lasso, is an effective method to control the false discovery rate (FDR) and to improve selection accuracy in linear regression models. By adopting lasso as a base learner, we attempt to extend stability selection to handle variable selection problems in a Cox model. According to our experience, it is crucial to set the regularization region Λ in lasso and the parameter λmin properly so that stability selection can work well. To the best of our knowledge, however, there is no literature addressing this problem in an explicit way. Therefore, we first provide a detailed procedure to specify Λ and λmin. Then, some simulated and real-world data with various censoring rates are used to examine how well stability selection performs. It is also compared with several other variable selection approaches. Experimental results demonstrate that it achieves better or competitive performance in comparison with several other popular techniques.
Model and Variable Selection Procedures for Semiparametric Time Series Regression
Directory of Open Access Journals (Sweden)
Risa Kato
2009-01-01
Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedures is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.
A numeric comparison of variable selection algorithms for supervised learning
International Nuclear Information System (INIS)
Palombo, G.; Narsky, I.
2009-01-01
Datasets in modern High Energy Physics (HEP) experiments are often described by dozens or even hundreds of input variables. Reducing a full variable set to a subset that most completely represents information about data is therefore an important task in analysis of HEP data. We compare various variable selection algorithms for supervised learning using several datasets such as, for instance, imaging gamma-ray Cherenkov telescope (MAGIC) data found at the UCI repository. We use classifiers and variable selection methods implemented in the statistical package StatPatternRecognition (SPR), a free open-source C++ package developed in the HEP community (http://sourceforge.net/projects/statpatrec/). For each dataset, we select a powerful classifier and estimate its learning accuracy on variable subsets obtained by various selection algorithms. When possible, we also estimate the CPU time needed for the variable subset selection. The results of this analysis are compared with those published previously for these datasets using other statistical packages such as R and Weka. We show that the most accurate, yet slowest, method is a wrapper algorithm known as generalized sequential forward selection ('Add N Remove R') implemented in SPR.
Selecting candidate predictor variables for the modelling of post ...
African Journals Online (AJOL)
Selecting candidate predictor variables for the modelling of post-discharge mortality from sepsis: a protocol development project. Afri. Health Sci. .... Initial list of candidate predictor variables, N=17. Clinical. Laboratory. Social/Demographic. Vital signs (HR, RR, BP, T). Hemoglobin. Age. Oxygen saturation. Blood culture. Sex.
A Variable-Selection Heuristic for K-Means Clustering.
Brusco, Michael J.; Cradit, J. Dennis
2001-01-01
Presents a variable selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. Subjected the heuristic to Monte Carlo testing across more than 2,200 datasets. Results indicate that the heuristic is extremely effective at eliminating masking variables. (SLD)
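The adjusted Rand index on which this heuristic relies can be computed from the contingency counts of two partitions. The sketch below uses the standard Hubert-Arabie formula; the function name and the example label vectors are illustrative assumptions, and the abstract does not describe how the authors implemented it.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # Degenerate case: no pairs of objects to compare.
    # Pairs of objects placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of objects placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # Agreement expected by chance.
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # Both partitions are trivial (e.g. a single cluster each).
    return (sum_cells - expected) / (max_index - expected)


# Cluster labels only matter up to renaming, so these two partitions agree fully.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# These two partitions cross each other and agree less than chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 indicates perfect recovery of the reference clusters, and values near 0 indicate agreement no better than chance.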
Walter, William D.; Smith, Rick; Vanderklok, Mike; VerCauterren, Kurt C.
2014-01-01
Bovine tuberculosis is a bacterial disease caused by Mycobacterium bovis in livestock and wildlife with hosts that include Eurasian badgers (Meles meles), brushtail possum (Trichosurus vulpecula), and white-tailed deer (Odocoileus virginianus). Risk-assessment efforts in Michigan have been initiated on farms to minimize interactions of cattle with wildlife hosts but research on M. bovis on cattle farms has not investigated the spatial context of disease epidemiology. To incorporate spatially explicit data, initial likelihood of infection probabilities for cattle farms tested for M. bovis, prevalence of M. bovis in white-tailed deer, deer density, and environmental variables for each farm were modeled in a Bayesian hierarchical framework. We used geo-referenced locations of 762 cattle farms that have been tested for M. bovis, white-tailed deer prevalence, and several environmental variables that may lead to long-term survival and viability of M. bovis on farms and surrounding habitats (i.e., soil type, habitat type). Bayesian hierarchical analyses identified deer prevalence and proportion of sandy soil within our sampling grid as the most supported model. Analysis of cattle farms tested for M. bovis identified that every 1% increase in sandy soil resulted in an increase in odds of infection by 4%. Our analysis revealed that the influence of prevalence of M. bovis in white-tailed deer was still a concern even after considerable efforts to prevent cattle interactions with white-tailed deer through on-farm mitigation and reduction in the deer population. Cattle farms test positive for M. bovis annually in our study area, suggesting that an environmental source, either on farms or in the surrounding landscape, may be contributing to new or re-infections with M. bovis. Our research provides an initial assessment of potential environmental factors that could be incorporated into additional modeling efforts as more knowledge of deer herd
DEFF Research Database (Denmark)
Burgess, Stephen; Thompson, Simon G; Thompson, Grahame
2010-01-01
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context...... an overall estimate of the causal relationship between the phenotype and the outcome, and an assessment of its heterogeneity across studies. As an example, we estimate the causal relationship of blood concentrations of C-reactive protein on fibrinogen levels using data from 11 studies. These methods provide...... a flexible framework for efficient estimation of causal relationships derived from multiple studies. Issues discussed include weak instrument bias, analysis of binary outcome data such as disease risk, missing genetic data, and the use of haplotypes....
Directory of Open Access Journals (Sweden)
Rodríguez-Prieto Víctor
2012-08-01
Background: Bovine tuberculosis (bTB) is a chronic infectious disease mainly caused by Mycobacterium bovis. Although eradication is a priority for the European authorities, bTB remains active or even increasing in many countries, causing significant economic losses. The integral consideration of epidemiological factors is crucial to more cost-effectively allocate control measures. The aim of this study was to identify the nature and extent of the association between TB distribution and a list of potential risk factors regarding cattle, wild ungulates and environmental aspects in Ciudad Real, a Spanish province with one of the highest TB herd prevalences. Results: We used a Bayesian mixed effects multivariable logistic regression model to predict TB occurrence in either domestic or wild mammals per municipality in 2007 by using information from the previous year. The municipal TB distribution and endemicity was clustered in the western part of the region and clearly overlapped with the explanatory variables identified in the final model: (1) incident cattle farms, (2) number of years of veterinary inspection of big game hunting events, (3) prevalence in wild boar, (4) number of sampled cattle, (5) persistent bTB-infected cattle farms, (6) prevalence in red deer, (7) proportion of beef farms, and (8) farms devoted to bullfighting cattle. Conclusions: The combination of these eight variables in the final model highlights the importance of the persistence of the infection in the hosts, surveillance efforts and some cattle management choices in the circulation of M. bovis in the region. The spatial distribution of these variables, together with particular Mediterranean features that favour the wildlife-livestock interface may explain the M. bovis persistence in this region. Sanitary authorities should allocate efforts towards specific areas and epidemiological situations where the wildlife-livestock interface seems to critically hamper the definitive bTB eradication
Variable selection in multiple linear regression: The influence of ...
African Journals Online (AJOL)
The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the Cp criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is ...
Variable selection in multiple linear regression: The influence of ...
African Journals Online (AJOL)
The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the Cp criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed ...
Financial applications of a Tabu search variable selection model
Directory of Open Access Journals (Sweden)
Zvi Drezner
2001-01-01
We illustrate how a comparatively new technique, a Tabu search variable selection model [Drezner, Marcoulides and Salhi (1999)], can be applied efficiently within finance when the researcher must select a subset of variables from among the whole set of explanatory variables under consideration. Several types of problems in finance, including corporate and personal bankruptcy prediction, mortgage and credit scoring, and the selection of variables for the Arbitrage Pricing Model, require the researcher to select a subset of variables from a larger set. In order to demonstrate the usefulness of the Tabu search variable selection model, we: (1) illustrate its efficiency in comparison to the main alternative search procedures, such as stepwise regression and the Maximum R2 procedure, and (2) show how a version of the Tabu search procedure may be implemented when attempting to predict corporate bankruptcy. We accomplish (2) by indicating that a Tabu Search procedure increases the predictability of corporate bankruptcy by up to 10 percentage points in comparison to Altman's (1968) Z-Score model.
Variable selection and estimation for longitudinal survey data
Wang, Li
2014-09-01
There is wide interest in studying longitudinal surveys where sample subjects are observed successively over time. Longitudinal surveys have been used in many areas today, for example, in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure when the correct submodel was known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples are illustrated to show the usefulness of the proposed methodology under various model settings and sampling designs. © 2014 Elsevier Inc.
Variable selection for mixture and promotion time cure rate models.
Masud, Abdullah; Tu, Wanzhu; Yu, Zhangsheng
2016-11-16
Failure-time data with cured patients are common in clinical studies. Data from these studies are typically analyzed with cure rate models. Variable selection methods have not been well developed for cure rate models. In this research, we propose two methods based on the least absolute shrinkage and selection operator for variable selection in mixture and promotion time cure models with parametric or nonparametric baseline hazards. We conduct an extensive simulation study to assess the operating characteristics of the proposed methods. We illustrate the use of the methods using data from a study of childhood wheezing. © The Author(s) 2016.
Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk
2005-01-01
We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...
Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao
2014-10-07
In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
Norris, P. M.; da Silva, A. M., Jr.
2016-12-01
Norris and da Silva recently published a method to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation (CDA). The gridcolumn model includes assumed-PDF intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used are MODIS cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. The new approach not only significantly reduces mean and standard deviation biases with respect to the assimilated observables, but also improves the simulated rotational-Raman scattering cloud optical centroid pressure against independent (non-assimilated) retrievals from the OMI instrument. One obvious difficulty for the method, and other CDA methods, is the lack of information content in passive cloud observables on cloud vertical structure, beyond cloud-top and thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard is helpful, better honoring inversion structures in the background state.
DEFF Research Database (Denmark)
Jensen, Finn Verner; Nielsen, Thomas Dyhre
2016-01-01
is largely due to the availability of efficient inference algorithms for answering probabilistic queries about the states of the variables in the network. Furthermore, to support the construction of Bayesian network models, learning algorithms are also available. We give an overview of the Bayesian network...
Effect of high intensive football match on selected physiological variables
Directory of Open Access Journals (Sweden)
VINAY PAWAR
2011-04-01
The aim of the study was to find out the effect of a high-intensity football match on selected physiological variables. For this purpose, 20 male football players with a mean age of 24.6 ± 1.74 years acted as subjects. Oral body temperature, body weight, and blood pressure were selected as the physiological variables, which were measured with a digital thermometer, a weighing machine, and a digital blood pressure monitor. All data were collected fifteen minutes prior to and immediately after the end of the match. A paired t-test was used as the statistical tool, which revealed statistically significant differences in all the selected physiological variables. The obtained t values between the measurements before and after the match were 9.18 for oral body temperature, 7.12 for body weight, 8.17 for diastolic blood pressure, and 8.88 for systolic blood pressure.
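For readers unfamiliar with the paired t statistic reported in this abstract, a minimal sketch follows. The function name and the three before/after readings are invented purely for illustration and are not data from the study. The statistic is the mean of the within-subject differences divided by its standard error, with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    # One difference per subject: after-match value minus pre-match value.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical readings for three subjects (not taken from the study).
before = [10.0, 10.0, 10.0]
after = [11.0, 12.0, 13.0]
t_value, dof = paired_t(before, after)
```

With 20 subjects, as in the study, the statistic would be referred to a t distribution with 19 degrees of freedom.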
The Properties of Model Selection when Retaining Theory Variables
DEFF Research Database (Denmark)
Hendry, David F.; Johansen, Søren
Economic theories are often fitted directly to data to avoid possible model selection biases. We show that embedding a theory model that specifies the correct set of m relevant exogenous variables, x{t}, within the larger set of m+k candidate variables, (x{t},w{t}), then selection over the second...... set by their statistical significance can be undertaken without affecting the estimator distribution of the theory parameters. This strategy returns the theory-parameter estimates when the theory is correct, yet protects against the theory being under-specified because some w{t} are relevant....
Exhaustive Search for Sparse Variable Selection in Linear Regression
Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato
2018-04-01
We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
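The exhaustive K-sparse search can be illustrated on a tiny problem. The sketch below is an assumption-laden toy, not the authors' implementation: it fits ordinary least squares without an intercept for every K-subset, scores each subset by its residual sum of squares, and uses a small hand-written linear solver to stay dependency-free. The data and all function names are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a small system a x = b by Gaussian elimination with partial
    pivoting; returns None if the matrix is numerically singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss_for_subset(X, y, subset):
    """Residual sum of squares of the least-squares fit on the given columns."""
    cols = [[row[j] for row in X] for j in subset]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(gram, rhs)  # Normal equations: (X'X) beta = X'y.
    if beta is None:
        return float("inf")
    resid = [yi - sum(b * row[j] for b, j in zip(beta, subset))
             for row, yi in zip(X, y)]
    return sum(r * r for r in resid)


def exhaustive_search(X, y, k):
    """Score every k-subset of columns; return the best subset and all scores."""
    p = len(X[0])
    results = {s: rss_for_subset(X, y, s) for s in combinations(range(p), k)}
    return min(results, key=results.get), results


# Toy data (assumed): y depends only on columns 0 and 2.
X = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 1.0],
     [1.0, 1.0, 1.0, 0.0],
     [2.0, 1.0, 0.0, 1.0]]
y = [2.0 * row[0] + 3.0 * row[2] for row in X]

best, results = exhaustive_search(X, y, k=2)
```

The full `results` dictionary is the raw material for a density-of-states style summary, since a histogram of its values shows how many subsets reach each level of fit. The number of subsets grows combinatorially with the number of candidate variables, which is why the abstract turns to an approximate search for large problems.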
Variable selection in multivariate calibration based on clustering of variable concept.
Farrokhnia, Maryam; Karimi, Sadegh
2016-01-01
Recently we proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). Here the same idea is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into clusters by clustering algorithms, and the spectral data of each cluster are then subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) have been used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over the other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The obtained statistical parameters indicate that variable clustering can separate the useful part of the data from the redundant part, so that a stable model can be built on the informative clusters. Copyright © 2015 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
Mabaso Musawenkosi LH
2007-09-01
Abstract Background Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana. Results Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Those correlated with higher-ranking relatives of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country. Conclusion We have
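Ranking candidates by bootstrap selection frequency can be illustrated with a small sketch. This is not the study's procedure: the abstract runs automated step-wise selection on each bootstrap sample, whereas the stand-in selector below (`toy_select`, a simple correlation threshold) is an assumption chosen only to keep the example short. The variable names and data are likewise hypothetical.

```python
import random
from collections import Counter


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def toy_select(sample, outcome, threshold=0.5):
    """Stand-in for step-wise selection: keep variables strongly correlated with the outcome."""
    ys = [row[outcome] for row in sample]
    return [name for name in sample[0] if name != outcome
            and abs(correlation([row[name] for row in sample], ys)) > threshold]


def selection_frequency(rows, outcome, n_boot=200, seed=0):
    """Resample rows with replacement and count how often each variable is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        counts.update(toy_select(sample, outcome))
    names = [name for name in rows[0] if name != outcome]
    return sorted(((counts[name] / n_boot, name) for name in names), reverse=True)


# Hypothetical data: "rainfall" tracks the outcome exactly, "flat" carries no information.
rows = [{"prevalence": i, "rainfall": 2 * i + 1, "flat": 0} for i in range(20)]
ranking = selection_frequency(rows, "prevalence")
freq = {name: f for f, name in ranking}
```

Variables would then be offered to the multiple-variable model in the order given by `ranking`, most frequently selected first.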
ENSEMBLE VARIABILITY OF NEAR-INFRARED-SELECTED ACTIVE GALACTIC NUCLEI
International Nuclear Information System (INIS)
Kouzuma, S.; Yamaoka, H.
2012-01-01
We present the properties of the ensemble variability V for nearly 5000 near-infrared active galactic nuclei (AGNs) selected from the catalog of Quasars and Active Galactic Nuclei (13th Edition) and the SDSS-DR7 quasar catalog. From three near-infrared point source catalogs, namely, Two Micron All Sky Survey (2MASS), Deep Near Infrared Survey (DENIS), and UKIDSS/LAS catalogs, we extract 2MASS-DENIS and 2MASS-UKIDSS counterparts for cataloged AGNs by cross-identification between catalogs. We further select variable AGNs based on an optimal criterion for selecting the variable sources. The sample objects are divided into subsets according to whether near-infrared light originates by optical emission or by near-infrared emission in the rest frame; and we examine the correlations of the ensemble variability with the rest-frame wavelength, redshift, luminosity, and rest-frame time lag. In addition, we also examine the correlations of variability amplitude with optical variability, radio intensity, and radio-to-optical flux ratio. The rest-frame optical variability of our samples shows negative correlations with luminosity and positive correlations with rest-frame time lag (i.e., the structure function, SF), and this result is consistent with previous analyses. However, no well-known negative correlation exists between the rest-frame wavelength and optical variability. This inconsistency might be due to a biased sampling of high-redshift AGNs. Near-infrared variability in the rest frame is anticorrelated with the rest-frame wavelength, which is consistent with previous suggestions. However, correlations of near-infrared variability with luminosity and rest-frame time lag are the opposite of these correlations of the optical variability; that is, the near-infrared variability is positively correlated with luminosity but negatively correlated with the rest-frame time lag. Because these trends are qualitatively consistent with the properties of radio-loud quasars reported
Assessing the impact of selected macroeconomic variables in the ...
African Journals Online (AJOL)
This paper examined the effects of some selected macroeconomic variables on residential property price in Lagos. Data were collected from Estate Surveyors and Valuers using the simple random sampling technique and Central Bank of Nigeria (CBN) Statistical Bulletin. The semi-log form of regression equation model ...
Noncausal Bayesian Vector Autoregression
DEFF Research Database (Denmark)
Lanne, Markku; Luoto, Jani
We propose a Bayesian inferential procedure for the noncausal vector autoregressive (VAR) model that is capable of capturing nonlinearities and incorporating effects of missing variables. In particular, we devise a fast and reliable posterior simulator that yields the predictive distribution...
Rammal, Abbas; Perrin, Eric; Vrabie, Valeriu; Assaf, Rabih; Fenniri, Hassan
2017-07-01
Infrared spectroscopy provides useful information on the molecular compositions of biological systems related to molecular vibrations, overtones, and combinations of fundamental vibrations. Mid-infrared (MIR) spectroscopy is sensitive to organic and mineral components and has attracted growing interest in the development of biomarkers related to intrinsic characteristics of lignocellulose biomass. However, not all spectral information is valuable for biomarker construction or for applying analysis methods such as classification. Better processing and interpretation can be achieved by identifying discriminating wavenumbers. The selection of wavenumbers has been addressed through several variable- or feature-selection methods. Some of them have not been adapted for use in large data sets or are difficult to tune, and others require additional information, such as concentrations. This paper proposes a new approach by combining a naïve Bayesian classifier with a genetic algorithm to identify discriminating spectral wavenumbers. The genetic algorithm uses a linear combination of an a posteriori probability and the Bayes error rate as the fitness function for optimization. Such a function allows the improvement of both the compactness and the separation of classes. This approach was tested to classify a small set of maize roots in soil according to their biodegradation process based on their MIR spectra. The results show that this optimization method allows better discrimination of the biodegradation process, compared with using the information of the entire MIR spectrum, the use of the spectral information at wavenumbers selected by a genetic algorithm based on a classical validity index or the use of the spectral information selected by combining a genetic algorithm with other methods, such as Linear Discriminant Analysis. The proposed method selects wavenumbers that correspond to principal vibrations of chemical functional groups of compounds that undergo degradation
A two-step method for variable selection in the analysis of a case-cohort study.
Newcombe, P J; Connolly, S; Seaman, S; Richardson, S; Sharp, S J
2017-11-10
Accurate detection and estimation of true exposure-outcome associations is important in aetiological analysis; when there are multiple potential exposure variables of interest, methods for detecting the subset of variables most likely to have true associations with the outcome of interest are required. Case-cohort studies often collect data on a large number of variables which have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies. We describe and explore the application of three variable selection methods to data from a case-cohort study. These are: (i) selecting variables based on their level of significance in univariable (i.e. one-at-a-time) Prentice-weighted Cox regression models; (ii) stepwise selection applied to Prentice-weighted Cox regression; and (iii) a two-step method which applies a Bayesian variable selection algorithm to obtain posterior probabilities of selection for each variable using multivariable logistic regression followed by effect estimation using Prentice-weighted Cox regression. Across nine different simulation scenarios, the two-step method demonstrated higher sensitivity and lower false discovery rate than the one-at-a-time and stepwise methods. In an application of the methods to data from the EPIC-InterAct case-cohort study, the two-step method identified an additional two fatty acids as being associated with incident type 2 diabetes, compared with the one-at-a-time and stepwise methods. The two-step method enables more powerful and accurate detection of exposure-outcome associations in case-cohort studies. An R package is available to enable researchers to apply this method. © The Author 2017. Published by Oxford University Press on behalf of the International Epidemiological Association.
Selecting Tanker Steaming Speeds under Uncertainty: A Rule-Based Bayesian Reasoning Approach
Directory of Open Access Journals (Sweden)
N.S.F. Abdul Rahman
2015-06-01
In the tanker industry, there are many uncertain conditions that tanker companies have to deal with, for example the global financial crisis and economic recession, the increase in bunker fuel prices, and global climate change. Such conditions have forced tanker companies to change tanker speeds from full speed to slow speed, extra slow speed and super slow speed. The objective of this paper is therefore to present a methodology for determining the vessel speeds of tankers that minimize the cost of the vessels under such conditions. The four levels of vessel speed in the tanker industry are investigated, incorporating a number of uncertain conditions. This is done by developing a scientific model using a rule-based Bayesian reasoning method. The proposed model has produced 96 rules that can be used as guidance in the decision-making process. Such results help tanker companies to determine the appropriate vessel speed to be used in a dynamic operational environment.
Louzoun, Yoram; Alter, Idan; Gragert, Loren; Albrecht, Mark; Maiers, Martin
2018-05-01
Regardless of sampling depth, accurate genotype imputation is limited in regions of high polymorphism which often have a heavy-tailed haplotype frequency distribution. Many rare haplotypes are thus unobserved. Statistical methods to improve imputation by extending reference haplotype distributions using linkage disequilibrium patterns that relate allele and haplotype frequencies have not yet been explored. In the field of unrelated stem cell transplantation, imputation of highly polymorphic human leukocyte antigen (HLA) genes has an important application in identifying the best-matched stem cell donor when searching large registries totaling over 28,000,000 donors worldwide. Despite these large registry sizes, a significant proportion of searched patients present novel HLA haplotypes. Supporting this observation, HLA population genetic models have indicated that many extant HLA haplotypes remain unobserved. The absent haplotypes are a significant cause of error in haplotype matching. We have applied a Bayesian inference methodology for extending haplotype frequency distributions, using a model where new haplotypes are created by recombination of observed alleles. Applications of this joint probability model offer significant improvement in frequency distribution estimates over the best existing alternative methods, as we illustrate using five-locus HLA frequency data from the National Marrow Donor Program registry. Transplant matching algorithms and disease association studies involving phasing and imputation of rare variants may benefit from this statistical inference framework.
DEFF Research Database (Denmark)
Widyas, Nuzul; Jensen, Just; Nielsen, Vivi Hunnicke
selected downwards and three lines were kept as controls. Bayesian statistical methods are used to estimate the genetic variance components. Mixed model analysis is modified including mutation effect following the methods by Wray (1990). DIC was used to compare the model. Models including mutation effect...
Energy Technology Data Exchange (ETDEWEB)
Ruan, John J.; Anderson, Scott F.; MacLeod, Chelsea L.; Becker, Andrew C.; Davenport, James R. A.; Ivezic, Zeljko [Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195 (United States); Burnett, T. H. [Department of Physics, University of Washington, Seattle, WA 98195-1560 (United States); Kochanek, Christopher S. [Department of Astronomy, Ohio State University, 140 West 18th Avenue, Columbus, OH 43210 (United States); Plotkin, Richard M. [Department of Astronomy, University of Michigan, 500 Church Street, Ann Arbor, MI 48109 (United States); Sesar, Branimir [Division of Physics, Mathematics and Astronomy, Caltech, Pasadena, CA 91125 (United States); Stuart, J. Scott, E-mail: jruan@astro.washington.edu [Lincoln Laboratory, Massachusetts Institute of Technology, 244 Wood Street, Lexington, MA 02420-9108 (United States)
2012-11-20
We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability τ, and driving amplitudes on short timescales σ̂. Imposing cuts on minimum τ and σ̂ allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of γ-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E ≥ 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other γ-ray blazars and is likely to be the γ-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is ~3 years in the rest frame of the jet, in contrast with the ~320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.
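A damped random walk can be simulated exactly on an irregular time grid, which helps to see what the two parameters in the abstract control. The sketch below assumes the common Ornstein-Uhlenbeck parameterization in which the long-run variance equals `sigma_hat**2 * tau / 2`; the abstract does not spell out its parameterization, and the cut thresholds passed to `passes_cuts` are placeholders, not values from the paper.

```python
import math
import random


def drw_step_moments(x_prev, mean, tau, sigma_hat, dt):
    """Conditional mean and variance of the process a time dt after the value x_prev."""
    decay = math.exp(-dt / tau)
    cond_mean = mean + (x_prev - mean) * decay
    cond_var = 0.5 * sigma_hat ** 2 * tau * (1.0 - decay ** 2)
    return cond_mean, cond_var


def simulate_drw(times, mean, tau, sigma_hat, seed=0):
    """Draw one light curve at the given (possibly irregular) observation times."""
    rng = random.Random(seed)
    x = rng.gauss(mean, math.sqrt(0.5 * sigma_hat ** 2 * tau))  # stationary start
    curve = [x]
    for t0, t1 in zip(times, times[1:]):
        m, v = drw_step_moments(x, mean, tau, sigma_hat, t1 - t0)
        x = rng.gauss(m, math.sqrt(v))
        curve.append(x)
    return curve


def passes_cuts(tau, sigma_hat, tau_min, sigma_hat_min):
    """Variability-based selection: keep objects above both minimum thresholds."""
    return tau >= tau_min and sigma_hat >= sigma_hat_min


curve = simulate_drw([0.0, 1.0, 2.5, 7.0], mean=18.0, tau=100.0, sigma_hat=0.2)
```

In practice `tau` and `sigma_hat` would come from fitting each observed light curve; only the forward model and the final cut are shown here.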
Variable selection based cotton bollworm odor spectroscopic detection
Lü, Chengxu; Gai, Shasha; Luo, Min; Zhao, Bo
2016-10-01
To support rapid automatic pest detection for efficient and targeted pesticide application, and to overcome the problem of reflectance spectral signals being covered and attenuated by the solid plant, the feasibility of near infrared spectroscopy (NIRS) detection of cotton bollworm odor is studied. Three cotton bollworm odor samples and 3 blank air gas samples were prepared. Different concentrations of cotton bollworm odor were prepared by mixing the above gas samples, resulting in a calibration group of 62 samples and a validation group of 31 samples. The spectral collection system includes a light source, optical fiber, sample chamber, and spectrometer. Spectra were pretreated by baseline correction, modeled with partial least squares (PLS), and optimized by genetic algorithm (GA) and competitive adaptive reweighted sampling (CARS). Minor count differences are found among the spectra of different cotton bollworm odor concentrations. A PLS model of all the variables was built, presenting an RMSEV of 14 and an RV2 of 0.89. Its theoretical basis is that insects volatilize specific odors, including pheromones and allelochemics, which are used for intra-specific and inter-specific communication and can be detected by NIR spectroscopy. 28 sensitive variables are selected by GA, giving a model with an RMSEV of 14 and an RV2 of 0.90. By comparison, 8 sensitive variables are selected by CARS, giving a model with an RMSEV of 13 and an RV2 of 0.92. The CARS model employs only 1.5% of the variables yet presents a smaller error than the model using all variables. The odor-gas-based NIR technique shows potential for cotton bollworm detection.
Portfolio Selection Based on Distance between Fuzzy Variables
Directory of Open Access Journals (Sweden)
Weiyi Qian
2014-01-01
This paper studies the portfolio selection problem in a fuzzy environment. We introduce a new simple method in which the distance between fuzzy variables is used to measure the divergence of the fuzzy investment return from a prior one. Firstly, two new mathematical models are proposed by expressing divergence as distance, investment return as expected value, and risk as variance and semivariance, respectively. Secondly, the crisp forms of the new models are also provided for different types of fuzzy variables. Finally, several numerical examples are given to illustrate the effectiveness of the proposed approach.
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
Multi-scale inference of interaction rules in animal groups using Bayesian model selection.
Directory of Open Access Journals (Sweden)
Richard P Mann
2012-01-01
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture fine scale rules of interaction, which are primarily mediated by physical contact. Conversely, the Markovian self-propelled particle model captures the fine scale rules of interaction but fails to reproduce global dynamics. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
Loneliness as a function of selected personality variables.
Hojat, M
1982-01-01
Hypothesized that in a multivariate statistical model, selected personality variables, depression, anxiety, neuroticism, psychoticism, misanthropy, and external locus of control, could positively predict loneliness, and self-esteem and extraversion could negatively predict loneliness scores. Two groups of Ss were studied independently. Ss in Group I were 232 Iranian college students (156 males, 76 females) who were studying in American colleges. Group II consisted of 305 Iranian students (168 males, 137 females) who were studying in Iranian universities. The obtained results, applying multiple regression analysis, confirmed the directions stated in the research hypothesis. However, some of the selected variables did not contribute significantly in the regression equations. Because of fluctuation of the regression coefficients, due to multicollinearity, the data were subjected to factor analysis. Two factors emerged with eigenvalues greater than unity. Loneliness loaded heavily on the first factor, which was identified as negative attribute of personality.
Jacobs, Rianne; Lesaffre, Emmanuel; Teunis, Peter Fm; Höhle, Michael; van de Kassteele, Jan
2017-01-01
Early identification of contaminated food products is crucial in reducing health burdens of food-borne disease outbreaks. Analytic case-control studies are primarily used in this identification stage by comparing exposures in cases and controls using logistic regression. Standard epidemiological
Filament winding cylinders. III - Selection of the process variables
Lee, Soo-Yong; Springer, George S.
1990-01-01
By using the Lee-Springer filament winding model, temperatures, degrees of cure, viscosities, stresses, strains, fiber tensions, fiber motions, and void diameters were calculated in graphite-epoxy composite cylinders during the winding and subsequent curing. The results demonstrate the type of information which can be generated by the model. It is shown, in reference to these results, how the model, and the corresponding WINDTHICK code, can be used to select the appropriate process variables.
Mahalanobis distance and variable selection to optimize dose response
International Nuclear Information System (INIS)
Moore, D.H. II; Bennett, D.E.; Wyrobek, A.J.; Kranzler, D.
1979-01-01
A battery of statistical techniques is combined to improve detection of low-level dose response. First, Mahalanobis distances are used to classify objects as normal or abnormal. Then the proportion classified abnormal is regressed on dose. Finally, a subset of regressor variables is selected which maximizes the slope of the dose response line. Use of the techniques is illustrated by application to mouse sperm damaged by low doses of x-rays.
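The first two steps described above can be sketched for the two-feature case. This is an illustration under stated assumptions rather than the authors' procedure: the control group defines the mean and covariance, the squared-distance threshold is arbitrary, and all measurements are made up.

```python
def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def abnormal_fraction(points, mean, cov, threshold_sq):
    """Proportion of objects whose squared distance exceeds the threshold."""
    flagged = sum(mahalanobis_sq(p, mean, cov) > threshold_sq for p in points)
    return flagged / len(points)


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Hypothetical measurements keyed by dose; dose 0 is the control group.
groups = {
    0: [(1, 0), (-1, 0), (0, 1), (0, -1)],
    1: [(2, 0), (0, 1), (1, 0), (0, -1)],
    2: [(2, 0), (0, 3), (1, 0), (-2, 2)],
}
mean, cov = mean_and_cov(groups[0])
doses = sorted(groups)
fractions = [abnormal_fraction(groups[d], mean, cov, 4.0) for d in doses]
dose_slope = slope(doses, fractions)
```

The third step would repeat this calculation for each candidate subset of measured variables and keep the subset with the largest `dose_slope`.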
Sollero, Bruna P; Junqueira, Vinícius S; Gomes, Cláudia C G; Caetano, Alexandre R; Cardoso, Fernando F
2017-06-15
Cattle resistance to ticks is known to be under genetic control with a complex biological mechanism within and among breeds. Our aim was to identify genomic segments and tag single nucleotide polymorphisms (SNPs) associated with tick-resistance in Hereford and Braford cattle. The predictive performance of a very low-density tag SNP panel was estimated and compared with results obtained with a 50 K SNP dataset. BayesB (π = 0.99) was initially applied in a genome-wide association study (GWAS) for this complex trait by using deregressed estimated breeding values for tick counts and 41,045 SNP genotypes from 3455 animals raised in southern Brazil. To estimate the combined effect of a genomic region that is potentially associated with quantitative trait loci (QTL), 2519 non-overlapping 1-Mb windows that varied in SNP number were defined, with the top 48 windows including 914 SNPs and explaining more than 20% of the estimated genetic variance for tick resistance. Subsequently, the most informative SNPs were selected based on Bayesian parameters (model frequency and t-like statistics), linkage disequilibrium and minor allele frequency to propose a very low-density 58-SNP panel. Some of these tag SNPs mapped close to or within genes and pseudogenes that are functionally related to tick resistance. Prediction ability of this SNP panel was investigated by cross-validation using K-means and random clustering and a BayesA model to predict direct genomic values. Accuracies from these cross-validations were 0.27 ± 0.09 and 0.30 ± 0.09 for the K-means and random clustering groups, respectively, compared to respective values of 0.37 ± 0.08 and 0.43 ± 0.08 when using all 41,045 SNPs and BayesB with π = 0.99, or of 0.28 ± 0.07 and 0.40 ± 0.08 with π = 0.999. Bayesian GWAS model parameters can be used to select tag SNPs for a very low-density panel, which will include SNPs that are potentially linked to functional genes. It can be useful for cost
How to avoid mismodelling in GLM-based fMRI data analysis: cross-validated Bayesian model selection.
Soch, Joram; Haynes, John-Dylan; Allefeld, Carsten
2016-11-01
Voxel-wise general linear models (GLMs) are a standard approach for analyzing functional magnetic resonance imaging (fMRI) data. An advantage of GLMs is that they are flexible and can be adapted to the requirements of many different data sets. However, the specification of first-level GLMs leaves the researcher with many degrees of freedom, which is problematic given recent efforts to ensure robust and reproducible fMRI data analysis. Formal model comparisons that allow a systematic assessment of GLMs are only rarely performed. On the one hand, too simple models may underfit data and leave real effects undiscovered. On the other hand, too complex models might overfit data and also reduce statistical power. Here we present a systematic approach termed cross-validated Bayesian model selection (cvBMS) that allows one to decide which GLM best describes a given fMRI data set. Importantly, our approach allows for non-nested model comparison, i.e. comparing more than two models that do not just differ by adding one or more regressors. It also allows for spatially heterogeneous modelling, i.e. using different models for different parts of the brain. We validate our method using simulated data and demonstrate potential applications to empirical data. The increased use of model comparison and model selection should increase the reliability of GLM results and reproducibility of fMRI studies. Copyright © 2016 Elsevier Inc. All rights reserved.
Variable Selection in Multivariable Regression Using SAS/IML
Directory of Open Access Journals (Sweden)
Ali A. Al-Subaihi
2002-11-01
This paper introduces a SAS/IML program to select among the multivariate model candidates based on a few well-known multivariate model selection criteria. Stepwise regression and all-possible-regression are considered. The program is user friendly and requires the user to paste or read the data at the beginning of the module, include the names of the dependent and independent variables (the y's and the x's), and then run the module. The program produces the multivariate candidate models based on the following criteria: Forward Selection, Forward Stepwise Regression, Backward Elimination, Mean Square Error, Coefficient of Multiple Determination, Adjusted Coefficient of Multiple Determination, Akaike's Information Criterion, the Corrected Form of Akaike's Information Criterion, Hannan and Quinn Information Criterion, the Corrected Form of Hannan and Quinn (HQc) Information Criterion, Schwarz's Criterion, and Mallows' Cp. The output includes both detailed and summarized results.
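Several of the listed criteria share one structure: a goodness-of-fit term plus a penalty on the number of parameters. The sketch below uses the familiar single-response Gaussian forms (up to an additive constant), which is an assumption made for brevity; the paper itself works with multivariate versions in SAS/IML, and the candidate models and their residual sums of squares here are invented.

```python
import math


def information_criteria(rss, n, k):
    """AIC, AICc, Schwarz (BIC) and Hannan-Quinn for a Gaussian linear model.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated regression parameters.
    """
    fit = n * math.log(rss / n)
    aic = fit + 2 * k
    return {
        "AIC": aic,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": fit + k * math.log(n),
        "HQ": fit + 2 * k * math.log(math.log(n)),
    }


def best_by_criterion(candidates, n):
    """For each criterion, name the candidate model with the smallest value."""
    scores = {name: information_criteria(rss, n, k)
              for name, (rss, k) in candidates.items()}
    return {crit: min(scores, key=lambda name: scores[name][crit])
            for crit in ("AIC", "AICc", "BIC", "HQ")}


# Hypothetical candidates: model name -> (residual sum of squares, parameter count).
candidates = {"x1": (140.0, 2), "x1+x2": (100.0, 3), "x1+x2+x3": (99.5, 4)}
winners = best_by_criterion(candidates, n=100)
crit = information_criteria(100.0, 100, 3)
```

Because the penalties differ, the criteria need not agree on one model; reporting all of them side by side, as the program described above does, makes such disagreements visible.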
A Simple K-Map Based Variable Selection Scheme in the Direct ...
African Journals Online (AJOL)
A multiplexer with (n-1) data select inputs can realise directly a function of n variables. In this paper, a simple k-map based variable selection scheme is proposed such that an n variable logic function can be synthesised using a multiplexer with (n-q) data input variables and q data select variables. The procedure is based on ...
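The classic case behind the first sentence, one variable routed to the data inputs and the remaining variables driving the select lines, can be worked out mechanically from a truth table. The sketch below covers only that case, not the paper's general (n-q, q) scheme; `mux_data_inputs` and the majority-function example are assumptions made for illustration.

```python
from itertools import product


def mux_data_inputs(f, n, data_var):
    """Map each setting of the n-1 select variables to the required data input.

    f takes n arguments in {0, 1} and returns 0 or 1; data_var is the index of
    the variable fed to the data inputs. Each data input is one of
    "0", "1", "x" (the data variable) or "x'" (its complement).
    """
    select_vars = [i for i in range(n) if i != data_var]
    labels = {(0, 0): "0", (1, 1): "1", (0, 1): "x", (1, 0): "x'"}
    table = {}
    for sel in product((0, 1), repeat=n - 1):
        outputs = []
        for x in (0, 1):
            args = [0] * n
            for var, bit in zip(select_vars, sel):
                args[var] = bit
            args[data_var] = x
            outputs.append(f(*args))
        table[sel] = labels[tuple(outputs)]
    return table


def majority(a, b, c):
    """Three-input majority function, used as the example logic function."""
    return int(a + b + c >= 2)


# a and b drive the select lines; c is the data variable.
inputs = mux_data_inputs(majority, 3, data_var=2)
```

Choosing a different `data_var` generally changes how many data inputs need the variable or its complement rather than a constant, which is the kind of choice a variable selection scheme aims to make well.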
a simple k-map based variable selection scheme in the direct
African Journals Online (AJOL)
Dr Obe
A multiplexer with (n-1) data select inputs can realise directly a function of n variables. In this paper, a simple k-map based variable selection scheme is proposed such that an n variable logic function can be synthesised using a multiplexer with (n-q) data input variables and q data select variables.
Roth, T.; Sprau, P.; Naguib, M.; Amrhein, V.
2012-01-01
Responses of organisms to environments or to conspecifics may abruptly change once the organism has changed its state. For example, the expression of sexually selected signals often depends on the pairing status of the sender. A likely change in signaling routines at the point of pair formation
Directory of Open Access Journals (Sweden)
Kaski Kimmo
2007-05-01
Background: A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum. Results: A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG), 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra. Conclusion: The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics.
Isoenzymatic variability in tropical maize populations under reciprocal recurrent selection
Directory of Open Access Journals (Sweden)
Pinto Luciana Rossini
2003-01-01
Maize (Zea mays L.) is one of the crops in which the genetic variability has been extensively studied at isoenzymatic loci. The genetic variability of the maize populations BR-105 and BR-106, and the synthetics IG-3 and IG-4, obtained after one cycle of a high-intensity reciprocal recurrent selection (RRS), was investigated at seven isoenzymatic loci. A total of twenty alleles were identified, and most of the private alleles were found in the BR-106 population. One cycle of reciprocal recurrent selection (RRS) caused reductions of 12% in the number of alleles in both populations. Changes in allele frequencies were also observed between populations and synthetics, mainly for the Est 2 locus. Populations presented similar values for the number of alleles per locus, percentage of polymorphic loci, and observed and expected heterozygosities. A decrease of the genetic variation values was observed for the synthetics as a consequence of genetic drift effects and reduction of the effective population sizes. The distribution of the genetic diversity within and between populations revealed that most of the diversity was maintained within them, i.e. BR-105 x BR-106 (G_ST = 3.5%) and IG-3 x IG-4 (G_ST = 4.0%). The genetic distances between populations and synthetics increased approximately 21%. An increase in the genetic divergence between the populations occurred without limiting new selection procedures.
Variable Selection for Road Segmentation in Aerial Images
Warnke, S.; Bulatov, D.
2017-05-01
For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which a refinement of the classification results using minimum cost paths and non-local optimization methods took place. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in Photogrammetry and Remote Sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but we extended them by additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, in a challenging dataset.
Chaotic Dynamical State Variables Selection Procedure Based Image Encryption Scheme
Directory of Open Access Journals (Sweden)
Zia Bashir
2017-12-01
Nowadays, in the modern digital era, the use of computer technologies such as smartphones, tablets and the Internet, as well as the enormous quantity of confidential information being converted into digital form, have resulted in raised security issues. This, in turn, has led to rapid developments in cryptography, due to the imminent need for system security. Low-dimensional chaotic systems have low complexity and key space, yet they achieve high encryption speed. An image encryption scheme is proposed that, without compromising the security, uses reasonable resources. We introduced a chaotic dynamic state variables selection procedure (CDSVSP) to use all state variables of a hyper-chaotic four-dimensional dynamical system. As a result, fewer iterations of the dynamical system are required, and resources are saved, thus making the algorithm fast and suitable for practical use. The simulation results of security and other miscellaneous tests demonstrate that the suggested algorithm excels at robustness, security and high-speed encryption.
International Nuclear Information System (INIS)
Del Pozzo, Walter; Veitch, John; Vecchio, Alberto
2011-01-01
Second-generation interferometric gravitational-wave detectors, such as Advanced LIGO and Advanced Virgo, are expected to begin operation by 2015. Such instruments plan to reach sensitivities that will offer the unique possibility to test general relativity in the dynamical, strong-field regime and investigate departures from its predictions, in particular, using the signal from coalescing binary systems. We introduce a statistical framework based on Bayesian model selection in which the Bayes factor between two competing hypotheses measures which theory is favored by the data. Probability density functions of the model parameters are then used to quantify the inference on individual parameters. We also develop a method to combine the information coming from multiple independent observations of gravitational waves, and show how much stronger the inference could be. As an introduction to and illustration of this framework, and of a practical numerical implementation through the Monte Carlo integration technique of nested sampling, we apply it to gravitational waves from the inspiral phase of coalescing binary systems as predicted by general relativity and a very simple alternative theory in which the graviton has a nonzero mass. This method can (and should) be extended to more realistic and physically motivated theories.
Estimation and variable selection for generalized additive partial linear models
Wang, Li
2011-08-01
We study generalized additive partial linear models, proposing the use of polynomial spline smoothing for estimation of nonparametric functions, and deriving quasi-likelihood based estimators for the linear parameters. We establish asymptotic normality for the estimators of the parametric components. The procedure avoids solving large systems of equations as in kernel-based procedures and thus results in gains in computational simplicity. We further develop a class of variable selection procedures for the linear parameters by employing a nonconcave penalized quasi-likelihood, which is shown to have an asymptotic oracle property. Monte Carlo simulations and an empirical example are presented for illustration. © Institute of Mathematical Statistics, 2011.
Ye, M.; Elshall, A. S.; Tang, G.; Samani, S.
2016-12-01
Bayesian Model Evidence (BME) is the measure of the average fit of the model to data given all the parameter values that the model can take. By accounting for the trade-off between the model's ability to reproduce the observation data and model complexity, BME estimates of candidate models are employed to calculate model weights, which are used for model selection and model averaging. This study shows that accurate estimation of the BME is important for penalizing models with more complexity. To improve the accuracy of BME estimation, we resort to Monte Carlo numerical estimators over semi-analytical solutions (such as Laplace approximations, BIC, KIC and others). This study examines prominent numerical estimators of BME, namely thermodynamic integration (TI) and the importance sampling methods of arithmetic mean (AM), harmonic mean (HM), and steppingstone sampling (SS). The AM estimator (based on prior sampling) and the HM estimator (based on posterior sampling) are straightforward to implement, yet they lead to under- and overestimation, respectively. TI and SS improve beyond this by means of sampling multiple intermediate distributions that link the prior and the posterior, using Markov Chain Monte Carlo (MCMC). TI and SS are theoretically unbiased estimators that are mathematically rigorous. Yet a theoretically unbiased estimator could have large bias in practice arising from numerical implementation, because MCMC sampling errors of certain intermediate distributions can introduce bias. We propose an SS variant, namely the multiple one-steppingstone sampling (MOSS), which turns these intermediate stumbling "blocks" of SS into steppingstones toward BME estimation. Thus, MOSS is less sensitive to MCMC sampling errors. We evaluate these estimators using a problem of groundwater transport model selection. The modeling results show that SS and MOSS estimators gave the most accurate results. In addition, the results show that the magnitude of the estimation error is a
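The two simplest estimators named in this abstract, the arithmetic mean (AM) over prior draws and the harmonic mean (HM) over posterior draws, can be sketched on a toy problem where the evidence is known in closed form. Everything below is an assumption made for illustration (a normal mean with a conjugate normal prior, five made-up data points, exact posterior draws instead of MCMC); it is not the groundwater problem or the MOSS estimator from the study.

```python
import numpy as np

def log_mean_exp(a):
    """Numerically stable log of the mean of exp(a)."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))

# Toy model (assumed): y_i ~ N(theta, sigma^2), prior theta ~ N(0, tau^2)
y = np.array([0.3, -0.1, 0.8, 0.5, 0.2])
sigma, tau = 1.0, 1.0
n = y.size

def log_lik(theta):
    """log p(y | theta) for each value in the 1-D array theta."""
    resid = y[None, :] - theta[:, None]
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * np.sum(resid**2, axis=1) / sigma**2

# Exact log evidence: marginally, y ~ N(0, sigma^2 I + tau^2 * ones)
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
_, logdet = np.linalg.slogdet(cov)
log_bme_exact = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(cov, y))

rng = np.random.default_rng(1)
n_draws = 200_000

# AM estimator: average the likelihood over draws from the prior
prior_draws = rng.normal(0.0, tau, size=n_draws)
log_bme_am = log_mean_exp(log_lik(prior_draws))

# HM estimator: harmonic mean of the likelihood over draws from the posterior
post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)
post_mean = post_var * y.sum() / sigma**2
post_draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
log_bme_hm = -log_mean_exp(-log_lik(post_draws))

print("exact:", log_bme_exact, "AM:", log_bme_am, "HM:", log_bme_hm)
```

In this low-dimensional toy case the AM estimate is reliable because the prior covers the likelihood well. The HM estimate has infinite variance here, which is one reason estimators that bridge prior and posterior through intermediate distributions are preferred.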
A selective review of robust variable selection with applications in bioinformatics.
Wu, Cen; Ma, Shuangge
2015-09-01
A drastic amount of data have been and are being generated in bioinformatics studies. In the analysis of such data, the standard modeling approaches can be challenged by the heavy-tailed errors and outliers in response variables, the contamination in predictors (which may be caused by, for instance, technical problems in microarray gene expression studies), model mis-specification and others. Robust methods are needed to tackle these challenges. When there are a large number of predictors, variable selection can be as important as estimation. As a generic variable selection and regularization tool, penalization has been extensively adopted. In this article, we provide a selective review of robust penalized variable selection approaches especially designed for high-dimensional data from bioinformatics and biomedical studies. We discuss the robust loss functions, penalty functions and computational algorithms. The theoretical properties and implementation are also briefly examined. Application examples of the robust penalization approaches in representative bioinformatics and biomedical studies are also illustrated. © The Author 2014. Published by Oxford University Press.
Balabin, Roman M; Smirnov, Sergey V
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm^-1) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic
Wöhling, T.; Schöniger, A.; Geiges, A.; Nowak, W.; Gayler, S.
2013-12-01
The objective selection of appropriate models for realistic simulations of coupled soil-plant processes is a challenging task since the processes are complex, not fully understood at larger scales, and highly non-linear. Also, comprehensive data sets are scarce, and measurements are uncertain. In the past decades, a variety of different models have been developed that exhibit a wide range of complexity regarding their approximation of processes in the coupled model compartments. We present a method for evaluating experimental design for maximum confidence in the model selection task. The method considers uncertainty in parameters, measurements and model structures. Advancing the ideas behind Bayesian Model Averaging (BMA), we analyze the changes in posterior model weights and posterior model choice uncertainty when more data are made available. This allows assessing the power of different data types, data densities and data locations in identifying the best model structure from among a suite of plausible models. The models considered in this study are the crop models CERES, SUCROS, GECROS and SPASS, which are coupled to identical routines for simulating soil processes within the modelling framework Expert-N. The four models considerably differ in the degree of detail at which crop growth and root water uptake are represented. Monte-Carlo simulations were conducted for each of these models considering their uncertainty in soil hydraulic properties and selected crop model parameters. Using a Bootstrap Filter (BF), the models were then conditioned on field measurements of soil moisture, matric potential, leaf-area index, and evapotranspiration rates (from eddy-covariance measurements) during a vegetation period of winter wheat at a field site at the Swabian Alb in Southwestern Germany. Following our new method, we derived model weights when using all data or different subsets thereof. We discuss to which degree the posterior mean outperforms the prior mean and all
Ethnic variability in adiposity and cardiovascular risk: the variable disease selection hypothesis.
Wells, Jonathan C K
2009-02-01
Evidence increasingly suggests that ethnic differences in cardiovascular risk are partly mediated by adipose tissue biology, which refers to the regional distribution of adipose tissue and its differential metabolic activity. This paper proposes a novel evolutionary hypothesis for ethnic genetic variability in adipose tissue biology. Whereas medical interest focuses on the harmful effect of excess fat, the value of adipose tissue is greatest during chronic energy insufficiency. Following Neel's influential paper on the thrifty genotype, proposed to have been favoured by exposure to cycles of feast and famine, much effort has been devoted to searching for genetic markers of 'thrifty metabolism'. However, whether famine-induced starvation was the primary selective pressure on adipose tissue biology has been questioned, while the notion that fat primarily represents a buffer against starvation appears inconsistent with historical records of mortality during famines. This paper reviews evidence for the role played by adipose tissue in immune function and proposes that adipose tissue biology responds to selective pressures acting through infectious disease. Different diseases activate the immune system in different ways and induce different metabolic costs. It is hypothesized that exposure to different infectious disease burdens has favoured ethnic genetic variability in the anatomical location of, and metabolic profile of, adipose tissue depots.
Buchner, J.; Georgakakis, A.; Nandra, K.; Hsu, L.; Rangel, C.; Brightman, M.; Merloni, A.; Salvato, M.; Donley, J.; Kocevski, D.
2014-04-01
Aims: Active galactic nuclei are known to have complex X-ray spectra that depend on both the properties of the accreting super-massive black hole (e.g. mass, accretion rate) and the distribution of obscuring material in its vicinity (i.e. the "torus"). Often however, simple and even unphysical models are adopted to represent the X-ray spectra of AGN, which do not capture the complexity and diversity of the observations. In the case of blank field surveys in particular, this should have an impact on e.g. the determination of the AGN luminosity function, the inferred accretion history of the Universe and also on our understanding of the relation between AGN and their host galaxies. Methods: We develop a Bayesian framework for model comparison and parameter estimation of X-ray spectra. We take into account uncertainties associated with both the Poisson nature of X-ray data and the determination of source redshift using photometric methods. We also demonstrate how Bayesian model comparison can be used to select among ten different physically motivated X-ray spectral models the one that provides a better representation of the observations. This methodology is applied to X-ray AGN in the 4 Ms Chandra Deep Field South. Results: For the ~350 AGN in that field, our analysis identifies four components needed to represent the diversity of the observed X-ray spectra: (1) an intrinsic power law; (2) a cold obscurer which reprocesses the radiation due to photo-electric absorption, Compton scattering and Fe-K fluorescence; (3) an unabsorbed power law associated with Thomson scattering off ionised clouds; and (4) Compton reflection, most noticeable from a stronger-than-expected Fe-K line. Simpler models, such as a photo-electrically absorbed power law with a Thomson scattering component, are ruled out with decisive evidence (B > 100). We also find that ignoring the Thomson scattering component results in underestimation of the inferred column density, N_H, of the obscurer
Peltola, Tomi; Marttinen, Pekka; Vehtari, Aki
2012-01-01
High-dimensional datasets with large amounts of redundant information are nowadays available for hypothesis-free exploration of scientific questions. A particular case is genome-wide association analysis, where variations in the genome are searched for effects on disease or other traits. Bayesian variable selection has been demonstrated as a possible analysis approach, which can account for the multifactorial nature of the genetic effects in a linear regression model. Yet, the computation presents a challenge and application to large-scale data is not routine. Here, we study aspects of the computation using the Metropolis-Hastings algorithm for the variable selection: finite adaptation of the proposal distributions, multistep moves for changing the inclusion state of multiple variables in a single proposal and multistep move size adaptation. We also experiment with a delayed rejection step for the multistep moves. Results on simulated and real data show increase in the sampling efficiency. We also demonstrate that with application specific proposals, the approach can overcome a specific mixing problem in real data with 3822 individuals and 1,051,811 single nucleotide polymorphisms and uncover a variant pair with synergistic effect on the studied trait. Moreover, we illustrate multimodality in the real dataset related to a restrictive prior distribution on the genetic effect sizes and advocate a more flexible alternative. PMID:23166669
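The basic building block that this abstract extends, a Metropolis-Hastings sampler that flips the inclusion state of one variable per proposal, can be sketched as follows. This is a deliberately simplified stand-in and not the authors' algorithm: it has no adaptation, no multistep moves and no delayed rejection, it uses a uniform prior over models, and it replaces the marginal likelihood with a BIC-based approximation. The simulated data and all names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 6
X = rng.normal(size=(n, p))
# Hypothetical truth: only columns 0 and 3 affect the response.
y = 2.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=n)

def log_score(gamma):
    """BIC-based approximation to the log marginal likelihood of model gamma."""
    cols = np.flatnonzero(gamma)
    design = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    return -0.5 * (n * np.log(rss / n) + design.shape[1] * np.log(n))

gamma = np.zeros(p, dtype=bool)      # start from the empty model
current = log_score(gamma)
n_iter, burn_in = 3000, 500
counts = np.zeros(p)

for it in range(n_iter):
    # Symmetric proposal: flip the inclusion state of one randomly chosen variable
    proposal = gamma.copy()
    j = rng.integers(p)
    proposal[j] = not proposal[j]
    candidate = log_score(proposal)
    # Metropolis acceptance step (proposal is symmetric, model prior is uniform)
    if np.log(rng.uniform()) < candidate - current:
        gamma, current = proposal, candidate
    if it >= burn_in:
        counts += gamma

inclusion_prob = counts / (n_iter - burn_in)
print(np.round(inclusion_prob, 3))
```

With single flips the chain can mix poorly when correlated variables must enter or leave the model together, which is the situation that moves changing several inclusion states in one proposal are meant to address.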
Sex-specific selection for MHC variability in Alpine chamois
Directory of Open Access Journals (Sweden)
Schaschl Helmut
2012-02-01
Background: In mammals, males typically have shorter lives than females. This difference is thought to be due to behavioural traits which enhance competitive abilities, and hence male reproductive success, but impair survival. Furthermore, in many species males usually show higher parasite burden than females. Consequently, the intensity of selection for genetic factors which reduce susceptibility to pathogens may differ between sexes. High variability at the major histocompatibility complex (MHC) genes is believed to be advantageous for detecting and combating the range of infectious agents present in the environment. Increased heterozygosity at these immune genes is expected to be important for individual longevity. However, whether males in natural populations benefit more from MHC heterozygosity than females has rarely been investigated. We investigated this question in a long-term study of free-living Alpine chamois (Rupicapra rupicapra), a polygynous mountain ungulate. Results: Here we show that male chamois survive significantly (P = 0.022) longer if heterozygous at the MHC class II DRB locus, whereas females do not. Improved survival of males was not a result of heterozygote advantage per se, as background heterozygosity (estimated across twelve microsatellite loci) did not change significantly with age. Furthermore, reproductively active males depleted their body fat reserves earlier than females leading to significantly impaired survival rates in this sex (P < 0.008). This sex-difference was even more pronounced in areas affected by scabies, a severe parasitosis, as reproductively active males were less likely to survive than females. However, we did not find evidence for a survival advantage associated with specific MHC alleles in areas affected by scabies. Conclusions: Increased MHC class II DRB heterozygosity with age in males, suggests that MHC heterozygous males survive longer than homozygotes. Reproductively active males appear to be less likely to survive than females most likely because of the energetic challenge of the winter rut, accompanied by earlier depletion of their body fat stores, and a generally higher parasite burden. This scenario renders the MHC-mediated immune response more important for males than for females
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.
Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks.
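The general idea of combining a penalized fit with bootstrap quantiles can be sketched in a few lines: refit the penalized model on resampled data, then keep the predictors whose bootstrap quantile interval excludes zero. The sketch below is written in Python rather than the R code the paper provides, and it makes several assumptions that should not be read as the authors' procedure: a ridge penalty with a fixed `lam` (chosen for its closed form), a 95% interval, and simulated correlated predictors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 150, 8
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.5 * rng.normal(size=n)   # make predictors 0 and 1 correlated
# Hypothetical truth: only predictors 0 and 4 are related to the outcome.
y = 1.5 * X[:, 0] + 1.0 * X[:, 4] + rng.normal(size=n)

def ridge(X_mat, y_vec, lam):
    """Closed-form ridge coefficients on centred data (the intercept is not penalised)."""
    Xc = X_mat - X_mat.mean(axis=0)
    yc = y_vec - y_vec.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ yc)

n_boot, lam = 500, 1.0
boot = np.empty((n_boot, p))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)           # resample rows with replacement
    boot[b] = ridge(X[idx], y[idx], lam)

# Keep predictors whose 2.5%-97.5% bootstrap interval does not contain zero
lower, upper = np.quantile(boot, [0.025, 0.975], axis=0)
selected = np.flatnonzero((lower > 0) | (upper < 0))
print("selected predictors:", selected)
```

In practice the penalty strength would be tuned, for example by cross-validation inside each bootstrap replicate, instead of being fixed in advance.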
Lesaffre, Emmanuel
2012-01-01
The growth of biostatistics has been phenomenal in recent years and has been marked by considerable technical innovation in both methodology and computational practicality. One area that has experienced significant growth is Bayesian methods. The growing use of Bayesian methodology has taken place partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. In addition, computational advances have allowed for more complex models to be fitted routinely to realistic data sets. Through examples, exercises and a combination of introd
McGuire, Jimmy A; Witt, Christopher C; Altshuler, Douglas L; Remsen, J V
2007-10-01
Hummingbirds are an important model system in avian biology, but to date the group has been the subject of remarkably few phylogenetic investigations. Here we present partitioned Bayesian and maximum likelihood phylogenetic analyses for 151 of approximately 330 species of hummingbirds and 12 outgroup taxa based on two protein-coding mitochondrial genes (ND2 and ND4), flanking tRNAs, and two nuclear introns (AK1 and BFib). We analyzed these data under several partitioning strategies ranging between unpartitioned and a maximum of nine partitions. In order to select a statistically justified partitioning strategy following partitioned Bayesian analysis, we considered four alternative criteria including Bayes factors, modified versions of the Akaike information criterion for small sample sizes (AICc), Bayesian information criterion (BIC), and a decision-theoretic methodology (DT). Following partitioned maximum likelihood analyses, we selected a best-fitting strategy using hierarchical likelihood ratio tests (hLRTs), the conventional AICc, BIC, and DT, concluding that the most stringent criterion, the performance-based DT, was the most appropriate methodology for selecting amongst partitioning strategies. In the context of our well-resolved and well-supported phylogenetic estimate, we consider the historical biogeography of hummingbirds using ancestral state reconstructions of (1) primary geographic region of occurrence (i.e., South America, Central America, North America, Greater Antilles, Lesser Antilles), (2) Andean or non-Andean geographic distribution, and (3) minimum elevational occurrence. These analyses indicate that the basal hummingbird assemblages originated in the lowlands of South America, that most of the principal clades of hummingbirds (all but Mountain Gems and possibly Bees) originated on this continent, and that there have been many (at least 30) independent invasions of other primary landmasses, especially Central America.
Sex-specific selection for MHC variability in Alpine chamois.
Schaschl, Helmut; Suchentrunk, Franz; Morris, David L; Ben Slimen, Hichem; Smith, Steve; Arnold, Walter
2012-02-15
In mammals, males typically have shorter lives than females. This difference is thought to be due to behavioural traits which enhance competitive abilities, and hence male reproductive success, but impair survival. Furthermore, in many species males usually show higher parasite burden than females. Consequently, the intensity of selection for genetic factors which reduce susceptibility to pathogens may differ between sexes. High variability at the major histocompatibility complex (MHC) genes is believed to be advantageous for detecting and combating the range of infectious agents present in the environment. Increased heterozygosity at these immune genes is expected to be important for individual longevity. However, whether males in natural populations benefit more from MHC heterozygosity than females has rarely been investigated. We investigated this question in a long-term study of free-living Alpine chamois (Rupicapra rupicapra), a polygynous mountain ungulate. Here we show that male chamois survive significantly (P = 0.022) longer if heterozygous at the MHC class II DRB locus, whereas females do not. Improved survival of males was not a result of heterozygote advantage per se, as background heterozygosity (estimated across twelve microsatellite loci) did not change significantly with age. Furthermore, reproductively active males depleted their body fat reserves earlier than females leading to significantly impaired survival rates in this sex (P < 0.008). This sex-difference was even more pronounced in areas affected by scabies, a severe parasitosis, as reproductively active males were less likely to survive than females. However, we did not find evidence for a survival advantage associated with specific MHC alleles in areas affected by scabies. Increased MHC class II DRB heterozygosity with age in males, suggests that MHC heterozygous males survive longer than homozygotes. Reproductively active males appear to be less likely to survive than females most likely
Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.
2012-01-01
The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes' theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.
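As a minimal sketch of the kind of Bayesian update described above, assume a Gamma prior on an event rate and Poisson-distributed event counts. This conjugate pairing is chosen here purely for illustration; the abstract does not say which distributions the IMM team used, and every number below is hypothetical.

```python
def gamma_poisson_update(prior_shape, prior_rate, events, exposure):
    """Return the posterior (shape, rate) of a Gamma prior after Poisson data.

    prior_shape, prior_rate: parameters of the Gamma prior on the event rate
    events: number of events observed (hypothetical)
    exposure: person-years over which the events were observed (hypothetical)
    """
    return prior_shape + events, prior_rate + exposure


# Hypothetical prior from an analog population: 2 events in 1000 person-years.
alpha0, beta0 = 2.0, 1000.0
# Hypothetical observed data: 1 event in 200 person-years.
alpha1, beta1 = gamma_poisson_update(alpha0, beta0, events=1, exposure=200.0)

prior_mean = alpha0 / beta0        # expected events per person-year before the data
posterior_mean = alpha1 / beta1    # expected events per person-year after the data
print(prior_mean, posterior_mean)
```

The posterior mean sits between the prior mean and the observed rate, weighted by the amount of exposure behind each, which is the sense in which sparse in-flight data can be combined with larger analog populations.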
Norris, Peter M.; Da Silva, Arlindo M.
2016-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
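The Metropolis version of MCMC mentioned above can be sketched as follows. This is a generic one-dimensional random-walk sampler, not the authors' implementation: the standard-normal target, the step size, and the function names are assumptions made for illustration. It shows why the approach needs no gradients, since a proposal is accepted or rejected using only the ratio of target densities.

```python
import math
import random


def metropolis(log_target, x0, step, n_samples, rng):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    samples = []
    x = x0
    log_p = log_target(x)
    for _ in range(n_samples):
        # Propose a finite jump; no derivative of the target is required.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def log_standard_normal(x):
    """Unnormalized log density of a standard normal (illustrative target)."""
    return -0.5 * x * x


rng = random.Random(0)
samples = metropolis(log_standard_normal, x0=3.0, step=1.0, n_samples=20000, rng=rng)
kept = samples[2000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(round(mean, 2), round(var, 2))
```

The multiple-try variant discussed in the article draws several proposals per step and is not shown here.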
On the prior probabilities for two-stage Bayesian estimates
International Nuclear Information System (INIS)
Kohut, P.
1992-01-01
The method of Bayesian inference is reexamined for its applicability and for the required underlying assumptions in obtaining and using prior probability estimates. Two different approaches are suggested to determine the first-stage priors in the two-stage Bayesian analysis which avoid certain assumptions required for other techniques. In the first scheme, the prior is obtained through a true frequency based distribution generated at selected intervals utilizing actual sampling of the failure rate distributions. The population variability distribution is generated as the weighted average of the frequency distributions. The second method is based on a non-parametric Bayesian approach using the Maximum Entropy Principle. Specific features such as integral properties or selected parameters of prior distributions may be obtained with minimal assumptions. It is indicated how various quantiles may also be generated with a least-squares technique.
Input variable selection for interpolating high-resolution climate ...
African Journals Online (AJOL)
Although the primary input data of climate interpolations are usually meteorological data, other related (independent) variables are frequently incorporated in the interpolation process. One such variable is elevation, which is known to have a strong influence on climate. This research investigates the potential of 4 additional ...
Bhargava and Ishizuka's BI-Method: A Neglected Method for Variable Selection
Leung, Shing On; Sachs, John
2005-01-01
Quite often in data reduction, it is more meaningful and economical to select a subset of variables instead of reducing the dimensionality of the variable space with principal components analysis. The authors present a neglected method for variable selection called the BI-method (R. P. Bhargava & T. Ishizuka, 1981). It is a direct, simple method…
Bayesian modeling using WinBUGS
Ntzoufras, Ioannis
2009-01-01
A hands-on introduction to the principles of Bayesian modeling using WinBUGS. Bayesian Modeling Using WinBUGS provides an easily accessible introduction to the use of WinBUGS programming techniques in a variety of Bayesian modeling settings. The author provides an accessible treatment of the topic, offering readers a smooth introduction to the principles of Bayesian modeling with detailed guidance on the practical implementation of key principles. The book begins with a basic introduction to Bayesian inference and the WinBUGS software and goes on to cover key topics, including: Markov chain Monte Carlo algorithms in Bayesian inference; generalized linear models; Bayesian hierarchical models; predictive distribution and model checking; and Bayesian model and variable evaluation. Computational notes and screen captures illustrate the use of both WinBUGS and R software to apply the discussed techniques. Exercises at the end of each chapter allow readers to test their understanding of the presented concepts and all ...
Flood quantile estimation at ungauged sites by Bayesian networks
Mediero, L.; Santillán, D.; Garrote, L.
2012-04-01
Estimating flood quantiles at a site for which no observed measurements are available is essential for water resources planning and management. Ungauged sites have no observations about the magnitude of floods, but some site and basin characteristics are known. The most common technique used is multiple regression analysis, which relates physical and climatic basin characteristics to flood quantiles. Regression equations are fitted from flood frequency data and basin characteristics at gauged sites. Regression equations are a rigid technique that assumes linear relationships between variables and cannot take measurement errors into account. In addition, the prediction intervals are estimated in a very simplistic way from the variance of the residuals in the estimated model. Bayesian networks are a probabilistic computational structure taken from the field of Artificial Intelligence, which have been widely and successfully applied to many scientific fields like medicine and informatics, but application to the field of hydrology is recent. Bayesian networks infer the joint probability distribution of several related variables from observations through nodes, which represent random variables, and links, which represent causal dependencies between them. A Bayesian network is more flexible than regression equations, as it captures non-linear relationships between variables. In addition, the probabilistic nature of Bayesian networks allows the different sources of estimation uncertainty to be taken into account, as the networks give a probability distribution as a result. A homogeneous region in the Tagus Basin was selected as a case study. A regression equation was fitted taking the basin area, the annual maximum 24-hour rainfall for a given recurrence interval and the mean height as explanatory variables. Flood quantiles at ungauged sites were estimated by Bayesian networks. Bayesian networks need to be learnt from a sufficiently large data set. As observational data are reduced, a
Laurent, V.C.E.; Verhoef, W.; Damm, A.; Schaepman, M.E.; Clevers, J.G.P.W.
2013-01-01
Vegetation variables such as leaf area index (LAI) and leaf chlorophyll content (Cab) are important inputs for vegetation growth models. LAI and Cab can be estimated from remote sensing data using either empirical or physically-based approaches. The latter are more generally applicable because they
Bayesian Analysis Made Simple An Excel GUI for WinBUGS
Woodward, Philip
2011-01-01
From simple NLMs to complex GLMMs, this book describes how to use the GUI for WinBUGS - BugsXLA - an Excel add-in written by the author that allows a range of Bayesian models to be easily specified. With case studies throughout, the text shows how to routinely apply even the more complex aspects of model specification, such as GLMMs, outlier robust models, random effects Emax models, auto-regressive errors, and Bayesian variable selection. It provides brief, up-to-date discussions of current issues in the practical application of Bayesian methods. The author also explains how to obtain free so
A New Variable Weighting and Selection Procedure for K-Means Cluster Analysis
Steinley, Douglas; Brusco, Michael J.
2008-01-01
A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these…
A Bayesian Reformulation of the Extended Drift-Diffusion Model in Perceptual Decision Making
Fard, Pouyan R.; Park, Hame; Warkentin, Andrej; Kiebel, Stefan J.; Bitzer, Sebastian
2017-01-01
Perceptual decision making can be described as a process of accumulating evidence to a bound which has been formalized within drift-diffusion models (DDMs). Recently, an equivalent Bayesian model has been proposed. In contrast to standard DDMs, this Bayesian model directly links information in the stimulus to the decision process. Here, we extend this Bayesian model further and allow inter-trial variability of two parameters following the extended version of the DDM. We derive parameter distributions for the Bayesian model and show that they lead to predictions that are qualitatively equivalent to those made by the extended drift-diffusion model (eDDM). Further, we demonstrate the usefulness of the extended Bayesian model (eBM) for the analysis of concrete behavioral data. Specifically, using Bayesian model selection, we find evidence that including additional inter-trial parameter variability provides for a better model, when the model is constrained by trial-wise stimulus features. This result is remarkable because it was derived using just 200 trials per condition, which is typically thought to be insufficient for identifying variability parameters in DDMs. In sum, we present a Bayesian analysis, which provides for a novel and promising analysis of perceptual decision making experiments. PMID:28553219
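The accumulation-to-bound process that the drift-diffusion model formalizes can be sketched as a simple simulation. This is the standard DDM only, without the inter-trial variability or the Bayesian reformulation that the abstract introduces; the function name and all parameter values are hypothetical.

```python
import random


def simulate_ddm_trial(drift, bound, noise_sd, dt, max_steps, rng):
    """Accumulate noisy evidence until it crosses +bound or -bound.

    Returns (choice, decision_time); choice is 1 for the upper bound,
    0 for the lower bound, and None if no bound is reached in max_steps.
    """
    evidence = 0.0
    for step in range(1, max_steps + 1):
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        evidence += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if evidence >= bound:
            return 1, step * dt
        if evidence <= -bound:
            return 0, step * dt
    return None, max_steps * dt


rng = random.Random(1)
results = [simulate_ddm_trial(1.0, 1.0, 1.0, 0.01, 5000, rng) for _ in range(1000)]
upper_fraction = sum(1 for choice, _ in results if choice == 1) / len(results)
mean_time = sum(t for _, t in results) / len(results)
print(upper_fraction, mean_time)
```

With a positive drift most trials end at the upper bound. The extended model would additionally draw parameters such as the drift afresh on each trial.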
A Bayesian Reformulation of the Extended Drift-Diffusion Model in Perceptual Decision Making
Directory of Open Access Journals (Sweden)
Pouyan R. Fard
2017-05-01
Perceptual decision making can be described as a process of accumulating evidence to a bound which has been formalized within drift-diffusion models (DDMs). Recently, an equivalent Bayesian model has been proposed. In contrast to standard DDMs, this Bayesian model directly links information in the stimulus to the decision process. Here, we extend this Bayesian model further and allow inter-trial variability of two parameters following the extended version of the DDM. We derive parameter distributions for the Bayesian model and show that they lead to predictions that are qualitatively equivalent to those made by the extended drift-diffusion model (eDDM). Further, we demonstrate the usefulness of the extended Bayesian model (eBM) for the analysis of concrete behavioral data. Specifically, using Bayesian model selection, we find evidence that including additional inter-trial parameter variability provides for a better model, when the model is constrained by trial-wise stimulus features. This result is remarkable because it was derived using just 200 trials per condition, which is typically thought to be insufficient for identifying variability parameters in DDMs. In sum, we present a Bayesian analysis, which provides for a novel and promising analysis of perceptual decision making experiments.
Pathogen-mediated selection for MHC variability in wild zebrafish
Czech Academy of Sciences Publication Activity Database
Smith, C.; Ondračková, Markéta; Spence, R.; Adams, S.; Betts, D. S.; Mallon, E.
2011-01-01
Vol. 13, No. 6 (2011), pp. 589-605 ISSN 1522-0613 Institutional support: RVO:68081766 Keywords : digenean * frequency-dependent selection * heterozygote advantage * major histocompatibility complex * metazoan parasite * pathogen-driven selection Subject RIV: EG - Zoology Impact factor: 1.029, year: 2011
Bayesian Peak Picking for NMR Spectra
Cheng, Yichen
2014-02-01
Protein structure determination is a very important topic in structural genomics, which helps people to understand a variety of biological functions such as protein-protein interactions, protein–DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) has often been used to determine the three-dimensional structures of proteins in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.
Bayesian Peak Picking for NMR Spectra
Directory of Open Access Journals (Sweden)
Yichen Cheng
2014-02-01
Protein structure determination is a very important topic in structural genomics, which helps people to understand a variety of biological functions such as protein-protein interactions, protein–DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) has often been used to determine the three-dimensional structures of proteins in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.
Interbronchoscopist variability in endobronchial path selection: a simulation study.
Dolina, Marina Y; Cornish, Duane C; Merritt, Scott A; Rai, Lav; Mahraj, Rickhesvar; Higgins, William E; Bascom, Rebecca
2008-04-01
Endobronchial path selection is important for the bronchoscopic diagnosis of focal lung lesions. Path selection typically involves mentally reconstructing a three-dimensional path by interpreting a stack of two-dimensional (2D) axial plane CT scan sections. The hypotheses of our study about path selection were as follows: (1) bronchoscopists are inaccurate and overly confident when making endobronchial path selections based on 2D CT scan analysis; and (2) path selection accuracy and confidence improve and become better aligned when bronchoscopists employ path-planning methods based on virtual bronchoscopy (VB). Studies of endobronchial path selection comparing three path-planning methods (ie, the standard 2D CT scan analysis and two new VB-based techniques) were performed. The task was to navigate to discrete lesions located between the third-order and fifth-order bronchi of the right upper and middle lobes. Outcome measures were the cumulative accuracy of making four sequential path selection decisions and self-reported confidence (1, least confident; 5, most confident). Both experienced and inexperienced bronchoscopists participated in the studies. In the first study involving a static paper-based tool, the mean (+/- SD) cumulative accuracy was 14 +/- 3% using 2D CT scan analysis (confidence, 3.4 +/- 1.3) and 49 +/- 15% using a VB-based technique (confidence, 4.2 +/- 1.1; p = 0.0001 across all comparisons). For a second study using an interactive computer-based tool, the mean accuracy was 40 +/- 28% using 2D CT scan analysis (confidence, 3.0 +/- 0.3) and 96 +/- 3% using a dynamic VB-based technique (confidence, 4.6 +/- 0.2). Regardless of the experience level of the bronchoscopist, use of the standard 2D CT scan analysis resulted in poor path selection accuracy and misaligned confidence. Use of the VB-based techniques resulted in considerably higher accuracy and better aligned decision confidence. Endobronchial path selection is a source of error in the
Rainfall trends and variability in selected areas of Ethiopian Somali ...
African Journals Online (AJOL)
Moreover, proper spatial distribution of meteorological stations together with early warning system are required to further support local adaptive and coping strategies that the community designed towards rainfall variability in particular and climate change/disaster and risk at large. Keywords: Ethiopian Somali Region, Gode, ...
Bayesian phylogeography finds its roots.
Directory of Open Access Journals (Sweden)
Philippe Lemey
2009-09-01
As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will be an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.
Joint Variable Selection and Classification with Immunohistochemical Data
Directory of Open Access Journals (Sweden)
Debashis Ghosh
2009-01-01
To determine if candidate cancer biomarkers have utility in a clinical setting, validation using immunohistochemical methods is typically done. Most analyses of such data have not incorporated the multivariate nature of the staining profiles. In this article, we consider modelling such data using recently developed ideas from the machine learning community. In particular, we consider the joint goals of feature selection and classification. We develop estimation procedures for the analysis of immunohistochemical profiles using the least absolute shrinkage and selection operator (LASSO). These lead to novel and flexible models and algorithms for the analysis of compositional data. The techniques are illustrated using data from a cancer biomarker study.
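To illustrate how an L1 penalty performs variable selection and estimation jointly, the following is a minimal coordinate-descent sketch for a least-squares lasso. It is a simplification for illustration only: the abstract concerns classification of immunohistochemical profiles, whereas this sketch uses plain regression, and the data, penalty value, and function names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Hypothetical data: the response depends only on the first feature.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)
```

The irrelevant second coefficient is set exactly to zero while the first is shrunk slightly below its least-squares value of 2, which is the selection-plus-shrinkage behaviour that gives the operator its name.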
Effect of balance exercise on selected kinematic gait variables in ...
African Journals Online (AJOL)
The purpose of this study was to investigate the effect of balance exercise on some selected kinematic gait parameters in patients with knee joint osteoarthritis. Forty subjects (18 men and 22 women) participated in the study. They were divided into two groups: Group 1 (experimental) that was treated with balance exercises, ...
Sousa, Taís N; Tarazona-Santos, Eduardo M; Wilson, Daniel J; Madureira, Ana P; Falcão, Paula R K; Fontes, Cor J F; Gil, Luiz H S; Ferreira, Marcelo U; Carvalho, Luzia H; Brito, Cristiana F A
2010-11-22
Plasmodium vivax malaria is a major public health challenge in Latin America, Asia and Oceania, with 130-435 million clinical cases per year worldwide. Invasion of host blood cells by P. vivax mainly depends on a type I membrane protein called Duffy binding protein (PvDBP). The erythrocyte-binding motif of PvDBP is a 170 amino-acid stretch located in its cysteine-rich region II (PvDBPII), which is the most variable segment of the protein. To test whether diversifying natural selection has shaped the nucleotide diversity of PvDBPII in Brazilian populations, this region was sequenced in 122 isolates from six different geographic areas. A Bayesian method was applied to test for the action of natural selection under a population genetic model that incorporates recombination. The analysis was integrated with a structural model of PvDBPII, and T- and B-cell epitopes were localized on the 3-D structure. The results suggest that: (i) recombination plays an important role in determining the haplotype structure of PvDBPII, and (ii) PvDBPII appears to contain neutrally evolving codons as well as codons evolving under natural selection. Diversifying selection preferentially acts on sites identified as epitopes, particularly on amino acid residues 417, 419, and 424, which show strong linkage disequilibrium. This study shows that some polymorphisms of PvDBPII are present near the erythrocyte-binding domain and might serve to elude antibodies that inhibit cell invasion. Therefore, these polymorphisms should be taken into account when designing vaccines aimed at eliciting antibodies to inhibit erythrocyte invasion.
Murray, Jessica R.; Minson, Sarah E.; Svarc, Jerry L.
2014-01-01
Fault creep, depending on its rate and spatial extent, is thought to reduce earthquake hazard by releasing tectonic strain aseismically. We use Bayesian inversion and a newly expanded GPS data set to infer the deep slip rates below assigned locking depths on the San Andreas, Maacama, and Bartlett Springs Faults of Northern California and, for the latter two, the spatially variable interseismic creep rate above the locking depth. We estimate deep slip rates of 21.5 ± 0.5, 13.1 ± 0.8, and 7.5 ± 0.7 mm/yr below 16 km, 9 km, and 13 km on the San Andreas, Maacama, and Bartlett Springs Faults, respectively. We infer that on average the Bartlett Springs fault creeps from the Earth's surface to 13 km depth, and below 5 km the creep rate approaches the deep slip rate. This implies that microseismicity may extend below the locking depth; however, we cannot rule out the presence of locked patches in the seismogenic zone that could generate moderate earthquakes. Our estimated Maacama creep rate, while comparable to the inferred deep slip rate at the Earth's surface, decreases with depth, implying a slip deficit exists. The Maacama deep slip rate estimate, 13.1 mm/yr, exceeds long-term geologic slip rate estimates, perhaps due to distributed off-fault strain or the presence of multiple active fault strands. While our creep rate estimates are relatively insensitive to choice of model locking depth, insufficient independent information regarding locking depths is a source of epistemic uncertainty that impacts deep slip rate estimates.
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
Norris, Peter M.; da Silva, Arlindo M.
2016-01-01
Part 1 of this series presented a Monte Carlo Bayesian method for constraining a complex statistical model of global circulation model (GCM) sub-gridcolumn moisture variability using high-resolution Moderate Resolution Imaging Spectroradiometer (MODIS) cloud data, thereby permitting parameter estimation and cloud data assimilation for large-scale models. This article performs some basic testing of this new approach, verifying that it does indeed reduce mean and standard deviation biases significantly with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud-top pressure and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the Ozone Monitoring Instrument (OMI). Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows non-gradient-based jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast, where the background state has a clear swath. This article also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in passive-radiometer-retrieved cloud observables on cloud vertical structure, beyond cloud-top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification from Riishojgaard provides some help in this respect, by
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan
2004-01-01
We describe a system for exact inference with relational Bayesian networks as defined in the publicly available Primula tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating ... and differentiating these circuits in time linear in their size. We report on experimental results showing the successful compilation, and efficient inference, on relational Bayesian networks whose Primula-generated propositional instances have thousands of variables, and whose jointrees have clusters...
Hambrook, Dillon A; Ilievski, Marko; Mosadeghzad, Mohamad; Tata, Matthew
2017-01-01
The process of resolving mixtures of several sounds into their separate individual streams is known as auditory scene analysis and it remains a challenging task for computational systems. It is well-known that animals use binaural differences in arrival time and intensity at the two ears to find the arrival angle of sounds in the azimuthal plane, and this localization function has sometimes been considered sufficient to enable the un-mixing of complex scenes. However, the ability of such systems to resolve distinct sound sources in both space and frequency remains limited. The neural computations for detecting interaural time difference (ITD) have been well studied and have served as the inspiration for computational auditory scene analysis systems; however, a crucial limitation of ITD models is that they produce ambiguous or "phantom" images in the scene. This has been thought to limit their usefulness at frequencies above about 1 kHz in humans. We present a simple Bayesian model and an implementation on a robot that uses ITD information recursively. The model makes use of head rotations to show that ITD information is sufficient to unambiguously resolve sound sources in both space and frequency. Contrary to commonly held assumptions about sound localization, we show that the ITD cue used with high-frequency sound can provide accurate and unambiguous localization and resolution of competing sounds. Our findings suggest that an "active hearing" approach could be useful in robotic systems that operate in natural, noisy settings. We also suggest that neurophysiological models of sound localization in animals could benefit from revision to include the influence of top-down memory and sensorimotor integration across head rotations.
Directory of Open Access Journals (Sweden)
Dillon A Hambrook
The process of resolving mixtures of several sounds into their separate individual streams is known as auditory scene analysis and it remains a challenging task for computational systems. It is well-known that animals use binaural differences in arrival time and intensity at the two ears to find the arrival angle of sounds in the azimuthal plane, and this localization function has sometimes been considered sufficient to enable the un-mixing of complex scenes. However, the ability of such systems to resolve distinct sound sources in both space and frequency remains limited. The neural computations for detecting interaural time difference (ITD) have been well studied and have served as the inspiration for computational auditory scene analysis systems; however, a crucial limitation of ITD models is that they produce ambiguous or "phantom" images in the scene. This has been thought to limit their usefulness at frequencies above about 1 kHz in humans. We present a simple Bayesian model and an implementation on a robot that uses ITD information recursively. The model makes use of head rotations to show that ITD information is sufficient to unambiguously resolve sound sources in both space and frequency. Contrary to commonly held assumptions about sound localization, we show that the ITD cue used with high-frequency sound can provide accurate and unambiguous localization and resolution of competing sounds. Our findings suggest that an "active hearing" approach could be useful in robotic systems that operate in natural, noisy settings. We also suggest that neurophysiological models of sound localization in animals could benefit from revision to include the influence of top-down memory and sensorimotor integration across head rotations.
Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H
2017-07-01
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in
Bayesian LASSO, scale space and decision making in association genetics.
Pasanen, Leena; Holmström, Lasse; Sillanpää, Mikko J
2015-01-01
LASSO is a penalized regression method that facilitates model fitting in situations where there are as many, or even more explanatory variables than observations, and only a few variables are relevant in explaining the data. We focus on the Bayesian version of LASSO and consider four problems that need special attention: (i) controlling false positives, (ii) multiple comparisons, (iii) collinearity among explanatory variables, and (iv) the choice of the tuning parameter that controls the amount of shrinkage and the sparsity of the estimates. The particular application considered is association genetics, where LASSO regression can be used to find links between chromosome locations and phenotypic traits in a biological organism. However, the proposed techniques are relevant also in other contexts where LASSO is used for variable selection. We separate the true associations from false positives using the posterior distribution of the effects (regression coefficients) provided by Bayesian LASSO. We propose to solve the multiple comparisons problem by using simultaneous inference based on the joint posterior distribution of the effects. Bayesian LASSO also tends to distribute an effect among collinear variables, making detection of an association difficult. We propose to solve this problem by considering not only individual effects but also their functionals (i.e. sums and differences). Finally, whereas in Bayesian LASSO the tuning parameter is often regarded as a random variable, we adopt a scale space view and consider a whole range of fixed tuning parameters, instead. The effect estimates and the associated inference are considered for all tuning parameters in the selected range and the results are visualized with color maps that provide useful insights into data and the association problem considered. The methods are illustrated using two sets of artificial data and one real data set, all representing typical settings in association genetics.
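The point about collinear variables and functionals can be made concrete with posterior draws. The sketch below does not run Bayesian LASSO; it uses synthetic draws (a hypothetical stand-in for MCMC output) in which a total effect of about 1 is split arbitrarily between two collinear predictors. The helper names are illustrative only.

```python
import random


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from a list of posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    return ordered[int(tail * n)], ordered[int((1.0 - tail) * n) - 1]


def excludes_zero(interval):
    """True when the interval lies entirely on one side of zero."""
    lo, hi = interval
    return lo > 0.0 or hi < 0.0


def synthetic_collinear_draws(n=4000, seed=1):
    """Hypothetical joint draws (b1, b2): the sum is well determined, the split is not."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        total = rng.gauss(1.0, 0.1)        # well-identified combined effect
        share = rng.uniform(-0.3, 1.3)     # poorly identified split between the two
        draws.append((total * share, total * (1.0 - share)))
    return draws


draws = synthetic_collinear_draws()
ci_b1 = credible_interval([b1 for b1, _ in draws])
ci_b2 = credible_interval([b2 for _, b2 in draws])
ci_sum = credible_interval([b1 + b2 for b1, b2 in draws])
```

In this construction neither individual interval excludes zero, while the interval for the sum does, which is the situation the abstract addresses by examining sums and differences of effects.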
3D Bayesian contextual classifiers
DEFF Research Database (Denmark)
Larsen, Rasmus
2000-01-01
We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.
Bessiere, Pierre; Ahuactzin, Juan Manuel; Mekhnacha, Kamel
2013-01-01
Probability as an Alternative to Boolean Logic: While logic is the mathematical foundation of rational reasoning and the fundamental principle of computing, it is restricted to problems where information is both complete and certain. However, many real-world problems, from financial investments to email filtering, are incomplete or uncertain in nature. Probability theory and Bayesian computing together provide an alternative framework to deal with incomplete and uncertain data. Decision-Making Tools and Methods for Incomplete and Uncertain Data: Emphasizing probability as an alternative to Boolean
Probability and Bayesian statistics
1987-01-01
This book contains selected and refereed contributions to the "International Symposium on Probability and Bayesian Statistics" which was organized to celebrate the 80th birthday of Professor Bruno de Finetti at his birthplace Innsbruck in Austria. Since Professor de Finetti died in 1985 the symposium was dedicated to the memory of Bruno de Finetti and took place at Igls near Innsbruck from 23 to 26 September 1986. Some of the papers are published especially by the relationship to Bruno de Finetti's scientific work. The evolution of stochastics shows growing importance of probability as coherent assessment of numerical values as degrees of belief in certain events. This is the basis for Bayesian inference in the sense of modern statistics. The contributions in this volume cover a broad spectrum ranging from foundations of probability across psychological aspects of formulating subjective probability statements, abstract measure-theoretical considerations, contributions to theoretical statistics an...
Multiobjective reservoir operating rules based on cascade reservoir input variable selection method
Yang, Guang; Guo, Shenglian; Liu, Pan; Li, Liping; Xu, Chongyu
2017-04-01
The input variable selection in multiobjective cascade reservoir operation is an important and difficult task. To address this problem, this study proposes the cascade reservoir input variable selection (CIS) method that searches for the most valuable input variables for decision making in multiple-objectivity cascade reservoir operations. From a case study of Hanjiang cascade reservoirs in China, we derive reservoir operating rules based on the combination of CIS and Gaussian radial basis functions (RBFs) methods and optimize the rules through Pareto-archived dynamically dimensioned search (PA-DDS) with two objectives: to maximize both power generation and water supply. We select the most effective input variables and evaluate their impacts on cascade reservoir operations. From the simulated trajectories of reservoir water level, power generation, and water supply, we analyze the multiobjective operating rules with several input variables. The results demonstrate that the CIS method performs well in the selection of input variables for the cascade reservoir operation, and the RBFs method can fully express the nonlinear operating rules for cascade reservoirs. We conclude that the CIS method is an effective and stable approach to identifying the most valuable information from a large number of candidate input variables. While the reservoir storage state is the most valuable information for the Hanjiang cascade reservoir multiobjective operation, the reservoir inflow is the most effective input variable for the single-objective operation of Danjiangkou.
Molecular variability from two selection of BRT10 population in an ...
African Journals Online (AJOL)
Genetic variability of two groups of palms composed with four progenies selected in BRT10 improved populations resulting from successive self-fertilizations of two parents LM2T and LM10T was studied using four polymorphic microsatellites DNA markers of Elaeis guineensis Jacq. The molecular variability of those ...
De Smet, Tom; Struys, Michel M. R. F.; Neckebroek, Martine M.; Van den Hauwe, Kristof; Bonte, Sjoert; Mortier, Eric P.
2008-01-01
BACKGROUND: Closed-loop control of the hypnotic component of anesthesia has been proposed in an attempt to optimize drug delivery. Here, we introduce a newly developed Bayesian-based, patient-individualized, model-based, adaptive control method for bispectral index (BIS) guided propofol infusion
Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen
2017-12-27
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability [Formula: see text] and no effect with probability (1 - [Formula: see text]). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP
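To make the "posterior probability of a non-zero effect" concrete, here is a minimal single-marker sketch. It is not the SVD-based procedure of the paper; it assumes a simple conjugate setup in which the estimate `b_hat` is normal around the true effect with variance `pev`, and the prior is a point mass at zero mixed with a normal "slab". The names `pi_nonzero` and `slab_var` are hypothetical, and the prior probability is left as a parameter because the abstract gives it only as a formula placeholder.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def posterior_nonzero_prob(b_hat, pev, slab_var, pi_nonzero):
    """P(effect != 0 | b_hat) under a spike-and-slab prior (illustrative assumption).

    Marginally, b_hat ~ N(0, slab_var + pev) if the effect is non-zero
    and b_hat ~ N(0, pev) if the effect is exactly zero.
    """
    slab = pi_nonzero * normal_pdf(b_hat, slab_var + pev)
    spike = (1.0 - pi_nonzero) * normal_pdf(b_hat, pev)
    return slab / (slab + spike)


# Hypothetical numbers: a small and a large estimated marker effect.
weak = posterior_nonzero_prob(b_hat=0.0, pev=0.01, slab_var=0.04, pi_nonzero=0.05)
strong = posterior_nonzero_prob(b_hat=0.5, pev=0.01, slab_var=0.04, pi_nonzero=0.05)
```

An estimate near zero pulls the probability below the prior value, while an estimate several standard errors from zero pushes it towards one; such probabilities could then weight marker-specific variances, as the abstract outlines.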
Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H
2014-12-30
For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns. Copyright © 2014 John Wiley & Sons
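Fisher's method, used above to couple selection across models, combines k independent p-values through the statistic -2 times the sum of their logarithms, which follows a chi-squared distribution with 2k degrees of freedom under the null. The sketch below assumes that each Cox model has already supplied a per-covariate score-test p-value; the covariate names and numbers are made up, and the single forward-selection step is a simplified illustration rather than the authors' implementation.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function of the chi-squared distribution for even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))


def next_covariate(p_by_covariate):
    """One forward-selection step: pick the covariate with the smallest combined p-value."""
    return min(p_by_covariate, key=lambda name: fisher_combined_p(p_by_covariate[name]))


# Hypothetical score-test p-values from two cause-specific models.
scores = {"age": [0.04, 0.20], "stage": [0.01, 0.03], "sex": [0.50, 0.60]}
chosen = next_covariate(scores)
```

Because the degrees of freedom are always even here, the chi-squared tail has a finite-sum form and no statistics library is needed.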
Wachter, Jenny; Hill, Stuart
2016-01-01
Pathogenic species of Neisseria utilize variable outer membrane proteins to facilitate infection and proliferation within the human host. However, the mechanisms behind the evolution of these variable alleles remain largely unknown due to analysis of previously limited datasets. In this study, we have expanded upon the previous analyses to substantially increase the number of analyzed sequences by including multiple diverse strains, from various geographic locations, to determine whether positive selective pressure is exerted on the evolution of these variable genes. Although Neisseria are naturally competent, this analysis indicates that only intrastrain horizontal gene transfer among the pathogenic Neisseria principally account for these genes exhibiting linkage equilibrium which drives the polymorphisms evidenced within these alleles. As the majority of polymorphisms occur across species, the divergence of these variable genes is dependent upon the species and is independent of geographical location, disease severity, or serogroup. Tests of neutrality were able to detect strong selection pressures acting upon both the opa and pil gene families, and were able to locate the majority of these sites within the exposed variable regions of the encoded proteins. Evidence of positive selection acting upon the hypervariable domains of Opa contradicts previous beliefs and provides evidence for selection of receptor binding. As the pathogenic Neisseria reside exclusively within the human host, the strong selection pressures acting upon both the opa and pil gene families provide support for host immune system pressure driving sequence polymorphisms within these variable genes.
Directory of Open Access Journals (Sweden)
Woosik Jang
2015-01-01
Since the 1970s, revenues generated by Korean contractors in international construction have increased rapidly, exceeding USD 70 billion per year in recent years. However, Korean contractors face significant risks from market uncertainty and sensitivity to economic volatility and technical difficulties. As the volatility of these risks threatens project profitability, approximately 15% of bad projects were found to account for 74% of losses from the same international construction sector. Anticipating bad projects via preemptive risk management can better prevent losses so that contractors can enhance the efficiency of bidding decisions during the early stages of a project cycle. In line with these objectives, this paper examines the effect of such factors on the degree of project profitability. The Naïve Bayesian classifier is applied to identify a good project screening tool, which increases practical applicability using binomial variables with limited information that is obtainable in the early stages. The proposed model produced superior classification results that adequately reflect contractor views of risk. It is anticipated that when users apply the proposed model based on their own knowledge and expertise, overall firm profit rates will increase as a result of early abandonment of bad projects as well as the prioritization of good projects before final bidding decisions are made.
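A naive Bayes classifier over binary (yes/no) risk indicators, as described above, can be sketched in a few lines. The two indicators and the six training projects below are invented for illustration and do not come from the paper; Laplace (add-one) smoothing is an added assumption to avoid zero probabilities on such a tiny sample.

```python
from collections import defaultdict


def train(rows, labels):
    """Estimate class priors and per-feature Bernoulli probabilities."""
    counts = defaultdict(int)
    ones = {}
    for row, label in zip(rows, labels):
        counts[label] += 1
        sums = ones.setdefault(label, [0] * len(row))
        for j, value in enumerate(row):
            sums[j] += value
    n = len(labels)
    model = {}
    for label, c in counts.items():
        # Laplace smoothing: (count of ones + 1) / (class size + 2)
        probs = [(s + 1) / (c + 2) for s in ones[label]]
        model[label] = (c / n, probs)
    return model


def predict_proba(model, row):
    """Posterior class probabilities, assuming features are independent given the class."""
    scores = {}
    for label, (prior, probs) in model.items():
        score = prior
        for value, p in zip(row, probs):
            score *= p if value else 1.0 - p
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Hypothetical indicators: (unstable_market, prior_experience_in_country)
projects = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
outcomes = ["bad", "bad", "bad", "good", "good", "good"]
model = train(projects, outcomes)
risk = predict_proba(model, (1, 0))
```

For a new project in an unstable market with no prior local experience, `risk["bad"]` works out to 6/7 on this toy data, which is the kind of early screening signal the abstract has in mind.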
A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method
Directory of Open Access Journals (Sweden)
Jun-He Yang
2017-01-01
Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated, ordered by date, into a single integrated research dataset. In summary, the proposed time-series forecasting model has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection
DEFF Research Database (Denmark)
Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald
2013-01-01
The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PLSR is often used by chemometricians. In order to evaluate the predictive ability of both methods, cross model validation was implemented. The performance of both methods was assessed using FTIR spectroscopic data, on the one hand, and a set of simulated data. The stability of the variable selection procedures was highlighted by the frequency of the selection of each variable in the cross model validation segments. Computationally, Jack-knife PLSR was much faster than Sparse PLSR. But while it was found that both methods have more or less the same predictive ability, Sparse PLSR turned out to be generally very stable...
3rd Bayesian Young Statisticians Meeting
Lanzarone, Ettore; Villalobos, Isadora; Mattei, Alessandra
2017-01-01
This book is a selection of peer-reviewed contributions presented at the third Bayesian Young Statisticians Meeting, BAYSM 2016, Florence, Italy, June 19-21. The meeting provided a unique opportunity for young researchers, M.S. students, Ph.D. students, and postdocs dealing with Bayesian statistics to connect with the Bayesian community at large, to exchange ideas, and to network with others working in the same field. The contributions develop and apply Bayesian methods in a variety of fields, ranging from the traditional (e.g., biostatistics and reliability) to the most innovative ones (e.g., big data and networks).
A survey of variable selection methods in two Chinese epidemiology journals
Directory of Open Access Journals (Sweden)
Lynn Henry S
2010-09-01
Background: Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods: Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis, e.g. stepwise or individual significance testing of model coefficients; C - first bivariate analyses, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria like prior knowledge or personal judgment. Results: Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163) via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44%) used stepwise procedures, 89 (41%) tested individual regression coefficients, but 33 (15%) did not mention how variables were selected. Sixty percent (58/97) of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions: The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals.
Kuiper, Rebecca M; Nederhoff, Tim; Klugkist, Irene
2015-05-01
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration-based set of hypotheses containing equality constraints on the means, or a theory-based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory-based hypotheses) has advantages over exploration (i.e., examining all possible equality-constrained hypotheses). Furthermore, examining reasonable order-restricted hypotheses has more power to detect the true effect/non-null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory-based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number). © 2014 The British Psychological Society.
Gonçalves, E; St Aubyn, A; Martins, A
2010-06-01
Classical methodologies for grapevine selection used in the vine-growing world are generally based on comparisons among a small number of clones. This does not take advantage of the entire genetic variability within ancient varieties, and therefore limits selection challenges. Using the general principles of plant breeding and of quantitative genetics, we propose new breeding strategies, focussed on conservation and quantification of genetic variability by performing a cycle of mass genotypic selection prior to clonal selection. To exploit a sufficiently large amount of genetic variability, initial selection trials must be generally very large. The use of experimental designs adequate for those field trials has been intensively recommended for numerous species. However, their use in initial trials of grapevines has not been studied. With the aim of identifying the most suitable experimental designs for quantification of genetic variability and selection of ancient varieties, a study was carried out to assess through simulation the comparative efficiency of various experimental designs (randomized complete block design, alpha design and row-column (RC) design). The results indicated a greater efficiency for alpha and RC designs, enabling more precise estimates of genotypic variance, greater precision in the prediction of genetic gain and consequently greater efficiency in genotypic mass selection.
Predictive and Descriptive CoMFA Models: The Effect of Variable Selection.
Sepehri, Bakhtyar; Omidikia, Nematollah; Kompany-Zareh, Mohsen; Ghavami, Raouf
2018-01-01
Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Three data sets including 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. First, for all three data sets, CoMFA models with all CoMFA descriptors were created; then, by applying each variable selection method, a new CoMFA model was developed, so that 9 CoMFA models were built for each data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches (FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife) increases the predictive power and stability of CoMFA models significantly. Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. Also, applying FFD, SRD-FFD, IVE-PLS and SRD-UVE-PLS preserves the CoMFA contour map information for both fields. Copyright© Bentham Science Publishers.
Wimmer, Valentin; Lehermeier, Christina; Albrecht, Theresa; Auinger, Hans-Jürgen; Wang, Yu; Schön, Chris-Carolin
2013-10-01
In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait. We show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to superiority of variable selection methods over RR-BLUP. Our results demonstrate that due to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations which can be identified by LASSO. Our results have major impact on the choice of statistical method needed to make credible inferences about genetic architecture and prediction accuracy of complex traits.
Inference in hybrid Bayesian networks
DEFF Research Database (Denmark)
Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael
2009-01-01
Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees and reliability block diagrams). However, limitations in the BNs' calculation engine have prevented BNs from becoming equally popular for domains containing mixtures of both discrete and continuous variables (so-called hybrid domains). In this paper we focus on these difficulties, and summarize some of the last decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability.
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Crystal structure prediction accelerated by Bayesian optimization
Yamashita, Tomoki; Sato, Nobuya; Kino, Hiori; Miyake, Takashi; Tsuda, Koji; Oguchi, Tamio
2018-01-01
We propose a crystal structure prediction method based on Bayesian optimization. Our method is classified as a selection-type algorithm which is different from evolution-type algorithms such as an evolutionary algorithm and particle swarm optimization. Crystal structure prediction with Bayesian optimization can efficiently select the most stable structure from a large number of candidate structures with a lower number of searching trials using a machine learning technique. Crystal structure prediction using Bayesian optimization combined with random search is applied to known systems such as NaCl and Y2Co17 to discuss the efficiency of Bayesian optimization. These results demonstrate that Bayesian optimization can significantly reduce the number of searching trials required to find the global minimum structure by 30-40% in comparison with pure random search, which leads to much less computational cost.
Variable selection in QSAR
Directory of Open Access Journals (Sweden)
Márcia Miguel Castro Ferreira
2002-05-01
The process of building mathematical models in quantitative structure-activity relationship (QSAR) studies is generally limited by the size of the dataset used to select variables from. For huge datasets, the task of selecting a given number of variables that produces the best linear model can be enormous, if not unfeasible. In this case, some methods can be used to separate good parameter combinations from the bad ones. In this paper three methodologies are analyzed: systematic search, genetic algorithm and chemometric methods. These methods have been exposed and discussed through practical examples.
Kiezun, Adam; Lee, I-Ting Angelina; Shomron, Noam
2009-01-01
Logistic regression is often used to help make medical decisions with binary outcomes. Here we evaluate the use of several methods for selection of variables in logistic regression. We use a large dataset to predict the diagnosis of myocardial infarction in patients reporting to an emergency room with chest pain. Our results indicate that some of the examined methods are well suited for variable selection in logistic regression and that our model, and our myocardial infarction risk calculator, can be an additional tool to aid physicians in myocardial infarction diagnosis.
International Nuclear Information System (INIS)
Proriol, J.
1994-01-01
Five different methods are compared for selecting the most important variables with a view to classifying high energy physics events with neural networks. The different methods are: the F-test, Principal Component Analysis (PCA), a decision tree method: CART, weight evaluation, and Optimal Cell Damage (OCD). The neural networks use the variables selected with the different methods. We compare the percentages of events properly classified by each neural network. The learning set and the test set are the same for all the neural networks. (author)
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection
Chen, Lisha
2012-12-01
The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
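As a rough illustration of why a row-wise group-lasso penalty selects variables: each row of the coefficient matrix holds one predictor's coefficients across all responses, so zeroing a whole row removes that predictor. The sketch below shows only the standard row-wise soft-thresholding (proximal) step for a penalty of the form lam * sum of row norms. It is not one of the two algorithms developed in the paper, and the function name `row_group_soft_threshold` and the example matrix are made up for illustration.

```python
import math

def row_group_soft_threshold(B, lam):
    """Shrink each row of B toward zero; rows with Euclidean norm <= lam become all zeros."""
    out = []
    for row in B:
        norm = math.sqrt(sum(v * v for v in row))
        # Block soft-thresholding: scale the whole row by max(0, 1 - lam / norm).
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out

# Hypothetical 3-predictor, 2-response coefficient matrix (illustrative values only).
B = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
shrunk = row_group_soft_threshold(B, lam=1.0)

# A predictor counts as selected when its row is not entirely zero.
selected_rows = [i for i, row in enumerate(shrunk) if any(v != 0.0 for v in row)]
```

Because the whole row is scaled by a single factor, a predictor is either kept for every response or dropped for every response, which is the sense in which each row acts as a group.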
Current Debates on Variability in Child Welfare Decision-Making: A Selected Literature Review
Directory of Open Access Journals (Sweden)
Emily Keddell
2014-11-01
This article considers selected drivers of decision variability in child welfare decision-making and explores current debates in relation to these drivers. Covering the related influences of national orientation, risk and responsibility, inequality and poverty, evidence-based practice, constructions of abuse and its causes, domestic violence and cognitive processes, it discusses the literature on how each of these influences decision variability. It situates these debates in relation to the ethical issue of variability and the equity issues that variability raises. I propose that, despite the ecological complexity that drives decision variability, improving internal (within-country) decision consistency is still a valid goal. It may be that the use of annotated case examples, kind learning systems, and continued commitments to the social justice issues of inequality and individualisation can contribute to this goal.
Bayesian artificial intelligence
Korb, Kevin B
2003-01-01
As the power of Bayesian techniques has become more fully realized, the field of artificial intelligence has embraced Bayesian methodology and integrated it to the point where an introduction to Bayesian techniques is now a core course in many computer science programs. Unlike other books on the subject, Bayesian Artificial Intelligence keeps mathematical detail to a minimum and covers a broad range of topics. The authors integrate all of Bayesian net technology and learning Bayesian net technology and apply them both to knowledge engineering. They emphasize understanding and intuition but also provide the algorithms and technical background needed for applications. Software, exercises, and solutions are available on the authors' website.
Bayesian artificial intelligence
Korb, Kevin B
2010-01-01
Updated and expanded, Bayesian Artificial Intelligence, Second Edition provides a practical and accessible introduction to the main concepts, foundation, and applications of Bayesian networks. It focuses on both the causal discovery of networks and Bayesian inference procedures. Adopting a causal interpretation of Bayesian networks, the authors discuss the use of Bayesian networks for causal modeling. They also draw on their own applied research to illustrate various applications of the technology. New to the Second Edition: new chapter on Bayesian network classifiers; new section on object-oriente…
Meta-Statistics for Variable Selection: The R Package BioMark
Directory of Open Access Journals (Sweden)
Ron Wehrens
2012-11-01
Biomarker identification is an ever more important topic in the life sciences. With the advent of measurement methodologies based on microarrays and mass spectrometry, thousands of variables are routinely being measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for this often leads to a lack of power. In addition, choosing α cutoff levels remains somewhat arbitrary. Also in a regression context, a model depending on few but relevant variables will be more accurate and precise, and easier to interpret biologically. We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, presents a data-dependent selection threshold for significance, instead of a cookbook value of α = 0.05. It is applicable in all cases where two groups are compared. The second, stability selection, is more general, and can also be applied in a regression context. This approach uses repeated subsampling of the data in order to assess the variability of the model coefficients and selects those that remain consistently important. It is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
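The stability selection idea described above (repeat a selection step on random subsamples, then keep the variables that are chosen consistently) can be sketched as follows. This is not the BioMark implementation, which is an R package. The base selector here (keep the k variables most correlated with the response), the threshold of 0.6, and the simulated data are all assumptions chosen for brevity.

```python
import random

def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def top_k(X, y, k):
    """Base selector (an assumption): indices of the k columns most correlated with y."""
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])[:k]

def stability_selection(X, y, k=2, n_rounds=100, threshold=0.6, seed=0):
    """Return per-variable selection frequencies and the variables above the threshold."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n // 2)  # subsample half the rows without replacement
        for j in top_k([X[i] for i in idx], [y[i] for i in idx], k):
            counts[j] += 1
    freq = [c / n_rounds for c in counts]
    return freq, [j for j in range(p) if freq[j] >= threshold]

# Simulated data (illustrative only): 8 variables, only the first two drive y.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
y = [2 * row[0] - 2 * row[1] + data_rng.gauss(0, 0.5) for row in X]
freq, selected = stability_selection(X, y)
```

The selection frequencies in `freq` are the quantity of interest: a variable that is picked in most subsamples is unlikely to owe its selection to a few influential observations.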
A Robust Supervised Variable Selection for Noisy High-Dimensional Data
Czech Academy of Sciences Publication Activity Database
Kalina, Jan; Schlenker, Anna
2015-01-01
Vol. 2015, Article 320385 (2015), pp. 1-10. ISSN 2314-6133. R&D Projects: GA ČR GA13-17187S. Institutional support: RVO:67985807. Keywords: dimensionality reduction; variable selection; robustness. Subject RIV: BA - General Mathematics. Impact factor: 2.134 (year 2015).
The Effect of Listening to Specific Musical Genre Selections on Measures of Heart Rate Variability
Orman, Evelyn K.
2011-01-01
University students (N = 30) individually listened to the Billboard 100 top-ranked musical selection for their most and least liked musical genre. Two minutes of silence preceded each musical listening condition, and heart rate variability (HRV) was recorded throughout. All HRV measures decreased during music listening as compared with silence.…
A QSAR Study of Environmental Estrogens Based on a Novel Variable Selection Method
Directory of Open Access Journals (Sweden)
Aiqian Zhang
2012-05-01
A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals which are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model with a novel variable selection method has been proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI) with leave-multiple-out cross validation (LMOCV) to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD) principles for regulation of QSAR acceptability were fully considered, such as using an unambiguous multiple-linear regression (MLR) algorithm to build the model, using several validation methods to assess the performance of the model, defining the applicability domain, and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that the VSMVI is an effective, feasible and practical tool for rapid screening of the best subset from a large number of molecular descriptors.
Selective ligand recognition by a diversity-generating retroelement variable protein.
Directory of Open Access Journals (Sweden)
Jason L Miller
2008-06-01
Diversity-generating retroelements (DGRs) recognize novel ligands through massive protein sequence variation, a property shared uniquely with the adaptive immune response. Little is known about how recognition is achieved by DGR variable proteins. Here, we present the structure of the Bordetella bacteriophage DGR variable protein major tropism determinant (Mtd) bound to the receptor pertactin, revealing remarkable adaptability in the static binding sites of Mtd. Despite large dissimilarities in ligand binding mode, principles underlying selective recognition were strikingly conserved between Mtd and immunoreceptors. Central to this was the differential amplification of binding strengths by avidity (i.e., multivalency), which not only relaxed the demand for optimal complementarity between Mtd and pertactin but also enhanced distinctions among binding events to provide selectivity. A quantitatively similar balance between complementarity and avidity was observed for Bordetella bacteriophage DGR as occurs in the immune system, suggesting that variable repertoires operate under a narrow set of conditions to recognize novel ligands.
Yan, Zhengbing; Kuang, Te-Hui; Yao, Yuan
2017-09-01
In recent years, multivariate statistical monitoring of batch processes has become a popular research topic, wherein multivariate fault isolation is an important step aiming at the identification of the faulty variables contributing most to the detected process abnormality. Although contribution plots have been commonly used in statistical fault isolation, such methods suffer from the smearing effect between correlated variables. In particular, in batch process monitoring, the high autocorrelations and cross-correlations that exist in variable trajectories make the smearing effect unavoidable. To address this problem, a variable selection-based fault isolation method is proposed in this research, which transforms the fault isolation problem into a variable selection problem in partial least squares discriminant analysis and solves it by calculating a sparse partial least squares model. Unlike traditional methods, the proposed method emphasizes the relative importance of each process variable. Such information may help process engineers in conducting root-cause diagnosis. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Penalized regression procedures for variable selection in the potential outcomes framework.
Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L
2015-05-10
A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple 'impute, then select' class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data, and imputation are drawn. A difference least absolute shrinkage and selection operator algorithm is defined, along with its multiple imputation analogs. The procedures are illustrated using a well-known right-heart catheterization dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Muller, Benjamin J.; Cade, Brian S.; Schwarzkopf, Lin
2018-01-01
Many different factors influence animal activity. Often, the value of an environmental variable may influence significantly the upper or lower tails of the activity distribution. For describing relationships with heterogeneous boundaries, quantile regressions predict a quantile of the conditional distribution of the dependent variable. A quantile count model extends linear quantile regression methods to discrete response variables, and is useful if activity is quantified by trapping, where there may be many tied (equal) values in the activity distribution, over a small range of discrete values. Additionally, different environmental variables in combination may have synergistic or antagonistic effects on activity, so examining their effects together, in a modeling framework, is a useful approach. Thus, model selection on quantile counts can be used to determine the relative importance of different variables in determining activity, across the entire distribution of capture results. We conducted model selection on quantile count models to describe the factors affecting activity (numbers of captures) of cane toads (Rhinella marina) in response to several environmental variables (humidity, temperature, rainfall, wind speed, and moon luminosity) over eleven months of trapping. Environmental effects on activity are understudied in this pest animal. In the dry season, model selection on quantile count models suggested that rainfall positively affected activity, especially near the lower tails of the activity distribution. In the wet season, wind speed limited activity near the maximum of the distribution, while minimum activity increased with minimum temperature. This statistical methodology allowed us to explore, in depth, how environmental factors influenced activity across the entire distribution, and is applicable to any survey or trapping regime, in which environmental variables affect activity.
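To make the phrase "predict a quantile of the conditional distribution" concrete, the sketch below shows the check (pinball) loss that quantile regression minimizes, in the simplest case with no predictors, where the minimizer is a sample quantile. It is not the quantile count model used in the study, and the capture counts are invented for illustration.

```python
def pinball_loss(q, ys, tau):
    """Check loss of candidate value q for quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y in ys:
        r = y - q
        # Observations above q are weighted by tau, observations below q by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total

def sample_quantile(ys, tau):
    """Return the observed value that minimizes the pinball loss."""
    return min(sorted(set(ys)), key=lambda q: pinball_loss(q, ys, tau))

# Hypothetical nightly capture counts with one extreme night (illustrative only).
captures = [1, 2, 3, 4, 100]
median = sample_quantile(captures, 0.5)  # central tendency, insensitive to the extreme night
upper = sample_quantile(captures, 0.9)   # upper tail of the activity distribution
```

Changing `tau` moves the fitted value through the distribution, which is why separate quantile fits can reveal a variable that affects only the upper or lower tail of activity.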
Energy Technology Data Exchange (ETDEWEB)
De Lucia, Frank C., E-mail: frank.delucia@us.army.mil; Gottfried, Jennifer L.
2011-02-15
Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented and the influence of the variables on these results is discussed. We found that using the whole spectra as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectra are used, indicating this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.
DEFF Research Database (Denmark)
Omidikia, Nematollah; Kompany-Zareh, Mohsen
2013-01-01
Employment of Uninformative Variable Elimination (UVE) as a robust variable selection method is reported in this study. Each regression coefficient represents the contribution of the corresponding variable in the established model, but in the presence of uninformative variables as well as collinearity, the reliability of the regression coefficient's magnitude is suspect. Successive Projection Algorithm (SPA) and Gram-Schmidt Orthogonalization (GSO) were implemented as pre-selection techniques for removing collinearity and redundancy among variables in the model. Uninformative variable elimination...
Directory of Open Access Journals (Sweden)
Anne E Goodenough
Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables, and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset (habitat and offspring quality in the great tit, Parus major), the optimal REVS model explained more variance (higher R²), was more parsimonious (lower AIC), and had greater significance (lower P values) than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R² values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as, even when there are several competing models, they share a set of "core" variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines.
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
Directory of Open Access Journals (Sweden)
Petr Raušer
2011-01-01
The aim of the study was to detect and compare the haemostatic variables and bleeding after 7-day administration of carprofen or meloxicam in clinically healthy miniature pigs. Twenty-one clinically healthy Göttingen miniature pigs were divided into 3 groups. Selected haemostatic variables (platelet count, prothrombin time, activated partial thromboplastin time, thrombin time, fibrinogen), serum biochemical variables (total protein, bilirubin, urea, creatinine, alkaline phosphatase, alanine aminotransferase and gamma-glutamyltransferase), haemoglobin, haematocrit, red blood cells, white blood cells and buccal mucosal bleeding time were assessed before and 7 days after daily intramuscular administration of saline (1.5 ml per animal, control group), carprofen (2 mg·kg⁻¹) or meloxicam (0.1 mg·kg⁻¹). In pigs receiving carprofen or meloxicam, the thrombin time was significantly increased (p p p p < 0.05 compared to the control group. Significant differences were not detected in other haemostatic or biochemical variables or in bleeding time compared to other groups or to the pretreatment values. Intramuscular administration of carprofen or meloxicam in healthy miniature pigs for 7 days causes sporadic, but not clinically important, changes of selected haemostatic variables. Therefore, we can recommend them for perioperative use, e.g. for their analgesic effects, in orthopaedic or other surgical procedures without increased bleeding.
Variability-selected active galactic nuclei from supernova search in the Chandra deep field south
Trevese, D.; Boutsia, K.; Vagnetti, F.; Cappellaro, E.; Puccetti, S.
2008-09-01
Context: Variability is a property shared by virtually all active galactic nuclei (AGNs), and was adopted as a criterion for their selection using data from multi epoch surveys. Low Luminosity AGNs (LLAGNs) are contaminated by the light of their host galaxies, and cannot therefore be detected by the usual colour techniques. For this reason, their evolution in cosmic time is poorly known. Consistency with the evolution derived from X-ray detected samples has not been clearly established so far, also because the low luminosity population consists of a mixture of different object types. LLAGNs can be detected by the nuclear optical variability of extended objects. Aims: Several variability surveys have been, or are being, conducted for the detection of supernovae (SNe). We propose to re-analyse these SNe data using a variability criterion optimised for AGN detection, to select a new AGN sample and study its properties. Methods: We analysed images acquired with the wide field imager at the 2.2 m ESO/MPI telescope, in the framework of the STRESS supernova survey. We selected the AXAF field centred on the Chandra Deep Field South where, besides the deep X-ray survey, various optical data exist, originating in the EIS and COMBO-17 photometric surveys and the spectroscopic database of GOODS. Results: We obtained a catalogue of 132 variable AGN candidates. Several of the candidates are X-ray sources. We compare our results with an HST variability study of X-ray and IR detected AGNs, finding consistent results. The relatively high fraction of confirmed AGNs in our sample (60%) allowed us to extract a list of reliable AGN candidates for spectroscopic follow-up observations. Table [see full text] is only available in electronic form at http://www.aanda.org
DEFF Research Database (Denmark)
Ödman, Peter; Johansen, C.L.; Olsson, L.
2010-01-01
Fed-batch cultivations of Streptomyces coelicolor, producing the antibiotic actinorhodin, were monitored online by multiwavelength fluorescence spectroscopy and off-gas analysis. Partial least squares (PLS), locally weighted regression, and multilinear PLS (N-PLS) models were built for prediction of biomass and substrate (casamino acids) concentrations, respectively. The effect of combination of fluorescence and gas analyzer data as well as of different variable selection methods was investigated. Improved prediction models were obtained by combination of data from the two sensors and by variable...
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark
2006-01-01
We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating and differentiating these circuits in time linear in their size. We report on experimental results showing successful compilation and efficient inference on relational Bayesian networks whose PRIMULA-generated propositional instances have thousands of variables, and whose jointrees have clusters...
Directory of Open Access Journals (Sweden)
Guillermo A. Cañadas
2010-12-01
Burnout syndrome has a high incidence among professional healthcare and social workers. This leads to deterioration in the quality of their working life and affects their health, the organization where they work and, via their clients, society itself. Given these serious effects, many studies have investigated this construct and identified groups at increased risk of the syndrome. The present work has 2 main aims: to compare burnout levels in potential risk groups among professional healthcare workers; and to compare them using standard and Bayesian statistical analysis. The sample consisted of 108 psycho-social care workers based at 2 centers run by the Granada Council in Spain. All participants, anonymously and individually, filled in a booklet that included questions on personal information and the Spanish adaptation of the Maslach Burnout Inventory (MBI). Standard and Bayesian analysis of variance were used to identify the risk factors associated with different levels of burnout. It was found that the information provided by the Bayesian procedure complemented that provided by the standard procedure.
Approximate Bayesian computation.
Directory of Open Access Journals (Sweden)
Mikael Sunnåker
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems arising in biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology).
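A minimal sketch of the simplest member of this class, rejection ABC, may help show how the likelihood is bypassed: draw a parameter from the prior, simulate data from the model, and keep the draw only if a summary of the simulated data lands close to the observed summary. The toy model (normal data with unknown mean), the uniform prior, the sample mean as summary statistic, and the tolerance `eps` are all assumptions made for this example.

```python
import random

def abc_rejection(observed, prior_draw, simulate, summary, eps, n_draws, rng):
    """Keep prior draws whose simulated summary is within eps of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                               # 1. sample from the prior
        s_sim = summary(simulate(theta, len(observed), rng))  # 2. simulate, then summarize
        if abs(s_sim - s_obs) <= eps:                         # 3. compare summaries, no likelihood
            accepted.append(theta)
    return accepted

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(42)
# Toy "observed" data: 50 draws from a normal distribution with mean 3 and sd 1.
observed = [rng.gauss(3.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_draw=lambda r: r.uniform(0.0, 6.0),
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=mean,
    eps=0.2,
    n_draws=2000,
    rng=rng,
)
posterior_mean = mean(accepted)  # approximate posterior mean of the unknown mean
```

The accepted draws approximate the posterior only up to the tolerance and the choice of summary statistic, which is one concrete form of the approximations the abstract says need careful assessment.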
International Nuclear Information System (INIS)
Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao
2017-01-01
Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. By eliminating irrelevant or redundant variables, input variable selection identifies a suitable subset of variables as the input of a model. It also simplifies the model structure and improves computational efficiency. This paper describes the procedures of input variable selection for data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS), are applied in this study. Typical data-driven models incorporating support vector machine (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected by the PMI algorithm provide more effective information for the models to measure liquid mass flowrate, while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction. (paper)
International Nuclear Information System (INIS)
Ghasemi, Jahan B.; Zolfonoun, Ehsan
2012-01-01
Selection of the most informative molecular descriptors from the original data set is a key step for development of quantitative structure activity/property relationship models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an effective mutual information-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear quantitative structure-property relationship models. The proposed variable selection method was applied to three different QSPR datasets: soil degradation half-life of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The obtained results revealed that using MIMRCV as the feature selection method improves the predictive quality of the developed models compared to conventional MI based variable selection algorithms.
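For orientation, the conventional MI ranking that the abstract uses as its baseline can be sketched as below: estimate the mutual information between each descriptor and the response from joint counts, then rank descriptors by that score. The collinear-replacement step specific to MIMRCV is not reproduced here. The descriptor names and values are invented, and real descriptors are continuous and would first need to be binned.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of MI (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # MI = sum over observed pairs of p(x, y) * log( p(x, y) / (p(x) * p(y)) )
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def rank_by_mi(columns, target):
    """Return descriptor names sorted by decreasing MI with the target, plus the scores."""
    scores = {name: mutual_information(col, target) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical binned activity classes and three hypothetical binned descriptors.
target = [0, 0, 1, 1, 0, 0, 1, 1]
columns = {
    "d_copy": [0, 0, 1, 1, 0, 0, 1, 1],   # fully informative about the target
    "d_noisy": [0, 0, 1, 1, 0, 1, 1, 0],  # partly informative
    "d_indep": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the target
}
ranking, scores = rank_by_mi(columns, target)
```

Unlike a correlation coefficient, MI does not assume a linear relationship, which is why it suits the nonlinear models the abstract targets. A pure ranking does ignore redundancy between descriptors, which is the gap that methods handling collinear variables aim to close.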
Energy Technology Data Exchange (ETDEWEB)
Ghasemi, Jahan B.; Zolfonoun, Ehsan [Toosi University of Technology, Tehran (Korea, Republic of)]
2012-05-15
Selection of the most informative molecular descriptors from the original data set is a key step in the development of quantitative structure activity/property relationship models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an effective mutual information-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear quantitative structure-property relationship (QSPR) models. The proposed variable selection method was applied to three different QSPR datasets: soil degradation half-life of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The results revealed that using MIMRCV as the feature selection method improves the predictive quality of the developed models compared to conventional MI-based variable selection algorithms.
Calibration Variable Selection and Natural Zero Determination for Semispan and Canard Balances
Ulbrich, Norbert M.
2013-01-01
Independent calibration variables for the characterization of semispan and canard wind tunnel balances are discussed. It is shown that the variable selection for a semispan balance is determined by the location of the resultant normal and axial forces that act on the balance. These two forces are the first and second calibration variable. The pitching moment becomes the third calibration variable after the normal and axial forces are shifted to the pitch axis of the balance. Two geometric distances, i.e., the rolling and yawing moment arms, are the fourth and fifth calibration variable. They are traditionally substituted by corresponding moments to simplify the use of calibration data during a wind tunnel test. A canard balance is related to a semispan balance. It also only measures loads on one half of a lifting surface. However, the axial force and yawing moment are of no interest to users of a canard balance. Therefore, its calibration variable set is reduced to the normal force, pitching moment, and rolling moment. The combined load diagrams of the rolling and yawing moment for a semispan balance are discussed. They may be used to illustrate connections between the wind tunnel model geometry, the test section size, and the calibration load schedule. Then, methods are reviewed that may be used to obtain the natural zeros of a semispan or canard balance. In addition, characteristics of three semispan balance calibration rigs are discussed. Finally, basic requirements for a full characterization of a semispan balance are reviewed.
DEFF Research Database (Denmark)
Sharifzadeh, Sara; Ghodsi, Ali; Clemmensen, Line H.
2017-01-01
Principal component analysis (PCA) is one of the main unsupervised pre-processing methods for dimension reduction. When training labels are available, it is worth using a supervised PCA strategy. In cases where both dimension reduction and variable selection are required, sparse PCA (SPCA......) methods are preferred. In this paper, a sparse supervised PCA (SSPCA) method is proposed for pre-processing. This method is especially appropriate in problems where a high-dimensional input necessitates the use of a sparse method and a target label is also available to guide the variable selection......) algorithm. We compare the proposed method with PCA, PMD-based SPCA and supervised PCA. In addition, SSPCA is also compared with sparse partial least squares (SPLS), due to the similarity between the two objective functions. Experimental results from simulated as well as real data sets show that SSPCA...
Bayesian nonparametric hierarchical modeling.
Dunson, David B
2009-04-01
In biomedical research, hierarchical models are very widely used to accommodate dependence in multivariate and longitudinal data and for borrowing of information across data from different sources. A primary concern in hierarchical modeling is sensitivity to parametric assumptions, such as linearity and normality of the random effects. Parametric assumptions on latent variable distributions can be challenging to check and are typically unwarranted, given available prior knowledge. This article reviews some recent developments in Bayesian nonparametric methods motivated by complex, multivariate and functional data collected in biomedical studies. The author provides a brief review of flexible parametric approaches relying on finite mixtures and latent class modeling. Dirichlet process mixture models are motivated by the need to generalize these approaches to avoid assuming a fixed finite number of classes. Focusing on an epidemiology application, the author illustrates the practical utility and potential of nonparametric Bayes methods.
Prediction of road accidents: A Bayesian hierarchical approach.
Deublein, Markus; Schubert, Matthias; Adey, Bryan T; Köhler, Jochen; Faber, Michael H
2013-03-01
In this paper a novel methodology for the prediction of the occurrence of road accidents is presented. The methodology utilizes a combination of three statistical methods: (1) gamma-updating of the occurrence rates of injury accidents and injured road users, (2) hierarchical multivariate Poisson-lognormal regression analysis taking into account correlations amongst multiple dependent model response variables and effects of discrete accident count data e.g. over-dispersion, and (3) Bayesian inference algorithms, which are applied by means of data mining techniques supported by Bayesian Probabilistic Networks in order to represent non-linearity between risk indicating and model response variables, as well as different types of uncertainties which might be present in the development of the specific models. Prior Bayesian Probabilistic Networks are first established by means of multivariate regression analysis of the observed frequencies of the model response variables, e.g. the occurrence of an accident, and observed values of the risk indicating variables, e.g. degree of road curvature. Subsequently, parameter learning is done using updating algorithms, to determine the posterior predictive probability distributions of the model response variables, conditional on the values of the risk indicating variables. The methodology is illustrated through a case study using data of the Austrian rural motorway network. In the case study, on randomly selected road segments the methodology is used to produce a model to predict the expected number of accidents in which an injury has occurred and the expected number of light, severe and fatally injured road users. Additionally, the methodology is used for geo-referenced identification of road sections with increased occurrence probabilities of injury accident events on a road link between two Austrian cities. It is shown that the proposed methodology can be used to develop models to estimate the occurrence of road accidents for any
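The abstract does not give the formulas behind its "gamma-updating" step. One common reading, assumed here, is the conjugate Gamma-Poisson update of an accident rate: with a Gamma(shape, rate) prior and Poisson counts observed over known exposures, the posterior is again a Gamma distribution. The sketch below shows only that textbook update; the function names and the choice of exposure unit are illustrative.

```python
def gamma_poisson_update(prior_shape, prior_rate, counts, exposures):
    """Conjugate update of a Gamma(shape, rate) prior on a Poisson rate.

    counts[i] events were observed over exposures[i] units of exposure
    (for example segment-years; the unit is an assumption of this sketch).
    Returns the posterior (shape, rate).
    """
    if len(counts) != len(exposures):
        raise ValueError("counts and exposures must have the same length")
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + sum(exposures)
    return post_shape, post_rate


def gamma_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution: the expected event rate."""
    return shape / rate
```

For example, a Gamma(2, 1) prior updated with counts 3, 1 and 0 over three unit exposures gives a Gamma(6, 4) posterior, whose mean of 1.5 events per unit exposure lies between the prior mean (2.0) and the observed rate (about 1.33).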
gamboostLSS: An R Package for Model Building and Variable Selection in the GAMLSS Framework
Hofner, Benjamin; Mayr, Andreas; Schmid, Matthias
2014-01-01
Generalized additive models for location, scale and shape are a flexible class of regression models that allow multiple parameters of a distribution function, such as the mean and the standard deviation, to be modeled simultaneously. With the R package gamboostLSS, we provide a boosting method to fit these models. Variable selection and model choice are naturally available within this regularized regression framework. To introduce and illustrate the R package gamboostLSS and its infrastructure, we...
Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of neural networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: Support Vector Machines (SVM). A hybrid model that combines genetic algorithms and support vector machines is suggested such that, when the SVM is used as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
Geographic variability of selected phenolic compounds in fresh berries of two Cornus species
Popović, Zorica; Matić, Rada; Bajić-Ljubičić, Jasna; Tešević, Vele; Bojović, Srđan
2017-01-01
The aim of this study was to investigate the chemical variability of Cornus mas and Cornus sanguinea on the basis of the content of six selected phenolic compounds in fruit extracts. Fruits were sampled at the time of full ripening, mid-September, from two localities that differed in terms of orographic and environmental conditions. Fresh fruit extracts were analyzed by LC–MS/MS to determine the presence and contents of neochlorogenic acid, quercitrin, isoquercetin, hyperoside, rutoside and q...
Pérez, Noel; Guevara, Miguel A.; Silva, Augusto
2013-02-01
This work addresses the issue of variable selection within the context of breast cancer classification with mammography. A comprehensive repository of feature vectors was used, including a hybrid subset gathering image-based and clinical features. The aim was to gather experimental evidence on variable selection in terms of cardinality and type, and to find a classification scheme that provides the best performance in terms of Area Under the Receiver Operating Characteristic Curve (AUC) scores using the ranked feature subsets. We evaluated and classified a total of 300 subsets of features formed by the application of Chi-Square Discretization, Information-Gain, One-Rule and RELIEF methods in association with Feed-Forward Backpropagation Neural Network (FFBP), Support Vector Machine (SVM) and Decision Tree J48 (DTJ48) Machine Learning Algorithms (MLA) for a comparative performance evaluation based on AUC scores. A variable selection analysis was performed for Single-View Ranking and Multi-View Ranking groups of features. Feature subsets representing Microcalcifications (MCs), Masses, and both MCs and Masses lesions achieved AUC scores of 0.91, 0.954 and 0.934, respectively. Experimental evidence demonstrated that classification performance was improved by combining image-based and clinical features. The most important clinical and image-based features were StromaDistortion and Circularity, respectively. Other features, less important but worth using because of their consistency, were Contrast, Perimeter, Microcalcification, Correlation and Elongation.
Lü, Chengxu; Jiang, Xunpeng; Zhou, Xingfan; Zhang, Yinqiao; Zhang, Naiqian; Wei, Chongfeng; Mao, Wenhua
2017-10-01
Wet gluten is a useful quality indicator for wheat, and short wave near infrared spectroscopy (NIRS) is a high-performance technique with the advantages of being economical, rapid and nondestructive. To study the feasibility of using short wave NIRS to analyze wet gluten directly from wheat seed, 54 representative wheat seed samples were collected and scanned by a spectrometer. Eight spectral pretreatment methods and a genetic algorithm (GA) variable selection method were used to optimize the analysis. Both quantitative and qualitative models of wet gluten were built by partial least squares regression and discriminant analysis. For quantitative analysis, normalization is the optimal pretreatment method, 17 wet gluten sensitive variables are selected by GA, and the GA model performs better than the all-variable model, with R2V=0.88 and RMSEV=1.47. For qualitative analysis, automatic weighted least squares baseline is the optimal pretreatment method, and the all-variable models perform better than the GA models. The correct classification rates of 3 class of 30% wet gluten content are 95.45, 84.52, and 90.00%, respectively. The short wave NIRS technique shows potential for both quantitative and qualitative analysis of wet gluten in wheat seed.
Multiview Bayesian Correlated Component Analysis
DEFF Research Database (Denmark)
Kamronn, Simon Due; Poulsen, Andreas Trier; Hansen, Lars Kai
2015-01-01
are identical. Here we propose a hierarchical probabilistic model that can infer the level of universality in such multiview data, from completely unrelated representations, corresponding to canonical correlation analysis, to identical representations as in correlated component analysis. This new model, which...... we denote Bayesian correlated component analysis, evaluates favorably against three relevant algorithms in simulated data. A well-established benchmark EEG data set is used to further validate the new model and infer the variability of spatial representations across multiple subjects....
Kuligowski, Julia; Pérez-Guaita, David; Escobar, Javier; de la Guardia, Miguel; Vento, Máximo; Ferrer, Alberto; Quintás, Guillermo
2013-11-15
Variable subset selection is often mandatory in high throughput metabolomics and proteomics. However, depending on the variable to sample ratio there is a significant susceptibility of variable selection towards chance correlations. The evaluation of the predictive capabilities of PLSDA models estimated by cross-validation after feature selection provides overly optimistic results if the selection is performed on the entire set and no external validation set is available. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection estimated by cross-validation is statistically higher than that attributed to the presence of chance correlations in the original data set. Statistical significance of PLSDA CV-figures of merit obtained after variable selection is expressed by means of p-values calculated by using a permutation test that included the variable selection step. The reliability of the approach is evaluated using two variable selection methods on experimental and simulated data sets with and without induced class differences. The proposed approach can be considered as a useful tool when no external validation set is available and provides a straightforward way to evaluate differences between variable selection methods. © 2013 Elsevier B.V. All rights reserved.
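The key point of the abstract, that the permutation test must rerun the variable selection step on every permuted label set, can be sketched as follows. This is a simplified stand-in under stated assumptions: a nearest-centroid classifier replaces PLSDA, selection ranks features by absolute class-mean difference, labels are 0/1, and all function names are hypothetical.

```python
import random


def select_top_k(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        g1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroid = [sum(row[j] for row in rows) / len(rows) for j in features]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
        if min(dist, key=dist.get) == y[i]:
            correct += 1
    return correct / len(X)


def pipeline_score(X, y, k):
    """Selection on the full set followed by CV: the optimistic figure of merit."""
    return loo_accuracy(X, y, select_top_k(X, y, k))


def permutation_p_value(X, y, k, n_perm=200, seed=0):
    """p-value of the pipeline score under label permutation.

    Selection is repeated inside every permutation, so the null distribution
    carries the same selection-induced optimism as the observed score.
    """
    rng = random.Random(seed)
    observed = pipeline_score(X, y, k)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pipeline_score(X, labels, k) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the observed score and the null scores pass through the identical select-then-validate pipeline, a small p-value indicates discrimination beyond what chance correlations plus selection would produce, even without an external validation set.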
The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition.
Zhang, Ying; Bilbao, Aivett; Bruderer, Tobias; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard; Varesio, Emmanuel
2015-10-02
As tryptic peptides and metabolites are not equally distributed along the mass range, the probability of cross fragment ion interference is higher in certain windows when fixed Q1 SWATH windows are applied. We evaluated the benefits of utilizing variable Q1 SWATH windows with regards to selectivity improvement. Variable windows based on equalizing the distribution of either the precursor ion population (PIP) or the total ion current (TIC) within each window were generated by an in-house software, swathTUNER. These two variable Q1 SWATH window strategies outperformed, with respect to quantification and identification, the basic approach using a fixed window width (FIX) for proteomic profiling of human monocyte-derived dendritic cells (MDDCs). Thus, 13.8 and 8.4% additional peptide precursors, which resulted in 13.1 and 10.0% more proteins, were confidently identified by SWATH using the strategy PIP and TIC, respectively, in the MDDC proteomic sample. On the basis of the spectral library purity score, some improvement warranted by variable Q1 windows was also observed, albeit to a lesser extent, in the metabolomic profiling of human urine. We show that the novel concept of "scheduled SWATH" proposed here, which incorporates (i) variable isolation windows and (ii) precursor retention time segmentation further improves both peptide and metabolite identifications.
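The idea of equalizing the precursor ion population (PIP) per window can be sketched with simple quantile cuts over a list of precursor m/z values. This is not the swathTUNER algorithm, whose details the abstract does not give; `equal_population_windows` is a hypothetical name, and the TIC variant would weight each precursor by intensity instead of counting it once.

```python
def equal_population_windows(precursor_mz, n_windows):
    """Split the observed m/z range into windows holding near-equal precursor counts.

    Returns a list of (lower, upper) m/z boundaries. Boundaries are placed
    midway between neighbouring precursors; repeated m/z values at a cut
    point can leave the counts slightly unequal.
    """
    mz = sorted(precursor_mz)
    n = len(mz)
    if n_windows < 1 or n_windows > n:
        raise ValueError("n_windows must be between 1 and the number of precursors")
    edges = [mz[0]]
    for w in range(1, n_windows):
        cut = w * n // n_windows  # index of the first precursor in window w
        edges.append((mz[cut - 1] + mz[cut]) / 2)
    edges.append(mz[-1])
    return list(zip(edges[:-1], edges[1:]))
```

With precursors at m/z 400, 401, 402, 403, 500, 600, 700 and 800 and four windows, the dense 400 to 403 region gets narrow windows and the sparse high-mass region gets wide ones, which is the behaviour that reduces fragment ion interference in crowded regions.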
Bayesian supervised dimensionality reduction.
Gönen, Mehmet
2013-12-01
Dimensionality reduction is commonly used as a preprocessing step before training a supervised learner. However, coupled training of dimensionality reduction and supervised learning steps may improve the prediction performance. In this paper, we introduce a simple and novel Bayesian supervised dimensionality reduction method that combines linear dimensionality reduction and linear supervised learning in a principled way. We present both Gibbs sampling and variational approximation approaches to learn the proposed probabilistic model for multiclass classification. We also extend our formulation toward model selection using automatic relevance determination in order to find the intrinsic dimensionality. Classification experiments on three benchmark data sets show that the new model significantly outperforms seven baseline linear dimensionality reduction algorithms on very low dimensions in terms of generalization performance on test data. The proposed model also obtains the best results on an image recognition task in terms of classification and retrieval performances.
Social variables exert selective pressures in the evolution and form of primate mimetic musculature.
Burrows, Anne M; Li, Ly; Waller, Bridget M; Micheletta, Jerome
2016-04-01
Mammals use their faces in social interactions more so than any other vertebrates. Primates are an extreme among most mammals in their complex, direct, lifelong social interactions and their frequent use of facial displays is a means of proximate visual communication with conspecifics. The available repertoire of facial displays is primarily controlled by mimetic musculature, the muscles that move the face. The form of these muscles is, in turn, limited by and influenced by phylogenetic inertia but here we use examples, both morphological and physiological, to illustrate the influence that social variables may exert on the evolution and form of mimetic musculature among primates. Ecomorphology is concerned with the adaptive responses of morphology to various ecological variables such as diet, foliage density, predation pressures, and time of day activity. We present evidence that social variables also exert selective pressures on morphology, specifically using mimetic muscles among primates as an example. Social variables include group size, dominance 'style', and mating systems. We present two case studies to illustrate the potential influence of social behavior on adaptive morphology of mimetic musculature in primates: (1) gross morphology of the mimetic muscles around the external ear in closely related species of macaque (Macaca mulatta and Macaca nigra) characterized by varying dominance styles and (2) comparative physiology of the orbicularis oris muscle among select ape species. This muscle is used in both facial displays/expressions and in vocalizations/human speech. We present qualitative observations of myosin fiber-type distribution in this muscle of siamang (Symphalangus syndactylus), chimpanzee (Pan troglodytes), and human to demonstrate the potential influence of visual and auditory communication on muscle physiology. In sum, ecomorphologists should be aware of social selective pressures as well as ecological ones, and that observed morphology might
Richards, Joseph W.; Starr, Dan L.; Brink, Henrik; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; James, J. Berian; Long, James P.; Rice, John
2012-01-01
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because (1) standard assumptions for machine-learned model selection procedures break down and (2) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting, co-training, and active learning (AL). We argue that AL—where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up—is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and Optical Gravitational Lensing Experiment, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a Web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply AL to classify variable stars in the All Sky Automated Survey, finding dramatic improvement in our agreement with the ASAS Catalog of Variable Stars, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
HEART RATE VARIABILITY CLASSIFICATION USING SADE-ELM CLASSIFIER WITH BAT FEATURE SELECTION
Directory of Open Access Journals (Sweden)
R Kavitha
2017-07-01
The electrical activity of the human heart is measured by a vital biomedical signal, the electrocardiogram (ECG). The ECG is a crucial source of diagnostic information about a patient's cardiopathy, and cardiac disease is monitored and diagnosed by recording and processing ECG impulses. In recent years, much research has gone into developing enhanced methods that identify risks to a patient's condition by processing and analysing the ECG signal. Such analysis helps to find cardiac abnormalities, arrhythmias, and many other heart problems. The ECG signal is processed to detect variability in heart rhythm: heart rate variability (HRV) is calculated from the time intervals between heartbeats, that is, from the variation in the beat-to-beat interval. HRV is an essential aspect in diagnosing the properties of the heart. Recent developments enhance its potential with the aid of non-linear metrics combined with feature selection. In this paper, fundamental features are extracted from the ECG signal, the Bat algorithm is employed to select the best features, and the selected features are presented to the classifier for accurate classification. The popular machine learning algorithm ELM (Extreme Learning Machine) is used for classification, integrated with an evolutionary algorithm in the form of the Self-Adaptive Differential Evolution Extreme Learning Machine (SADE-ELM), to improve the reliability of classification. It is combined with an Effective Fuzzy Kohonen Clustering Network (EFKCN) to increase the accuracy of HRV classification. The experiments show that the SADE-ELM method improves precision while also optimizing computation time.
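The abstract states that HRV is computed from beat-to-beat intervals but does not name the features it extracts. As a minimal sketch, two widely used time-domain measures, SDNN and RMSSD, are shown below; their use here is an assumption, and the function names are illustrative.

```python
import math


def rr_intervals(beat_times_s):
    """Intervals in milliseconds between successive beat times given in seconds."""
    return [(b - a) * 1000.0 for a, b in zip(beat_times_s, beat_times_s[1:])]


def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    mean = sum(rr_ms) / len(rr_ms)
    return math.sqrt(sum((r - mean) ** 2 for r in rr_ms) / (len(rr_ms) - 1))


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Features of this kind, together with non-linear metrics, would form the candidate set from which a feature selection step picks a subset for the classifier.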
Computational Neuropsychology and Bayesian Inference.
Parr, Thomas; Rees, Geraint; Friston, Karl J
2018-01-01
Computational theories of brain function have become very influential in neuroscience. They have facilitated the growth of formal approaches to disease, particularly in psychiatric research. In this paper, we provide a narrative review of the body of computational research addressing neuropsychological syndromes, and focus on those that employ Bayesian frameworks. Bayesian approaches to understanding brain function formulate perception and action as inferential processes. These inferences combine 'prior' beliefs with a generative (predictive) model to explain the causes of sensations. Under this view, neuropsychological deficits can be thought of as false inferences that arise due to aberrant prior beliefs (that are poor fits to the real world). This draws upon the notion of a Bayes optimal pathology - optimal inference with suboptimal priors - and provides a means for computational phenotyping. In principle, any given neuropsychological disorder could be characterized by the set of prior beliefs that would make a patient's behavior appear Bayes optimal. We start with an overview of some key theoretical constructs and use these to motivate a form of computational neuropsychology that relates anatomical structures in the brain to the computations they perform. Throughout, we draw upon computational accounts of neuropsychological syndromes. These are selected to emphasize the key features of a Bayesian approach, and the possible types of pathological prior that may be present. They range from visual neglect through hallucinations to autism. Through these illustrative examples, we review the use of Bayesian approaches to understand the link between biology and computation that is at the heart of neuropsychology.
DEFF Research Database (Denmark)
Mauricio Iglesias, Miguel; Valverde Perez, Borja; Sin, Gürkan
Selecting the right controlled variables in a bioprocess is challenging since the objectives of the process (yields, product or substrate concentration) are difficult to relate with a given actuator. We apply here process control tools that can be used to assist in the selection of controlled var...... variables to the case of the SHARON-Anammox process for autotrophic nitrogen removal....
Classification of Brazilian soils by using LIBS and variable selection in the wavelet domain.
Pontes, Márcio José Coelho; Cortez, Juliana; Galvão, Roberto Kawakami Harrop; Pasquini, Celio; Araújo, Mário César Ugulino; Coelho, Ricardo Marques; Chiba, Márcio Koiti; de Abreu, Mônica Ferreira; Madari, Beáta Emöke
2009-05-29
This paper proposes a novel analytical methodology for soil classification based on the use of laser-induced breakdown spectroscopy (LIBS) and chemometric techniques. In the proposed methodology, linear discriminant analysis (LDA) is employed to build a classification model on the basis of a reduced subset of spectral variables. For the purpose of variable selection, three techniques are considered, namely the successive projection algorithm (SPA), the genetic algorithm (GA), and a stepwise formulation (SW). The use of a data compression procedure in the wavelet domain is also proposed to reduce the computational workload involved in the variable selection process. The methodology is validated in a case study involving the classification of 149 Brazilian soil samples into three different orders (Argissolo, Latossolo and Nitossolo). For means of comparison, soft independent modelling of class analogy (SIMCA) models are also employed. The best discrimination of soil types was attained by SPA-LDA, which achieved an average classification rate of 90% in the validation set and 72% in cross-validation. Moreover, the proposed wavelet compression procedure was found to be of value by providing a 100-fold reduction in computational workload without significantly compromising the classification accuracy of the resulting models.
Space Shuttle RTOS Bayesian Network
Morris, A. Terry; Beling, Peter A.
2001-01-01
With shrinking budgets and the requirements to increase reliability and operational life of the existing orbiter fleet, NASA has proposed various upgrades for the Space Shuttle that are consistent with national space policy. The cockpit avionics upgrade (CAU), a high priority item, has been selected as the next major upgrade. The primary functions of cockpit avionics include flight control, guidance and navigation, communication, and orbiter landing support. Secondary functions include the provision of operational services for non-avionics systems such as data handling for the payloads and caution and warning alerts to the crew. Recently, a process to select the optimal commercial-off-the-shelf (COTS) real-time operating system (RTOS) for the CAU was conducted by United Space Alliance (USA) Corporation, which is a joint venture between Boeing and Lockheed Martin, the prime contractor for space shuttle operations. In order to independently assess the RTOS selection, NASA has used the Bayesian network-based scoring methodology described in this paper. Our two-stage methodology addresses the issue of RTOS acceptability by incorporating functional, performance and non-functional software measures related to reliability, interoperability, certifiability, efficiency, correctness, business, legal, product history, cost and life cycle. The first stage of the methodology involves obtaining scores for the various measures using a Bayesian network. The Bayesian network incorporates the causal relationships between the various and often competing measures of interest while also assisting the inherently complex decision analysis process with its ability to reason under uncertainty. The structure and selection of prior probabilities for the network are extracted from experts in the field of real-time operating systems. Scores for the various measures are computed using Bayesian probability. In the second stage, multi-criteria trade-off analyses are performed between the scores
Nikoloulopoulos, Aristidis K
2016-06-30
The method of generalized estimating equations (GEE) is popular in the biostatistics literature for analyzing longitudinal binary and count data. It assumes a generalized linear model for the outcome variable, and a working correlation among repeated measurements. In this paper, we introduce a viable competitor: the weighted scores method for generalized linear model margins. We weight the univariate score equations using a working discretized multivariate normal model that is a proper multivariate model. Because the weighted scores method is a parametric method based on likelihood, we propose composite likelihood information criteria as an intermediate step for model selection. The same criteria can be used for both correlation structure and variable selection. Simulation studies and the application example show that our method outperforms other existing model selection methods in GEE. From the example, it can be seen that our methods not only improve on GEE in terms of interpretability and efficiency but also can change the inferential conclusions with respect to GEE. Copyright © 2016 John Wiley & Sons, Ltd.
Role of apoptosis in common variable immunodeficiency and selective immunoglobulin A deficiency.
Yazdani, Reza; Fatholahi, Maryam; Ganjalikhani-Hakemi, Mazdak; Abolhassani, Hassan; Azizi, Gholamreza; Hamid, Kabir Magaji; Rezaei, Nima; Aghamohammadi, Asghar
2016-03-01
Common variable immunodeficiency (CVID) and selective IgA deficiency (SIgAD) are the most common primary immunodeficiencies in humans. Both diseases share clinical manifestations and molecular defects. Increased apoptosis may be one of the mechanisms involved in the pathogenesis of CVID and SIgAD. Elevated apoptosis in these disorders leads to defective long-term survival of B-cells, reduced antibody production, decreased lymphocyte proliferation and defective cytokine secretion. For the first time, we reviewed the role of apoptosis in CVID and SIgAD. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kamman, J. H.; Hall, C. L.
1975-01-01
Two inlet performance tests and one inlet/airframe drag test were conducted in 1969 at the NASA-Ames Research Center. The basic inlet system was two-dimensional, three ramp (overhead), external compression, with variable capture area. The data from these tests were analyzed to show the effects of selected design variables on the performance of this type of inlet system. The inlet design variables investigated include inlet bleed, bypass, operating mass flow ratio, inlet geometry, and variable capture area.
gamboostLSS: An R Package for Model Building and Variable Selection in the GAMLSS Framework
Directory of Open Access Journals (Sweden)
Benjamin Hofner
2016-10-01
Generalized additive models for location, scale and shape are a flexible class of regression models that allow multiple parameters of a distribution function, such as the mean and the standard deviation, to be modeled simultaneously. With the R package gamboostLSS, we provide a boosting method to fit these models. Variable selection and model choice are naturally available within this regularized regression framework. To introduce and illustrate the R package gamboostLSS and its infrastructure, we use a data set on stunted growth in India. In addition to the specification and application of the model itself, we present a variety of convenience functions, including methods for tuning parameter selection, prediction and visualization of results. The package gamboostLSS is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=gamboostLSS.
Understanding Computational Bayesian Statistics
Bolstad, William M
2011-01-01
A hands-on introduction to computational statistics from a Bayesian point of view. Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic
Bayesian statistics an introduction
Lee, Peter M
2012-01-01
Bayesian Statistics is the school of thought that combines prior beliefs with the likelihood of a hypothesis to arrive at posterior beliefs. The first edition of Peter Lee’s book appeared in 1989, but the subject has moved ever onwards, with increasing emphasis on Monte Carlo based techniques. This new fourth edition looks at recent techniques such as variational methods, Bayesian importance sampling, approximate Bayesian computation and Reversible Jump Markov Chain Monte Carlo (RJMCMC), providing a concise account of the way in which the Bayesian approach to statistics develops as wel
Gezari, S.; Martin, D. C.; Forster, K.; Neill, J. D.; Huber, M.; Heckman, T.; Bianchi, L.; Morrissey, P.; Neff, S. G.; Seibert, M.;
2013-01-01
We present the selection and classification of over a thousand ultraviolet (UV) variable sources discovered in approximately 40 deg^2 of GALEX Time Domain Survey (TDS) NUV images observed with a cadence of 2 days and a baseline of observations of approximately 3 years. The GALEX TDS fields were designed to be in spatial and temporal coordination with the Pan-STARRS1 Medium Deep Survey, which provides deep optical imaging and simultaneous optical transient detections via image differencing. We characterize the GALEX photometric errors empirically as a function of mean magnitude, and select sources that vary at the 5 sigma level in at least one epoch. We measure the statistical properties of the UV variability, including the structure function on timescales of days and years. We report classifications for the GALEX TDS sample using a combination of optical host colors and morphology, UV light curve characteristics, and matches to archival X-ray and spectroscopy catalogs. We classify 62% of the sources as active galaxies (358 quasars and 305 active galactic nuclei), and 10% as variable stars (including 37 RR Lyrae, 53 M dwarf flare stars, and 2 cataclysmic variables). We detect a large-amplitude tail in the UV variability distribution for M-dwarf flare stars and RR Lyrae, reaching up to |Δm| = 4.6 mag and 2.9 mag, respectively. The mean amplitude of the structure function for quasars on year timescales is five times larger than observed at optical wavelengths. The remaining unclassified sources include UV-bright extragalactic transients, two of which have been spectroscopically confirmed to be a young core-collapse supernova and a flare from the tidal disruption of a star by a dormant supermassive black hole. We calculate a surface density for variable sources in the UV with NUV < 23 mag and |Δm| > 0.2 mag of approximately 8.0, 7.7, and 1.8 deg^-2 for quasars, active galactic nuclei, and RR Lyrae stars
Extreme precipitation variability, forage quality and large herbivore diet selection in arid environments
Cain, James W.; Gedir, Jay V.; Marshal, Jason P.; Krausman, Paul R.; Allen, Jamison D.; Duff, Glenn C.; Jansen, Brian; Morgart, John R.
2017-01-01
Nutritional ecology forms the interface between environmental variability and large herbivore behaviour, life history characteristics, and population dynamics. Forage conditions in arid and semi-arid regions are driven by unpredictable spatial and temporal patterns in rainfall. Diet selection by herbivores should be directed towards overcoming the most pressing nutritional limitation (i.e. energy, protein [nitrogen, N], moisture) within the constraints imposed by temporal and spatial variability in forage conditions. We investigated the influence of precipitation-induced shifts in forage nutritional quality and subsequent large herbivore responses across widely varying precipitation conditions in an arid environment. Specifically, we assessed seasonal changes in diet breadth and forage selection of adult female desert bighorn sheep Ovis canadensis mexicana in relation to potential nutritional limitations in forage N, moisture and energy content (as proxied by dry matter digestibility, DMD). Succulents were consistently high in moisture but low in N and grasses were low in N and moisture until the wet period. Nitrogen and moisture content of shrubs and forbs varied among seasons and climatic periods, whereas trees had consistently high N and moderate moisture levels. Shrubs, trees and succulents composed most of the seasonal sheep diets but had little variation in DMD. Across all seasons during drought and during summer with average precipitation, forages selected by sheep were higher in N and moisture than that of available forage. Differences in DMD between sheep diets and available forage were minor. Diet breadth was lowest during drought and increased with precipitation, reflecting a reliance on few key forage species during drought. Overall, forage selection was more strongly associated with N and moisture content than energy content. Our study demonstrates that unlike north-temperate ungulates which are generally reported to be energy-limited, N and moisture
A Bayesian method for assessing multiscale species-habitat relationships
Stuber, Erica F.; Gruber, Lutz F.; Fontaine, Joseph J.
2017-01-01
Context: Scientists face several theoretical and methodological challenges in appropriately describing fundamental wildlife-habitat relationships in models. The spatial scales of habitat relationships are often unknown, and are expected to follow a multi-scale hierarchy. Typical frequentist or information theoretic approaches often suffer under collinearity in multi-scale studies, fail to converge when models are complex or represent an intractable computational burden when candidate model sets are large. Objectives: Our objective was to implement an automated, Bayesian method for inference on the spatial scales of habitat variables that best predict animal abundance. Methods: We introduce Bayesian latent indicator scale selection (BLISS), a Bayesian method to select spatial scales of predictors using latent scale indicator variables that are estimated with reversible-jump Markov chain Monte Carlo sampling. BLISS does not suffer from collinearity, and substantially reduces computation time of studies. We present a simulation study to validate our method and apply our method to a case-study of land cover predictors for ring-necked pheasant (Phasianus colchicus) abundance in Nebraska, USA. Results: Our method returns accurate descriptions of the explanatory power of multiple spatial scales, and unbiased and precise parameter estimates under commonly encountered data limitations including spatial scale autocorrelation, effect size, and sample size. BLISS outperforms commonly used model selection methods including stepwise and AIC, and reduces runtime by 90%. Conclusions: Given the pervasiveness of scale-dependency in ecology, and the implications of mismatches between the scales of analyses and ecological processes, identifying the spatial scales over which species are integrating habitat information is an important step in understanding species-habitat relationships. BLISS is a widely applicable method for identifying important spatial scales, propagating scale uncertainty, and
Wang, Guiyun; Ma, Mingyu; Zhang, Zhuoyong; Xiang, Yuhong; Harrington, Peter de B
2013-08-15
A novel method combining a discrete particle swarm optimization (DPSO) with a support vector machine (SVM) was proposed for the variable interval selection of tissue sections of endometrial carcinoma by near infrared spectroscopy. The DPSO-SVM algorithm includes a multi-stage screening. In each screening step, the DPSO was repeated 50 times using random sampling, and the frequencies that the variable intervals were selected among the 50 repeats were used to select the most probable intervals. The variable intervals with high probabilities were selected and further used in the next screening. Finally, the subset of variable intervals with the highest classification rate was considered as the optimal variable intervals. A synthetic data set mimicking the near infrared (NIR) spectra of tissue samples was applied to evaluate the performance of the DPSO-SVM. For the synthetic data, the classification rates were 74.9 ± 0.9% and 100% for the full spectral range and the six variable intervals selected by the DPSO-SVM. For the real endometrial tissue data, the entire spectral data gave an average accuracy of 69.5 ± 0.5%, while the 20 variable intervals gave 98.5 ± 0.3%. The results showed that the informative variables from the NIR spectra could be selected and high classification accuracy was achieved by the proposed approach. Copyright © 2013 Elsevier B.V. All rights reserved.
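The frequency-based screening step in the abstract above (repeat the optimizer, count how often each variable interval is chosen, carry the frequently chosen intervals forward) can be sketched roughly as follows. This is only an illustration under stated assumptions: the DPSO-SVM search itself is not implemented, the `runs` list is hypothetical data standing in for its repeated outputs, and the 0.5 threshold and function names are invented for the example rather than taken from the paper.

```python
from collections import Counter


def selection_frequencies(runs, n_intervals):
    """Fraction of runs in which each interval index was selected."""
    counts = Counter(i for run in runs for i in set(run))
    return [counts[i] / len(runs) for i in range(n_intervals)]


def screen_intervals(runs, n_intervals, threshold=0.5):
    """Keep interval indices whose selection frequency meets the threshold."""
    freqs = selection_frequencies(runs, n_intervals)
    return [i for i, f in enumerate(freqs) if f >= threshold]


# Hypothetical output of 4 repeated optimizer runs over 5 candidate intervals
# (the paper repeats the search 50 times per screening stage).
runs = [[0, 2], [2, 3], [0, 2, 4], [2]]
freqs = selection_frequencies(runs, n_intervals=5)
kept = screen_intervals(runs, n_intervals=5, threshold=0.5)
```

In a multi-stage version, the intervals in `kept` would become the candidate pool for the next round of repeated searches.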
Bayesian geostatistical modeling of leishmaniasis incidence in Brazil.
Directory of Open Access Journals (Sweden)
Dimitrios-Alexios Karagiannis-Voules
BACKGROUND: Leishmaniasis is endemic in 98 countries with an estimated 350 million people at risk and approximately 2 million cases annually. Brazil is one of the most severely affected countries. METHODOLOGY: We applied Bayesian geostatistical negative binomial models to analyze reported incidence data of cutaneous and visceral leishmaniasis in Brazil covering a 10-year period (2001-2010). Particular emphasis was placed on spatial and temporal patterns. The models were fitted using integrated nested Laplace approximations to perform fast approximate Bayesian inference. Bayesian variable selection was employed to determine the most important climatic, environmental, and socioeconomic predictors of cutaneous and visceral leishmaniasis. PRINCIPAL FINDINGS: For both types of leishmaniasis, precipitation and socioeconomic proxies were identified as important risk factors. The predicted numbers of cases in 2010 were 30,189 (standard deviation [SD]: 7,676) for cutaneous leishmaniasis and 4,889 (SD: 288) for visceral leishmaniasis. Our risk maps predicted the highest numbers of infected people in the states of Minas Gerais and Pará for visceral and cutaneous leishmaniasis, respectively. CONCLUSIONS/SIGNIFICANCE: Our spatially explicit, high-resolution incidence maps identified priority areas where leishmaniasis control efforts should be targeted with the ultimate goal to reduce disease incidence.
DEFF Research Database (Denmark)
Christensen, Lars P.B.; Larsen, Jan
2006-01-01
A general Variational Bayesian framework for iterative data and parameter estimation for coherent detection is introduced as a generalization of the EM-algorithm. Explicit solutions are given for MIMO channel estimation with Gaussian prior and noise covariance estimation with inverse-Wishart prior. Simulation of a GSM-like system provides empirical proof that the VBEM-algorithm is able to provide better performance than the EM-algorithm. However, if the posterior distribution is highly peaked, the VBEM-algorithm approaches the EM-algorithm and the gain disappears. The potential gain is therefore...
Directory of Open Access Journals (Sweden)
Yong-Hong Zhang
2015-05-01
Assessing the human placental barrier permeability of drugs is very important to guarantee drug safety during pregnancy. The quantitative structure–activity relationship (QSAR) method was used as an effective assessing tool for the placental transfer study of drugs, while in vitro human placental perfusion is the most widely used method. In this study, the partial least squares (PLS) variable selection and modeling procedure was used to pick out optimal descriptors from a pool of 620 descriptors of 65 compounds and to simultaneously develop a QSAR model between the descriptors and the placental barrier permeability expressed by the clearance indices (CI). The model was subjected to internal validation by cross-validation and y-randomization and to external validation by predicting CI values of 19 compounds. It was shown that the model developed is robust and has a good predictive potential (r2 = 0.9064, RMSE = 0.09, q2 = 0.7323, rp2 = 0.7656, RMSP = 0.14). The mechanistic interpretation of the final model was given by the high variable importance in projection values of descriptors. Using the PLS procedure, we can rapidly and effectively select optimal descriptors and thus construct a model with good stability and predictability. This analysis can provide an effective tool for the high-throughput screening of the placental barrier permeability of drugs.
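For readers unfamiliar with the validation statistics quoted in the abstract above, the sketch below shows one common way to compute a fitted r2 and RMSE and a leave-one-out cross-validated q2. It rests on several assumptions: a single-descriptor least-squares line stands in for the PLS model, the toy `x` and `y` values are invented, and the abstract does not state which cross-validation scheme or q2 definition the authors used.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor (stand-in for PLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_rmse(y, y_hat):
    """Coefficient of determination and root-mean-square error of the fit."""
    n = len(y)
    my = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / n)


def loo_q2(x, y):
    """Leave-one-out q2: 1 - PRESS / total sum of squares."""
    n = len(y)
    my = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict the held-out value.
        slope, icpt = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + icpt)) ** 2
    return 1.0 - press / sum((a - my) ** 2 for a in y)


# Invented toy data: one descriptor and a response for six compounds.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
slope, icpt = fit_line(x, y)
r2, rmse = r2_rmse(y, [slope * xi + icpt for xi in x])
q2 = loo_q2(x, y)
```

For a least-squares fit, q2 computed this way cannot exceed r2, which is why a large gap between the two is usually read as a sign of overfitting.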
Bayesian inference with information content model check for Langevin equations
DEFF Research Database (Denmark)
Krog, Jens F. C.; Lomholt, Michael Andersen
2017-01-01
The Bayesian data analysis framework has been proven to be a systematic and effective method of parameter inference and model selection for stochastic processes. In this work we introduce an information content model check which may serve as a goodness-of-fit, like the chi-square procedure, to complement conventional Bayesian analysis. We demonstrate this extended Bayesian framework on a system of Langevin equations, where coordinate dependent mobilities and measurement noise hinder the normal mean squared displacement approach.
2012-01-01
tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions: The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944
Directory of Open Access Journals (Sweden)
Adrion Christine
2012-09-01
provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions: The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Jiang, Yuan; He, Yunxiao
2015-01-01
LASSO is a popular statistical tool often used in conjunction with generalized linear models that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely, prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding in the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study. PMID:27217599
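The idea in the abstract above, a LASSO criterion with an extra term measuring disagreement between the model and prior information, can be illustrated for linear regression with the sketch below. Everything specific here is an assumption made for illustration: the discrepancy is taken to be the squared error between the fit and prior-based predictions `y_prior`, weighted by a hypothetical parameter `eta`, and the solver is plain coordinate descent rather than the LARS-like path algorithm the abstract mentions. Under that assumed criterion, (1/2n)||y - Xb||^2 + (eta/2n)||y_prior - Xb||^2 + lam*||b||_1, the minimizer equals an ordinary LASSO fit to a blended response with the penalty rescaled by 1/(1 + eta).

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            zj = 0.0   # squared norm of column j
            for i in range(n):
                pred_minus_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_minus_j)
                zj += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (zj / n)
    return b


def plasso_linear(X, y, y_prior, lam, eta):
    """Assumed prior-LASSO criterion, solved as a LASSO on a blended response."""
    blended = [(yi + eta * pi) / (1.0 + eta) for yi, pi in zip(y, y_prior)]
    return lasso_cd(X, blended, lam / (1.0 + eta))


# Invented toy data: the second predictor has a weak signal in y,
# but the (hypothetical) prior predictions suggest it matters.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
y_prior = [2.0, 1.0, 2.0, 1.0]
b_lasso = lasso_cd(X, y, lam=0.2)
b_prior = plasso_linear(X, y, y_prior, lam=0.2, eta=1.0)
```

With `eta = 0` the prior term vanishes and the fit reduces to the ordinary LASSO; larger `eta` pulls the solution toward what the prior predictions imply, here keeping the second predictor that the plain LASSO drops.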
Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis
Directory of Open Access Journals (Sweden)
Ueki Masao
2012-05-01
Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, involves difficulty in setting an overall p-value due to the complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. The number of SNP-SNP pairs far exceeds the sample size, the so-called large p small n problem, which precludes simultaneous analysis using multiple regression. A method that overcomes the above issues is thus needed. Results: We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) to handle the enormous number of SNP-SNP interactions appropriately by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods and a subsequent variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
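The screening idea behind SIS, as described in the abstract above, is to score every candidate predictor marginally and keep only the top-ranked ones for later modeling. The sketch below is a simplified illustration, not the EPISIS procedure: it assumes product coding of genotype pairs and absolute marginal correlation as the ranking statistic, whereas the abstract refers to dummy coding and logistic regression, and the genotype data are invented.

```python
from itertools import combinations
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def sis_screen(columns, y, d):
    """Indices of the d predictors with the largest marginal |correlation|."""
    scores = [abs_corr(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Invented genotypes (0/1/2 minor-allele counts) for 3 SNPs in 8 subjects.
snps = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [0, 0, 1, 1, 2, 2, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
# Case status driven by the SNP0 x SNP1 product in this toy example.
y = [1 if a * b >= 2 else 0 for a, b in zip(snps[0], snps[1])]

# One candidate predictor per SNP pair (product coding is an assumption).
pairs = list(combinations(range(len(snps)), 2))
columns = [[a * b for a, b in zip(snps[i], snps[j])] for i, j in pairs]
top_pairs = [pairs[k] for k in sis_screen(columns, y, d=1)]
```

Because each pair is scored independently, the loop over pairs parallelizes trivially, which is consistent with the GPU implementation the abstract describes.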
Villaescusa-Martínez, Víctor; Sáez-Villar, Lorena; Martínez-Domínguez, María Luisa
2014-01-01
To determine the general and specific conditions required in the official announcements of the different autonomous health services for the provision of vacant nursing positions as statutory nursing staff. A documentary review of the latest public employment offers (PEO) for statutory nursing staff of the 17 Spanish autonomous communities (AACC) was carried out, between June 2007 and August 2012. The variables related to the announcements method, the general requirements and the criteria used to evaluate the applicants, were reviewed. All AACC use the public competition as the method for selecting applicants. The general requirements are uniform in the 17 official announcements. The most commonly used system today in the public competition is the multiple-choice test (58%), being necessary to obtain at least 50% of the marks to pass in all of them. At the public competition stage, the undergraduate, specialized and continuous education; professional experience, scientific, teaching activities and other merits are evaluated differently. The knowledge of the regional language is present in official announcements. The weight of the parts involved in the process is variable, with 50-50% and 60-40% being the most commonly used. The general requirements that the applicants should meet are homogeneous, as well as the types of PEO. The competition processes of the AACC are very different from each other, and there is a great variability of criteria in the selection process based on the evaluation of the suitable merits, and the importance that each stage has on the competition. Copyright © 2013 Elsevier España, S.L. All rights reserved.
A default Bayesian hypothesis test for mediation.
Nuijten, Michèle B; Wetzels, Ruud; Matzke, Dora; Dolan, Conor V; Wagenmakers, Eric-Jan
2015-03-01
In order to quantify the relationship between multiple variables, researchers often carry out a mediation analysis. In such an analysis, a mediator (e.g., knowledge of a healthy diet) transmits the effect from an independent variable (e.g., classroom instruction on a healthy diet) to a dependent variable (e.g., consumption of fruits and vegetables). Almost all mediation analyses in psychology use frequentist estimation and hypothesis-testing techniques. A recent exception is Yuan and MacKinnon (Psychological Methods, 14, 301-322, 2009), who outlined a Bayesian parameter estimation procedure for mediation analysis. Here we complete the Bayesian alternative to frequentist mediation analysis by specifying a default Bayesian hypothesis test based on the Jeffreys-Zellner-Siow approach. We further extend this default Bayesian test by allowing a comparison to directional or one-sided alternatives, using Markov chain Monte Carlo techniques implemented in JAGS. All Bayesian tests are implemented in the R package BayesMed (Nuijten, Wetzels, Matzke, Dolan, & Wagenmakers, 2014).
Multi-level Bayesian analyses for single- and multi-vehicle freeway crashes.
Yu, Rongjie; Abdel-Aty, Mohamed
2013-09-01
This study presents multi-level analyses for single- and multi-vehicle crashes on a mountainous freeway. Data from a 15-mile mountainous freeway section on I-70 were investigated. Both aggregate and disaggregate models for the two crash conditions were developed. Five years of crash data were used in the aggregate investigation, while the disaggregate models utilized one year of crash data along with real-time traffic and weather data. For the aggregate analyses, safety performance functions were developed for the purpose of revealing the contributing factors for each crash type. Two methodologies, a Bayesian bivariate Poisson-lognormal model and a Bayesian hierarchical Poisson model with correlated random effects, were estimated to simultaneously analyze the two crash conditions with consideration of possible correlations. Except for the factors related to geometric characteristics, two exposure parameters (annual average daily traffic and segment length) were included. Two different sets of significant explanatory and exposure variables were identified for the single-vehicle (SV) and multi-vehicle (MV) crashes. It was found that the Bayesian bivariate Poisson-lognormal model is superior to the Bayesian hierarchical Poisson model, the former with a substantially lower DIC and more significant variables. In addition to the aggregate analyses, microscopic real-time crash risk evaluation models were developed for the two crash conditions. Multi-level Bayesian logistic regression models were estimated with the random parameters accounting for seasonal variations, crash-unit-level diversity and segment-level random effects capturing unobserved heterogeneity caused by the geometric characteristics. The model results indicate that the effects of the selected variables on crash occurrence vary across seasons and crash units; and that geometric characteristic variables contribute to the segment variations: the more unobserved heterogeneity have been accounted, the better
Highly variable recombinational landscape modulates efficacy of natural selection in birds.
Gossmann, Toni I; Santure, Anna W; Sheldon, Ben C; Slate, Jon; Zeng, Kai
2014-08-01
Determining the rate of protein evolution and identifying the causes of its variation across the genome are powerful ways to understand forces that are important for genome evolution. By using a multitissue transcriptome data set from great tit (Parus major), we analyzed patterns of molecular evolution between two passerine birds, great tit and zebra finch (Taeniopygia guttata), using the chicken genome (Gallus gallus) as an outgroup. We investigated whether a special feature of avian genomes, the highly variable recombinational landscape, modulates the efficacy of natural selection through the effects of Hill-Robertson interference, which predicts that selection should be more effective in removing deleterious mutations and incorporating beneficial mutations in high-recombination regions than in low-recombination regions. In agreement with these predictions, genes located in low-recombination regions tend to have a high proportion of neutrally evolving sites and relaxed selective constraint on sites subject to purifying selection, whereas genes that show strong support for past episodes of positive selection appear disproportionally in high-recombination regions. There is also evidence that genes located in high-recombination regions tend to have higher gene expression specificity than those located in low-recombination regions. Furthermore, more compact genes (i.e., those with fewer/shorter introns or shorter proteins) evolve faster than less compact ones. In sum, our results demonstrate that transcriptome sequencing is a powerful method to answer fundamental questions about genome evolution in nonmodel organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.
Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui
2017-07-15
New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier
Bayesian Test of Significance for Conditional Independence: The Multinomial Model
Directory of Open Access Journals (Sweden)
Pablo de Morais Andrade
2014-03-01
Conditional independence tests have received special attention lately in machine learning and computational intelligence related literature as an important indicator of the relationship among the variables used by their models. In the field of probabilistic graphical models, which includes Bayesian network models, conditional independence tests are especially important for the task of learning the probabilistic graphical model structure from data. In this paper, we propose the full Bayesian significance test for tests of conditional independence for discrete datasets. The full Bayesian significance test is a powerful Bayesian test for precise hypotheses, as an alternative to the frequentist's significance tests (characterized by the calculation of the p-value).
Kleibergen, F.R.; Kleijn, R.; Paap, R.
2000-01-01
We propose a novel Bayesian test under a (noninformative) Jeffreys' prior specification. We check whether the fixed scalar value of the so-called Bayesian Score Statistic (BSS) under the null hypothesis is a plausible realization from its known and standardized distribution under the alternative. Unlike
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
Predicting land cover using GIS, Bayesian and evolutionary algorithm methods.
Aitkenhead, M J; Aalders, I H
2009-01-01
Modelling land cover change from existing land cover maps is a vital requirement for anyone wishing to understand how the landscape may change in the future. In order to test any land cover change model, existing data must be used. However, often it is not known which data should be applied to the problem, or whether relationships exist within and between complex datasets. Here we have developed and tested a model that applied evolutionary processes to Bayesian networks. The model was developed and tested on a dataset containing land cover information and environmental data, in order to show that decisions about which datasets should be used could be made automatically. Bayesian networks are amenable to evolutionary methods as they can be easily described using a binary string to which crossover and mutation operations can be applied. The method, developed to allow comparison with standard Bayesian network development software, was proved capable of carrying out a rapid and effective search of the space of possible networks in order to find an optimal or near-optimal solution for the selection of datasets that have causal links with one another. Comparison of land cover mapping in the North-East of Scotland was made with a commercial Bayesian software package, with the evolutionary method being shown to provide greater flexibility in its ability to adapt to incorporate/utilise available evidence/knowledge and develop effective and accurate network structures, at the cost of requiring additional computer programming skills. The dataset used to develop the models included GIS-based data taken from the Land Cover for Scotland 1988 (LCS88), Land Capability for Forestry (LCF), Land Capability for Agriculture (LCA), the soil map of Scotland and additional climatic variables.
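The binary-string encoding mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each bit marks whether a candidate dataset is included in the network; the function names, the single-point crossover and the per-bit mutation rate are illustrative choices, not details taken from the paper.

```python
import random

def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two bit strings at `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

# Hypothetical example: 4 candidate datasets, 1 = dataset included.
rng = random.Random(0)
a, b = crossover([1, 1, 1, 1], [0, 0, 0, 0], point=2)
mutated = mutate(a, rate=0.1, rng=rng)
```

A fitness function scoring each candidate network would drive selection between generations; it is omitted here because the abstract does not specify one.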
Estrada-Peña, Agustín; Estrada-Sánchez, Adrián; Estrada-Sánchez, David; de la Fuente, José
2013-09-26
Modelling the environmental niche and spatial distribution of pathogen-transmitting arthropods involves various quality and methodological concerns related to using climate data to capture the environmental niche. This study tested the potential of MODIS remotely sensed and interpolated gridded covariates to estimate the climate niche of the medically important ticks Ixodes ricinus and Hyalomma marginatum. We also assessed model inflation resulting from spatial autocorrelation (SA) and collinearity (CO) of covariates used as time series of data (monthly values of variables), principal components analysis (PCA), and a discrete Fourier transformation. Performance of the models was measured using area under the curve (AUC), autocorrelation by Moran's I, and collinearity by the variance inflation factor (VIF). The covariate spatial resolution slightly affected the final AUC. Consistently, models for H. marginatum performed better than models for I. ricinus, likely because of a species-derived rather than covariate effect because the former occupies a more limited niche. Monthly series of interpolated climate always better captured the climate niche of the ticks, but the SA was around 2 times higher and the maximum VIF between covariates around 30 times higher in interpolated than in MODIS-derived covariates. Interpolated or remotely sensed monthly series of covariates always had higher SA and CO than their transformations by PCA or Fourier. Regarding the effects of background point selection on AUC, we found that selection based on a set of rules for the distance to the core distribution and the heterogeneity of the landscape influenced model outcomes. The best selection relied on a random selection of points as close as possible to the target organism area of distribution, but effects are variable according to the species modelled. Testing for effects of SA and CO is necessary before incorporating these covariates into algorithms building a climate envelope. Results
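As a rough illustration of the spatial autocorrelation measure named in the abstract above, the following sketch computes Moran's I from its textbook definition. The chain-shaped neighbour matrix is an assumption made only to keep the example small; the study's actual spatial weights are not given in the abstract.

```python
def morans_i(values, weights):
    """Moran's I: (N / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,
    where d_i is the deviation of value i from the mean and W is the
    sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n * cross) / (w_total * denom)

def chain_weights(n):
    """Hypothetical weights: sites on a line, adjacent sites are neighbours."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

smooth = morans_i([1, 2, 3, 4, 5], chain_weights(5))    # positive autocorrelation
alternating = morans_i([1, -1, 1, -1], chain_weights(4))  # negative autocorrelation
```

Values near +1 indicate that neighbouring sites carry similar covariate values, which is the situation the abstract warns can inflate model performance.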
Hydrological flow predictions in ungauged and sparsely gauged watersheds use regionalization or classification of hydrologically similar watersheds to develop empirical relationships between hydrologic, climatic, and watershed variables. The watershed classifications may be based...
International Nuclear Information System (INIS)
Soomro, A.M.; Seehar, G.M.; Bhanger, M.I.
2003-01-01
Pesticide-induced changes were assessed in thirty-two subjects who had attempted suicide. Farmers and their families were the group most frequently recorded as attempting suicide. The values obtained from seven biochemical variables of hospitalized subjects (average age 29 years) were compared to those of the same number of age-matched normal volunteers. The results revealed major differences in the mean values of the selected parameters. The mean differences calculated for alkaline phosphatase (178.7 mu/l), bilirubin (7.5 mg/dl), GPT (59.2 mu/l) and glucose (38.6 mg/dl) show that these values were higher than in the controls, which indicates hepatotoxicity induced by the pesticides in individuals attempting suicide. Increases in serum creatinine and urea indicated renal malfunction that could be linked with pesticide-induced nephrotoxicity among them. (author)
International Nuclear Information System (INIS)
Akroyd, Duane; Legg, Jeff; Jackowski, Melissa B.; Adams, Robert D.
2009-01-01
The purpose of this study was to examine the impact of selected organizational factors and the leadership behavior of supervisors on radiation therapists' commitment to their organizations. The population for this study consists of all full-time clinical radiation therapists registered by the American Registry of Radiologic Technologists (ARRT) in the United States. A random sample of 800 radiation therapists was obtained from the ARRT for this study. Questionnaires were mailed to all participants and measured organizational variables, a managerial leadership variable, and three components of organizational commitment (affective, continuance and normative). It was determined that organizational support and leadership behavior of supervisors each had a significant and positive effect on normative and affective commitment of radiation therapists, and each of the models predicted over 40% of the variance in radiation therapists' organizational commitment. This study examined radiation therapists' commitment to their organizations and found that affective (emotional attachment to the organization) and normative (feelings of obligation to the organization) commitments were more important than continuance commitment (awareness of the costs of leaving the organization). This study can help radiation oncology administrators and physicians to understand the values their radiation therapy employees hold that are predictive of their commitment to the organization. A crucial result of the study is the importance of the perceived support of the organization and the leadership skills of managers/supervisors on radiation therapists' commitment to the organization.
Eckert, Andrew J; Shahi, Hurshbir; Datwyler, Shannon L; Neale, David B
2012-08-01
Plant populations arrayed across sharp environmental gradients are ideal systems for identifying the genetic basis of ecologically relevant phenotypes. A series of five uplifted marine terraces along the northern coast of California represents one such system where morphologically distinct populations of lodgepole pine (Pinus contorta) are distributed across sharp soil gradients ranging from fertile soils near the coast to podzolic soils ca. 5 km inland. A total of 92 trees was sampled across four coastal marine terraces (N = 10-46 trees/terrace) located in Mendocino County, California and sequenced for a set of 24 candidate genes for growth and responses to various soil chemistry variables. Statistical analyses relying on patterns of nucleotide diversity were employed to identify genes whose diversity patterns were inconsistent with three null models. Most genes displayed patterns of nucleotide diversity that were consistent with null models (N = 19) or with the presence of paralogs (N = 3). Two genes, however, were exceptional: an aluminum responsive ABC-transporter with F(ST) = 0.664 and an inorganic phosphate transporter characterized by divergent haplotypes segregating at intermediate frequencies in most populations. Spatially variable natural selection along gradients of aluminum and phosphate ion concentrations likely accounted for both outliers. These results shed light on some of the genetic components comprising the extended phenotype of this ecosystem, as well as highlight ecotones as fruitful study systems for the detection of adaptive genetic variants.
Milanez, Karla Danielle Tavares Melo; Araújo Nóbrega, Thiago César; Silva Nascimento, Danielle; Galvão, Roberto Kawakami Harrop; Pontes, Márcio José Coelho
2017-09-01
Multivariate models have been widely used in analytical problems involving quantitative and qualitative analyses. However, there are cases in which a model is not applicable to spectra of samples obtained under new experimental conditions or in an instrument not involved in the modeling step. A solution to this problem is the transfer of multivariate models, usually performed using standardization of the spectral responses or enhancement of the robustness of the model. This paper proposes two new criteria for selection of robust variables for classification transfer employing the successive projections algorithm (SPA). These variables are then used to build models based on linear discriminant analysis (LDA) with low sensitivity with respect to the differences between the responses of the instruments involved. For this purpose, transfer samples are included in the calculation of the cost for each subset of variables under consideration. The proposed methods are evaluated for two case studies involving identification of adulteration of extra virgin olive oil (EVOO) and hydrated ethyl alcohol fuel (HEAF) using UV-Vis and NIR spectroscopy, respectively. In both cases, similar or better classification transfer results (obtained for a test set measured on the secondary instrument) employing the two criteria were obtained in comparison with direct standardization (DS) and piecewise direct standardization (PDS). For the UV-Vis data, both proposed criteria achieved a correct classification rate (CCR) of 85%, while the best CCR obtained for the standardization methods was 81% for DS. For the NIR data, a CCR of 92.5% was obtained by both criteria as well as by DS. The results demonstrated the possibility of using either of the criteria proposed for building robust models as an alternative to the standardization of spectral responses for transfer of classification. Copyright © 2017 Elsevier B.V. All rights reserved.
Sipkens, Timothy A.; Hadwin, Paul J.; Grauer, Samuel J.; Daun, Kyle J.
2018-03-01
Competing theories have been proposed to account for how the latent heat of vaporization of liquid iron varies with temperature, but experimental confirmation remains elusive, particularly at high temperatures. We propose time-resolved laser-induced incandescence measurements on iron nanoparticles combined with Bayesian model plausibility, as a novel method for evaluating these relationships. Our approach scores the explanatory power of candidate models, accounting for parameter uncertainty, model complexity, measurement noise, and goodness-of-fit. The approach is first validated with simulated data and then applied to experimental data for iron nanoparticles in argon. Our results justify the use of Román's equation to account for the temperature dependence of the latent heat of vaporization of liquid iron.
Bayesian methods applied to GWAS.
Fernando, Rohan L; Garrick, Dorian
2013-01-01
Bayesian multiple-regression methods are being successfully used for genomic prediction and selection. These regression models simultaneously fit many more markers than the number of observations available for the analysis. Thus, Bayes' theorem is used to combine prior beliefs of marker effects, which are expressed in terms of prior distributions, with information from data for inference. Often, the analyses are too complex for closed-form solutions and Markov chain Monte Carlo (MCMC) sampling is used to draw inferences from posterior distributions. This chapter describes how these Bayesian multiple-regression analyses can be used for GWAS. In most GWAS, false positives are controlled by limiting the genome-wise error rate, which is the probability of one or more false-positive results, to a small value. As the number of tests in GWAS is very large, this results in very low power. Here we show how in Bayesian GWAS false positives can be controlled by limiting the proportion of false-positive results among all positives to some small value. The advantage of this approach is that the power of detecting associations is not inversely related to the number of markers.
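One common way to limit the expected proportion of false positives among reported associations is sketched below. It assumes each marker (or genomic window) already has a posterior probability of association, for instance estimated from MCMC samples; the function name, the threshold and the example probabilities are hypothetical and not taken from the chapter.

```python
def select_by_posterior_fdr(post_prob, alpha=0.05):
    """Report markers in order of decreasing posterior probability of
    association for as long as the average posterior probability of a
    false positive, (1 - p), among the reported markers stays <= alpha."""
    order = sorted(range(len(post_prob)),
                   key=lambda i: post_prob[i], reverse=True)
    selected = []
    expected_false = 0.0
    for i in order:
        candidate = expected_false + (1.0 - post_prob[i])
        if candidate / (len(selected) + 1) > alpha:
            break  # adding this marker would exceed the target proportion
        expected_false = candidate
        selected.append(i)
    return selected

# Hypothetical posterior probabilities of association for five markers.
probs = [0.99, 0.20, 0.97, 0.50, 0.95]
reported = select_by_posterior_fdr(probs, alpha=0.05)  # indices of reported markers
```

The stopping rule depends only on the posterior probabilities of the markers reported, not on how many markers were tested, which is the property the abstract emphasises.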
Progression of selective IgA deficiency to common variable immunodeficiency.
Aghamohammadi, Asghar; Mohammadi, Javad; Parvaneh, Nima; Rezaei, Nima; Moin, Mostafa; Espanol, Teresa; Hammarstrom, Lennart
2008-01-01
Selective IgA deficiency (IgAD) is the most common primary immunodeficiency in Caucasians. Although it is often asymptomatic, selected patients show an increased frequency of infections, allergies and autoimmune manifestations. Common variable immunodeficiency (CVID) is a primary antibody deficiency disease that shares many clinical features with IgAD. A common genetic basis for IgAD and CVID has been suggested based on their occurrence in members of the same family and the similarity of the underlying B cell defects. Progression from IgAD to CVID has also been reported in several cases. Here we present 4 patients with IgAD and autoimmune features who subsequently developed CVID. All symptomatic IgAD patients, especially those with associated IgG subclass deficiency or autoimmune features, should be monitored for evolution to CVID. Early diagnosis of this conversion and institution of immunoglobulin therapy is effective in preventing severe bacterial infections and pulmonary insufficiency. (c) 2008 S. Karger AG, Basel.
Energy Technology Data Exchange (ETDEWEB)
Thompson, William L. [Bonneville Power Administration, Portland, OR (US). Environment, Fish and Wildlife
2001-07-01
Monitoring population numbers is important for assessing trends and meeting various legislative mandates. However, sampling across time introduces a temporal aspect to survey design in addition to the spatial one. For instance, a sample that is initially representative may lose this attribute if there is a shift in numbers and/or spatial distribution in the underlying population that is not reflected in later sampled plots. Plot selection methods that account for this temporal variability will produce the best trend estimates. Consequently, I used simulation to compare bias and relative precision of estimates of population change among stratified and unstratified sampling designs based on permanent, temporary, and partial replacement plots under varying levels of spatial clustering, density, and temporal shifting of populations. Permanent plots produced more precise estimates of change than temporary plots across all factors. Further, permanent plots performed better than partial replacement plots except for high density (5 and 10 individuals per plot) and 25% - 50% shifts in the population. Stratified designs always produced less precise estimates of population change for all three plot selection methods, and often produced biased change estimates and greatly inflated variance estimates under sampling with partial replacement. Hence, stratification that remains fixed across time should be avoided when monitoring populations that are likely to exhibit large changes in numbers and/or spatial distribution during the study period. Key words: bias; change estimation; monitoring; permanent plots; relative precision; sampling with partial replacement; temporary plots.
Computational Neuropsychology and Bayesian Inference
Directory of Open Access Journals (Sweden)
Thomas Parr
2018-02-01
Computational theories of brain function have become very influential in neuroscience. They have facilitated the growth of formal approaches to disease, particularly in psychiatric research. In this paper, we provide a narrative review of the body of computational research addressing neuropsychological syndromes, and focus on those that employ Bayesian frameworks. Bayesian approaches to understanding brain function formulate perception and action as inferential processes. These inferences combine ‘prior’ beliefs with a generative (predictive) model to explain the causes of sensations. Under this view, neuropsychological deficits can be thought of as false inferences that arise due to aberrant prior beliefs (that are poor fits to the real world). This draws upon the notion of a Bayes optimal pathology – optimal inference with suboptimal priors – and provides a means for computational phenotyping. In principle, any given neuropsychological disorder could be characterized by the set of prior beliefs that would make a patient’s behavior appear Bayes optimal. We start with an overview of some key theoretical constructs and use these to motivate a form of computational neuropsychology that relates anatomical structures in the brain to the computations they perform. Throughout, we draw upon computational accounts of neuropsychological syndromes. These are selected to emphasize the key features of a Bayesian approach, and the possible types of pathological prior that may be present. They range from visual neglect through hallucinations to autism. Through these illustrative examples, we review the use of Bayesian approaches to understand the link between biology and computation that is at the heart of neuropsychology.
FCERI AND HISTAMINE METABOLISM GENE VARIABILITY IN SELECTIVE RESPONDERS TO NSAIDS
Directory of Open Access Journals (Sweden)
Gemma Amo
2016-09-01
The high-affinity IgE receptor (FcεRI) is a heterotetramer of three subunits: FcεRIα, FcεRIβ and FcεRIγ (αβγ2), encoded by three genes designated as FCER1A, FCER1B (MS4A2) and FCER1G, respectively. Recent evidence points to FCERI gene variability as a relevant factor in the risk of developing allergic diseases. Because FcεRI plays a key role in the events downstream of the triggering factors in immunological response, we hypothesized that FCERI gene variants might be related with the risk of, or with the clinical response to, selective (IgE-mediated) non-steroidal anti-inflammatory drug (NSAID) hypersensitivity. From a cohort of 314 patients suffering from selective hypersensitivity to metamizole, ibuprofen, diclofenac, paracetamol, acetylsalicylic acid (ASA), propifenazone, naproxen, ketoprofen, dexketoprofen, etofenamate, aceclofenac, etoricoxib, dexibuprofen, indomethacin, oxyphenylbutazone or piroxicam, and 585 unrelated healthy controls that tolerated these NSAIDs, we analyzed the putative effects of the FCERI SNPs FCER1A rs2494262, rs2427837 and rs2251746; FCER1B rs1441586, rs569108 and rs512555; FCER1G rs11587213, rs2070901 and rs11421. Furthermore, in order to identify additional genetic markers which might be associated with the risk of developing selective NSAID hypersensitivity, or which may modify the putative association of FCERI gene variations with risk, we analyzed polymorphisms known to affect histamine synthesis or metabolism, such as rs17740607, rs2073440, rs1801105, rs2052129, rs10156191, rs1049742 and rs1049793 in the HDC, HNMT and DAO genes. No major genetic associations with risk or with clinical presentation, and no gene-gene interactions or gene-phenotype interactions (including age, gender, IgE concentration, antecedents of atopy, culprit drug or clinical presentation), were identified in patients. However, logistic regression analyses indicated that the presence of antecedents of atopy and the DAO SNP rs2052129 (GG
Bayesian data analysis for newcomers.
Kruschke, John K; Liddell, Torrin M
2018-02-01
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
NonpModelCheck: An R Package for Nonparametric Lack-of-Fit Testing and Variable Selection
Directory of Open Access Journals (Sweden)
Adriano Zanin Zambom
2017-05-01
We describe the R package NonpModelCheck for hypothesis testing and variable selection in nonparametric regression. This package implements functions to perform hypothesis testing for the significance of a predictor or a group of predictors in a fully nonparametric heteroscedastic regression model using high-dimensional one-way ANOVA. Based on the p values from the test of each covariate, three different algorithms allow the user to perform variable selection using false discovery rate corrections. A function for classical local polynomial regression is implemented for the multivariate context, where the degree of the polynomial can be as large as needed and bandwidth selection strategies are built in.
Statistics: a Bayesian perspective
National Research Council Canada - National Science Library
Berry, Donald A
1996-01-01
...: it is the only introductory textbook based on Bayesian ideas, it combines concepts and methods, it presents statistics as a means of integrating data into the significant process, it develops ideas...
Granade, Christopher; Combes, Joshua; Cory, D. G.
2016-03-01
In recent years, Bayesian methods have been proposed as a solution to a wide range of issues in quantum state and process tomography. State-of-the-art Bayesian tomography solutions suffer from three problems: numerical intractability, a lack of informative prior distributions, and an inability to track time-dependent processes. Here, we address all three problems. First, we use modern statistical methods, as pioneered by Huszár and Houlsby (2012 Phys. Rev. A 85 052120) and by Ferrie (2014 New J. Phys. 16 093035), to make Bayesian tomography numerically tractable. Our approach allows for practical computation of Bayesian point and region estimators for quantum states and channels. Second, we propose the first priors on quantum states and channels that allow for including useful experimental insight. Finally, we develop a method that allows tracking of time-dependent states and estimates the drift and diffusion processes affecting a state. We provide source code and animated visual examples for our methods.
Bayesian analysis of factors associated with fibromyalgia syndrome subjects
Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie
2015-01-01
Factors contributing to movement-related fear were assessed by Russek, et al. 2014 for subjects with Fibromyalgia (FM) based on the collected data by a national internet survey of community-based individuals. The study focused on the variables, Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), Pain, work status and physical activity dependent from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits same data with a Bayesian analysis where appropriate priors were introduced for variables selected in the Russek's paper.
Attention in a bayesian framework
DEFF Research Database (Denmark)
Whiteley, Louise Emma; Sahani, Maneesh
2012-01-01
, and include both selective phenomena, where attention is invoked by cues that point to particular stimuli, and integrative phenomena, where attention is invoked dynamically by endogenous processing. However, most previous Bayesian accounts of attention have focused on describing relatively simple experimental...... settings, where cues shape expectations about a small number of upcoming stimuli and thus convey "prior" information about clearly defined objects. While operationally consistent with the experiments it seeks to describe, this view of attention as prior seems to miss many essential elements of both its...
Automated Photometry, Period Analysis and Flare-up Constraints for Selected Mira Variable Stars
Mais, D. E.; Stencel, R. E.; Richards, D.
2005-05-01
During the course of the past two years, 108 selected Mira-type program stars have been monitored to address potential flare-up episodes. These include 34 M-type, 17 S-type and 57 C-type Miras. This paper will describe the more than 140,000 magnitude determinations that have been obtained, many closely spaced in time, which are being used to further constrain the potential occurrences of flare-up events. Random reports in the literature suggest that some Mira variables may go through flare-up stages, which result in brightening on the order of several tenths of a magnitude or more, and may last hours to days in length. Very little is known about these events and their frequency; indeed, it is not clear whether these events are real or instrumental phenomena. The light curves of many of the program stars show a Cepheid-like bump phenomenon, usually on the ascending part of the light curve. In general, these bumps appear in longer-period Miras (>350 days), as pointed out by Melikian in 1999. Bumps are not obvious or easily seen in visual data records, although slope changes during the rising phase are seen in some cases. In order to address the reality of these events, we established an automated acquisition/analysis of a group of 108 Mira variables [M (oxygen), S and C types] in order to obtain the densest possible coverage of the periods, to better constrain the character and frequency of flare-ups. Telescope control scripts were put in place along with real-time analysis. This allowed for unattended acquisition of data on every clear night, all night long, in the V, R and I photometric bands. In addition, during the course of most nights, multiple determinations are often obtained for a given star. We are grateful to the estate of William Herschel Womble for partial support of these efforts.
Variational Bayesian Filtering
Czech Academy of Sciences Publication Activity Database
Šmídl, Václav; Quinn, A.
2008-01-01
Vol. 56, No. 10 (2008), pp. 5020-5030 ISSN 1053-587X R&D Projects: GA MŠk 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords: Bayesian filtering * particle filtering * Variational Bayes Subject RIV: BC - Control Systems Theory Impact factor: 2.335, year: 2008 http://library.utia.cas.cz/separaty/2008/AS/smidl-variational bayesian filtering.pdf
Bayesian Networks An Introduction
Koski, Timo
2009-01-01
Bayesian Networks: An Introduction provides a self-contained introduction to the theory and applications of Bayesian networks, a topic of interest and importance for statisticians, computer scientists and those involved in modelling complex data sets. The material has been extensively tested in classroom teaching and assumes a basic knowledge of probability, statistics and mathematics. All notions are carefully explained and feature exercises throughout. Features include: An introduction to Dirichlet Distribution, Exponential Families and their applications; A detailed description of learni
Prediction of road accidents: A Bayesian hierarchical approach
DEFF Research Database (Denmark)
Deublein, Markus; Schubert, Matthias; Adey, Bryan T.
2013-01-01
-lognormal regression analysis taking into account correlations amongst multiple dependent model response variables and effects of discrete accident count data e.g. over-dispersion, and (3) Bayesian inference algorithms, which are applied by means of data mining techniques supported by Bayesian Probabilistic Networks...... in order to represent non-linearity between risk indicating and model response variables, as well as different types of uncertainties which might be present in the development of the specific models. Prior Bayesian Probabilistic Networks are first established by means of multivariate regression analysis...
Selection of entropy-measure parameters for knowledge discovery in heart rate variability data.
Mayer, Christopher C; Bachler, Martin; Hörtenhuber, Matthias; Stocker, Christof; Holzinger, Andreas; Wassertheurer, Siegfried
2014-01-01
Heart rate variability is the variation of the time interval between consecutive heartbeats. Entropy is a commonly used tool to describe the regularity of data sets. Entropy functions are defined using multiple parameters, the selection of which is controversial and depends on the intended purpose. This study describes the results of tests conducted to support parameter selection, towards the goal of enabling further biomarker discovery. This study deals with approximate, sample, fuzzy, and fuzzy measure entropies. All data were obtained from PhysioNet, a free-access, on-line archive of physiological signals, and represent various medical conditions. Five tests were defined and conducted to examine the influence of: varying the threshold value r (as multiples of the sample standard deviation σ, or the entropy-maximizing rChon), the data length N, the weighting factors n for fuzzy and fuzzy measure entropies, and the thresholds rF and rL for fuzzy measure entropy. The results were tested for normality using Lilliefors' composite goodness-of-fit test. Consequently, the p-value was calculated with either a two sample t-test or a Wilcoxon rank sum test. The first test shows a cross-over of entropy values with regard to a change of r. Thus, a clear statement that a higher entropy corresponds to a high irregularity is not possible, but is rather an indicator of differences in regularity. N should be at least 200 data points for r = 0.2 σ and should even exceed a length of 1000 for r = rChon. The results for the weighting parameters n for the fuzzy membership function show different behavior when coupled with different r values, therefore the weighting parameters have been chosen independently for the different threshold values. The tests concerning rF and rL showed that there is no optimal choice, but r = rF = rL is reasonable with r = rChon or r = 0.2σ. Some of the tests showed a dependency of the test significance on the data at hand. Nevertheless, as the medical
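To make the roles of the threshold r and the data length N in the abstract above concrete, here is a minimal sketch of sample entropy under its common textbook definition. The defaults m = 2 and r = 0.2 times the sample standard deviation are assumptions for illustration; the other entropy variants and the rChon threshold from the study are not covered.

```python
import math
import statistics

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r): -ln(A / B), where B counts pairs of
    length-m templates within tolerance r (Chebyshev distance) and A counts
    the same pairs extended to length m + 1. Self-matches are excluded."""
    n = len(x)
    r = r_factor * statistics.stdev(x)  # tolerance as a multiple of the sample SD

    def count_matches(length):
        count = 0
        # Use the same n - m template start points for both lengths.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # too few matches: series too short or r too small
    return -math.log(a / b)

# A perfectly periodic series is fully regular, so its sample entropy is zero.
regular = sample_entropy([0, 1] * 20)
```

The infinite result for empty match counts is one reason short recordings are a problem: with few data points and a small r, there may be no template matches at all.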
Directory of Open Access Journals (Sweden)
Lianqing Zhu
2018-01-01
In order to improve the classification accuracy of Chinese Salvia miltiorrhiza using near-infrared spectroscopy, a novel local variable selection strategy is proposed. Combining the strengths of the local algorithm and interval partial least squares, the spectral data are first divided into several pairs of classes in the sample direction and equidistant subintervals in the variable direction. Then, a local classification model is built, and the most suitable spectral region is selected based on a new evaluation criterion considering both classification error rate and best predictive ability under the leave-one-out cross validation scheme for each pair of classes. Finally, each observation is assigned to a class according to the statistical analysis of classification results of the local classification model built on selected variables. The performance of the proposed method was demonstrated through near-infrared spectra of cultivated or wild Salvia miltiorrhiza collected from 8 geographical origins in 5 provinces of China. For comparison, soft independent modelling of class analogy and partial least squares discriminant analysis methods are, respectively, employed as the classification model. Experimental results showed that classification performance of the classification model with local variable selection was obviously better than that without variable selection.
Soares, Sófacles Figueredo Carreiro; Galvão, Roberto Kawakami Harrop; Araújo, Mário César Ugulino; da Silva, Edvan Cirino; Pereira, Claudete Fernandes; de Andrade, Stéfani Iury Evangelista; Leite, Flaviano Carvalho
2011-03-09
This work proposes a modification to the successive projections algorithm (SPA) aimed at selecting spectral variables for multiple linear regression (MLR) in the presence of unknown interferents not included in the calibration data set. The modified algorithm favours the selection of variables in which the effect of the interferent is less pronounced. The proposed procedure can be regarded as an adaptive modelling technique, because the spectral features of the samples to be analyzed are considered in the variable selection process. The advantages of this new approach are demonstrated in two analytical problems, namely (1) ultraviolet-visible spectrometric determination of tartrazine, allure red and sunset yellow in aqueous solutions under the interference of erythrosine, and (2) near-infrared spectrometric determination of ethanol in gasoline under the interference of toluene. In these case studies, the performance of conventional MLR-SPA models is substantially degraded by the presence of the interferent. This problem is circumvented by applying the proposed Adaptive MLR-SPA approach, which results in prediction errors smaller than those obtained by three other multivariate calibration techniques, namely stepwise regression, full-spectrum partial-least-squares (PLS) and PLS with variables selected by a genetic algorithm. An inspection of the variable selection results reveals that the Adaptive approach successfully avoids spectral regions in which the interference is more intense. Copyright © 2011 Elsevier B.V. All rights reserved.
Modeling operational risks of the nuclear industry with Bayesian networks
International Nuclear Information System (INIS)
Wieland, Patricia; Lustosa, Leonardo J.
2009-01-01
Basically, planning a new industrial plant requires information on the industrial management, regulations, site selection, definition of initial and planned capacity, and on the estimation of the potential demand. However, this is far from enough to assure the success of an industrial enterprise. Unexpected and extremely damaging events may occur that deviate from the original plan. The so-called operational risks are not only in the system, equipment, process or human (technical or managerial) failures. They are also in intentional events such as frauds and sabotage, or extreme events like terrorist attacks or radiological accidents and even in public reaction to perceived environmental or future generation impacts. For the nuclear industry, it is a challenge to identify and to assess the operational risks and their various sources. Early identification of operational risks can help in preparing contingency plans, or in deciding to delay an investment or the approval of a project that can, at an extreme, affect the public perception of nuclear energy. A major problem in modeling operational risk losses is the lack of internal data that are essential, for example, to apply the loss distribution approach. As an alternative, methods that consider qualitative and subjective information can be applied, for example, fuzzy logic, neural networks, system dynamics or Bayesian networks. An advantage of applying Bayesian networks to model operational risk is the possibility to include expert opinions and variables of interest, to structure the model via causal dependencies among these variables, and to specify subjective prior and conditional probability distributions at each step or network node. This paper suggests a classification of operational risks in industry and discusses the benefits and obstacles of the Bayesian networks approach to model those risks. (author)
r2VIM: A new variable selection method for random forests in genome-wide association studies.
Szymczak, Silke; Holzinger, Emily; Dasgupta, Abhijit; Malley, James D; Molloy, Anne M; Mills, James L; Brody, Lawrence C; Stambolian, Dwight; Bailey-Wilson, Joan E
2016-01-01
Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results.
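The selection rule described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `r2vim_select`, the `factor` threshold, and the toy importance scores are hypothetical, and the importance scores are taken as given rather than computed by a random forest.

```python
def r2vim_select(importances, factor=3.0):
    """Return indices of variables with large relative importance in every run.

    importances: list of runs; each run is a list of per-variable importance
    scores (assumed to come from separate random forest runs).
    factor: hypothetical threshold on the relative importance.
    """
    n_vars = len(importances[0])
    selected = set(range(n_vars))
    for run in importances:
        # Scale by the absolute value of the smallest observed importance.
        min_abs = abs(min(run))
        if min_abs == 0:
            raise ValueError("minimal importance is zero; relative importance undefined")
        relative = [value / min_abs for value in run]
        # Keep only variables that pass the threshold in this run as well.
        selected &= {j for j, r in enumerate(relative) if r >= factor}
    return sorted(selected)


# Toy scores for four variables over three runs (made-up numbers).
runs = [
    [0.50, -0.10, 0.05, 0.40],
    [0.45, 0.02, -0.10, 0.35],
    [0.60, -0.12, 0.00, 0.20],
]
print(r2vim_select(runs, factor=3.0))
```

Requiring a variable to pass in all runs, rather than on average, is what makes the rule conservative: a single run with a weak score is enough to drop it.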
Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies.
Savitsky, Terrance; Vannucci, Marina; Sha, Naijun
2011-02-01
This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors that have an a priori unknown form of possibly nonlinear associations to the response. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses. We also look into models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and achieve a computationally efficient and practical approach. We demonstrate performances on simulated and benchmark data sets.
Smith, T. M.; Kloesel, M. F.; Sudbrack, C. K.
2017-01-01
Powder-bed additive manufacturing processes use fine powders to build parts layer by layer. For selective laser melted (SLM) Alloy 718, the powders that are available off-the-shelf are in the 10-45 or 15-45 micron size range. A comprehensive investigation of sixteen powders from these typical ranges and two off-nominal-sized powders is underway to gain insight into the impact of feedstock on processing, durability and performance of 718 SLM space-flight hardware. This talk emphasizes an aspect of this work: the impact of powder variability on the microstructure and defects observed in the as-fabricated and full heated material, where lab-scale components were built using vendor recommended parameters. These typical powders exhibit variation in composition, percentage of fines, roughness, morphology and particle size distribution. How these differences relate to the melt-pool size, porosity, grain structure, precipitate distributions, and inclusion content will be presented and discussed in context of build quality and powder acceptance.
Farhadi, E; Nemati, S; Amirzargar, A A; Hirbod-Mobarakeh, A; Nabavi, M; Soltani, S; Mahdaviani, S A; Shahinpour, S; Arshi, S; Nikbin, B; Aghamohammadi, A; Rezaei, N
2014-01-01
Primary antibody deficiencies (PADs) are a heterogeneous group of disorders, characterised by increased susceptibility to recurrent bacterial infections. Common variable immunodeficiency (CVID) is the most important PAD from the clinical point of view and selective IgA deficiency (IgAD) is the most common PAD. However, the underlying gene defect in both is still unknown. As a recent study in Europe showed an association between a single nucleotide polymorphism (SNP) of the AICDA gene and PADs, this study was performed to evaluate such an association in Iranian patients. Fifty-eight patients with PAD, including 39 CVID and 19 IgAD, as well as 34 healthy volunteers, were enrolled in this study. Genotyping was done in all groups for an intronic SNP in AICDA (rs2580874), using a real-time PCR genotyping assay. The least frequent genotype of AICDA in IgAD patients was AA, seen in 10.5% of the patients, which was much lower than the 30.8% in CVID patients and 38.2% in the controls. However, these differences were not significant. The GG genotype was seen in 20.7% of the patients with PADs, compared to 8.8% of the controls, again without any significant difference. There was no significant association between the previously reported genetic variant of the AICDA gene and the development of CVID or IgAD, but further multi-center studies are needed. Copyright © 2013 SEICAP. Published by Elsevier Espana. All rights reserved.
Germino, Matthew J.
2012-01-01
Big sagebrush (Artemisia tridentata) communities dominate a large fraction of the United States and provide critical habitat for a number of wildlife species of concern. Loss of big sagebrush due to fire followed by poor restoration success continues to reduce ecological potential of this ecosystem type, particularly in the Great Basin. Choice of appropriate seed sources for restoration efforts is currently unguided due to knowledge gaps on genetic variation and local adaptation as they relate to a changing landscape. We are assessing ecophysiological responses of big sagebrush to climate variation, comparing plants that germinated from ~20 geographically distinct populations of each of the three subspecies of big sagebrush. Seedlings were previously planted into common gardens by US Forest Service collaborators Drs. B. Richardson and N. Shaw, (USFS Rocky Mountain Research Station, Provo, Utah and Boise, Idaho) as part of the Great Basin Native Plant Selection and Increase Project. Seed sources spanned all states in the conterminous Western United States. Germination, establishment, growth and ecophysiological responses are being linked to genomics and foliar palatability. New information is being produced to aid choice of appropriate seed sources by Bureau of Land Management and USFS field offices when they are planning seed acquisitions for emergency post-fire rehabilitation projects while considering climate variability and wildlife needs.
Bayesian evaluation of inequality constrained hypotheses
Gu, X.; Mulder, J.; Deković, M.; Hoijtink, H.
2014-01-01
Bayesian evaluation of inequality constrained hypotheses enables researchers to investigate their expectations with respect to the structure among model parameters. This article proposes an approximate Bayes procedure that can be used for the selection of the best of a set of inequality constrained
Dimensionality reduction in Bayesian estimation algorithms
Directory of Open Access Journals (Sweden)
G. W. Petty
2013-09-01
An idealized synthetic database loosely resembling 3-channel passive microwave observations of precipitation against a variable background is employed to examine the performance of a conventional Bayesian retrieval algorithm. For this dataset, algorithm performance is found to be poor owing to an irreconcilable conflict between the need to find matches in the dependent database versus the need to exclude inappropriate matches. It is argued that the likelihood of such conflicts increases sharply with the dimensionality of the observation space of real satellite sensors, which may utilize 9 to 13 channels to retrieve precipitation, for example. An objective method is described for distilling the relevant information content from N real channels into a much smaller number (M) of pseudochannels while also regularizing the background (geophysical plus instrument) noise component. The pseudochannels are linear combinations of the original N channels obtained via a two-stage principal component analysis of the dependent dataset. Bayesian retrievals based on a single pseudochannel applied to the independent dataset yield striking improvements in overall performance. The differences between the conventional Bayesian retrieval and reduced-dimensional Bayesian retrieval suggest that a major potential problem with conventional multichannel retrievals – whether Bayesian or not – lies in the common but often inappropriate assumption of diagonal error covariance. The dimensional reduction technique described herein avoids this problem by, in effect, recasting the retrieval problem in a coordinate system in which the desired covariance is lower-dimensional, diagonal, and unit magnitude.
Dimensionality reduction in Bayesian estimation algorithms
Petty, G. W.
2013-09-01
An idealized synthetic database loosely resembling 3-channel passive microwave observations of precipitation against a variable background is employed to examine the performance of a conventional Bayesian retrieval algorithm. For this dataset, algorithm performance is found to be poor owing to an irreconcilable conflict between the need to find matches in the dependent database versus the need to exclude inappropriate matches. It is argued that the likelihood of such conflicts increases sharply with the dimensionality of the observation space of real satellite sensors, which may utilize 9 to 13 channels to retrieve precipitation, for example. An objective method is described for distilling the relevant information content from N real channels into a much smaller number (M) of pseudochannels while also regularizing the background (geophysical plus instrument) noise component. The pseudochannels are linear combinations of the original N channels obtained via a two-stage principal component analysis of the dependent dataset. Bayesian retrievals based on a single pseudochannel applied to the independent dataset yield striking improvements in overall performance. The differences between the conventional Bayesian retrieval and reduced-dimensional Bayesian retrieval suggest that a major potential problem with conventional multichannel retrievals - whether Bayesian or not - lies in the common but often inappropriate assumption of diagonal error covariance. The dimensional reduction technique described herein avoids this problem by, in effect, recasting the retrieval problem in a coordinate system in which the desired covariance is lower-dimensional, diagonal, and unit magnitude.
An experiment on selecting most informative variables in socio-economic data
Directory of Open Access Journals (Sweden)
L. Jenkins
2014-01-01
In many studies where data are collected on several variables, there is a motivation to find out whether fewer variables would provide almost as much information. Variance of a variable about its mean is the common statistical measure of information content, and that is used here. We are interested in whether the variability in one variable is sufficiently correlated with that in one or more of the other variables that the first variable is redundant. We wish to find one or more ‘principal variables’ that sufficiently reflect the information content in all the original variables. The paper explains the method of principal variables and reports experiments using the technique to see if just a few variables are sufficient to reflect the information in 11 socioeconomic variables on 130 countries from a World Bank (WB) database. While the method of principal variables is highly successful in a statistical sense, the WB data vary greatly from year to year, demonstrating that fewer variables would be inadequate for these data.
Kurki, Helen K
2013-01-01
This study tests the hypothesis, a correlate of the obstetric dilemma, that skeletal variability in the human female pelvic canal is limited owing to the action of stabilizing selection. Levels of variation in three skeletal regions (pelvic canal, noncanal pelvis, and limbs) of females and males are compared to each other and between sexes. Nine human skeletal samples (total female n = 101; male n = 117) representing diverse populations were included. Osteometric data were collected from the articulated pelvis, os coxa, sacrum, femur, tibia, humerus, radius, and clavicle. Coefficients of variation, adjusted for small sample size (V*), were calculated for variables in separate samples by sex, and mean V*s were taken for the skeletal regions. Size variances were measured as V* of the geometric mean (GM) of the skeletal region variables. Using nonparametric methods, coefficients were compared between sexes and skeletal regions and correlations among V*s were calculated. Females and males do not differ in levels of variation for any skeletal region. The pelvic canal is the most variable region in both sexes, while size variability (GM) is similar among the three skeletal regions. Across the samples, canal and noncanal pelvic regions share patterns of variability in females but not males, while variability of the limb skeleton is independent in both sexes. The results suggest that stabilizing selection does not limit variability in the female pelvic canal. Biological plasticity may be greater in the canal than that in other skeletal regions. Copyright © 2013 Wiley Periodicals, Inc.
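The two variability measures named in this abstract can be illustrated with a short sketch. The abstract does not state the correction used for V*; the code below assumes the common small-sample adjustment V* = (1 + 1/(4n)) × CV, and the function names and measurements are hypothetical.

```python
import math
import statistics


def adjusted_cv(values):
    """Coefficient of variation (in percent) with a small-sample correction.

    Assumes the correction V* = (1 + 1/(4n)) * CV; the abstract does not
    state which adjustment the study used.
    """
    n = len(values)
    cv = 100.0 * statistics.stdev(values) / statistics.mean(values)
    return (1.0 + 1.0 / (4.0 * n)) * cv


def geometric_mean(values):
    """Geometric mean of positive measurements, used as a size proxy."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up measurements: each row is one individual, each column one
# variable from the same skeletal region.
region = [
    [10.0, 40.0, 25.0],
    [12.0, 44.0, 27.0],
    [14.0, 41.0, 30.0],
]

# Size variability: V* of the per-individual geometric means.
sizes = [geometric_mean(individual) for individual in region]
print(adjusted_cv(sizes))

# Mean V* across the variables of the region.
columns = list(zip(*region))
print(statistics.mean(adjusted_cv(list(col)) for col in columns))
```

Because the correction factor shrinks toward 1 as n grows, it matters mainly for small samples such as the nine per-population samples described here.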
Bayesian microsaccade detection
Mihali, Andra; van Opheusden, Bas; Ma, Wei Ji
2017-01-01
Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the “true” microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise—although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the “true” microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package. PMID:28114483
Bayesian inference and the parametric bootstrap
Efron, Bradley
2013-01-01
The parametric bootstrap can be used for the efficient computation of Bayes posterior distributions. Importance sampling formulas take on an easy form relating to the deviance in exponential families, and are particularly simple starting from Jeffreys invariant prior. Because of the i.i.d. nature of bootstrap sampling, familiar formulas describe the computational accuracy of the Bayes estimates. Besides computational methods, the theory provides a connection between Bayesian and frequentist analysis. Efficient algorithms for the frequentist accuracy of Bayesian inferences are developed and demonstrated in a model selection example. PMID:23843930
Quilty, J.; Adamowski, J. F.
2015-12-01
Urban water supply systems are often stressed during seasonal outdoor water use because climate-related water demands are variable in nature, making it difficult to optimize the operation of the water supply system. Urban water demand (UWD) forecasts that fail to include meteorological conditions as inputs to the forecast model may produce poor forecasts, as they cannot account for the increase or decrease in demand related to meteorological conditions. Meteorological records stochastically simulated into the future can be used as inputs to data-driven UWD forecasts, generally resulting in improved forecast accuracy. This study aims to produce data-driven UWD forecasts for two different Canadian water utilities (Montreal and Victoria) using machine learning methods by first selecting historical UWD and meteorological records derived from a stochastic weather generator using nonlinear input variable selection. The nonlinear input variable selection methods considered in this work are derived from the concept of conditional mutual information, a nonlinear dependency measure that is based on (multivariate) probability density functions and accounts for relevancy, conditional relevancy, and redundancy in a potential set of input variables. The results of our study indicate that stochastic weather inputs can improve UWD forecast accuracy for the two sites considered in this work. Nonlinear input variable selection is suggested as a means to identify which meteorological conditions should be utilized in the forecast.
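To make the role of conditional mutual information concrete, here is a minimal sketch for discrete (already binned) data. It is a simplification and an assumption on our part: the abstract refers to estimates based on probability density functions, whereas this sketch uses plug-in frequency counts, and the function name and toy series are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences.

    Uses I(X;Y|Z) = sum p(x,y,z) * log2[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ],
    with probabilities replaced by relative frequencies.
    """
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), count in c_xyz.items():
        ratio = count * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (count / n) * log2(ratio)
    return cmi


# Toy example: the target, a relevant input, a redundant copy of that
# input, and an input unrelated to the target.
target = [0, 1, 0, 1, 0, 1, 0, 1]
relevant = [0, 1, 0, 1, 0, 1, 0, 1]
redundant = list(relevant)
unrelated = [0, 0, 1, 1, 0, 0, 1, 1]
nothing_selected = [0] * len(target)

print(conditional_mutual_information(relevant, target, nothing_selected))
print(conditional_mutual_information(redundant, target, relevant))
print(conditional_mutual_information(unrelated, target, nothing_selected))
```

Conditioning on the inputs already selected is what lets the measure penalize redundancy: the copied input carries no additional information about the target once the original is known.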
Directory of Open Access Journals (Sweden)
Beáta Baranová
2018-03-01
The variations in ground beetle (Coleoptera: Carabidae) assemblages across three types of farmland habitats (arable land, meadows and woody vegetation) were studied in relation to vegetation cover structure, intensity of agrotechnical interventions and selected soil properties. Material was pitfall trapped in 2010 and 2011 at twelve sites in the agricultural landscape of the town of Prešov and its near vicinity, Eastern Slovakia. A total of 14,763 ground beetle individuals were trapped, representing 92 Carabidae species, with the following six species dominating: Poecilus cupreus, Pterostichus melanarius, Pseudoophonus rufipes, Brachinus crepitans, Anchomenus dorsalis and Poecilus versicolor. The studied habitats differed significantly in the number of trapped individuals, activity abundance, and the representation of carabids according to their habitat preferences and ability to fly. However, no significant difference was observed in diversity, evenness or dominance. The most significant environmental variables affecting the species variability of Carabidae assemblages were soil moisture and the herb layer 0-20 cm. Other variables selected by forward selection were intensity of agrotechnical interventions, humus content and shrub vegetation. The remaining soil properties seem to be of only secondary importance for adult carabids. Environmental variables have the strongest effect on habitat specialists, whereas ground beetles without special requirements for habitat quality seem to be only slightly affected by the studied environmental variables.
How few? Bayesian statistics in injury biomechanics.
Cutcliffe, Hattie C; Schmidt, Allison L; Lucas, Joseph E; Bass, Cameron R
2012-10-01
In injury biomechanics, there are currently no general a priori estimates of how few specimens are necessary to obtain sufficiently accurate injury risk curves for a given underlying distribution. Further, several methods are available for constructing these curves, and recent methods include Bayesian survival analysis. This study used statistical simulations to evaluate the fidelity of different injury risk methods using limited sample sizes across four different underlying distributions. Five risk curve techniques were evaluated, including Bayesian techniques. For the Bayesian analyses, various prior distributions were assessed, each incorporating more accurate information. Simulated subject injury and biomechanical input values were randomly sampled from each underlying distribution, and injury status was determined by comparing these values. Injury risk curves were developed for this data using each technique for various small sample sizes; for each, analyses on 2000 simulated data sets were performed. Resulting median predicted risk values and confidence intervals were compared with the underlying distributions. Across conditions, the standard and Bayesian survival analyses better represented the underlying distributions included in this study, especially for extreme (1, 10, and 90%) risk. This study demonstrates that the value of the Bayesian analysis is the use of informed priors. As the mean of the prior approaches the actual value, the sample size necessary for good reproduction of the underlying distribution with small confidence intervals can be as small as 2. This study provides estimates of confidence intervals and number of samples to allow the selection of the most appropriate sample sizes given known information.
Broad-band characteristics of seven new hard X-ray selected cataclysmic variables
Bernardini, F.; de Martino, D.; Mukai, K.; Russell, D. M.; Falanga, M.; Masetti, N.; Ferrigno, C.; Israel, G.
2017-10-01
We present timing and spectral analysis of a sample of seven hard X-ray selected cataclysmic variable candidates based on simultaneous X-ray and optical observations collected with XMM-Newton, complemented with Swift/BAT and INTEGRAL/IBIS hard X-ray data and ground-based optical photometry. For six sources, X-ray pulsations are detected for the first time in the range of ~296-6098 s, identifying them as members of the magnetic class. Swift J0927.7-6945, Swift J0958.0-4208, Swift J1701.3-4304, Swift J2113.5+5422 and possibly PBC J0801.2-4625 are intermediate polars (IPs), while Swift J0706.8+0325 is a short (1.7 h) orbital period polar, the 11th hard X-ray-selected identified so far. X-ray orbital modulation is also observed in Swift J0927.7-6945 (5.2 h) and Swift J2113.5+5422 (4.1 h). Swift J1701.3-4304 is discovered as the longest orbital period (12.8 h) deep eclipsing IP. The spectra of the magnetic systems reveal optically thin multitemperature emission between 0.2 and 60 keV. Energy-dependent spin pulses and the orbital modulation in Swift J0927.7-6945 and Swift J2113.5+5422 are due to intervening local high-density absorbing material ($N_{\rm H} \sim 10^{22-23}$ cm$^{-2}$). In Swift J0958.0-4208 and Swift J1701.3-4304, a soft X-ray blackbody (kT ~ 50 and ~80 eV) is detected, adding them to the growing group of 'soft' IPs. White dwarf masses are determined in the range of ~0.58-1.18 M⊙, indicating massive accreting primaries in five of them. Most sources accrete at rates lower than the expected secular value for their orbital period. Formerly proposed as a long-period (9.4 h) nova-like CV, Swift J0746.3-1608 shows peculiar spectrum and light curves suggesting either an atypical low-luminosity CV or a low-mass X-ray binary.
Network-based group variable selection for detecting expression quantitative trait loci (eQTL)
Directory of Open Access Journals (Sweden)
Zhang Xuegong
2011-06-01
Background: Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression level of genes. Penalized regression with a proper penalty is suitable for the high-dimensional biological data. Its performance should be enhanced when we incorporate biological knowledge of gene expression network and linkage disequilibrium (LD) structure between loci in high-noise background. Results: We propose a network-based group variable selection (NGVS) method for QTL detection. Our method simultaneously maps highly correlated expression traits sharing the same biological function to marker sets formed by LD. By grouping markers, complex joint activity of multiple SNPs can be considered and the dimensionality of eQTL problem is reduced dramatically. In order to demonstrate the power and flexibility of our method, we used it to analyze two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets and treated the additive and dominant effect of each locus as a group: as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions of multiple SNPs. Conclusions: The proposed NGVS method is appropriate for problems with high-dimensional data and high-noise background. On eQTL problem it outperforms the classical Lasso method, which does not consider biological knowledge. Introduction of proper gene expression and loci correlation information makes detecting causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.
Falocco, S.; De Cicco, D.; Paolillo, M.; Covone, G.; Longo, G.; Grado, A.; Limatola, L.; Vaccari, M.; Botticella, M. T.; Pignata, G.; Cappellaro, E.; Trevese, D.; Vagnetti, F.; Salvato, M.; Radovich, M.; Hsu, L.; Brandt, W. N.; Capaccioli, M.; Napolitano, N.; Baruffolo, A.; Cascone, E.; Schipani, P.
This work makes use of the VST observations to select variable sources. We also use the IR photometry, SED fitting and X-ray information, where available, to confirm the nature of the AGN candidates. The IR data, available over the full survey area, allow us to confirm the consistency of the variability selection with the IR color selection method, while the detection of variability may prove useful to detect the presence of an AGN in IR selected starburst galaxies.
Bayesian Exploratory Factor Analysis
DEFF Research Database (Denmark)
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...
Ershadi, Ali
2013-05-01
The influence of uncertainty in land surface temperature, air temperature, and wind speed on the estimation of sensible heat flux is analyzed using a Bayesian inference technique applied to the Surface Energy Balance System (SEBS) model. The Bayesian approach allows for an explicit quantification of the uncertainties in input variables: a source of error generally ignored in surface heat flux estimation. An application using field measurements from the Soil Moisture Experiment 2002 is presented. The spatial variability of selected input meteorological variables in a multitower site is used to formulate the prior estimates for the sampling uncertainties, and the likelihood function is formulated assuming Gaussian errors in the SEBS model. Land surface temperature, air temperature, and wind speed were estimated by sampling their posterior distribution using a Markov chain Monte Carlo algorithm. Results verify that Bayesian-inferred air temperature and wind speed were generally consistent with those observed at the towers, suggesting that local observations of these variables were spatially representative. Uncertainties in the land surface temperature appear to have the strongest effect on the estimated sensible heat flux, with Bayesian-inferred values differing by up to ±5°C from the observed data. These differences suggest that the footprint of the in situ measured land surface temperature is not representative of the larger-scale variability. As such, these measurements should be used with caution in the calculation of surface heat fluxes and highlight the importance of capturing the spatial variability in the land surface temperature: particularly, for remote sensing retrieval algorithms that use this variable for flux estimation.
Natural selection acts on Atlantic salmon major histocompatibility (MH) variability in the wild
Eyto, de E.; McGinnity, P.; Consuegra, S.; Coughlan, J.; Tufto, J.; Farrell, K.; Megens, H.J.W.C.; Jordan, W.; Cross, T.; Stet, R.J.M.
2007-01-01
Pathogen-driven balancing selection is thought to maintain polymorphism in major histocompatibility (MH) genes. However, there have been few empirical demonstrations of selection acting on MH loci in natural populations. To determine whether natural selection on MH genes has fitness consequences for
Lakayan, Dina; Tuppurainen, Jussipekka; Albers, Martin; van Lint, Matthijs J.; van Iperen, Dick J.; Weda, Jelmer J.A.; Kuncova-Kallio, Johana; Somsen, Govert W.; Kool, Jeroen
2018-01-01
A variable-wavelength Kretschmann configuration surface plasmon resonance (SPR) apparatus with angle scanning is presented. The setup provides the possibility of selecting the optimum wavelength with respect to the properties of the metal layer of the sensorchip, sample matrix, and biomolecular
Berliner, M.
2017-12-01
Bayesian statistical decision theory offers a natural framework for decision-policy making in the presence of uncertainty. Key advantages of the approach include efficient incorporation of information and observations. However, in complicated settings it is very difficult, perhaps essentially impossible, to formalize the mathematical inputs needed in the approach. Nevertheless, using the approach as a template is useful for decision support; that is, organizing and communicating our analyses. Bayesian hierarchical modeling is valuable in quantifying and managing uncertainty in such cases. I review some aspects of the idea, emphasizing statistical model development and use in the context of sea-level rise.
Bayesian Exploratory Factor Analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates from a high dimensional set of psychological measurements. PMID:25431517
bayesQR: A Bayesian Approach to Quantile Regression
Directory of Open Access Journals (Sweden)
Dries F. Benoit
2017-01-01
After its introduction by Koenker and Bassett (1978), quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for the typical quantile regression with continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots and posterior histograms, and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of this paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.
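As an illustrative aside (not taken from the bayesQR package, which is written for R), the link between the asymmetric Laplace distribution and quantile regression can be sketched in Python. The function names are hypothetical, and the sketch assumes the standard parameterization with location mu, scale sigma and quantile level tau, under which maximizing the likelihood in mu amounts to minimizing the check loss.

```python
import math


def check_loss(u, tau):
    """Quantile (check) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution (assumed parameterization)."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def best_location(ys, tau):
    """Data point that minimizes the summed check loss (a sample tau-quantile)."""
    return min(ys, key=lambda m: sum(check_loss(y - m, tau) for y in ys))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(best_location(data, 0.5))   # location with the lowest check loss at tau = 0.5
    print(ald_logpdf(3.0, 3.0, 1.0, 0.5))
```

Because the log-density is a constant minus the check loss, a location that minimizes the summed check loss also maximizes the asymmetric Laplace likelihood for fixed sigma and tau.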
International Nuclear Information System (INIS)
Schaaf, Arjen van der; Xu Chengjian; Luijk, Peter van; Veld, Aart A. van’t; Langendijk, Johannes A.; Schilstra, Cornelis
2012-01-01
Purpose: Multivariate modeling of complications after radiotherapy is frequently used in conjunction with data driven variable selection. This study quantifies the risk of overfitting in a data driven modeling method using bootstrapping for data with typical clinical characteristics, and estimates the minimum amount of data needed to obtain models with relatively high predictive power. Materials and methods: To facilitate repeated modeling and cross-validation with independent datasets for the assessment of true predictive power, a method was developed to generate simulated data with statistical properties similar to real clinical data sets. Characteristics of three clinical data sets from radiotherapy treatment of head and neck cancer patients were used to simulate data with set sizes between 50 and 1000 patients. A logistic regression method using bootstrapping and forward variable selection was used for complication modeling, resulting for each simulated data set in a selected number of variables and an estimated predictive power. The true optimal number of variables and true predictive power were calculated using cross-validation with very large independent data sets. Results: For all simulated data set sizes the number of variables selected by the bootstrapping method was on average close to the true optimal number of variables, but showed considerable spread. Bootstrapping is more accurate in selecting the optimal number of variables than the AIC and BIC alternatives, but this did not translate into a significant difference of the true predictive power. The true predictive power asymptotically converged toward a maximum predictive power for large data sets, and the estimated predictive power converged toward the true predictive power. More than half of the potential predictive power is gained after approximately 200 samples. Our simulations demonstrated severe overfitting (a predictive power lower than that of predicting 50% probability) in a number of small
Shah, Abhik; Woolf, Peter
2009-01-01
Summary In this paper, we introduce pebl, a Python library and application for learning Bayesian network structure from data and prior knowledge that provides features unmatched by alternative software packages: the ability to use interventional data, flexible specification of structural priors, modeling with hidden variables and exploitation of parallel processing. PMID:20161541
Huynh, Huynh
By noting that a Rasch or two parameter logistic (2PL) item belongs to the exponential family of random variables and that the probability density function (pdf) of the correct response (X=1) and the incorrect response (X=0) are symmetric with respect to the vertical line at the item location, it is shown that the conjugate prior for ability is…
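The symmetry mentioned above can be checked numerically. The sketch below assumes the usual 2PL response function with discrimination a and item location b (symbols and function names are illustrative and do not appear in the abstract); setting a = 1 gives the Rasch case.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (X = 1) at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_incorrect(theta, a, b):
    """Probability of an incorrect response (X = 0) at ability theta."""
    return 1.0 - p_correct(theta, a, b)


if __name__ == "__main__":
    a, b, d = 1.7, 0.3, 0.8
    # Mirror images about the item location b should agree.
    print(p_correct(b + d, a, b), p_incorrect(b - d, a, b))
```

The two printed values should agree up to rounding, because the curve for X = 1 evaluated at b + d mirrors the curve for X = 0 evaluated at b - d.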
Applied Music Teaching Behavior as a Function of Selected Personality Variables.
Schmidt, Charles P.
1989-01-01
Investigates the relationships among applied music teaching behaviors and personality variables as measured by the Myers-Briggs Type Indicator (MBTI). Suggests that personality variables may be important factors underlying four applied music teaching behaviors: approvals, rate of reinforcement, teacher model/performance, and pace. (LS)
Directory of Open Access Journals (Sweden)
Renata Bujak
2016-07-01
Non-targeted metabolomics constitutes a part of systems biology and aims to determine many metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are multivariate and high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as the complexity of biological matrices. Proper selection of variables which contribute to group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study). Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without and with multiple testing correction as well as the least absolute shrinkage and selection operator (LASSO) were tested and compared. For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on VIP criteria using Pareto and UV scaling, respectively. In the case of the PH study, 217 and 320 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. In the RH study, the OPLS-DA model built with multiple testing correction selected 4 and 19 variables as statistically significant in terms of Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on VIP criteria in terms of Pareto and UV scaling, respectively. Additionally, the concept and fundamentals of the least absolute shrinkage and selection operator (LASSO) with a bootstrap procedure evaluating reproducibility of results were demonstrated. In the RH and PH study, the LASSO selected 14 and 4 variables with reproducibility between 99.3% and 100%. However, apart from the popularity of PLS-DA and OPLS-DA methods in metabolomics, it should be highlighted that they do not control type I or type II error, but only arbitrarily establish a cut-off value for PLS-DA loadings
The role of protozoa-driven selection in shaping human genetic variability.
Pozzoli, Uberto; Fumagalli, Matteo; Cagliani, Rachele; Comi, Giacomo P; Bresolin, Nereo; Clerici, Mario; Sironi, Manuela
2010-03-01
Protozoa exert a strong selective pressure in humans. The selection signatures left by these pathogens can be exploited to identify genetic modulators of infection susceptibility. We show that protozoa diversity in different geographic locations is a good measure of protozoa-driven selective pressure; protozoa diversity captured selection signatures at known malaria resistance loci and identified several selected single nucleotide polymorphisms in immune and hemolytic anemia genes. A genome-wide search enabled us to identify 5180 variants mapping to 1145 genes that are subjected to protozoa-driven selective pressure. We provide a genome-wide estimate of protozoa-driven selective pressure and identify candidate susceptibility genes for protozoa-borne diseases. Copyright 2010 Elsevier Ltd. All rights reserved.
Loturco, Irineu; Artioli, Guilherme Giannini; Kobal, Ronaldo; Gil, Saulo; Franchini, Emerson
2014-07-01
This study investigated the relationship between punching acceleration and selected strength and power variables in 19 professional karate athletes from the Brazilian National Team (9 men and 10 women; age, 23 ± 3 years; height, 1.71 ± 0.09 m; and body mass [BM], 67.34 ± 13.44 kg). Punching acceleration was assessed under 4 different conditions in a randomized order: (a) fixed distance aiming to attain maximum speed (FS), (b) fixed distance aiming to attain maximum impact (FI), (c) self-selected distance aiming to attain maximum speed, and (d) self-selected distance aiming to attain maximum impact. The selected strength and power variables were as follows: maximal dynamic strength in bench press and squat-machine, squat and countermovement jump height, mean propulsive power in bench throw and jump squat, and mean propulsive velocity in jump squat with 40% of BM. Upper- and lower-body power and maximal dynamic strength variables were positively correlated to punch acceleration in all conditions. Multiple regression analysis also revealed predictive variables: relative mean propulsive power in squat jump (W·kg-1), and maximal dynamic strength 1 repetition maximum in both bench press and squat-machine exercises. An impact-oriented instruction and a self-selected distance to start the movement seem to be crucial to reach the highest acceleration during punching execution. This investigation, while demonstrating strong correlations between punching acceleration and strength-power variables, also provides important information for coaches, especially for designing better training strategies to improve punching speed.
Schnitzer, Mireille E.; Lok, Judith J.; Gruber, Susan
2015-01-01
This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low-and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios. PMID:26226129
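For orientation, the basic IPTW estimate of the mean outcome under treatment can be sketched as follows. This is a minimal Horvitz-Thompson style illustration, not the authors' implementation: the function name is hypothetical and the propensity scores are assumed to be supplied by some previously fitted model.

```python
def iptw_mean_under_treatment(treated, outcomes, propensity):
    """Weight each treated unit's outcome by 1 / P(treated | covariates), then average.

    treated    : 0/1 indicators of receiving treatment
    outcomes   : observed outcomes
    propensity : estimated probabilities of treatment (assumed given, all > 0)
    """
    n = len(outcomes)
    total = sum(a * y / g for a, y, g in zip(treated, outcomes, propensity))
    return total / n


if __name__ == "__main__":
    a = [1, 0, 1, 0]
    y = [2.0, 5.0, 4.0, 1.0]
    g = [0.5, 0.5, 0.5, 0.5]
    print(iptw_mean_under_treatment(a, y, g))
```

Propensity scores close to zero inflate the weights 1/g, which is one way to see why covariates that strongly predict treatment but not the outcome can destabilize the estimate.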
Hierarchical Bayesian Modeling of Fluid-Induced Seismicity
Broccardo, M.; Mignan, A.; Wiemer, S.; Stojadinovic, B.; Giardini, D.
2017-11-01
In this study, we present a Bayesian hierarchical framework to model fluid-induced seismicity. The framework is based on a nonhomogeneous Poisson process with a fluid-induced seismicity rate proportional to the rate of injected fluid. The fluid-induced seismicity rate model depends upon a set of physically meaningful parameters and has been validated for six fluid-induced case studies. In line with the vision of hierarchical Bayesian modeling, the rate parameters are considered as random variables. We develop both the Bayesian inference and updating rules, which are used to develop a probabilistic forecasting model. We tested the Basel 2006 fluid-induced seismic case study to prove that the hierarchical Bayesian model offers a suitable framework to coherently encode both epistemic uncertainty and aleatory variability. Moreover, it provides a robust and consistent short-term seismic forecasting model suitable for online risk quantification and mitigation.
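A toy version of the updating idea can be written down under strong simplifying assumptions that are not taken from the paper: a single proportionality constant k links expected event counts to injected volume (events ~ Poisson(k * V)), and k is given a Gamma(shape, rate) prior, which is conjugate to the Poisson likelihood. All names below are hypothetical.

```python
def update_gamma_rate(shape, rate, n_events, injected_volume):
    """Posterior Gamma parameters for k after observing n_events over injected_volume."""
    return shape + n_events, rate + injected_volume


def expected_events(shape, rate, future_volume):
    """Posterior-mean forecast of the event count for a planned injection volume."""
    return (shape / rate) * future_volume


if __name__ == "__main__":
    shape, rate = update_gamma_rate(2.0, 1.0, n_events=10, injected_volume=4.0)
    print(shape, rate)
    print(expected_events(shape, rate, future_volume=10.0))
```

Repeating the update as new events and injection volumes arrive gives a simple online forecast; the hierarchical model in the paper is richer than this single-parameter sketch.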
Bayesian methods for hackers probabilistic programming and Bayesian inference
Davidson-Pilon, Cameron
2016-01-01
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power. Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples a...
Kwon, Deukwoo; Reis, Isildinha M.
2016-01-01
Background: We proposed approximate Bayesian computation with single distribution selection (ABC-SD) for estimating mean and standard deviation from other reported summary statistics. The ABC-SD generates pseudo data from a single parametric distribution thought to be the true distribution of underlying study data. This single distribution is either an educated guess, or it is selected via model selection using posterior probability criterion for testing two or more candidate distributions. F...
Wagenmakers, E.-J.
2009-01-01
The probabilistic approach to human reasoning is exemplified by the information gain model for the Wason card selection task. Although the model is elegant and original, several key aspects of the model warrant further discussion, particularly those concerning the scope of the task and the choice
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
Bayesian statistical inference
Directory of Open Access Journals (Sweden)
Bruno De Finetti
2017-04-01
This work was translated into English and published in the volume: Bruno De Finetti, Induction and Probability, Biblioteca di Statistica, eds. P. Monari, D. Cocchi, Clueb, Bologna, 1993. Bayesian Statistical Inference is one of the last fundamental philosophical papers in which we can find De Finetti's essential approach to statistical inference.
International Nuclear Information System (INIS)
Cordoba Maquilon, Jorge E; Gonzalez Calderon, Carlos A; Posada Henao, John J
2011-01-01
A study using revealed preference surveys and psychological tests was conducted. Key psychological variables of behavior involved in the choice of transportation mode in a population sample of the Metropolitan Area of the Valle de Aburra were detected. The experiment used the random utility theory for discrete choice models and reasoned action in order to assess beliefs. This was used as a tool for analysis of the psychological variables using the sixteen personality factor questionnaire (16PF test). In addition to the revealed preference surveys, two other surveys were carried out: one with socio-economic characteristics and the other with latent indicators. This methodology allows for an integration of discrete choice models and latent variables. The integration makes the model operational and quantifies the unobservable psychological variables. The most relevant result obtained was that anxiety affects the choice of urban transportation mode and shows that physiological alterations, as well as problems in perception and beliefs, can affect the decision-making process.
Morphologies of Mid-IR Variability-Selected AGN Host Galaxies
Polimera, Mugdha; Sarajedini, Vicki; Ashby, Matthew L. N.; Willner, S. P.; Fazio, Giovanni G.
2018-01-01
We use multi-epoch 3.6 and 4.5 μm data from the Spitzer Extended Deep Survey (SEDS) to probe the AGN population among galaxies to redshifts ~3 via their mid-IR variability. About 1% of all galaxies in our survey contain varying nuclei, 80% of which are likely to be AGN. Twenty-three percent of mid-IR variables are also X-ray sources. The mid-IR variables have a slightly greater fraction of weakly disturbed morphologies compared to a control sample of normal galaxies. The increased fraction of weakly distorted hosts becomes more significant when we remove the X-ray emitting AGN, while the frequency of strongly disturbed hosts remains similar to the control galaxy sample. These results suggest that mid-IR variability identifies a unique population of obscured, Compton-thick AGN revealing elevated levels of weak distortion among their host galaxies.
Bayesian Subset Modeling for High-Dimensional Generalized Linear Models
Liang, Faming
2013-06-01
This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online. © 2013 American Statistical Association.
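Screening by marginal inclusion probability can be illustrated with a small sketch. It assumes that posterior samples of models are already available as sets of predictor indices; the function names and the 0.5 threshold are illustrative choices, not the article's procedure.

```python
from collections import Counter


def marginal_inclusion(models, n_predictors):
    """Fraction of sampled models that contain each predictor."""
    counts = Counter(j for model in models for j in set(model))
    return [counts[j] / len(models) for j in range(n_predictors)]


def screen(models, n_predictors, threshold=0.5):
    """Indices of predictors whose marginal inclusion probability exceeds the threshold."""
    probs = marginal_inclusion(models, n_predictors)
    return [j for j, p in enumerate(probs) if p > threshold]


if __name__ == "__main__":
    sampled = [{0, 2}, {0}, {0, 1, 2}, {2}]
    print(marginal_inclusion(sampled, 3))
    print(screen(sampled, 3))
```

Each inclusion probability is computed from models that contain several predictors at once, so the ranking reflects joint information rather than one-at-a-time marginal correlations.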
Naranjo, Lizbeth; Pérez, Carlos J; Martín, Jacinto; Campos-Roca, Yolanda
2017-04-01
In the scientific literature, there is a lack of variable selection and classification methods that consider replicated data. The problem motivating this work is the discrimination of people suffering from Parkinson's disease (PD) from healthy subjects based on acoustic features automatically extracted from replicated voice recordings. A two-stage variable selection and classification approach has been developed to properly match the replication-based experimental design. The statistical approach is specified so that the computational problems can be solved with an easy-to-implement Gibbs sampling algorithm. The proposed approach produces an acceptable predictive capacity for PD discrimination with the considered database, despite the fact that the sample size is relatively small. Specifically, the accuracy rate, sensitivity and specificity are 86.2%, 82.5%, and 90.0%, respectively. However, the most important fact is that the interpretability of the results improves while chain mixing is better and computation time is lower than for the classification-only approaches presented in the scientific literature. To the best of the authors' knowledge, this is the first approach developed to properly consider intra-subject variability for variable selection and classification. Although the proposed approach has been applied to PD discrimination, it can be applied in other contexts with similar replication-based experimental designs. Copyright © 2017 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
Michael Vohland
2017-10-01
We explored the potentials of both non-imaging laboratory and airborne imaging spectroscopy to assess arable soil quality indicators. We focused on microbial biomass-C (MBC) and hot water-extractable C (HWEC), complemented by organic carbon (OC) and nitrogen (N) as well-studied spectrally active parameters. The aggregation of different spectral variable selection strategies was used to analyze benefits for reachable estimation accuracies and to explore spectral predictive mechanisms for MBC and HWEC. With selected variables, quantification accuracies improved markedly for MBC (laboratory: RPD = 2.32 instead of 1.33 with full spectra; airborne: 2.35 instead of 1.80) and OC (laboratory: RPD = 3.08 instead of 2.36; airborne: 2.20 instead of 1.94). Patterns of selected variables indicated similarities between HWEC and OC, but significant differences between all other soil variables. This agreed with our results of indirect approaches in which both (i) wet-chemical data of OC and N and (ii) spectra fitted to measured OC and N values were used to estimate MBC and HWEC. Compared to these approaches, we found marked benefits of laboratory and airborne data for a direct spectral quantification of MBC (but not for HWEC). This suggests specificity of spectra for MBC, usable for the determination of this important soil parameter.
Bayesian Kernel Mixtures for Counts.
Canale, Antonio; Dunson, David B
2011-12-01
Although Bayesian nonparametric mixture models for continuous data are well developed, there is a limited literature on related approaches for count data. A common strategy is to use a mixture of Poissons, which unfortunately is quite restrictive in not accounting for distributions having variance less than the mean. Other approaches include mixing multinomials, which requires finite support, and using a Dirichlet process prior with a Poisson base measure, which does not allow smooth deviations from the Poisson. As a broad class of alternative models, we propose to use nonparametric mixtures of rounded continuous kernels. An efficient Gibbs sampler is developed for posterior computation, and a simulation study is performed to assess performance. Focusing on the rounded Gaussian case, we generalize the modeling framework to account for multivariate count data, joint modeling with continuous and categorical variables, and other complications. The methods are illustrated through applications to a developmental toxicity study and marketing data. This article has supplementary material online.
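The rounded Gaussian kernel can be sketched as follows to show how it accommodates variance below the mean. The thresholds used here (a count j corresponds to a latent value in [j, j + 1), with everything below 1 mapped to 0) are an assumption made for illustration and may differ from the article's choice; the function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rounded_gaussian_pmf(j, mu, sigma):
    """P(Y = j) when a latent N(mu, sigma^2) draw is rounded down to a count (assumed thresholds)."""
    upper = norm_cdf((j + 1 - mu) / sigma)
    lower = 0.0 if j == 0 else norm_cdf((j - mu) / sigma)
    return upper - lower


def mean_and_variance(mu, sigma, max_count=200):
    """Mean and variance of the rounded kernel, truncating the support at max_count."""
    probs = [rounded_gaussian_pmf(j, mu, sigma) for j in range(max_count)]
    mean = sum(j * p for j, p in enumerate(probs))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    print(mean_and_variance(5.5, 0.1))   # tightly concentrated kernel
    print(mean_and_variance(5.5, 3.0))   # wider kernel
```

With a small sigma nearly all mass sits on a single count, so the variance is far below the mean, which a Poisson kernel cannot reproduce because its variance equals its mean.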
Directory of Open Access Journals (Sweden)
Legarra Andrés
2006-09-01
Breeding sheep populations for scrapie resistance could result in a loss of genetic variability. In this study, the effect on genetic variability of selection for increasing the ARR allele frequency was estimated in the Latxa breed. Two sources of information were used: pedigree and genetic polymorphisms (fifteen microsatellites). The results based on the genealogical information were conditioned by a low pedigree completeness level, which showed the value of also using the information provided by the molecular markers. The overall results suggest that no great negative effect on genetic variability can be expected in the short term in the population analysed from selection of only ARR/ARR males. The estimated average relationship of ARR/ARR males with reproductive females was similar to that of all available males whatever their genotype: 0.010 vs. 0.012 for a genealogical relationship and 0.257 vs. 0.296 for molecular coancestry, respectively. However, selection of only ARR/ARR males implied important losses in founder animals (87 percent) and low frequency alleles (30 percent) in the ram population. The evaluation of mild selection strategies against scrapie susceptibility based on the use of some ARR heterozygous males was difficult because the genetic relationships estimated among animals differed when pedigree or molecular information was used, and the use of more molecular markers should be evaluated.
Dimensionality reduction in Bayesian estimation algorithms
G. W. Petty
2013-01-01
An idealized synthetic database loosely resembling 3-channel passive microwave observations of precipitation against a variable background is employed to examine the performance of a conventional Bayesian retrieval algorithm. For this dataset, algorithm performance is found to be poor owing to an irreconcilable conflict between the need to find matches in the dependent database versus the need to exclude inappropriate matches. It is argued that the likelihood of such conflicts increase...
Bayesian analysis of Markov point processes
DEFF Research Database (Denmark)
Berthelsen, Kasper Klitgaard; Møller, Jesper
2006-01-01
Recently Møller, Pettitt, Berthelsen and Reeves introduced a new MCMC methodology for drawing samples from a posterior distribution when the likelihood function is only specified up to a normalising constant. We illustrate the method in the setting of Bayesian inference for Markov point processes...... a partially ordered Markov point process as the auxiliary variable. As the method requires simulation from the "unknown" likelihood, perfect simulation algorithms for spatial point processes become useful....
Bayesian optimization for materials science
Packwood, Daniel
2017-01-01
This book provides a short and concise introduction to Bayesian optimization specifically for experimental and computational materials scientists. After explaining the basic idea behind Bayesian optimization and some applications to materials science in Chapter 1, the mathematical theory of Bayesian optimization is outlined in Chapter 2. Finally, Chapter 3 discusses an application of Bayesian optimization to a complicated structure optimization problem in computational surface science. Bayesian optimization is a promising global optimization technique that originates in the field of machine learning and is starting to gain attention in materials science. For the purpose of materials design, Bayesian optimization can be used to predict new materials with novel properties without extensive screening of candidate materials. For the purpose of computational materials science, Bayesian optimization can be incorporated into first-principles calculations to perform efficient, global structure optimizations. While re...
Bayesian calibration for forensic age estimation.
Ferrante, Luigi; Skrami, Edlira; Gesuita, Rosaria; Cameriere, Roberto
2015-05-10
Forensic medicine is increasingly called upon to assess the age of individuals. Forensic age estimation is mostly required in relation to illegal immigration and identification of bodies or skeletal remains. A variety of age estimation methods are based on dental samples and use of regression models, where the age of an individual is predicted by morphological tooth changes that take place over time. From the medico-legal point of view, regression models with age as the dependent random variable entail that age tends to be overestimated in the young and underestimated in the old. To overcome this bias, we describe a new full Bayesian calibration method (asymmetric Laplace Bayesian calibration) for forensic age estimation that uses the asymmetric Laplace distribution as the probability model. The method was compared with three existing approaches (two Bayesian and a classical method) using simulated data. Although its accuracy was comparable with that of the other methods, the asymmetric Laplace Bayesian calibration appears to be significantly more reliable and robust in case of misspecification of the probability model. The proposed method was also applied to a real dataset of values of the pulp chamber of the right lower premolar measured on x-ray scans of individuals of known age. Copyright © 2015 John Wiley & Sons, Ltd.
Selection variability for Arg48His in alcohol dehydrogenase ADH1B among Asian populations.
Evsyukov, Alexey; Ivanov, Denis
2013-08-01
The variant His at codon 48 of the alcohol dehydrogenase gene (ADH1B) results in more efficient ethanol metabolism than with the "typical" codon 48Arg. In this study we introduced selection properties of Arg48His genotypes of ADH1B and estimated fitness in four ethnic-geographical clusters in Asia. Population genetics models were employed that derive observed gene frequencies from fitness relationships among genotypes, to infer the selection pattern of polymorphisms in an indirect manner. The data were analyzed using the model of "complete stationary distribution" by Wright that takes into account random genetic drift, pressure of migrations, mutations, and selection as influential factors of gene frequency. We found that the different population groups showed some variation in the types of selection for Arg48His. Han Chinese from eastern and southeastern China and the Japanese and Korean populations showed stabilizing selection, while the groups from Central Asia and Indochina showed divergent selection. However, all the groups demonstrated a strong positive selection for Arg48His. Copyright © 2013 Wayne State University Press, Detroit, Michigan 48201-1309.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Firefly as a novel swarm intelligence variable selection method in spectroscopy.
Goodarzi, Mohammad; dos Santos Coelho, Leandro
2014-12-10
A critical step in multivariate calibration is wavelength selection, which is used to build models with better prediction performance when applied to spectral data. Up to now, many feature selection techniques have been developed. Among all different types of feature selection techniques, those based on swarm intelligence optimization methodologies are more interesting since they are usually simulated based on animal and insect life behavior to, e.g., find the shortest path between a food source and their nests. This decision is made by a crowd, leading to a more robust model that is less prone to falling into local minima during the optimization cycle. This paper presents a novel feature selection approach to the selection of spectroscopic data, leading to more robust calibration models. The performance of the firefly algorithm, a swarm intelligence paradigm, was evaluated and compared with genetic algorithm and particle swarm optimization. All three techniques were coupled with partial least squares (PLS) and applied to three spectroscopic data sets. They demonstrate improved prediction results in comparison to when only a PLS model was built using all wavelengths. Results show that the firefly algorithm as a novel swarm paradigm leads to a lower number of selected wavelengths while the prediction performance of the built PLS model stays the same. Copyright © 2014. Published by Elsevier B.V.
AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.
Sun, Lei; Wang, Jun; Wei, Jinmao
2017-03-14
The Receiver Operator Characteristic (ROC) curve is well-known in evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find out disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset due to their lack of effective means to reduce the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by a trick of measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can get the minimal balanced error rate with a small number of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.
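The individual-feature ROC evaluation that the abstract treats as the existing baseline can be sketched as follows. This is only an illustration under stated assumptions: the function name, the gene names, and the toy values are hypothetical, and the sketch does not implement the complementarity measure or the heuristic search proposed in the paper.

```python
def feature_auc(values, labels):
    """AUC of a single feature used directly as a score.

    Equals the probability that a randomly chosen positive instance has a
    larger feature value than a randomly chosen negative one (ties count 0.5).
    """
    positives = [v for v, label in zip(values, labels) if label == 1]
    negatives = [v for v, label in zip(values, labels) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three features measured on three negative (0)
# and three positive (1) samples.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "gene_a": [1, 2, 3, 4, 5, 6],
    "gene_b": [1, 4, 2, 3, 6, 5],
    "gene_c": [3, 1, 2, 2, 3, 1],
}

auc_by_feature = {name: feature_auc(values, labels) for name, values in features.items()}
ranking = sorted(auc_by_feature, key=auc_by_feature.get, reverse=True)
print(ranking)  # ['gene_a', 'gene_b', 'gene_c']
```

Ranking features one at a time in this way ignores redundancy between them, which is the limitation the abstract raises.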
Directory of Open Access Journals (Sweden)
JORGE E. CÓRDOBA MAQUILÓN
2011-01-01
Full Text Available Using revealed-preference surveys and psychological questionnaires, a study was carried out to identify key psychological variables of behavior that influence the choice of a transport mode among a group of inhabitants of the Área Metropolitana del Valle de Aburrá. Random utility theory was used for the discrete choice models and reasoned action theory to evaluate beliefs, and the 16 Personality Factor questionnaire (16PF) was used as the tool for analyzing the psychological variables. In addition to the revealed-preference surveys, two other surveys were administered: one on socioeconomic categories and another with latent indicators. This methodology allows discrete choice models and latent variables to be integrated, which makes the approach operational and quantifies the unobservable psychological variables. The most relevant result was that anxiety affects the choice of an urban transport mode; it is also shown that physiological alterations, problems in perception, and beliefs can affect the decision-making process.
Variable selection for modelling effects of eutrophication on stream and river ecosystems
Nijboer, R.C.; Verdonschot, P.F.M.
2004-01-01
Models are needed for forecasting the effects of eutrophication on stream and river ecosystems. Most of the current models do not include differences in local stream characteristics and effects on the biota. To define the most important variables that should be used in a stream eutrophication model,
Simulation of Energy Selective signal Amplification in Gas Environment of Variable Pressure SEM
Czech Academy of Sciences Publication Activity Database
Neděla, Vilém; Konvalina, Ivo; Lencová, B.; Zlámal, J.
2011-01-01
Roč. 17, Suppl. 2 (2011), s. 920-921 ISSN 1431-9276 R&D Projects: GA ČR GAP102/10/1410 Institutional research plan: CEZ:AV0Z20650511 Keywords : variable pressure scanning electron microscopes (VP-SEM) * AQUASEM II * simulation Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 3.007, year: 2011
Selected socio-demographic variables and their effect on the quality ...
African Journals Online (AJOL)
Background: The aim of this study was to determine the effect of some socio-demographic variables on the quality of life of elderly individuals aged 60 to 75 years in Nnewi North Local Government Area of Anambra State, Southeastern Nigeria. Method: A total of 169 subjects, which included 80 females and 89 males who ...
Cortical Response Variability as a Developmental Index of Selective Auditory Attention
Strait, Dana L.; Slater, Jessica; Abecassis, Victor; Kraus, Nina
2014-01-01
Attention induces synchronicity in neuronal firing for the encoding of a given stimulus at the exclusion of others. Recently, we reported decreased variability in scalp-recorded cortical evoked potentials to attended compared with ignored speech in adults. Here we aimed to determine the developmental time course for this neural index of auditory…
Variable Selection Strategies for Small-area Estimation Using FIA Plots and Remotely Sensed Data
Andrew Lister; Rachel Riemann; James Westfall; Mike Hoppus
2005-01-01
The USDA Forest Service's Forest Inventory and Analysis (FIA) unit maintains a network of tens of thousands of georeferenced forest inventory plots distributed across the United States. Data collected on these plots include direct measurements of tree diameter and height and other variables. We present a technique by which FIA plot data and coregistered...
The variability in outdoor concentrations of acrolein, benzene, toluene, ethylbenzene and xylenes (BTEX), and 1,3-butadiene was examined for data measured during summer 2004 of the Detroit Exposure and Aerosol Research Study (DEARS). Results for acrolein indicated no significant...
Temporal variability of selected chemical and physical properties of topsoil of three soil types
Czech Academy of Sciences Publication Activity Database
Jirků, V.; Kodešová, R.; Nikodem, A.; Mühlhanselová, M.; Žigová, Anna
2013-01-01
Roč. 15, - (2013) ISSN 1607-7962. [EGU General Assembly /10./. 07.04.2013-12.04.2013, Vienna] R&D Projects: GA ČR GA526/08/0434 Institutional support: RVO:67985831 Keywords : soil properties * soil types * temporal variability Subject RIV: DF - Soil Science http://meetingorganizer.copernicus.org/EGU2013/EGU2013-7650-1.pdf
Bayesian Independent Component Analysis
DEFF Research Database (Denmark)
Winther, Ole; Petersen, Kaare Brandt
2007-01-01
In this paper we present an empirical Bayesian framework for independent component analysis. The framework provides estimates of the sources, the mixing matrix and the noise parameters, and is flexible with respect to choice of source prior and the number of sources and sensors. Inside the engine...... in a Matlab toolbox, is demonstrated for non-negative decompositions and compared with non-negative matrix factorization.
Arregui, Iñigo
2018-01-01
In contrast to the situation in a laboratory, the study of the solar atmosphere has to be pursued without direct access to the physical conditions of interest. Information is therefore incomplete and uncertain, and inference methods need to be employed to diagnose the physical conditions and processes. One such method, solar atmospheric seismology, makes use of observed and theoretically predicted properties of waves to infer plasma and magnetic field properties. A recent development in solar atmospheric seismology consists in the use of inversion and model comparison methods based on Bayesian analysis. In this paper, the philosophy and methodology of Bayesian analysis are first explained. Then, we provide an account of what has been achieved so far from the application of these techniques to solar atmospheric seismology and a prospect of possible future extensions.
Mørup, Morten; Schmidt, Mikkel N
2012-09-01
Many networks of scientific interest naturally decompose into clusters or communities with comparatively fewer external than internal links; however, current Bayesian models of network communities do not exert this intuitive notion of communities. We formulate a nonparametric Bayesian model for community detection consistent with an intuitive definition of communities and present a Markov chain Monte Carlo procedure for inferring the community structure. A Matlab toolbox with the proposed inference procedure is available for download. On synthetic and real networks, our model detects communities consistent with ground truth, and on real networks, it outperforms existing approaches in predicting missing links. This suggests that community structure is an important structural property of networks that should be explicitly modeled.
Energy Technology Data Exchange (ETDEWEB)
Andrews, Stephen A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Sigeti, David E. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2017-11-15
This is a set of slides about Bayesian hypothesis testing, where many hypotheses are tested. The conclusions are the following: The value of the Bayes factor obtained when using the median of the posterior marginal is almost the minimum value of the Bayes factor. The value of τ^{2} which minimizes the Bayes factor is a reasonable choice for this parameter. This allows a likelihood ratio to be computed that is the least favorable to H_{0}.
Bayesian networks in reliability
Energy Technology Data Exchange (ETDEWEB)
Langseth, Helge [Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7491 Trondheim (Norway)]. E-mail: helgel@math.ntnu.no; Portinale, Luigi [Department of Computer Science, University of Eastern Piedmont ' Amedeo Avogadro' , 15100 Alessandria (Italy)]. E-mail: portinal@di.unipmn.it
2007-01-15
Over the last decade, Bayesian networks (BNs) have become a popular tool for modelling many kinds of statistical problems. We have also seen a growing interest for using BNs in the reliability analysis community. In this paper we will discuss the properties of the modelling framework that make BNs particularly well suited for reliability applications, and point to ongoing research that is relevant for practitioners in reliability.
DEFF Research Database (Denmark)
Antoniou, Constantinos; Harrison, Glenn W.; Lau, Morten I.
2015-01-01
A large literature suggests that many individuals do not apply Bayes’ Rule when making decisions that depend on them correctly pooling prior information and sample data. We replicate and extend a classic experimental study of Bayesian updating from psychology, employing the methods of experimental...... economics, with careful controls for the confounding effects of risk aversion. Our results show that risk aversion significantly alters inferences on deviations from Bayes’ Rule....
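As a point of reference for what "applying Bayes' Rule" means when prior information and sample data are pooled, here is a minimal sketch. The two-urn setup, the function name, and all numbers are hypothetical illustrations; they are not taken from the study described above.

```python
from fractions import Fraction


def posterior_urn_a(prior_a, p_red_a, p_red_b, reds, blues):
    """Posterior probability that draws (with replacement) came from urn A.

    The binomial coefficient is the same under both urns, so it cancels
    and is left out of the likelihoods.
    """
    likelihood_a = p_red_a ** reds * (1 - p_red_a) ** blues
    likelihood_b = p_red_b ** reds * (1 - p_red_b) ** blues
    evidence = prior_a * likelihood_a + (1 - prior_a) * likelihood_b
    return prior_a * likelihood_a / evidence


# Hypothetical urns: A holds 2/3 red balls, B holds 1/3 red balls.
# The sample is 4 red and 2 blue draws.
even_prior = posterior_urn_a(Fraction(1, 2), Fraction(2, 3), Fraction(1, 3), reds=4, blues=2)
low_prior = posterior_urn_a(Fraction(1, 3), Fraction(2, 3), Fraction(1, 3), reds=4, blues=2)
print(even_prior, low_prior)  # 4/5 2/3
```

The same sample yields different posteriors under different priors. A respondent who reports the same answer in both cases is neglecting the prior, which is one kind of deviation from Bayes' Rule.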
Approximate Bayesian recursive estimation
Czech Academy of Sciences Publication Activity Database
Kárný, Miroslav
2014-01-01
Roč. 285, č. 1 (2014), s. 100-111 ISSN 0020-0255 R&D Projects: GA ČR GA13-13502S Institutional support: RVO:67985556 Keywords : Approximate parameter estimation * Bayesian recursive estimation * Kullback–Leibler divergence * Forgetting Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 4.038, year: 2014 http://library.utia.cas.cz/separaty/2014/AS/karny-0425539.pdf
Personalized Audio Systems - a Bayesian Approach
DEFF Research Database (Denmark)
Nielsen, Jens Brehm; Jensen, Bjørn Sand; Hansen, Toke Jansen
2013-01-01
, the present paper presents a general interactive framework for personalization of such audio systems. The framework builds on Bayesian Gaussian process regression in which a model of the user's objective function is updated sequentially. The parameter setting to be evaluated in a given trial is selected...... are optimized using the proposed framework. Twelve test subjects obtain a personalized setting with the framework, and these settings are significantly preferred to those obtained with random experimentation....
Bayesian theory and applications
Dellaportas, Petros; Polson, Nicholas G; Stephens, David A
2013-01-01
The development of hierarchical models and Markov chain Monte Carlo (MCMC) techniques forms one of the most profound advances in Bayesian analysis since the 1970s and provides the basis for advances in virtually all areas of applied and theoretical Bayesian statistics. This volume guides the reader along a statistical journey that begins with the basic structure of Bayesian theory, and then provides details on most of the past and present advances in this field. The book has a unique format. There is an explanatory chapter devoted to each conceptual advance followed by journal-style chapters that provide applications or further advances on the concept. Thus, the volume is both a textbook and a compendium of papers covering a vast range of topics. It is appropriate for a well-informed novice interested in understanding the basic approach, methods and recent applications. Because of its advanced chapters and recent work, it is also appropriate for a more mature reader interested in recent applications and devel...
Bayesian Correlation Analysis for Sequence Count Data.
Directory of Open Access Journals (Sweden)
Daniel Sánchez-Taltavull
Full Text Available Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.
McCarty, James; Parrinello, Michele
2017-11-01
In this paper, we combine two powerful computational techniques, well-tempered metadynamics and time-lagged independent component analysis. The aim is to develop a new tool for studying rare events and exploring complex free energy landscapes. Metadynamics is a well-established and widely used enhanced sampling method whose efficiency depends on an appropriate choice of collective variables. Often the initial choice is not optimal leading to slow convergence. However by analyzing the dynamics generated in one such run with a time-lagged independent component analysis and the techniques recently developed in the area of conformational dynamics, we obtain much more efficient collective variables that are also better capable of illuminating the physics of the system. We demonstrate the power of this approach in two paradigmatic examples.
Bobrowski, Maria; Schickhoff, Udo
2017-04-01
Betula utilis is a major constituent of alpine treeline ecotones in the western and central Himalayan region. The objective of this study is to provide a first analysis of the potential distribution of Betula utilis in the subalpine and alpine belts of the Himalayan region using species distribution modelling. Using Generalized Linear Models (GLM), we aim to examine the climatic factors controlling the species distribution under current climate conditions. Furthermore, we evaluate the predictive ability of climate data derived from different statistical methods. GLMs were created using the least correlated bioclimatic variables derived from two different climate models: 1) interpolated climate data (i.e. Worldclim, Hijmans et al., 2005) and 2) quasi-mechanistic statistical downscaling (i.e. Chelsa; Karger et al., 2016). Model accuracy was evaluated by the ability to predict the potential species distribution range. We found that models based on variables of Chelsa climate data had higher predictive power, whereas models using Worldclim climate data consistently overpredicted the potential suitable habitat for Betula utilis. Although climatic variables of Worldclim are widely used in modelling species distribution, our results suggest treating them with caution when remote regions like the Himalayan mountains are in focus. Unmindful usage of climatic variables for species distribution models can cause misleading projections and may lead to wrong implications and recommendations for nature conservation. References: Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. & Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965-1978. Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N., Linder, H.P. & Kessler, M. (2016) Climatologies at high resolution for the earth land surface areas. arXiv:1607.00217 [physics].
International Nuclear Information System (INIS)
Hoffman, F.O.; Gardner, R.H.; Eckerman, K.F.
1982-06-01
Dose predictions for the ingestion of 90Sr and 137Cs, using aquatic and terrestrial food chain transport models similar to those in the Nuclear Regulatory Commission's Regulatory Guide 1.109, are evaluated through estimating the variability of model parameters and determining the effect of this variability on model output. The variability in the predicted dose equivalent is determined using analytical and numerical procedures. In addition, a detailed discussion is included on 90Sr dosimetry. The overall estimates of uncertainty are most relevant to conditions where site-specific data are unavailable and when model structure and parameter estimates are unbiased. Based on the comparisons performed in this report, it is concluded that the use of the generic default parameters in Regulatory Guide 1.109 will usually produce conservative dose estimates that exceed the 90th percentile of the predicted distribution of dose equivalents. An exception is the meat pathway for 137Cs, in which use of generic default values results in a dose estimate at the 24th percentile. Among the terrestrial pathways of exposure, the non-leafy vegetable pathway is the most important for 90Sr. For 90Sr, the parameters for soil retention, soil-to-plant transfer, and internal dosimetry contribute most significantly to the variability in the predicted dose for the combined exposure to all terrestrial pathways. For 137Cs, the meat transfer coefficient, the mass interception factor for pasture forage, and the ingestion dose factor are the most important parameters. The freshwater finfish bioaccumulation factor is the most important parameter for the dose prediction of 90Sr and 137Cs transported over the water-fish-man pathway.
Oracle Efficient Variable Selection in Random and Fixed Effects Panel Data Models
DEFF Research Database (Denmark)
Kock, Anders Bredahl
This paper generalizes the results for the Bridge estimator of Huang et al. (2008) to linear random and fixed effects panel data models which are allowed to grow in both dimensions. In particular we show that the Bridge estimator is oracle efficient. It can correctly distinguish between relevant...... of Huang et al. (2008). Furthermore, the number of relevant variables is allowed to be larger than the sample size....
Fitzpatrick, Benjamin R; Lamb, David W; Mengersen, Kerrie
2016-01-01
Modern soil mapping is characterised by the need to interpolate point referenced (geostatistical) observations and the availability of large numbers of environmental characteristics for consideration as covariates to aid this interpolation. Modelling tasks of this nature also occur in other fields such as biogeography and environmental science. This analysis employs the Least Angle Regression (LAR) algorithm for fitting Least Absolute Shrinkage and Selection Operator (LASSO) penalized Multiple Linear Regression models. This analysis demonstrates the efficiency of the LAR algorithm at selecting covariates to aid the interpolation of geostatistical soil carbon observations. Where an exhaustive search of the models that could be constructed from 800 potential covariate terms and 60 observations would be prohibitively demanding, LASSO variable selection is accomplished with trivial computational investment.
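The way a LASSO penalty performs variable selection can be illustrated with a short sketch. Note the substitution: the abstract fits the LASSO with the LAR algorithm, whereas the sketch below uses plain cyclic coordinate descent because it is shorter to write out. The function names and the toy data are hypothetical, and nothing here reproduces the soil carbon analysis.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + penalty * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # residual = y - X b, and b starts at zero
    for _ in range(sweeps):
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(v * v for v in column) / n
            if scale == 0.0:
                continue  # an all-zero covariate can never enter the model
            # Covariance of covariate j with the residual that excludes its own contribution.
            rho = sum(v * (r + v * coef[j]) for v, r in zip(column, residual)) / n
            updated = soft_threshold(rho, penalty) / scale
            delta = updated - coef[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(column, residual)]
                coef[j] = updated
    return coef


# Hypothetical toy data: four orthogonal covariates; covariate 1 has a
# strong effect and covariate 2 a negligible one.
X = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]
y = [3.0 * row[1] + 0.1 * row[2] for row in X]

coef = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print(selected)  # [1]
```

The penalty sets the coefficients of weak covariates to exactly zero, so selection comes out of a single fit instead of a search over candidate subsets. The routine does not require fewer covariates than observations.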
Lamb, David W.; Mengersen, Kerrie
2016-01-01
Modern soil mapping is characterised by the need to interpolate point referenced (geostatistical) observations and the availability of large numbers of environmental characteristics for consideration as covariates to aid this interpolation. Modelling tasks of this nature also occur in other fields such as biogeography and environmental science. This analysis employs the Least Angle Regression (LAR) algorithm for fitting Least Absolute Shrinkage and Selection Operator (LASSO) penalized Multiple Linear Regression models. This analysis demonstrates the efficiency of the LAR algorithm at selecting covariates to aid the interpolation of geostatistical soil carbon observations. Where an exhaustive search of the models that could be constructed from 800 potential covariate terms and 60 observations would be prohibitively demanding, LASSO variable selection is accomplished with trivial computational investment. PMID:27603135
Variability-selected active galactic nuclei in the VST-SUDARE/VOICE survey of the COSMOS field
De Cicco, D.; Paolillo, M.; Covone, G.; Falocco, S.; Longo, G.; Grado, A.; Limatola, L.; Botticella, M. T.; Pignata, G.; Cappellaro, E.; Vaccari, M.; Trevese, D.; Vagnetti, F.; Salvato, M.; Radovich, M.; Brandt, W. N.; Capaccioli, M.; Napolitano, N. R.; Schipani, P.
2015-02-01
Context. Active galaxies are characterized by variability at every wavelength, with timescales from hours to years depending on the observing window. Optical variability has proven to be an effective way of detecting AGNs in imaging surveys, lasting from weeks to years. Aims: In the present work we test the use of optical variability as a tool to identify active galactic nuclei in the VST multiepoch survey of the COSMOS field, originally tailored to detect supernova events. Methods: We make use of the multiwavelength data provided by other COSMOS surveys to discuss the reliability of the method and the nature of our AGN candidates. Results: The selection on the basis of optical variability returns a sample of 83 AGN candidates; based on a number of diagnostics, we conclude that 67 of them are confirmed AGNs (81% purity), 12 are classified as supernovae, while the nature of the remaining 4 is unknown. For the subsample of AGNs with some spectroscopic classification, we find that Type 1 are prevalent (89%) compared to Type 2 AGNs (11%). Overall, our approach is able to retrieve on average 15% of all AGNs in the field identified by means of spectroscopic or X-ray classification, with a strong dependence on the source apparent magnitude (completeness ranging from 26% to 5%). In particular, the completeness for Type 1 AGNs is 25%, while it drops to 6% for Type 2 AGNs. The rest of the X-ray selected AGN population presents on average a larger rms variability than the bulk of non-variable sources, indicating that variability detection for at least some of these objects is prevented only by the photometric accuracy of the data. The low completeness is in part due to the short observing span: we show that increasing the temporal baseline results in larger samples as expected for sources with a red-noise power spectrum. Our results allow us to assess the usefulness of this AGN selection technique in view of future wide-field surveys. Observations were provided by the ESO
Piegorsch, W W; Lockhart, A C; Carr, G J; Margolin, B H; Brooks, T; Douglas, G R; Liegibel, U M; Suzuki, T; Thybaud, V; van Delft, J H; Gorelick, N J
1997-02-14
Experimental features of a positive selection transgenic mouse mutation assay based on a lambda lacZ transgene are considered in detail, with emphasis on results using germ cells as the target tissue. Sources of variability in the experimental protocol that can affect the statistical nature of the observations are examined, with the goal of identifying sources of excess variation in the observed mutant frequencies. The sources include plate-to-plate (within packages), package-to-package (within animals), and animal-to-animal variability. Data from five laboratories are evaluated in detail. Results suggest only scattered patterns of excess variability below the animal-to-animal level, but, generally, significant excess variability at the animal-to-animal level. Using source of variability analyses to guide the choice of statistical methods, control-vs-treatment comparisons are performed for assessing the male germ cell mutagenicity of ethylnitrosourea (ENU), isopropyl methanesulfonate (iPMS), and methyl methanesulfonate (MMS). Results on male germ cell mutagenesis of ethyl methanesulfonate (EMS) and methylnitrosourea (MNU) are also reported.
The Relationship between Selected Body Composition Variables and Muscular Endurance in Women
Esco, Michael R.; Olson, Michele S.; Williford, Henry N.
2010-01-01
The primary purpose of this study was to determine if muscular endurance is affected by referenced waist circumference groupings, independent of body mass and subcutaneous abdominal fat, in women. This study also explored whether selected body composition measures were associated with muscular endurance. Eighty-four women were measured for height,…
A DNA-based system for selecting and displaying the combined result of two input variables
DEFF Research Database (Denmark)
Liu, Huajie; Wang, Jianbang; Song, S
2015-01-01
demonstrate this capability in a DNA-based system that takes two input numbers, represented in DNA strands, and returns the result of their multiplication, writing this as a number in a display. Unlike a conventional calculator, this system operates by selecting the result from a library of solutions rather...
Empirically Driven Variable Selection for the Estimation of Causal Effects with Observational Data
Keller, Bryan; Chen, Jianshen
2016-01-01
Observational studies are common in educational research, where subjects self-select or are otherwise non-randomly assigned to different interventions (e.g., educational programs, grade retention, special education). Unbiased estimation of a causal effect with observational data depends crucially on the assumption of ignorability, which specifies…
Directory of Open Access Journals (Sweden)
da Silva SMD
2016-03-01
Silvia Maria Doria da Silva, Ilma Aparecida Paschoal, Eduardo Mello De Capitani, Marcos Mello Moreira, Luciana Campanatti Palhares, Mônica Corso Pereira. Pneumology Service, Department of Internal Medicine, School of Medical Sciences, State University of Campinas (UNICAMP), Campinas, São Paulo, Brazil. Background: Computed tomography (CT) phenotypic characterization helps in understanding the clinical diversity of chronic obstructive pulmonary disease (COPD) patients, but its clinical relevance and its relationship with functional features are not clarified. Volumetric capnography (VC) uses the principle of gas washout and analyzes the pattern of CO2 elimination as a function of expired volume. The main variables analyzed were end-tidal concentration of carbon dioxide (ETCO2), slope of phase 2 (Slp2), and slope of phase 3 (Slp3) of the capnogram, the curve which represents the total amount of CO2 eliminated by the lungs during each breath. Objective: To investigate, in a group of patients with severe COPD, if the phenotypic analysis by CT could identify different subsets of patients, and if there was an association of CT findings and functional variables. Subjects and methods: Sixty-five patients with COPD Gold III–IV were admitted for clinical evaluation, high-resolution CT, and functional evaluation (spirometry, 6-minute walk test [6MWT], and VC). The presence and profusion of tomography findings were evaluated, and later, the patients were identified as having emphysema (EMP) or airway disease (AWD) phenotype. EMP and AWD groups were compared; tomography findings scores were evaluated versus spirometric, 6MWT, and VC variables. Results: Bronchiectasis was found in 33.8% and peribronchial thickening in 69.2% of the 65 patients. Structural findings of airways had no significant correlation with spirometric variables. Air trapping and EMP were strongly correlated with VC variables, but in opposite directions. There was some overlap between the EMP and AWD
Prater, Tracie; Tilson, Will; Jones, Zack
2015-01-01
The absence of an economy of scale in spaceflight hardware makes additive manufacturing an immensely attractive option for propulsion components. As additive manufacturing techniques are increasingly adopted by government and industry to produce propulsion hardware in human-rated systems, significant development efforts are needed to establish these methods as reliable alternatives to conventional subtractive manufacturing. One of the critical challenges facing powder bed fusion techniques in this application is variability between machines used to perform builds. Even with implementation of robust process controls, it is possible for two machines operating at identical parameters with equivalent base materials to produce specimens with slightly different material properties. The machine variability study presented here evaluates 60 specimens of identical geometry built using the same parameters. 30 samples were produced on machine 1 (M1) and the other 30 samples were built on machine 2 (M2). Each of the 30-sample sets were further subdivided into three subsets (with 10 specimens in each subset) to assess the effect of progressive heat treatment on machine variability. The three categories for post-processing were: stress relief, stress relief followed by hot isostatic press (HIP), and stress relief followed by HIP followed by heat treatment per AMS 5664. Each specimen (a round, smooth tensile) was mechanically tested per ASTM E8. Two formal statistical techniques, hypothesis testing for equivalency of means and one-way analysis of variance (ANOVA), were applied to characterize the impact of machine variability and heat treatment on six material properties: tensile stress, yield stress, modulus of elasticity, fracture elongation, and reduction of area. This work represents the type of development effort that is critical as NASA, academia, and the industrial base work collaboratively to establish a path to certification for additively manufactured parts. For future
One-Stage and Bayesian Two-Stage Optimal Designs for Mixture Models
Lin, Hefang
1999-01-01
In this research, Bayesian two-stage D-D optimal designs for mixture experiments with or without process variables under model uncertainty are developed. A Bayesian optimality criterion is used in the first stage to minimize the determinant of the posterior variances of the parameters. The second stage design is then generated according to an optimality procedure that collaborates with the improved model from first stage data. Our results show that the Bayesian two-stage D-D optimal design...
Directory of Open Access Journals (Sweden)
T. Van Vuuren
2012-12-01
Purpose: The purpose of the research that informed this article was to examine the relationship between customer satisfaction, trust, supplier image, commitment and customer loyalty within an optometric practice environment. Problem investigated: Optometric businesses need to adapt their strategies to enhance loyalty, as customer satisfaction is not enough to ensure loyalty and customer retention. An understanding of the variables influencing loyalty could help businesses within the optometric service environment to retain their customers and become more profitable. Methodology: The methodological approach followed was exploratory and quantitative in nature. The sample consisted of 357 customers who visited the practice twice or more over the previous six years. A structured questionnaire, with a five-point Likert scale, was fielded to gather the data. Descriptive and multiple regression analyses were used to analyse the results. Collinearity statistics and Pearson's correlation coefficient were also calculated to determine which independent variable has the largest influence on customer loyalty. Findings and implications: The main finding is that customer satisfaction had the highest correlation with customer loyalty. The other independent variables, however, also appear to significantly influence customer loyalty within an optometric practice environment. The implication is that optometric practices need to focus on customer satisfaction, trust, supplier image and commitment when addressing the improvement of customer loyalty. Originality and value of the research: The article contributes to the improvement of customer loyalty within a service business environment that could assist in facilitating larger market share, higher customer retention and greater profitability for the business over the long term.
Wahid, Abdul; Khan, Dost Muhammad; Hussain, Ijaz
2017-01-01
High dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. To address this issue, different penalized regression procedures have been introduced in the literature, but these methods cannot cope with the problem of outliers and leverage points in heavy-tailed high dimensional data. For this purpose, a new Robust Adaptive Lasso (RAL) method is proposed which is based on a Pearson residuals weighting scheme. The weight function determines the compatibility of each observation and downweights it if it is inconsistent with the assumed model. It is observed that the RAL estimator can correctly select the covariates with non-zero coefficients and can estimate parameters, simultaneously, not only in the presence of influential observations, but also in the presence of high multicollinearity. We also discuss the model selection oracle property and the asymptotic normality of the RAL. Simulation findings and real data examples also demonstrate the better performance of the proposed penalized regression approach.
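The combination described above (observation weights that downweight suspect rows, plus coefficient-specific lasso penalties) can be sketched as a weighted coordinate descent. This is a minimal illustration and not the authors' RAL implementation: the function names, the hard 0/1 observation weights, and the toy data are all hypothetical, and in the paper the observation weights come from Pearson residuals rather than being supplied by hand.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_adaptive_lasso(X, y, obs_w, pen_w, lam, n_iter=100):
    """Coordinate descent (hypothetical helper) for the objective
    (1/2n) * sum_i obs_w[i] * (y[i] - x_i . beta)^2 + lam * sum_j pen_w[j] * |beta[j]|.
    Assumes every column has a nonzero weighted sum of squares."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0    # weighted correlation of column j with the partial residual
            denom = 0.0  # weighted sum of squares of column j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += obs_w[i] * X[i][j] * (y[i] - fit_without_j)
                denom += obs_w[i] * X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam * pen_w[j]) / (denom / n)
    return beta

# Toy data (made up): the third response is a gross outlier and gets weight 0.
X_demo = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.0, 2.0]]
y_demo = [6.0, 0.2, 60.0, 0.2]
beta_demo = weighted_adaptive_lasso(
    X_demo, y_demo, obs_w=[1.0, 1.0, 0.0, 1.0], pen_w=[1.0, 1.0], lam=0.5
)
```

In this toy case the outlying row has no influence on the fit, the first coefficient is shrunk but kept, and the second is set exactly to zero, which is the selection behaviour the penalty is meant to produce.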
Selected topics in the classical theory of functions of a complex variable
Heins, Maurice
2014-01-01
Elegant and concise, this text is geared toward advanced undergraduate students acquainted with the theory of functions of a complex variable. The treatment presents such students with a number of important topics from the theory of analytic functions that may be addressed without erecting an elaborate superstructure. These include some of the theory's most celebrated results, which seldom find their way into a first course. After a series of preliminaries, the text discusses properties of meromorphic functions, the Picard theorem, and harmonic and subharmonic functions. Subsequent topics incl
Select injury-related variables are affected by stride length and foot strike style during running.
Boyer, Elizabeth R; Derrick, Timothy R
2015-09-01
Some frontal plane and transverse plane variables have been associated with running injury, but it is not known if they differ with foot strike style or as stride length is shortened. To identify if step width, iliotibial band strain and strain rate, positive and negative free moment, pelvic drop, hip adduction, knee internal rotation, and rearfoot eversion differ between habitual rearfoot and habitual mid-/forefoot strikers when running with both a rearfoot strike (RFS) and a mid-/forefoot strike (FFS) at 3 stride lengths. Controlled laboratory study. A total of 42 healthy runners (21 habitual rearfoot, 21 habitual mid-/forefoot) ran overground at 3.35 m/s with both a RFS and a FFS at their preferred stride lengths and 5% and 10% shorter. Variables did not differ between habitual groups. Step width was 1.5 cm narrower for FFS, widening to 0.8 cm as stride length shortened. Iliotibial band strain and strain rate did not differ between foot strikes but decreased as stride length shortened (0.3% and 1.8%/s, respectively). Pelvic drop was reduced 0.7° for FFS compared with RFS, and both pelvic drop and hip adduction decreased as stride length shortened (0.8° and 1.5°, respectively). Peak knee internal rotation was not affected by foot strike or stride length. Peak rearfoot eversion was not different between foot strikes but decreased 0.6° as stride length shortened. Peak positive free moment (normalized to body weight [BW] and height [h]) was not affected by foot strike or stride length. Peak negative free moment was -0.0038 BW·m/h greater for FFS and decreased -0.0004 BW·m/h as stride length shortened. The small decreases in most variables as stride length shortened were likely associated with the concomitant wider step width. RFS had slightly greater pelvic drop, while FFS had slightly narrower step width and greater negative free moment. Shortening one's stride length may decrease or at least not increase propensity for running injuries based on the variables
Computed ABC Analysis for Rational Selection of Most Informative Variables in Multivariate Data
Ultsch, Alfred; Lötsch, Jörn
2015-01-01
Objective: Multivariate data sets often differ in several factors or derived statistical parameters, which have to be selected for a valid interpretation. Basing this selection on traditional statistical limits leads occasionally to the perception of losing information from a data set. This paper proposes a novel method for calculating precise limits for the selection of parameter sets. Methods: The algorithm is based on an ABC analysis and calculates these limits on the basis of the mathematical properties of the distribution of the analyzed items. The limits implement the aim of any ABC analysis, i.e., comparing the increase in yield to the required additional effort. In particular, the limit for set A, the “important few”, is optimized in a way that both the effort and the yield for the other sets (B and C) are minimized and the additional gain is optimized. Results: As a typical example from biomedical research, the feasibility of the ABC analysis as an objective replacement for classical subjective limits to select highly relevant variance components of pain thresholds is presented. The proposed method improved the biological interpretation of the results and increased the fraction of valid information that was obtained from the experimental data. Conclusions: The method is applicable to many further biomedical problems including the creation of diagnostic complex biomarkers or short screening tests from comprehensive test batteries. Thus, the ABC analysis can be proposed as a mathematically valid replacement for traditional limits to maximize the information obtained from multivariate research data. PMID:26061064
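The trade-off between effort and yield described above can be illustrated with a simplified sketch. It is not the published algorithm: it assumes positive item values, builds the cumulative effort/yield curve, and takes as set A the top items up to the curve point closest to the ideal corner (no effort, full yield). The function names and the example values are hypothetical.

```python
import math

def abc_curve(values):
    """Sort items by decreasing value and return them with the curve points
    (fraction of items used, fraction of total value captured)."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    points = []
    running = 0.0
    for k, v in enumerate(ordered, start=1):
        running += v
        points.append((k / n, running / total))
    return ordered, points

def set_a_items(values):
    """Top items up to the curve point nearest to the ideal (effort 0, yield 1)."""
    ordered, points = abc_curve(values)
    distances = [math.hypot(effort, 1.0 - gain) for effort, gain in points]
    k = distances.index(min(distances)) + 1
    return ordered[:k]

# Made-up item values, e.g. variance explained by six components.
important_few = set_a_items([3, 50, 5, 30, 2, 10])
```

Here the two largest items already capture 80% of the total with a third of the effort, and adding a third item moves the curve point farther from the ideal corner, so the sketch stops at two.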
Travelling green : Variables influencing students’ intention to select a green hotel
Lindqvist, Julia; Andersson, Mikaela
2015-01-01
Problematization: Tourism has a major impact on the environment. However, there is a conflict of interest making it difficult for the hotel business to decrease this impact. On the one hand, there is a pressure for environmentally friendly behaviour from society. On the other hand, the customers want to be pampered during their hotel stay. This makes it necessary to further investigate what influences customers’ intention to select a green hotel. Therefore this thesis examines students’ inten...
Bayesian Modelling of Functional Whole Brain Connectivity
DEFF Research Database (Denmark)
Røge, Rasmus
This thesis deals with parcellation of whole-brain functional magnetic resonance imaging (fMRI) using Bayesian inference with mixture models tailored to the fMRI data. In the three included papers and manuscripts, we analyze two different approaches to modeling fMRI signal; either we accept...... the prevalent strategy of standardizing fMRI time series and model data using directional statistics or we model the variability in the signal across the brain and across multiple subjects. In either case, we use Bayesian nonparametric modeling to automatically learn from the fMRI data the number...... of functional units, i.e. parcels. We benchmark the proposed mixture models against state-of-the-art methods of brain parcellation, both probabilistic and non-probabilistic. The time series of each voxel are most often standardized using z-scoring which projects the time series data onto a hypersphere...
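The remark about z-scoring and the hypersphere can be checked numerically. The sketch below is an illustration only (not code from the thesis, and the toy series is made up): after standardizing a series of length T to zero mean and unit population variance, its Euclidean norm is always sqrt(T), so all standardized series lie on a sphere of that radius.

```python
import math

def zscore(series):
    """Standardize to zero mean and unit (population) variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / std for x in series]

# Hypothetical toy "voxel" time series, for illustration only.
toy_series = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 1.0, 3.0]
z = zscore(toy_series)

# Euclidean norm of the standardized series; equals sqrt(len(toy_series)).
norm = math.sqrt(sum(v * v for v in z))
```

Because only the direction of each standardized series carries information, distributions on the sphere (directional statistics) are a natural modeling choice for such data.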
Bayesian analysis in plant pathology.
Mila, A L; Carriquiry, A L
2004-09-01
ABSTRACT Bayesian methods are currently much discussed and applied in several disciplines from molecular biology to engineering. Bayesian inference is the process of fitting a probability model to a set of data and summarizing the results via probability distributions on the parameters of the model and unobserved quantities such as predictions for new observations. In this paper, after a short introduction of Bayesian inference, we present the basic features of Bayesian methodology using examples from sequencing genomic fragments and analyzing microarray gene-expressing levels, reconstructing disease maps, and designing experiments.
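A minimal sketch of the workflow described above, using the simplest conjugate case rather than any example from the paper: a Beta prior on a disease incidence, binomial data, a Beta posterior, and a posterior predictive probability for a new observation. The counts and function names are hypothetical.

```python
from fractions import Fraction

def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)

# Made-up data: 3 diseased plants among 10 inspected, uniform Beta(1, 1) prior.
post_a, post_b = beta_binomial_update(1, 1, successes=3, failures=7)

# Posterior summary of the incidence parameter.
posterior_mean = beta_mean(post_a, post_b)

# For a single new plant, the posterior predictive probability of disease
# equals the posterior mean of the incidence.
predictive_next_diseased = posterior_mean
```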
Directory of Open Access Journals (Sweden)
K. KAMALAKKANNAN
2011-06-01
The purpose of this study is to analyze the effect of aquatic plyometric training with and without the use of weights on selected physical fitness variables among volleyball players. To achieve the purpose of this study, 36 physically active undergraduate volleyball players between 18 and 20 years of age volunteered as participants. The participants were randomly categorized into three groups of 12 each: a control group (CG), an aquatic plyometric training with weight group (APTWG), and an aquatic plyometric training without weight group (APTWOG). The subjects of the control group were not exposed to any training. Both experimental groups underwent their respective experimental treatment for 12 weeks, 3 days per week, with a single session on each day. Speed, endurance, and explosive power were measured as the dependent variables for this study. 36 days of experimental treatment were conducted for all the groups, and pre and post data were collected. The collected data were analyzed using an analysis of covariance (ANCOVA), followed by a Scheffé’s post hoc test. The results revealed significant differences between groups on all the selected dependent variables. This study demonstrated that aquatic plyometric training can be one effective means for improving speed, endurance, and explosive power in volleyball players.
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected
Murphy, Hannah M; Warren-Myers, Fletcher W; Jenkins, Gregory P; Hamer, Paul A; Swearer, Stephen E
2014-08-01
In fishes, the growth-mortality hypothesis has received broad acceptance as a driver of recruitment variability. Recruitment is likely to be lower in years when the risk of starvation and predation in the larval stage is greater, leading to higher mortality. Juvenile snapper, Pagrus auratus (Sparidae), experience high recruitment variation in Port Phillip Bay, Australia. Using a 5-year (2005, 2007, 2008, 2010, 2011) data set of larval and juvenile snapper abundances and their daily growth histories, based on otolith microstructure, we found selective mortality acted on larval size at 5 days post-hatch in 4 low and average recruitment years. The highest recruitment year (2005) was characterised by no size-selective mortality. Larval growth of the initial larval population was related to recruitment, but larval growth of the juveniles was not. Selective mortality may have obscured the relationship between larval traits of the juveniles and recruitment as fast-growing and large larvae preferentially survived in lower recruitment years and fast growth was ubiquitous in high recruitment years. An index of daily mortality within and among 3 years (2007, 2008, 2010), where zooplankton were concurrently sampled with ichthyoplankton, was related to per capita availability of preferred larval prey, providing support for the match-mismatch hypothesis. In 2010, periods of low daily mortality resulted in no selective mortality. Thus both intra- and inter-annual variability in the magnitude and occurrence of selective mortality in species with complex life cycles can obscure relationships between larval traits and population replenishment, leading to underestimation of their importance in recruitment studies.
Johnson, Darren W; Monro, Keyne; Marshall, Dustin J
2013-05-01
Why are sperm so variable despite having a singular, critical function and an intimate relationship with fitness? A key to understanding the evolution of sperm morphology is identifying which traits enable sperm to be successful fertilizers. Several sperm traits (e.g., tail length, overall size) are implicated in sperm performance, but the benefits of these traits are likely to be highly context dependent. Here, we examined phenotypic selection on sperm morphology of a broadcast spawning tube worm (Galeolaria gemineoa). We conducted laboratory experiments to measure the relationship between average sperm morphology and relative fertilization success across a range of sperm environments that were designed to approximate the range of sperm concentrations and ages encountered by eggs in nature. We found that the strength and form of multivariate selection varied substantially across our environmental gradients. Sperm with long tails and small heads were favored in high-concentration environments, whereas sperm with long heads were favored at low concentrations and old ages. We suggest variation in the local fertilization environment and resulting differences in selection can preserve variability in sperm morphology both within and among males. © 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution.
Variable selection methods in PLS regression - a comparison study on metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach
Partial least squares regression (PLSR) has been applied to various fields such as psychometrics, consumer science, econometrics and process control. Recently it has been applied to metabolomics based data sets (GC/LC-MS, NMR) and proven to be very powerful in situations with many variables...... for the purpose of reducing over-fitting problems and providing useful interpretation tools. It has excellent possibilities for giving a graphical overview of sample and variation patterns. It can handle co-linearity in an efficient way and make it possible to use different highly correlated data sets in one...... Integrating Omics data. Statistical Applications in Genetics and Molecular Biology, 7:Article 35, 2008. 2. Martens H and Martens M. Modified Jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR). Food Quality and Preference, 11:5-16, 2000....
Bogdanovski, Rumen G.
2015-06-01
The observations of variable stars, especially those which show fast changes in their brightness, require high speed and high precision photometry. In order to study events like low amplitude optical oscillations and small scale fluctuations in the light curves, synchronous observations are required. These observations have to be carried out simultaneously at two or more, preferably distant, sites (Romanyuk et al., 2001), which allows the identification and elimination of artifacts produced by the equipment and the atmospheric interferences. In this way the fine structure of the light curve is revealed with a significant certainty. In order to study these events a new high speed time synchronized photometric system had to be designed, which addresses the requirements of the observations of high frequency subtle phenomena during stellar flares. It provides remote automated and centralized control of the photometric equipment over a computer network, as well as remote monitoring. Furthermore, some preliminary data processing can be performed at the time the data is obtained.
Directory of Open Access Journals (Sweden)
Concepcion Martires
1990-12-01
This study sought to determine the relationship between the conflict management styles of managers and certain management and organization factors. A total of 462 top, middle, and lower managers from 72 companies participated in the study, which utilized the Thomas-Kilmann Conflict Mode Instrument. To facilitate the computation of the statistical data, a microcomputer and a software package were used. The majority of the managers of the 17 types of organization included in the study use the collaborative mode of managing conflict. This finding is congruent with the findings of past studies conducted on managers of commercial banks, service, manufacturing, trading, advertising, appliance, investment houses, and overseas recruitment industries showing their high degree of objectivity and assertiveness of their own personal goals and of other people's concerns. The second dominant style, which is compromising, indicates their desire in sharing and searching for solutions that result in satisfaction among conflicting parties. This finding is highly consistent with the strong Filipino value of smooth interpersonal relationships (SIR) as reflected and discussed in the numerous researches on Filipino values. The chi-square tests generated by the statistical software package showed independence between the manager's conflict management styles and each of the variables of sex, civil status, position level at work, work experience, type of corporation, and number of subordinates. This result is again congruent with those of past studies conducted in the Philippines. The past and present findings may imply that conflict management mode may be a highly personal style that is not dependent on any of these variables included in the study. However, the chi-square tests show that management style is dependent on the manager's age and educational attainment.
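For readers unfamiliar with the chi-square test of independence used above, here is a minimal sketch of the Pearson statistic on a contingency table. The counts are invented for illustration and are not data from the study; no continuity correction is applied, and the function name is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency
    table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Made-up 2x2 table: rows = two age groups, columns = collaborating vs. other style.
stat, dof = chi_square_independence([[20, 30], [30, 20]])
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic above that would lead to rejecting independence for this invented table.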
Energy Technology Data Exchange (ETDEWEB)
Putnam, Russell A., E-mail: putnamr@uwindsor.ca [Department of Physics, University of Windsor, Windsor, Ontario N9B 3P4 (Canada); Mohaidat, Qassem I., E-mail: q.muhaidat@yu.edu.jo [Department of Physics, Yarmouk University, Irbid 21163 (Jordan); Daabous, Andrew, E-mail: daabousa@uwindsor.ca [Department of Physics, University of Windsor, Windsor, Ontario N9B 3P4 (Canada); Rehse, Steven J., E-mail: rehse@uwindsor.ca [Department of Physics, University of Windsor, Windsor, Ontario N9B 3P4 (Canada)
2013-09-01
Laser-induced breakdown spectroscopy has been used to obtain spectral fingerprints from live bacterial specimens from thirteen distinct taxonomic bacterial classes representative of five bacterial genera. By taking sums, ratios, and complex ratios of measured atomic emission line intensities three unique sets of independent variables (models) were constructed to determine which choice of independent variables provided optimal genus-level classification of unknown specimens utilizing a discriminant function analysis. A model composed of 80 independent variables constructed from simple and complex ratios of the measured emission line intensities was found to provide the greatest sensitivity and specificity. This model was then used in a partial least squares discriminant analysis to compare the performance of this multivariate technique with a discriminant function analysis. The partial least squares discriminant analysis possessed a higher true positive rate, possessed a higher false positive rate, and was more effective at distinguishing between highly similar spectra from closely related bacterial genera. This suggests it may be the preferred multivariate technique in future species-level or strain-level classifications. - Highlights: • Laser-induced breakdown spectroscopy was used to classify bacteria by genus. • We examine three different independent variable down selection models. • A PLS-DA returned higher rates of true positives than a DFA. • A PLS-DA returned higher rates of false positives than a DFA. • A PLS-DA was better able to discriminate similar spectra compared to DFA.
Genome scans for detecting footprints of local adaptation using a Bayesian factor model.
Duforet-Frebourg, Nicolas; Bazin, Eric; Blum, Michael G B
2014-09-01
There is a considerable impetus in population genomics to pinpoint loci involved in local adaptation. A powerful approach to find genomic regions subject to local adaptation is to genotype numerous molecular markers and look for outlier loci. One of the most common approaches for selection scans is based on statistics that measure population differentiation such as FST. However, there are important caveats with approaches related to FST because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here, we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. In order to identify outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that it can achieve a 2-fold or more reduction of false discovery rate compared with the software BayeScan or with an FST approach. We show that our software can handle large data sets by analyzing the single nucleotide polymorphisms of the Human Genome Diversity Project. The Bayesian factor model is implemented in the open-source PCAdapt software. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Directory of Open Access Journals (Sweden)
Abdul Wahid
High dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. To address this issue, different penalized regression procedures have been introduced in the literature, but these methods cannot cope with the problem of outliers and leverage points in heavy-tailed high dimensional data. For this purpose, a new Robust Adaptive Lasso (RAL) method is proposed which is based on a Pearson residuals weighting scheme. The weight function determines the compatibility of each observation and downweights it if it is inconsistent with the assumed model. It is observed that the RAL estimator can correctly select the covariates with non-zero coefficients and can estimate parameters, simultaneously, not only in the presence of influential observations, but also in the presence of high multicollinearity. We also discuss the model selection oracle property and the asymptotic normality of the RAL. Simulation findings and real data examples also demonstrate the better performance of the proposed penalized regression approach.
Rigét, Frank; Vorkamp, Katrin; Hobson, Keith A; Muir, Derek C G; Dietz, Rune
2013-09-01
Temporal trends of selected POPs (PCB-52 and 153, p,p'-DDE, HCB, α- and β-HCH) in blubber of ringed seals (Pusa hispida) collected from the early 1990s to 2010 from central West Greenland were studied. In this period, the climate of Greenland warmed and the influences of climate indices such as winter sea-ice coverage (November-May), the number of sea-ice days during winter in Disko Bay, water temperature and salinity at Fyllas Banke during the preceding summer and the Arctic Oscillation Index (AOI) during the preceding winter on concentrations of selected POPs were evaluated using multiple regressions and an information-theoretic approach. Biological co-variables such as age, sex and trophic position (as determined by δ(15)N analysis) of seals were also evaluated. Decreasing levels of the selected POPs were found in all cases and with the highest rate for α-HCH (-10.5% annually) and the lowest rate for β-HCH (-1.9% annually). Sex and age were found to have strong predictive power in the case of PCB-52 and trophic position in the case of p,p'-DDE. Among the climate indices the strongest predictive power was found for the number of sea-ice days in the case of PCB-52, the AOI winter index in the case of α-HCH and salinity at Fyllas Banke during the preceding summer in the case of β-HCH. The present study documents the need for including both biological variables and climate variability parameters in temporal trend studies of POPs in Arctic biota.
Congdon, Peter
2014-01-01
This book provides an accessible approach to Bayesian computing and data analysis, with an emphasis on the interpretation of real data sets. Following in the tradition of the successful first edition, this book aims to make a wide range of statistical modeling applications accessible using tested code that can be readily adapted to the reader's own applications. The second edition has been thoroughly reworked and updated to take account of advances in the field. A new set of worked examples is included. The novel aspect of the first edition was the coverage of statistical modeling using WinBU
Temporal variability of selected chemical and physical properties of topsoil of three soil types
Jirku, Veronika; Kodesova, Radka; Nikodem, Antonin; Muhlhanselova, Marcela; Zigova, Anna
2013-04-01
Temporal variability of soil properties measured in surface horizons of three soil types (Haplic Cambisol, Greyic Phaeozem, Haplic Luvisol) was studied in years 2007, 2008, 2009 and 2010. Undisturbed soil samples were taken every month to evaluate the actual field soil-water content, bulk density, porosity and hydraulic properties. The grab soil samples were taken every month to evaluate aggregate stability using the WSA (water stable aggregates) index, pH(H2O) and pH(KCl), soil organic matter content and quality. Unsaturated hydraulic conductivity for pressure head of -2 cm was measured directly in the field using the minidisk tension infiltrometer. In addition, soil structure was documented on micromorphological images. In some cases, similar trends of the pH(H2O), pH(KCl), A400/A600, ρd, P, θfield or WSA values were observed in different soils. Interestingly, similar trends were found mostly for the Haplic Cambisol and the Greyic Phaeozem despite the fact that these soils considerably differed (different soil substrate, pedogenetic processes, etc.) and that variable crops (winter wheat and spring wheat) were planted at both locations during two years (2007 and 2006). Mostly different trends were observed for the Haplic Luvisol and the Greyic Phaeozem (soil of the same substrate). The reason could be attributed to a high vulnerability of the Haplic Luvisol to soil degradation in comparison to that of the Greyic Phaeozem. Parameters of hydraulic properties were highly variable and did not show similar trends for different soils (except the saturated soil water content and the slope of the retention curve at the inflection point for Haplic Cambisol and Greyic Phaeozem). Soil structure, aggregate stability and soil hydraulic properties were interrelated and depended on plant growth, rainfall compaction and tillage. The drier conditions in some soils positively influenced the soil aggregate stability, the slope of the retention curve at the inflection point and
A Bayesian semiparametric Markov regression model for juvenile dermatomyositis.
De Iorio, Maria; Gallot, Natacha; Valcarcel, Beatriz; Wedderburn, Lucy
2018-02-20
Juvenile dermatomyositis (JDM) is a rare autoimmune disease that may lead to serious complications, even to death. We develop a 2-state Markov regression model in a Bayesian framework to characterise disease progression in JDM over time and gain a better understanding of the factors influencing disease risk. The transition probabilities between disease and remission state (and vice versa) are a function of time-homogeneous and time-varying covariates. These latter types of covariates are introduced in the model through a latent health state function, which describes patient-specific health over time and accounts for variability among patients. We assume a nonparametric prior based on the Dirichlet process to model the health state function and the baseline transition intensities between disease and remission state and vice versa. The Dirichlet process induces a clustering of the patients in homogeneous risk groups. To highlight clinical variables that most affect the transition probabilities, we perform variable selection using spike and slab prior distributions. Posterior inference is performed through Markov chain Monte Carlo methods. Data were made available from the UK JDM Cohort and Biomarker Study and Repository, hosted at the UCL Institute of Child Health. Copyright © 2018 John Wiley & Sons, Ltd.
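To make the link between transition intensities and transition probabilities concrete, here is a minimal sketch for a 2-state continuous-time Markov chain with constant intensities. This is a deliberate simplification: the model in the abstract lets the intensities depend on covariates and a latent health state, which is not reproduced here. The function and parameter names are hypothetical.

```python
import math


def two_state_transition_matrix(q_dr: float, q_rd: float, t: float):
    """Transition probabilities over a time span t for a 2-state Markov chain.

    q_dr: constant intensity for disease -> remission
    q_rd: constant intensity for remission -> disease
    Rows and columns are ordered (disease, remission).
    """
    total = q_dr + q_rd
    if q_dr < 0 or q_rd < 0 or total <= 0 or t < 0:
        raise ValueError("intensities must be non-negative (not both zero) and t >= 0")
    decay = math.exp(-total * t)
    p_dr = q_dr / total * (1.0 - decay)  # disease -> remission within time t
    p_rd = q_rd / total * (1.0 - decay)  # remission -> disease within time t
    return [[1.0 - p_dr, p_dr],
            [p_rd, 1.0 - p_rd]]


matrix = two_state_transition_matrix(q_dr=0.5, q_rd=0.25, t=2.0)
for row in matrix:
    print([round(p, 3) for p in row])
```

As t grows, both rows approach the same long-run distribution, with the remission share equal to q_dr / (q_dr + q_rd).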
Directory of Open Access Journals (Sweden)
Anna Zbierska
2016-09-01
The paper presents the evaluation of seasonal and long-term changes in selected nutrients of three lakes of the Poznań Lakeland. The lakes were selected due to the high risk of pollution from agricultural and residential areas. Water samples were taken at 6 control points in the spring, summer and autumn, from 2004 to 2014. Trophic status of the lakes was evaluated based on the concentration of nutrients (nitrates, nitrites, ammonium, nitrogen and phosphorus) and indicators of eutrophication. Studies have shown that the concentration of nutrients varied greatly both in individual years and seasons of the analyzed decade, especially in Lakes Niepruszewskie and Pamiątkowskie. The main problem is the high concentration of nitrates. In general, it showed an upward trend until 2013, especially in the spring. This may indicate that actions restricting runoff pollution from agricultural sources have not been fully effective. On the other hand, a marked downward trend in the concentrations of NH4 over the years from 2004 to 2014, especially after 2007, indicates a gradual improvement of wastewater management. Moreover, seasonal variation in NH4 concentrations differed from that of NO3 and NO2. The highest values were reported in the autumn season, the lowest in the summer. Concentrations of nutrients and eutrophication indexes reached high values in all analysed lakes, indicating a eutrophic or hypertrophic state of the lakes. The high value of the N:P ratio indicates that the lakes had a huge surplus of nitrogen, and that phosphorus is the productivity-limiting factor.
Metrics for evaluating performance and uncertainty of Bayesian network models
Bruce G. Marcot
2012-01-01
This paper presents a selected set of existing and new metrics for gauging Bayesian network model performance and uncertainty. Selected existing and new metrics are discussed for conducting model sensitivity analysis (variance reduction, entropy reduction, case file simulation); evaluating scenarios (influence analysis); depicting model complexity (numbers of model...
Impact of oil price shocks on selected macroeconomic variables in Nigeria
International Nuclear Information System (INIS)
Iwayemi, Akin; Fowowe, Babajide
2011-01-01
The impact of oil price shocks on the macroeconomy has received a great deal of attention since the 1970s. Initially, many empirical studies found a significant negative relationship between oil price shocks and GDP but, more recently, empirical studies have reported an insignificant relationship between oil shocks and the macroeconomy. A key feature of existing research is that it applies predominantly to advanced, oil-importing countries. For oil-exporting countries, different conclusions are expected but this can only be ascertained empirically. This study conducts an empirical analysis of the effects of oil price shocks on a developing-country oil exporter, Nigeria. Our findings showed that oil price shocks do not have a major impact on most macroeconomic variables in Nigeria. The results of the Granger-causality tests, impulse response functions, and variance decomposition analysis all showed that different measures of linear and positive oil shocks have not caused output, government expenditure, inflation, and the real exchange rate. The tests support the existence of asymmetric effects of oil price shocks because we find that negative oil shocks significantly cause output and the real exchange rate.
Application of SEAWAT to select variable-density and viscosity problems
Dausman, Alyssa M.; Langevin, Christian D.; Thorne, Danny T.; Sukop, Michael C.
2010-01-01
SEAWAT is a combined version of MODFLOW and MT3DMS, designed to simulate three-dimensional, variable-density, saturated groundwater flow. The most recent version of the SEAWAT program, SEAWAT Version 4 (or SEAWAT_V4), supports equations of state for fluid density and viscosity. In SEAWAT_V4, fluid density can be calculated as a function of one or more MT3DMS species, and optionally, fluid pressure. Fluid viscosity is calculated as a function of one or more MT3DMS species, and the program also includes additional functions for representing the dependence of fluid viscosity on temperature. This report documents testing of and experimentation with SEAWAT_V4 with six previously published problems that include various combinations of density-dependent flow due to temperature variations and/or concentration variations of one or more species. Some of the problems also include variations in viscosity that result from temperature differences in water and oil. The results of SEAWAT_V4 are generally consistent with other published results, with minor differences considered acceptable.
On the selection of significant variables in a model for the deteriorating process of facades
Serrat, C.; Gibert, V.; Casas, J. R.; Rapinski, J.
2017-10-01
In previous works the authors of this paper have introduced a predictive system that uses survival analysis techniques for the study of time-to-failure in the facades of a building stock. The approach is population based, in order to obtain information on the evolution of the stock across time, and to help the manager in the decision making process on global maintenance strategies. For the decision making it is crucial to determine those covariates (such as materials, morphology and characteristics of the facade, orientation or environmental conditions) that play a significant role in the progression of different failures. The proposed platform also incorporates an open source GIS plugin that includes survival and test modules that allow the investigator to model the time until a lesion, taking into account the variables collected during the inspection process. The aim of this paper is twofold: a) to briefly introduce the predictive system, as well as the inspection and the analysis methodologies, and b) to introduce and illustrate the modeling strategy for the deteriorating process of an urban front. The illustration will be focused on the city of L’Hospitalet de Llobregat (Barcelona, Spain) in which more than 14,000 facades have been inspected and analyzed.
A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.
Glas, Cees A. W.; Meijer, Rob R.
A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
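The posterior predictive check described above can be sketched in a few lines. The example assumes that, for each MCMC draw, the discrepancy has already been computed once on the observed data and once on data replicated under that draw; how those values are produced depends on the IRT model and is not shown. The function name and the numbers are hypothetical.

```python
def posterior_predictive_pvalue(observed, replicated):
    """Share of posterior draws in which the replicated discrepancy is at
    least as large as the observed discrepancy.

    observed[i] and replicated[i] are discrepancy values computed under the
    same posterior draw i.
    """
    if len(observed) != len(replicated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    exceed = sum(1 for obs, rep in zip(observed, replicated) if rep >= obs)
    return exceed / len(observed)


# Made-up discrepancy values for four posterior draws (illustration only).
obs_values = [1.0, 3.0, 2.0, 0.0]
rep_values = [2.0, 1.0, 2.0, 5.0]
print(posterior_predictive_pvalue(obs_values, rep_values))
```

Values close to 0 or 1 indicate that the observed discrepancy sits in a tail of its posterior predictive distribution, which is the signal a person-fit check looks for.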
Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi
2018-04-01
Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral interval selection method called clustering based partial least square (CL-PLS), which selected informative wavelengths by combining the clustering concept and partial least square (PLS) methods to improve the performance of models based on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were carried out to group the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and the full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.
Classification using Bayesian neural nets
J.C. Bioch (Cor); O. van der Meer; R. Potharst (Rob)
1995-01-01
Recently, Bayesian methods have been proposed for neural networks to solve regression and classification problems. These methods claim to overcome some difficulties encountered in the standard approach such as overfitting. However, an implementation of the full Bayesian approach to
Bayesian Data Analysis (lecture 1)
CERN. Geneva
2018-01-01
framework but we will also go into more detail and discuss for example the role of the prior. The second part of the lecture will cover further examples and applications that heavily rely on the bayesian approach, as well as some computational tools needed to perform a bayesian analysis.
Bayesian Data Analysis (lecture 2)
CERN. Geneva
2018-01-01
framework but we will also go into more detail and discuss for example the role of the prior. The second part of the lecture will cover further examples and applications that heavily rely on the bayesian approach, as well as some computational tools needed to perform a bayesian analysis.
Xu, Yunfei; Dass, Sarat; Maiti, Tapabrata
2016-01-01
This brief introduces a class of problems and models for the prediction of the scalar field of interest from noisy observations collected by mobile sensor networks. It also introduces the problem of optimal coordination of robotic sensors to maximize the prediction quality subject to communication and mobility constraints either in a centralized or distributed manner. To solve such problems, fully Bayesian approaches are adopted, allowing various sources of uncertainties to be integrated into an inferential framework effectively capturing all aspects of variability involved. The fully Bayesian approach also allows the most appropriate values for additional model parameters to be selected automatically by data, and the optimal inference and prediction for the underlying scalar field to be achieved. In particular, spatio-temporal Gaussian process regression is formulated for robotic sensors to fuse multifactorial effects of observations, measurement noise, and prior distributions for obtaining the predictive di...
The Bayesian Covariance Lasso.
Khondker, Zakaria S; Zhu, Hongtu; Chu, Haitao; Lin, Weili; Ibrahim, Joseph G
2013-04-01
Estimation of sparse covariance matrices and their inverse subject to positive definiteness constraints has drawn a lot of attention in recent years. The abundance of high-dimensional data, where the sample size (n) is less than the dimension (d), requires shrinkage estimation methods since the maximum likelihood estimator is not positive definite in this case. Furthermore, when n is larger than d but not sufficiently larger, shrinkage estimation is more stable than maximum likelihood as it reduces the condition number of the precision matrix. Frequentist methods have utilized penalized likelihood methods, whereas Bayesian approaches rely on matrix decompositions or Wishart priors for shrinkage. In this paper we propose a new method, called the Bayesian Covariance Lasso (BCLASSO), for the shrinkage estimation of a precision (covariance) matrix. We consider a class of priors for the precision matrix that leads to the popular frequentist penalties as special cases, develop a Bayes estimator for the precision matrix, and propose an efficient sampling scheme that does not precalculate boundaries for positive definiteness. The proposed method is permutation invariant and performs shrinkage and estimation simultaneously for non-full rank data. Simulations show that the proposed BCLASSO performs similarly to frequentist methods for non-full rank data.
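The claim that the maximum likelihood covariance estimate is not positive definite when n is less than d can be checked on a toy case. The sketch below uses n = 2 observations of d = 3 variables and then adds a ridge term to the diagonal. The ridge is only a simple stand-in for shrinkage and is not the BCLASSO prior or sampler; the data and helper names are made up for illustration.

```python
def ml_covariance(rows):
    """Maximum likelihood covariance estimate (divides by n, not n - 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def add_ridge(m, lam):
    """Return m + lam * I, a basic shrinkage of m towards a diagonal target."""
    size = len(m)
    return [[m[i][j] + (lam if i == j else 0.0) for j in range(size)]
            for i in range(size)]


data = [[1.0, 2.0, 3.0], [2.0, 0.0, 5.0]]  # n = 2 observations, d = 3 variables
cov = ml_covariance(data)
shrunk = add_ridge(cov, 0.1)

# A zero determinant means the matrix is singular, hence not positive definite.
print("det of ML estimate:", det3(cov))
# Adding lam * I to a positive semidefinite matrix lifts every eigenvalue by lam.
print("det after ridge:", det3(shrunk))
```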
Genetic variability and selection criteria in rice mutant lines as revealed by quantitative traits.
Oladosu, Yusuff; Rafii, M Y; Abdullah, Norhani; Abdul Malek, Mohammad; Rahim, H A; Hussin, Ghazali; Abdul Latif, Mohammad; Kareem, Isiaka
2014-01-01
Genetics-based knowledge of different vegetative and yield traits plays a major role in varietal improvement of rice. Genetic variation gives room for recombinants which are essential for the development of a new variety in any crop. Based on this background, this work was carried out to evaluate genetic diversity of derived mutant lines and establish relationships between their yield and yield components using multivariate analysis. To achieve this objective, two field trials were carried out on 45 mutant rice genotypes to evaluate their growth and yield traits. Data were taken on vegetative traits and yield and its components, while genotypic and phenotypic coefficients, variance components, expected genetic advance, and heritability were calculated. All the genotypes showed variations for vegetative traits and yield and its components. Also, there was a positive relationship between the quantitative traits and the final yield with the exception of number of tillers. Finally, the evaluated genotypes were grouped into five major clusters based on the assessed traits with the aid of a UPGMA dendrogram. So hybridization of group I with group V or group VI could be used to attain higher heterosis or vigour among the genotypes. Also, this evaluation could be useful in developing reliable selection indices for important agronomic traits in rice.
The influence of selected socio-demographic variables on symptoms occurring during the menopause
Directory of Open Access Journals (Sweden)
Marta Makara-Studzińska
2015-02-01
Introduction: It is considered that the lifestyle conditioned by socio-demographic or socio-economic factors determines the health condition of people to the greatest extent. The aim of this study is to evaluate the influence of selected socio-demographic factors on the kinds of symptoms occurring during menopause. Material and methods: The study group consisted of 210 women aged 45 to 65, not using hormone replacement therapy, staying at healthcare centers for rehabilitation treatment. The study was carried out in 2013-2014 in the Silesian, Podlaskie and Lesser Poland voivodeships. The set of tools consisted of the authors’ own survey questionnaire and the Menopause Rating Scale (MRS). Results: The most commonly occurring symptom in the group of studied women was a depressive mood, from the group of psychological symptoms, followed by physical and mental fatigue, and discomfort connected with muscle and joint pain. The greatest intensity of symptoms was observed in the group of women with the lowest level of education, reporting an average or bad material situation, and unemployed women. Conclusions: An alarmingly high number of reported psychological symptoms in the group of menopausal women was observed, and in particular among the group of low socio-economic status. Career seems to be a factor reducing the risk of occurrence of psychological symptoms. There is an urgent need for health promotion and prophylaxis in the group of menopausal women, and in many cases for implementation of specialist psychological assistance.
Bayesian inference with ecological applications
Link, William A
2009-01-01
This text is written to provide a mathematically sound but accessible and engaging introduction to Bayesian inference specifically for environmental scientists, ecologists and wildlife biologists. It emphasizes the power and usefulness of Bayesian methods in an ecological context. The advent of fast personal computers and easily available software has simplified the use of Bayesian and hierarchical models. One obstacle remains for ecologists and wildlife biologists, namely the near absence of Bayesian texts written specifically for them. The book includes many relevant examples, is supported by software and examples on a companion website and will become an essential grounding in this approach for students and research ecologists. Engagingly written text specifically designed to demystify a complex subject. Examples drawn from ecology and wildlife research. An essential grounding for graduate and research ecologists in the increasingly prevalent Bayesian approach to inference. Companion website with analyt...
Bayesian Inference on Gravitational Waves
Directory of Open Access Journals (Sweden)
Asad Ali
2015-12-01
The Bayesian approach is becoming increasingly popular among the astrophysics data analysis communities. However, the Pakistan statistics communities are unaware of this fertile interaction between the two disciplines. Bayesian methods have been in use to address astronomical problems since the very birth of the Bayes probability in the eighteenth century. Today the Bayesian methods for the detection and parameter estimation of gravitational waves have solid theoretical grounds with a strong promise for realistic applications. This article aims to introduce the Pakistan statistics communities to the applications of Bayesian Monte Carlo methods in the analysis of gravitational wave data, with an overview of the Bayesian signal detection and estimation methods and a demonstration by a couple of simplified examples.
Joshi, Deepti; St-Hilaire, André; Daigle, Anik; Ouarda, Taha B. M. J.
2013-04-01
This study attempts to compare the performance of two statistical downscaling frameworks in downscaling hydrological indices (descriptive statistics) characterizing the low flow regimes of three rivers in Eastern Canada - Moisie, Romaine and Ouelle. The statistical models selected are Relevance Vector Machine (RVM), an implementation of Sparse Bayesian Learning, and the Automated Statistical Downscaling tool (ASD), an implementation of Multiple Linear Regression. Inputs to both frameworks involve climate variables significantly (α = 0.05) correlated with the indices. These variables were processed using Canonical Correlation Analysis and the resulting canonical variate scores were used as input to RVM to estimate the selected low flow indices. In ASD, the significantly correlated climate variables were subjected to backward stepwise predictor selection and the selected predictors were subsequently used to estimate the selected low flow indices using Multiple Linear Regression. With respect to the correlation between climate variables and the selected low flow indices, it was observed that all indices are influenced, primarily, by wind components (Vertical, Zonal and Meridional) and humidity variables (Specific and Relative Humidity). The downscaling performance of the framework involving RVM was found to be better than ASD in terms of Relative Root Mean Square Error, Relative Mean Absolute Bias and Coefficient of Determination. In all cases, the former resulted in less variability of the performance indices between calibration and validation sets, implying better generalization ability than for the latter.
Bayesian methodology for reliability model acceptance
International Nuclear Information System (INIS)
Zhang Ruoxue; Mahadevan, Sankaran
2003-01-01
This paper develops a methodology to assess the validity of a reliability computation model using the concept of Bayesian hypothesis testing, by comparing the model prediction and experimental observation, when there is only one computational model available to evaluate system behavior. Time-independent and time-dependent problems are investigated, with consideration of both cases: with and without statistical uncertainty in the model. The case of time-independent failure probability prediction with no statistical uncertainty is a straightforward application of Bayesian hypothesis testing. However, for the life prediction (time-dependent reliability) problem, a new methodology is developed in this paper to make the same Bayesian hypothesis testing concept applicable. With the existence of statistical uncertainty in the model, in addition to the application of a predictor estimator of the Bayes factor, the uncertainty in the Bayes factor is explicitly quantified through treating it as a random variable and calculating the probability that it exceeds a specified value. The developed method provides a rational criterion to decision-makers for the acceptance or rejection of the computational model.
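A Bayes factor for model acceptance can be illustrated with a deliberately simple Gaussian setup. It is an assumption of this sketch, not the formulation of the paper: under H0 the model prediction is the true value and the observation differs from it only by measurement noise, while under H1 the true value is itself uncertain around the prediction with a normal prior. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bayes_factor(observation, prediction, noise_var, prior_var):
    """Bayes factor of H0 (prediction is correct) against H1 (true value
    follows N(prediction, prior_var)), for one noisy observation.

    Under H1 the marginal density of the observation is normal with
    variance noise_var + prior_var.
    """
    likelihood_h0 = normal_pdf(observation, prediction, noise_var)
    likelihood_h1 = normal_pdf(observation, prediction, noise_var + prior_var)
    return likelihood_h0 / likelihood_h1


# Observation equal to the prediction: the data favour accepting the model.
print(bayes_factor(10.0, 10.0, noise_var=1.0, prior_var=3.0))
# Observation three noise standard deviations away: the data favour rejection.
print(bayes_factor(13.0, 10.0, noise_var=1.0, prior_var=3.0))
```

A decision rule in this spirit accepts the model when the Bayes factor exceeds a chosen threshold; the threshold itself is a judgment call that the sketch leaves open.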
Moghimi, Saba; Schudlo, Larissa; Chau, Tom; Guerguerian, Anne-Marie
2015-01-01
Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using prefrontal cortex near infrared spectroscopy and represented by instantaneous phase values. Based on spatial and temporal characteristics of these observed hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared to the consistency index corresponding to prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes were significantly different for identical versus non-identical musical excerpts when comparing a subset of repetitions. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and the consistency index corresponding to repetitions of the same musical excerpts. This observation suggests the existence of only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes.
Muposhi, Victor K; Gandiwa, Edson; Chemura, Abel; Bartels, Paul; Makuza, Stanley M; Madiri, Tinaapi H
An understanding of the habitat selection patterns by wild herbivores is critical for adaptive management, particularly towards ecosystem management and wildlife conservation in semi-arid savanna ecosystems. We tested the following predictions: (i) surface water availability, habitat quality and human presence have a strong influence on the spatial distribution of wild herbivores in the dry season, (ii) habitat suitability for large herbivores would be higher compared to medium-sized herbivores in the dry season, and (iii) spatial extent of suitable habitats for wild herbivores will be different between years, i.e., 2006 and 2010, in Matetsi Safari Area, Zimbabwe. MaxEnt modeling was done to determine the habitat suitability of large herbivores and medium-sized herbivores. MaxEnt modeling of habitat suitability for large herbivores using the environmental variables was successful for the selected species in 2006 and 2010, except for elephant (Loxodonta africana) for the year 2010. Overall, the probability of occurrence of large herbivores was mostly influenced by distance from rivers. Distance from roads influenced much of the variability in the probability of occurrence of medium-sized herbivores. The overall predicted area for large and medium-sized herbivores was not different. Large herbivores may not necessarily utilize larger habitat patches over medium-sized herbivores due to the habitat homogenizing effect of water provisioning. The effect of surface water availability, proximity to riverine ecosystems and roads on habitat suitability of large and medium-sized herbivores in the dry season was highly variable and thus could change from one year to another. We recommend adaptive management initiatives aimed at ensuring dynamic water supply in protected areas through temporal closure and/or opening of water points to promote heterogeneity of wildlife habitats.
Bayesian networks in overlay recipe optimization
Binns, Lewis A.; Reynolds, Greg; Rigden, Timothy C.; Watkins, Stephen; Soroka, Andrew
2005-05-01
Currently, overlay measurements are characterized by a "recipe", which defines both physical parameters such as focus, illumination, et cetera, and also the software parameters such as the algorithm to be used and regions of interest. Setting up these recipes requires both engineering time and wafer availability on an overlay tool, so reducing these requirements will result in higher tool productivity. One of the significant challenges to automating this process is that the parameters are highly and complexly correlated. At the same time, a high level of traceability and transparency is required in the recipe creation process, so a technique that maintains its decisions in terms of well defined physical parameters is desirable. Running time should be short, given the system (automatic recipe creation) is being implemented to reduce overheads. Finally, a failure of the system to determine acceptable parameters should be obvious, so a certainty metric is also desirable. The complex, nonlinear interactions make solution by an expert system difficult at best, especially in the verification of the resulting decision network. The transparency requirements tend to preclude classical neural networks and similar techniques. Genetic algorithms and other "global minimization" techniques require too much computational power (given system footprint and cost requirements). A Bayesian network, however, provides a solution to these requirements. Such a network, with appropriate priors, can be used during recipe creation / optimization not just to select a good set of parameters, but also to guide the direction of search, by evaluating the network state while only incomplete information is available. As a Bayesian network maintains an estimate of the probability distribution of nodal values, a maximum-entropy approach can be utilized to obtain a working recipe in a minimum or near-minimum number of steps. In this paper we discuss the potential use of a Bayesian network in such a capacity.
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
Towards port sustainability through probabilistic models: Bayesian networks
Directory of Open Access Journals (Sweden)
B. Molina
2018-04-01
It is necessary for an infrastructure manager to understand the relationships between variables. Using Bayesian networks, variables can be classified, predicted and diagnosed, and the posterior probability of unknown variables can be estimated from known ones. The proposed methodology has generated a database of port variables, classified as economic, social, environmental and institutional, as addressed in smart port studies carried out across the Spanish Port System. The network has been developed as a directed acyclic graph, which reveals the relationships between variables in terms of parents and children. In probabilistic terms, it can be concluded from the constructed network that the most decisive variables for port sustainability are those that belong to the institutional dimension. It is also concluded that Bayesian networks allow uncertainty to be modeled probabilistically even when the number of variables is high, as occurs in port planning and operation.
Bayesian Recurrent Neural Network for Language Modeling.
Chien, Jen-Tzung; Ku, Yuan-Chu
2016-02-01
A language model (LM) is calculated as the probability of a word sequence that provides the solution to word prediction for a variety of information systems. A recurrent neural network (RNN) is powerful to learn the large-span dynamics of a word sequence in the continuous space. However, the training of the RNN-LM is an ill-posed problem because of too many parameters from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularize the RNN-LM and apply it for continuous speech recognition. We aim to penalize the too complicated RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in a Bayesian classification network is formed as the regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter by maximizing the marginal likelihood. A rapid approximation to a Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer-products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance by applying the rapid BRNN-LM under different conditions.
DEFF Research Database (Denmark)
Hartelius, Karsten; Carstensen, Jens Michael
2003-01-01
A method for locating distorted grid structures in images is presented. The method is based on the theories of template matching and Bayesian image restoration. The grid is modeled as a deformable template. Prior knowledge of the grid is described through a Markov random field (MRF) model which represents the spatial coordinates of the grid nodes. Knowledge of how grid nodes are depicted in the observed image is described through the observation model. The prior consists of a node prior and an arc (edge) prior, both modeled as Gaussian MRFs. The node prior models variations in the positions of grid nodes and the arc prior models variations in row and column spacing across the grid. Grid matching is done by placing an initial rough grid over the image and applying an ensemble annealing scheme to maximize the posterior distribution of the grid. The method can be applied to noisy images with missing...
Bayesian Geostatistical Design
DEFF Research Database (Denmark)
Diggle, Peter; Lophaven, Søren Nymand
2006-01-01
This paper describes the use of model-based geostatistics for choosing the set of sampling locations, collectively called the design, to be used in a geostatistical analysis. Two types of design situation are considered. These are retrospective design, which concerns the addition of sampling locations to, or deletion of locations from, an existing design, and prospective design, which consists of choosing positions for a new set of sampling locations. We propose a Bayesian design criterion which focuses on the goal of efficient spatial prediction whilst allowing for the fact that model parameter values are unknown. The results show that in this situation a wide range of interpoint distances should be included in the design, and the widely used regular design is often not the best choice.
deal: A Package for Learning Bayesian Networks
Directory of Open Access Journals (Sweden)
Susanne G. Boettcher
2003-12-01
deal is a software package for use with R. It includes several methods for analysing data using Bayesian networks with variables of discrete and/or continuous types but restricted to conditionally Gaussian networks. Construction of priors for network parameters is supported and their parameters can be learned from data using conjugate updating. The network score is used as a metric to learn the structure of the network and forms the basis of a heuristic search strategy. deal has an interface to Hugin.
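As a rough illustration of what conjugate updating means for a single discrete node, the sketch below updates a Dirichlet prior with observed category counts. It is written in Python for illustration only and is not the deal API; the function names and the example counts are hypothetical.

```python
def dirichlet_update(prior_counts, observed_counts):
    """Conjugate update: Dirichlet prior plus multinomial counts gives a Dirichlet posterior."""
    return [a + n for a, n in zip(prior_counts, observed_counts)]


def posterior_mean(alpha):
    """Posterior mean probability of each category under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical node with three states: uniform prior, then 10 observations.
posterior = dirichlet_update([1, 1, 1], [3, 0, 7])
probabilities = posterior_mean(posterior)
```

Because the posterior stays in the same family as the prior, new data can be folded in by repeating the same addition of counts.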
Directory of Open Access Journals (Sweden)
Alexandra Ferenczi Vaňová
2017-01-01
The article assesses how significantly selected variables from the entrepreneurs' accounting system influence the profit or loss achieved by agricultural companies in the Slovak Republic. Accounting information serves internal users as an active tool for operational and strategic company management; for external users it is legally binding output information that is subject to disclosure. The individual financial statements of the assessed agricultural companies are considered the relevant source of information. Agricultural companies are represented by commercial companies and agricultural cooperatives. Profit or loss after income tax represents the final overall effect of a company's economic performance. The existence and development of companies is conditioned by assets, whose amount and structure depend on the focus and range of business activity as well as on specific factors set by the production process in primary agricultural production. The increase in liabilities, mainly in trade payables, reflects the insufficient amount of the companies' own funding sources. The continuity of the company reproduction process is secured by drawing on bank loans. The income situation of primary agricultural production companies is favourably influenced by non-investment subsidies. The examined variables were assessed by statistical methods over the period 2004-2014. The statistical correlation between the selected variables, determined by classical canonical analysis and non-parametric correlation analysis, showed that in the assessed group of companies all analysed variables had a statistically significant influence on profit or loss after income tax, mainly the total value of assets and non-investment subsidies, except for the years 2010, 2012 and 2013, when the statistically
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm⁻¹ were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
Learning from incomplete data in Bayesian networks with qualitative influences
Masegosa, Andrés; Feelders, A.J.; van der Gaag, L.C.
2016-01-01
Domain experts can often quite reliably specify the sign of influences between variables in a Bayesian network. If we exploit this prior knowledge in estimating the probabilities of the network, it is more likely to be accepted by its users and may in fact be better calibrated with reality. We
A Bayesian Panel Data Approach to Explaining Market Beta Dynamics
R. Bauer (Rob); M.M.J.E. Cosemans (Mathijs); R. Frehen (Rik); P.C. Schotman (Peter)
2008-01-01
We characterize the process that drives the market betas of individual stocks by setting up a hierarchical Bayesian panel data model that allows a flexible specification for beta. We show that combining the parametric relationship between betas and conditioning variables specified by
An Integrated Procedure for Bayesian Reliability Inference Using MCMC
Directory of Open Access Journals (Sweden)
Jing Lin
2014-01-01
The recent proliferation of Markov chain Monte Carlo (MCMC) approaches has led to the use of Bayesian inference in a wide variety of fields. To facilitate MCMC applications, this paper proposes an integrated procedure for Bayesian inference using MCMC methods, from a reliability perspective. The goal is to build a framework for related academic research and engineering applications to implement modern computational-based Bayesian approaches, especially for reliability inferences. The procedure developed here is a continuous improvement process with four stages (Plan, Do, Study, and Action) and 11 steps: (1) data preparation; (2) prior inspection and integration; (3) prior selection; (4) model selection; (5) posterior sampling; (6) MCMC convergence diagnostic; (7) Monte Carlo error diagnostic; (8) model improvement; (9) model comparison; (10) inference making; (11) data updating and inference improvement. The paper illustrates the proposed procedure using a case study.
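The posterior sampling, convergence diagnostic and Monte Carlo error steps named in this abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure or case study: the exponential lifetime model, the Gamma(1, 1) prior, the failure times, the random-walk Metropolis sampler, the Gelman-Rubin statistic and the batch-means error estimate are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def log_posterior(lam, times, a=1.0, b=1.0):
    """Unnormalised log posterior of an exponential failure rate with a Gamma(a, b) prior."""
    if lam <= 0:
        return -math.inf
    return (a - 1 + len(times)) * math.log(lam) - (b + sum(times)) * lam


def metropolis(times, start, n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis sampler; returns the full chain of rate values."""
    rng = random.Random(seed)
    lam, lp = start, log_posterior(start, times)
    chain = []
    for _ in range(n_iter):
        prop = lam + rng.gauss(0.0, step)
        lp_prop = log_posterior(prop, times)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            lam, lp = prop, lp_prop
        chain.append(lam)
    return chain


def gelman_rubin(chains):
    """Potential scale reduction factor; values near 1 suggest convergence."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


def batch_means_mcse(chain, n_batches=20):
    """Monte Carlo standard error of the chain mean, estimated by batch means."""
    size = len(chain) // n_batches
    avgs = [statistics.fmean(chain[i * size:(i + 1) * size]) for i in range(n_batches)]
    return math.sqrt(statistics.variance(avgs) / n_batches)


# Hypothetical failure times (arbitrary units); three chains from dispersed starts.
times = [1.2, 0.7, 2.5, 0.9, 1.8, 3.1, 0.4, 1.1]
chains = [metropolis(times, start, seed=k)[2500:]  # discard first half as burn-in
          for k, start in enumerate([0.1, 1.0, 3.0])]
r_hat = gelman_rubin(chains)
post_mean = statistics.fmean([x for c in chains for x in c])
mcse = batch_means_mcse(chains[0])
```

With this conjugate toy model the exact posterior is Gamma(9, 12.7), so the sampled mean can be checked against 9/12.7; in a real reliability model no such closed form is usually available, which is why the diagnostics matter.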
Huang, Minxue; Corbin, Joshua R; Dolan, Nicholas S; Fry, Charles G; Vinokur, Anastasiya I; Guzei, Ilia A; Schomaker, Jennifer M
2017-06-05
An array of silver complexes supported by nitrogen-donor ligands catalyze the transformation of C═C and C-H bonds to valuable C-N bonds via nitrene transfer. The ability to achieve high chemoselectivity and site selectivity in an amination event requires an understanding of both the solid- and solution-state behavior of these catalysts. X-ray structural characterizations were helpful in determining ligand features that promote the formation of monomeric versus dimeric complexes. Variable-temperature 1 H and DOSY NMR experiments were especially useful for understanding how the ligand identity influences the nuclearity, coordination number, and fluxional behavior of silver(I) complexes in solution. These insights are valuable for developing improved ligand designs.
Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G; Lado, María J
2009-01-30
In many biomedical applications, interest lies in being able to distinguish between two possible states of a given response variable, depending on the values of certain continuous predictors. If the number of predictors, p, is high, or if there is redundancy among them, it then becomes important to decide on the selection of the best subset of predictors that will be able to obtain the models with greatest discrimination capacity. With this aim in mind, logistic generalized additive models were considered and receiver operating characteristic (ROC) curves were applied in order to determine and compare the discriminatory capacity of such models. This study sought to develop bootstrap-based tests that allow for the following to be ascertained: (a) the optimal number q system dedicated to early detection of breast cancer. Copyright (c) 2008 John Wiley & Sons, Ltd.
Bayesian adaptive methods for clinical trials
Berry, Scott M; Muller, Peter
2010-01-01
Already popular in the analysis of medical device trials, adaptive Bayesian designs are increasingly being used in drug development for a wide variety of diseases and conditions, from Alzheimer's disease and multiple sclerosis to obesity, diabetes, hepatitis C, and HIV. Written by leading pioneers of Bayesian clinical trial designs, Bayesian Adaptive Methods for Clinical Trials explores the growing role of Bayesian thinking in the rapidly changing world of clinical trial analysis. The book first summarizes the current state of clinical trial design and analysis and introduces the main ideas and potential benefits of a Bayesian alternative. It then gives an overview of basic Bayesian methodological and computational tools needed for Bayesian clinical trials. With a focus on Bayesian designs that achieve good power and Type I error, the next chapters present Bayesian tools useful in early (Phase I) and middle (Phase II) clinical trials as well as two recent Bayesian adaptive Phase II studies: the BATTLE and ISP...
Road network safety evaluation using Bayesian hierarchical joint model.
Wang, Jie; Huang, Helai
2016-05-01
Safety and efficiency are commonly regarded as two significant performance indicators of transportation systems. In practice, road network planning has focused on road capacity and transport efficiency whereas the safety level of a road network has received little attention in the planning stage. This study develops a Bayesian hierarchical joint model for road network safety evaluation to help planners take traffic safety into account when planning a road network. The proposed model establishes relationships between road network risk and micro-level variables related to road entities and traffic volume, as well as socioeconomic, trip generation and network density variables at macro level which are generally used for long term transportation plans. In addition, network spatial correlation between intersections and their connected road segments is also considered in the model. A road network is elaborately selected in order to compare the proposed hierarchical joint model with a previous joint model and a negative binomial model. According to the results of the model comparison, the hierarchical joint model outperforms the joint model and negative binomial model in terms of the goodness-of-fit and predictive performance, which indicates the reasonableness of considering the hierarchical data structure in crash prediction and analysis. Moreover, both random effects at the TAZ level and the spatial correlation between intersections and their adjacent segments are found to be significant, supporting the employment of the hierarchical joint model as an alternative in road-network-level safety modeling as well. Copyright © 2016 Elsevier Ltd. All rights reserved.
Stakhovych, Stanislav; Bijmolt, Tammo H. A.; Wedel, Michel
2012-01-01
In this article, we present a Bayesian spatial factor analysis model. We extend previous work on confirmatory factor analysis by including geographically distributed latent variables and accounting for heterogeneity and spatial autocorrelation. The simulation study shows excellent recovery of the
Directory of Open Access Journals (Sweden)
Wanda Pilch
2017-01-01
Regular moderate physical activity positively affects health, fitness, and body composition; it regulates the pro- and anti-inflammatory cytokines levels. Vitamin D plays an important regulatory role; its adequate levels correlate with low values of inflammation markers and an increase in muscle strength and fitness in exercising people. The study’s aim was to evaluate changes in somatic variables, oxidative stress, and inflammation markers, as well as blood calcidiol concentration in middle-aged healthy women after 12 weeks of aerobics classes (endurance exercises, including choreographic sequences) aiming to improve fitness and motor coordination. The training led to a significant reduction of body mass and fat tissue; it induced an increase in lean body mass. After the 12-week training program, plasma antioxidant status increased (0.65 ± 0.21, p<0.01) and the concentration of lipid peroxidation products decreased (0.07 ± 0.02, p<0.001). A significant increase in plasma antioxidant status associated with training could have reduced the level of proinflammatory interleukin as indicated by a positive correlation between these variables (rs = 0.64, p<0.05). The study proved that a 12-week health training program in physically inactive middle-aged women might provide improvements in their anthropometric parameters and selected biochemical indicators.
Pilch, Wanda; Tota, Łukasz; Sadowska-Krępa, Ewa; Piotrowska, Anna; Kępińska, Magdalena; Pałka, Tomasz; Maszczyk, Adam
2017-01-01
Regular moderate physical activity positively affects health, fitness, and body composition; it regulates the pro- and anti-inflammatory cytokines levels. Vitamin D plays an important regulatory role; its adequate levels correlate with low values of inflammation markers and an increase in muscle strength and fitness in exercising people. The study's aim was to evaluate changes in somatic variables, oxidative stress, and inflammation markers, as well as blood calcidiol concentration in middle-aged healthy women after 12 weeks of aerobics classes (endurance exercises, including choreographic sequences) aiming to improve fitness and motor coordination. The training led to a significant reduction of body mass and fat tissue; it induced an increase in lean body mass. After the 12-week training program, plasma antioxidant status increased (0.65 ± 0.21, p < 0.01) and the concentration of lipid peroxidation products decreased (0.07 ± 0.02, p < 0.001). A significant increase in plasma antioxidant status associated with training could have reduced the level of proinflammatory interleukin as indicated by a positive correlation between these variables (rs = 0.64, p < 0.05). The study proved that a 12-week health training program in physically inactive middle-aged women might provide improvements in their anthropometric parameters and selected biochemical indicators.
Shen, Jincheng; Wang, Lu; Daignault, Stephanie; Spratt, Daniel E; Morgan, Todd M; Taylor, Jeremy M G
2018-01-01
A personalized treatment policy requires defining the optimal treatment for each patient based on their clinical and other characteristics. Here we consider a commonly encountered situation in practice, when analyzing data from observational cohorts, that there are auxiliary variables which affect both the treatment and the outcome, yet these variables are not of primary interest to be included in a generalizable treatment strategy. Furthermore, there is not enough prior knowledge of the effect of the treatments or of the importance of the covariates for us to explicitly specify the dependency between the outcome and different covariates, thus we choose a model that is flexible enough to accommodate the possibly complex association of the outcome on the covariates. We consider observational studies with a survival outcome and propose to use Random Survival Forest with Weighted Bootstrap (RSFWB) to model the counterfactual outcomes while marginalizing over the auxiliary covariates. By maximizing the restricted mean survival time, we estimate the optimal regime for a target population based on a selected set of covariates. Simulation studies illustrate that the proposed method performs reliably across a range of different scenarios. We further apply RSFWB to a prostate cancer study.
THE HOST GALAXY PROPERTIES OF VARIABILITY SELECTED AGN IN THE PAN-STARRS1 MEDIUM DEEP SURVEY
Energy Technology Data Exchange (ETDEWEB)
Heinis, S.; Gezari, S.; Kumar, S. [Department of Astronomy, University of Maryland, College Park, MD (United States); Burgett, W. S.; Flewelling, H.; Huber, M. E.; Kaiser, N.; Wainscoat, R. J.; Waters, C. [Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822 (United States)
2016-07-20
We study the properties of 975 active galactic nuclei (AGNs) selected by variability in the Pan-STARRS1 Medium Deep Survey. Using complementary multi-wavelength data from the ultraviolet to the far-infrared, we use spectral energy distribution fitting to determine the AGN and host properties at z < 1 and compare to a well-matched control sample. We confirm the trend previously observed: that the variability amplitude decreases with AGN luminosity, but we also observe that the slope of this relation steepens with wavelength, resulting in a “redder when brighter” trend at low luminosities. Our results show that AGNs are hosted by more massive hosts than control sample galaxies, while the rest frame dust-corrected NUV - r color distribution of AGN hosts is similar to control galaxies. We find a positive correlation between the AGN luminosity and star formation rate (SFR), independent of redshift. AGN hosts populate the entire range of SFRs within and outside of the Main Sequence of star-forming galaxies. Comparing the distribution of AGN hosts and control galaxies, we show that AGN hosts are less likely to be hosted by quiescent galaxies and more likely to be hosted by Main Sequence or starburst galaxies.
Leathwick, Dave M; Luo, Dongwen
2017-08-30
The concentration profile of anthelmintic reaching the target worms in the host can vary between animals even when administered doses are tailored to individual liveweight at the manufacturer's recommended rate. Factors contributing to variation in drug concentration include weather, breed of animal, formulation and the route by which drugs are administered. The implications of this variability for the development of anthelmintic resistance was investigated using Monte-Carlo simulation. A model framework was established where 100 animals each received a single drug treatment. The 'dose' of drug allocated to each animal (i.e. the concentration-time profile of drug reaching the target worms) was sampled at random from a distribution of doses with mean m and standard deviation s. For each animal the dose of drug was used in conjunction with pre-determined dose-response relationships, representing single and poly-genetic inheritance, to calculate efficacy against susceptible and resistant genotypes. These data were then used to calculate the overall change in resistance gene frequency for the worm population as a result of the treatment. Values for m and s were varied to reflect differences in both mean dose and the variability in dose, and for each combination of these 100,000 simulations were run. The resistance gene frequency in the population after treatment increased as m decreased and as s increased. This occurred for both single and poly-gene models and for different levels of dominance (survival under treatment) of the heterozygote genotype(s). The results indicate that factors which result in lower and/or more variable concentrations of active reaching the target worms are more likely to select for resistance. The potential of different routes of anthelmintic administration to play a role in the development of anthelmintic resistance is discussed. Copyright © 2017 Elsevier B.V. All rights reserved.
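The simulation framework described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' model: the log-logistic dose-response curve, the ED50 values per genotype, the single-gene Hardy-Weinberg starting frequencies, the truncated normal dose distribution and the choice to measure allele frequency among surviving worms only are all hypothetical. The numbers it produces depend entirely on those choices and should not be read as reproducing the reported results.

```python
import random


def efficacy(dose, ed50, slope=4.0):
    """Hypothetical log-logistic dose-response: fraction of worms killed."""
    if dose <= 0:
        return 0.0
    return 1.0 / (1.0 + (ed50 / dose) ** slope)


def post_treatment_frequency(m, s, p0=0.01, n_animals=100, rng=None):
    """Resistance-allele frequency among worms surviving one treatment of a flock."""
    rng = rng or random.Random()
    # Hardy-Weinberg genotype frequencies before treatment (single-gene case).
    geno = {"SS": (1 - p0) ** 2, "RS": 2 * p0 * (1 - p0), "RR": p0 ** 2}
    # Assumed ED50 per genotype (heterozygote intermediate) and R alleles carried.
    ed50 = {"SS": 0.3, "RS": 0.8, "RR": 2.0}
    r_alleles = {"SS": 0, "RS": 1, "RR": 2}
    survivors = {g: 0.0 for g in geno}
    for _ in range(n_animals):
        # Each animal receives its own dose; negative draws are truncated at zero.
        dose = max(rng.gauss(m, s), 0.0)
        for g in geno:
            survivors[g] += geno[g] * (1.0 - efficacy(dose, ed50[g]))
    total = sum(survivors.values())
    if total == 0:
        return p0  # no survivors, so no measurable change in this sketch
    return sum(r_alleles[g] * survivors[g] for g in geno) / (2.0 * total)


# One simulated flock with mean dose 1.0 and standard deviation 0.2.
freq_variable = post_treatment_frequency(1.0, 0.2, rng=random.Random(0))
# With s = 0 every animal receives exactly the mean dose.
freq_fixed = post_treatment_frequency(1.0, 0.0, rng=random.Random(1))
```

Repeating the call many times over a grid of `m` and `s` values, as the abstract describes, gives a distribution of outcomes for each combination of mean dose and dose variability.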
Directory of Open Access Journals (Sweden)
Saba Moghimi
Full Text Available Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using prefrontal cortex near infrared spectroscopy and represented by instantaneous phase values. Based on spatial and temporal characteristics of these observed hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared to the consistency index corresponding to prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes were significantly different for identical versus non-identical musical excerpts when comparing a subset of repetitions. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and the consistency index corresponding to repetitions of the same musical excerpts. This observation suggests the existence of only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes.
Current trends in Bayesian methodology with applications
Upadhyay, Satyanshu K; Dey, Dipak K; Loganathan, Appaia
2015-01-01
Collecting Bayesian material scattered throughout the literature, Current Trends in Bayesian Methodology with Applications examines the latest methodological and applied aspects of Bayesian statistics. The book covers biostatistics, econometrics, reliability and risk analysis, spatial statistics, image analysis, shape analysis, Bayesian computation, clustering, uncertainty assessment, high-energy astrophysics, neural networking, fuzzy information, objective Bayesian methodologies, empirical Bayes methods, small area estimation, and many more topics. Each chapter is self-contained and focuses on
Badra, Mohammad; Mehio-Sibai, Abla; Zeki Al-Hazzouri, Adina; Abou Naja, Hala; Baliki, Ghassan; Salamoun, Mariana; Afeiche, Nadim; Baddoura, Omar; Bulos, Suhayl; Haidar, Rachid; Lakkis, Suhayl; Musharrafieh, Ramzi; Nsouli, Afif; Taha, Assaad; Tayim, Ahmad; El-Hajj Fuleihan, Ghada
2009-01-01
Bone mineral density (BMD) and fracture incidence vary greatly worldwide. The data, if any, on clinical and densitometric characteristics of patients with hip fractures from the Middle East are scarce. The objective of the study was to define risk estimates from clinical and densitometric variables and the impact of database selection on such estimates. Clinical and densitometric information were obtained in 60 hip fracture patients and 90 controls. Hip fracture subjects were 74 yr (9.4) old, were significantly taller, lighter, and more likely to be taking anxiolytics and sleeping pills than controls. National Health and Nutrition Examination Survey (NHANES) database selection resulted in a higher sensitiv