WorldWideScience

Sample records for Bayesian variable selection

  1. Bayesian Variable Selection in Spatial Autoregressive Models

    OpenAIRE

    Jesus Crespo Cuaresma; Philipp Piribauer

    2015-01-01

    This paper compares the performance of Bayesian variable selection approaches for spatial autoregressive models. We present two alternative approaches which can be implemented using Gibbs sampling methods in a straightforward way and allow us to deal with the problem of model uncertainty in spatial autoregressive models in a flexible and computationally efficient way. In a simulation study we show that the variable selection approaches tend to outperform existing Bayesian model averaging tech...

  2. Bayesian Variable Selection via Particle Stochastic Search.

    Science.gov (United States)

    Shi, Minghui; Dunson, David B

    2011-02-01

    We focus on Bayesian variable selection in regression models. One challenge is to search the huge model space adequately, while identifying high posterior probability regions. In the past decades, the main focus has been on the use of Markov chain Monte Carlo (MCMC) algorithms for these purposes. In this article, we propose a new computational approach based on sequential Monte Carlo (SMC), which we refer to as particle stochastic search (PSS). We illustrate PSS through applications to linear regression and probit models.
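
    The scale of the search problem described above is easy to appreciate in a toy setting. The sketch below is a brute-force baseline, not the authors' PSS algorithm: it scores every one of the 2^p models of a small linear regression with Zellner g-prior Bayes factors and accumulates posterior inclusion probabilities by exact enumeration. Stochastic search methods such as MCMC or PSS exist precisely because this enumeration becomes infeasible for large p; all variable names and the choice g = 100 are illustrative assumptions.

    ```python
    # Toy Bayesian variable selection by exhaustive enumeration (feasible only
    # for small p; for p = 30 there are already ~10^9 candidate models).
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, g = 100, 6, 100.0                       # g-prior scale: illustrative
    X = rng.standard_normal((n, p))
    y = X @ np.array([2.0, -1.5, 0, 0, 0, 0]) + rng.standard_normal(n)
    yc, Xc = y - y.mean(), X - X.mean(axis=0)     # center out the intercept

    def log_bf(subset):
        """log Bayes factor of model 'subset' vs the null model under
        Zellner's g-prior with fixed g (cf. Liang et al. 2008)."""
        k = len(subset)
        if k == 0:
            return 0.0
        Xs = Xc[:, subset]
        bh, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
        r2 = 1.0 - np.sum((yc - Xs @ bh) ** 2) / np.sum(yc ** 2)
        return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

    models = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
    logw = np.array([log_bf(s) for s in models])  # uniform prior over models
    w = np.exp(logw - logw.max()); w /= w.sum()

    incl = np.zeros(p)                            # marginal inclusion probabilities
    for s, wi in zip(models, w):
        incl[list(s)] += wi
    print("posterior inclusion probabilities:", incl.round(3))
    ```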

  3. Bayesian variable selection with spherically symmetric priors

    CERN Document Server

    De Kock, M B

    2014-01-01

    We propose that Bayesian variable selection for linear parametrisations with Gaussian iid likelihoods be based on the spherical symmetry of the diagonalised parameter space. This reduces the multidimensional parameter space problem to one dimension without the need for conjugate priors. Combining this likelihood with what we call the r-prior results in a framework in which we can derive closed forms for the evidence, posterior and characteristic function for four different r-priors, including the hyper-g prior and the Zellner-Siow prior, which are shown to be special cases of our r-prior. Two scenarios of a single variable dispersion parameter and of fixed dispersion are studied separately, and asymptotic forms comparable to the traditional information criteria are derived. In a simple simulation exercise, we find that model comparison based on our uniform r-prior appears to fare better than the current model comparison schemes.

  4. Bayesian Variable Selection for Detecting Adaptive Genomic Differences Among Populations

    OpenAIRE

    Riebler, Andrea; Held, Leonhard; Stephan, Wolfgang

    2008-01-01

    We extend an Fst-based Bayesian hierarchical model, implemented via Markov chain Monte Carlo, for the detection of loci that might be subject to positive selection. This model divides the Fst-influencing factors into locus-specific effects, population-specific effects, and effects that are specific for the locus in combination with the population. We introduce a Bayesian auxiliary variable for each locus effect to automatically select nonneutral locus effects. As a by-product, the efficiency ...

  5. A Bayesian variable selection procedure to rank overlapping gene sets

    Directory of Open Access Journals (Sweden)

    Skarman, Axel

    2012-05-01

    Background: Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results: We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations. Conclusions: Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize.

  6. A Bayesian variable selection procedure for ranking overlapping gene sets

    DEFF Research Database (Denmark)

    Skarman, Axel; Mahdi Shariati, Mohammad; Janss, Luc;

    2012-01-01

    Background: Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results: We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our...

  7. Bayesian variable selection for detecting adaptive genomic differences among populations.

    Science.gov (United States)

    Riebler, Andrea; Held, Leonhard; Stephan, Wolfgang

    2008-03-01

    We extend an Fst-based Bayesian hierarchical model, implemented via Markov chain Monte Carlo, for the detection of loci that might be subject to positive selection. This model divides the Fst-influencing factors into locus-specific effects, population-specific effects, and effects that are specific for the locus in combination with the population. We introduce a Bayesian auxiliary variable for each locus effect to automatically select nonneutral locus effects. As a by-product, the efficiency of the original approach is improved by using a reparameterization of the model. The statistical power of the extended algorithm is assessed with simulated data sets from a Wright-Fisher model with migration. We find that the inclusion of model selection suggests a clear improvement in discrimination as measured by the area under the receiver operating characteristic (ROC) curve. Additionally, we illustrate and discuss the quality of the newly developed method on the basis of an allozyme data set of the fruit fly Drosophila melanogaster and a sequence data set of the wild tomato Solanum chilense. For data sets with small sample sizes, high mutation rates, and/or long sequences, however, methods based on nucleotide statistics should be preferred. PMID:18245358

  8. Bayesian Biclustering on Discrete Data: Variable Selection Methods

    OpenAIRE

    Guo, Lei

    2013-01-01

    Biclustering is a technique for clustering rows and columns of a data matrix simultaneously. Over the past few years, we have seen its applications in biology-related fields, as well as in many data mining projects. As opposed to classical clustering methods, biclustering groups objects that are similar only on a subset of variables. Many biclustering algorithms on continuous data have emerged over the last decade. In this dissertation, we will focus on two Bayesian biclustering algorithms we...

  9. Steady-state priors and Bayesian variable selection in VAR forecasting

    OpenAIRE

    Louzis, Dimitrios P.

    2015-01-01

    This study proposes methods for estimating Bayesian vector autoregressions (VARs) with an automatic variable selection and an informative prior on the unconditional mean or steady-state of the system. We show that extant Gibbs sampling methods for Bayesian variable selection can be efficiently extended to incorporate prior beliefs on the steady-state of the economy. Empirical analysis, based on three major US macroeconomic time series, indicates that the out-of-sample forecasting accuracy of ...

  10. Bayesian variable selection and data integration for biological regulatory networks

    OpenAIRE

    Jensen, Shane T; Chen, Guang; Stoeckert, Christian J., Jr.

    2007-01-01

    A substantial focus of research in molecular biology is gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three dat...

  11. Multiple SNP-sets Analysis for Genome-wide Association Studies through Bayesian Latent Variable Selection

    OpenAIRE

    Lu, Zhaohua; Zhu, Hongtu; Knickmeyer, Rebecca C.; Sullivan, Patrick F.; Williams, Stephanie N.; Zou, Fei

    2015-01-01

    The power of genome-wide association studies (GWAS) for mapping complex traits with single SNP analysis may be undermined by modest SNP effect sizes, unobserved causal SNPs, correlation among adjacent SNPs, and SNP-SNP interactions. Alternative approaches for testing the association between a single SNP-set and individual phenotypes have been shown to be promising for improving the power of GWAS. We propose a Bayesian latent variable selection (BLVS) method to simultaneously model the joint a...

  12. Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.

    Science.gov (United States)

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric

    2016-01-01

    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach. PMID:27314566

  13. Locating disease genes using Bayesian variable selection with the Haseman-Elston method

    Directory of Open Access Journals (Sweden)

    He, Qimei

    2003-12-01

    Background: We applied stochastic search variable selection (SSVS), a Bayesian model selection method, to the simulated data of Genetic Analysis Workshop 13. We used SSVS with the revisited Haseman-Elston method to find the markers linked to the loci determining change in cholesterol over time. To study gene-gene interaction (epistasis) and gene-environment interaction, we adopted prior structures, which incorporate the relationship among the predictors. This allows SSVS to search in the model space more efficiently and avoid the less likely models. Results: In applying SSVS, instead of looking at the posterior distribution of each of the candidate models, which is sensitive to the setting of the prior, we ranked the candidate variables (markers) according to their marginal posterior probability, which was shown to be more robust to the prior. Compared with traditional methods that consider one marker at a time, our method considers all markers simultaneously and obtains more favorable results. Conclusions: We showed that SSVS is a powerful method for identifying linked markers using the Haseman-Elston method, even for weak effects. SSVS is very effective because it does a smart search over the entire model space.
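
    Since several records in this listing rely on SSVS, a compact illustration may help. The sketch below is a generic George-and-McCulloch-style Gibbs sampler for linear regression, ranking predictors by marginal posterior inclusion probability as the abstract describes. It is a simplified stand-in under assumed hyperparameters (a narrow-normal spike rather than a point mass), not the workshop analysis itself.

    ```python
    # Minimal SSVS Gibbs sampler:
    #   beta_j ~ (1 - gamma_j) N(0, tau2) + gamma_j N(0, c2 * tau2)
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 150, 10
    X = rng.standard_normal((n, p))
    y = X @ np.r_[1.5, -1.0, np.zeros(p - 2)] + rng.standard_normal(n)

    tau2, c2, w = 0.01, 100.0, 0.1       # spike var, slab multiplier, P(gamma_j = 1)
    a0, b0 = 2.0, 2.0                    # inverse-gamma prior on sigma^2
    beta, gamma, sigma2 = np.zeros(p), np.zeros(p, dtype=bool), 1.0
    XtX, Xty = X.T @ X, X.T @ y
    draws = []

    def npdf(x, v):
        # N(0, v) density, vectorized over x
        return np.exp(-0.5 * x * x / v) / np.sqrt(2 * np.pi * v)

    for it in range(2000):
        # 1) beta | gamma, sigma2: conjugate multivariate normal
        d = np.where(gamma, c2 * tau2, tau2)
        V = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / d))
        beta = rng.multivariate_normal(V @ Xty / sigma2, V)
        # 2) gamma_j | beta_j: Bernoulli from the spike/slab density ratio
        p1, p0 = w * npdf(beta, c2 * tau2), (1 - w) * npdf(beta, tau2)
        gamma = rng.random(p) < p1 / (p1 + p0)
        # 3) sigma2 | beta: inverse-gamma
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))
        if it >= 500:                    # discard burn-in
            draws.append(gamma.copy())

    print("marginal posterior inclusion probabilities:", np.mean(draws, axis=0).round(2))
    ```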

  14. A Bayesian integrative model for genetical genomics with spatially informed variable selection.

    Science.gov (United States)

    Cassese, Alberto; Guindani, Michele; Vannucci, Marina

    2014-01-01

    We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidates of associations between copy number variants and the transcriptional activity of target genes. Gene ontology (GO) analyses of our findings reveal enrichments in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation. PMID:25288877

  15. Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis.

    Science.gov (United States)

    Bhadra, Anindya; Mallick, Bani K

    2013-06-01

    We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. PMID:23607608

  16. Joint High-Dimensional Bayesian Variable and Covariance Selection with an Application to eQTL Analysis

    KAUST Repository

    Bhadra, Anindya

    2013-04-22

    We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.

  17. A spatio-temporal nonparametric Bayesian variable selection model of fMRI data for clustering correlated time courses.

    Science.gov (United States)

    Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina

    2014-07-15

    In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows us to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, therefore capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data. PMID:24650600

  18. Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection

    NARCIS (Netherlands)

    Calus, Mario P.L.; Bouwman, Aniek C.; Schrooten, Chris; Veerkamp, Roel F.

    2016-01-01

    Background: Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable ...

  19. Across population genomic prediction scenarios in which Bayesian variable selection outperforms GBLUP

    NARCIS (Netherlands)

    Berg, van den S.; Calus, M.P.L.; Meuwissen, T.H.E.; Wientjes, Y.C.J.

    2015-01-01

    Background: The use of information across populations is an attractive approach to increase the accuracy of genomic prediction for numerically small populations. However, accuracies of across population genomic prediction, in which reference and selection individuals are from different population ...

  1. Learning dynamic Bayesian networks with mixed variables

    DEFF Research Database (Denmark)

    Bøttcher, Susanne Gammelgaard

    This paper considers dynamic Bayesian networks for discrete and continuous variables. We only treat the case where the distribution of the variables is conditional Gaussian. We show how to learn the parameters and structure of a dynamic Bayesian network and also how the Markov order can be learned...

  2. Bayesian variable order Markov models: Towards Bayesian predictive state representations

    NARCIS (Netherlands)

    C. Dimitrakakis

    2009-01-01

    We present a Bayesian variable order Markov model that shares many similarities with predictive state representations. The resulting models are compact and much easier to specify and learn than classical predictive state representations. Moreover, we show that they significantly outperform a more st ...

  3. Post hoc Analysis for Detecting Individual Rare Variant Risk Associations Using Probit Regression Bayesian Variable Selection Methods in Case-Control Sequencing Studies.

    Science.gov (United States)

    Larson, Nicholas B; McDonnell, Shannon; Albright, Lisa Cannon; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham; MacInnis, Robert; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2016-09-01

    Rare variants (RVs) have been shown to be significant contributors to complex disease risk. By definition, these variants have very low minor allele frequencies and traditional single-marker methods for statistical analysis are underpowered for typical sequencing study sample sizes. Multimarker burden-type approaches attempt to identify aggregation of RVs across case-control status by analyzing relatively small partitions of the genome, such as genes. However, it is generally the case that the aggregative measure would be a mixture of causal and neutral variants, and these omnibus tests do not directly provide any indication of which RVs may be driving a given association. Recently, Bayesian variable selection approaches have been proposed to identify RV associations from a large set of RVs under consideration. Although these approaches have been shown to be powerful at detecting associations at the RV level, there are often computational limitations on the total quantity of RVs under consideration and compromises are necessary for large-scale application. Here, we propose a computationally efficient alternative formulation of this method using a probit regression approach specifically capable of simultaneously analyzing hundreds to thousands of RVs. We evaluate our approach to detect causal variation on simulated data and examine sensitivity and specificity in instances of high RV dimensionality as well as apply it to pathway-level RV analysis results from a prostate cancer (PC) risk case-control sequencing study. Finally, we discuss potential extensions and future directions of this work. PMID:27312771
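
    The probit formulation is attractive here because Albert-Chib data augmentation keeps every full conditional tractable even with many coefficients. The sketch below combines a latent-Gaussian probit likelihood with a spike-and-slab prior on the coefficients; it is a minimal illustration of the model class, not the authors' implementation, and the rare-variant-like design matrix and all hyperparameters are invented for the demo (the truncated-normal draw uses scipy.stats.truncnorm).

    ```python
    # Sketch: probit Bayesian variable selection via Albert-Chib augmentation.
    #   y_i = 1{z_i > 0},  z_i ~ N(x_i' beta, 1),
    #   beta_j ~ (1 - gamma_j) N(0, tau2) + gamma_j N(0, v2).
    import numpy as np
    from scipy.stats import truncnorm

    rng = np.random.default_rng(2)
    n, p = 400, 8
    X = rng.binomial(1, 0.05, size=(n, p)).astype(float)  # rare 0/1 genotypes
    y = (X @ np.r_[2.0, -2.0, np.zeros(p - 2)] - 0.5
         + rng.standard_normal(n) > 0).astype(int)

    tau2, v2, w = 1e-4, 4.0, 0.1         # spike var, slab var, prior inclusion prob
    beta, gamma = np.zeros(p), np.zeros(p, dtype=bool)
    XtX = X.T @ X
    draws = []

    def npdf(x, v):
        # N(0, v) density, vectorized over x
        return np.exp(-0.5 * x * x / v) / np.sqrt(2 * np.pi * v)

    for it in range(3000):
        # 1) z_i | beta, y_i: N(x_i'beta, 1) truncated to z > 0 iff y = 1
        m = X @ beta
        lo = np.where(y == 1, -m, -np.inf)
        hi = np.where(y == 1, np.inf, -m)
        z = m + truncnorm.rvs(lo, hi, random_state=rng)
        # 2) beta | z, gamma: conjugate normal (unit residual variance)
        d = np.where(gamma, v2, tau2)
        V = np.linalg.inv(XtX + np.diag(1.0 / d))
        beta = rng.multivariate_normal(V @ (X.T @ z), V)
        # 3) gamma_j | beta_j: Bernoulli from the spike/slab density ratio
        p1, p0 = w * npdf(beta, v2), (1 - w) * npdf(beta, tau2)
        gamma = rng.random(p) < p1 / (p1 + p0)
        if it >= 1000:
            draws.append(gamma.copy())

    print("posterior inclusion probabilities:", np.mean(draws, axis=0).round(2))
    ```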

  4. Bayesian Model Averaging in the Instrumental Variable Regression Model

    OpenAIRE

    Gary Koop; Robert Leon Gonzalez; Rodney Strachan

    2011-01-01

    This paper considers the instrumental variable regression model when there is uncertainty about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainty can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very fl...

  5. Bayesian model selection in Gaussian regression

    CERN Document Server

    Abramovich, Felix

    2009-01-01

    We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.
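
    The correspondence stated above, between MAP model selection under a prior on model size and penalized least squares, can be made concrete: the MAP model maximizes exp(-RSS_M / (2 sigma^2)) * pi(M), i.e. it minimizes RSS_M / (2 sigma^2) - log pi(M). The toy sketch below uses an assumed complexity penalty of the form c*k*log(p) as a stand-in for the paper's prior-induced penalty, whose exact form depends on the prior on model size.

    ```python
    # Best-subset selection as penalized least squares with a complexity penalty.
    import itertools
    import numpy as np

    rng = np.random.default_rng(3)
    n, p, sigma2, c = 60, 8, 1.0, 1.0     # c*k*log(p) penalty: assumed form
    X = rng.standard_normal((n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + np.sqrt(sigma2) * rng.standard_normal(n)

    def rss(subset):
        # residual sum of squares of the least-squares fit on 'subset'
        if not subset:
            return float(y @ y)
        Xs = X[:, subset]
        bh, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        return float(np.sum((y - Xs @ bh) ** 2))

    best = min(
        (s for r in range(p + 1) for s in itertools.combinations(range(p), r)),
        key=lambda s: rss(s) / (2 * sigma2) + c * len(s) * np.log(p),
    )
    print("selected predictors:", best)    # expect (0, 1) in most runs
    ```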

  6. Bayesian site selection for fast Gaussian process regression

    KAUST Repository

    Pourhabib, Arash

    2014-02-05

    Gaussian Process (GP) regression is a popular method in the field of machine learning and computer experiment designs; however, its ability to handle large data sets is hindered by the computational difficulty in inverting a large covariance matrix. Likelihood approximation methods were developed as a fast GP approximation, thereby reducing the computation cost of GP regression by utilizing a much smaller set of unobserved latent variables called pseudo points. This article reports a further improvement to the likelihood approximation methods by simultaneously deciding both the number and locations of the pseudo points. The proposed approach is a Bayesian site selection method where both the number and locations of the pseudo inputs are parameters in the model, and the Bayesian model is solved using a reversible jump Markov chain Monte Carlo technique. Through a number of simulated and real data sets, it is demonstrated that with appropriate priors chosen, the Bayesian site selection method can produce a good balance between computation time and prediction accuracy: it is fast enough to handle large data sets that a full GP is unable to handle, and it improves, quite often remarkably, the prediction accuracy, compared with the existing likelihood approximations. © 2014 Taylor and Francis Group, LLC.
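
    The computational gain from pseudo points is visible in a few lines. The sketch below implements the classical subset-of-regressors approximation with a fixed, hand-chosen number and placement of pseudo inputs; the paper's contribution, treating both the number and the locations as model parameters sampled by reversible-jump MCMC, is not reproduced here, and the kernel and noise settings are illustrative.

    ```python
    # Subset-of-regressors GP approximation with m << n pseudo inputs.
    # A full GP costs O(n^3); here only an m x m system is solved, O(n m^2).
    import numpy as np

    def rbf(A, B, ell=1.0):
        # squared-exponential kernel between row sets A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ell**2)

    rng = np.random.default_rng(4)
    n, m, sigma2 = 2000, 30, 0.05
    X = rng.uniform(-3, 3, (n, 1))
    y = np.sin(2 * X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(n)
    Z = np.linspace(-3, 3, m)[:, None]       # fixed pseudo-input locations

    Kmm = rbf(Z, Z) + 1e-8 * np.eye(m)       # jitter for numerical stability
    Kmn = rbf(Z, X)
    A = sigma2 * Kmm + Kmn @ Kmn.T           # the only matrix ever solved: m x m
    Xtest = np.linspace(-3, 3, 200)[:, None]
    mean = rbf(Xtest, Z) @ np.linalg.solve(A, Kmn @ y)   # SoR predictive mean
    print("max |error| vs truth:", float(np.abs(mean - np.sin(2 * Xtest[:, 0])).max()))
    ```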

  7. Improving randomness characterization through Bayesian model selection

    CERN Document Server

    Díaz-Hernández R., Rafael; Angulo Martínez, Alí M; U'Ren, Alfred B; Hirsch, Jorge G; Marsili, Matteo; Pérez Castillo, Isaac

    2016-01-01

    Nowadays random number generation plays an essential role in technology with important applications in areas ranging from cryptography, which lies at the core of current communication protocols, to Monte Carlo methods, and other probabilistic algorithms. In this context, a crucial scientific endeavour is to develop effective methods that allow the characterization of random number generators. However, commonly employed methods either lack formality (e.g. the NIST test suite), or are inapplicable in principle (e.g. the characterization derived from the Algorithmic Theory of Information (ATI)). In this letter we present a novel method based on Bayesian model selection, which is both rigorous and effective, for characterizing randomness in a bit sequence. We derive analytic expressions for a model's likelihood which is then used to compute its posterior probability distribution. Our method proves to be more rigorous than NIST's suite and the Borel-Normality criterion and its implementation is straightforward. We...

  8. Entropic Priors and Bayesian Model Selection

    CERN Document Server

    Brewer, Brendon J

    2009-01-01

    We demonstrate that the principle of maximum relative entropy (ME), used judiciously, can ease the specification of priors in model selection problems. The resulting effect is that models that make sharp predictions are disfavoured, weakening the usual Bayesian "Occam's Razor". This is illustrated with a simple example involving what Jaynes called a "sure thing" hypothesis. Jaynes' resolution of the situation involved introducing a large number of alternative "sure thing" hypotheses that were possible before we observed the data. However, in more complex situations, it may not be possible to explicitly enumerate large numbers of alternatives. The entropic priors formalism produces the desired result without modifying the hypothesis space or requiring explicit enumeration of alternatives; all that is required is a good model for the prior predictive distribution for the data. This idea is illustrated with a simple rigged-lottery example, and we outline how this idea may help to resolve a recent debate amongst ...

  9. Bayesian item selection in constrained adaptive testing using shadow tests

    NARCIS (Netherlands)

    Veldkamp, Bernard P.

    2010-01-01

    Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specificati ...

  10. Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests

    Science.gov (United States)

    Veldkamp, Bernard P.

    2010-01-01

    Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…

  11. Estimation and variable selection with exponential weights

    OpenAIRE

    Arias-Castro, Ery; Lounici, Karim

    2014-01-01

    In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for denoising/prediction. We show that such methods also succeed at variable selection and estimation under a near-minimal condition on the design matrix, instead of much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variable s...

  12. Bayesian item selection in constrained adaptive testing using shadow tests

    OpenAIRE

    Bernard P. Veldkamp

    2010-01-01

    Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item selection process. The Shadow Test Approach is a general purpose algorithm for administering constrained CAT. In this paper it is shown how the approac...

  13. Entropic Priors and Bayesian Model Selection

    Science.gov (United States)

    Brewer, Brendon J.; Francis, Matthew J.

    2009-12-01

    We demonstrate that the principle of maximum relative entropy (ME), used judiciously, can ease the specification of priors in model selection problems. The resulting effect is that models that make sharp predictions are disfavoured, weakening the usual Bayesian "Occam's Razor." This is illustrated with a simple example involving what Jaynes called a "sure thing" hypothesis. Jaynes' resolution of the situation involved introducing a large number of alternative "sure thing" hypotheses that were possible before we observed the data. However, in more complex situations, it may not be possible to explicitly enumerate large numbers of alternatives. The entropic priors formalism produces the desired result without modifying the hypothesis space or requiring explicit enumeration of alternatives; all that is required is a good model for the prior predictive distribution for the data. This idea is illustrated with a simple rigged-lottery example, and we outline how this idea may help to resolve a recent debate amongst cosmologists: is dark energy a cosmological constant, or has it evolved with time in some way? And how shall we decide, when the data are in?

  14. A guide to Bayesian model selection for ecologists

    Science.gov (United States)

    Hooten, Mevin B.; Hobbs, N.T.

    2015-01-01

    The steady upward trend in the use of model selection and Bayesian methods in ecological research has made it clear that both approaches to inference are important for modern analysis of models and data. However, in teaching Bayesian methods and in working with our research colleagues, we have noticed a general dissatisfaction with the available literature on Bayesian model selection and multimodel inference. Students and researchers new to Bayesian methods quickly find that the published advice on model selection is often preferential in its treatment of options for analysis, frequently advocating one particular method above others. The recent appearance of many articles and textbooks on Bayesian modeling has provided welcome background on relevant approaches to model selection in the Bayesian framework, but most of these are either very narrowly focused in scope or inaccessible to ecologists. Moreover, the methodological details of Bayesian model selection approaches are spread thinly throughout the literature, appearing in journals from many different fields. Our aim with this guide is to condense the large body of literature on Bayesian approaches to model selection and multimodel inference and present it specifically for quantitative ecologists as neutrally as possible. We also bring to light a few important and fundamental concepts relating directly to model selection that seem to have gone unnoticed in the ecological literature. Throughout, we provide only a minimal discussion of philosophy, preferring instead to examine the breadth of approaches as well as their practical advantages and disadvantages. This guide serves as a reference for ecologists using Bayesian methods, so that they can better understand their options and can make an informed choice that is best aligned with their goals for inference.

  15. Discriminative variable subsets in Bayesian classification with mixture models, with application in flow cytometry studies.

    Science.gov (United States)

    Lin, Lin; Chan, Cliburn; West, Mike

    2016-01-01

    We discuss the evaluation of subsets of variables for the discriminative evidence they provide in multivariate mixture modeling for classification. The novel development of Bayesian classification analysis presented is partly motivated by problems of design and selection of variables in biomolecular studies, particularly involving widely used assays of large-scale single-cell data generated using flow cytometry technology. For such studies and for mixture modeling generally, we define discriminative analysis that overlays fitted mixture models using a natural measure of concordance between mixture component densities, and define an effective and computationally feasible method for assessing and prioritizing subsets of variables according to their roles in discrimination of one or more mixture components. We relate the new discriminative information measures to Bayesian classification probabilities and error rates, and exemplify their use in Bayesian analysis of Dirichlet process mixture models fitted via Markov chain Monte Carlo methods as well as using a novel Bayesian expectation-maximization algorithm. We present a series of theoretical and simulated data examples to fix concepts and exhibit the utility of the approach, and compare with prior approaches. We demonstrate application in the context of automatic classification and discriminative variable selection in high-throughput systems biology using large flow cytometry datasets. PMID:26040910

  16. Dissecting Magnetar Variability with Bayesian Hierarchical Models

    Science.gov (United States)

    Huppenkothen, Daniela; Brewer, Brendon J.; Hogg, David W.; Murray, Iain; Frean, Marcus; Elenbaas, Chris; Watts, Anna L.; Levin, Yuri; van der Horst, Alexander J.; Kouveliotou, Chryssa

    2015-09-01

    Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behavior, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favored models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture aftershocks. Using Markov Chain Monte Carlo sampling augmented with reversible jumps between models with different numbers of parameters, we characterize the posterior distributions of the model parameters and the number of components per burst. We relate these model parameters to physical quantities in the system, and show for the first time that the variability within a burst does not conform to predictions from ideas of self-organized criticality. We also examine how well the properties of the spikes fit the predictions of simplified cascade models for the different trigger mechanisms.

  17. Dissecting magnetar variability with Bayesian hierarchical models

    CERN Document Server

    Huppenkothen, D; Hogg, D W; Murray, I; Frean, M; Elenbaas, C; Watts, A L; Levin, Y; van der Horst, A J; Kouveliotou, C

    2015-01-01

    Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behaviour, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favoured models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture afte...

  18. Two-Stage Bayesian Model Averaging in Endogenous Variable Models.

    Science.gov (United States)

    Lenkoski, Alex; Eicher, Theo S; Raftery, Adrian E

    2014-01-01

    Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able to efficiently combine established methods for addressing model uncertainty in regression models with the classic technique of 2SLS. To assess the validity of instruments in the 2SBMA context, we develop Bayesian tests of the identification restriction that are based on model averaged posterior predictive p-values. A simulation study showed that 2SBMA has the ability to recover structure in both the instrument and covariate set, and substantially improves the sharpness of resulting coefficient estimates in comparison to 2SLS using the full specification in an automatic fashion. Due to the increased parsimony of the 2SBMA estimate, the Bayesian Sargan test had a power of 50 percent in detecting a violation of the exogeneity assumption, while the method based on 2SLS using the full specification had negligible power. We apply our approach to the problem of development accounting, and find support not only for institutions, but also for geography and integration as development determinants, once both model uncertainty and endogeneity have been jointly addressed. PMID:24223471

  19. Bayesian genomic selection: the effect of haplotype lengths and priors

    DEFF Research Database (Denmark)

    Villumsen, Trine Michelle; Janss, Luc

    2009-01-01

    Breeding values for animals with marker data are estimated using a genomic selection approach where data is analyzed using Bayesian multi-marker association models. Fourteen model scenarios with varying haplotype lengths, hyperparameter and prior distributions were compared to find the scenario ...

  1. Bayesian Model Selection for LISA Pathfinder

    CERN Document Server

    Karnesis, Nikolaos; Sopuerta, Carlos F; Gibert, Ferran; Armano, Michele; Audley, Heather; Congedo, Giuseppe; Diepholz, Ingo; Ferraioli, Luigi; Hewitson, Martin; Hueller, Mauro; Korsakova, Natalia; Plagnol, Eric; Vitale, Stefano

    2013-01-01

    The main goal of the LISA Pathfinder (LPF) mission is to fully characterize the acceleration noise models and to test key technologies for future space-based gravitational-wave observatories similar to the LISA/eLISA concept. The Data Analysis (DA) team has developed complex three-dimensional models of the LISA Technology Package (LTP) experiment on-board LPF. These models are used for simulations, but more importantly, they will be used for parameter estimation purposes during flight operations. One of the tasks of the DA team is to identify the physical effects that contribute significantly to the properties of the instrument noise. A way of approaching this problem is to recover the essential parameters of the LTP which describe the data. Thus, we want to define the simplest model that efficiently explains the observations. To do so, adopting a Bayesian framework, one has to estimate the so-called Bayes Factor between two competing models. In our analysis, we use three main different methods to estimate...

  2. BASE-9: Bayesian Analysis for Stellar Evolution with nine variables

    Science.gov (United States)

    Robinson, Elliot; von Hippel, Ted; Stein, Nathan; Stenning, David; Wagner-Kaiser, Rachel; Si, Shijing; van Dyk, David

    2016-08-01

    The BASE-9 (Bayesian Analysis for Stellar Evolution with nine variables) software suite recovers star cluster and stellar parameters from photometry and is useful for analyzing single-age, single-metallicity star clusters, binaries, or single stars, and for simulating such systems. BASE-9 uses a Markov chain Monte Carlo (MCMC) technique along with brute force numerical integration to estimate the posterior probability distribution for the age, metallicity, helium abundance, distance modulus, line-of-sight absorption, and parameters of the initial-final mass relation (IFMR) for a cluster, and for the primary mass, secondary mass (if a binary), and cluster probability for every potential cluster member. The MCMC technique is used for the cluster quantities (the first six items listed above) and numerical integration is used for the stellar quantities (the last three items in the above list).

  3. Bayesian Methods for Analyzing Structural Equation Models with Covariates, Interaction, and Quadratic Latent Variables

    Science.gov (United States)

    Lee, Sik-Yum; Song, Xin-Yuan; Tang, Nian-Sheng

    2007-01-01

    The analysis of interaction among latent variables has received much attention. This article introduces a Bayesian approach to analyze a general structural equation model that accommodates the general nonlinear terms of latent variables and covariates. This approach produces a Bayesian estimate that has the same statistical optimal properties as a…

  4. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.

    Science.gov (United States)

    Patri, Jean-François; Diard, Julien; Perrier, Pascal

    2015-12-01

    The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.

  5. Variable Selection with Exponential Weights and $l_0$-Penalization

    OpenAIRE

    Arias-Castro, Ery; Lounici, Karim

    2012-01-01

    In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for prediction. We show that such methods also succeed at variable selection and estimation under the necessary identifiability condition on the design matrix, instead of much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variabl...

  6. Adaptive Robust Variable Selection

    CERN Document Server

    Fan, Jianqing; Barut, Emre

    2012-01-01

    Heavy-tailed high-dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. A natural procedure to address this problem is to use penalized least absolute deviation (LAD) method with weighted $L_1$-penalty, called weighted robust Lasso (WR-Lasso), in which weights are introduced to ameliorate the bias problem induced by the $L_1$-penalty. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these nice asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is c...

  7. Bayesian predictive modeling for genomic based personalized treatment selection.

    Science.gov (United States)

    Ma, Junsheng; Stingo, Francesco C; Hobbs, Brian P

    2016-06-01

    Efforts to personalize medicine in oncology have been limited by reductive characterizations of the intrinsically complex underlying biological phenomena. Future advances in personalized medicine will rely on molecular signatures that derive from synthesis of multifarious interdependent molecular quantities requiring robust quantitative methods. However, highly parameterized statistical models when applied in these settings often require a prohibitively large database and are sensitive to proper characterizations of the treatment-by-covariate interactions, which in practice are difficult to specify and may be limited by generalized linear models. In this article, we present a Bayesian predictive framework that enables the integration of a high-dimensional set of genomic features with clinical responses and treatment histories of historical patients, providing a probabilistic basis for using the clinical and molecular information to personalize therapy for future patients. Our work represents one of the first attempts to define personalized treatment assignment rules based on large-scale genomic data. We use actual gene expression data acquired from The Cancer Genome Atlas in the settings of leukemia and glioma to explore the statistical properties of our proposed Bayesian approach for personalizing treatment selection. The method is shown to yield considerable improvements in predictive accuracy when compared to penalized regression approaches. PMID:26575856

  8. Dynamic sensor action selection with Bayesian decision analysis

    Science.gov (United States)

    Kristensen, Steen; Hansen, Volker; Kondak, Konstantin

    1998-10-01

    The aim of this work is to create a framework for the dynamic planning of sensor actions for an autonomous mobile robot. The framework uses Bayesian decision analysis, i.e., a decision-theoretic method, to evaluate possible sensor actions and select the most appropriate ones given the available sensors and what is currently known about the state of the world. Since sensing changes the knowledge of the system and since the current state of the robot (task, position, etc.) determines what knowledge is relevant, the evaluation and selection of sensing actions is an on-going process that effectively determines the behavior of the robot. The framework has been implemented on a real mobile robot and has proven able to control the sensor actions of the system in real time. In current work we are investigating methods to reduce or automatically generate the necessary model information needed by the decision-theoretic method to select the appropriate sensor actions.

  9. Bayesian modeling of ChIP-chip data using latent variables.

    KAUST Repository

    Wu, Mingqi

    2009-10-26

    BACKGROUND: The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although Bayesian methods can potentially work better than the other two classes of methods because they integrate uncertainty about the models and model parameters, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU times due to the MCMC simulations involved. RESULTS: In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to the outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length. CONCLUSION: The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results indicate that the ...

  11. Bayesian modeling of ChIP-chip data using latent variables

    Directory of Open Access Journals (Sweden)

    Tian Yanan

    2009-10-01

    Full Text Available Abstract Background The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing ChIP-chip data, such as sliding window methods, hidden Markov model-based methods, and Bayesian methods. Although Bayesian methods can potentially work better than the other two classes because they integrate uncertainty about both the models and the model parameters, the existing Bayesian methods do not perform satisfactorily: they usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU times owing to the MCMC simulations involved. Results In this paper, we propose a Bayesian latent model for ChIP-chip data. The new model differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect; as a consequence, it enables an efficient MCMC simulation of the posterior distribution of the model and also makes the model more robust to outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale justified at length. Conclusion The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons against some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results...

  12. Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem

    CERN Document Server

    Scott, James G; 10.1214/10-AOS792

    2010-01-01

    This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham's-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
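
    The multiplicity-correction mechanism can be made concrete with a small calculation. In the standard fully Bayes treatment, the prior inclusion probability itself gets a Beta(1, 1) prior, so the marginal prior probability of a particular model with k of p variables is the Beta function B(k+1, p-k+1). The sketch below (a generic illustration of this mechanism, not the paper's code) shows that the implied prior odds against adding one more variable grow with p:

```python
# Prior odds of a given 1-variable model vs a given 2-variable model
# under a Beta(1,1) hyperprior on the inclusion probability:
# log p(model) = log B(k+1, p-k+1); the ratio works out to (p-1)/2.
from math import lgamma, exp

def log_model_prior(k, p):
    # log of the integral of pi^k (1-pi)^(p-k) over pi in [0, 1]
    return lgamma(k + 1) + lgamma(p - k + 1) - lgamma(p + 2)

for p in (10, 100, 1000):
    odds = exp(log_model_prior(1, p) - log_model_prior(2, p))
    print(f"p={p:5d}: prior odds, 1-variable vs 2-variable model = {odds:7.1f}")
```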

  13. Feature Selection for Bayesian Evaluation of Trauma Death Risk

    CERN Document Server

    Jakaite, L

    2008-01-01

    In the last year, more than 70,000 people were brought to UK hospitals with serious injuries. Each time, a clinician has to take the patient through an urgent screening procedure to make a reliable decision on the trauma treatment. Typically, such a procedure comprises around 20 tests; however, the condition of a trauma patient remains very difficult to test properly. What happens if these tests are ambiguously interpreted and the information about the severity of the injury is misleading? A mistaken decision can be fatal: a treatment that is too mild puts the patient at risk of dying from posttraumatic shock, while overtreatment can also cause death. How can we reduce the risk of death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screeni...

  14. Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition

    OpenAIRE

    Stephenson, Todd Andrew; Magimai.-Doss, Mathew; Bourlard, Hervé

    2001-01-01

    Standard hidden Markov models (HMMs), as used in automatic speech recognition (ASR), calculate their emission probabilities by an artificial neural network (ANN) or a Gaussian distribution conditioned on the hidden state variable, considering the emissions independent of any other variable in the model. Recent work showed the benefit of conditioning the emission distributions on a discrete auxiliary variable, which is observed in training and hidden in recognition. Related work has shown the ...

  15. Family Background Variables as Instruments for Education in Income Regressions: A Bayesian Analysis

    Science.gov (United States)

    Hoogerheide, Lennart; Block, Joern H.; Thurik, Roy

    2012-01-01

    The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in case of moderate direct…

  16. A Bayesian Alternative to Mutual Information for the Hierarchical Clustering of Dependent Random Variables.

    Directory of Open Access Journals (Sweden)

    Guillaume Marrelec

    Full Text Available The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.

  17. Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison

    Directory of Open Access Journals (Sweden)

    Aki Vehtari

    2014-03-01

    Full Text Available Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys, 6 (2012), 142–228. doi:10.1214/12-SS102.

  18. Bayesian techniques for comparing time-dependent GRMHD simulations to variable Event Horizon Telescope observations

    CERN Document Server

    Kim, Junhan; Chan, Chi-kwan; Medeiros, Lia; Ozel, Feryal; Psaltis, Dimitrios

    2016-01-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometer (VLBI) that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. We also apply our method to the early EHT data...

  19. Multi-variable Echo State Network Optimized by Bayesian Regulation for Daily Peak Load Forecasting

    Directory of Open Access Journals (Sweden)

    Dongxiao Niu

    2012-11-01

    Full Text Available In this paper, a multi-variable echo state network trained with Bayesian regulation has been developed for short-term load forecasting. In this study, we focus on the generalization of a new recurrent network. Therefore, Bayesian regulation and the Levenberg-Marquardt algorithm are adopted to modify the output weights. The model is verified on data from a local power company in south China, and its performance is rather satisfactory. In addition, traditional methods are applied to the same task for comparison. The simulation results lead to the conclusion that the proposed scheme is feasible, with strong robustness and a satisfactory capacity for generalization.

  20. Bayesian inference of selection in a heterogeneous environment from genetic time-series data.

    Science.gov (United States)

    Gompert, Zachariah

    2016-01-01

    Evolutionary geneticists have sought to characterize the causes and molecular targets of selection in natural populations for many years. Although this research programme has been somewhat successful, most statistical methods employed were designed to detect consistent, weak to moderate selection. In contrast, phenotypic studies in nature show that selection varies in time and that individual bouts of selection can be strong. Measurements of the genomic consequences of such fluctuating selection could help test and refine hypotheses concerning the causes of ecological specialization and the maintenance of genetic variation in populations. Herein, I propose a Bayesian nonhomogeneous hidden Markov model to estimate effective population sizes and quantify variable selection in heterogeneous environments from genetic time-series data. The model is described and then evaluated using a series of simulated data sets, including cases where selection occurs on a trait with a simple or polygenic molecular basis. The proposed method accurately distinguished neutral loci from non-neutral loci under strong selection, but not from those under weak selection. Selection coefficients were accurately estimated when selection was constant or when the fitness values of genotypes varied linearly with the environment, but these estimates were less accurate when fitness was polygenic or the relationship between the environment and the fitness of genotypes was nonlinear. Past studies of temporal evolutionary dynamics in laboratory populations have been remarkably successful. The proposed method makes similar analyses of genetic time-series data from natural populations more feasible and thereby could help answer fundamental questions about the causes and consequences of evolution in the wild.

  1. Bayesian approach to inverse problems for functions with a variable-index Besov prior

    Science.gov (United States)

    Jia, Junxiong; Peng, Jigen; Gao, Jinghuai

    2016-08-01

    The Bayesian approach has been adopted to solve inverse problems that reconstruct a function from noisy observations. Prior measures play a key role in the Bayesian method. Hence, many probability measures have been proposed, among which total variation (TV) is a well-known prior measure that can preserve sharp edges. However, it has two drawbacks, the staircasing effect and a lack of the discretization-invariant property. The variable-index TV prior has been proposed and analyzed in the area of image analysis for the former, and the Besov prior has been employed recently for the latter. To overcome both issues together, in this paper, we present a variable-index Besov prior measure, which is a non-Gaussian measure. Some useful properties of this new prior measure have been proven for functions defined on a torus. We have also generalized Bayesian inverse theory in infinite dimensions for our new setting. Finally, this theory has been applied to integer- and fractional-order backward diffusion problems. To the best of our knowledge, this is the first time that the Bayesian approach has been used for the fractional-order backward diffusion problem, which provides an opportunity to quantify its uncertainties.

  2. Bayesian parameter inference and model selection by population annealing in systems biology.

    Science.gov (United States)

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection; in particular, the framework of approximate Bayesian computation is often used for these tasks in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of a parameter with high credibility as the representative value of the distribution. To overcome these problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that it can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with the un-identifiability of representative parameter values, we proposed running the simulations with a parameter ensemble sampled from the posterior distribution, named the "posterior parameter ensemble". We showed that population annealing is an efficient and convenient algorithm for generating a posterior parameter ensemble. We also showed that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference but also capture and predict data that were not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in this framework and to conduct model selection based on the Bayes factor.
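
    The mechanics of an annealed ABC sampler can be sketched in a few lines. The toy example below infers the mean of a Gaussian by repeatedly reweighting, resampling, and perturbing a particle population while the ABC tolerance is annealed downward; the annealing schedule, kernel, and model are illustrative assumptions, not the authors' implementation (which targets systems-biology models):

```python
# A minimal population-annealing-style ABC sketch on a toy problem.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)
obs = data.mean()                        # summary statistic

def simulate(theta):
    return rng.normal(theta, 1.0, size=50).mean()

n = 2000
theta = rng.uniform(-5, 5, size=n)       # particles from the U(-5, 5) prior
dist = np.abs([simulate(t) - obs for t in theta])

for eps in [1.0, 0.5, 0.2, 0.1, 0.05]:   # annealed tolerance schedule
    w = (dist <= eps).astype(float)      # reweight by the sharper kernel
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)     # resample
    theta, dist = theta[idx], dist[idx]
    # MCMC move step to restore particle diversity at the current level
    prop = theta + rng.normal(0, 0.2, size=n)
    d_prop = np.abs([simulate(t) - obs for t in prop])
    accept = (d_prop <= eps) & (np.abs(prop) <= 5)   # stay in prior support
    theta[accept], dist[accept] = prop[accept], d_prop[accept]

print(f"posterior mean ~ {theta.mean():.2f}, sd ~ {theta.std():.2f}")
```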

  3. Bayesian natural selection and the evolution of perceptual systems.

    OpenAIRE

    Geisler, Wilson S.; Diehl, Randy L.

    2002-01-01

    In recent years, there has been much interest in characterizing statistical properties of natural stimuli in order to better understand the design of perceptual systems. A fruitful approach has been to compare the processing of natural stimuli in real perceptual systems with that of ideal observers derived within the framework of Bayesian statistical decision theory. While this form of optimization theory has provided a deeper understanding of the information contained in natural stimuli as w...

  4. Implementation of upper limit calculation for a Poisson variable by Bayesian approach

    Institute of Scientific and Technical Information of China (English)

    ZHU Yong-Sheng

    2008-01-01

    The calculation of the Bayesian confidence upper limit for a Poisson variable including both signal and background, with and without systematic uncertainties, has been formulated. A Fortran 77 routine, BPULE, has been developed to implement the calculation. The routine can account for systematic uncertainties in the background expectation and signal efficiency. The systematic uncertainties may be separately parameterized by a Gaussian, log-Gaussian or flat probability density function (pdf). Some technical details of BPULE have been discussed.
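
    The underlying calculation, ignoring systematic uncertainties, fits in a few lines. The sketch below is a numeric illustration in Python, not the BPULE routine itself: with observed counts n, known background expectation b, and a flat prior on the signal s >= 0, the credible upper limit solves P(s <= s_up | n) = CL on the posterior:

```python
# Bayesian upper limit for a Poisson signal with known background,
# flat prior on s >= 0: posterior is proportional to (s+b)^n exp(-(s+b)).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def upper_limit(n, b, cl=0.90):
    post = lambda s: (s + b) ** n * np.exp(-(s + b))
    norm, _ = quad(post, 0, np.inf)                  # normalizing constant
    cdf = lambda s_up: quad(post, 0, s_up)[0] / norm - cl
    return brentq(cdf, 0.0, 100.0)                   # solve P(s <= s_up) = CL

print(f"90% upper limit for n=3, b=1.2: s_up = {upper_limit(3, 1.2):.2f}")
```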

  5. Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

    Directory of Open Access Journals (Sweden)

    D. Das

    2014-04-01

    Full Text Available Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application to accurately assessing the effects of climate change on finer, regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is therefore often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large-scale climatic state and regional or local features. A transfer-function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer-function approach and propose a nonparametric Bayesian mixture of sparse regression models based on a Dirichlet process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and they lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results on feature selection for statistical downscaling show that our method can lead to new insights.

  6. Bayesian data fusion for spatial prediction of categorical variables in environmental sciences

    Energy Technology Data Exchange (ETDEWEB)

    Gengler, Sarah, E-mail: sarahgengler@gmail.com; Bogaert, Patrick [Earth and Life Institute, Environmental Sciences. Université catholique de Louvain, Croix du Sud 2/L7.05.16, B-1348 Louvain-la-Neuve (Belgium)

    2014-12-05

    First developed to predict continuous variables, Bayesian Maximum Entropy (BME) has become a complete framework in the context of space-time prediction since it has been extended to predict categorical variables and mixed random fields. This method proposes solutions to combine several sources of data whatever the nature of the information. However, the various attempts that were made for adapting the BME methodology to categorical variables and mixed random fields faced some limitations, such as a high computational burden. The main objective of this paper is to overcome this limitation by generalizing the Bayesian Data Fusion (BDF) theoretical framework to categorical variables, which is in essence a simplification of the BME method via the convenient conditional independence hypothesis. The BDF methodology for categorical variables is first described and then applied to a practical case study: the estimation of soil drainage classes using a soil map and point observations in the sandy area of Flanders around the city of Mechelen (Belgium). The BDF approach is compared to BME along with more classical approaches, such as Indicator CoKriging (ICK) and logistic regression. Estimators are compared using various indicators, namely the Percentage of Correctly Classified locations (PCC) and the Average Highest Probability (AHP). Although the BDF methodology for categorical variables is in some sense a simplification of the BME approach, both methods lead to similar results and have strong advantages compared to ICK and logistic regression.
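
    The conditional independence hypothesis that simplifies BDF reduces, for categorical variables, to a one-line fusion rule: p(c | d1, d2) is proportional to p(c|d1) p(c|d2) / p(c). A minimal sketch, with made-up drainage-class probabilities rather than the study's data:

```python
# Fusing two sources of categorical information under conditional
# independence: p(c | d1, d2) ~ p(c|d1) * p(c|d2) / p(c), renormalized.
import numpy as np

prior = np.array([0.5, 0.3, 0.2])          # p(c): three drainage classes
p_c_given_map = np.array([0.7, 0.2, 0.1])  # p(c | soil map) at a location
p_c_given_obs = np.array([0.4, 0.5, 0.1])  # p(c | point observations)

fused = p_c_given_map * p_c_given_obs / prior
fused /= fused.sum()
print("fused class probabilities:", np.round(fused, 3))
```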

  7. Using Bayesian Model Selection to Characterize Neonatal EEG Recordings

    Science.gov (United States)

    Mitchell, Timothy J.

    2009-12-01

    The brains of premature infants must undergo significant maturation outside of the womb and are thus particularly susceptible to injury. Electroencephalographic (EEG) recordings are an important diagnostic tool in determining if a newborn's brain is functioning normally or if injury has occurred. However, interpreting the recordings is difficult and requires the skills of a trained electroencephalographer. Because these EEG specialists are rare, an automated interpretation of newborn EEG recordings would increase access to an important diagnostic tool for physicians. To automate this procedure, we employ Bayesian probability theory to compute the posterior probability for the EEG features of interest and use the results in a program designed to mimic EEG specialists. Specifically, we will be identifying waveforms of varying frequency and amplitude, as well as periods of flat recordings where brain activity is minimal.

  8. Bayesian methods for meta-analysis of causal relationships estimated using genetic instrumental variables

    DEFF Research Database (Denmark)

    2010-01-01

    Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context of multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable. This can be implemented using a Bayesian approach, which is next extended to include multiple genetic markers. We then propose a hierarchical model for undertaking a meta-analysis of multiple studies, in which it is not necessary that the same genetic markers are measured in each study. This provides...
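
    For intuition, the single-marker, single-study starting point is the classical ratio-of-coefficients estimator, sketched below on simulated data; the simulation and all numbers are illustrative, not the paper's hierarchical model:

```python
# Ratio-of-coefficients (Wald) instrumental-variable estimate with a single
# genetic instrument Z: causal effect of X on Y is beta_ZY / beta_ZX.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.binomial(2, 0.3, n)             # genotype (0/1/2 allele count)
u = rng.normal(size=n)                  # unobserved confounder
x = 0.5 * z + u + rng.normal(size=n)    # phenotype
y = 0.8 * x + u + rng.normal(size=n)    # outcome; true causal effect is 0.8

beta_zx = np.polyfit(z, x, 1)[0]        # slope of X on Z
beta_zy = np.polyfit(z, y, 1)[0]        # slope of Y on Z
print(f"IV ratio estimate of the causal effect: {beta_zy / beta_zx:.3f}")
```

    Because the genotype is independent of the confounder, the ratio estimate is consistent even though a direct regression of y on x would be biased.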

  9. A Bayesian outlier criterion to detect SNPs under selection in large data sets.

    Directory of Open Access Journals (Sweden)

    Mathieu Gautier

    Full Text Available BACKGROUND: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged. METHODOLOGY/PRINCIPAL FINDINGS: The purpose of this study is to develop an efficient model-based approach to perform Bayesian exploratory analyses for adaptive differentiation in very large SNP data sets. The basic idea is to start with a very simple model for neutral loci that is easy to implement under a Bayesian framework and to identify selected loci as outliers via Posterior Predictive P-values (PPP-values). Applications of this strategy are considered using two different statistical models. The first one was initially interpreted in the context of populations evolving under pure genetic drift from a common ancestral population, while the second one relies on populations under migration-drift equilibrium. The robustness and power of the two resulting Bayesian model-based approaches to detect SNPs under selection are further evaluated through extensive simulations. An application to a cattle data set is also provided. CONCLUSIONS/SIGNIFICANCE: The procedure described turns out to be much faster than former Bayesian approaches and also reasonably efficient, especially at detecting loci under positive selection.
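
    The outlier logic can be sketched generically: simulate replicate data sets under a simple neutral model fitted to all loci, and flag loci whose observed differentiation statistic is extreme among the replicates. In the illustration below the neutral model's parameters are treated as known for brevity, and the model, statistic, and numbers are invented rather than taken from the paper:

```python
# Flagging outlier loci with Posterior Predictive P-values (PPP-values).
import numpy as np

rng = np.random.default_rng(2)
n_loci, n_pops, n_alleles = 200, 4, 50
p0 = rng.beta(2, 2, size=n_loci)        # ancestral allele frequencies

def neutral_counts():
    freqs = np.clip(p0[:, None] + rng.normal(0, 0.05, (n_loci, n_pops)), 0, 1)
    return rng.binomial(n_alleles, freqs)

counts = neutral_counts()
counts[-5:] = rng.binomial(n_alleles, rng.uniform(0, 1, (5, n_pops)))  # outliers

stat = lambda c: c.var(axis=1) / n_alleles**2     # across-population variance
obs = stat(counts)
reps = np.stack([stat(neutral_counts()) for _ in range(500)])
ppp = (reps >= obs).mean(axis=0)                  # PPP-value per locus
print("candidate selected loci:", np.where(ppp < 0.01)[0])
```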

  10. Finding the Most Distant Quasars Using Bayesian Selection Methods

    CERN Document Server

    Mortlock, Daniel

    2014-01-01

    Quasars, the brightly glowing disks of material that can form around the super-massive black holes at the centres of large galaxies, are amongst the most luminous astronomical objects known and so can be seen at great distances. The most distant known quasars are seen as they were when the Universe was less than a billion years old (i.e., $\\sim\\!7%$ of its current age). Such distant quasars are, however, very rare, and so are difficult to distinguish from the billions of other comparably-bright sources in the night sky. In searching for the most distant quasars in a recent astronomical sky survey (the UKIRT Infrared Deep Sky Survey, UKIDSS), there were $\\sim\\!10^3$ apparently plausible candidates for each expected quasar, far too many to reobserve with other telescopes. The solution to this problem was to apply Bayesian model comparison, making models of the quasar population and the dominant contaminating population (Galactic stars) to utilise the information content in the survey measurements. The result wa...

  11. A survey of Bayesian predictive methods for model assessment, selection and comparison

    Directory of Open Access Journals (Sweden)

    Aki Vehtari

    2012-01-01

    Full Text Available To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.

  12. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and for estimating the Bayesian evidence for prior model selection, and it has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed; for this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM requires only forward model runs, so the simulator is used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied to Bayesian calibration and prior model selection for several nonlinear subsurface flow problems. © 2013 Elsevier Inc.

  13. Stochastic search variable selection for identifying multiple quantitative trait loci.

    Science.gov (United States)

    Yi, Nengjun; George, Varghese; Allison, David B

    2003-07-01

    In this article, we utilize stochastic search variable selection methodology to develop a Bayesian method for identifying multiple quantitative trait loci (QTL) for complex traits in experimental designs. The proposed procedure entails embedding multiple regression in a hierarchical normal mixture model, where latent indicators for all markers are used to identify the multiple markers. The markers with significant effects can be identified as those with a higher posterior probability of being included in the model. A simple and easy-to-use Gibbs sampler is employed to generate samples from the joint posterior distribution of all unknowns, including the latent indicators, the genetic effects for all markers, and the other model parameters. The proposed method was evaluated using simulated data and illustrated using a real data set. The results demonstrate that the proposed method works well under typical situations of most QTL studies in terms of number of markers and marker density. PMID:12871920
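
    A compact version of such a sampler is easy to write down. The sketch below (with illustrative hyperparameters and a known noise variance, unlike the full method) uses latent indicators to switch each coefficient between a narrow "spike" and a wide "slab" normal prior, and reports posterior inclusion probabilities:

```python
# A minimal stochastic search variable selection (SSVS) Gibbs sampler.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[[1, 4]] = [1.5, -2.0]   # two true QTL
y = X @ beta_true + rng.normal(size=n)

tau0, tau1, pi, sigma2 = 0.01, 1.0, 0.2, 1.0   # spike/slab sd, prior incl. prob.
gamma = np.zeros(p, dtype=int)
incl = np.zeros(p)
n_iter, burn = 2000, 500
for it in range(n_iter):
    # beta | gamma, y : conjugate multivariate normal draw
    D_inv = np.diag(1.0 / np.where(gamma == 1, tau1**2, tau0**2))
    V = np.linalg.inv(X.T @ X / sigma2 + D_inv)
    m = V @ X.T @ y / sigma2
    beta = rng.multivariate_normal(m, V)
    # gamma_j | beta_j : Bernoulli from the ratio of slab/spike densities
    p1 = pi * norm.pdf(beta, 0, tau1)
    p0 = (1 - pi) * norm.pdf(beta, 0, tau0)
    gamma = rng.binomial(1, p1 / (p1 + p0))
    if it >= burn:
        incl += gamma
print("posterior inclusion probabilities:", np.round(incl / (n_iter - burn), 2))
```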

  14. Variable selection: Current practice in epidemiological studies

    NARCIS (Netherlands)

    S. Walter (Stefan); H.W. Tiemeier (Henning)

    2009-01-01

    Selection of covariates is among the most controversial and difficult tasks in epidemiologic analysis. Correct variable selection addresses the problem of confounding in etiologic research and allows unbiased estimation of probabilities in prognostic studies. The aim of this commentary is...

  15. A default Bayesian hypothesis test for ANOVA designs

    NARCIS (Netherlands)

    R. Wetzels; R.P.P.P. Grasman; E.J. Wagenmakers

    2012-01-01

    This article presents a Bayesian hypothesis test for analysis of variance (ANOVA) designs. The test is an application of standard Bayesian methods for variable selection in regression models. We illustrate the effect of various g-priors on the ANOVA hypothesis test. The Bayesian test for ANOVA desig

  16. Variable Selection in Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shangli; ZHANG Lili; QIU Kuanmin; LU Ying; CAI Baigen

    2015-01-01

    Variable selection is one of the most important problems in pattern recognition. In the linear regression model, many methods can solve this problem, such as the Least absolute shrinkage and selection operator (LASSO) and its many improved variants, but there are few variable selection methods for generalized linear models. We study the variable selection problem in the logistic regression model. We propose a new variable selection method, the logistic elastic net, and prove that it has the grouping effect, which means that strongly correlated predictors tend to be in or out of the model together. The logistic elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the LASSO is not a very satisfactory variable selection method in the case when p is much larger than n. The advantage and effectiveness of this method are demonstrated with real leukemia data and a simulation study.
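
    An elastic-net-penalized logistic regression of this kind can be fit with off-the-shelf tools. The sketch below uses scikit-learn rather than the authors' code, with synthetic p >> n data and a deliberately correlated pair of predictors to exhibit the grouping effect:

```python
# Elastic-net logistic regression on a p >> n problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, p = 60, 500                                  # far more predictors than samples
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # a strongly correlated pair
logits = 2 * X[:, 0] + 2 * X[:, 1] - X[:, 10]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)
coef = model.coef_.ravel()
print("selected predictors:", np.where(np.abs(coef) > 1e-3)[0])
print("correlated pair kept together:", coef[0], coef[1])
```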

  17. Sea-level variability in tide-gauge and geological records: An empirical Bayesian analysis (Invited)

    Science.gov (United States)

    Kopp, R. E.; Hay, C.; Morrow, E.; Mitrovica, J. X.; Horton, B.; Kemp, A.

    2013-12-01

    Sea level varies at a range of temporal and spatial scales, and understanding all its significant sources of variability is crucial to building sea-level rise projections relevant to local decision-making. In the twentieth-century record, sites along the U.S. east coast have exhibited typical year-to-year variability of several centimeters. A faster-than-global increase in sea-level rise in the northeastern United States since about 1990 has led some to hypothesize a 'sea-level rise hot spot' in this region, perhaps driven by a trend in the Atlantic Meridional Overturning Circulation related to anthropogenic climate change [1]. However, such hypotheses must be evaluated in the context of natural variability, as revealed by observational and paleo-records. Bayesian and empirical Bayesian statistical approaches are well suited for assimilating data from diverse sources, such as tide-gauges and peats, with differing data availability and uncertainties, and for identifying regionally covarying patterns within these data. We present empirical Bayesian analyses of twentieth-century tide gauge data [2]. We find that the mid-Atlantic region of the United States has experienced a clear acceleration of sea level relative to the global average since about 1990, but this acceleration does not appear to be unprecedented in the twentieth-century record. The rate and extent of this acceleration instead appears comparable to an acceleration observed in the 1930s and 1940s. Both during the earlier episode of acceleration and today, the effect appears to be significantly positively correlated with the Atlantic Multidecadal Oscillation and likely negatively correlated with the North Atlantic Oscillation [2]. The Holocene and Common Era database of geological sea-level rise proxies [3,4] may allow these relationships to be assessed beyond the span of the direct observational record. At a global scale, similar approaches can be employed to look for the spatial fingerprints of land ice

  18. QUASAR SELECTION BASED ON PHOTOMETRIC VARIABILITY

    International Nuclear Information System (INIS)

    We develop a method for separating quasars from other variable point sources using Sloan Digital Sky Survey (SDSS) Stripe 82 light-curve data for ∼ 10,000 variable objects. To statistically describe quasar variability, we use a damped random walk model parametrized by a damping timescale, τ, and an asymptotic amplitude (structure function), SF∞. With the aid of an SDSS spectroscopically confirmed quasar sample, we demonstrate that variability selection in typical extragalactic fields with low stellar density can deliver complete samples with reasonable purity (or efficiency, E). Compared to a selection method based solely on the slope of the structure function, the inclusion of the τ information boosts E from 60% to 75% while maintaining a highly complete sample (98%) even in the absence of color information. For a completeness of C = 90%, E is boosted from 80% to 85%. Conversely, C improves from 90% to 97% while maintaining E = 80% when imposing a lower limit on τ. With the aid of color selection, the purity can be further boosted to 96%, with C = 93%. Hence, selection methods based on variability will play an important role in the selection of quasars with data provided by upcoming large sky surveys, such as Pan-STARRS and the Large Synoptic Survey Telescope (LSST). For a typical (simulated) LSST cadence over 10 years and a photometric accuracy of 0.03 mag (achieved at i ∼ 22), C is expected to be 88% for a simple sample selection criterion of τ > 100 days. In summary, given an adequate survey cadence, photometric variability provides an even better method than color selection for separating quasars from stars.
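
    The damped random walk itself is straightforward to simulate, which helps build intuition for the (τ, SF∞) parametrization; the stationary variance of the process is SF∞²/2. A minimal sketch with an illustrative cadence and parameters (the paper's estimators differ):

```python
# Simulate a damped random walk light curve and check its long-lag
# structure function against the input SF_inf.
import numpy as np

def simulate_drw(times, tau, sf_inf, rng):
    var = sf_inf**2 / 2                      # stationary variance
    x = np.empty(len(times))
    x[0] = rng.normal(0, np.sqrt(var))
    for i in range(1, len(times)):
        a = np.exp(-(times[i] - times[i - 1]) / tau)
        x[i] = a * x[i - 1] + rng.normal(0, np.sqrt(var * (1 - a**2)))
    return x

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0, 3000, 200))       # irregular cadence (days)
mag = 19.0 + simulate_drw(t, tau=300.0, sf_inf=0.2, rng=rng)

# crude structure-function estimate at lags much longer than tau
diffs = np.subtract.outer(mag, mag)[np.abs(np.subtract.outer(t, t)) > 1000]
print(f"SF at long lags ~ {np.sqrt(np.mean(diffs**2)):.2f} (input SF_inf = 0.2)")
```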

  19. Heart rate variability estimation in photoplethysmography signals using Bayesian learning approach.

    Science.gov (United States)

    Alqaraawi, Ahmed; Alwosheel, Ahmad; Alasaad, Amr

    2016-06-01

    Heart rate variability (HRV) has become a marker for various health and disease conditions. Photoplethysmography (PPG) sensors integrated in wearable devices such as smart watches and phones are widely used to measure heart activities. HRV requires accurate estimation of the time interval between consecutive peaks in the PPG signal. However, the PPG signal is very sensitive to motion artefacts, which may lead to poor HRV estimation if false peaks are detected. In this Letter, the authors propose a probabilistic approach based on Bayesian learning to better estimate HRV from PPG signals recorded by wearable devices and to enhance the performance of the automatic multi-scale-based peak detection (AMPD) algorithm used for peak detection. The authors' experiments show that their approach enhances the performance of the AMPD algorithm in terms of a number of HRV-related metrics, such as sensitivity, positive predictive value, and average temporal resolution. PMID:27382483

  20. Variable and subset selection in PLS regression

    DEFF Research Database (Denmark)

    Høskuldsson, Agnar

    2001-01-01

    The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...... obtained by different methods. We also present an approach to orthogonal scatter correction. The procedures and comparisons are applied to industrial data. (C) 2001 Elsevier Science B.V. All rights reserved....

  1. The selective bleed variable cycle engine

    OpenAIRE

    Nascimento, M. A. R.

    1992-01-01

    A new concept in aircraft propulsion is described in this work. In particular, a variable cycle jet engine is investigated for supersonic ASTOVL aircraft. This engine is a Selective Bleed Variable Cycle, twin-shaft turbofan. At low flight speeds the engine operates as a medium bypass turbofan. At supersonic cruise it operates as a low bypass turbofan without reheat. The performance of the engine and its components is analyzed using a novel matching procedure. Off-design engine performance characterist...

  2. Bayesian model selection applied to artificial neural networks used for water resources modeling

    Science.gov (United States)

    Kingston, Greer B.; Maier, Holger R.; Lambert, Martin F.

    2008-04-01

    Artificial neural networks (ANNs) have proven to be extremely valuable tools in the field of water resources engineering. However, one of the most difficult tasks in developing an ANN is determining the optimum level of complexity required to model a given problem, as there is no formal systematic model selection method. This paper presents a Bayesian model selection (BMS) method for ANNs that provides an objective approach for comparing models of varying complexity in order to select the most appropriate ANN structure. The approach uses Markov Chain Monte Carlo posterior simulations to estimate the evidence in favor of competing models and, in this study, three known methods for doing this are compared in terms of their suitability for being incorporated into the proposed BMS framework for ANNs. However, it is acknowledged that it can be particularly difficult to accurately estimate the evidence of ANN models. Therefore, the proposed BMS approach for ANNs incorporates a further check of the evidence results by inspecting the marginal posterior distributions of the hidden-to-output layer weights, which unambiguously indicate any redundancies in the hidden layer nodes. The fact that this check is available is one of the greatest advantages of the proposed approach over conventional model selection methods, which do not provide such a test and instead rely on the modeler's subjective choice of selection criterion. The advantages of a total Bayesian approach to ANN development, including training and model selection, are demonstrated on two synthetic and one real world water resources case study.

  3. Coping with Trial-to-Trial Variability of Event Related Signals: A Bayesian Inference Approach

    Science.gov (United States)

    Ding, Mingzhou; Chen, Youghong; Knuth, Kevin H.; Bressler, Steven L.; Schroeder, Charles E.

    2005-01-01

    In electro-neurophysiology, single-trial brain responses to a sensory stimulus or a motor act are commonly assumed to result from the linear superposition of a stereotypic event-related signal (e.g. the event-related potential or ERP) that is invariant across trials and some ongoing brain activity often referred to as noise. To extract the signal, one performs an ensemble average of the brain responses over many identical trials to attenuate the noise. To date, this simple signal-plus-noise (SPN) model has been the dominant approach in cognitive neuroscience. Mounting empirical evidence has shown that the assumptions underlying this model may be overly simplistic. More realistic models have been proposed that account for the trial-to-trial variability of the event-related signal as well as the possibility of multiple differentially varying components within a given ERP waveform. The variable-signal-plus-noise (VSPN) model, which has been demonstrated to provide the foundation for separation and characterization of multiple differentially varying components, has the potential to provide a rich source of information for questions related to neural functions that complement the SPN model. Thus, being able to estimate the amplitude and latency of each ERP component on a trial-by-trial basis provides a critical link between the perceived benefits of the VSPN model and its many concrete applications. In this paper we describe a Bayesian approach to deal with this issue and the resulting strategy is referred to as the differentially Variable Component Analysis (dVCA). We compare the performance of dVCA on simulated data with Independent Component Analysis (ICA) and analyze neurobiological recordings from monkeys performing cognitive tasks.

  4. Disaggregating measurement uncertainty from population variability and Bayesian treatment of uncensored results

    International Nuclear Information System (INIS)

    In making low-level radioactivity measurements of populations, it is commonly observed that a substantial portion of net results are negative. Furthermore, the observed variance of the measurement results arises from a combination of measurement uncertainty and population variability. This paper presents a method for disaggregating measurement uncertainty from population variability to produce a probability density function (PDF) of possibly true results. To do this, simple, justifiable, and reasonable assumptions are made about the relationship of the measurements to the measurands (the 'true values'). The measurements are assumed to be unbiased, that is, that their average value is the average of the measurands. Using traditional estimates of each measurement's uncertainty to disaggregate population variability from measurement uncertainty, a PDF of measurands for the population is produced. Then, using Bayes's theorem, the same assumptions, and all the data from the population of individuals, a prior PDF is computed for each individual's measurand. These PDFs are non-negative, and their average is equal to the average of the measurement results for the population. The uncertainty in these Bayesian posterior PDFs is all Berkson with no remaining classical component. The methods are applied to baseline bioassay data from the Hanford site. The data include 90Sr urinalysis measurements on 128 people, 137Cs in vivo measurements on 5,337 people, and 239Pu urinalysis measurements on 3,270 people. The method produces excellent results for the 90Sr and 137Cs measurements, since there are nonzero concentrations of these global fallout radionuclides in people who have not been occupationally exposed. The method does not work for the 239Pu measurements in non-occupationally exposed people because the population average is essentially zero.

  5. Bayesian model selection for a finite element model of a large civil aircraft

    Energy Technology Data Exchange (ETDEWEB)

    Hemez, F. M. (François M.); Rutherford, A. C. (Amanda C.)

    2004-01-01

    Nine aircraft stiffness parameters have been varied and used as inputs to a finite element model of an aircraft to generate natural frequency and deflection features (Goge, 2003). This data set (147 input parameter configurations and associated outputs) is now used to generate a metamodel, or a fast running surrogate model, using Bayesian model selection methods. Once a forward relationship is defined, the metamodel may be used in an inverse sense. That is, knowing the measured output frequencies and deflections, what were the input stiffness parameters that caused them?

  6. Disaggregating measurement uncertainty from population variability and Bayesian treatment of uncensored results.

    Science.gov (United States)

    Strom, Daniel J; Joyce, Kevin E; MacLellan, Jay A; Watson, David J; Lynch, Timothy P; Antonio, Cheryl L; Birchall, Alan; Anderson, Kevin K; Zharov, Peter A

    2012-04-01

    In making low-level radioactivity measurements of populations, it is commonly observed that a substantial portion of net results is negative. Furthermore, the observed variance of the measurement results arises from a combination of measurement uncertainty and population variability. This paper presents a method for disaggregating measurement uncertainty from population variability to produce a probability density function (PDF) of possibly true results. To do this, simple, justifiable and reasonable assumptions are made about the relationship of the measurements to the measurands (the 'true values'). The measurements are assumed to be unbiased, that is, that their average value is the average of the measurands. Using traditional estimates of each measurement's uncertainty, a likelihood PDF for each individual's measurand is produced. Then using the same assumptions and all the data from the population of individuals, a prior PDF of measurands for the population is produced. The prior PDF is non-negative, and the average is equal to the average of the measurement results for the population. Using Bayes's theorem, posterior PDFs of each individual measurand are calculated. The uncertainty in these bayesian posterior PDFs appears to be all Berkson with no remaining classical component. The method is applied to baseline bioassay data from the Hanford site. The data include (90)Sr urinalysis measurements of 128 people, (137)Cs in vivo measurements of 5337 people and (239)Pu urinalysis measurements of 3270 people. The method produces excellent results for the (90)Sr and (137)Cs measurements, since there are non-zero concentrations of these global fallout radionuclides in people who have not been occupationally exposed. The method does not work for the (239)Pu measurements in non-occupationally exposed people because the population average is essentially zero relative to the sensitivity of the measurement technique. The method is shown to give results similar to

  7. Bayesian cross-validation for model evaluation and selection, with application to the North American Breeding Bird Survey

    Science.gov (United States)

    Link, William; Sauer, John R.

    2016-01-01

    The analysis of ecological data has changed in two important ways over the last 15 years. The development and easy availability of Bayesian computational methods has allowed and encouraged the fitting of complex hierarchical models. At the same time, there has been increasing emphasis on acknowledging and accounting for model uncertainty. Unfortunately, the ability to fit complex models has outstripped the development of tools for model selection and model evaluation: familiar model selection tools such as Akaike's information criterion and the deviance information criterion are widely known to be inadequate for hierarchical models. In addition, little attention has been paid to the evaluation of model adequacy in context of hierarchical modeling, i.e., to the evaluation of fit for a single model. In this paper, we describe Bayesian cross-validation, which provides tools for model selection and evaluation. We describe the Bayesian predictive information criterion and a Bayesian approximation to the BPIC known as the Watanabe-Akaike information criterion. We illustrate the use of these tools for model selection, and the use of Bayesian cross-validation as a tool for model evaluation, using three large data sets from the North American Breeding Bird Survey.
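
    WAIC is simple to compute once pointwise log-likelihoods are available at each posterior draw. A minimal sketch on a toy normal model (the formula is standard; the data and the approximate posterior below are invented for illustration):

```python
# WAIC from an (S draws) x (n observations) matrix of log-likelihoods:
# WAIC = -2 * (lppd - p_waic), with p_waic from the per-point variance.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(6)
y = rng.normal(1.0, 1.0, size=40)
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=4000)  # approx. posterior

log_lik = norm.logpdf(y[None, :], loc=mu_draws[:, None], scale=1.0)  # (S, n)
lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(log_lik.shape[0]))
p_waic = np.sum(log_lik.var(axis=0, ddof=1))    # effective number of parameters
waic = -2 * (lppd - p_waic)
print(f"lppd = {lppd:.1f}, p_waic = {p_waic:.2f}, WAIC = {waic:.1f}")
```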

  8. Variable Selection in Model-based Clustering: A General Variable Role Modeling

    OpenAIRE

    Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure

    2008-01-01

    The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...

  9. Bayesian model selection without evidences: application to the dark energy equation-of-state

    CERN Document Server

    Hee, Sonke; Hobson, Mike P; Lasenby, Anthony N

    2015-01-01

    A method is presented for Bayesian model selection without explicitly computing evidences, by using a combined likelihood and introducing an integer model selection parameter $n$ so that Bayes factors, or more generally posterior odds ratios, may be read off directly from the posterior of $n$. If the total number of models under consideration is specified a priori, the full joint parameter space $(\\theta, n)$ of the models is of fixed dimensionality and can be explored using standard MCMC or nested sampling methods, without the need for reversible jump MCMC techniques. The posterior on $n$ is then obtained by straightforward marginalisation. We demonstrate the efficacy of our approach by application to several toy models. We then apply it to constraining the dark energy equation-of-state using a free-form reconstruction technique. We show that $\\Lambda$CDM is significantly favoured over all extensions, including the simple $w(z){=}{\\rm constant}$ model.

  10. A Bayesian Network Approach for Offshore Risk Analysis Through Linguistic Variables

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    This paper presents a new approach for offshore risk analysis that is capable of dealing with linguistic probabilities in Bayesian networks (BNs). In this paper, linguistic probabilities are used to describe the occurrence likelihood of hazardous events that may cause possible accidents in offshore operations. In order to use fuzzy information, an f-weighted valuation function is proposed to transform linguistic judgements into crisp probability distributions which can be easily put into a BN to model causal relationships among risk factors. The use of linguistic variables makes it easier for human experts to express their knowledge, and the transformation of linguistic judgements into crisp probabilities can significantly reduce the cost of computing, modifying and maintaining a BN model. The flexibility of the method allows for multiple forms of information to be used to quantify model relationships, including formally assessed expert opinion when quantitative data are lacking, or when only qualitative or vague statements can be made. The model is a modular representation of uncertain knowledge arising from randomness, vagueness and ignorance. This makes the risk analysis of offshore engineering systems more functional and easier in many assessment contexts. Specifically, the proposed f-weighted valuation function takes into account not only the dominating values, but also the α-level values that are ignored by conventional valuation methods. A case study of the collision risk between a Floating Production, Storage and Off-loading (FPSO) unit and authorised vessels due to human elements during operation is used to illustrate the application of the proposed model.

  11. Gamma prior distribution selection for Bayesian analysis of failure rate and reliability

    International Nuclear Information System (INIS)

    It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure rate lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this report is to present a methodology that can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate lambda simultaneously satisfies the probability statements, P(lambda less than 1.0 x 10^-3) equals 0.50 and P(lambda less than 1.0 x 10^-5) equals 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure rate percentiles illustrated above, it is possible to use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t0) less than 0.99) equals 0.50 and P(R(t0) less than 0.99999) equals 0.95, for some operating time t0. The report also includes graphs for selected percentiles which assist an engineer in applying the procedure. 28 figures, 16 tables
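
    The percentile-matching step is easy to reproduce numerically. A minimal sketch (using scipy rather than the report's graphs) that recovers the gamma prior from the two failure-rate percentile statements quoted above, exploiting the fact that the ratio of two gamma quantiles is free of the scale parameter:

```python
# Find the gamma prior whose 5th and 50th percentiles match
# P(lambda < 1e-5) = 0.05 and P(lambda < 1e-3) = 0.50.
from scipy.optimize import brentq
from scipy.stats import gamma

q05, q50 = 1.0e-5, 1.0e-3
# for fixed shape a, the ratio of gamma quantiles is scale-free:
ratio = lambda a: gamma.ppf(0.50, a) / gamma.ppf(0.05, a) - q50 / q05
a = brentq(ratio, 0.01, 10.0)            # solve for the shape parameter
scale = q50 / gamma.ppf(0.50, a)         # then match the median
print(f"gamma prior: shape = {a:.3f}, scale = {scale:.3e}")
print("check:", gamma.cdf(q05, a, scale=scale), gamma.cdf(q50, a, scale=scale))
```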

  12. Gamma prior distribution selection for Bayesian analysis of failure rate and reliability

    Energy Technology Data Exchange (ETDEWEB)

    Waller, R.A.; Johnson, M.M.; Waterman, M.S.; Martz, H.F. Jr.

    1976-07-01

    It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure rate lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this report is to present a methodology that can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate lambda simultaneously satisfies the probability statements, P(lambda less than 1.0 x 10^-3) equals 0.50 and P(lambda less than 1.0 x 10^-5) equals 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure rate percentiles illustrated above, it is possible to use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t0) less than 0.99) equals 0.50 and P(R(t0) less than 0.99999) equals 0.95, for some operating time t0. The report also includes graphs for selected percentiles which assist an engineer in applying the procedure. 28 figures, 16 tables.

  13. A Framework for Parameter Estimation and Model Selection from Experimental Data in Systems Biology Using Approximate Bayesian Computation

    Science.gov (United States)

    Liepe, Juliane; Kirk, Paul; Filippi, Sarah; Toni, Tina; Barnes, Chris P.; Stumpf, Michael P.H.

    2016-01-01

    As modeling becomes a more widespread practice in the life- and biomedical sciences, we require reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation framework and software environment, ABC-SysBio, which enables parameter estimation and model selection in the Bayesian formalism using Sequential Monte-Carlo approaches. We outline the underlying rationale, discuss the computational and practical issues, and provide detailed guidance as to how the important tasks of parameter inference and model selection can be carried out in practice. Unlike other available packages, ABC-SysBio is highly suited for investigating in particular the challenging problem of fitting stochastic models to data. Although computationally expensive, the additional insights gained in the Bayesian formalism more than make up for this cost, especially in complex problems. PMID:24457334

  14. Introduction to Bayesian statistics

    CERN Document Server

    Bolstad, William M

    2016-01-01

    There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear Regression Model; and Computati...

  15. Selection of Trusted Service Providers by Enforcing Bayesian Analysis in iVCE

    Institute of Scientific and Technical Information of China (English)

    GU Bao-jun; LI Xiao-yong; WANG Wei-nong

    2008-01-01

    The initiative of internet-based virtual computing environment (iVCE) aims to provide end users and applications with a harmonious, trustworthy and transparent integrated computing environment which will facilitate the sharing of network resources and collaboration between applications. Trust management is an elementary component of iVCE. The uncertain and dynamic characteristics of iVCE require the trust management to be subjective, historical-evidence based and context dependent. This paper presents a Bayesian analysis-based trust model, which aims to help active agents select appropriate trusted services in iVCE. Simulations are made to analyze the properties of the trust model; they show that subjective prior information influences trust evaluation considerably and that the model stimulates positive interactions.
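
    The abstract does not spell out the model's equations; a common building block for Bayesian, evidence-based trust models of this kind is a Beta-Bernoulli update over interaction outcomes, sketched below purely as a hypothetical illustration of how a subjective prior enters the evaluation.

```python
# Minimal Beta-Bernoulli trust update of the kind such models build on
# (an assumption; the paper's exact formulation is not reproduced here).
def update_trust(alpha, beta, outcome):
    """Posterior Beta parameters after one interaction (1=success, 0=failure)."""
    return alpha + outcome, beta + (1 - outcome)

alpha, beta = 1.0, 1.0            # subjective prior over provider reliability
for outcome in [1, 1, 0, 1]:      # observed interaction history
    alpha, beta = update_trust(alpha, beta, outcome)

trust = alpha / (alpha + beta)    # posterior mean used to rank providers
print(trust)
```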

  16. Optimizing the Amount of Models Taken into Consideration During Model Selection in Bayesian Networks

    NARCIS (Netherlands)

    Castelo, J.R.; Siebes, A.P.J.M.

    1999-01-01

    Graphical model selection from data embodies several difficulties. Among them, the size of the sample space of models over which one must carry out model selection is especially challenging, even when considering only a modest number of variables. This becomes more severe when one works on those graphi

  17. A Bayesian Approach for Nonlinear Structural Equation Models with Dichotomous Variables Using Logit and Probit Links

    Science.gov (United States)

    Lee, Sik-Yum; Song, Xin-Yuan; Cai, Jing-Heng

    2010-01-01

    Analysis of ordered binary and unordered binary data has received considerable attention in social and psychological research. This article introduces a Bayesian approach, which has several nice features in practical applications, for analyzing nonlinear structural equation models with dichotomous data. We demonstrate how to use the software…

  18. Variable selection in model-based discriminant analysis

    OpenAIRE

    Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure

    2010-01-01

    A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three different roles for each possible predictor are considered: a variable can be a relevant classification predictor or not, and the irrelevant classification variables can be linearly dependent on a part of the relevant predictors or independent variables. This variable selection model was inspired by the model-based clustering mo...

  19. Adverse and Advantageous Selection in the Medicare Supplemental Market: A Bayesian Analysis of Prescription Drug Expenditure.

    Science.gov (United States)

    Li, Qian; Trivedi, Pravin K

    2016-02-01

    This paper develops an extended specification of the two-part model, which controls for unobservable self-selection and heterogeneity of health insurance, and analyzes the impact of Medicare supplemental plans on the prescription drug expenditure of the elderly, using a linked data set based on the Medicare Current Beneficiary Survey data for 2003-2004. The econometric analysis is conducted using a Bayesian econometric framework. We estimate the treatment effects for different counterfactuals and find significant evidence of endogeneity in plan choice and the presence of both adverse and advantageous selection in the supplemental insurance market. The average incentive effect is estimated to be $757 (2004 value), a 41% increase per person per year, for the elderly enrolled in supplemental plans with drug coverage against the Medicare fee-for-service counterfactual, and $350, a 21% increase, against the supplemental plans without drug coverage counterfactual. The incentive effect varies by source of drug coverage: highest for employer-sponsored insurance plans, followed by Medigap and managed Medicare plans.

  20. Adverse and Advantageous Selection in the Medicare Supplemental Market: A Bayesian Analysis of Prescription Drug Expenditure.

    Science.gov (United States)

    Li, Qian; Trivedi, Pravin K

    2016-02-01

    This paper develops an extended specification of the two-part model, which controls for unobservable self-selection and heterogeneity of health insurance, and analyzes the impact of Medicare supplemental plans on the prescription drug expenditure of the elderly, using a linked data set based on the Medicare Current Beneficiary Survey data for 2003-2004. The econometric analysis is conducted using a Bayesian econometric framework. We estimate the treatment effects for different counterfactuals and find significant evidence of endogeneity in plan choice and the presence of both adverse and advantageous selection in the supplemental insurance market. The average incentive effect is estimated to be $757 (2004 value), a 41% increase per person per year, for the elderly enrolled in supplemental plans with drug coverage against the Medicare fee-for-service counterfactual, and $350, a 21% increase, against the supplemental plans without drug coverage counterfactual. The incentive effect varies by source of drug coverage: highest for employer-sponsored insurance plans, followed by Medigap and managed Medicare plans. PMID:25504934

  1. Comparison of Two Gas Selection Methodologies: An Application of Bayesian Model Averaging

    Energy Technology Data Exchange (ETDEWEB)

    Renholds, Andrea S.; Thompson, Sandra E.; Anderson, Kevin K.; Chilton, Lawrence K.

    2006-03-31

    One goal of hyperspectral imagery analysis is the detection and characterization of plumes. Characterization includes identifying the gases in the plumes, which is a model selection problem. Two gas selection methods compared in this report are Bayesian model averaging (BMA) and minimum Akaike information criterion (AIC) stepwise regression (SR). Simulated spectral data from a three-layer radiance transfer model were used to compare the two methods. Test gases were chosen to span the types of spectra observed, which exhibit peaks ranging from broad to sharp. The size and complexity of the search libraries were varied. Background materials were chosen to either replicate a remote area of eastern Washington or feature many common background materials. For many cases, BMA and SR performed the detection task comparably in terms of the receiver operating characteristic curves. For some gases, BMA performed better than SR when the size and complexity of the search library increased. This is encouraging because we expect improved BMA performance upon incorporation of prior information on background materials and gases.
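
    As a hedged illustration of how BMA weights can be formed over a search library, the sketch below computes BIC-based approximate posterior model probabilities over all predictor subsets of a small simulated regression. It is not the report's radiance-transfer setup; the data, the BIC approximation to the evidence, and the sizes are assumptions made here.

```python
# Sketch: BIC-weighted Bayesian model averaging over all predictor subsets
# of a simulated linear regression (illustrative; not the report's setup).
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 4
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)        # only predictor 0 is active

def bic(cols):
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + Xd.shape[1] * np.log(n)

models = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
bics = np.array([bic(s) for s in models])
weights = np.exp(-0.5 * (bics - bics.min()))  # exp(-BIC/2) ~ model evidence
weights /= weights.sum()
for s, w in zip(models, weights):
    if w > 0.01:
        print(s, round(float(w), 3))          # posterior model weights
```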

  2. A Bayesian Approach to Service Selection for Secondary Users in Cognitive Radio Networks

    Directory of Open Access Journals (Sweden)

    Elaheh Homayounvala

    2015-10-01

    Full Text Available In cognitive radio networks, where secondary users (SUs) opportunistically use the time-frequency gaps of primary users' (PUs) licensed spectrum, the throughput experienced by SUs depends not only on the traffic load of the PUs but also on the PUs' service type. Each service has its own pattern of channel usage, and if the SUs know the dominant pattern of primary channel usage, then they can make a better decision on which service to use at a specific time to take the best advantage of the primary channel, in terms of higher achievable throughput. However, for practical reasons it is difficult to directly inform SUs of the PUs' dominant services in each area. This paper proposes a learning mechanism embedded in SUs to sense the primary channel for a specific length of time. Upon sensing a free primary channel, this algorithm recommends that the SUs choose the best service in order to get the best performance, in terms of maximum achieved throughput and minimum experienced delay. The proposed learning mechanism is based on a Bayesian approach that can predict the performance of a requested service for a given SU. Simulation results show that this service selection method significantly outperforms blind opportunistic SU service selection.

  3. Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications.

    Science.gov (United States)

    Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E

    2010-03-01

    Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.

  4. Bayesian Graphical Models

    DEFF Research Database (Denmark)

    Jensen, Finn Verner; Nielsen, Thomas Dyhre

    2016-01-01

    Mathematically, a Bayesian graphical model is a compact representation of the joint probability distribution for a set of variables. The most frequently used type of Bayesian graphical model is the Bayesian network. The structural part of a Bayesian graphical model is a graph consisting of nodes and edges. The nodes represent variables, which may be either discrete or continuous. An edge between two nodes A and B indicates a direct influence between the state of A and the state of B, which in some domains can also be interpreted as a causal relation. The widespread use of Bayesian networks is largely due to the availability of efficient inference algorithms for answering probabilistic queries about the states of the variables in the network. Furthermore, to support the construction of Bayesian network models, learning algorithms are also available. We give an overview of the Bayesian network...
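
    A minimal example of the probabilistic queries mentioned above: the classic rain/sprinkler/wet-grass network, answered by brute-force enumeration. The conditional probability tables are invented for illustration; real systems use the efficient inference algorithms the abstract refers to rather than enumeration.

```python
# Tiny Bayesian network (Rain -> WetGrass <- Sprinkler) queried by
# brute-force enumeration; all probabilities are made up for illustration.
P_rain = {True: 0.2, False: 0.8}
P_sprk = {True: 0.1, False: 0.9}
P_wet = {  # P(wet=True | rain, sprinkler)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.05,
}

def joint(r, s, w):
    pw = P_wet[(r, s)] if w else 1.0 - P_wet[(r, s)]
    return P_rain[r] * P_sprk[s] * pw

# P(rain | wet) by summing the joint over the unobserved sprinkler node
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(num / den)
```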

  5. Bayesian model selection for testing the no-hair theorem with black hole ringdowns

    CERN Document Server

    Gossan, S; Sathyaprakash, B S

    2011-01-01

    General relativity predicts that a black hole that results from the merger of two compact stars (either black holes or neutron stars) is initially highly deformed but soon settles down to a quiescent state by emitting a superposition of quasi-normal modes (QNMs). The QNMs are damped sinusoids with characteristic frequencies and decay times that depend only on the mass and spin of the black hole and no other parameter - a statement of the no-hair theorem. In this paper we have examined the extent to which QNMs could be used to test the no-hair theorem with future ground- and space-based gravitational-wave detectors. We model departures from general relativity (GR) by introducing extra parameters which change the mode frequencies or decay times from their general relativistic values. With the aid of numerical simulations and Bayesian model selection, we assess the extent to which the presence of such a parameter could be inferred, and its value estimated. We find that it is harder to decipher the departure of d...

  6. Finding the right balance between groundwater model complexity and experimental effort via Bayesian model selection

    Science.gov (United States)

    Schöniger, Anneli; Illman, Walter A.; Wöhling, Thomas; Nowak, Wolfgang

    2015-12-01

    Groundwater modelers face the challenge of how to assign representative parameter values to the studied aquifer. Several approaches are available to parameterize spatial heterogeneity in aquifer parameters. They differ in their conceptualization and complexity, ranging from homogeneous models to heterogeneous random fields. While it is common practice to invest more effort into data collection for models with a finer resolution of heterogeneities, there is a lack of advice on how much data is required to justify a certain level of model complexity. In this study, we propose to use concepts related to Bayesian model selection to identify this balance. We demonstrate our approach on the characterization of a heterogeneous aquifer via hydraulic tomography in a sandbox experiment (Illman et al., 2010). We consider four increasingly complex parameterizations of hydraulic conductivity: (1) Effective homogeneous medium, (2) geology-based zonation, (3) interpolation by pilot points, and (4) geostatistical random fields. First, we investigate the shift in justified complexity with increasing amount of available data by constructing a model confusion matrix. This matrix indicates the maximum level of complexity that can be justified given a specific experimental setup. Second, we determine which parameterization is most adequate given the observed drawdown data. Third, we test how the different parameterizations perform in a validation setup. The results of our test case indicate that aquifer characterization via hydraulic tomography does not necessarily require (or justify) a geostatistical description. Instead, a zonation-based model might be a more robust choice, but only if the zonation is geologically adequate.

  7. Variable Selection of Partially Linear Single-index Models

    Institute of Scientific and Technical Information of China (English)

    LU Yi-qiang; HU Bin

    2014-01-01

    In this article, we study variable selection for the partially linear single-index model (PLSIM). Based on minimized average variance estimation, variable selection for the PLSIM is done by minimizing the average variance with an adaptive L1 penalty. An implementation algorithm is given. Under some regularity conditions, we demonstrate the oracle properties of the aLASSO procedure for the PLSIM. Simulations are used to investigate the effectiveness of the proposed method for variable selection of the PLSIM.

  8. Fast Selection of Spectral Variables with B-Spline Compression

    CERN Document Server

    Rossi, Fabrice; Wertz, Vincent; Meurens, Marc; Verleysen, Michel

    2007-01-01

    The large number of spectral variables in most data sets encountered in spectral chemometrics often renders the prediction of a dependent variable difficult. The number of variables can hopefully be reduced, by using either projection techniques or selection methods; the latter allow for the interpretation of the selected variables. Since the optimal approach of testing all possible subsets of variables with the prediction model is intractable, an incremental selection approach using a nonparametric statistic is a good option, as it avoids the computationally intensive use of the model itself. It has two drawbacks, however: the number of groups of variables to test is still huge, and collinearities can make the results unstable. To overcome these limitations, this paper presents a method to select groups of spectral variables. It consists in a forward-backward procedure applied to the coefficients of a B-Spline representation of the spectra. The criterion used in the forward-backward procedure is the mutual infor...

  9. Using Instrumental Variables Properly to Account for Selection Effects

    Science.gov (United States)

    Porter, Stephen R.

    2012-01-01

    Selection bias is problematic when evaluating the effects of postsecondary interventions on college students, and can lead to biased estimates of program effects. While instrumental variables can be used to account for endogeneity due to self-selection, current practice requires that all five assumptions of instrumental variables be met in order…

  10. The Properties of Model Selection when Retaining Theory Variables

    DEFF Research Database (Denmark)

    Hendry, David F.; Johansen, Søren

    Economic theories are often fitted directly to data to avoid possible model selection biases. We show that embedding a theory model that specifies the correct set of m relevant exogenous variables, x_t, within the larger set of m+k candidate variables, (x_t, w_t), then selection over the second...

  11. Linking bovine tuberculosis on cattle farms to white-tailed deer and environmental variables using Bayesian hierarchical analysis.

    Directory of Open Access Journals (Sweden)

    W David Walter

    Full Text Available Bovine tuberculosis is a bacterial disease caused by Mycobacterium bovis in livestock and wildlife with hosts that include Eurasian badgers (Meles meles), brushtail possum (Trichosurus vulpecula), and white-tailed deer (Odocoileus virginianus). Risk-assessment efforts in Michigan have been initiated on farms to minimize interactions of cattle with wildlife hosts, but research on M. bovis on cattle farms has not investigated the spatial context of disease epidemiology. To incorporate spatially explicit data, initial likelihood of infection probabilities for cattle farms tested for M. bovis, prevalence of M. bovis in white-tailed deer, deer density, and environmental variables for each farm were modeled in a Bayesian hierarchical framework. We used geo-referenced locations of 762 cattle farms that have been tested for M. bovis, white-tailed deer prevalence, and several environmental variables that may lead to long-term survival and viability of M. bovis on farms and surrounding habitats (i.e., soil type, habitat type). Bayesian hierarchical analyses identified deer prevalence and proportion of sandy soil within our sampling grid as the most supported model. Analysis of cattle farms tested for M. bovis identified that every 1% increase in sandy soil resulted in a 4% increase in the odds of infection. Our analysis revealed that the influence of prevalence of M. bovis in white-tailed deer was still a concern even after considerable efforts to prevent cattle interactions with white-tailed deer through on-farm mitigation and reduction in the deer population. Cattle farms test positive for M. bovis annually in our study area, suggesting that an environmental source either on farms or in the surrounding landscape may be contributing to new or re-infections with M. bovis. Our research provides an initial assessment of potential environmental factors that could be incorporated into additional modeling efforts as more knowledge of deer herd

  12. The Time Domain Spectroscopic Survey: Variable Selection and Anticipated Results

    Science.gov (United States)

    Morganson, Eric; Green, Paul J.; Anderson, Scott F.; Ruan, John J.; Myers, Adam D.; Eracleous, Michael; Kelly, Brandon; Badenes, Carlos; Bañados, Eduardo; Blanton, Michael R.; Bershady, Matthew A.; Borissova, Jura; Nielsen Brandt, William; Burgett, William S.; Chambers, Kenneth; Draper, Peter W.; Davenport, James R. A.; Flewelling, Heather; Garnavich, Peter; Hawley, Suzanne L.; Hodapp, Klaus W.; Isler, Jedidah C.; Kaiser, Nick; Kinemuchi, Karen; Kudritzki, Rolf P.; Metcalfe, Nigel; Morgan, Jeffrey S.; Pâris, Isabelle; Parvizi, Mahmoud; Poleski, Radosław; Price, Paul A.; Salvato, Mara; Shanks, Tom; Schlafly, Eddie F.; Schneider, Donald P.; Shen, Yue; Stassun, Keivan; Tonry, John T.; Walter, Fabian; Waters, Chris Z.

    2015-06-01

    We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is a Sloan Digital Sky Survey (SDSS)-IV Extended Baryon Oscillation Spectroscopic Survey (eBOSS) subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and active galactic nuclei) across 7500 deg2, selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4000 of which will be RR Lyrae stars which may be used as outer Milky Way probes. The variability-selected quasar population has a smoother redshift distribution than a color-selected sample, and variability measurements similar to those we develop here may be used to make more uniform quasar samples in large surveys. The stellar variable targets are distributed fairly uniformly across color space, indicating that TDSS will obtain spectra for a wide variety of stellar variables including pulsating variables, stars with significant chromospheric activity, cataclysmic variables, and eclipsing binaries. TDSS will serve as a pathfinder mission to identify and characterize the multitude of variable objects that will be detected photometrically in even larger variability surveys such as the Large Synoptic Survey Telescope.
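
    The damped random walk used in such variability fits is an Ornstein-Uhlenbeck process. The sketch below simulates one light curve and applies an illustrative cut on the estimated timescale and amplitude; the moment-based estimators and the thresholds are stand-ins for a full likelihood fit, not the TDSS pipeline.

```python
# Damped-random-walk (Ornstein-Uhlenbeck) light curve with an illustrative
# (tau, amplitude) selection cut of the kind used for variability selection.
import numpy as np

def simulate_drw(n, dt, tau, sigma, rng):
    """Exact OU updates: x decays toward 0 with characteristic timescale tau."""
    x = np.empty(n)
    x[0] = 0.0
    rho = np.exp(-dt / tau)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + sigma * np.sqrt(1.0 - rho**2) * rng.normal()
    return x

rng = np.random.default_rng(3)
lc = simulate_drw(n=500, dt=1.0, tau=100.0, sigma=0.3, rng=rng)

# crude moment-based estimates standing in for a full DRW likelihood fit
sigma_hat = lc.std()
tau_hat = -1.0 / np.log(np.corrcoef(lc[:-1], lc[1:])[0, 1])
is_candidate = (tau_hat > 10.0) and (sigma_hat > 0.1)   # illustrative cuts
print(tau_hat, sigma_hat, is_candidate)
```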

  13. Research on Some Questions About Selection of Independent Variables

    Institute of Scientific and Technical Information of China (English)

    TAO Jing-xuan

    2002-01-01

    The paper studies four methods for the selection of independent variables in multivariate analysis. In general, forward and backward selection methods may not obtain the best subset of independent variables; the result can be affected by the order of the variables or by associations among them. When multicollinearity is present in a set of explanatory variables, these methods are not effective, even though stepwise regression and all-subsets selection are widely used. For this case, the paper proposes a new method which combines variable deletion with component analysis and can be used practically in research and science. An important characteristic of this paper is that it gives examples to support each conclusion.

  14. Collinear Latent Variables in Multilevel Confirmatory Factor Analysis: A Comparison of Maximum Likelihood and Bayesian Estimations

    Science.gov (United States)

    Can, Seda; van de Schoot, Rens; Hox, Joop

    2015-01-01

    Because variables may be correlated in the social and behavioral sciences, multicollinearity might be problematic. This study investigates the effect of collinearity manipulated in within and between levels of a two-level confirmatory factor analysis by Monte Carlo simulation. Furthermore, the influence of the size of the intraclass correlation…

  15. Collinear Latent Variables in Multilevel Confirmatory Factor Analysis: A Comparison of Maximum Likelihood and Bayesian Estimations

    NARCIS (Netherlands)

    Can, Seda; van de Schoot, Rens; Hox, Joop

    2014-01-01

    Because variables may be correlated in the social and behavioral sciences, multicollinearity might be problematic. This study investigates the effect of collinearity manipulated in within and between levels of a two-level confirmatory factor analysis by Monte Carlo simulation. Furthermore, the influ

  16. Automatised selection of load paths to construct reduced-order models in computational damage micromechanics: from dissipation-driven random selection to Bayesian optimization

    Science.gov (United States)

    Goury, Olivier; Amsallem, David; Bordas, Stéphane Pierre Alain; Liu, Wing Kam; Kerfriden, Pierre

    2016-08-01

    In this paper, we present new reliable model order reduction strategies for computational micromechanics. The difficulties stem mainly from the high dimensionality of the parameter space represented by any load path applied onto the representative volume element. We pay special attention to the challenge of selecting an exhaustive snapshot set. This is treated by first using a random sampling of energy-dissipating load paths and then, in a more advanced way, using Bayesian optimization associated with an interlocked division of the parameter space. Results show that we can ensure the selection of an exhaustive snapshot set from which a reliable reduced-order model can be built.

  17. Linking bovine tuberculosis on cattle farms to white-tailed deer and environmental variables using Bayesian hierarchical analysis

    Science.gov (United States)

    Walter, William D.; Smith, Rick; Vanderklok, Mike; VerCauterren, Kurt C.

    2014-01-01

    Bovine tuberculosis is a bacterial disease caused by Mycobacterium bovis in livestock and wildlife with hosts that include Eurasian badgers (Meles meles), brushtail possum (Trichosurus vulpecula), and white-tailed deer (Odocoileus virginianus). Risk-assessment efforts in Michigan have been initiated on farms to minimize interactions of cattle with wildlife hosts, but research on M. bovis on cattle farms has not investigated the spatial context of disease epidemiology. To incorporate spatially explicit data, initial likelihood of infection probabilities for cattle farms tested for M. bovis, prevalence of M. bovis in white-tailed deer, deer density, and environmental variables for each farm were modeled in a Bayesian hierarchical framework. We used geo-referenced locations of 762 cattle farms that have been tested for M. bovis, white-tailed deer prevalence, and several environmental variables that may lead to long-term survival and viability of M. bovis on farms and surrounding habitats (i.e., soil type, habitat type). Bayesian hierarchical analyses identified deer prevalence and proportion of sandy soil within our sampling grid as the most supported model. Analysis of cattle farms tested for M. bovis identified that every 1% increase in sandy soil resulted in a 4% increase in the odds of infection. Our analysis revealed that the influence of prevalence of M. bovis in white-tailed deer was still a concern even after considerable efforts to prevent cattle interactions with white-tailed deer through on-farm mitigation and reduction in the deer population. Cattle farms test positive for M. bovis annually in our study area, suggesting that an environmental source either on farms or in the surrounding landscape may be contributing to new or re-infections with M. bovis. Our research provides an initial assessment of potential environmental factors that could be incorporated into additional modeling efforts as more knowledge of deer herd

  18. THE IDENTIFICATION OF INFLATION RATE DETERMINANTS IN THE USA USING THE STOCHASTIC SEARCH VARIABLE SELECTION

    Directory of Open Access Journals (Sweden)

    Mihaela SIMIONESCU

    2016-03-01

    Full Text Available Inflation rate determinants for the USA have been analyzed in this study starting with 2008, when the American economy was already in crisis. As a novelty, this research uses Bayesian econometric methods to identify the determinants of the monthly inflation rate in the USA. The Stochastic Search Variable Selection (SSVS) has been applied with a subjective acceptance probability of 0.3. The results are also validated by economic theory. The monthly inflation rate was influenced during 2008-2015 by: the unemployment rate, the exchange rate, crude oil prices, the trade weighted U.S. Dollar Index and the M2 Money Stock. The study might be continued by considering other potential determinants of the inflation rate.
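
    Since SSVS is central to the study, a compact Gibbs-sampler sketch of the George-McCulloch spike-and-slab formulation it builds on is given below, run on simulated data. All sizes, prior variances, and the lack of burn-in are illustrative assumptions; the study's inflation-rate dataset and its 0.3 acceptance threshold are not reproduced.

```python
# Compact SSVS (spike-and-slab) Gibbs sketch on simulated regression data;
# all sizes, priors and variances are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.0]) + rng.normal(size=n)

tau0, tau1, sigma2 = 0.01, 10.0, 1.0       # spike, slab and noise variances
gamma = np.ones(p, dtype=int)              # inclusion indicators
inclusion = np.zeros(p)
for it in range(2000):                     # no burn-in, for brevity
    # beta | gamma: conjugate normal update
    Dinv = np.diag(np.where(gamma == 1, 1.0 / tau1, 1.0 / tau0))
    V = np.linalg.inv(X.T @ X / sigma2 + Dinv)
    beta = rng.multivariate_normal(V @ X.T @ y / sigma2, V)
    # gamma | beta: compare spike and slab densities at beta_j (prior odds 1:1)
    for j in range(p):
        log_odds = 0.5 * beta[j] ** 2 * (1.0 / tau0 - 1.0 / tau1) \
                 + 0.5 * np.log(tau0 / tau1)
        gamma[j] = rng.random() < 1.0 / (1.0 + np.exp(-log_odds))
    inclusion += gamma

print(inclusion / 2000)                    # posterior inclusion probabilities
```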

  19. A Bayesian approach to study the risk variables for tuberculosis occurrence in domestic and wild ungulates in South Central Spain

    Directory of Open Access Journals (Sweden)

    Rodríguez-Prieto Víctor

    2012-08-01

    Full Text Available Abstract Background Bovine tuberculosis (bTB) is a chronic infectious disease mainly caused by Mycobacterium bovis. Although eradication is a priority for the European authorities, bTB remains active or is even increasing in many countries, causing significant economic losses. The integral consideration of epidemiological factors is crucial to allocate control measures more cost-effectively. The aim of this study was to identify the nature and extent of the association between TB distribution and a list of potential risk factors regarding cattle, wild ungulates and environmental aspects in Ciudad Real, a Spanish province with one of the highest TB herd prevalences. Results We used a Bayesian mixed effects multivariable logistic regression model to predict TB occurrence in either domestic or wild mammals per municipality in 2007 by using information from the previous year. The municipal TB distribution and endemicity were clustered in the western part of the region and clearly overlapped with the explanatory variables identified in the final model: (1) incident cattle farms, (2) number of years of veterinary inspection of big game hunting events, (3) prevalence in wild boar, (4) number of sampled cattle, (5) persistent bTB-infected cattle farms, (6) prevalence in red deer, (7) proportion of beef farms, and (8) farms devoted to bullfighting cattle. Conclusions The combination of these eight variables in the final model highlights the importance of the persistence of the infection in the hosts, surveillance efforts and some cattle management choices in the circulation of M. bovis in the region. The spatial distribution of these variables, together with particular Mediterranean features that favour the wildlife-livestock interface, may explain the M. bovis persistence in this region. Sanitary authorities should allocate efforts towards specific areas and epidemiological situations where the wildlife-livestock interface seems to critically hamper the definitive b

  20. Bayesian methods for meta-analysis of causal relationships estimated using genetic instrumental variables

    DEFF Research Database (Denmark)

    Burgess, Stephen; Thompson, Simon G; Andrews, G;

    2010-01-01

    Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context of multiple studies, yielding an overall estimate of the causal relationship between the phenotype and the outcome, and an assessment of its heterogeneity across studies. As an example, we estimate the causal relationship of blood concentrations of C-reactive protein on fibrinogen levels using data from 11 studies. These methods provide a flexible framework for efficient estimation of causal relationships derived from multiple studies. Issues discussed include weak instrument bias, analysis of binary outcome data such as disease risk, missing genetic data, and the use of haplotypes.
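
    A simpler, non-Bayesian stand-in for the pooling step: fixed-effect inverse-variance combination of study-level Wald ratio estimates (instrument-outcome coefficient divided by instrument-phenotype coefficient). The numbers below are made up, and the delta-method standard error treats the instrument-phenotype coefficient as fixed; the paper's hierarchical model additionally handles heterogeneity, weak instruments, and missing data.

```python
# Fixed-effect inverse-variance pooling of per-study Wald ratio estimates
# (illustrative stand-in for the paper's full Bayesian meta-analysis).
import numpy as np

# per-study gene-phenotype and gene-outcome coefficients (made-up numbers)
bzx = np.array([0.20, 0.25, 0.18])      # instrument -> phenotype
bzy = np.array([0.05, 0.07, 0.04])      # instrument -> outcome
se_bzy = np.array([0.02, 0.03, 0.02])

ratio = bzy / bzx                        # per-study causal estimate
se = se_bzy / np.abs(bzx)                # delta-method SE, bzx treated as fixed
w = 1.0 / se**2                          # inverse-variance weights
pooled = np.sum(w * ratio) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
print(pooled, pooled_se)
```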

  1. Variable selection and estimation for longitudinal survey data

    KAUST Repository

    Wang, Li

    2014-09-01

    There is wide interest in studying longitudinal surveys where sample subjects are observed successively over time. Longitudinal surveys have been used in many areas today, for example, in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure when the correct submodel is known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples illustrate the usefulness of the proposed methodology under various model settings and sampling designs. © 2014 Elsevier Inc.

  2. Optimal speech motor control and token-to-token variability: a Bayesian modeling approach

    OpenAIRE

    Patri, Jean-François; Diard, Julien; Perrier, Pascal

    2015-01-01

    The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the Central Nervous System selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of...

  3. A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration.

    Science.gov (United States)

    Yun, Yong-Huan; Wang, Wei-Ting; Tan, Min-Li; Liang, Yi-Zeng; Li, Hong-Dong; Cao, Dong-Sheng; Lu, Hong-Mei; Xu, Qing-Song

    2014-01-01

    Nowadays, with the high dimensionality of datasets, the creation of effective methods which can select an optimal variable subset is a great challenge. In this study, a strategy that considers the possible interaction effects among variables through random combinations was proposed, called iteratively retaining informative variables (IRIV). Moreover, the variables are classified into four categories: strongly informative, weakly informative, uninformative and interfering variables. On this basis, IRIV retains both the strongly and weakly informative variables in every iterative round until no uninformative and interfering variables exist. Three datasets were employed to investigate the performance of IRIV coupled with partial least squares (PLS). The results show that IRIV is a good alternative variable selection strategy when compared with three outstanding and frequently used variable selection methods: genetic algorithm-PLS, Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) and competitive adaptive reweighted sampling (CARS). The MATLAB source code of IRIV can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.

  4. Developing a spatial-statistical model and map of historical malaria prevalence in Botswana using a staged variable selection procedure

    Directory of Open Access Journals (Sweden)

    Mabaso Musawenkosi LH

    2007-09-01

    Full Text Available Abstract Background Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana. Results Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Those correlated with higher-ranking relatives of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country. Conclusion We have
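
    The bootstrap step-wise ranking at the heart of the staged procedure can be sketched as follows: repeatedly resample the data, run a cheap forward selection, and count how often each variable survives. Everything below (the simulated data, the BIC criterion, 200 resamples) is an illustrative assumption rather than the authors' exact protocol.

```python
# Bootstrap selection-frequency ranking in the spirit of the staged
# procedure (simplified: forward selection by BIC on resampled data).
import numpy as np

rng = np.random.default_rng(4)
n, p = 150, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(size=n)

def bic(cols, Xb, yb):
    Xd = np.column_stack([np.ones(len(yb))] + [Xb[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, yb, rcond=None)
    rss = np.sum((yb - Xd @ beta) ** 2)
    return len(yb) * np.log(rss / len(yb)) + Xd.shape[1] * np.log(len(yb))

def forward_select(Xb, yb):
    chosen, best = [], bic([], Xb, yb)
    improved = True
    while improved:
        improved = False
        for j in set(range(p)) - set(chosen):
            score = bic(chosen + [j], Xb, yb)
            if score < best:
                best, add, improved = score, j, True
        if improved:
            chosen.append(add)
    return chosen

freq = np.zeros(p)
for _ in range(200):
    idx = rng.integers(0, n, n)            # bootstrap resample
    freq[forward_select(X[idx], y[idx])] += 1
print(freq / 200)                          # selection frequencies per variable
```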

  5. Supersaturated plans for variable selection in large databases

    Directory of Open Access Journals (Sweden)

    Christina Parpoula

    2014-05-01

    Full Text Available Over the last decades, the collection and storage of data has become massive with the advance of technology, and variable selection has become a fundamental tool for large-dimensional statistical modelling problems. In this study we implement data mining techniques, metaheuristics and experimental designs in databases in order to determine the most relevant variables for classification in regression problems, in cases where observations and labels of a large database are available. We propose a database-driven scheme for the encryption of specific fields of a database in order to select an optimal supersaturated design consisting of the variables of a large database which have been found to influence the response outcome significantly. The proposed design selection approach is quite promising, since we are able to retrieve an optimal supersaturated plan using a very small percentage of the available runs, a fact that makes the statistical analysis of a large database computationally feasible and affordable.

  6. Variable Selection in the Partially Linear Errors-in-Variables Models for Longitudinal Data

    Institute of Scientific and Technical Information of China (English)

    Yi-ping YANG; Liu-gen XUE; Wei-hu CHENG

    2012-01-01

    This paper proposes a new approach for variable selection in partially linear errors-in-variables (EV) models for longitudinal data by penalizing appropriate estimating functions. We apply the SCAD penalty to simultaneously select significant variables and estimate unknown parameters. The rate of convergence and the asymptotic normality of the resulting estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure. A new algorithm is proposed for solving the penalized estimating equation. The asymptotic results are augmented by a simulation study.

  7. Bayesian model selection framework for identifying growth patterns in filamentous fungi.

    Science.gov (United States)

    Lin, Xiao; Terejanu, Gabriel; Shrestha, Sajan; Banerjee, Sourav; Chanda, Anindya

    2016-06-01

    This paper describes a rigorous methodology for quantification of model errors in fungal growth models. This is essential to choose the model that best describes the data and guide modeling efforts. Mathematical modeling of growth of filamentous fungi is necessary in fungal biology for gaining systems level understanding on hyphal and colony behaviors in different environments. A critical challenge in the development of these mathematical models arises from the indeterminate nature of their colony architecture, which is a result of processing diverse intracellular signals induced in response to a heterogeneous set of physical and nutritional factors. There exists a practical gap in connecting fungal growth models with measurement data. Here, we address this gap by introducing the first unified computational framework based on Bayesian inference that can quantify individual model errors and rank the statistical models based on their descriptive power against data. We show that this Bayesian model comparison is just a natural formalization of Occam's razor. The application of this framework is discussed in comparing three models in the context of synthetic data generated from a known true fungal growth model. This framework of model comparison achieves a trade-off between data fitness and model complexity and the quantified model error not only helps in calibrating and comparing the models, but also in making better predictions and guiding model refinements. PMID:27000772
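
    For reference, the quantity underlying this kind of Bayesian model comparison is the posterior model probability; the marginal likelihood integrates out the parameters and thereby penalizes unnecessary complexity, which is the Occam's-razor effect the abstract refers to. The standard, paper-independent statement is:

```latex
% Posterior probability of model M_i given data D. The marginal likelihood
% p(D | M_i) integrates out the parameters and automatically penalizes
% unnecessary complexity, the Occam's-razor effect referred to above.
P(M_i \mid D) = \frac{p(D \mid M_i)\, P(M_i)}{\sum_j p(D \mid M_j)\, P(M_j)},
\qquad
p(D \mid M_i) = \int p(D \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, \mathrm{d}\theta_i .
```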

  8. Sparse covariance thresholding for high-dimensional variable selection

    OpenAIRE

    Daye, X. Jessie Jeng And Z. John

    2010-01-01

    In high dimensions, many variable selection methods, such as the lasso, are often limited by excessive variability and rank deficiency of the sample covariance matrix. Covariance sparsity is a natural phenomenon in high-dimensional applications, such as microarray analysis, image processing, etc., in which a large number of predictors are independent or weakly correlated. In this paper, we propose the covariance-thresholded lasso, a new class of regression methods that can utilize covariance ...

  9. Noncausal Bayesian Vector Autoregression

    DEFF Research Database (Denmark)

    Lanne, Markku; Luoto, Jani

    We propose a Bayesian inferential procedure for the noncausal vector autoregressive (VAR) model that is capable of capturing nonlinearities and incorporating effects of missing variables. In particular, we devise a fast and reliable posterior simulator that yields the predictive distribution...

  10. Boosting model performance and interpretation by entangling preprocessing selection and variable selection.

    Science.gov (United States)

    Gerretzen, Jan; Szymańska, Ewa; Bart, Jacob; Davies, Antony N; van Manen, Henk-Jan; van den Heuvel, Edwin R; Jansen, Jeroen J; Buydens, Lutgarde M C

    2016-09-28

    The aim of data preprocessing is to remove data artifacts, such as a baseline, scatter effects or noise, and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but which method or combination of methods should be used for the specific data being analyzed is difficult to select. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the true relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection not only improves the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets of which the true relevant variables are available as prior knowledge. We show that a selection of variables is provided that complies more with the true informative variables compared to individual optimization of both model aspects. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of

  11. Boosting model performance and interpretation by entangling preprocessing selection and variable selection.

    Science.gov (United States)

    Gerretzen, Jan; Szymańska, Ewa; Bart, Jacob; Davies, Antony N; van Manen, Henk-Jan; van den Heuvel, Edwin R; Jansen, Jeroen J; Buydens, Lutgarde M C

    2016-09-28

    The aim of data preprocessing is to remove data artifacts, such as a baseline, scatter effects or noise, and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but which method or combination of methods should be used for the specific data being analyzed is difficult to select. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the true relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection not only improves the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets of which the true relevant variables are available as prior knowledge. We show that a selection of variables is provided that complies more with the true informative variables compared to individual optimization of both model aspects. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of

  12. A New Statistic for Variable Selection in Questionnaire Analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG Jun-hua; FANG Wei-wu

    2001-01-01

    In this paper, a new statistic is proposed for variable selection, which is one of the important problems in the analysis of questionnaire data. In contrast to other methods, the approach introduced here can be used not only for two groups of samples but can also be easily generalized to the multi-group case.

  13. Robust Bayesian Fluorescence Lifetime Estimation, Decay Model Selection and Instrument Response Determination for Low-Intensity FLIM Imaging.

    Directory of Open Access Journals (Sweden)

    Mark I Rowley

    Full Text Available We present novel Bayesian methods for the analysis of exponential decay data that exploit the evidence carried by every detected decay event and enable robust extension to advanced processing. Our algorithms are presented in the context of fluorescence lifetime imaging microscopy (FLIM), and particular attention has been paid to modelling the time-domain system (based on time-correlated single photon counting) with unprecedented accuracy. We present estimates of decay parameters for mono- and bi-exponential systems, offering up to a factor of two improvement in accuracy compared to previous popular techniques. Results of the analysis of synthetic and experimental data are presented, and areas where the superior precision of our techniques can be exploited in Förster Resonance Energy Transfer (FRET) experiments are described. Furthermore, we demonstrate two advanced processing methods: decay model selection to choose between differing models such as mono- and bi-exponential, and the simultaneous estimation of instrument and decay parameters.

  14. Action selection performance of a reconfigurable Basal Ganglia inspired model with Hebbian-Bayesian Go-NoGo connectivity

    Directory of Open Access Journals (Sweden)

    Pierre eBerthet

    2012-10-01

    Full Text Available Several studies have shown a strong involvement of the basal ganglia (BG) in action selection and dopamine dependent learning. The dopaminergic signal to striatum, the input stage of the BG, has been commonly described as coding a reward prediction error (RPE), i.e. the difference between the predicted and actual reward. The RPE has been hypothesized to be critical in the modulation of the synaptic plasticity in cortico-striatal synapses in the direct and indirect pathway. We developed an abstract computational model of the BG, with a dual pathway structure functionally corresponding to the direct and indirect pathways, and compared its behaviour to biological data as well as other reinforcement learning models. The computations in our model are inspired by Bayesian inference, and the synaptic plasticity changes depend on a three factor Hebbian-Bayesian learning rule based on co-activation of pre- and post-synaptic units and on the value of the RPE. The model builds on a modified Actor-Critic architecture and implements the direct (Go) and the indirect (NoGo) pathway, as well as the reward prediction (RP) system, acting in a complementary fashion. We investigated the performance of the model system when different configurations of the Go, NoGo and RP system were utilized, e.g. using only the Go, NoGo, or RP system, or combinations of those. Learning performance was investigated in several types of learning paradigms, such as learning-relearning, successive learning, stochastic learning, reversal learning and a two-choice task. The RPE and the activity of the model during learning were similar to monkey electrophysiological and behavioural data. Our results, however, show that there is not a unique best way to configure this BG model to handle well all the learning paradigms tested. We thus suggest that an agent might dynamically configure its action selection mode, possibly depending on task characteristics and also on how much time is available.
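
    A drastically simplified toy version of the dual-pathway learning described above: a scalar critic produces a reward prediction error, positive RPEs strengthen the Go weight of the chosen action and negative RPEs strengthen its NoGo weight. The learning rates, softmax choice rule, and task contingencies below are assumptions, not the paper's Hebbian-Bayesian rule.

```python
# Toy actor-critic with Go/NoGo weights driven by a reward prediction
# error (RPE); a drastic simplification of the paper's learning rule.
import numpy as np

rng = np.random.default_rng(5)
n_actions, lr = 2, 0.1
go = np.zeros(n_actions)
nogo = np.zeros(n_actions)
value = 0.0                                  # critic's reward prediction
p_reward = np.array([0.8, 0.2])              # hidden task contingencies

for trial in range(500):
    logits = go - nogo                       # net evidence per action
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(n_actions, p=probs)
    r = float(rng.random() < p_reward[a])
    rpe = r - value                          # reward prediction error
    value += lr * rpe                        # critic update
    go[a] += lr * max(rpe, 0.0)              # direct pathway: potentiated by +RPE
    nogo[a] += lr * max(-rpe, 0.0)           # indirect pathway: potentiated by -RPE

print(go - nogo)                             # action 0 should dominate
```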

  15. Action selection performance of a reconfigurable basal ganglia inspired model with Hebbian-Bayesian Go-NoGo connectivity.

    Science.gov (United States)

    Berthet, Pierre; Hellgren-Kotaleski, Jeanette; Lansner, Anders

    2012-01-01

    Several studies have shown a strong involvement of the basal ganglia (BG) in action selection and dopamine dependent learning. The dopaminergic signal to striatum, the input stage of the BG, has been commonly described as coding a reward prediction error (RPE), i.e., the difference between the predicted and actual reward. The RPE has been hypothesized to be critical in the modulation of the synaptic plasticity in cortico-striatal synapses in the direct and indirect pathway. We developed an abstract computational model of the BG, with a dual pathway structure functionally corresponding to the direct and indirect pathways, and compared its behavior to biological data as well as other reinforcement learning models. The computations in our model are inspired by Bayesian inference, and the synaptic plasticity changes depend on a three factor Hebbian-Bayesian learning rule based on co-activation of pre- and post-synaptic units and on the value of the RPE. The model builds on a modified Actor-Critic architecture and implements the direct (Go) and the indirect (NoGo) pathway, as well as the reward prediction (RP) system, acting in a complementary fashion. We investigated the performance of the model system when different configurations of the Go, NoGo, and RP system were utilized, e.g., using only the Go, NoGo, or RP system, or combinations of those. Learning performance was investigated in several types of learning paradigms, such as learning-relearning, successive learning, stochastic learning, reversal learning and a two-choice task. The RPE and the activity of the model during learning were similar to monkey electrophysiological and behavioral data. Our results, however, show that there is not a unique best way to configure this BG model to handle well all the learning paradigms tested. We thus suggest that an agent might dynamically configure its action selection mode, possibly depending on task characteristics and also on how much time is available. PMID:23060764

  16. DETERMINANTS METHOD OF EXPLANATORY VARIABLES SET SELECTION TO LINEAR MODEL

    Directory of Open Access Journals (Sweden)

    Witold Rzymowski

    2014-09-01

    Full Text Available The determinants method of explanatory variables set selection for the linear model is shown in this article. This method is very useful for finding a set of variables which yields a small relative error of the linear model as well as small relative errors in the estimation of its parameters. Knowledge of the values of the parameters of this model is not necessary. An example of the use of the determinants method for a world population model is also shown in this article. The method was tested on 2^24 - 1 models for a set of 23 potential explanatory variables. Five world population models, with one, two, three, four and five explanatory variables, were chosen and analysed.

  17. CHARACTERIZING THE OPTICAL VARIABILITY OF BRIGHT BLAZARS: VARIABILITY-BASED SELECTION OF FERMI ACTIVE GALACTIC NUCLEI

    International Nuclear Information System (INIS)

    We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ∼30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability τ, and driving amplitudes on short timescales σ̂. Imposing cuts on minimum τ and σ̂ allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of γ-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E ≥ 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other γ-ray blazars and is likely to be the γ-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is ∼3 years in the rest frame of the jet, in contrast with the ∼320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.

  18. Portfolio Selection Based on Distance between Fuzzy Variables

    Directory of Open Access Journals (Sweden)

    Weiyi Qian

    2014-01-01

    Full Text Available This paper studies the portfolio selection problem in a fuzzy environment. We introduce a new simple method in which the distance between fuzzy variables is used to measure the divergence of a fuzzy investment return from a prior one. Firstly, two new mathematical models are proposed by expressing divergence as distance, investment return as expected value, and risk as variance and semivariance, respectively. Secondly, the crisp forms of the new models are also provided for different types of fuzzy variables. Finally, several numerical examples are given to illustrate the effectiveness of the proposed approach.
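
    The paper's exact distance is not reproduced here; the sketch below only illustrates the ingredients for triangular fuzzy returns, using the credibilistic expected value E = (a + 2b + c)/4 and an assumed L1-type distance between the defining triples.

```python
import numpy as np

def expected_value(tri):
    """Credibilistic expected value of a triangular fuzzy number (a, b, c):
    E = (a + 2b + c) / 4 (Liu's credibility theory)."""
    a, b, c = tri
    return (a + 2 * b + c) / 4.0

def distance(tri1, tri2):
    # Illustrative L1-type distance between two triangular fuzzy numbers;
    # the paper's exact metric may differ.
    w = np.array([1.0, 2.0, 1.0]) / 4.0
    return float(np.sum(w * np.abs(np.array(tri1) - np.array(tri2))))

# Fuzzy returns of two assets compared against a prior (target) return.
prior = (0.02, 0.05, 0.08)
assets = {"A": (0.01, 0.04, 0.09), "B": (0.03, 0.05, 0.07)}
for name, tri in assets.items():
    print(name, expected_value(tri), distance(tri, prior))
```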

  19. Effect of Instructions on Selected Jump Squat Variables.

    Science.gov (United States)

    Talpey, Scott W; Young, Warren B; Beseler, Bradley

    2016-09-01

    Talpey, SW, Young, WB, and Beseler, B. Effect of instructions on selected jump squat variables. J Strength Cond Res 30(9): 2508-2513, 2016. The purpose of this study was to compare 2 instructions on the performance of selected variables in a jump squat (JS) exercise. The second purpose was to determine the relationships between JS variables and sprint performance. Eighteen male subjects with resistance training experience performed 2 sets of 4 JS with no extra load with the instructions to concentrate on (a) jumping for maximum height and (b) extending the legs as fast as possible to maximize explosive force. Sprint performance was assessed at 0- to 10-m and 10- to 20-m distances. From the JS, jump height, peak power, relative peak power, peak force, peak velocity, and countermovement distance were measured from a force platform and position transducer system. The JS variables under the 2 instructions were compared with paired t-tests, and the relationships between these variables and sprint performance were determined with Pearson's correlations. The jump height instruction produced greater mean jump height and peak velocity (p < 0.05). Jump height was the variable that correlated most strongly with 10-m time and 10- to 20-m time under both instructions. The height instruction produced a stronger correlation with 10-m time (r = -0.455), but the fast leg extension JS produced a greater correlation with 10- to 20-m time (r = -0.545). The results indicate that instructions have a meaningful influence on JS variables and therefore need to be taken into consideration when assessing or training athletes.
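
    The statistical comparisons named above are standard; a toy version with simulated numbers (the study's raw data are not available here) looks like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 18  # subjects, as in the study
jump_height_cue = rng.normal(0.40, 0.05, n)                   # "jump for height"
jump_force_cue = jump_height_cue - rng.normal(0.02, 0.01, n)  # "fast extension"
sprint_10m = 1.9 - 0.5 * jump_height_cue + rng.normal(0, 0.03, n)

# Paired comparison of the two instruction conditions.
t, p = stats.ttest_rel(jump_height_cue, jump_force_cue)
print(f"paired t = {t:.2f}, p = {p:.4f}")

# Pearson correlation between jump height and 10-m sprint time.
r, p = stats.pearsonr(jump_height_cue, sprint_10m)
print(f"r = {r:.3f}, p = {p:.4f}")
```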

  20. Variable selection in covariate dependent random partition models: an application to urinary tract infection.

    Science.gov (United States)

    Barcella, William; Iorio, Maria De; Baio, Gianluca; Malone-Lee, James

    2016-04-15

    Lower urinary tract symptoms can indicate the presence of urinary tract infection (UTI), a condition that, if it becomes chronic, requires expensive and time-consuming care and leads to reduced quality of life. Detecting the presence and severity of an infection from the earliest symptoms is therefore highly valuable. Typically, white blood cell (WBC) count measured in a sample of urine is used to assess UTI. We consider clinical data from 1341 patients in their first visit in which UTI (i.e. WBC ≥ 1) is diagnosed. In addition, for each patient, a clinical profile of 34 symptoms was recorded. In this paper, we propose a Bayesian nonparametric regression model based on the Dirichlet process prior aimed at providing the clinicians with a meaningful clustering of the patients based on both the WBC (response variable) and possible patterns within the symptoms profiles (covariates). This is achieved by assuming a probability model for the symptoms as well as for the response variable. To identify the symptoms most associated with UTI, we specify a spike and slab base measure for the regression coefficients: this induces dependence of symptoms selection on cluster assignment. Posterior inference is performed through Markov Chain Monte Carlo methods. PMID:26536840
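
    As a drastically simplified stand-in for the spike-and-slab mechanism mentioned above (ignoring the Dirichlet-process clustering entirely), the sketch below computes posterior inclusion probabilities in a plain linear model by enumerating all models under a point-mass spike and a Zellner g-prior slab; all data are simulated.

```python
import numpy as np
from itertools import product

def log_marginal(X, y, gamma, g=None):
    """Log Bayes factor of the model indexed by gamma against the
    intercept-only model under a Zellner g-prior (Liang et al. 2008):
    BF = (1+g)^((n-1-k)/2) * (1 + g*(1-R^2))^(-(n-1)/2)."""
    n = len(y)
    g = n if g is None else g  # unit-information prior
    k = int(gamma.sum())
    if k == 0:
        return 0.0
    Xg = X[:, gamma.astype(bool)]
    Xg = Xg - Xg.mean(0)
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
    r2 = 1 - np.sum((yc - Xg @ beta) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

# Posterior inclusion probabilities by enumerating all 2^p models
# (feasible here; MCMC over the indicators replaces this at scale).
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=100)
models = [np.array(gam) for gam in product([0, 1], repeat=5)]
logm = np.array([log_marginal(X, y, gam) for gam in models])
w = np.exp(logm - logm.max()); w /= w.sum()
print("inclusion probs:", np.round(sum(wi * gi for wi, gi in zip(w, models)), 3))
```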

  1. ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER'S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS

    Directory of Open Access Journals (Sweden)

    Henry de-Graft Acquah

    2013-03-01

    Full Text Available Alternative formulations of the Bayesian Information Criteria provide a basis for choosing between competing methods for detecting price asymmetry. However, very little is understood about their performance in the asymmetric price transmission modelling framework. In addressing this issue, this paper introduces and applies parametric bootstrap techniques to evaluate the ability of the Bayesian Information Criterion (BIC) and Draper's Information Criterion (DIC) to discriminate between alternative asymmetric price transmission models under various error and sample size conditions. The results of the bootstrap simulations indicate that model selection performance depends on the bootstrap sample size and the amount of noise in the data generating process. The Bayesian criterion clearly identifies the true asymmetric model from among the competing models across the bootstrap samples. Draper's Information Criterion (DIC; Draper, 1995) outperforms BIC at either larger bootstrap sample sizes or lower noise levels.
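
    A hedged illustration of the bootstrap experiment's logic, with an invented asymmetric data-generating process and plain BIC (the paper's asymmetric price transmission models are not reproduced):

```python
import numpy as np

def bic(rss, n, k):
    # BIC for a Gaussian linear model, up to an additive constant.
    return n * np.log(rss / n) + k * np.log(n)

def fit_rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

def bootstrap_selection(n=200, noise=1.0, reps=500, seed=4):
    """Parametric bootstrap: simulate from a known asymmetric DGP and count
    how often BIC picks the true (asymmetric) model over a symmetric rival."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        pos, neg = np.maximum(x, 0), np.minimum(x, 0)
        y = 1.0 + 2.0 * pos + 0.5 * neg + noise * rng.normal(size=n)
        ones = np.ones(n)
        asym = np.column_stack([ones, pos, neg])  # true model: separate slopes
        sym = np.column_stack([ones, x])          # rival: one common slope
        if bic(fit_rss(asym, y), n, 3) < bic(fit_rss(sym, y), n, 2):
            hits += 1
    return hits / reps

for n, noise in [(50, 1.0), (200, 1.0), (200, 3.0)]:
    print(n, noise, bootstrap_selection(n=n, noise=noise))
```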

  2. A Bayesian Optimisation Algorithm for the Nurse Scheduling Problem

    CERN Document Server

    Jingpeng, Li

    2008-01-01

    A Bayesian optimization algorithm for the nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse's assignment. Unlike our previous work that used GAs to implement implicit learning, the learning in the proposed algorithm is explicit, i.e., eventually we will be able to identify and mix building blocks directly. The Bayesian optimization algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated, i.e., in our case, a new rule string is obtained. Another set of rule strings will be generated in this way, some of which will replace previous strings based on fitness selection. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again usin...
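
    The simplest distribution-estimation loop in this spirit treats each nurse's rule as an independent categorical variable and refits the marginals to promising solutions each generation; a full BOA would also learn a dependency network between the variables. Everything below (objective, sizes) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n_nurses, n_rules = 20, 4                      # one scheduling rule per nurse
target = rng.integers(n_rules, size=n_nurses)  # stand-in for a good roster

def fitness(rule_string):
    # Placeholder objective: closeness to a known-good rule string. A real
    # version would score the roster the rules generate against coverage
    # and preference constraints.
    return -np.count_nonzero(rule_string != target)

# Simplest "Bayesian network": independent categorical marginals per nurse.
probs = np.full((n_nurses, n_rules), 1.0 / n_rules)
for generation in range(60):
    pop = np.stack([[rng.choice(n_rules, p=probs[i]) for i in range(n_nurses)]
                    for _ in range(50)])
    scores = np.array([fitness(ind) for ind in pop])
    elite = pop[np.argsort(scores)[-12:]]      # promising solutions
    for i in range(n_nurses):                  # refit marginals to the elite
        counts = np.bincount(elite[:, i], minlength=n_rules) + 0.5
        probs[i] = counts / counts.sum()

best = pop[np.argmax(scores)]
print("best fitness:", fitness(best))          # 0 means the target was found
```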

  3. Variable Selection for Marginal Longitudinal Generalized Linear Models

    OpenAIRE

    Eva Cantoni; Joanna Mills Flemming; Elvezio Ronchetti

    2003-01-01

    Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this paper we propose a generalized version of Mallows's Cp (GCp) suitable for use with both parametric and nonparametric models. GCp provides an estimate of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in ...
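
    For ordinary least squares, the classical Mallows's Cp (which GCp generalizes to GEE-fitted marginal models) can be computed per candidate subset as sketched below; the data are simulated.

```python
import numpy as np
from itertools import combinations

def mallows_cp(X, y):
    """Mallows's Cp for every candidate subset: Cp = RSS_k / s2_full - n + 2k,
    with s2_full the residual variance of the full model."""
    n, p = X.shape
    Xf = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xf, y, rcond=None)
    s2_full = np.sum((y - Xf @ beta) ** 2) / (n - p - 1)
    out = []
    for k in range(1, p + 1):
        for idx in combinations(range(p), k):
            Xk = np.column_stack([np.ones(n), X[:, idx]])
            b, *_ = np.linalg.lstsq(Xk, y, rcond=None)
            rss = np.sum((y - Xk @ b) ** 2)
            out.append((rss / s2_full - n + 2 * (k + 1), idx))
    return sorted(out)[:3]  # the three lowest-Cp subsets

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 4))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=80)
print(mallows_cp(X, y))
```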

  4. Multi-scale inference of interaction rules in animal groups using Bayesian model selection.

    Directory of Open Access Journals (Sweden)

    Richard P Mann

    2012-01-01

    Full Text Available Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture fine scale rules of interaction, which are primarily mediated by physical contact. Conversely, the Markovian self-propelled particle model captures the fine scale rules of interaction but fails to reproduce global dynamics. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.

  5. Multi-scale inference of interaction rules in animal groups using Bayesian model selection.

    Directory of Open Access Journals (Sweden)

    Richard P Mann

    Full Text Available Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture the observed locality of interactions. Traditional self-propelled particle models fail to capture the fine scale dynamics of the system. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics, while maintaining a biologically plausible perceptual range. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.

  6. Bayesian Probability Theory

    Science.gov (United States)

    von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo

    2014-06-01

    Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatorics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.

  7. Bayesian Variable Selection to identify QTL affecting a simulated quantitative trait

    NARCIS (Netherlands)

    Schurink, A.; Janss, L.L.G.; Heuven, H.C.M.

    2012-01-01

    Background Recent developments in genetic technology and methodology enable accurate detection of QTL and estimation of breeding values, even in individuals without phenotypes. The QTL-MAS workshop offers the opportunity to test different methods to perform a genome-wide association study on simulat

  8. Bayesian Variable Selection to identify QTL affecting a simulated quantitative trait

    NARCIS (Netherlands)

    Schurink, A.; Janss, L.L.G.; Heuven, H.C.M.

    2012-01-01

    Abstract Background: Recent developments in genetic technology and methodology enable accurate detection of QTL and estimation of breeding values, even in individuals without phenotypes. The QTL-MAS workshop offers the opportunity to test different methods to perform a genome-wide association study

  9. How to avoid mismodelling in GLM-based fMRI data analysis: cross-validated Bayesian model selection.

    Science.gov (United States)

    Soch, Joram; Haynes, John-Dylan; Allefeld, Carsten

    2016-11-01

    Voxel-wise general linear models (GLMs) are a standard approach for analyzing functional magnetic resonance imaging (fMRI) data. An advantage of GLMs is that they are flexible and can be adapted to the requirements of many different data sets. However, the specification of first-level GLMs leaves the researcher with many degrees of freedom, which is problematic given recent efforts to ensure robust and reproducible fMRI data analysis. Formal model comparisons that allow a systematic assessment of GLMs are only rarely performed. On the one hand, too simple models may underfit data and leave real effects undiscovered. On the other hand, too complex models might overfit data and also reduce statistical power. Here we present a systematic approach termed cross-validated Bayesian model selection (cvBMS) that allows one to decide which GLM best describes a given fMRI data set. Importantly, our approach allows for non-nested model comparison, i.e. comparing more than two models that do not just differ by adding one or more regressors. It also allows for spatially heterogeneous modelling, i.e. using different models for different parts of the brain. We validate our method using simulated data and demonstrate potential applications to empirical data. The increased use of model comparison and model selection should increase the reliability of GLM results and reproducibility of fMRI studies. PMID:27477536

  10. Embryologic changes in rabbit lines selected for litter size variability.

    Science.gov (United States)

    García, M L; Blasco, A; Argente, M J

    2016-09-15

    A divergent selection experiment on litter size variability was carried out. Correlated response in early embryo survival, embryonic development, size of embryos, and size of embryonic coats after four generations of selection was estimated. A total of 429 embryos from 51 high-line females and 648 embryos from 80 low-line females were used in the experiment. The traits studied were percentage of normal embryos, embryo diameter, zona pellucida thickness, and mucin coat thickness. Traits were measured at 24, 48, and 72 hours postcoitum (hpc); mucin coat thickness was only measured at 48 and 72 hpc. The embryos were classified as zygotes or two-cell embryos at 24 hpc; 16-cell embryos or early morulae at 48 hpc; and early morulae, compacted morulae, or blastocyst at 72 hpc. At 24 hpc, the percentage of normal embryos in the high line was lower than in the low line (-2.5%), and embryos in the high line showed 10% higher zona pellucida thickness than those of the low line. No differences in percentage of zygotes or two-cell embryos were found. At 48 hpc, the high-line embryos were less developed, with a higher percentage of 16-cell embryos (23.4%) and a lower percentage of early morulae (-23.4%). At 72 hpc, high-line embryos continued to be less developed, showing higher percentages of early morulae and compact morulae and lower percentages of blastocyst (-1.8%). No differences in embryo diameter or mucin coat thickness were found at any time. In conclusion, selection for litter size variability has consequences on early embryonic survival and development, with embryos presenting a lower state of development and a lower percentage of normal embryos in the line selected for higher variability. PMID:27207473

  11. A novel Bayesian approach to quantify clinical variables and to determine their spectroscopic counterparts in 1H NMR metabonomic data

    Directory of Open Access Journals (Sweden)

    Kaski Kimmo

    2007-05-01

    Full Text Available Abstract Background A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum. Results A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG), 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra. Conclusion The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics.

  12. Eigenvector Subset Selection Using Bayesian Optimization Algorithm

    Institute of Scientific and Technical Information of China (English)

    郭卫锋; 林亚平; 罗光平

    2002-01-01

    Eigenvector subset selection is the key to face recognition. In this paper, we propose ESS-BOA, a new randomized, population-based evolutionary algorithm which deals with the Eigenvector Subset Selection (ESS) problem in face recognition applications. In ESS-BOA, the ESS problem, stated as a search problem, uses the Bayesian Optimization Algorithm (BOA) as the search engine and the distance degree as the objective function to select eigenvectors. Experimental results show that ESS-BOA outperforms the traditional eigenface selection algorithm.

  13. Isoenzymatic variability in tropical maize populations under reciprocal recurrent selection

    Directory of Open Access Journals (Sweden)

    Pinto Luciana Rossini

    2003-01-01

    Full Text Available Maize (Zea mays L.) is one of the crops in which the genetic variability has been extensively studied at isoenzymatic loci. The genetic variability of the maize populations BR-105 and BR-106, and the synthetics IG-3 and IG-4, obtained after one cycle of a high-intensity reciprocal recurrent selection (RRS), was investigated at seven isoenzymatic loci. A total of twenty alleles were identified, and most of the private alleles were found in the BR-106 population. One cycle of reciprocal recurrent selection (RRS) caused reductions of 12% in the number of alleles in both populations. Changes in allele frequencies were also observed between populations and synthetics, mainly for the Est 2 locus. Populations presented similar values for the number of alleles per locus, percentage of polymorphic loci, and observed and expected heterozygosities. A decrease of the genetic variation values was observed for the synthetics as a consequence of genetic drift effects and reduction of the effective population sizes. The distribution of the genetic diversity within and between populations revealed that most of the diversity was maintained within them, i.e. BR-105 x BR-106 (G_ST = 3.5%) and IG-3 x IG-4 (G_ST = 4.0%). The genetic distances between populations and synthetics increased approximately 21%. An increase in the genetic divergence between the populations occurred without limiting new selection procedures.

  14. Robust nonlinear variable selective control for networked systems

    Science.gov (United States)

    Rahmani, Behrooz

    2016-10-01

    This paper is concerned with the networked control of a class of uncertain nonlinear systems. To this end, Takagi-Sugeno (T-S) fuzzy modelling is used to extend the previously proposed variable selective control (VSC) methodology to nonlinear systems. This extension is based upon the decomposition of the nonlinear system into a set of fuzzy-blended locally linearised subsystems and further application of the VSC methodology to each subsystem. To increase the applicability of the T-S approach for uncertain nonlinear networked control systems, this study considers the asynchronous premise variables in the plant and the controller, and then introduces a robust stability analysis and control synthesis. The resulting optimal switching-fuzzy controller provides a minimum guaranteed cost on an H2 performance index. Simulation studies on three nonlinear benchmark problems demonstrate the effectiveness of the proposed method.

  15. Estimation and variable selection for generalized additive partial linear models

    KAUST Repository

    Wang, Li

    2011-08-01

    We study generalized additive partial linear models, proposing the use of polynomial spline smoothing for estimation of nonparametric functions, and deriving quasi-likelihood based estimators for the linear parameters. We establish asymptotic normality for the estimators of the parametric components. The procedure avoids solving large systems of equations as in kernel-based procedures and thus results in gains in computational simplicity. We further develop a class of variable selection procedures for the linear parameters by employing a nonconcave penalized quasi-likelihood, which is shown to have an asymptotic oracle property. Monte Carlo simulations and an empirical example are presented for illustration. © Institute of Mathematical Statistics, 2011.

  16. Secondary eclipses in the CoRoT light curves: A homogeneous search based on Bayesian model selection

    CERN Document Server

    Parviainen, Hannu; Belmonte, Juan Antonio

    2012-01-01

    We aim to identify and characterize secondary eclipses in the original light curves of all published CoRoT planets using uniform detection and evaluation criteria. Our analysis is based on a Bayesian model selection between two competing models: one with and one without an eclipse signal. The search is carried out by mapping the Bayes factor in favor of the eclipse model as a function of the eclipse center time, after which the characterization of plausible eclipse candidates is done by estimating the posterior distributions of the eclipse model parameters using Markov Chain Monte Carlo. We discover statistically significant eclipse events for two planets, CoRoT-6b and CoRoT-11b, and for one brown dwarf, CoRoT-15b. We also find marginally significant eclipse events passing our plausibility criteria for CoRoT-3b, 13b, 18b, and 21b. The previously published CoRoT-1b and CoRoT-2b eclipses are also confirmed.

  17. Bayesian model selection for pathological neuroimaging data applied to white matter lesion segmentation.

    Science.gov (United States)

    Sudre, Carole H; Cardoso, M Jorge; Bouvy, Willem H; Biessels, Geert Jan; Barnes, Josephine; Ourselin, Sebastien

    2015-10-01

    In neuroimaging studies, pathologies can present themselves as abnormal intensity patterns. Thus, solutions for detecting abnormal intensities are currently under investigation. As each patient is unique, an unbiased and biologically plausible model of pathological data would have to be able to adapt to the subject's individual presentation. Such a model would provide the means for a better understanding of the underlying biological processes and improve one's ability to define pathologically meaningful imaging biomarkers. With this aim in mind, this work proposes a hierarchical fully unsupervised model selection framework for neuroimaging data which enables the distinction between different types of abnormal image patterns without pathological a priori knowledge. Its application on simulated and clinical data demonstrated the ability to detect abnormal intensity clusters, resulting in competitive to improved performance in white matter lesion segmentation when compared with three other freely available automated methods. PMID:25850086

  18. A Bayesian functional data model for predicting forest variables using high-dimensional waveform LiDAR over large geographic domains

    Science.gov (United States)

    Finley, A. O.; Banerjee, S.; Cook, B. D.

    2010-12-01

    Recent advances in remote sensing, specifically waveform Light Detection and Ranging (LiDAR) sensors, provide the data needed to quantify forest variables at a fine spatial resolution over large domains. Of particular interest is LiDAR data from NASA's Laser Vegetation Imaging Sensor (LVIS), the upcoming Deformation, Ecosystem Structure, and Dynamics of Ice (DESDynI) missions, and NSF's National Ecological Observatory Network planned Airborne Observation Platform. A central challenge to using these data is to couple field measurements of forest variables (e.g., species, indices of structural complexity, light competition, or drought stress) with the high-dimensional LiDAR signal through a model, which allows prediction of the tree-level variables at locations where only the remotely sensed data are available. It is common to model the high-dimensional signal vector as a mixture of a relatively small number of Gaussian distributions. The parameters from these Gaussian distributions, or indices derived from the parameters, can then be used as regressors in a regression model. These approaches retain only a small amount of information contained in the signal. Further, it is not known a priori which features of the signal explain the most variability in the response variables. It is possible to fully exploit the information in the signal by treating it as an object, thus, we define a framework to couple a spatial latent factor model with forest variables using a fully Bayesian functional spatial data analysis. Our proposed modeling framework explicitly: 1) reduces the dimensionality of signals in an optimal way (i.e., preserves the information that describes the maximum variability in response variable); 2) propagates uncertainty in data and parameters through to prediction, and; 3) acknowledges and leverages spatial dependence among the regressors and model residuals to meet statistical assumptions and improve prediction. The proposed modeling framework is

  19. Evaluating experimental design for soil-plant model selection using a Bootstrap Filter and Bayesian model averaging

    Science.gov (United States)

    Wöhling, T.; Schöniger, A.; Geiges, A.; Nowak, W.; Gayler, S.

    2013-12-01

    The objective selection of appropriate models for realistic simulations of coupled soil-plant processes is a challenging task since the processes are complex, not fully understood at larger scales, and highly non-linear. Also, comprehensive data sets are scarce, and measurements are uncertain. In the past decades, a variety of different models have been developed that exhibit a wide range of complexity regarding their approximation of processes in the coupled model compartments. We present a method for evaluating experimental design for maximum confidence in the model selection task. The method considers uncertainty in parameters, measurements and model structures. Advancing the ideas behind Bayesian Model Averaging (BMA), we analyze the changes in posterior model weights and posterior model choice uncertainty when more data are made available. This allows assessing the power of different data types, data densities and data locations in identifying the best model structure from among a suite of plausible models. The models considered in this study are the crop models CERES, SUCROS, GECROS and SPASS, which are coupled to identical routines for simulating soil processes within the modelling framework Expert-N. The four models considerably differ in the degree of detail at which crop growth and root water uptake are represented. Monte-Carlo simulations were conducted for each of these models considering their uncertainty in soil hydraulic properties and selected crop model parameters. Using a Bootstrap Filter (BF), the models were then conditioned on field measurements of soil moisture, matric potential, leaf-area index, and evapotranspiration rates (from eddy-covariance measurements) during a vegetation period of winter wheat at a field site at the Swabian Alb in Southwestern Germany. Following our new method, we derived model weights when using all data or different subsets thereof. We discuss to which degree the posterior mean outperforms the prior mean and all

  20. Empirical Likelihood Based Variable Selection for Varying Coefficient Partially Linear Models with Censored Data

    Institute of Scientific and Technical Information of China (English)

    Peixin ZHAO

    2013-01-01

    In this paper, we consider the variable selection for the parametric components of varying coefficient partially linear models with censored data. By constructing a penalized auxiliary vector ingeniously, we propose an empirical likelihood based variable selection procedure, and show that it is consistent and satisfies the sparsity. The simulation studies show that the proposed variable selection method is workable.

  1. Birth order and selected work-related personality variables.

    Science.gov (United States)

    Phillips, A S; Bedeian, A G; Mossholder, K W; Touliatos, J

    1988-12-01

    A possible link between birth order and various individual characteristics (e.g., intelligence, potential eminence, need for achievement, sociability) has been suggested by personality theorists such as Adler for over a century. The present study examines whether birth order is associated with selected personality variables that may be related to various work outcomes. 3 of 7 hypotheses were supported and the effect sizes for these were small. Firstborns scored significantly higher than later borns on measures of dominance, good impression, and achievement via conformity. No differences between firstborns and later borns were found in managerial potential, work orientation, achievement via independence, and sociability. The study's sample consisted of 835 public, government, and industrial accountants responding to a national US survey of accounting professionals. The nature of the sample may have been partially responsible for the results obtained. Its homogeneity may have caused any birth order effects to wash out. It can be argued that successful membership in the accountancy profession requires internalization of a set of prescribed rules and standards. It may be that accountants as a group are locked into a behavioral framework. Any differentiation would result from spurious interpersonal differences, not from predictable birth-order related characteristics. A final interpretation is that birth order effects are nonexistent or statistical artifacts. Given the present data and particularistic sample, however, the authors have insufficient information from which to draw such a conclusion. PMID:12281942

  3. Variable Selection for Generalized Varying Coefficient Partially Linear Models with Diverging Number of Parameters

    Institute of Scientific and Technical Information of China (English)

    Zheng-yan Lin; Yu-ze Yuan

    2012-01-01

    Semiparametric models with diverging number of predictors arise in many contemporary scientific areas. Variable selection for these models consists of two components: model selection for non-parametric components and selection of significant variables for the parametric portion. In this paper, we consider a variable selection procedure by combining basis function approximation with SCAD penalty. The proposed procedure simultaneously selects significant variables in the parametric components and the nonparametric components. With appropriate selection of tuning parameters, we establish the consistency and sparseness of this procedure.

  4. Hybrid Batch Bayesian Optimization

    CERN Document Server

    Azimi, Javad; Fern, Xiaoli

    2012-01-01

    Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose to either sequentially evaluate the function, one input at a time and wait for the output of the function before making the next selection, or evaluate the function at a batch of multiple inputs at once. These two different settings are commonly referred to as the sequential and batch settings of Bayesian Optimization. In general, the sequential setting leads to better optimization performance as each function evaluation is selected with more information, whereas the batch setting has an advantage in terms of the total experimental time (the number of iterations). In this work, our goal is to combine the strength of both settings. Specifically, we systematically analyze Bayesian optimization using Gaussian process as the posterior estimator and provide a hybrid algorithm t...

  5. Bayesian biostatistics

    CERN Document Server

    Lesaffre, Emmanuel

    2012-01-01

    The growth of biostatistics has been phenomenal in recent years and has been marked by considerable technical innovation in both methodology and computational practicality. One area that has experienced significant growth is Bayesian methods. The growing use of Bayesian methodology has taken place partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. In addition, computational advances have allowed for more complex models to be fitted routinely to realistic data sets. Through examples, exercises and a combination of introd

  6. Bayesian networks as a tool for epidemiological systems analysis

    Science.gov (United States)

    Lewis, F. I.

    2012-11-01

    Bayesian network analysis is a form of probabilistic modeling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. Bayesian networks are increasingly finding application in areas such as computational and systems biology, and more recently in epidemiological analyses. The key distinction between standard empirical modeling approaches, such as generalised linear modeling, and Bayesian network analyses is that the latter attempts not only to identify statistically associated variables, but to additionally, and empirically, separate these into those directly and indirectly dependent with one or more outcome variables. Such discrimination is vastly more ambitious but has the potential to reveal far more about key features of complex disease systems. Applying Bayesian network modeling to biological and medical data has considerable computational demands, combined with the need to ensure robust model selection given the vast model space of possible DAGs. These challenges require the use of approximation techniques, such as the Laplace approximation, Markov chain Monte Carlo simulation and parametric bootstrapping, along with computational parallelization. A case study in structure discovery - identification of an optimal DAG for given data - is presented which uses additive Bayesian networks to explore veterinary disease data of industrial and medical relevance.

  7. Variable Selection for Varying-Coefficient Models with Missing Response at Random

    Institute of Scientific and Technical Information of China (English)

    Pei Xin ZHAO; Liu Gen XUE

    2011-01-01

    In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for varying-coefficient models with missing response at random. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the optimal convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.

  8. Bayesian statistics

    OpenAIRE

    Draper, D.

    2001-01-01

    © 2012 Springer Science+Business Media, LLC. All rights reserved. Article Outline: Glossary Definition of the Subject and Introduction The Bayesian Statistical Paradigm Three Examples Comparison with the Frequentist Statistical Paradigm Future Directions Bibliography

  9. Estimation of Genetic Variance Components Including Mutation and Epistasis using Bayesian Approach in a Selection Experiment on Body Weight in Mice

    DEFF Research Database (Denmark)

    Widyas, Nuzul; Jensen, Just; Nielsen, Vivi Hunnicke

    selected downwards and three lines were kept as controls. Bayesian statistical methods are used to estimate the genetic variance components. The mixed model analysis is modified to include a mutation effect following the method of Wray (1990). DIC was used to compare the models. Models including a mutation effect...... have a better fit than the model with only an additive effect. Mutation as a direct effect contributes 3.18% of the total phenotypic variance, while in the model with interactions between additive and mutation effects it contributes 1.43% as a direct effect and 1.36% as an interaction effect of the total variance...

  10. Bayesian Analysis Made Simple An Excel GUI for WinBUGS

    CERN Document Server

    Woodward, Philip

    2011-01-01

    From simple NLMs to complex GLMMs, this book describes how to use the GUI for WinBUGS - BugsXLA - an Excel add-in written by the author that allows a range of Bayesian models to be easily specified. With case studies throughout, the text shows how to routinely apply even the more complex aspects of model specification, such as GLMMs, outlier robust models, random effects Emax models, auto-regressive errors, and Bayesian variable selection. It provides brief, up-to-date discussions of current issues in the practical application of Bayesian methods. The author also explains how to obtain free so

  11. Bayesian modeling using WinBUGS

    CERN Document Server

    Ntzoufras, Ioannis

    2009-01-01

    A hands-on introduction to the principles of Bayesian modeling using WinBUGS Bayesian Modeling Using WinBUGS provides an easily accessible introduction to the use of WinBUGS programming techniques in a variety of Bayesian modeling settings. The author provides an accessible treatment of the topic, offering readers a smooth introduction to the principles of Bayesian modeling with detailed guidance on the practical implementation of key principles. The book begins with a basic introduction to Bayesian inference and the WinBUGS software and goes on to cover key topics, including: Markov Chain Monte Carlo algorithms in Bayesian inference Generalized linear models Bayesian hierarchical models Predictive distribution and model checking Bayesian model and variable evaluation Computational notes and screen captures illustrate the use of both WinBUGS as well as R software to apply the discussed techniques. Exercises at the end of each chapter allow readers to test their understanding of the presented concepts and all ...

  12. Variable Selection for Semiparametric Varying-Coefficient Partially Linear Models with Missing Response at Random

    Institute of Scientific and Technical Information of China (English)

    Pei Xin ZHAO; Liu Gen XUE

    2011-01-01

    In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for semiparametric varying-coefficient partially linear models with missing response at random. The proposed procedure simultaneously selects significant variables in parametric components and nonparametric components. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.

  13. Efficient Latent Variable Graphical Model Selection via Split Bregman Method

    CERN Document Server

    Ye, Gui-Bo; Chen, Yifei; Xie, Xiaohui

    2011-01-01

    We consider the problem of covariance matrix estimation in the presence of latent variables. Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed variables via a tractable convex program, where the concentration matrix of the observed variables is decomposed into a sparse matrix (representing the graphical structure of the observed variables) and a low rank matrix (representing the marginalization effect of latent variables). We present an efficient first-order method based on split Bregman to solve the convex problem. The algorithm is guaranteed to converge under mild conditions. We show that our algorithm is significantly faster than the state-of-the-art algorithm on both artificial and real-world data. Applying the algorithm to a gene expression data involving thousands of genes, we show that most of the correlation between observed variables can be explained by only a few dozen latent factors.
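
    As a sketch of the two proximal building blocks involved (not the paper's full split Bregman solver, which alternates these steps with Bregman/dual updates until convergence), the following shows elementwise soft-thresholding for the sparse part and singular-value shrinkage for the low-rank part:

```python
import numpy as np

def soft_threshold(A, lam):
    # Proximal operator of the elementwise l1 norm (sparse component).
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def svd_shrink(A, lam):
    # Proximal operator of the nuclear norm (low-rank component).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

# One splitting-style step: decompose M into S (sparse, the conditional
# graph of observed variables) plus L (low rank, the marginalization
# effect of latent factors). Thresholds here are arbitrary.
rng = np.random.default_rng(7)
M = rng.normal(size=(30, 30)); M = (M + M.T) / 2
S = soft_threshold(M, 0.5)
L = svd_shrink(M - S, 1.0)
print(np.mean(S != 0), np.linalg.matrix_rank(L))
```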

  14. Bayesian Analysis for Risk Assessment of Selected Medical Events in Support of the Integrated Medical Model Effort

    Science.gov (United States)

    Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.

    2012-01-01

    The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.
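
    A minimal example of the kind of conjugate update described: a Beta prior built from analog-population counts is combined with observed event data via the beta-binomial model. All counts below are hypothetical.

```python
from scipy import stats

# Prior evidence from an analog population (hypothetical counts): 12 events
# in 5000 person-exposures -> Beta(12, 4988) prior on the event probability.
a0, b0 = 12, 5000 - 12

# Observed flight-population data (hypothetical): 1 event in 400 exposures.
events, exposures = 1, 400

# Conjugate Bayesian update: posterior is Beta(a0 + events, b0 + non-events).
a1, b1 = a0 + events, b0 + (exposures - events)
post = stats.beta(a1, b1)
print(f"posterior mean = {post.mean():.5f}")
print("95% credible interval:", post.ppf([0.025, 0.975]).round(5))
```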

  15. Relationships of Selected Personal and Social Variables in Conforming Judgment

    Science.gov (United States)

    Long, Huey B.

    1970-01-01

    To help determine relationships between certain personality variables and conforming judgment, and differences in conforming judgment among differently structured groups, prison inmates were studied for the personality variables of IQ (California Capacity Questionnaire), agreement response set (Couch and Kenniston Scale), and dogmatism (Form E,…

  16. The application of Kriging and empirical Kriging based on the variables selected by SCAD.

    Science.gov (United States)

    Peng, Xiao-Ling; Yin, Hong; Li, Runze; Fang, Kai-Tai

    2006-09-25

    The commonly used approach for building a structure-activity/property relationship consists of three steps. First, one determines the descriptors for the molecular structure, then builds a metamodel by using some proper mathematical methods, and finally evaluates the metamodel. Some existing methods can only select important variables from the candidates, while most metamodels just explore linear relationships between inputs and outputs. Some techniques are useful for building more complicated relationships, but they may not be able to select important variables from a large number of variables. In this paper, we propose to screen important variables by the smoothly clipped absolute deviation (SCAD) variable selection procedure, and then apply the Kriging model and empirical Kriging model for quantitative structure-activity/property relationship (QSAR/QSPR) research based on the selected important variables. We demonstrate that the proposed procedure retains the virtues of both variable selection and Kriging model. PMID:17723710
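
    For reference, the SCAD penalty of Fan and Li (2001) used in such screening steps has the closed form sketched below (a = 3.7 is the conventional choice); the Kriging modelling stage is not reproduced.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty applied elementwise to |beta|: linear up to lam,
    quadratic blending up to a*lam, constant beyond, so large coefficients
    are not over-shrunk (near-unbiasedness)."""
    t = np.abs(beta)
    p1 = lam * t
    p2 = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    p3 = lam**2 * (a + 1) / 2
    return np.where(t <= lam, p1, np.where(t <= a * lam, p2, p3))

print(scad_penalty(np.array([0.1, 0.5, 2.0, 10.0]), lam=0.5))
```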

  17. Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets

    Science.gov (United States)

    Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.

    2013-01-01

    In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.

  18. Selection of the treatment effect for sample size determination in a superiority clinical trial using a hybrid classical and Bayesian procedure.

    Science.gov (United States)

    Ciarleglio, Maria M; Arendt, Christopher D; Makuch, Robert W; Peduzzi, Peter N

    2015-03-01

    Specification of the treatment effect that a clinical trial is designed to detect (θA) plays a critical role in sample size and power calculations. However, no formal method exists for using prior information to guide the choice of θA. This paper presents a hybrid classical and Bayesian procedure for choosing an estimate of the treatment effect to be detected in a clinical trial that formally integrates prior information into this aspect of trial design. The value of θA is found that equates the pre-specified frequentist power and the conditional expected power of the trial. The conditional expected power averages the traditional frequentist power curve using the conditional prior distribution of the true unknown treatment effect θ as the averaging weight. The Bayesian prior distribution summarizes current knowledge of both the magnitude of the treatment effect and the strength of the prior information through the assumed spread of the distribution. By using a hybrid classical and Bayesian approach, we are able to formally integrate prior information on the uncertainty and variability of the treatment effect into the design of the study, mitigating the risk that the power calculation will be overly optimistic while maintaining a frequentist framework for the final analysis. The value of θA found using this method may be written as a function of the prior mean μ0 and standard deviation τ0, with a unique relationship for a given ratio of μ0/τ0. Results are presented for Normal, Uniform, and Gamma priors for θ. PMID:25583273
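
    Numerically, θA can be found by equating the frequentist power curve with the prior-averaged (expected) power. The sketch below uses an unconditional normal prior for simplicity, whereas the paper conditions the prior; all numbers are invented.

```python
import numpy as np
from scipy import stats, integrate, optimize

alpha, se = 0.025, 0.5           # one-sided level; standard error of theta-hat
z = stats.norm.ppf(1 - alpha)
mu0, tau0 = 1.0, 0.5             # prior mean and spread for the true effect

def power(theta):
    # Frequentist power of the one-sided z-test at effect size theta.
    return stats.norm.cdf(theta / se - z)

def expected_power():
    # Average the power curve over the prior (here unconditional normal;
    # the paper's conditional prior would change the weights).
    f = lambda th: power(th) * stats.norm.pdf(th, mu0, tau0)
    val, _ = integrate.quad(f, mu0 - 8 * tau0, mu0 + 8 * tau0)
    return val

# theta_A: the effect size whose frequentist power equals the expected power.
ep = expected_power()
theta_a = optimize.brentq(lambda th: power(th) - ep, 1e-6, 10.0)
print(f"expected power = {ep:.3f}, theta_A = {theta_a:.3f}")
```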

  19. Implementing Bayesian Vector Autoregressions

    Directory of Open Access Journals (Sweden)

    Richard M. Todd

    1988-03-01

    Full Text Available This paper discusses how the Bayesian approach can be used to construct a type of multivariate forecasting model known as a Bayesian vector autoregression (BVAR). In doing so, we mainly explain the propositions of Doan, Litterman, and Sims (1984) on how to estimate a BVAR based on a certain family of prior probability distributions, indexed by a fairly small set of hyperparameters. There is also a discussion on how to specify a BVAR and set up a BVAR database. A 4-variable model is used to illustrate the BVAR approach.

  20. Bayesian Peak Picking for NMR Spectra

    KAUST Repository

    Cheng, Yichen

    2014-02-01

    Protein structure determination is a very important topic in structural genomics, which helps people to understand varieties of biological functions such as protein-protein interactions, protein–DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) has often been used to determine the three-dimensional structures of protein in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using Bayesian method.
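
    As a simplified stand-in for the proposed mixture-plus-SAMC machinery, the following fits a bivariate Gaussian mixture to simulated peak coordinates and picks the number of components by BIC:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulated 2D "peak" coordinates: three true peaks plus scattered noise.
rng = np.random.default_rng(8)
peaks = np.array([[1.0, 4.0], [2.5, 6.0], [3.0, 2.0]])
pts = np.vstack([rng.normal(p, 0.05, size=(60, 2)) for p in peaks]
                + [rng.uniform([0, 0], [4, 8], size=(30, 2))])

# Model the spectrum as a mixture of bivariate Gaussians; choose the number
# of components by BIC (a stand-in for the paper's SAMC-based selection).
best = min((GaussianMixture(k, covariance_type="full", random_state=0).fit(pts)
            for k in range(1, 7)), key=lambda m: m.bic(pts))
print("components:", best.n_components)
print(np.round(best.means_, 2))
```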

  1. Bayesian Nonparametric Shrinkage Applied to Cepheid Star Oscillations.

    Science.gov (United States)

    Berger, James; Jefferys, William; Müller, Peter

    2012-01-01

    Bayesian nonparametric regression with dependent wavelets has dual shrinkage properties: there is shrinkage through a dependent prior put on functional differences, and shrinkage through the setting of most of the wavelet coefficients to zero through Bayesian variable selection methods. The methodology can deal with unequally spaced data and is efficient because of the existence of fast moves in model space for the MCMC computation. The methodology is illustrated on the problem of modeling the oscillations of Cepheid variable stars; these are a class of pulsating variable stars with the useful property that their periods of variability are strongly correlated with their absolute luminosity. Once this relationship has been calibrated, knowledge of the period gives knowledge of the luminosity. This makes these stars useful as "standard candles" for estimating distances in the universe. PMID:24368873

  2. Random Forests for Ordinal Response Data: Prediction and Variable Selection

    OpenAIRE

    Janitza, Silke; Tutz, Gerhard; Boulesteix, Anne-Laure

    2014-01-01

    The random forest method is a commonly used tool for classification with high-dimensional data that is able to rank candidate predictors through its inbuilt variable importance measures (VIMs). It can be applied to various kinds of regression problems including nominal, metric and survival response variables. While classification and regression problems using random forest methodology have been extensively investigated in the past, there seems to be a lack of literature on handling ordinal re...
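
    A minimal example of ranking predictors with a random forest's built-in variable importance measure (here an ordinary classification forest on simulated ordinal-style data; the paper's ordinal-specific VIMs are not reproduced):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Ordinal response (e.g., severity grades 0-2) generated from a latent score.
rng = np.random.default_rng(9)
X = rng.normal(size=(300, 10))
latent = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=300)
y = np.digitize(latent, [-1.0, 1.0])  # thresholds give 3 ordered classes

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print("top predictors:", ranking[:3])  # variables 0 and 1 should lead
```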

  3. Bayesian phylogeography finds its roots.

    Directory of Open Access Journals (Sweden)

    Philippe Lemey

    2009-09-01

    Full Text Available As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will be an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.

  4. A Simple Method for Variable Selection in Regression with Respect to Treatment Selection

    Directory of Open Access Journals (Sweden)

    Lacey Gunter

    2011-09-01

    In this paper, we compare the method of Gunter et al. (2011) for variable selection in treatment comparison analysis (an approach to regression analysis where treatment-covariate interactions are deemed important) with a simple stepwise selection method that we introduce. The stepwise method has several advantages, most notably its generalization to regression models that are not necessarily linear, its simplicity and its intuitive nature. We show that the new simple method works surprisingly well compared to the more complex method when compared in the linear regression framework. We use four generative models (explicitly detailed in the paper) for the simulations and compare spuriously identified interactions and, where applicable (generative models 3 and 4), correctly identified interactions. We also apply the new method to logistic regression and Poisson regression and illustrate its performance in Table 2 in the paper. The simple method can be applied to other types of regression models including various other generalized linear models, Cox proportional hazard models and nonlinear models.
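
    A minimal sketch of a forward stepwise search over treatment-covariate interactions, in the spirit of the simple method described above. The AIC-from-RSS scoring and the stopping rule are illustrative assumptions, not the authors' exact procedure.

```python
# Forward stepwise selection of treatment-covariate interactions, scored by
# AIC computed from the residual sum of squares (up to an additive constant).
import numpy as np

def aic(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

def forward_interactions(y, trt, covars):
    n, p = covars.shape
    base = [np.ones(n), trt]                 # intercept + main treatment effect
    chosen, remaining = [], list(range(p))
    current = aic(y, np.column_stack(base))
    while remaining:
        # Each candidate adds the covariate main effect and its interaction:
        scores = {j: aic(y, np.column_stack(base + [covars[:, j], trt * covars[:, j]]))
                  for j in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= current:          # no AIC improvement: stop
            break
        current = scores[best]
        base += [covars[:, best], trt * covars[:, best]]
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(2)
Xc = rng.standard_normal((400, 6))
trt = rng.integers(0, 2, 400).astype(float)
y = 1.0 + 0.5 * trt + 0.8 * trt * Xc[:, 3] + rng.standard_normal(400)
print("covariates with selected treatment interactions:", forward_interactions(y, trt, Xc))
```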

  5. VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those of the nonsignificant ones; on this basis a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.

  6. High-Dimensional Non-Linear Variable Selection through Hierarchical Kernel Learning

    CERN Document Server

    Bach, Francis

    2009-01-01

    We consider the problem of high-dimensional non-linear variable selection for supervised learning. Our approach is based on performing linear selection among exponentially many appropriately defined positive definite kernels that characterize non-linear interactions between the original variables. To select efficiently from these many kernels, we use the natural hierarchical structure of the problem to extend the multiple kernel learning framework to kernels that can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a graph-adapted sparsity-inducing norm, in polynomial time in the number of selected kernels. Moreover, we study the consistency of variable selection in high-dimensional settings, showing that under certain assumptions, our regularization framework allows a number of irrelevant variables which is exponential in the number of observations. Our simulations on synthetic datasets and datasets from the UCI repository show state-of-the-art pre...

  7. Anthropogenic environments exert variable selection on cranial capacity in mammals.

    Science.gov (United States)

    Snell-Rood, Emilie C; Wick, Naomi

    2013-10-22

    It is thought that behaviourally flexible species will be able to cope with novel and rapidly changing environments associated with human activity. However, it is unclear whether such environments are selecting for increases in behavioural plasticity, and whether some species show more pronounced evolutionary changes in plasticity. To test whether anthropogenic environments are selecting for increased behavioural plasticity within species, we measured variation in relative cranial capacity over time and space in 10 species of mammals. We predicted that urban populations would show greater cranial capacity than rural populations and that cranial capacity would increase over time in urban populations. Based on relevant theory, we also predicted that species capable of rapid population growth would show more pronounced evolutionary responses. We found that urban populations of two small mammal species had significantly greater cranial capacity than rural populations. In addition, species with higher fecundity showed more pronounced differentiation between urban and rural populations. Contrary to expectations, we found no increases in cranial capacity over time in urban populations; indeed, two species tended to have a decrease in cranial capacity over time in urban populations. Furthermore, rural populations of all insectivorous species measured showed significant increases in relative cranial capacity over time. Our results provide partial support for the hypothesis that urban environments select for increased behavioural plasticity, although this selection may be most pronounced early during the urban colonization process. Furthermore, these data also suggest that behavioural plasticity may be simultaneously favoured in rural environments, which are also changing because of human activity.

  8. Optical variability of X-ray-selected QSOs

    International Nuclear Information System (INIS)

    Photometric data for ten X-ray-selected quasistellar objects have been obtained from archival records of the Rosemary Hill Observatory. Reliable magnitudes were obtained for seven of the ten sources and six displayed optical variations significant at the 95 percent confidence level or greater. One source appeared to exhibit optically violent behavior. Light curves and photographic magnitudes are presented and discussed. 22 references

  9. On the Evidence for Cosmic Variation of the Fine Structure Constant (II): A Semi-Parametric Bayesian Model Selection Analysis of the Quasar Dataset

    CERN Document Server

    Cameron, Ewan

    2013-01-01

    In the second paper of this series we extend our Bayesian reanalysis of the evidence for a cosmic variation of the fine structure constant to the semi-parametric modelling regime. By adopting a mixture of Dirichlet processes prior for the unexplained errors in each instrumental subgroup of the benchmark quasar dataset we go some way towards freeing our model selection procedure from the apparent subjectivity of a fixed distributional form. Despite the infinite-dimensional domain of the error hierarchy so constructed we are able to demonstrate a recursive scheme for marginal likelihood estimation with prior-sensitivity analysis directly analogous to that presented in Paper I, thereby allowing the robustness of our posterior Bayes factors to hyper-parameter choice and model specification to be readily verified. In the course of this work we elucidate various similarities between unexplained error problems in the seemingly disparate fields of astronomy and clinical meta-analysis, and we highlight a number of sop...

  10. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan

    2004-01-01

    We describe a system for exact inference with relational Bayesian networks as defined in the publicly available \\primula\\ tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...... and differentiating these circuits in time linear in their size. We report on experimental results showing the successful compilation, and efficient inference, on relational Bayesian networks whose {\\primula}--generated propositional instances have thousands of variables, and whose jointrees have clusters...

  11. Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection

    DEFF Research Database (Denmark)

    Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald;

    2013-01-01

    The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PLSR...... was highlighted by the frequency of the selection of each variable in the cross model validation segments. Computationally, Jack-knife PLSR was much faster than Sparse PLSR. But while it was found that both methods have more or less the same predictive ability, Sparse PLSR turned out to be generally very stable...... in selecting the relevant variables, whereas Jack-knife PLSR was very prone to selecting also uninformative variables. To remedy this drawback, a strategy of analysis consisting in adding a perturbation parameter to the uncertainty variances obtained by means of Jack-knife PLSR is demonstrated....

  12. Resting high frequency heart rate variability selectively predicts cooperative behavior.

    Science.gov (United States)

    Beffara, Brice; Bret, Amélie G; Vermeulen, Nicolas; Mermillod, Martial

    2016-10-01

    This study explores whether the vagal connection between the heart and the brain is involved in prosocial behaviors. The Polyvagal Theory postulates that vagal activity underlies prosocial tendencies. Even if several results suggest that vagal activity is associated with prosocial behaviors, none of them used behavioral measures of prosociality to establish this relationship. We recorded the resting state vagal activity (reflected by High Frequency Heart Rate Variability, HF-HRV) of 48 (42 suitable for analysis) healthy human adults and measured their level of cooperation during a hawk-dove game. We also manipulated the consequence of mutual defection in the hawk-dove game (severe vs. moderate). Results show that HF-HRV is positively and linearly related to cooperation level, but only when the consequence of mutual defection is severe (compared to moderate). This supports the ideas that (i) prosocial behaviors are likely to be underpinned by vagal functioning and (ii) physiological disposition to cooperate interacts with environmental context. We discuss these results within the theoretical framework of the Polyvagal Theory. PMID:27343804

  13. 3D Bayesian contextual classifiers

    DEFF Research Database (Denmark)

    Larsen, Rasmus

    2000-01-01

    We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.......We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours....

  14. Variability-based Active Galactic Nucleus Selection Using Image Subtraction in the SDSS and LSST Era

    Science.gov (United States)

    Choi, Yumi; Gibson, Robert R.; Becker, Andrew C.; Ivezić, Željko; Connolly, Andrew J.; MacLeod, Chelsea L.; Ruan, John J.; Anderson, Scott F.

    2014-02-01

    With upcoming all-sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based active galactic nucleus (AGN) selection will enable the construction of highly complete catalogs with minimum contamination. In this study, we generate g-band difference images and construct light curves (LCs) for QSO/AGN candidates listed in Sloan Digital Sky Survey Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels at identifying variable sources embedded in complex or blended emission regions such as Type II AGNs and other low-luminosity AGNs that may be omitted from traditional photometric or spectroscopic catalogs. To separate QSOs/AGNs from other sources using our difference image LCs, we explore several LC statistics and parameterize optical variability by the characteristic damping timescale (τ) and variability amplitude. By virtue of distinguishable variability parameters of AGNs, we are able to select them with high completeness of 93.4% and efficiency (i.e., purity) of 71.3%. Based on optical variability, we also select highly variable blazar candidates, whose infrared colors are consistent with known blazars. One-third of them are also radio detected. With the X-ray selected AGN candidates, we probe the optical variability of X-ray detected optically extended sources using their difference image LCs for the first time. A combination of optical variability and X-ray detection enables us to select various types of host-dominated AGNs. Contrary to the AGN unification model prediction, two Type II AGN candidates (out of six) show detectable variability on long-term timescales like typical Type I AGNs. This study will provide a baseline for future optical variability studies of extended sources.
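
    For concreteness, the optical variability parameterization used above is commonly fit via the damped-random-walk structure function SF(Δt) = SF_inf · sqrt(1 − exp(−Δt/τ)). The sketch below simulates a rough AR(1) light curve and recovers (τ, SF_inf); it illustrates the parameterization only and is not the paper's pipeline.

```python
# Estimate damped-random-walk parameters (tau, SF_inf) of a light curve by
# fitting the model structure function SF(dt) = SF_inf * sqrt(1 - exp(-dt/tau)).
import numpy as np
from scipy.optimize import curve_fit

def sf_model(dt, sf_inf, tau):
    return sf_inf * np.sqrt(1.0 - np.exp(-dt / tau))

def structure_function(t, mag, n_bins=20):
    i, j = np.triu_indices(t.size, k=1)
    dt, dm = t[j] - t[i], np.abs(mag[j] - mag[i])
    bins = np.logspace(np.log10(dt.min()), np.log10(dt.max()), n_bins + 1)
    idx = np.digitize(dt, bins) - 1
    centers = np.sqrt(bins[:-1] * bins[1:])
    sf = np.array([np.sqrt(np.mean(dm[idx == b] ** 2)) if np.any(idx == b) else np.nan
                   for b in range(n_bins)])
    ok = ~np.isnan(sf)
    return centers[ok], sf[ok]

# Simulate a rough DRW-like light curve as an AR(1) process on a uniform grid:
rng = np.random.default_rng(3)
t = np.arange(0.0, 2000.0, 5.0)
tau_true, sf_true = 200.0, 0.3
phi = np.exp(-5.0 / tau_true)
mag = np.zeros(t.size)
for k in range(1, t.size):
    mag[k] = phi * mag[k - 1] + np.sqrt(1 - phi**2) * (sf_true / np.sqrt(2)) * rng.standard_normal()

dts, sf = structure_function(t, mag)
popt, _ = curve_fit(sf_model, dts, sf, p0=[0.2, 100.0])
print(f"fitted SF_inf={popt[0]:.2f}, tau={popt[1]:.0f} days")
```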

  15. Punishment-Induced Behavioral and Neurophysiological Variability Reveals Dopamine-Dependent Selection of Kinematic Movement Parameters

    OpenAIRE

    Galea, J. M.; Ruge, D.; Buijink, A.; Bestmann, S.; Rothwell, J. C.

    2013-01-01

    Action selection describes the high-level process which selects between competing movements. In animals, behavioural variability is critical for the motor exploration required to select the action which optimizes reward and minimizes cost/punishment, and is guided by dopamine (DA). The aim of this study was to test in humans whether low-level movement parameters are affected by punishment and reward in ways similar to high-level action selection. Moreover, we addressed the proposed dependence...

  16. Variable selection in multiple linear regression: The influence of individual cases

    OpenAIRE

    SJ Steel; DW Uys

    2007-01-01

    The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calcul...
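
    The notion of selection influence can be made concrete with a small sketch: re-run Mallows' Cp best-subset selection with each case deleted and flag the cases whose omission changes the selected subset. The change-of-selection indicator used here is a simplification of the relative-change measures the record introduces.

```python
# Flag cases whose deletion changes the Cp-selected variable subset.
import numpy as np
from itertools import combinations

def cp_best_subset(X, y):
    n, p = X.shape
    Xf = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xf, y, rcond=None)
    s2 = np.sum((y - Xf @ beta) ** 2) / (n - p - 1)   # full-model error variance
    best, best_cp = None, np.inf
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            Xs = np.column_stack([np.ones(n), X[:, subset]])
            b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ b) ** 2)
            cp = rss / s2 - n + 2 * (k + 1)           # Mallows' Cp
            if cp < best_cp:
                best, best_cp = subset, cp
    return best

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.standard_normal(60)
full_choice = cp_best_subset(X, y)
influential = [i for i in range(60)
               if cp_best_subset(np.delete(X, i, 0), np.delete(y, i)) != full_choice]
print("selection on full data:", full_choice, "| influential cases:", influential)
```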

  17. Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

    CERN Document Server

    Bouveyron, Charles

    2012-01-01

    The interest in variable selection for clustering has increased recently due to the growing need in clustering high-dimensional data. Variable selection allows in particular to ease both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through $\\ell_{1}$-type penalizations. Experimental comparisons with existing approach...

  18. Variability-based AGN selection using image subtraction in the SDSS and LSST era

    CERN Document Server

    Choi, Yumi; Becker, Andrew C; Ivezić, Željko; Connolly, Andrew J; MacLeod, Chelsea L; Ruan, John J; Anderson, Scott F

    2013-01-01

    With upcoming all sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based AGN selection will enable the construction of highly-complete catalogs with minimum contamination. In this study, we generate $g$-band difference images and construct light curves for QSO/AGN candidates listed in SDSS Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels at identifying variable sources embedded in complex or blended emission regions such as Type II AGNs and other low-luminosity AGNs that may be omitted from traditional photometric or spectroscopic catalogs. To separate QSOs/AGNs from other sources using our difference image light curves, we explore several light curve statistics and parameterize optical variability by the characteristic damping timescale ($\\tau$) and variability amplitude. By virtue of distinguishable variability parameters of AGNs, we are able to select...

  19. Predictive modeling with high-dimensional data streams: an on-line variable selection approach

    OpenAIRE

    McWilliams, Brian; Montana, Giovanni

    2009-01-01

    In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition which can be efficiently updated on-line and in an adaptive fashion. Simul...

  20. Sparse partial least squares for on-line variable selection in multivariate data streams

    OpenAIRE

    McWilliams, Brian; Montana, Giovanni

    2009-01-01

    In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition which can be efficiently updated on-line and in an adaptive fashion. Simulation results based on art...

  1. COMPARATIVE EFFECT OF AEROBICS AND RESISTANCE EXERCISES ON SELECTED PHYSIOLOGICAL VARIABLES AMONG OBESE CHILDREN

    OpenAIRE

    M. Dhanalakshmi; Grace Helina; Senthilkumar

    2015-01-01

    The aim of this study was to find out the comparative effects of aerobics and resistance exercises on selected physiological variables among obese children. To achieve the purpose, 60 obese children whose BMI was greater than 30 kg/m² were randomly selected and assigned into three groups, an aerobics exercises group (AEG), a resistance training group (RTG) and a control group (CG), consisting of 20 in each. After assessing physiological variables, forced vital capacity and resting heart rate init...

  2. Variability-selected low luminosity AGNs in the SA57 and in the CDFS

    CERN Document Server

    Vagnetti, F; Trevese, D

    2009-01-01

    Low Luminosity Active Galactic Nuclei (LLAGNs) are contaminated by the light of their host galaxies, thus they cannot be detected by the usual colour techniques. For this reason their evolution in cosmic time is poorly known. Variability is a property shared by virtually all active galactic nuclei, and it was adopted as a criterion to select them using multi epoch surveys. Here we report on two variability surveys in different sky areas, the Selected Area 57 and the Chandra Deep Field South.

  3. Stock market reaction to selected macroeconomic variables in the Nigerian economy

    OpenAIRE

    Abraham, Terfa Williams

    2011-01-01

    This study examines the relationship between the stock market and selected macroeconomic variables in Nigeria. The all share index was used as a proxy for the stock market while inflation, interest and exchange rates were the macroeconomic variables selected. Employing an error correction model, it was found that a significant negative short-run relationship exists between the stock market and the minimum rediscounting rate (MRR), implying that a decrease in the MRR would improve the performanc...

  4. Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology

    OpenAIRE

    Hill Steven M; Neve Richard M; Bayani Nora; Kuo Wen-Lin; Ziyad Safiyyah; Spellman Paul T; Gray Joe W; Mukherjee Sach

    2012-01-01

    Background: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary informatio...

  5. Relationship of Selected Listener Variables and Musical Preference of Young Students in Singapore

    Science.gov (United States)

    Teo, Timothy

    2005-01-01

    The purpose of this study was to examine the relationship between selected listener variables and musical preference of young students in Singapore. Based on the Leblanc 1982 model, gender, age, race, musical training and familiarity were chosen as independent variables. The data collected showed that musical preference was also influenced by…

  6. Probability and Bayesian statistics

    CERN Document Server

    1987-01-01

    This book contains selected and refereed contributions to the "International Symposium on Probability and Bayesian Statistics" which was organized to celebrate the 80th birthday of Professor Bruno de Finetti at his birthplace Innsbruck in Austria. Since Professor de Finetti died in 1985, the symposium was dedicated to the memory of Bruno de Finetti and took place at Igls near Innsbruck from 23 to 26 September 1986. Some of the papers are included especially because of their relationship to Bruno de Finetti's scientific work. The evolution of stochastics shows the growing importance of probability as a coherent assessment of numerical values as degrees of belief in certain events. This is the basis for Bayesian inference in the sense of modern statistics. The contributions in this volume cover a broad spectrum ranging from foundations of probability across psychological aspects of formulating subjective probability statements, abstract measure theoretical considerations, contributions to theoretical statistics an...

  7. Bayesian programming

    CERN Document Server

    Bessiere, Pierre; Ahuactzin, Juan Manuel; Mekhnacha, Kamel

    2013-01-01

    Probability as an Alternative to Boolean Logic: While logic is the mathematical foundation of rational reasoning and the fundamental principle of computing, it is restricted to problems where information is both complete and certain. However, many real-world problems, from financial investments to email filtering, are incomplete or uncertain in nature. Probability theory and Bayesian computing together provide an alternative framework to deal with incomplete and uncertain data. Decision-Making Tools and Methods for Incomplete and Uncertain Data: Emphasizing probability as an alternative to Boolean

  8. ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER'S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS

    OpenAIRE

    Henry de-Graft Acquah; Joseph Acquah

    2013-01-01

    Alternative formulations of the Bayesian Information Criteria provide a basis for choosing between competing methods for detecting price asymmetry. However, very little is understood about their performance in the asymmetric price transmission modelling framework. In addressing this issue, this paper introduces and applies parametric bootstrap techniques to evaluate the ability of Bayesian Information Criteria (BIC) and Draper's Information Criteria (DIC) in discriminating between alternative...

  9. The accuracy and clinical feasibility of a new Bayesian-based closed-loop control system for propofol administration using the bispectral index as a controlled variable

    NARCIS (Netherlands)

    De Smet, Tom; Struys, Michel M. R. F.; Neckebroek, Martine M.; Van den Hauwe, Kristof; Bonte, Sjoert; Mortier, Eric P.

    2008-01-01

    BACKGROUND: Closed-loop control of the hypnotic component of anesthesia has been proposed in an attempt to optimize drug delivery. Here, we introduce a newly developed Bayesian-based, patient-individualized, model-based, adaptive control method for bispectral index (BIS) guided propofol infusion int

  10. The Time Domain Spectroscopic Survey: Variable Object Selection and Anticipated Results

    CERN Document Server

    Morganson, Eric; Anderson, Scott F; Ruan, John J; Myers, Adam D; Eracleous, Michael; Kelly, Brandon; Badenes, Carlos; Banados, Eduardo; Blanton, Michael R; Bershady, Matthew A; Borissova, Jura; Brandt, William Nielsen; Burgett, William S; Chambers, Kenneth; Draper, Peter W; Davenport, James R A; Flewelling, Heather; Garnavich, Peter; Hawley, Suzanne L; Hodapp, Klaus W; Isler, Jedidah C; Kaiser, Nick; Kinemuchi, Karen; Kudritzki, Rolf P; Metcalfe, Nigel; Morgan, Jeffrey S; Paris, Isabelle; Parvizi, Mahmoud; Poleski, Radoslaw; Price, Paul A; Salvato, Mara; Shanks, Tom; Schlafly, Eddie F; Schneider, Donald P; Shen, Yue; Stassun, Keivan; Tonry, John T; Walter, Fabian; Waters, Chris Z

    2015-01-01

    We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is an SDSS-IV eBOSS subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and AGN) across 7,500 square degrees selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate (KDE) analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4,000 of which will be RR Lyrae stars which may be used as outer Milky Way probes. The variability-sele...

  11. A bootstrapping soft shrinkage approach for variable selection in chemical modeling.

    Science.gov (United States)

    Deng, Bai-Chuan; Yun, Yong-Huan; Cao, Dong-Sheng; Yin, Yu-Long; Wang, Wei-Ting; Lu, Hong-Mei; Luo, Qian-Yi; Liang, Yi-Zeng

    2016-02-18

    In this study, a new variable selection method called the bootstrapping soft shrinkage (BOSS) method is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined based on the absolute values of regression coefficients. WBS is applied according to the weights to generate sub-models and MPA is used to analyze the sub-models to update the weights for variables. The optimization procedure follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The optimal variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected. The method was tested on three groups of near infrared (NIR) spectroscopic datasets, i.e. corn datasets, diesel fuels datasets and soy datasets. Three high-performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss. PMID:26826688
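
    A compact sketch of the BOSS loop described above, assuming PLS regression sub-models and illustrative choices for the number of sub-models, PLS components and shrinkage schedule; the authors' released Matlab code is the reference implementation.

```python
# Weighted bootstrap sampling of variables + soft shrinkage of weights derived
# from PLS regression coefficients, with cross-validated scoring of each round.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def boss(X, y, n_sub=100, n_comp=3, seed=5):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    weights = np.ones(p) / p
    active = np.arange(p)
    best_vars, best_score = active, -np.inf
    while active.size > n_comp + 1:
        coef_abs = np.zeros(p)
        for _ in range(n_sub):
            # Weighted bootstrap sampling of variables:
            pick = np.unique(rng.choice(p, size=p, p=weights))
            if pick.size <= n_comp:
                continue
            pls = PLSRegression(n_components=n_comp).fit(X[:, pick], y)
            coef_abs[pick] += np.abs(pls.coef_).ravel()
        weights = coef_abs / coef_abs.sum()   # soft shrinkage: downweight, don't eliminate
        # Halve the active set, keeping the heaviest variables:
        active = np.sort(np.argsort(weights)[::-1][: max(n_comp + 1, active.size // 2)])
        weights = np.where(np.isin(np.arange(p), active), weights, 0.0)
        weights /= weights.sum()
        score = cross_val_score(PLSRegression(n_components=n_comp),
                                X[:, active], y, cv=5).mean()
        if score > best_score:
            best_vars, best_score = active, score
    return best_vars

rng = np.random.default_rng(5)
X = rng.standard_normal((80, 30))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.5 * rng.standard_normal(80)
print("variables retained:", boss(X, y))
```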

  12. COMPARISON OF SELECTED PHYSIOLOGICAL VARIABLES OF PLAYERS BELONGING TO VARIOUS DISTANCE RUNNERS

    OpenAIRE

    Satpal Yadav; Arvind S.Sajwan; Ankan Sinha

    2009-01-01

    The purpose of the study was to compare selected physiological variables, namely maximum oxygen consumption, vital capacity, resting heart rate and hemoglobin content, among various distance runners. The subjects were selected from the male athletes of Gwalior district across various distances, i.e. short, middle and long distance runners. Ten (10) male athletes from each group, namely the short, middle and long distance groups, were selected as the subjects for the study. Selec...

  13. Refining gene signatures: a Bayesian approach

    Directory of Open Access Journals (Sweden)

    Labbe Aurélie

    2009-12-01

    Background: In high density arrays, the identification of relevant genes for disease classification is complicated by not only the curse of dimensionality but also the highly correlated nature of the array data. In this paper, we are interested in the question of how many and which genes should be selected for a disease class prediction. Our work consists of a Bayesian supervised statistical learning approach to refine gene signatures with a regularization which penalizes for the correlation between the variables selected. Results: Our simulation results show that we can most often recover the correct subset of genes that predict the class as compared to other methods, even when accuracy and subset size remain the same. On real microarray datasets, we show that our approach can refine gene signatures to obtain either the same or better predictive performance than other existing methods with a smaller number of genes. Conclusions: Our novel Bayesian approach includes a prior which penalizes highly correlated features in model selection and is able to extract key genes in the highly correlated context of microarray data. The methodology in the paper is described in the context of microarray data, but can be applied to any array data (such as microRNA, for example) as a first step towards predictive modeling of cancer pathways. A user-friendly software implementation of the method is available.

  14. A survey of variable selection methods in two Chinese epidemiology journals

    Directory of Open Access Journals (Sweden)

    Lynn Henry S

    2010-09-01

    Background: Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods: Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis, e.g. stepwise or individual significance testing of model coefficients; C - first bivariate analyses, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria like prior knowledge or personal judgment. Results: Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163) via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44%) used stepwise procedures, 89 (41%) tested individual regression coefficients, but 33 (15%) did not mention how variables were selected. Sixty percent (58/97) of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions: The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals.

  15. Naïve Bayesian Classifier for Selecting Good/Bad Projects during the Early Stage of International Construction Bidding Decisions

    Directory of Open Access Journals (Sweden)

    Woosik Jang

    2015-01-01

    Since the 1970s, revenues generated by Korean contractors in international construction have increased rapidly, exceeding USD 70 billion per year in recent years. However, Korean contractors face significant risks from market uncertainty and sensitivity to economic volatility and technical difficulties. As the volatility of these risks threatens project profitability, approximately 15% of projects, the bad ones, were found to account for 74% of losses in the same international construction sector. Anticipating bad projects via preemptive risk management can better prevent losses so that contractors can enhance the efficiency of bidding decisions during the early stages of a project cycle. In line with these objectives, this paper examines the effect of such factors on the degree of project profitability. The Naïve Bayesian classifier is applied to identify a good project screening tool, which increases practical applicability using binomial variables with limited information that is obtainable in the early stages. The proposed model produced superior classification results that adequately reflect contractor views of risk. It is anticipated that when users apply the proposed model based on their own knowledge and expertise, overall firm profit rates will increase as a result of early abandonment of bad projects as well as the prioritization of good projects before final bidding decisions are made.
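
    The screening idea reduces to a Bernoulli naive Bayes classifier over binary early-stage risk indicators. The sketch below uses invented indicator names and simulated data purely for illustration; it is not the paper's calibrated model.

```python
# Naive Bayes good/bad project screen over binary risk indicators.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(6)
# Hypothetical bid-time indicators: [new market, unfamiliar contract type,
# low bid margin, currency risk, design incomplete] -- all binary.
X = rng.integers(0, 2, size=(200, 5))
# Toy label: projects with many simultaneous risk factors tend to be "bad" (1):
y = (X.sum(axis=1) + rng.normal(0, 0.8, 200) > 3).astype(int)

model = BernoulliNB().fit(X, y)
candidate = np.array([[1, 1, 0, 1, 0]])
print("P(bad project) =", model.predict_proba(candidate)[0, 1].round(2))
```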

  16. Variable selectivity and the role of nutritional quality in food selection by a planktonic rotifer

    International Nuclear Information System (INIS)

    To investigate the potential for selective feeding to enhance fitness, I test the hypothesis that an herbivorous zooplankter selects those food items that best support its reproduction. Under this hypothesis, growth and reproduction on selected food items should be higher than on less preferred items. The hypothesis is not supported. In situ selectivity by the rotifer Keratella taurocephala for Cryptomonas relative to Chlamydomonas goes through a seasonal cycle, in apparent response to fluctuating Cryptomonas populations. However, reproduction on a unialgal diet of Cryptomonas is consistently high and similar to that on Chlamydomonas. Oocystis, which also supports reproduction equivalent to that supported by Chlamydomonas, is sometimes rejected by K. taurocephala. In addition, K. taurocephala does not discriminate between Merismopedia and Chlamydomonas even though Merismopedia supports virtually no reproduction by the rotifer. Selection by K. taurocephala does not simply maximize the intake of food items that yield high reproduction. Selectivity is a complex, dynamic process, one function of which may be the exploitation of locally or seasonally abundant foods. (author)

  17. Bayesian modeling of censored partial linear models using scale-mixtures of normal distributions

    Science.gov (United States)

    Castro, Luis M.; Lachos, Victor H.; Ferreira, Guillermo P.; Arellano-Valle, Reinaldo B.

    2012-10-01

    Regression models where the dependent variable is censored (limited) are usually considered in statistical analysis. Particularly, the case of a truncation to the left of zero and a normality assumption for the error terms is studied in detail by [1] in the well-known Tobit model. In the present article, this typical censored regression model is extended by considering a partial linear model with errors belonging to the class of scale mixtures of normal distributions. We achieve a fully Bayesian inference by adopting a Metropolis algorithm within a Gibbs sampler. The likelihood function is utilized to compute not only some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measures. We evaluate the performances of the proposed methods with simulated data. In addition, we present an application investigating which types of variables affect the income of housewives.

  18. Bayesian Methods and Universal Darwinism

    CERN Document Server

    Campbell, John

    2010-01-01

    Bayesian methods since the time of Laplace have been understood by their practitioners as closely aligned to the scientific method. Indeed a recent champion of Bayesian methods, E. T. Jaynes, titled his textbook on the subject Probability Theory: the Logic of Science. Many philosophers of science including Karl Popper and Donald Campbell have interpreted the evolution of Science as a Darwinian process consisting of a 'copy with selective retention' algorithm abstracted from Darwin's theory of Natural Selection. Arguments are presented for an isomorphism between Bayesian Methods and Darwinian processes. Universal Darwinism, as the term has been developed by Richard Dawkins, Daniel Dennett and Susan Blackmore, is the collection of scientific theories which explain the creation and evolution of their subject matter as due to the operation of Darwinian processes. These subject matters span the fields of atomic physics, chemistry, biology and the social sciences. The principle of Maximum Entropy states that system...

  19. Bayesian inference for OPC modeling

    Science.gov (United States)

    Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.

    2016-03-01

    The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semi-independently explore the space. The convergence of these walkers to global maxima of the likelihood volume determines the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
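
    The workflow described above maps naturally onto the emcee package, which implements an affine invariant ensemble sampler. The sketch below applies it to a toy line fit with a Student's-t likelihood rather than a lithographic model; the prior bound and interval choice are illustrative assumptions.

```python
# Affine invariant ensemble sampling (emcee) under a Student's-t likelihood.
import numpy as np
import emcee
from scipy import stats

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + stats.t.rvs(df=4, scale=0.5, size=x.size, random_state=7)

def log_prob(theta):
    slope, intercept, log_scale = theta
    if not (-10 < log_scale < 2):            # weak prior bound on the noise scale
        return -np.inf
    resid = y - (slope * x + intercept)
    return np.sum(stats.t.logpdf(resid, df=4, scale=np.exp(log_scale)))

ndim, nwalkers = 3, 32
p0 = np.array([2.0, 1.0, np.log(0.5)]) + 0.1 * rng.standard_normal((nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000, progress=False)

chain = sampler.get_chain(discard=500, flat=True)
lo, hi = np.percentile(chain[:, 0], [5.5, 94.5])   # HDI-like 89% credible interval
print(f"slope 89% interval: [{lo:.3f}, {hi:.3f}]")
```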

  20. The Origins and Maintenance of Female Genital Modification across Africa : Bayesian Phylogenetic Modeling of Cultural Evolution under the Influence of Selection.

    Science.gov (United States)

    Ross, Cody T; Strimling, Pontus; Ericksen, Karen Paige; Lindenfors, Patrik; Mulder, Monique Borgerhoff

    2016-06-01

    We present formal evolutionary models for the origins and persistence of the practice of Female Genital Modification (FGMo). We then test the implications of these models using normative cross-cultural data on FGMo in Africa and Bayesian phylogenetic methods that explicitly model adaptive evolution. Empirical evidence provides some support for the findings of our evolutionary models that the de novo origins of the FGMo practice should be associated with social stratification, and that social stratification should place selective pressures on the adoption of FGMo; these results, however, are tempered by the finding that FGMo has arisen in many cultures that have no social stratification, and that forces operating orthogonally to stratification appear to play a more important role in the cross-cultural distribution of FGMo. To explain these cases, one must consider cultural evolutionary explanations in conjunction with behavioral ecological ones. We conclude with a discussion of the implications of our study for policies designed to end the practice of FGMo. PMID:26846688

  1. Selection of relevant input variables in storm water quality modeling by multiobjective evolutionary polynomial regression paradigm

    Science.gov (United States)

    Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.

    2016-04-01

    The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.

  2. Universal Darwinism as a process of Bayesian inference

    CERN Document Server

    Campbell, John O

    2016-01-01

    Many of the mathematical frameworks describing natural selection are equivalent to Bayes Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment". Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description clo...
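
    The claimed isomorphism is easy to verify numerically: one generation of replicator dynamics is algebraically identical to a Bayesian update in which relative fitness plays the role of the likelihood. A tiny worked example:

```python
# One generation of natural selection as a Bayesian update:
# posterior frequency = prior frequency * fitness / mean fitness.
import numpy as np

types = ["A", "B", "C"]
freq = np.array([0.5, 0.3, 0.2])        # population frequencies = prior
fitness = np.array([1.0, 2.0, 0.5])     # relative fitness = likelihood

posterior = freq * fitness / np.sum(freq * fitness)   # replicator step = Bayes' rule
for t, p in zip(types, posterior):
    print(f"{t}: {p:.3f}")
```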

  3. Geographic Elements Selection Algorithm Based on Quadtree in Variable-scale Visualization

    OpenAIRE

    Hao Guo; Feixiang Chen; Junjie Peng

    2013-01-01

    In order to balance the demand between local and global visualization in the data acquisition, this paper adopts the variable-scale visualization technology and uses quadrangular frustum pyramid projection to show geographic information continuously on a mobile device. In addition, the geographic elements in the variable-scale transition region are crowd because of unceasingly scale changing. In order to solve this problem, this paper presents a quadtree-based geographic elements selection al...

  4. Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients

    OpenAIRE

    Mbogning, Cyprien; Broët, Philippe

    2015-01-01

    Background: For clinical genomic studies with high-dimensional datasets, tree-based ensemble methods offer a powerful solution for variable selection and prediction taking into account the complex interrelationships between explanatory variables. One of the key components of the tree-building process is the splitting criterion. For survival data, the classical splitting criterion is the Logrank statistic. However, the presence of a fraction of nonsusceptible patie...

  5. Bayesian nonparametric data analysis

    CERN Document Server

    Müller, Peter; Jara, Alejandro; Hanson, Tim

    2015-01-01

    This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.

  6. Low rank updated LS-SVM classifiers for fast variable selection.

    Science.gov (United States)

    Ojeda, Fabian; Suykens, Johan A K; De Moor, Bart

    2008-01-01

    Least squares support vector machine (LS-SVM) classifiers are a class of kernel methods whose solution follows from a set of linear equations. In this work we present low rank modifications to the LS-SVM classifiers that are useful for fast and efficient variable selection. The inclusion or removal of a candidate variable can be represented as a low rank modification to the kernel matrix (linear kernel) of the LS-SVM classifier. In this way, the LS-SVM solution can be updated rather than being recomputed, which improves the efficiency of the overall variable selection process. Relevant variables are selected according to a closed form of the leave-one-out (LOO) error estimator, which is obtained as a by-product of the low rank modifications. The proposed approach is applied to several benchmark data sets as well as two microarray data sets. When compared to other related algorithms used for variable selection, simulations applying our approach clearly show a lower computational complexity together with good stability on the generalization error.
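
    The efficiency argument rests on closed-form leave-one-out residuals for linear smoothers. The sketch below shows the idea for ridge regression (a close relative of the linear-kernel LS-SVM, without the bias term): the LOO residual is e_i / (1 - H_ii), so candidate variable sets can be scored without refitting per observation. It illustrates the principle, not the paper's exact low-rank update scheme.

```python
# Closed-form leave-one-out (PRESS) scoring for ridge regression.
import numpy as np

def loo_press(X, y, lam=1.0):
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)  # hat matrix
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))   # exact LOO residuals, no refitting
    return np.sum(loo_resid ** 2)            # PRESS statistic

rng = np.random.default_rng(8)
X = rng.standard_normal((100, 6))
y = X[:, 0] - 2 * X[:, 2] + 0.3 * rng.standard_normal(100)

# Which single-variable removal hurts the LOO error least?
scores = {j: loo_press(np.delete(X, j, axis=1), y) for j in range(6)}
print("PRESS after removing each variable:", {j: round(s, 1) for j, s in scores.items()})
```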

  7. Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection

    KAUST Repository

    Chen, Lisha

    2012-12-01

    The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
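
    A rough proximal-gradient sketch of the row-wise group-lasso penalty described above, with the reduced-rank constraint imposed crudely by truncated SVD after each step; the paper's algorithms and theory (including the manifold treatment) are considerably more careful.

```python
# Row-sparse, reduced-rank multivariate regression via proximal gradient
# with a group soft-threshold on rows and a truncated-SVD rank projection.
import numpy as np

def sparse_rrr(X, Y, rank=2, lam=5.0, step=None, iters=500):
    p, q = X.shape[1], Y.shape[1]
    B = np.zeros((p, q))
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        G = X.T @ (X @ B - Y)                    # gradient of 0.5 * ||Y - XB||^2
        B = B - step * G
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = B * shrink                           # row-wise group soft threshold
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        s[rank:] = 0.0                           # crude reduced-rank projection
        B = (U * s) @ Vt
    return B

rng = np.random.default_rng(9)
X = rng.standard_normal((200, 15))
B_true = np.zeros((15, 4)); B_true[:3] = rng.standard_normal((3, 4))
Y = X @ B_true + 0.3 * rng.standard_normal((200, 4))
B_hat = sparse_rrr(X, Y)
print("selected rows:", np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-6))
```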

  8. Seleção de variáveis em QSAR Variable selection in QSAR

    Directory of Open Access Journals (Sweden)

    Márcia Miguel Castro Ferreira

    2002-05-01

    The process of building mathematical models in quantitative structure-activity relationship (QSAR) studies is generally limited by the size of the dataset used to select variables from. For huge datasets, the task of selecting a given number of variables that produces the best linear model can be enormous, if not unfeasible. In this case, some methods can be used to separate good parameter combinations from the bad ones. In this paper three methodologies are analyzed: systematic search, genetic algorithm and chemometric methods. These methods have been exposed and discussed through practical examples.

  9. Current Debates on Variability in Child Welfare Decision-Making: A Selected Literature Review

    Directory of Open Access Journals (Sweden)

    Emily Keddell

    2014-11-01

    This article considers selected drivers of decision variability in child welfare decision-making and explores current debates in relation to these drivers. Covering the related influences of national orientation, risk and responsibility, inequality and poverty, evidence-based practice, constructions of abuse and its causes, domestic violence and cognitive processes, it discusses the literature in regards to how each of these influences decision variability. It situates these debates in relation to the ethical issue of variability and the equity issues that variability raises. I propose that, despite the ecological complexity that drives decision variability, improving internal (within-country) decision consistency is still a valid goal. It may be that the use of annotated case examples, kind learning systems, and continued commitments to the social justice issues of inequality and individualisation can contribute to this goal.

  10. The use of vector bootstrapping to improve variable selection precision in Lasso models.

    Science.gov (United States)

    Laurin, Charles; Boomsma, Dorret; Lubke, Gitta

    2016-08-01

    The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping. PMID:27248122
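
    Nesting cross-validation within bootstrapping, as studied above, amounts to refitting a cross-validated Lasso on bootstrap resamples and keeping variables by selection frequency. The resample count and frequency threshold below are illustrative choices, not the paper's settings.

```python
# Bootstrap selection frequencies with cross-validation nested in each resample.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(10)
n, p = 150, 40
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.standard_normal(n)

n_boot = 50
counts = np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, n)                  # bootstrap resample
    model = LassoCV(cv=5).fit(X[idx], y[idx])    # CV nested inside the bootstrap
    counts += (model.coef_ != 0)

stable = np.flatnonzero(counts / n_boot >= 0.8)  # 80% selection frequency
print("stably selected variables:", stable)
```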

  11. Long-term Optical Variability of Radio-Selected Quasars from the FIRST Survey

    CERN Document Server

    Helfand, D J; Willman, B; White, R L; Becker, R H; Price, T; Gregg, M D; McMahon, R G; Helfand, David J.; Stone, Remington P.S.; Willman, Beth; White, Richard L.; Becker, Robert H.; Price, Trevor; Gregg, Michael D.; McMahon, Richard G.

    2001-01-01

    We have obtained single-epoch optical photometry for 201 quasars, taken from the FIRST Bright Quasar Survey, which span a wide range in radio loudness. Comparison with the magnitudes of these objects on the POSS-I plates provides by far the largest sample of long-term variability amplitudes for radio-selected quasars yet produced. We find the quasars to be more variable in the blue than in the red band, consistent with work on optically selected samples. The previously noted trend of decreasing variability with increasing optical luminosity applies only to radio-quiet objects. Furthermore, we do not confirm a rise in variability amplitude with redshift, nor do we see any dependence on radio flux or luminosity. The variability over a radio-optical flux ratio range spanning a factor of 60,000 from radio-quiet to extreme radio-loud objects is largely constant, although there is a suggestion of greater variability in the extreme radio-loud objects. We demonstrate the importance of Malmquist bias in variability st...

  12. Efficiency of genomic selection using Bayesian multimarker models for traits selected to reflect a wide range of heritabilities and frequencies of detected quantitative traits loci in mice

    DEFF Research Database (Denmark)

    Kapell, Dagmar NRG; Sorensen, Daniel; Su, Guosheng;

    2012-01-01

    , behavioural and physiological traits were selected for the analysis to reflect a wide range of heritabilities (0.10 to 0.74) and numbers of detected quantitative traits loci (QTL) (1 to 20) affecting those traits. The analysis included estimation of variance components and cross-validation within and between...... families. Results Genomic selection showed a high predictive ability (PA) in comparison to traditional polygenic selection, especially for traits of moderate heritability and when cross-validation was between families. This occurred although the proportion of genomic variance of traits using genomic models...... generally performed better than traditional polygenic selection, especially in the context of between family cross-validation. Reducing the number of markers considered to affect the trait did not significantly change PA for most traits, particularly in the case of within family cross...

  13. Security of Post-selection based Continuous Variable Quantum Key Distribution against Arbitrary Attacks

    CERN Document Server

    Walk, Nathan; Ralph, Timothy C; Lam, Ping Koy

    2011-01-01

    We extend the security proof for continuous variable quantum key distribution protocols using post-selection to account for arbitrary eavesdropping attacks by employing the concept of an equivalent protocol where the post-selection is implemented as a projective quantum measurement. We demonstrate that the security can be calculated using only experimentally accessible quantities and finally explicitly evaluate the performance for the case of a noisy Gaussian channel in the limit of unbounded key length.

  14. Selection of variables using 'independence Bayes' in computer-aided diagnosis of upper gastrointestinal bleeding

    OpenAIRE

    Ohmann, C; Künneke, M.; Zaczyk, R.; Thon, K.; Lorenz, Wilfried

    1986-01-01

    In this paper two problems of computer-aided diagnosis with 'independence Bayes' were investigated: selection of variables and monotonicity in performance as the number of measurements is increased. Using prospective data from patients with upper gastrointestinal bleeding, the stepwise forward selection approach maximizing the apparent diagnostic accuracy was analysed with respect to different kinds of bias in estimation of the true diagnostic accuracy and to the stability of the number and t...

  15. Variable selectivity of the Hitachi chemistry analyzer chloride ion-selective electrode toward interfering ions.

    Science.gov (United States)

    Wang, T; Diamandis, E P; Lane, A; Baines, A D

    1994-02-01

    Chloride measurements by ion-selective electrodes are vulnerable to interference by anions such as iodide, thiocyanate, nitrate, and bromide. We have found that the degree of interference of these anions on the Hitachi chemistry analyzer chloride electrode varies from electrode to electrode and this variation can even occur within the same lot of membrane. This variation is not dependent upon the length of time the cartridge has been in the analyzer because no correlation existed between the usage time and the electrode response to interfering ions. Neither is this variation due to the deterioration of the electrode because all electrodes tested had calibration slopes within the manufacturer's specification. Our study, however, showed that even after repeated exposure to a plasma sample containing 2 mM thiocyanate, the chloride electrode was still able to accurately measure the chloride in plasma without thiocyanate, thus confirming that a carryover effect does not exist from a previous thiocyanate-containing sample.

  16. Variables selection for quantitative determination of cotton content in textile blends by near infrared spectroscopy

    Science.gov (United States)

    Sun, Xu-dong; Zhou, Ming-xing; Sun, Yi-ze

    2016-07-01

    Investigations were initiated to develop near infrared (NIR) techniques coupled with a variables selection method to rapidly measure cotton content in blend fabrics of cotton and polyester. Multiplicative scatter correction (MSC), smoothing, first derivative (1Der), second derivative (2Der) and their combinations were employed to preprocess the spectra. Monte Carlo uninformative variables elimination (MCUVE), the successive projections algorithm (SPA), and a genetic algorithm (GA) were performed comparatively to choose characteristic variables associated with cotton content distributions. One hundred and thirty-five and fifty-nine samples were used to calibrate the models and assess their performance, respectively. By comparing the performance of partial least squares (PLS) regression models on new samples, the optimal model of cotton content was obtained with the spectral pretreatment method 2Der-Smooth-MSC and the variables selection method MCUVE-SPA-PLS. The correlation coefficient of prediction (rp) and root mean square error of prediction (RMSEP) were 0.988 and 2.100%, respectively. The results suggest that the NIR technique combined with the MCUVE-SPA variables selection method has significant potential to quantitatively analyze cotton content in blend fabrics of cotton and polyester; moreover, it can indicate the related spectral contributions.
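
    As a rough illustration of the preprocessing-plus-PLS pipeline described above (second derivative, smoothing, MSC, then PLS regression), the sketch below runs on simulated spectra. The simulated data, the number of PLS components, and the msc helper are assumptions; the MCUVE-SPA variable selection step itself is not reproduced.

```python
# Sketch: 2Der-Smooth-MSC pretreatment followed by PLS regression on
# simulated "spectra". The variable selection step (MCUVE-SPA) is omitted.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = spectra.mean(axis=0)
    out = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        b, a = np.polyfit(ref, s, 1)      # fit s ~ a + b * ref
        out[i] = (s - a) / b
    return out

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 12, 400)) + 2.0        # common spectral shape
scale = 0.8 + 0.4 * rng.random((194, 1))            # multiplicative scatter
X = scale * base + rng.normal(0, 0.02, (194, 400))  # 194 spectra
y = 100 * scale.ravel()                             # pseudo cotton content, %

# Smoothed second derivative (Savitzky-Golay), then MSC.
X = savgol_filter(X, window_length=11, polyorder=2, deriv=2, axis=1)
X = msc(X)

# 135 calibration and 59 validation samples, as in the abstract.
X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=59,
                                              random_state=0)
pls = PLSRegression(n_components=8).fit(X_cal, y_cal)
rmsep = np.sqrt(np.mean((pls.predict(X_val).ravel() - y_val) ** 2))
print("RMSEP: %.3f" % rmsep)
```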

  17. Spatial variable selection methods for investigating acute health effects of fine particulate matter components.

    Science.gov (United States)

    Boehm Vock, Laura F; Reich, Brian J; Fuentes, Montserrat; Dominici, Francesca

    2015-03-01

    Multi-site time series studies have reported evidence of an association between short term exposure to particulate matter (PM) and adverse health effects, but the effect size varies across the United States. Variability in the effect may partially be due to differing community level exposure and health characteristics, but also due to the chemical composition of PM which is known to vary greatly by location and time. The objective of this article is to identify particularly harmful components of this chemical mixture. Because of the large number of highly-correlated components, we must incorporate some regularization into a statistical model. We assume that, at each spatial location, the regression coefficients come from a mixture model with the flavor of stochastic search variable selection, but utilize a copula to share information about variable inclusion and effect magnitude across locations. The model differs from current spatial variable selection techniques by accommodating both local and global variable selection. The model is used to study the association between fine PM (PM <2.5μm) components, measured at 115 counties nationally over the period 2000-2008, and cardiovascular emergency room admissions among Medicare patients.
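
    The core ingredient named above, stochastic search variable selection, can be sketched for a single location as a spike-and-slab Gibbs sampler; the copula that shares inclusion information across locations is the paper's contribution and is not reproduced here. The data, priors, and chain length below are illustrative assumptions.

```python
# Sketch: spike-and-slab stochastic search variable selection (SSVS)
# for one site's linear regression. Priors and chain length are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.standard_normal(n)

tau0, tau1, pi = 0.01, 10.0, 0.2   # spike sd, slab sd, prior inclusion prob
gamma, beta, sigma2 = np.zeros(p, dtype=int), np.zeros(p), 1.0
incl = np.zeros(p)
burn, iters = 1000, 2000

for it in range(iters):
    # beta | gamma, sigma2: Gaussian with diagonal spike/slab prior.
    d = np.where(gamma == 1, tau1, tau0) ** 2
    V = np.linalg.inv(X.T @ X / sigma2 + np.diag(1.0 / d))
    beta = rng.multivariate_normal(V @ X.T @ y / sigma2, V)
    # gamma_j | beta_j: Bernoulli, comparing spike and slab densities.
    slab = pi * stats.norm.pdf(beta, 0.0, tau1)
    spike = (1.0 - pi) * stats.norm.pdf(beta, 0.0, tau0)
    gamma = rng.binomial(1, slab / (slab + spike))
    # sigma2 | beta: inverse gamma.
    r = y - X @ beta
    sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / (r @ r))
    if it >= burn:
        incl += gamma

print("posterior inclusion probabilities:", np.round(incl / (iters - burn), 2))
```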

  18. Attention in a bayesian framework

    DEFF Research Database (Denmark)

    Whiteley, Louise Emma; Sahani, Maneesh

    2012-01-01

    The behavioral phenomena of sensory attention are thought to reflect the allocation of a limited processing resource, but there is little consensus on the nature of the resource or why it should be limited. Here we argue that a fundamental bottleneck emerges naturally within Bayesian models ... of perception, and use this observation to frame a new computational account of the need for, and action of, attention - unifying diverse attentional phenomena in a way that goes beyond previous inferential, probabilistic and Bayesian models. Attentional effects are most evident in cluttered environments ..., and include both selective phenomena, where attention is invoked by cues that point to particular stimuli, and integrative phenomena, where attention is invoked dynamically by endogenous processing. However, most previous Bayesian accounts of attention have focused on describing relatively simple experimental...

  19. Meta-Statistics for Variable Selection: The R Package BioMark

    Directory of Open Access Journals (Sweden)

    Ron Wehrens

    2012-11-01

    Full Text Available Biomarker identification is an ever more important topic in the life sciences. With the advent of measurement methodologies based on microarrays and mass spectrometry, thousands of variables are routinely being measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for this often leads to a lack of power. In addition, choosing α cutoff levels remains somewhat arbitrary. Also in a regression context, a model depending on few but relevant variables will be more accurate and precise, and easier to interpret biologically. We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, presents a data-dependent selection threshold for significance, instead of a cookbook value of α = 0.05. It is applicable in all cases where two groups are compared. The second, stability selection, is more general, and can also be applied in a regression context. This approach uses repeated subsampling of the data in order to assess the variability of the model coefficients and selects those that remain consistently important. It is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
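
    Stability selection as described above is straightforward to sketch: repeatedly subsample, fit a sparse model, and keep variables that are selected consistently. The lasso penalty, subsample fraction, and 0.8 threshold below are assumptions, not BioMark's defaults.

```python
# Sketch: stability selection -- subsample repeatedly, fit a lasso, and
# keep variables whose selection frequency clears a threshold.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)
rng = np.random.default_rng(0)

n_draws = 100
freq = np.zeros(X.shape[1])
for _ in range(n_draws):
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)  # half-sample
    freq += Lasso(alpha=1.0).fit(X[idx], y[idx]).coef_ != 0

stable = np.flatnonzero(freq / n_draws >= 0.8)
print("consistently selected variables:", stable)
```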

  20. A QSAR Study of Environmental Estrogens Based on a Novel Variable Selection Method

    Directory of Open Access Journals (Sweden)

    Aiqian Zhang

    2012-05-01

    Full Text Available A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals which are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model with a novel variable selection method has been proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI) with leave-multiple-out cross-validation (LMOCV) to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD) principles for regulation of QSAR acceptability were fully considered, such as using an unambiguous multiple-linear regression (MLR) algorithm to build the model, using several validation methods to assess the performance of the model, defining the applicability domain, and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that the VSMVI is an effective, feasible and practical tool for rapid screening of the best subset from large sets of molecular descriptors.

  1. Variable selection with random forest: Balancing stability, performance, and interpretation in ecological and environmental modeling

    Science.gov (United States)

    Random forest (RF) is popular in ecological and environmental modeling, in part, because of its insensitivity to correlated predictors and resistance to overfitting. Although variable selection has been proposed to improve both performance and interpretation of RF models, it is u...

  2. Variable Selection and Functional Form Uncertainty in Cross-Country Growth Regressions

    NARCIS (Netherlands)

    T. Salimans (Tim)

    2011-01-01

    textabstractRegression analyses of cross-country economic growth data are complicated by two main forms of model uncertainty: the uncertainty in selecting explanatory variables and the uncertainty in specifying the functional form of the regression function. Most discussions in the literature addres

  3. Bayesian artificial intelligence

    CERN Document Server

    Korb, Kevin B

    2003-01-01

    As the power of Bayesian techniques has become more fully realized, the field of artificial intelligence has embraced Bayesian methodology and integrated it to the point where an introduction to Bayesian techniques is now a core course in many computer science programs. Unlike other books on the subject, Bayesian Artificial Intelligence keeps mathematical detail to a minimum and covers a broad range of topics. The authors integrate all of Bayesian net technology and learning Bayesian net technology and apply them both to knowledge engineering. They emphasize understanding and intuition but also provide the algorithms and technical background needed for applications. Software, exercises, and solutions are available on the authors' website.

  4. Bayesian artificial intelligence

    CERN Document Server

    Korb, Kevin B

    2010-01-01

    Updated and expanded, Bayesian Artificial Intelligence, Second Edition provides a practical and accessible introduction to the main concepts, foundation, and applications of Bayesian networks. It focuses on both the causal discovery of networks and Bayesian inference procedures. Adopting a causal interpretation of Bayesian networks, the authors discuss the use of Bayesian networks for causal modeling. They also draw on their own applied research to illustrate various applications of the technology.New to the Second EditionNew chapter on Bayesian network classifiersNew section on object-oriente

  5. Variable selection in PLSR and extensions to a multi-block setting for metabolomics data

    DEFF Research Database (Denmark)

    Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach;

    of genomics [1]. They quickly became well established in the field of statistics because a close relationship to the elastic net has been established. In sparse variable selection combined with PLSR, a soft thresholding is applied to each loading weight separately. In the field of chemometrics, Jack-knifing has ... been introduced for variable selection in PLSR [2]. Jack-knifing has been frequently applied in the field of spectroscopy and is implemented in software tools like The Unscrambler. In Jack-knifing, uncertainty estimates of regression coefficients are computed and a t-test is applied to these estimates ... in order to assess whether the regression coefficient associated with each variable is significantly different from zero. In a recent study we have compared sparse PLSR [1] and Jack-knife PLSR for FTIR spectroscopic data, metabolomics data (LC-MS, NMR) and simulated data. While sparse PLSR turned out...

  6. Uninformative variable elimination assisted by Gram-Schmidt Orthogonalization/successive projection algorithm for descriptor selection in QSAR

    DEFF Research Database (Denmark)

    Omidikia, Nematollah; Kompany-Zareh, Mohsen

    2013-01-01

    Employment of Uninformative Variable Elimination (UVE) as a robust variable selection method is reported in this study. Each regression coefficient represents the contribution of the corresponding variable in the established model, but in the presence of uninformative variables as well as colline...

  7. The change of genetic and phenotypic variability of yield components after recurrent selection of maize

    Directory of Open Access Journals (Sweden)

    Deletić Nebojša

    2009-01-01

    Full Text Available This paper deals with 31 SSD lines from the ZP-Syn-1 C0 and 37 from the ZP-Syn-1 C3 maize populations. After line selection and seed multiplication in the first year of the study, the trials were conducted over two years in Kruševac and Zemun Polje, in an RCB design with three replications. Additive and phenotypic variances of yield components were calculated, as well as an estimate of the narrowing of genetic variability by multivariate cluster analysis. The differences in additive and phenotypic variances between the cycles were significant for ear length only, and highly significant for grain row number per ear and for the percentage of root- and stalk-lodged plants. This means that a significant narrowing of additive and phenotypic variance occurred only for those three traits; the other traits did not change their variability significantly under selection. However, according to the cluster analysis, distances among genotypes and groups in the zero selection cycle were approximately double those in the third one, but group definition was better in the third selection cycle. This indirectly suggests a narrowing of total variability after three cycles of recurrent selection.

  8. Penalized variable selection procedure for Cox models with semiparametric relative risk

    CERN Document Server

    Du, Pang; Liang, Hua; 10.1214/09-AOS780

    2010-01-01

    We study the Cox models with semiparametric relative risk, which can be partially linear with one nonparametric component, or multiple additive or nonadditive nonparametric components. A penalized partial likelihood procedure is proposed to simultaneously estimate the parameters and select variables for both the parametric and the nonparametric parts. Two penalties are applied sequentially. The first penalty, governing the smoothness of the multivariate nonlinear covariate effect function, provides a smoothing spline ANOVA framework that is exploited to derive an empirical model selection tool for the nonparametric part. The second penalty, either the smoothly-clipped-absolute-deviation (SCAD) penalty or the adaptive LASSO penalty, achieves variable selection in the parametric part. We show that the resulting estimator of the parametric part possesses the oracle property, and that the estimator of the nonparametric part achieves the optimal rate of convergence. The proposed procedures are shown to work well i...
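
    For intuition about the second penalty mentioned above, the SCAD penalty can be written out directly; this is a sketch of the Fan and Li (2001) form with the conventional a = 3.7, not the paper's full penalized partial-likelihood procedure for Cox models.

```python
# Sketch: the SCAD penalty (Fan & Li 2001), with the conventional a = 3.7.
# Unlike the L1 penalty, it flattens out, so large coefficients are not
# over-shrunk -- the source of the oracle property cited above.
import numpy as np

def scad(theta, lam, a=3.7):
    """SCAD penalty: L1 near zero, quadratic blend, then constant."""
    t = np.abs(theta)
    quad = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, quad, (a + 1) * lam ** 2 / 2))

print(scad(np.linspace(-4, 4, 9), lam=1.0))
```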

  9. The Impact of Variable Degrees of Freedom and Scale Parameters in Bayesian Methods for Genomic Prediction in Chinese Simmental Beef Cattle.

    Science.gov (United States)

    Zhu, Bo; Zhu, Miao; Jiang, Jicai; Niu, Hong; Wang, Yanhui; Wu, Yang; Xu, Lingyang; Chen, Yan; Zhang, Lupei; Gao, Xue; Gao, Huijiang; Liu, Jianfeng; Li, Junya

    2016-01-01

    Three conventional Bayesian approaches (BayesA, BayesB and BayesCπ) have been demonstrated to be powerful in predicting genomic merit for complex traits in livestock. A priori, these Bayesian models assume that the non-zero SNP effects (marginally) follow a t-distribution depending on two fixed hyperparameters, degrees of freedom and scale parameters. In this study, we performed genomic prediction in Chinese Simmental beef cattle and treated degrees of freedom and scale parameters as unknown with inappropriate priors. Furthermore, we compared the modified methods (BayesFA, BayesFB and BayesFCπ) with their corresponding counterparts using simulation datasets. We found that the modified methods with distribution assumed to the two hyperparameters were beneficial for improving the predictive accuracy. Our results showed that the predictive accuracies of the modified methods were slightly higher than those of their counterparts especially for traits with low heritability and a small number of QTLs. Moreover, cross-validation analysis for three traits, namely carcass weight, live weight and tenderloin weight, in 1136 Simmental beef cattle suggested that predictive accuracy of BayesFCπ noticeably outperformed BayesCπ with the highest increase (3.8%) for live weight using the cohort masking cross-validation. PMID:27139889
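
    The prior hierarchy at issue can be sketched by simulation: a normal effect whose variance is drawn from a scaled inverse chi-square is marginally t-distributed, which is why fixing the degrees of freedom nu and scale S matters. The values of nu, S, and the SNP count below are arbitrary assumptions.

```python
# Sketch: the BayesA-style prior hierarchy. An effect that is normal given
# a scaled-inverse-chi-square variance is marginally t-distributed with
# degrees of freedom nu and scale S -- the two hyperparameters that the
# BayesF* variants above treat as unknown instead of fixed.
import numpy as np

rng = np.random.default_rng(0)
nu, S2, n_snp = 5.0, 0.02, 100_000          # arbitrary hyperparameters

var_j = nu * S2 / rng.chisquare(nu, n_snp)  # scaled inverse chi-square
beta = rng.normal(0.0, np.sqrt(var_j))      # SNP effects, marginally t

# t with nu = 5 has excess kurtosis 6/(nu - 4) = 6, far above Gaussian.
kurt = ((beta - beta.mean()) ** 4).mean() / beta.var() ** 2 - 3
print("sample excess kurtosis: %.1f" % kurt)
```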

  10. The Impact of Variable Degrees of Freedom and Scale Parameters in Bayesian Methods for Genomic Prediction in Chinese Simmental Beef Cattle.

    Directory of Open Access Journals (Sweden)

    Bo Zhu

    Full Text Available Three conventional Bayesian approaches (BayesA, BayesB and BayesCπ have been demonstrated to be powerful in predicting genomic merit for complex traits in livestock. A priori, these Bayesian models assume that the non-zero SNP effects (marginally follow a t-distribution depending on two fixed hyperparameters, degrees of freedom and scale parameters. In this study, we performed genomic prediction in Chinese Simmental beef cattle and treated degrees of freedom and scale parameters as unknown with inappropriate priors. Furthermore, we compared the modified methods (BayesFA, BayesFB and BayesFCπ with their corresponding counterparts using simulation datasets. We found that the modified methods with distribution assumed to the two hyperparameters were beneficial for improving the predictive accuracy. Our results showed that the predictive accuracies of the modified methods were slightly higher than those of their counterparts especially for traits with low heritability and a small number of QTLs. Moreover, cross-validation analysis for three traits, namely carcass weight, live weight and tenderloin weight, in 1136 Simmental beef cattle suggested that predictive accuracy of BayesFCπ noticeably outperformed BayesCπ with the highest increase (3.8% for live weight using the cohort masking cross-validation.

  11. Characterizing uncertainty and population variability in the toxicokinetics of trichloroethylene and metabolites in mice, rats, and humans using an updated database, physiologically based pharmacokinetic (PBPK) model, and Bayesian approach

    International Nuclear Information System (INIS)

    We have developed a comprehensive, Bayesian, PBPK model-based analysis of the population toxicokinetics of trichloroethylene (TCE) and its metabolites in mice, rats, and humans, considering a wider range of physiological, chemical, in vitro, and in vivo data than any previously published analysis of TCE. The toxicokinetics of the 'population average,' its population variability, and their uncertainties are characterized in an approach that strives to be maximally transparent and objective. Estimates of experimental variability and uncertainty were also included in this analysis. The experimental database was expanded to include virtually all available in vivo toxicokinetic data, which permitted, in rats and humans, the specification of separate datasets for model calibration and evaluation. The total combination of these approaches and PBPK analysis provides substantial support for the model predictions. In addition, we feel confident that the approach employed also yields an accurate characterization of the uncertainty in metabolic pathways for which available data were sparse or relatively indirect, such as GSH conjugation and respiratory tract metabolism. Key conclusions from the model predictions include the following: (1) as expected, TCE is substantially metabolized, primarily by oxidation at doses below saturation; (2) GSH conjugation and subsequent bioactivation in humans appear to be 10- to 100-fold greater than previously estimated; and (3) mice had the greatest rate of respiratory tract oxidative metabolism as compared to rats and humans. In a situation such as TCE in which there is large database of studies coupled with complex toxicokinetics, the Bayesian approach provides a systematic method of simultaneously estimating model parameters and characterizing their uncertainty and variability. However, care needs to be taken in its implementation to ensure biological consistency, transparency, and objectivity.

  12. Bootstrap rank-ordered conditional mutual information (broCMI): A nonlinear input variable selection method for water resources modeling

    Science.gov (United States)

    Quilty, John; Adamowski, Jan; Khalil, Bahaa; Rathinasamy, Maheswaran

    2016-03-01

    The input variable selection problem has recently garnered much interest in the time series modeling community, especially within water resources applications, demonstrating that information theoretic (nonlinear)-based input variable selection algorithms such as partial mutual information (PMI) selection (PMIS) provide an improved representation of the modeled process when compared to linear alternatives such as partial correlation input selection (PCIS). PMIS is a popular algorithm for water resources modeling problems considering nonlinear input variable selection; however, this method requires the specification of two nonlinear regression models, each with parametric settings that greatly influence the selected input variables. Other attempts to develop input variable selection methods using conditional mutual information (CMI) (an analog to PMI) have been formulated under different parametric pretenses such as k nearest-neighbor (KNN) statistics or kernel density estimates (KDE). In this paper, we introduce a new input variable selection method based on CMI that uses a nonparametric multivariate continuous probability estimator based on Edgeworth approximations (EA). We improve the EA method by considering the uncertainty in the input variable selection procedure by introducing a bootstrap resampling procedure that uses rank statistics to order the selected input sets; we name our proposed method bootstrap rank-ordered CMI (broCMI). We demonstrate the superior performance of broCMI when compared to CMI-based alternatives (EA, KDE, and KNN), PMIS, and PCIS input variable selection algorithms on a set of seven synthetic test problems and a real-world urban water demand (UWD) forecasting experiment in Ottawa, Canada.
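
    A simplified analog of the bootstrap-ranking idea is sketched below, using scikit-learn's kNN-based mutual information estimator in place of the paper's Edgeworth-approximation CMI (and plain MI in place of conditional MI); the synthetic data and number of resamples are assumptions.

```python
# Sketch: bootstrap rank-ordered input selection, with scikit-learn's
# kNN-based MI estimator standing in for the Edgeworth-approximation CMI.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 3] ** 2 + 0.1 * rng.standard_normal(n)

n_boot = 50
rank_sum = np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, n)                    # bootstrap resample
    mi = mutual_info_regression(X[idx], y[idx], random_state=0)
    rank_sum += np.argsort(np.argsort(-mi))        # rank 0 = most informative

print("inputs ordered by mean bootstrap rank:", np.argsort(rank_sum))
```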

  13. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark

    2006-01-01

    We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference ... by evaluating and differentiating these circuits in time linear in their size. We report on experimental results showing successful compilation and efficient inference on relational Bayesian networks, whose PRIMULA-generated propositional instances have thousands of variables, and whose jointrees have clusters...

  14. Effect of emissions uncertainty and variability on high-resolution concentrations of carbon monoxide, fine particle black carbon, and nitrogen oxides in Fort Collins, Colorado: development of a Bayesian uncertainty modeling and evaluation framework

    Science.gov (United States)

    Mendoza, D. L.; Stuart, A. L.; Dagne, G.; Yu, H.

    2013-12-01

    Uncertainties in emissions estimates are known to be one of the primary sources of uncertainty in calculating concentrations and subsequent exposure estimates. Despite continued improvement in the accuracy of emissions downscaling, the quantification of uncertainties is necessary in order to generate a representative emissions product. Bayesian data assimilation is a promising approach to uncertainty estimation when used to calibrate model results with measurement data. This study discusses an emissions inventory and concentration estimates for carbon monoxide (CO), fine particle (PM2.5) black carbon, and nitrogen oxides (NOx) for the city of Fort Collins, Colorado. The development of a Bayesian framework for updating estimates of emissions and concentrations in multiple stages, using measurement data, is also presented. The emissions inventory was constructed using the 2008 National Emissions Inventory (NEI). The spatial and temporal allocation methods from the Emission Modeling Clearinghouse data set are used to downscale the NEI data from annual and county-level resolution for point, nonpoint, and nonroad sources. Onroad mobile source emissions were estimated by combining a bottom-up emissions calculation approach (using emission factors and activities) for large roadway links within Fort Collins with a top-down spatial allocation approach for other roadways. Vehicle activity data for road links were obtained from local 2009 travel demand model results and automatic traffic recorder (ATR) data. The CALPUFF Gaussian puff dispersion model was used to estimate air pollutant concentrations. Hourly, 1.33 km x 1.33 km MM5 meteorological data was used to capture temporal variability in transport. Distributions of concentrations are obtained for spatial locations and time spans using a Monte Carlo sampling approach. Data for ensemble members are sampled from distributions defined from the emissions inventory and meteorological data. Modeled concentrations of CO, PM2

  15. Bayesian data analysis

    CERN Document Server

    Gelman, Andrew; Stern, Hal S; Dunson, David B; Vehtari, Aki; Rubin, Donald B

    2013-01-01

    FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear...

  16. Bayesian Mediation Analysis

    OpenAIRE

    Yuan, Ying; MacKinnon, David P.

    2009-01-01

    This article proposes Bayesian analysis of mediation effects. Compared to conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian mediation analysis, inference is straightforward and exact, which makes it appealing for studies with small samples. Third, the Bayesian approach is conceptua...

  17. X-ray variability in a complete sample of Soft X-ray selected AGN

    OpenAIRE

    D. Grupe; Thomas, H. -C.; Beuermann, K.

    2000-01-01

    We present ROSAT All-Sky Survey and ROSAT pointed observations (PSPC and HRI) of a complete sample of 113 bright soft X-ray AGN selected from the ROSAT Bright Source Catalog. We compare these observations in order to search for extreme cases of flux and spectral X-ray variability - X-ray transient AGN. Three definite transients and one transient candidate are found. The other sources show amplitude variations typically by factors of 2-3 on timescales of years. We found that the variability st...

  18. Bayesian Games with Intentions

    OpenAIRE

    Bjorndahl, Adam; Halpern, Joseph Y.; Pass, Rafael

    2016-01-01

    We show that standard Bayesian games cannot represent the full spectrum of belief-dependent preferences. However, by introducing a fundamental distinction between intended and actual strategies, we remove this limitation. We define Bayesian games with intentions, generalizing both Bayesian games and psychological games, and prove that Nash equilibria in psychological games correspond to a special class of equilibria as defined in our setting.

  19. Bayesian Analysis of High Dimensional Classification

    Science.gov (United States)

    Mukhopadhyay, Subhadeep; Liang, Faming

    2009-12-01

    Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. In these cases, there is a lot of interest in searching for sparse models in a high-dimensional regression/classification setup. We first discuss two common challenges for analyzing high-dimensional data. The first one is the curse of dimensionality: the complexity of many existing algorithms scales exponentially with the dimensionality of the space, and by virtue of that, algorithms soon become computationally intractable and therefore inapplicable in many real applications. The second is multicollinearity among the predictors, which severely slows down the algorithms. In order to make Bayesian analysis operational in high dimensions we propose a novel Hierarchical Stochastic Approximation Monte Carlo (HSAMC) algorithm, which overcomes the curse of dimensionality and the multicollinearity of predictors in high dimensions, and also possesses a self-adjusting mechanism to avoid local minima separated by high energy barriers. Models and methods are illustrated by simulations inspired by the field of genomics. Numerical results indicate that HSAMC can work as a general model selection sampler in high-dimensional complex model spaces.
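
    As context for the Bayesian probit classification mentioned above, a standard (non-HSAMC) sampler is the Albert-Chib Gibbs scheme sketched below under a flat prior; the synthetic data and chain settings are assumptions.

```python
# Sketch: Bayesian probit classification via the Albert-Chib Gibbs
# sampler with a flat prior on beta. Synthetic data; not the HSAMC
# algorithm proposed in the abstract.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n, p = 300, 4
X = rng.standard_normal((n, p))
beta_true = np.array([1.5, -1.0, 0.0, 0.5])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
beta, draws = np.zeros(p), []
for it in range(2000):
    # Latent z_i ~ N(x_i' beta, 1), truncated to match the observed label.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # beta | z is Gaussian under the flat prior.
    beta = rng.multivariate_normal(XtX_inv @ X.T @ z, XtX_inv)
    if it >= 500:
        draws.append(beta)

print("posterior mean of beta:", np.round(np.mean(draws, axis=0), 2))
```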

  20. Calibration Variable Selection and Natural Zero Determination for Semispan and Canard Balances

    Science.gov (United States)

    Ulbrich, Norbert M.

    2013-01-01

    Independent calibration variables for the characterization of semispan and canard wind tunnel balances are discussed. It is shown that the variable selection for a semispan balance is determined by the location of the resultant normal and axial forces that act on the balance. These two forces are the first and second calibration variable. The pitching moment becomes the third calibration variable after the normal and axial forces are shifted to the pitch axis of the balance. Two geometric distances, i.e., the rolling and yawing moment arms, are the fourth and fifth calibration variable. They are traditionally substituted by corresponding moments to simplify the use of calibration data during a wind tunnel test. A canard balance is related to a semispan balance. It also only measures loads on one half of a lifting surface. However, the axial force and yawing moment are of no interest to users of a canard balance. Therefore, its calibration variable set is reduced to the normal force, pitching moment, and rolling moment. The combined load diagrams of the rolling and yawing moment for a semispan balance are discussed. They may be used to illustrate connections between the wind tunnel model geometry, the test section size, and the calibration load schedule. Then, methods are reviewed that may be used to obtain the natural zeros of a semispan or canard balance. In addition, characteristics of three semispan balance calibration rigs are discussed. Finally, basic requirements for a full characterization of a semispan balance are reviewed.

  1. Análisis bayesiano de variables relacionadas con el desarrollo del sindrome de Burnout en profesionales sanitarios (Bayesian analysis of variables related to the development of Burnout syndrome in health professionals)

    Directory of Open Access Journals (Sweden)

    Guillermo A. Cañadas

    2010-12-01

    Full Text Available Burnout syndrome has a high incidence among professional healthcare and social workers. This leads to deterioration in the quality of their working life and affects their health, the organization where they work and, via their clients, society itself. Given these serious effects, many studies have investigated this construct and identified groups at increased risk of the syndrome. The present work has 2 main aims: to compare burnout levels in potential risk groups among professional healthcare workers; and to compare them using standard and Bayesian statistical analysis. The sample consisted of 108 psycho-social care workers based at 2 centers run by the Granada Council in Spain. All participants, anonymously and individually, filled in a booklet that included questions on personal information and the Spanish adaptation of the Maslach Burnout Inventory (MBI). Standard and Bayesian analysis of variance were used to identify the risk factors associated with different levels of burnout. It was found that the information provided by the Bayesian procedure complemented that provided by the standard procedure.

  2. Dynamic Bayesian diffusion estimation

    CERN Document Server

    Dedecius, K

    2012-01-01

    The rapidly increasing complexity of (mainly wireless) ad-hoc networks stresses the need of reliable distributed estimation of several variables of interest. The widely used centralized approach, in which the network nodes communicate their data with a single specialized point, suffers from high communication overheads and represents a potentially dangerous concept with a single point of failure needing special treatment. This paper's aim is to contribute to another quite recent method called diffusion estimation. By decentralizing the operating environment, the network nodes communicate just within a close neighbourhood. We adopt the Bayesian framework to modelling and estimation, which, unlike the traditional approaches, abstracts from a particular model case. This leads to a very scalable and universal method, applicable to a wide class of different models. A particularly interesting case - the Gaussian regressive model - is derived as an example.

  3. Bayesian Network--Response Regression

    OpenAIRE

    WANG, LU; Durante, Daniele; Dunson, David B.

    2016-01-01

    There is an increasing interest in learning how human brain networks vary with continuous traits (e.g., personality, cognitive abilities, neurological disorders), but flexible procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which combines low-rank factorizations and Gaussian process priors to allow flexible shifts of the conditional expectation for a network-valued random variable across the feature space, while including subject-specific random eff...

  4. Bayesian segmentation of hyperspectral images

    CERN Document Server

    Mohammadpour, Adel; Mohammad-Djafari, Ali

    2007-01-01

    In this paper we consider the problem of joint segmentation of hyperspectral images in the Bayesian framework. The proposed approach is based on a Hidden Markov Modeling (HMM) of the images with common segmentation, or equivalently with common hidden classification label variables which is modeled by a Potts Markov Random Field. We introduce an appropriate Markov Chain Monte Carlo (MCMC) algorithm to implement the method and show some simulation results.

  5. Bayesian segmentation of hyperspectral images

    Science.gov (United States)

    Mohammadpour, Adel; Féron, Olivier; Mohammad-Djafari, Ali

    2004-11-01

    In this paper we consider the problem of joint segmentation of hyperspectral images in the Bayesian framework. The proposed approach is based on a Hidden Markov Modeling (HMM) of the images with common segmentation, or equivalently with common hidden classification label variables which is modeled by a Potts Markov Random Field. We introduce an appropriate Markov Chain Monte Carlo (MCMC) algorithm to implement the method and show some simulation results.

  6. Gametocytes infectiousness to mosquitoes: variable selection using random forests, and zero inflated models

    CERN Document Server

    Genuer, Robin; Toussile, Wilson

    2011-01-01

    Malaria control strategies aiming at reducing disease transmission intensity may impact both oocyst intensity and infection prevalence in the mosquito vector. Thus far, mathematical models have failed to identify a clear relationship between Plasmodium falciparum gametocytes and their infectiousness to mosquitoes. Natural isolates of gametocytes are genetically diverse and biologically complex. Infectiousness to mosquitoes relies on multiple parameters such as density, sex-ratio, maturity, parasite genotypes and host immune factors. In this article, we investigated how the density and genetic diversity of gametocytes impact the success of transmission in the mosquito vector. We analyzed data for which the number of covariates plus attendant interactions is at least of the order of the sample size, precluding usage of classical models such as general linear models. We then considered the variable importance from random forests to address the problem of selecting the most influential variables. The selected covariates were ...
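
    The screening step described above, ranking covariates by random forest variable importance, can be sketched in a few lines; the synthetic data are an assumption and the zero-inflated modeling stage is not reproduced.

```python
# Sketch: screening covariates with random forest variable importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=30, n_informative=4,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:5]
print("most influential covariates:", top)
```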

  7. On Data and Parameter Estimation Using the Variational Bayesian EM-algorithm for Block-fading Frequency-selective MIMO Channels

    DEFF Research Database (Denmark)

    Christensen, Lars P.B.; Larsen, Jan

    2006-01-01

    A general Variational Bayesian framework for iterative data and parameter estimation for coherent detection is introduced as a generalization of the EM-algorithm. Explicit solutions are given for MIMO channel estimation with Gaussian prior and noise covariance estimation with inverse-Wishart prio...

  8. The Effects of Basic Gymnastics Training Integrated with Physical Education Courses on Selected Motor Performance Variables

    Science.gov (United States)

    Alpkaya, Ufuk

    2013-01-01

    The purpose of this study is to determine the influence of gymnastics training integrated with physical education courses on selected motor performance variables in seven-year-old girls. Subjects were divided into two groups: (1) control group (N=15, X=7.56 ± 0.46 years old); (2) gymnastics group (N=16, X=7.60 ± 0.50 year…

  9. Variable selection and regression analysis for graph-structured covariates with an application to genomics

    OpenAIRE

    Li, Caiyan; Li, Hongzhe

    2010-01-01

    Graphs and networks are common ways of depicting biological information. In biology, many different biological processes are represented by graphs, such as regulatory networks, metabolic pathways and protein--protein interaction networks. This kind of a priori use of graphs is a useful supplement to the standard numerical data such as microarray gene expression data. In this paper we consider the problem of regression analysis and variable selection when the covariates are linked on a graph. ...

  10. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    Science.gov (United States)

    Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.

    2013-01-01

    Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933

  11. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    OpenAIRE

    C. Fernandez-Lozano; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.

    2013-01-01

    Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.

  12. Floral variability in selected species of the genus Coelogyne Lindl., Orchidaceae

    Directory of Open Access Journals (Sweden)

    Romuald Kosina

    2015-05-01

    Full Text Available Correlations of the lip characters in the Coelogyne flower proved a synchronised development of this organ. The lip is an organ that varies greatly between species. A numerical taxonomy approach made it possible to select, in an ordination space, some extreme species based on a description of lip morphology: Coelogyne salmonicolor versus C. fuliginosa and C. quinquelamellata versus C. nitida. A hybrid C. lawrenceana × mooreana appeared to be close to its paternal species.

  13. Floral variability in selected species of the genus Coelogyne Lindl., Orchidaceae

    OpenAIRE

    Romuald Kosina; Marta Szkudlarek

    2015-01-01

    Correlations of the lip characters in the Coelogyne flower proved a synchronised development of this organ. The lip is an organ that varies greatly between species. A numerical taxonomy approach made it possible to select, in an ordination space, some extreme species based on a description of lip morphology: Coelogyne salmonicolor versus C. fuliginosa and C. quinquelamellata versus C. nitida. A hybrid C. lawrenceana × mooreana appeared to be close to its paternal species.

  14. Variable Selection for Generalized Linear Mixed Models by L1-Penalized Estimation

    OpenAIRE

    Groll, Andreas

    2011-01-01

    Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to few covariates, because the presence of many predictors yields unstable estimates. The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that allows the penalized log-likelihood to be maximized, yielding models with r...

  15. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    Directory of Open Access Journals (Sweden)

    C. Fernandez-Lozano

    2013-01-01

    Full Text Available Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.

  16. Improving breast cancer classification with mammography, supported on an appropriate variable selection analysis

    Science.gov (United States)

    Pérez, Noel; Guevara, Miguel A.; Silva, Augusto

    2013-02-01

    This work addresses the issue of variable selection within the context of breast cancer classification with mammography. A comprehensive repository of feature vectors was used, including a hybrid subset gathering image-based and clinical features. The aim was to gather experimental evidence on variable selection in terms of cardinality and type, and to find a classification scheme that provides the best performance over the Area Under the Receiver Operating Characteristics Curve (AUC) scores using the ranked feature subsets. We evaluated and classified a total of 300 subsets of features formed by the application of Chi-Square Discretization, Information-Gain, One-Rule and RELIEF methods in association with Feed-Forward Backpropagation Neural Network (FFBP), Support Vector Machine (SVM) and Decision Tree J48 (DTJ48) Machine Learning Algorithms (MLA) for a comparative performance evaluation based on AUC scores. A variable selection analysis was performed for Single-View Ranking and Multi-View Ranking groups of features. Feature subsets representing Microcalcifications (MCs), Masses and both MCs and Masses lesions achieved AUC scores of 0.91, 0.954 and 0.934, respectively. Experimental evidence demonstrated that classification performance was improved by combining image-based and clinical features. The most important clinical and image-based features were StromaDistortion and Circularity, respectively. Other features, less important but worth using due to their consistency, were Contrast, Perimeter, Microcalcification, Correlation and Elongation.
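
    A compact analog of the ranking-then-classifying comparison can be sketched as below; the scorers (chi-squared on discretized features and mutual information) stand in for the four ranking methods named above, and the dataset, bin count, and k = 10 are assumptions.

```python
# Sketch: rank features, keep the top k, then compare classifiers by AUC
# on a held-out set. Chi-squared (on non-negative discretized features)
# and mutual information stand in for the ranking methods above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           random_state=0)
Xb = KBinsDiscretizer(n_bins=5, encode="ordinal").fit_transform(X)
X_tr, X_te, Xb_tr, Xb_te, y_tr, y_te = train_test_split(
    X, Xb, y, test_size=0.3, random_state=0)

for name, scorer, A_tr, A_te in [("chi2", chi2, Xb_tr, Xb_te),
                                 ("mutinfo", mutual_info_classif, X_tr, X_te)]:
    sel = SelectKBest(scorer, k=10).fit(A_tr, y_tr)
    for clf in (SVC(probability=True, random_state=0),
                DecisionTreeClassifier(random_state=0)):
        clf.fit(sel.transform(A_tr), y_tr)
        proba = clf.predict_proba(sel.transform(A_te))[:, 1]
        print(name, type(clf).__name__,
              "AUC = %.3f" % roc_auc_score(y_te, proba))
```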

  17. Most frugal explanations in Bayesian networks

    NARCIS (Netherlands)

    Kwisthout, J.H.P.

    2015-01-01

    Inferring the most probable explanation to a set of variables, given a partial observation of the remaining variables, is one of the canonical computational problems in Bayesian networks, with widespread applications in AI and beyond. This problem, known as MAP, is computationally intractable (NP-ha

  18. The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition.

    Science.gov (United States)

    Zhang, Ying; Bilbao, Aivett; Bruderer, Tobias; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard; Varesio, Emmanuel

    2015-10-01

    As tryptic peptides and metabolites are not equally distributed along the mass range, the probability of cross fragment ion interference is higher in certain windows when fixed Q1 SWATH windows are applied. We evaluated the benefits of utilizing variable Q1 SWATH windows with regard to selectivity improvement. Variable windows based on equalizing the distribution of either the precursor ion population (PIP) or the total ion current (TIC) within each window were generated by an in-house software tool, swathTUNER. These two variable Q1 SWATH window strategies outperformed, with respect to quantification and identification, the basic approach using a fixed window width (FIX) for proteomic profiling of human monocyte-derived dendritic cells (MDDCs). Thus, 13.8 and 8.4% additional peptide precursors, which resulted in 13.1 and 10.0% more proteins, were confidently identified by SWATH using the PIP and TIC strategies, respectively, in the MDDC proteomic sample. On the basis of the spectral library purity score, some improvement warranted by variable Q1 windows was also observed, albeit to a lesser extent, in the metabolomic profiling of human urine. We show that the novel concept of "scheduled SWATH" proposed here, which incorporates (i) variable isolation windows and (ii) precursor retention time segmentation, further improves both peptide and metabolite identifications. PMID:26302369
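
    The PIP strategy described above, equalizing the precursor ion population across windows, amounts to placing window edges at quantiles of the observed precursor m/z values. The sketch below uses simulated m/z values; swathTUNER's TIC weighting and window-overlap handling are omitted.

```python
# Sketch: variable Q1 windows by equalizing precursor ion population --
# window edges placed at quantiles of simulated precursor m/z values.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an observed precursor m/z distribution: tryptic peptides
# crowd the lower m/z range, so fixed-width windows are unevenly filled.
mz = np.concatenate([rng.normal(550, 80, 8000), rng.normal(850, 120, 2000)])

n_windows = 32
edges = np.quantile(mz, np.linspace(0, 1, n_windows + 1))  # variable widths
counts, _ = np.histogram(mz, bins=edges)

print("window widths (m/z):", np.round(np.diff(edges), 1))
print("precursors per window:", counts)   # near-uniform by construction
```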

  19. Bayesian Modeling of a Human MMORPG Player

    CERN Document Server

    Synnaeve, Gabriel

    2010-01-01

    This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and to select which target in a situation where allies and foes are present. We explain the model in Bayesian programming and show how we could learn the conditional probabilities from data gathered during human-played sessions.

  20. Bayesian Modeling of a Human MMORPG Player

    Science.gov (United States)

    Synnaeve, Gabriel; Bessière, Pierre

    2011-03-01

    This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and to select which target in a situation where allies and foes are present. We explain the model in Bayesian programming and show how we could learn the conditional probabilities from data gathered during human-played sessions.

  1. ACTIVE LEARNING TO OVERCOME SAMPLE SELECTION BIAS: APPLICATION TO PHOTOMETRIC VARIABLE STAR CLASSIFICATION

    International Nuclear Information System (INIS)

    Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because (1) standard assumptions for machine-learned model selection procedures break down and (2) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting, co-training, and active learning (AL). We argue that AL—where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up—is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and Optical Gravitational Lensing Experiment, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a Web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply AL to classify variable stars in the All Sky Automated Survey, finding dramatic improvement in our agreement with the ASAS Catalog of Variable Stars, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
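
    Pool-based active learning of the kind described above is commonly sketched with uncertainty sampling, querying the pool object the current classifier is least confident about; this is a stand-in heuristic, not the paper's exact query criterion, and the data and loop settings are assumptions.

```python
# Sketch: pool-based active learning with uncertainty sampling. Each
# iteration queries the pool object the classifier is least sure about,
# simulating the "manual follow-up" loop described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           random_state=0)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(y), 20, replace=False))  # small initial set
pool = [i for i in range(len(y)) if i not in labeled]

for it in range(10):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    query = pool[int(np.argmin(proba.max(axis=1)))]    # least confident
    labeled.append(query)                              # oracle labels it
    pool.remove(query)
    print("iter %d: queried %d, mean confidence %.3f"
          % (it, query, proba.max(axis=1).mean()))
```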

  2. Social variables exert selective pressures in the evolution and form of primate mimetic musculature.

    Science.gov (United States)

    Burrows, Anne M; Li, Ly; Waller, Bridget M; Micheletta, Jerome

    2016-04-01

    Mammals use their faces in social interactions more so than any other vertebrates. Primates are an extreme among most mammals in their complex, direct, lifelong social interactions and their frequent use of facial displays is a means of proximate visual communication with conspecifics. The available repertoire of facial displays is primarily controlled by mimetic musculature, the muscles that move the face. The form of these muscles is, in turn, limited by and influenced by phylogenetic inertia but here we use examples, both morphological and physiological, to illustrate the influence that social variables may exert on the evolution and form of mimetic musculature among primates. Ecomorphology is concerned with the adaptive responses of morphology to various ecological variables such as diet, foliage density, predation pressures, and time of day activity. We present evidence that social variables also exert selective pressures on morphology, specifically using mimetic muscles among primates as an example. Social variables include group size, dominance 'style', and mating systems. We present two case studies to illustrate the potential influence of social behavior on adaptive morphology of mimetic musculature in primates: (1) gross morphology of the mimetic muscles around the external ear in closely related species of macaque (Macaca mulatta and Macaca nigra) characterized by varying dominance styles and (2) comparative physiology of the orbicularis oris muscle among select ape species. This muscle is used in both facial displays/expressions and in vocalizations/human speech. We present qualitative observations of myosin fiber-type distribution in this muscle of siamang (Symphalangus syndactylus), chimpanzee (Pan troglodytes), and human to demonstrate the potential influence of visual and auditory communication on muscle physiology. In sum, ecomorphologists should be aware of social selective pressures as well as ecological ones, and that observed morphology might

  3. Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

    Science.gov (United States)

    Richards, Joseph W.; Starr, Dan L.; Brink, Henrik; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; James, J. Berian; Long, James P.; Rice, John

    2012-01-01

    Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because (1) standard assumptions for machine-learned model selection procedures break down and (2) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting, co-training, and active learning (AL). We argue that AL—where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up—is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and Optical Gravitational Lensing Experiment, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a Web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply AL to classify variable stars in the All Sky Automated Survey, finding dramatic improvement in our agreement with the ASAS Catalog of Variable Stars, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.

  4. Bayesian networks as a tool for epidemiological systems analysis

    OpenAIRE

    Lewis, F.I.

    2012-01-01

    Bayesian network analysis is a form of probabilistic modeling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. Bayesian networks are increasingly finding application in areas such as computational and systems biology, and more recently in epidemiological analyses. The key distinction between standard empirical modeling approaches, such as generalised linear modeling, and Bayesian network analyses is that the latter ...

  5. Bayesian estimation of turbulent motion

    OpenAIRE

    Héas, P.; Herzet, C.; Mémin, E.; Heitz, D.; P. D. Mininni

    2013-01-01

    International audience Based on physical laws describing the multi-scale structure of turbulent flows, this article proposes a regularizer for fluid motion estimation from an image sequence. Regularization is achieved by imposing some scale invariance property between histograms of motion increments computed at different scales. By reformulating this problem from a Bayesian perspective, an algorithm is proposed to jointly estimate motion, regularization hyper-parameters, and to select the ...

  6. Space Shuttle RTOS Bayesian Network

    Science.gov (United States)

    Morris, A. Terry; Beling, Peter A.

    2001-01-01

    With shrinking budgets and the requirements to increase reliability and operational life of the existing orbiter fleet, NASA has proposed various upgrades for the Space Shuttle that are consistent with national space policy. The cockpit avionics upgrade (CAU), a high priority item, has been selected as the next major upgrade. The primary functions of cockpit avionics include flight control, guidance and navigation, communication, and orbiter landing support. Secondary functions include the provision of operational services for non-avionics systems such as data handling for the payloads and caution and warning alerts to the crew. Recently, a process to select the optimal commercial-off-the-shelf (COTS) real-time operating system (RTOS) for the CAU was conducted by United Space Alliance (USA) Corporation, which is a joint venture between Boeing and Lockheed Martin, the prime contractor for space shuttle operations. In order to independently assess the RTOS selection, NASA has used the Bayesian network-based scoring methodology described in this paper. Our two-stage methodology addresses the issue of RTOS acceptability by incorporating functional, performance and non-functional software measures related to reliability, interoperability, certifiability, efficiency, correctness, business, legal, product history, cost and life cycle. The first stage of the methodology involves obtaining scores for the various measures using a Bayesian network. The Bayesian network incorporates the causal relationships between the various and often competing measures of interest while also assisting the inherently complex decision analysis process with its ability to reason under uncertainty. The structure of the network and the prior probabilities are elicited from experts in the field of real-time operating systems. Scores for the various measures are computed using Bayesian probability. In the second stage, multi-criteria trade-off analyses are performed between the scores

  7. Effect of recurrent selection on the variability of the UENF-14 popcorn population

    Directory of Open Access Journals (Sweden)

    Rodrigo Moreira Ribeiro

    2016-07-01

    Full Text Available This study aimed to evaluate the effect of recurrent selection on the genetic variability of the UENF-14 population after six selection cycles. Two hundred and ten half-sib families were evaluated in two environments in the state of Rio de Janeiro, using an incomplete randomized block design with treatments arranged in replications within “Sets”. There was a significant effect for families within sets (F/S), proving that there is enough genetic variability to be exploited in the popcorn breeding program of UENF. The significance of the source of variation Environment (E) shows that the environments were distinct enough to promote differences between the evaluated characteristics. It was found that for both characteristics of greatest interest, GY and PE, the magnitude of the additive variance remains close in value across advanced cycles of the UENF-14 population, indicating that variability persists, with no evidence of decrease in advanced cycles. This supports the longevity of the UENF breeding program.

  8. Genetic variability of rice recurrent selection populations as affected by male sterility or manual recombination

    Directory of Open Access Journals (Sweden)

    Letícia da Silveira Pinheiro

    2012-06-01

    Full Text Available The objective of this work was to determine the effect of male sterility or manual recombination on the genetic variability of rice recurrent selection populations. The populations CNA-IRAT 4, with a gene for male sterility, and CNA 12, which was manually recombined, were evaluated. Genetic variability among selection cycles was estimated using 14 simple sequence repeat (SSR) markers. A total of 926 plants were analyzed, including ten genitors and 180 individuals from each of the evaluated cycles (1, 2 and 5) of the population CNA-IRAT 4, and 16 genitors and 180 individuals from each of the cycles (1 and 2) of CNA 12. The analysis allowed the identification of alleles not present among the genitors for both populations, in all cycles, especially for the CNA-IRAT 4 population. These alleles resulted from unwanted fertilization with genotypes that were not originally part of the populations. The parameters of Wright's F-statistics (F_IS and F_IT) indicated that manual recombination expands the genetic variability of the CNA 12 population, whereas male sterility reduces that of CNA-IRAT 4.

  9. Variable selection for distribution-free models for longitudinal zero-inflated count responses.

    Science.gov (United States)

    Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M

    2016-07-20

    Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing these important and timely issues in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26844819
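
    The SCAD penalty itself is a fixed formula (Fan and Li, 2001), sketched below in Python; the extension to zero-inflated, distribution-free models in the paper involves considerably more machinery. The default a = 3.7 is the conventional choice.

        import numpy as np

        def scad_penalty(beta, lam, a=3.7):
            # Fan & Li's SCAD penalty, applied elementwise: linear near zero
            # (LASSO-like), quadratic taper, then constant so that large
            # coefficients are not over-shrunk.
            b = np.abs(beta)
            linear = lam * b
            taper = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
            flat = lam**2 * (a + 1) / 2
            return np.where(b <= lam, linear, np.where(b <= a * lam, taper, flat))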

  10. TAOS Project: Searching for Variable Stars in the Selected TAOS Fields and Optical Followup Observations

    Science.gov (United States)

    Ngeow, Chow Choong; Chang, D.; Pan, K.; Chung, T.; Koptelova, E.; TAOS Collaboration

    2010-05-01

    The Taiwan-American Occultation Survey (TAOS) project aims to find Kuiper Belt Objects (KBOs) and measure their size distribution using the occultation technique. The TAOS project employed four 20-inch wide-field (F/1.9, 3 degree-squared FOV) telescopes, equipped with a 2K x 2K CCD, to simultaneously monitor the same patch of the sky. All four TAOS telescopes, which can be operated automatically, were located at the Lulin Observatory in central Taiwan. The TAOS project has been continuously taking data since 2005. In addition to finding KBOs, the dense sampling strategy employed in TAOS can also be used to find variable stars. We report the search for variable stars in selected TAOS fields at this meeting. For example, we found about 50 candidate variables (out of 2600 stars) in TAOS 60 Field (RA: 04h48m00s, DEC: +20d46m20s, with limiting magnitudes of about 15 mag at S/N = 10), including three previously known variables, using sigma deviation and Stetson's J-index methods. The available data in this field spanned about 150 days. However, TAOS observations were conducted using a customized filter. We therefore initiated a followup program to observe and construct the light curves of these candidate variables in the BVRI bands, using Lulin's One-Meter Telescope, Lulin's SLT telescope (16-inch aperture) and the 32-inch telescope at the Tenagra II Observatory. The multi-band optical followup observations will help improve the classification of these candidates and estimate their BVRI mean magnitudes, colors, and extinction. This will enable a wide range of astrophysical research on these variables. We also present our preliminary results based on the first season of the followup observations. CCN acknowledges the support from NSC 98-2112-M-008-013-MY3.
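
    For reference, a minimal, unweighted version of Stetson's J index used in the candidate search might look as follows; the TAOS pipeline's exact pairing, weighting, and detection thresholds are not given in the abstract, so treat this as a sketch.

        import numpy as np

        def stetson_j(mag, err):
            # Unweighted Stetson J index, pairing adjacent epochs: correlated
            # deviations (true variability) give a large positive J, while
            # pure noise averages toward zero.
            n = len(mag)
            delta = np.sqrt(n / (n - 1.0)) * (mag - mag.mean()) / err
            p = delta[:-1] * delta[1:]
            return np.mean(np.sign(p) * np.sqrt(np.abs(p)))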

  11. Selection of Variables in Sampling Investigation

    Institute of Scientific and Technical Information of China (English)

    陶凤梅; 杨启昌; 胡锡衡

    2002-01-01

    In sampling investigations, questionnaire designers usually attempt to include as many variables as possible, so as to avoid losing useful information. However, a questionnaire containing too many variables can reduce the response rate and complicate the analysis of the results. In this paper, we select the variables of a questionnaire on infant activity development using the variable selection method of correspondence analysis, and we analyze the rationality of the selection.

  12. Correlation structure and variable selection in generalized estimating equations via composite likelihood information criteria.

    Science.gov (United States)

    Nikoloulopoulos, Aristidis K

    2016-06-30

    The method of generalized estimating equations (GEE) is popular in the biostatistics literature for analyzing longitudinal binary and count data. It assumes a generalized linear model for the outcome variable, and a working correlation among repeated measurements. In this paper, we introduce a viable competitor: the weighted scores method for generalized linear model margins. We weight the univariate score equations using a working discretized multivariate normal model that is a proper multivariate model. Because the weighted scores method is a parametric method based on likelihood, we propose composite likelihood information criteria as an intermediate step for model selection. The same criteria can be used for both correlation structure and variable selection. Simulation studies and the application example show that our method outperforms other existing model selection methods in GEE. From the example, it can be seen that our methods not only improve on GEE in terms of interpretability and efficiency but also can change the inferential conclusions with respect to GEE. Copyright © 2016 John Wiley & Sons, Ltd.

  14. Effect of Selected Organic Acids on Cadmium Sorption by Variable-and Permanent-Charge Soils

    Institute of Scientific and Technical Information of China (English)

    HU Hong-Qing; LIU Hua-Liang; HE Ji-Zheng; HUANG Qiao-Yun

    2007-01-01

    Batch equilibrium experiments were conducted to investigate cadmium (Cd) sorption by two permanent-charge soils, a yellow-cinnamon soil and a yellow-brown soil, and two variable-charge soils, a red soil and a latosol, with the addition of selected organic acids (acetate, tartrate, and citrate). Results showed that with an increase in acetate concentration from 0 to 3.0 mmol L-1, the Cd sorption percentage of the yellow-cinnamon soil, the yellow-brown soil, and the latosol decreased. The sorption percentage of Cd by the yellow-cinnamon soil and generally the yellow-brown soil (permanent-charge soils) decreased with an increase in tartrate concentration, but increased at low tartrate concentrations for the red soil and the latosol. Curves of the percentage of Cd sorption for citrate were similar to those for tartrate. For the variable-charge soils with tartrate and citrate, there were obvious peaks in the Cd sorption percentage. These peaks, where the organic acids had maximum influence, changed with soil type and occurred at a higher organic acid concentration for the variable-charge soils than for the permanent-charge soils. Addition of cadmium after tartrate adsorption resulted in a higher sorption increase for the variable-charge soils than for the permanent-charge soils. When tartrate and Cd solution were added together, sorption of Cd decreased with tartrate concentration for the yellow-brown soil, but increased at low tartrate concentrations and then decreased with tartrate concentration for the red soil and the latosol.

  15. Variable selection based on entropic criterion and its application to the debris-flow triggering

    CERN Document Server

    Chen, C; Tseng, C Y; Chen, Chien-chih; Dong, Jia-Jyun; Tseng, Chih-Yuan

    2006-01-01

    We propose a new data analysis scheme, the method of minimum entropy analysis (MEA), in this paper. MEA provides a quantitative criterion for selecting the variables relevant to modeling the physical system of interest. The method can easily be extended to various geophysical/geological data analyses, where many relevant or irrelevant available measurements may obscure the understanding of a highly complicated physical system like the triggering of debris flows. After demonstrating and testing the MEA method, we apply it to a dataset of debris-flow occurrences in Taiwan and successfully identify three variables relevant to the triggering of the debris-flow events observed during the 1996 Typhoon Herb, i.e., the hydrological form factor and the numbers and areas of landslides.
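
    The abstract does not spell out the MEA criterion itself, so the sketch below uses mutual information (equivalently, reduction in conditional entropy) as a generic information-theoretic stand-in for ranking candidate trigger variables; function names and the scikit-learn estimator are assumptions.

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        def rank_trigger_variables(X, y, names):
            # Rank candidate variables by mutual information with the binary
            # debris-flow outcome; a higher score means a lower conditional
            # entropy, i.e. greater relevance to the triggering.
            mi = mutual_info_classif(X, y, random_state=0)
            order = np.argsort(mi)[::-1]
            return [(names[i], float(mi[i])) for i in order]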

  16. A Bayesian approach to linear regression in astronomy

    CERN Document Server

    Sereno, Mauro

    2015-01-01

    Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical modeling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in the form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modeling. The R package LIRA (LInear Regression in Astronomy) is made available to perform the regression.
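
    The core of such a regression is a likelihood in which intrinsic scatter is added in quadrature to the measurement errors; a minimal sketch follows. LIRA itself additionally models time evolution, errors on the independent variable, and selection effects, none of which this toy version includes.

        import numpy as np

        def log_likelihood(theta, x, y, sigma_y):
            # Gaussian log-likelihood for y = alpha + beta * x with intrinsic
            # scatter sigma_int added in quadrature to the measurement errors.
            alpha, beta, sigma_int = theta
            var = sigma_y**2 + sigma_int**2
            resid = y - (alpha + beta * x)
            return -0.5 * np.sum(resid**2 / var + np.log(2 * np.pi * var))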

  17. Implementation of Phonetic Context Variable Length Unit Selection Module for Malay Text to Speech

    Directory of Open Access Journals (Sweden)

    Tian-Swee Tan

    2008-01-01

    Full Text Available Problem statement: The main problem with the current Malay Text-To-Speech (MTTS) synthesis system is the poor quality of the generated speech sound, due to the inability of the traditional TTS system to provide multiple choices of unit for generating more accurate synthesized speech. Approach: This study proposes a phonetic context variable length unit selection MTTS system that is capable of providing more natural and accurate unit selection for synthesized speech. It implements a phonetic context algorithm for unit selection for MTTS. The unit selection method without phonetic context may encounter the problem of selecting speech units from different sources, which affects the quality of concatenation. This study proposes the design of a speech corpus and a unit selection method according to phonetic context, so that a string of continuous phonemes can be selected from the same source instead of individual phonemes from different sources. This further reduces the concatenation points and increases the quality of concatenation. The speech corpus was transcribed according to phonetic context to preserve the phonetic information. The method utilizes word-based concatenation: it first searches the speech corpus for the target word and, if the target is found, uses it for concatenation; if the word does not exist, the word is constructed from a phoneme sequence. Results: The system was tested with 40 participants in a Mean Opinion Score (MOS) listening test, with average ratings for naturalness, pronunciation and intelligibility of 3.9, 4.1 and 3.9. Conclusion/Recommendation: Through this study, a first version of a corpus-based MTTS has been designed; it has improved the naturalness, pronunciation and intelligibility of synthetic speech, but some components still need refinement, such as a prosody module to support phrasing analysis and intonation of the input text to match the waveform modifier.
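
    The word-first fallback strategy described above reduces concatenation points; a minimal sketch is below. The dictionary-style corpora and the g2p (grapheme-to-phoneme) converter are hypothetical placeholders, and the real system further matches phonetic context so that phoneme strings come from the same source recording.

        def select_units(word, word_corpus, phoneme_corpus, g2p):
            # Prefer a whole recorded word (zero internal joins); otherwise
            # fall back to concatenating one unit per phoneme.
            if word in word_corpus:
                return [word_corpus[word]]
            return [phoneme_corpus[p] for p in g2p(word)]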

  18. Comparison of linear mixed model analysis and genealogy-based haplotype clustering with a Bayesian approach for association mapping in a pedigreed population

    DEFF Research Database (Denmark)

    Dashab, Golam Reza; Kadri, Naveen Kumar; Mahdi Shariati, Mohammad;

    2012-01-01

    Four methods were compared: 1) Mixed model analysis (MMA), 2) Random haplotype model (RHM), 3) Genealogy-based mixed model (GENMIX), and 4) Bayesian variable selection (BVS). The data consisted of phenotypes of 2000 animals from 20 sire families and were genotyped with 9990 SNPs on five chromosomes. Results: Out of the eight...

  19. An Approach with Support Vector Machine using Variable Features Selection on Breast Cancer Prognosis

    Directory of Open Access Journals (Sweden)

    Sandeep Chaurasia

    2013-09-01

    Full Text Available Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of machine learning. In this paper we use a support vector machine (SVM) classifier to construct a model for breast cancer survivability prediction. We apply both 5-fold and 10-fold cross-validation with variable selection on the input feature vectors, and we measure class performance via AUC, specificity and sensitivity. The performance of the SVM is much better than that of the other machine-learning classifiers.
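
    A minimal version of the evaluation described above, using scikit-learn's SVC with k-fold cross-validated AUC; the paper's exact feature selection and performance-measurement pipeline are not specified in the abstract, so the details here are illustrative assumptions.

        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        def svm_auc(X, y, folds=10):
            # RBF-kernel SVM evaluated by k-fold cross-validated AUC;
            # feature scaling matters for SVMs, hence the pipeline.
            model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
            return cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean()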

  20. Soft Sensing Modelling Based on Optimal Selection of Secondary Variables and Its Application

    Institute of Scientific and Technical Information of China (English)

    Qi Li; Cheng Shao

    2009-01-01

    The composition of the distillation column is a very important quality value in refineries; unfortunately, few hardware sensors are available to measure distillation compositions on-line. In this paper, a novel method using sensitivity matrix analysis and kernel ridge regression (KRR) to implement on-line soft sensing of distillation compositions is proposed. In this approach, the sensitivity matrix analysis is used to select the most suitable secondary variables as the soft sensor's input, and the KRR is used to build the composition soft sensor. Application to a simulated distillation column demonstrates the effectiveness of the method.
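
    A minimal sketch of the KRR soft-sensor step using scikit-learn; the sensitivity-matrix selection of secondary variables is assumed to have already produced X_secondary, and the kernel and hyperparameters are illustrative.

        from sklearn.kernel_ridge import KernelRidge

        def build_soft_sensor(X_secondary, y_composition):
            # Map the selected secondary variables (temperatures, flows, ...)
            # to the distillation composition with kernel ridge regression.
            model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1)
            return model.fit(X_secondary, y_composition)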

  1. The Galex Time Domain Survey. I. Selection And Classification of Over a Thousand Ultraviolet Variable Sources

    Science.gov (United States)

    Gezari, S.; Martin, D. C.; Forster, K.; Neill, J. D.; Huber, M.; Heckman, T.; Bianchi, L.; Morrissey, P.; Neff, S. G.; Seibert, M.; Schiminovich, D.; Wyder, T. K.; Burgett, W. S.; Chambers, K. C.; Kaiser, N.; Magnier, E. A.; Price, P. A.; Tonry, J. L.

    2013-01-01

    We present the selection and classification of over a thousand ultraviolet (UV) variable sources discovered in approximately 40 deg^2 of GALEX Time Domain Survey (TDS) NUV images observed with a cadence of 2 days and a baseline of observations of approximately 3 years. The GALEX TDS fields were designed to be in spatial and temporal coordination with the Pan-STARRS1 Medium Deep Survey, which provides deep optical imaging and simultaneous optical transient detections via image differencing. We characterize the GALEX photometric errors empirically as a function of mean magnitude, and select sources that vary at the 5 sigma level in at least one epoch. We measure the statistical properties of the UV variability, including the structure function on timescales of days and years. We report classifications for the GALEX TDS sample using a combination of optical host colors and morphology, UV light curve characteristics, and matches to archival X-ray and spectroscopy catalogs. We classify 62% of the sources as active galaxies (358 quasars and 305 active galactic nuclei), and 10% as variable stars (including 37 RR Lyrae, 53 M dwarf flare stars, and 2 cataclysmic variables). We detect a large-amplitude tail in the UV variability distribution for M-dwarf flare stars and RR Lyrae, reaching up to |Δm| = 4.6 mag and 2.9 mag, respectively. The mean amplitude of the structure function for quasars on year timescales is five times larger than observed at optical wavelengths. The remaining unclassified sources include UV-bright extragalactic transients, two of which have been spectroscopically confirmed to be a young core-collapse supernova and a flare from the tidal disruption of a star by a dormant supermassive black hole. We calculate a surface density for variable sources in the UV with NUV < 23 mag and |Δm| > 0.2 mag of approximately 8.0, 7.7, and 1.8 deg^-2 for quasars, active galactic nuclei, and RR Lyrae stars, respectively.

  2. Understanding Computational Bayesian Statistics

    CERN Document Server

    Bolstad, William M

    2011-01-01

    A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic

  3. Bayesian statistics an introduction

    CERN Document Server

    Lee, Peter M

    2012-01-01

    Bayesian Statistics is the school of thought that combines prior beliefs with the likelihood of a hypothesis to arrive at posterior beliefs. The first edition of Peter Lee’s book appeared in 1989, but the subject has moved ever onwards, with increasing emphasis on Monte Carlo based techniques. This new fourth edition looks at recent techniques such as variational methods, Bayesian importance sampling, approximate Bayesian computation and Reversible Jump Markov Chain Monte Carlo (RJMCMC), providing a concise account of the way in which the Bayesian approach to statistics develops as wel

  4. The Extended Baryon Oscillation Spectroscopic Survey: Variability Selection and Quasar Luminosity Function

    CERN Document Server

    Palanque-Delabrouille, N; Yèche, Ch; Pâris, I; Petitjean, P; Burtin, E; Dawson, K; McGreer, I; Myers, A D; Rossi, G; Schlegel, D; Schneider, D; Streblyanska, A; Tinker, J

    2015-01-01

    The SDSS-IV/eBOSS has an extensive quasar program that combines several selection methods. Among these, the photometric variability technique provides highly uniform samples, unaffected by the redshift bias of traditional optical-color selections, when $z = 2.7-3.5$ quasars cross the stellar locus or when host galaxy light affects quasar colors at $z < 2.2$. Both models are constrained to be continuous at $z=2.2$. They present a flattening of the bright-end slope at large redshift. The LEDE model indicates a reduction of the break density with increasing redshift, but the evolution of the break magnitude depends on the parameterization. The models are in excellent accord, predicting quasar counts that agree within 0.3% (resp., 1.1%) to $g<22.5$ (resp., $g<23$). The models are also in good agreement over the entire redshift range with models from previous studies.

  5. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

    Science.gov (United States)

    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is useful...

  6. Bayesian Data-Model Fit Assessment for Structural Equation Modeling

    Science.gov (United States)

    Levy, Roy

    2011-01-01

    Bayesian approaches to modeling are receiving an increasing amount of attention in the areas of model construction and estimation in factor analysis, structural equation modeling (SEM), and related latent variable models. However, model diagnostics and model criticism remain relatively understudied aspects of Bayesian SEM. This article describes…

  7. The bugs book a practical introduction to Bayesian analysis

    CERN Document Server

    Lunn, David; Best, Nicky; Thomas, Andrew; Spiegelhalter, David

    2012-01-01

    Contents: Introduction: Probability and Parameters (Probability; Probability distributions; Calculating properties of probability distributions; Monte Carlo integration); Monte Carlo Simulations Using BUGS (Introduction to BUGS; DoodleBUGS; Using BUGS to simulate from distributions; Transformations of random variables; Complex calculations using Monte Carlo; Multivariate Monte Carlo analysis; Predictions with unknown parameters); Introduction to Bayesian Inference (Bayesian learning; Posterior predictive distributions; Conjugate Bayesian inference; Inference about a discrete parameter; Combinations of conjugate analyses; Bayesian and classica...)

  8. Impact of perennial energy crops income variability on the crop selection of risk averse farmers

    International Nuclear Information System (INIS)

    The UK Government policy is for the area of perennial energy crops in the UK to expand significantly. For this to be achievable, farmers need to choose these crops in preference to conventional rotations. This paper looks at the potential level and variability of perennial energy crop incomes and their relation to incomes from conventional arable crops. Assuming energy crop prices are correlated with oil prices, the results suggest that incomes from them are not well correlated with conventional arable crop incomes. A farm-scale mathematical programming model is then used to understand the effect on risk-averse farmers' crop selection. The inclusion of risk reduces the energy crop price required for the selection of these crops. However, yields towards the highest of those predicted in the UK are still required to make them an optimal choice, suggesting that only a small area of energy crops would be expected to be grown in the UK. This must be regarded as a tentative conclusion, primarily due to the high sensitivity found to crop yields, resulting in the proposal for further work to apply the model using spatially disaggregated data. - Highlights: ► Energy crop and conventional crop incomes suggested as uncorrelated. ► Diversification effect of energy crops investigated for a risk-averse farmer. ► Energy crops indicated as optimal selection only on the highest-yielding UK sites. ► Large establishment grant rates needed to substantially alter crop selections.

  9. Spatiotemporal Variability of Remotely Sensed PM2.5 Concentrations in China from 1998 to 2014 Based on a Bayesian Hierarchy Model

    Science.gov (United States)

    Li, Junming; Jin, Meijun; Xu, Zheng

    2016-01-01

    With the rapid industrial development and urbanization in China over the past three decades, PM2.5 pollution has become a severe environmental problem that threatens public health. Due to its unbalanced development and intrinsic topography features, the distribution of PM2.5 concentrations over China is spatially heterogeneous. In this study, we explore the spatiotemporal variations of PM2.5 pollution in China and four great urban areas from 1998 to 2014. A space-time Bayesian hierarchy model is employed to analyse PM2.5 pollution. The results show that a stable “3-Clusters” spatial PM2.5 pollution pattern has formed. The mean and 90% quantile of the PM2.5 concentrations in China have increased significantly, with annual increases of 0.279 μg/m3 (95% CI: 0.083−0.475) and 0.735 μg/m3 (95% CI: 0.261−1.210), respectively. The area with a PM2.5 pollution level of more than 70 μg/m3 has increased significantly, with an annual increase of 0.26 percentage points. Two regions in particular, the North China Plain and the Sichuan Basin, are experiencing the largest amounts of PM2.5 pollution. The polluted areas, with a high local magnitude of more than 1.0 relative to the overall PM2.5 concentration, affect an area with a human population of 949 million, which corresponded to 69.3% of the total population in 2010. North and south differentiation occurs in the urban areas of the Jingjinji and Yangtze Delta, and circular and radial gradient differentiation occurs in the urban areas of the Cheng-Yu and Pearl Deltas. The spatial heterogeneity of the urban Jingjinji group is the strongest. Eighteen cities located in the Yangtze Delta urban group, including Shanghai and Nanjing, have experienced high PM2.5 concentrations and faster local trends of increasing PM2.5. The percentage of exposure to PM2.5 concentrations greater than 70 μg/m3 and 100 μg/m3 is increasing significantly. PMID:27490557

  10. Prediction of Placental Barrier Permeability: A Model Based on Partial Least Squares Variable Selection Procedure

    Directory of Open Access Journals (Sweden)

    Yong-Hong Zhang

    2015-05-01

    Full Text Available Assessing the human placental barrier permeability of drugs is very important to guarantee drug safety during pregnancy. The quantitative structure–activity relationship (QSAR) method is an effective tool for assessing the placental transfer of drugs, while in vitro human placental perfusion is the most widely used experimental method. In this study, the partial least squares (PLS) variable selection and modeling procedure was used to pick optimal descriptors from a pool of 620 descriptors for 65 compounds and to simultaneously develop a QSAR model between the descriptors and the placental barrier permeability expressed by clearance indices (CI). The model was subjected to internal validation by cross-validation and y-randomization, and to external validation by predicting the CI values of 19 compounds. It was shown that the model developed is robust and has good predictive potential (r2 = 0.9064, RMSE = 0.09, q2 = 0.7323, rp2 = 0.7656, RMSP = 0.14). The mechanistic interpretation of the final model was given by the descriptors with high variable importance in projection (VIP) values. Using the PLS procedure, we can rapidly and effectively select optimal descriptors and thus construct a model with good stability and predictability. This analysis can provide an effective tool for high-throughput screening of the placental barrier permeability of drugs.
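
    Since the interpretation rests on variable importance in projection (VIP) values, a standard single-response VIP computation for a fitted scikit-learn PLSRegression is sketched below; this is the textbook formula, not necessarily the authors' exact implementation.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        def vip_scores(pls):
            # VIP weights each predictor by the share of y-variance each PLS
            # component explains; VIP > 1 is the usual relevance rule of thumb.
            t, w, q = pls.x_scores_, pls.x_weights_, pls.y_loadings_
            ss = q.ravel()**2 * (t**2).sum(axis=0)   # per-component SS of y
            wn = (w / np.linalg.norm(w, axis=0))**2  # normalized squared weights
            return np.sqrt(w.shape[0] * (wn @ ss) / ss.sum())

        # Usage (illustrative):
        #   pls = PLSRegression(n_components=3).fit(X, y)
        #   important = vip_scores(pls) > 1.0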

  11. A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes.

    Science.gov (United States)

    Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed

    2008-12-01

    A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables is fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
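
    The GA loop described above can be sketched compactly with bitmask chromosomes. In this sketch, cross-validated AUC stands in for the paper's fitness measure (which combines predictive power with per-factor significance), so the fitness function, operators, and parameters are assumptions.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)

        def fitness(mask, X, y):
            # Quality of one variable subset: cross-validated AUC of a
            # logistic fit restricted to the selected columns.
            if not mask.any():
                return -np.inf
            lr = LogisticRegression(max_iter=1000)
            return cross_val_score(lr, X[:, mask], y, cv=5, scoring="roc_auc").mean()

        def ga_select(X, y, pop_size=20, n_gen=30, p_mut=0.05):
            # Bitmask GA: keep the best half, breed by uniform crossover,
            # then apply bit-flip mutation.
            n = X.shape[1]
            pop = rng.random((pop_size, n)) < 0.5
            for _ in range(n_gen):
                scores = np.array([fitness(m, X, y) for m in pop])
                parents = pop[np.argsort(scores)[-(pop_size // 2):]]
                pairs = rng.integers(len(parents), size=(pop_size, 2))
                cross = rng.random((pop_size, n)) < 0.5
                pop = np.where(cross, parents[pairs[:, 0]], parents[pairs[:, 1]])
                pop ^= rng.random((pop_size, n)) < p_mut
            scores = np.array([fitness(m, X, y) for m in pop])
            return pop[scores.argmax()]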

  12. Spatial variability of selected physicochemical parameters within peat deposits in small valley mire: a geostatistical approach

    Directory of Open Access Journals (Sweden)

    Pawłowski Dominik

    2014-12-01

    Full Text Available Geostatistical methods for 2D and 3D modelling of the spatial variability of selected physicochemical properties of biogenic sediments were applied to a small valley mire in order to identify the processes that lead to the formation of various types of peat. A sequential Gaussian simulation was performed to reproduce the statistical distribution of the input data (pH and organic matter) and their semivariances, as well as to honour the data values, yielding more ‘realistic’ models that show microscale spatial variability, despite the fact that the input sample cores were sparsely distributed in the X-Y space of the study area. The stratigraphy of peat deposits in the Ldzań mire shows a record of long-term evolution of water conditions, which is associated with the variability in water supply over time. Ldzań is a fen (a rheotrophic mire) with a through-flow of groundwater. Additionally, the vicinity of the Grabia River is marked by seasonal inundations of the southwest part of the mire and increased participation of mineral matter in the peat. In turn, the upper peat layers of some of the central part of the Ldzań mire are rather spongy, and these peat-forming phytocoenoses probably formed during permanent waterlogging.

  13. Bayesian priors for transiting planets

    CERN Document Server

    Kipping, David M

    2016-01-01

    As astronomers push towards discovering ever-smaller transiting planets, it is increasingly common to deal with low signal-to-noise ratio (SNR) events, where the choice of priors plays an influential role in Bayesian inference. In the analysis of exoplanet data, the selection of priors is often treated as a nuisance, with observers typically defaulting to uninformative distributions. Such treatments miss a key strength of the Bayesian framework, especially in the low SNR regime, where even weak a priori information is valuable. When estimating the parameters of a low-SNR transit, two key pieces of information are known: (i) the planet has the correct geometric alignment to transit and (ii) the transit event exhibits sufficient signal-to-noise to have been detected. These represent two forms of observational bias. Accordingly, when fitting transits, the model parameter priors should not follow the intrinsic distributions of said terms, but rather those of both the intrinsic distributions and the observational ...

  14. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Full Text Available Abstract Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, involves difficulty in setting an overall p-value due to the complicated correlation structure, namely, the multiple testing problem, which causes unacceptable false negative results. A number of SNP-SNP pairs far larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes the above issues is thus needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection, termed sure independence screening (SIS), for appropriate handling of the enormous number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods and a following variable selection procedure in the SIS method, suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–Control Consortium) data. Conclusions Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
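
    In its simplest form, SIS ranks predictors by absolute marginal correlation with the outcome and keeps only the top few for joint modeling. EPISIS ranks via marginal logistic fits on GPUs, so the correlation version below is just the canonical sketch of the screening idea.

        import numpy as np

        def sis_screen(X, y, keep):
            # Sure independence screening: rank the (possibly millions of)
            # coded interaction columns by absolute marginal correlation with
            # the outcome and retain the top `keep` for the joint model.
            Xs = (X - X.mean(axis=0)) / X.std(axis=0)
            ys = (y - y.mean()) / y.std()
            score = np.abs(Xs.T @ ys) / len(y)
            return np.argsort(score)[::-1][:keep]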

  15. A variant of sparse partial least squares for variable selection and data exploration

    Directory of Open Access Journals (Sweden)

    Megan Jodene Olson Hunt

    2014-03-01

    Full Text Available When data are sparse and/or predictors multicollinear, current implementation of sparse partial least squares (SPLS does not give estimates for non-selected predictors nor provide a measure of inference. In response, an approach termed all-possible SPLS is proposed, which fits a SPLS model for all tuning parameter values across a set grid. Noted is the percentage of time a given predictor is chosen, as well as the average non-zero parameter estimate. Using a large number of multicollinear predictors, simulation confirmed variables not associated with the outcome were least likely to be chosen as sparsity increased across the grid of tuning parameters, while the opposite was true for those strongly associated. Lastly, variables with a weak association were chosen more often than those with no association, but less often than those with a strong relationship to the outcome. Similarly, predictors most strongly related to the outcome had the largest average parameter estimate magnitude, followed by those with a weak relationship, followed by those with no relationship. Across two independent studies regarding the relationship between volumetric MRI measures and a cognitive test score, this method confirmed a priori hypotheses about which brain regions would be selected most often and have the largest average parameter estimates. In conclusion, the percentage of time a predictor is chosen is a useful measure for ordering the strength of the relationship between the independent and dependent variables, serving as a form of inference. The average parameter estimates give further insight regarding the direction and strength of association. As a result, all-possible SPLS gives more information than the dichotomous output of traditional SPLS, making it useful when undertaking data exploration and hypothesis generation for a large number of potential predictors.

  17. Approach to the Correlation Discovery of Chinese Linguistic Parameters Based on Bayesian Method

    Institute of Scientific and Technical Information of China (English)

    WANG Wei(王玮); CAI LianHong(蔡莲红)

    2003-01-01

    The Bayesian approach is an important method in statistics. The Bayesian belief network is a powerful knowledge representation and reasoning tool under conditions of uncertainty. It is a graphical model that encodes probabilistic relationships among variables of interest. In this paper, an approach to Bayesian network construction is given for discovering relationships among Chinese linguistic parameters in a corpus.

  18. On Fuzzy Bayesian Inference

    OpenAIRE

    Frühwirth-Schnatter, Sylvia

    1990-01-01

    In the paper at hand we apply fuzzy set theory to Bayesian statistics to obtain "Fuzzy Bayesian Inference". In the subsequent sections we discuss a fuzzy-valued likelihood function, Bayes' theorem for both fuzzy data and fuzzy priors, a fuzzy Bayes' estimator, fuzzy predictive densities and distributions, and fuzzy H.P.D. regions. (author's abstract)

  19. Bayesian Mediation Analysis

    Science.gov (United States)

    Yuan, Ying; MacKinnon, David P.

    2009-01-01

    In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…

  20. Dynamic Batch Bayesian Optimization

    CERN Document Server

    Azimi, Javad; Fern, Xiaoli

    2011-01-01

    Bayesian optimization (BO) algorithms try to optimize an unknown function that is expensive to evaluate using a minimum number of evaluations/experiments. Most of the proposed algorithms in BO are sequential, where only one experiment is selected at each iteration. This can be time-inefficient when each experiment takes a long time and more than one experiment can be run concurrently. On the other hand, requesting a fixed-size batch of experiments at each iteration causes performance inefficiency in BO compared to sequential policies. In this paper, we present an algorithm that requests a batch of experiments at each time step t, where the batch size p_t is dynamically determined at each step. Our algorithm is based on the observation that the sequence of experiments selected by the sequential policy can sometimes be almost independent from each other. Our algorithm identifies such scenarios and requests those experiments at the same time without degrading the performance. We evaluate our proposed method us...

  1. Universal Darwinism As a Process of Bayesian Inference.

    Science.gov (United States)

    Campbell, John O

    2016-01-01

    Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature. PMID:27375438

  4. Evaluation of a partial genome screening of two asthma susceptibility regions using bayesian network based bayesian multilevel analysis of relevance.

    Directory of Open Access Journals (Sweden)

    Ildikó Ungvári

    Full Text Available Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods, and we applied a new statistical method called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses a Bayesian network representation to provide a detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to assess whether a variable is directly relevant or its association is only mediated. With frequentist methods one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43, 95% CI: 1.2-1.8; p = 3×10^-4). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and in human asthmatics. In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with the asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of the relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong-relevance-based methods to include partial relevance, global characterization of relevance and multi-target relevance.

  5. Robust check loss-based variable selection of high-dimensional single-index varying-coefficient model

    Science.gov (United States)

    Song, Yunquan; Lin, Lu; Jian, Ling

    2016-07-01

    The single-index varying-coefficient model is an important mathematical modeling method for nonlinear phenomena in science and engineering. In this paper, we develop a variable selection method for high-dimensional single-index varying-coefficient models using a shrinkage idea. The proposed procedure can simultaneously select significant nonparametric components and parametric components. Under defined regularity conditions, with appropriate selection of tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Moreover, due to the robustness of the check loss function to outliers in finite samples, our proposed variable selection method is more robust than those based on the least squares criterion. Finally, the method is illustrated with numerical simulations.
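
    The robustness claim comes from the check (pinball) loss, whose linear tails penalize outliers far less than squared error; its definition is a one-liner:

        import numpy as np

        def check_loss(u, tau=0.5):
            # Koenker-Bassett check loss on residuals u; tau = 0.5 gives the
            # absolute-deviation criterion up to a constant factor.
            return u * (tau - (u < 0))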

  6. Cluster analysis for identifying sub-groups and selecting potential discriminatory variables in human encephalitis

    Directory of Open Access Journals (Sweden)

    Crowcroft Natasha S

    2010-12-01

    Full Text Available Abstract Background Encephalitis is an acute clinical syndrome of the central nervous system (CNS), often associated with fatal outcome or permanent damage, including cognitive and behavioural impairment, affective disorders and epileptic seizures. Infection of the central nervous system is considered to be a major cause of encephalitis, and more than 100 different pathogens have been recognized as causative agents. However, a large proportion of cases have unknown disease etiology. Methods We perform hierarchical cluster analysis on a multicenter England encephalitis data set with the aim of identifying sub-groups in human encephalitis. We use the simple matching similarity measure, which is appropriate for binary data sets, and perform variable selection using cluster heatmaps. We also use heatmaps to visually assess underlying patterns in the data, identify the main clinical and laboratory features and identify potential risk factors associated with encephalitis. Results Our results identified fever, personality and behavioural change, headache and lethargy as the main characteristics of encephalitis. Diagnostic variables such as brain scans and measurements from cerebrospinal fluid are also identified as main indicators of encephalitis. Our analysis revealed six major clusters in the England encephalitis data set. However, marked within-cluster heterogeneity is observed in some of the big clusters, indicating possible sub-groups. Overall, the results show that patients are clustered according to symptom and diagnostic variables rather than causal agents. Exposure variables such as recent infection, sick person contact and animal contact have been identified as potential risk factors. Conclusions It is in general assumed and is a common practice to group encephalitis cases according to disease etiology. However, our results indicate that patients are clustered with respect to mainly symptom and diagnostic variables rather than causal agents.
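
    A minimal version of the clustering step: for binary symptom/diagnostic indicators, one minus the simple matching similarity equals the Hamming distance, so standard hierarchical clustering tools apply directly. The six-cluster cut mirrors the reported result; the linkage method is an assumption.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import pdist

        def cluster_patients(X_binary, n_clusters=6):
            # Hamming distance between binary indicator rows equals
            # 1 - simple matching similarity.
            d = pdist(X_binary, metric="hamming")
            tree = linkage(d, method="average")
            return fcluster(tree, t=n_clusters, criterion="maxclust")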

  7. Scaling Bayesian network discovery through incremental recovery

    NARCIS (Netherlands)

    Castelo, J.R.; Siebes, A.P.J.M.

    1999-01-01

    Bayesian networks are a type of graphical models that, e.g., allow one to analyze the interaction among the variables in a database. A well-known problem with the discovery of such models from a database is the ``problem of high-dimensionality''. That is, the discovery of a network from a database w

  8. Assessment of acute pesticide toxicity with selected biochemical variables in suicide attempting subjects

    International Nuclear Information System (INIS)

    Pesticide-induced changes were assessed in thirty-two subjects of attempted suicide cases. Among all, farmers and their families were recorded as the most frequent suicide attempters. The values obtained for seven biochemical variables of hospitalized subjects (average age 29 years) were compared with those of the same number of age-matched normal volunteers. The results revealed major differences in the mean values of the selected parameters. The calculated mean differences for alkaline phosphatase (178.7 mu/l), bilirubin (7.5 mg/dl), GPT (59.2 mu/l) and glucose (38.6 mg/dl) were higher than the controls, which indicates hepatotoxicity induced by the pesticides in suicide-attempting individuals. Increases in serum creatinine and urea indicated renal malfunction that could be linked with pesticide-induced nephrotoxicity among them. (author)

  9. Repeat what after whom? Exploring variable selectivity in a cross-dialectal shadowing task.

    Science.gov (United States)

    Walker, Abby; Campbell-Kibler, Kathryn

    2015-01-01

    Twenty women from Christchurch, New Zealand and 16 from Columbus Ohio (dialect region U.S. Midland) participated in a bimodal lexical naming task where they repeated monosyllabic words after four speakers from four regional dialects: New Zealand, Australia, U.S. Inland North and U.S. Midland. The resulting utterances were acoustically analyzed, and presented to listeners on Amazon Mechanical Turk in an AXB task. Convergence is observed, but differs depending on the dialect of the speaker, the dialect of the model, the particular word class being shadowed, and the order in which dialects are presented to participants. We argue that these patterns are generally consistent with findings that convergence is promoted by a large phonetic distance between shadower and model (Babel, 2010, contra Kim et al., 2011), and greater existing variability in a vowel class (Babel, 2012). The results also suggest that more comparisons of accommodation toward different dialects are warranted, and that the investigation of the socio-indexical meaning of specific linguistic forms in context is a promising avenue for understanding variable selectivity in convergence. PMID:26029129

  10. Interdependency of selected metabolic variables in an animal model of metabolic syndrome.

    Science.gov (United States)

    Mellouk, Zoheir; Sener, Abdullah; Yahia, Dalila Ait; Malaisse, Willy J

    2014-10-01

    In the present study, the correlation between the percentage of glycated hemoglobin, taken as representative of changes in glucose homeostasis, and selected variables was investigated. Rats were treated for 8 weeks with diets containing 64% starch and 5% sunflower oil or containing 64% D-fructose mixed with: 5% sunflower oil; 3.4% sunflower oil and 1.6% salmon oil; or 3.4% sunflower oil and 1.6% safflower oil. Positive correlations were found between glycated hemoglobin and plasma albumin, urea, creatinine, phospholipids, triglycerides and total cholesterol, liver cholesterol, triglyceride and phospholipid content, and the plasma, liver, heart, kidney, soleus muscle and visceral adipose tissue content of thiobarbituric acid reactive substances, carbonyl derivatives and hydroperoxides. Conversely, negative correlations were observed between glycated hemoglobin and plasma calcium, iron and HDL-cholesterol concentrations, liver, heart, kidney, soleus muscle and visceral adipose tissue superoxide dismutase and catalase activity, as well as plasma, liver, heart, kidney, soleus muscle and visceral adipose tissue nitric oxide content. Only the liver glucokinase activity and liver, heart, kidney, soleus muscle and visceral adipose tissue glutathione reductase activity failed to display a significant correlation with glycated hemoglobin. These findings confirm the hypothesis that there is a close association between glucose homeostasis and other variables when considering the effects of long-chain polyunsaturated ω3 and ω6 fatty acids in rats with fructose-induced metabolic syndrome. PMID:25187839

  11. The impact of selected organizational variables and managerial leadership on radiation therapists' organizational commitment

    International Nuclear Information System (INIS)

    The purpose of this study was to examine the impact of selected organizational factors and the leadership behavior of supervisors on radiation therapists' commitment to their organizations. The population for this study consists of all full-time clinical radiation therapists registered by the American Registry of Radiologic Technologists (ARRT) in the United States. A random sample of 800 radiation therapists was obtained from the ARRT for this study. Questionnaires were mailed to all participants and measured organizational variables, a managerial leadership variable and three components of organizational commitment (affective, continuance and normative). It was determined that organizational support and the leadership behavior of supervisors each had a significant and positive effect on normative and affective commitment of radiation therapists, and each of the models predicted over 40% of the variance in radiation therapists' organizational commitment. This study examined radiation therapists' commitment to their organizations and found that affective (emotional attachment to the organization) and normative (feelings of obligation to the organization) commitments were more important than continuance commitment (awareness of the costs of leaving the organization). This study can help radiation oncology administrators and physicians understand the values their radiation therapy employees hold that are predictive of their commitment to the organization. A crucial result of the study is the importance of the perceived support of the organization and the leadership skills of managers/supervisors on radiation therapists' commitment to the organization.

  12. Selection of controlled variables in bioprocesses. Application to a SHARON-Anammox process for autotrophic nitrogen removal

    DEFF Research Database (Denmark)

    Mauricio Iglesias, Miguel; Valverde Perez, Borja; Sin, Gürkan

    Selecting the right controlled variables in a bioprocess is challenging since the objectives of the process (yields, product or substrate concentration) are difficult to relate to a given actuator. We apply here process control tools that can be used to assist in the selection of controlled var...

  13. The Relationship between Organizational Climate and Selected Variables of Productivity-Reading Achievement, Teacher Experience and Teacher Attrition.

    Science.gov (United States)

    Smith, Stanley Jeffery

    This study investigated the relationship between organizational climate and selected organizational variables--reading achievement, teacher experience, and teacher attrition. The study sample consisted of the total teaching staffs and 642 randomly selected students from five elementary schools in a metropolitan school district. Data were collected…

  14. Bayesian analysis of factors associated with fibromyalgia syndrome subjects

    Science.gov (United States)

    Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie

    2015-01-01

    Factors contributing to movement-related fear were assessed by Russek et al. (2014) for subjects with Fibromyalgia (FM), based on data collected through a national internet survey of community-based individuals. The study focused on the variables Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), as well as pain, work status and physical activity derived from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits the same data with a Bayesian analysis in which appropriate priors are introduced for the variables selected in Russek's paper.

  15. Evolutionary feature selection to estimate forest stand variables using LiDAR

    Science.gov (United States)

    Garcia-Gutierrez, Jorge; Gonzalez-Ferreiro, Eduardo; Riquelme-Santos, Jose C.; Miranda, David; Dieguez-Aranda, Ulises; Navarro-Cerrillo, Rafael M.

    2014-02-01

    Light detection and ranging (LiDAR) has become an important tool in forestry. LiDAR-derived models are mostly developed by means of multiple linear regression (MLR) after stepwise selection of predictors. An increasing interest in machine learning and evolutionary computation has recently arisen to improve regression use in LiDAR data processing. Although evolutionary machine learning has already proven to be suitable for regression, evolutionary computation may also be applied to improve parametric models such as MLR. This paper provides a hybrid approach based on the joint use of MLR and a novel genetic algorithm for the estimation of the main forest stand variables. We show a comparison between our genetic approach and other common methods of selecting predictors. The results obtained from several LiDAR datasets with different pulse densities in two areas of the Iberian Peninsula indicate that the genetic algorithm statistically outperforms the other methods. Preliminary studies suggest that a lack of parametric conditions in the field data and possible misuse of parametric tests may be the main reasons for the better performance of the genetic algorithm. This research confirms the findings of previous studies that outline the importance of evolutionary computation in the context of LiDAR analysis of forest data, especially when the size of fieldwork datasets is reduced.
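
    As a hedged illustration of this kind of hybrid, the sketch below runs a simple genetic algorithm over binary predictor masks, scoring each mask by the cross-validated R² of an MLR model (synthetic data; population size, operators and rates are illustrative choices, not the paper's):

    ```python
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           noise=10.0, random_state=0)
    rng = np.random.default_rng(0)
    n_pop, n_gen, p_mut = 40, 25, 0.05

    def fitness(mask):
        if not mask.any():
            return -np.inf
        return cross_val_score(LinearRegression(), X[:, mask], y,
                               cv=5, scoring="r2").mean()

    pop = rng.random((n_pop, X.shape[1])) < 0.2        # random initial subsets
    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[::-1][: n_pop // 2]]  # truncation selection
        children = []
        for _ in range(n_pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, X.shape[1])          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(X.shape[1]) < p_mut    # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents] + children)

    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected predictors:", np.flatnonzero(best))
    ```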

  16. The extended Baryon Oscillation Spectroscopic Survey: Variability selection and quasar luminosity function

    Science.gov (United States)

    Palanque-Delabrouille, N.; Magneville, Ch.; Yèche, Ch.; Pâris, I.; Petitjean, P.; Burtin, E.; Dawson, K.; McGreer, I.; Myers, A. D.; Rossi, G.; Schlegel, D.; Schneider, D.; Streblyanska, A.; Tinker, J.

    2016-03-01

    The extended Baryon Oscillation Spectroscopic Survey of the Sloan Digital Sky Survey (SDSS-IV/eBOSS) has an extensive quasar program that combines several selection methods. Among these, the photometric variability technique provides highly uniform samples, which are unaffected by the redshift bias of traditional optical-color selections that arises when z = 2.7-3.5 quasars cross the stellar locus or when host-galaxy light affects quasar colors at z ≲ 2.2. Both models are constrained to be continuous at z = 2.2. They present a flattening of the bright-end slope at high redshift. The LEDE model indicates a reduction of the break density with increasing redshift, but the evolution of the break magnitude depends on the parameterization. The models are in excellent accord, predicting quasar counts that agree within 0.3% (resp., 1.1%) to g < 22.5 (resp., g < 23). The models are also in good agreement over the entire redshift range with models from previous studies.

  17. Predictive validity of variables used to select students for postgraduate management courses.

    Science.gov (United States)

    Lane, John; Lane, Andrew M

    2002-06-01

    The present study, set in the United Kingdom, examined the predictive validity of variables used to select graduate students into postgraduate management programs at a UK business school. 303 postgraduate students completed a cognitive ability test (MD5, Mental Ability Test), a questionnaire to assess perceptions of self-efficacy to succeed on the program, and reported their performance on their first (undergraduate) degree. Students completed these measures at the start of the programs. Each program comprised 12 modules, which all students were required to complete successfully. Students' performance was measured by the average grade obtained over the 12 modules. Multiple regression indicated that only 22% of the variance (adjusted R² = .22, p < .001) in students' performance was significantly predicted by cognitive ability scores. Results show that neither performance on first degree nor scores for self-efficacy showed a significant relationship to the criterion measure. Findings from the present study suggest that in the UK the use of cognitive ability tests may play a significant role in the selection of students into postgraduate programs. The nonsignificant self-efficacy and performance relationships are ascribed to unclear knowledge of the demands of the program. We suggest that there is a need for further research to examine factors related to performance.

  18. FCERI and Histamine Metabolism Gene Variability in Selective Responders to NSAIDS

    Science.gov (United States)

    Amo, Gemma; Cornejo-García, José A.; García-Menaya, Jesus M.; Cordobes, Concepcion; Torres, M. J.; Esguevillas, Gara; Mayorga, Cristobalina; Martinez, Carmen; Blanca-Lopez, Natalia; Canto, Gabriela; Ramos, Alfonso; Blanca, Miguel; Agúndez, José A. G.; García-Martín, Elena

    2016-01-01

    The high-affinity IgE receptor (FcεRI) is a heterotetramer of three subunits: FcεRIα, FcεRIβ, and FcεRIγ (αβγ2), encoded by three genes designated as FCER1A, FCER1B (MS4A2), and FCER1G, respectively. Recent evidence points to FCERI gene variability as a relevant factor in the risk of developing allergic diseases. Because FcεRI plays a key role in the events downstream of the triggering factors in the immunological response, we hypothesized that FCERI gene variants might be related to the risk of, or to the clinical response to, selective (IgE-mediated) non-steroidal anti-inflammatory drug (NSAID) hypersensitivity. From a cohort of 314 patients suffering from selective hypersensitivity to metamizole, ibuprofen, diclofenac, paracetamol, acetylsalicylic acid (ASA), propifenazone, naproxen, ketoprofen, dexketoprofen, etofenamate, aceclofenac, etoricoxib, dexibuprofen, indomethacin, oxyphenylbutazone, or piroxicam, and 585 unrelated healthy controls that tolerated these NSAIDs, we analyzed the putative effects of the FCERI SNPs FCER1A rs2494262, rs2427837, and rs2251746; FCER1B rs1441586, rs569108, and rs512555; FCER1G rs11587213, rs2070901, and rs11421. Furthermore, in order to identify additional genetic markers which might be associated with the risk of developing selective NSAID hypersensitivity, or which may modify the putative association of FCERI gene variations with risk, we analyzed polymorphisms known to affect histamine synthesis or metabolism, such as rs17740607, rs2073440, rs1801105, rs2052129, rs10156191, rs1049742, and rs1049793 in the HDC, HNMT, and DAO genes. No major genetic associations with risk or with clinical presentation, and no gene-gene interactions, or gene-phenotype interactions (including age, gender, IgE concentration, antecedents of atopy, culprit drug, or clinical presentation) were identified in patients. However, logistic regression analyses indicated that the presence of antecedents of atopy and the DAO SNP rs2052129 (GG

  19. Modeling operational risks of the nuclear industry with Bayesian networks

    International Nuclear Information System (INIS)

    Basically, planning a new industrial plant requires information on industrial management, regulations, site selection, definition of initial and planned capacity, and estimation of the potential demand. However, this is far from enough to assure the success of an industrial enterprise. Unexpected and extremely damaging events may occur that deviate from the original plan. The so-called operational risks lie not only in system, equipment, process or human (technical or managerial) failures. They also lie in intentional events such as frauds and sabotage, in extreme events like terrorist attacks or radiological accidents, and even in public reaction to perceived environmental or future-generation impacts. For the nuclear industry, it is a challenge to identify and to assess the operational risks and their various sources. Early identification of operational risks can help in preparing contingency plans, in delaying the decision to invest, or in approving a project that can, at an extreme, affect the public perception of nuclear energy. A major problem in modeling operational risk losses is the lack of internal data, which are essential, for example, to apply the loss distribution approach. As an alternative, methods that consider qualitative and subjective information can be applied, for example, fuzzy logic, neural networks, system dynamics or Bayesian networks. An advantage of applying Bayesian networks to model operational risk is the possibility to include expert opinions and variables of interest, to structure the model via causal dependencies among these variables, and to specify subjective prior and conditional probability distributions at each step or network node. This paper suggests a classification of operational risks in industry and discusses the benefits and obstacles of the Bayesian networks approach to model those risks. (author)
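
    To make the mechanics concrete, here is a toy three-node network of the kind such models are built from, with entirely made-up probabilities (two hypothetical root causes and one observed loss node); the posterior over a cause is computed by direct enumeration:

    ```python
    from itertools import product

    P_sab = {True: 0.01, False: 0.99}   # prior: sabotage
    P_eq  = {True: 0.05, False: 0.95}   # prior: equipment failure
    # P(major loss | sabotage, equipment failure) -- hypothetical CPT
    P_loss = {(True, True): 0.95, (True, False): 0.70,
              (False, True): 0.40, (False, False): 0.01}

    def joint(s, e, l):
        pl = P_loss[(s, e)]
        return P_sab[s] * P_eq[e] * (pl if l else 1 - pl)

    # P(sabotage | major loss observed), by enumerating the hidden variable
    num = sum(joint(True, e, True) for e in (True, False))
    den = sum(joint(s, e, True) for s, e in product((True, False), repeat=2))
    print(f"P(sabotage | major loss) = {num / den:.3f}")
    ```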

  20. Practical Bayesian Tomography

    CERN Document Server

    Granade, Christopher; Cory, D G

    2015-01-01

    In recent years, Bayesian methods have been proposed as a solution to a wide range of issues in quantum state and process tomography. State-of-the-art Bayesian tomography solutions suffer from three problems: numerical intractability, a lack of informative prior distributions, and an inability to track time-dependent processes. Here, we solve all three problems. First, we use modern statistical methods, as pioneered by Huszár and Houlsby and by Ferrie, to make Bayesian tomography numerically tractable. Our approach allows for practical computation of Bayesian point and region estimators for quantum states and channels. Second, we propose the first informative priors on quantum states and channels. Finally, we develop a method that allows online tracking of time-dependent states and estimates the drift and diffusion processes affecting a state. We provide source code and animated visual examples for our methods.

  1. Bayesian Lensing Shear Measurement

    CERN Document Server

    Bernstein, Gary M

    2013-01-01

    We derive an estimator of weak gravitational lensing shear from background galaxy images that avoids noise-induced biases through a rigorous Bayesian treatment of the measurement. The Bayesian formalism requires a prior describing the (noiseless) distribution of the target galaxy population over some parameter space; this prior can be constructed from low-noise images of a subsample of the target population, attainable from long integrations of a fraction of the survey field. We find two ways to combine this exact treatment of noise with rigorous treatment of the effects of the instrumental point-spread function and sampling. The Bayesian model fitting (BMF) method assigns a likelihood of the pixel data to galaxy models (e.g. Sersic ellipses), and requires the unlensed distribution of galaxies over the model parameters as a prior. The Bayesian Fourier domain (BFD) method compresses galaxies to a small set of weighted moments calculated after PSF correction in Fourier space. It requires the unlensed distributi...

  2. No Customer Left Behind: A Distribution-Free Bayesian Approach to Accounting for Missing Xs in Marketing Models

    OpenAIRE

    Yi Qian; Hui Xie

    2011-01-01

    In marketing applications, it is common that some key covariates in a regression model, such as marketing mix variables or consumer profiles, are subject to missingness. The convenient method that excludes the consumers with missingness in any covariate can result in a substantial loss of efficiency and may lead to strong selection bias in the estimation of consumer preferences and sensitivities. To solve these problems, we propose a new Bayesian distribution-free approach, which can ensure t...

  3. Relative controls of external and internal variability on time-variable transit time distributions, and the importance of StorAge Selection function approaches

    Science.gov (United States)

    Kim, M.; Pangle, L. A.; Cardoso, C.; Lora, M.; Meira, A.; Volkmann, T. H. M.; Wang, Y.; Harman, C. J.; Troch, P. A. A.

    2015-12-01

    Transit time distributions (TTDs) are an efficient way of characterizing the complex transport dynamics of a hydrologic system. Time-invariant TTDs have been studied extensively, but TTDs are time-varying in unsteady hydrologic systems due to both external variability (e.g., time-variability in fluxes) and internal variability (e.g., time-varying flow pathways). The use of "flow-weighted time" has been suggested to account for the effect of external variability on TTDs, but it neglects the role of internal variability. Recently, to account for both types of variability, StorAge Selection (SAS) function approaches were developed. One of these approaches enables the transport characteristics of a system - how the different aged water in storage is sampled by the outflow - to be parameterized by a time-variable probability distribution called the rank SAS (rSAS) function, and uses it directly to determine the time-variable TTDs resulting from a given timeseries of fluxes in and out of a system. Unlike TTDs, the form of the rSAS function varies only due to changes in flow pathways, and is not affected by the timing of fluxes alone. However, the relations between physical mechanisms and time-varying rSAS functions are not well understood. In this study, the relative effects of internal and external variability on the TTDs are examined using observations from a homogeneously packed 1 m3 sloping soil lysimeter. The observations suggest the importance of internal variability on TTDs, and reinforce the need to account for this variability using time-variable rSAS functions. Furthermore, the relative usefulness of two other formulations of SAS functions and the mortality rate (which plays a similar role to SAS functions in the McKendrick-von Foerster model of age-structured population dynamics) are also discussed. Finally, numerical modeling is used to explore the role of internal and external variability for hydrologic systems with diverse geomorphic and climate characteristics.

  4. Bayesian Image Reconstruction Based on Voronoi Diagrams

    CERN Document Server

    Cabrera, G F; Hitschfeld, N

    2007-01-01

    We present a Bayesian Voronoi image reconstruction technique (VIR) for interferometric data. Bayesian analysis applied to the inverse problem allows us to derive the a-posteriori probability of a novel parameterization of interferometric images. We use a variable Voronoi diagram as our model in place of the usual fixed pixel grid. A quantization of the intensity field allows us to calculate the likelihood function and a-priori probabilities. The Voronoi image is optimized including the number of polygons as free parameters. We apply our algorithm to deconvolve simulated interferometric data. Residuals, restored images and chi^2 values are used to compare our reconstructions with fixed grid models. VIR has the advantage of modeling the image with few parameters, obtaining a better image from a Bayesian point of view.

  5. An entropy-based input variable selection approach to identify equally informative subsets for data-driven hydrological models

    Science.gov (United States)

    Karakaya, Gulsah; Taormina, Riccardo; Galelli, Stefano; Damla Ahipasaoglu, Selin

    2015-04-01

    Input Variable Selection (IVS) is an essential step in hydrological modelling problems, since it allows determining the optimal subset of input variables from a large set of candidates to characterize a preselected output. Interestingly, most of the existing IVS algorithms select a single subset, or, at most, one subset of input variables for each cardinality level, thus overlooking the fact that, for a given cardinality, there can be several subsets with similar information content. In this study, we develop a novel IVS approach specifically conceived to account for this issue. The approach is based on the formulation of a four-objective optimization problem that aims at minimizing the number of selected variables and maximizing the prediction accuracy of a data-driven model, while optimizing two entropy-based measures of relevance and redundancy. The redundancy measure ensures that the cross-dependence between the variables in a subset is minimized, while the relevance measure guarantees that the information content of each subset is maximized. In addition to the capability of selecting equally informative subsets, the approach is characterized by two other properties, namely 1) the capability of handling nonlinear interactions between the candidate input variables and preselected output, and 2) computational efficiency. These properties are guaranteed by the adoption of Extreme Learning Machine and Borg MOEA as data-driven model and heuristic optimization procedure, respectively. The approach is demonstrated on a long-term streamflow prediction problem, with the input dataset including both hydro-meteorological variables and climate indices representing dominant modes of climate variability. Results show that the availability of several equally informative subsets allows 1) determining the relative importance of each candidate input, thus supporting the understanding of the underlying physical processes, and 2) finding a better trade-off between multiple
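
    A hedged miniature of the two entropy-based ingredients (the full method embeds them, together with an Extreme Learning Machine, in a four-objective Borg MOEA search): kNN mutual-information estimates score each candidate's relevance to the output and its redundancy with an already selected input. The synthetic data and the mRMR-style combination below are assumptions, not the paper's exact procedure:

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 6))                        # candidate inputs
    y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

    relevance = mutual_info_regression(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]               # greedily pick most relevant
    # redundancy of every candidate with the selected input
    redundancy = mutual_info_regression(X, X[:, selected[0]], random_state=0)

    score = relevance - redundancy                       # relevance/redundancy trade-off
    score[selected] = -np.inf                            # exclude already chosen inputs
    print("next best input:", int(np.argmax(score)))
    ```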

  6. Disruption of Brewers' yeast by hydrodynamic cavitation: Process variables and their influence on selective release.

    Science.gov (United States)

    Balasundaram, B; Harrison, S T L

    2006-06-01

    Intracellular products, not secreted from the microbial cell, are released by breaking the cell envelope consisting of the cytoplasmic membrane and an outer cell wall. Hydrodynamic cavitation has been reported to cause microbial cell disruption. By manipulating the operating variables involved, a wide range of cavitation intensities can be achieved, resulting in a varying extent of disruption. The effect of the process variables, including cavitation number, initial cell concentration of the suspension and the number of passes across the cavitation zone, on the release of enzymes from various locations of the Brewers' yeast was studied. The release profiles of the enzymes studied include alpha-glucosidase (periplasmic), invertase (cell wall bound), alcohol dehydrogenase (ADH; cytoplasmic) and glucose-6-phosphate dehydrogenase (G6PDH; cytoplasmic). An optimum cavitation number Cv of 0.13 for maximum disruption was observed across the range Cv 0.09-0.99. The optimum cell concentration was found to be 0.5% (w/v, wet wt) when varying over the range 0.1%-5%. The sustained effect of cavitation on the yeast cell wall when re-circulating the suspension across the cavitation zone was found to release the cell wall bound enzyme invertase (86%) to a greater extent than the enzymes from other locations of the cell (e.g. periplasmic alpha-glucosidase at 17%). Localised damage to the cell wall could be observed using transmission electron microscopy (TEM) of cells subjected to less intense cavitation conditions. Absence of the release of cytoplasmic enzymes to a significant extent, absence of micronisation as observed by TEM and presence of a lower number of protein bands in the culture supernatant on SDS-PAGE analysis following hydrodynamic cavitation compared to disruption by high-pressure homogenisation confirmed the selective release offered by hydrodynamic cavitation.
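
    For reference, a commonly used definition of the cavitation number in orifice-driven hydrodynamic cavitation (an assumption here; this record does not give the authors' exact definition) is Cv = (P2 - Pv) / (0.5 ρ v²), with P2 the recovered downstream pressure, Pv the liquid vapour pressure and v the throat velocity. A small helper:

    ```python
    def cavitation_number(p_downstream, p_vapour, density, throat_velocity):
        """Cv = (P2 - Pv) / (0.5 * rho * v**2); all pressures in Pa, SI units."""
        return (p_downstream - p_vapour) / (0.5 * density * throat_velocity ** 2)

    # Water at ~25 C: Pv ~ 3.17 kPa, rho ~ 997 kg/m3; throat velocity is illustrative.
    print(f"Cv = {cavitation_number(101_325, 3_170, 997.0, 38.0):.2f}")  # ~0.14
    ```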

  7. PLS-Based and Regularization-Based Methods for the Selection of Relevant Variables in Non-targeted Metabolomics Data.

    Science.gov (United States)

    Bujak, Renata; Daghir-Wojtkowiak, Emilia; Kaliszan, Roman; Markuszewski, Michał J

    2016-01-01

    Non-targeted metabolomics constitutes a part of systems biology and aims at determining numerous metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as the complexity of biological matrices. Therefore, a proper selection of the variables which contribute to group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study). The orthogonal projections to latent structures-discriminant analysis (OPLS-DA), without and with multiple testing correction, as well as the least absolute shrinkage and selection operator (LASSO) with bootstrapping, were tested and compared. For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on the VIP criteria using Pareto and UV scaling, respectively. For the PH study, 217 and 320 variables were selected based on the VIP criteria using Pareto and UV scaling, respectively. In the RH study, the OPLS-DA model built after correcting for multiple testing selected 4 and 19 variables under Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on the VIP criteria under Pareto and UV scaling, respectively. In the RH and PH studies, the LASSO selected 14 and 4 variables with reproducibility between 99.3 and 100%, respectively. For PLS-based models, the larger the search space, the higher the probability of developing models that fit the training data well with simultaneously poor predictive performance on the validation set. The LASSO offers potential improvements over standard linear regression due to the presence of the constraint, which promotes sparse solutions. This paper is the first one to date
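
    A hedged sketch of the bootstrap-stability idea on synthetic two-class data (here "LASSO" is realized as L1-penalized logistic regression, and the 80% selection-frequency cutoff is an illustrative choice, not the paper's pipeline):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=120, n_features=200, n_informative=6,
                               random_state=0)
    rng = np.random.default_rng(0)
    n_boot = 100
    freq = np.zeros(X.shape[1])

    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))       # bootstrap resample
        model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        model.fit(X[idx], y[idx])
        freq += (model.coef_.ravel() != 0)               # count selections

    stable = np.flatnonzero(freq / n_boot >= 0.8)        # kept in >= 80% of runs
    print("stable variables:", stable)
    ```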

  8. Malicious Bayesian Congestion Games

    CERN Document Server

    Gairing, Martin

    2008-01-01

    In this paper, we introduce malicious Bayesian congestion games as an extension of congestion games in which players might act maliciously. In such a game each player has two types. Either the player is a rational player seeking to minimize her own delay, or - with a certain probability - the player is malicious, in which case her only goal is to disturb the other players as much as possible. We show that such games do not in general possess a Bayesian Nash equilibrium in pure strategies (i.e. a pure Bayesian Nash equilibrium). Moreover, given a game, we show that it is NP-complete to decide whether it admits a pure Bayesian Nash equilibrium. This result even holds when resource latency functions are linear, each player is malicious with the same probability, and all strategy sets consist of singleton sets. For a slightly more restricted class of malicious Bayesian congestion games, we provide easily checkable properties that are necessary and sufficient for the existence of a pure Bayesian Nash equilibrium....

  9. Dimensionality reduction in Bayesian estimation algorithms

    Directory of Open Access Journals (Sweden)

    G. W. Petty

    2013-03-01

    Full Text Available An idealized synthetic database loosely resembling 3-channel passive microwave observations of precipitation against a variable background is employed to examine the performance of a conventional Bayesian retrieval algorithm. For this dataset, algorithm performance is found to be poor owing to an irreconcilable conflict between the need to find matches in the dependent database versus the need to exclude inappropriate matches. It is argued that the likelihood of such conflicts increases sharply with the dimensionality of the observation space of real satellite sensors, which may utilize 9 to 13 channels to retrieve precipitation, for example. An objective method is described for distilling the relevant information content from N real channels into a much smaller number (M) of pseudochannels while also regularizing the background (geophysical plus instrument noise) component. The pseudochannels are linear combinations of the original N channels obtained via a two-stage principal component analysis of the dependent dataset. Bayesian retrievals based on a single pseudochannel applied to the independent dataset yield striking improvements in overall performance. The differences between the conventional Bayesian retrieval and reduced-dimensional Bayesian retrieval suggest that a major potential problem with conventional multichannel retrievals – whether Bayesian or not – lies in the common but often inappropriate assumption of diagonal error covariance. The dimensional reduction technique described herein avoids this problem by, in effect, recasting the retrieval problem in a coordinate system in which the desired covariance is lower-dimensional, diagonal, and unit magnitude.
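
    A rough numerical sketch of the two-stage idea on synthetic data (the real algorithm operates on the dependent retrieval database): stage 1 whitens the background covariance so the noise becomes diagonal and unit-magnitude, and stage 2 keeps the M leading principal components of the whitened observations as pseudochannels:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 9, 1
    background = rng.normal(size=(2000, N)) @ rng.normal(size=(N, N))  # correlated noise
    signal = np.outer(rng.normal(size=2000), rng.normal(size=N))       # rank-1 signal
    obs = signal + background

    # Stage 1: whiten using the background covariance
    w, V = np.linalg.eigh(np.cov(background.T))
    whiten = V / np.sqrt(w)                  # scale eigenvectors by 1/sqrt(eigenvalue)
    obs_w = obs @ whiten                     # background now has ~identity covariance

    # Stage 2: PCA in whitened space; keep the M leading components
    w2, V2 = np.linalg.eigh(np.cov(obs_w.T))
    pseudo = obs_w @ V2[:, -M:]              # M pseudochannels with unit background noise
    print(pseudo.shape)
    ```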

  10. Bayesian Methods and Universal Darwinism

    Science.gov (United States)

    Campbell, John

    2009-12-01

    Bayesian methods since the time of Laplace have been understood by their practitioners as closely aligned to the scientific method. Indeed a recent champion of Bayesian methods, E. T. Jaynes, titled his textbook on the subject Probability Theory: the Logic of Science. Many philosophers of science including Karl Popper and Donald Campbell have interpreted the evolution of science as a Darwinian process consisting of a `copy with selective retention' algorithm abstracted from Darwin's theory of Natural Selection. Arguments are presented for an isomorphism between Bayesian methods and Darwinian processes. Universal Darwinism, as the term has been developed by Richard Dawkins, Daniel Dennett and Susan Blackmore, is the collection of scientific theories which explain the creation and evolution of their subject matter as due to the operation of Darwinian processes. These subject matters span the fields of atomic physics, chemistry, biology and the social sciences. The principle of Maximum Entropy states that systems will evolve to states of highest entropy subject to the constraints of scientific law. This principle may be inverted to provide illumination as to the nature of scientific law. Our best cosmological theories suggest the universe contained much less complexity during the period shortly after the Big Bang than it does at present. The scientific subject matter of atomic physics, chemistry, biology and the social sciences has been created since that time. An explanation is proposed for the existence of this subject matter as due to the evolution of constraints, in the form of adaptations, imposed on Maximum Entropy. It is argued these adaptations were discovered and instantiated through the operation of a succession of Darwinian processes.

  11. The effects of selective breeding against scrapie susceptibility on the genetic variability of the Latxa Black-Faced sheep breed

    OpenAIRE

    Legarra Andrés; Parada Analia; Alfonso Leopoldo; Ugarte Eva; Arana Ana

    2006-01-01

    Abstract Breeding sheep populations for scrapie resistance could result in a loss of genetic variability. In this study, the effect on genetic variability of selection for increasing the ARR allele frequency was estimated in the Latxa breed. Two sources of information were used, pedigree and genetic polymorphisms (fifteen microsatellites). The results based on the genealogical information were conditioned by a low pedigree completeness level that revealed the interest of also using the inform...

  12. Variable Selection for Functional Logistic Regression in fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Nedret BILLOR

    2015-03-01

    Full Text Available This study was motivated by a classification problem in Functional Magnetic Resonance Imaging (fMRI), a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects, and we assume that there is a smooth underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in the statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the best things about the fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique which tackles the high dimensionality and correlation problems in fMRI data, based on L1 regularization (group lasso) for the functional logistic regression model, where the response is binary and represents two separate classes and the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
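
    For intuition, here is a minimal proximal-gradient sketch of group-lasso-penalized logistic regression on synthetic (non-functional) data; the fixed step size, penalty weight and group layout are illustrative assumptions, not the paper's estimator:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 30
    groups = np.repeat(np.arange(6), 5)              # 6 groups of 5 coefficients
    X = rng.normal(size=(n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = 1.5                              # only group 0 is active
    y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

    lam, step, beta = 0.05, 0.01, np.zeros(p)
    for _ in range(2000):
        prob = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (prob - y) / n                  # logistic-loss gradient
        z = beta - step * grad
        for g in range(groups.max() + 1):            # block soft-thresholding (prox step)
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            shrink = max(0.0, 1 - step * lam * np.sqrt(idx.sum()) / norm) if norm > 0 else 0.0
            beta[idx] = shrink * z[idx]

    print("active groups:", np.unique(groups[np.abs(beta) > 1e-6]))
    ```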

  13. Resiliency and subjective health assessment. Moderating role of selected psychosocial variables

    Directory of Open Access Journals (Sweden)

    Michalina Sołtys

    2015-12-01

    Full Text Available Background Resiliency is defined as a relatively permanent personality trait which may be assigned to the category of health resources. The aim of this study was to determine the conditions under which resiliency constitutes a significant health resource (moderation), thereby broadening knowledge of the specifics of the relationship between resiliency and subjective health assessment. Participants and procedure The study included 142 individuals. In order to examine the level of resiliency, the Assessment Resiliency Scale (SPP-25) by N. Ogińska-Bulik and Z. Juczyński was used. Participants evaluated their subjective health state by means of an analogue-visual scale. Additionally, the following moderating variables were controlled: sex, objective health status, having a partner, professional activity and age. These data were obtained by personal survey. Results The results confirmed the relationship between resiliency and subjective health assessment. Multiple regression analysis revealed that sex, having a partner and professional activity are significant moderators of the association between level of resiliency and subjective health evaluation. However, statistically significant interaction effects with health status and age as moderators were not observed. Conclusions Resiliency is associated with subjective health assessment among adults, and selected socio-demographic features (such as sex, having a partner and professional activity) moderate this relationship. This confirms the significant role of resiliency as a health resource and is a reason to emphasize the benefits of enhancing the potential of individuals for their psychophysical wellbeing. However, the research requires replication in a more homogeneous sample.
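
    Moderation of this kind is typically tested with an interaction term in a regression model; a toy sketch with hypothetical variable names on simulated data (not the study's):

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 142
    df = pd.DataFrame({
        "resiliency": rng.normal(size=n),
        "active": rng.integers(0, 2, size=n),        # 1 = professionally active
    })
    df["health"] = (0.3 * df["resiliency"] + 0.4 * df["active"]
                    + 0.5 * df["resiliency"] * df["active"]   # built-in moderation
                    + rng.normal(size=n))

    # "resiliency * active" expands to both main effects plus their interaction;
    # a significant interaction coefficient indicates moderation.
    model = smf.ols("health ~ resiliency * active", data=df).fit()
    print(model.params)
    ```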

  14. Variation in Age and Training on Selected Biochemical Variables of Indian Hockey Players

    Directory of Open Access Journals (Sweden)

    I. Manna

    2010-04-01

    Full Text Available The present study aimed to determine the effects of age and training on selected biochemical variables of Indian elite hockey players. A total of 120 hockey players who volunteered for the present study were equally divided (n=30) into 4 groups: under 16 years (14-15 yrs); under 19 years (16-18 yrs); under 23 years (19-22 yrs); and senior (23-30 yrs). The training sessions were divided into 3 phases: Transition Phase (TP), Preparatory Phase (PP), and Competitive Phase (CP). The training programme consisted of aerobic, anaerobic and skill training, and comprised 4 hours in morning and evening sessions, 5 days/week. Selected biochemical parameters were measured and data were analyzed by applying two-way ANOVA and post hoc tests. The mean values of haemoglobin (Hb), total cholesterol (TC), triglyceride (TG), high density lipoprotein cholesterol (HDL-C) and low density lipoprotein cholesterol (LDL-C) increased significantly (P<0.05) with the advancement of age of the players. A significant increase (P<0.05) in serum urea, uric acid and HDL-C and a significant decrease (P<0.05) in Hb, TC, TG and LDL-C were noted in PP and CP when compared to TP. The present study provides useful information for the biochemical monitoring of training of hockey players.

  15. A Bayesian Concept Learning Approach to Crowdsourcing

    DEFF Research Database (Denmark)

    Viappiani, Paolo Renato; Zilles, Sandra; Hamilton, Howard J.;

    2011-01-01

    We develop a Bayesian approach to concept learning for crowdsourcing applications. A probabilistic belief over possible concept definitions is maintained and updated according to (noisy) observations from experts, whose behaviors are modeled using discrete types. We propose recommendation techniques, inference methods, and query selection strategies to assist a user charged with choosing a configuration that satisfies some (partially known) concept. Our model is able to simultaneously learn the concept definition and the types of the experts. We evaluate our model with simulations, showing that our Bayesian strategies are effective even in large concept spaces with many uninformative experts.

  16. Comparative performance of selected variability detection techniques in photometric time series data

    CERN Document Server

    Sokolovsky, K V; Karampelas, A; Antipin, S V; Bellas-Velidis, I; Benni, P; Bonanos, A Z; Burdanov, A Y; Derlopa, S; Hatzidimitriou, D; Khokhryakova, A D; Kolesnikova, D M; Korotkiy, S A; Lapukhin, E G; Moretti, M I; Popov, A A; Pouliasis, E; Samus, N N; Spetsieri, Z; Veselkov, S A; Volkov, K V; Yang, M; Zubareva, A M

    2016-01-01

    Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampli...

  17. A Bayesian analysis of sensible heat flux estimation: Quantifying uncertainty in meteorological forcing to improve model prediction

    KAUST Repository

    Ershadi, Ali

    2013-05-01

    The influence of uncertainty in land surface temperature, air temperature, and wind speed on the estimation of sensible heat flux is analyzed using a Bayesian inference technique applied to the Surface Energy Balance System (SEBS) model. The Bayesian approach allows for an explicit quantification of the uncertainties in input variables: a source of error generally ignored in surface heat flux estimation. An application using field measurements from the Soil Moisture Experiment 2002 is presented. The spatial variability of selected input meteorological variables in a multitower site is used to formulate the prior estimates for the sampling uncertainties, and the likelihood function is formulated assuming Gaussian errors in the SEBS model. Land surface temperature, air temperature, and wind speed were estimated by sampling their posterior distribution using a Markov chain Monte Carlo algorithm. Results verify that Bayesian-inferred air temperature and wind speed were generally consistent with those observed at the towers, suggesting that local observations of these variables were spatially representative. Uncertainties in the land surface temperature appear to have the strongest effect on the estimated sensible heat flux, with Bayesian-inferred values differing by up to ±5°C from the observed data. These differences suggest that the footprint of the in situ measured land surface temperature is not representative of the larger-scale variability. As such, these measurements should be used with caution in the calculation of surface heat fluxes and highlight the importance of capturing the spatial variability in the land surface temperature: particularly, for remote sensing retrieval algorithms that use this variable for flux estimation.
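
    The posterior sampling step can be caricatured in a few lines: a random-walk Metropolis sampler for a single uncertain forcing variable with a Gaussian prior (standing in for spatial variability) and a Gaussian likelihood. All numbers are invented; the actual study samples several variables jointly through the SEBS model:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    obs, obs_sigma = 24.3, 0.8          # tower observation and its assumed error
    prior_mu, prior_sigma = 25.0, 2.0   # prior from multitower spatial variability

    def log_post(t):
        # log prior + log likelihood, up to an additive constant
        return (-0.5 * ((t - prior_mu) / prior_sigma) ** 2
                - 0.5 * ((obs - t) / obs_sigma) ** 2)

    samples, t = [], prior_mu
    for _ in range(20000):
        prop = t + rng.normal(scale=0.5)                 # random-walk proposal
        if np.log(rng.random()) < log_post(prop) - log_post(t):
            t = prop                                     # accept
        samples.append(t)

    print(f"posterior mean T_air = {np.mean(samples[5000:]):.2f} C")  # burn-in dropped
    ```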

  18. Elicitation of prior distributions of variable-selection problems in regression

    OpenAIRE

    Garthwaite, Paul H.; Dickey, James M.

    1992-01-01

    This paper addresses the problem of quantifying expert opinion about a normal linear regression model when there is uncertainty as to which independent variables should be included in the model. Opinion is modeled as a mixture of natural conjugate prior distributions with each distribution in the mixture corresponding to a different subset of the independent variables. It is shown that for certain values of the independent variables, the predictive distribution of the dependent variable simpl...

  19. Variational Bayesian Inference of Line Spectra

    DEFF Research Database (Denmark)

    Badiu, Mihai Alin; Hansen, Thomas Lundgaard; Fleury, Bernard Henri

    2016-01-01

    In this paper, we address the fundamental problem of line spectral estimation in a Bayesian framework. We target model order and parameter estimation via variational inference in a probabilistic model in which the frequencies are continuous-valued, i.e., not restricted to a grid, and the coefficients are governed by a Bernoulli-Gaussian prior model turning model order selection into binary sequence detection. Unlike earlier works which retain only point estimates of the frequencies, we undertake a more complete Bayesian treatment by estimating the posterior probability density functions (pdfs...

  20. Comparative performance of selected variability detection techniques in photometric time series data

    Science.gov (United States)

    Sokolovsky, K. V.; Gavras, P.; Karampelas, A.; Antipin, S. V.; Bellas-Velidis, I.; Benni, P.; Bonanos, A. Z.; Burdanov, A. Y.; Derlopa, S.; Hatzidimitriou, D.; Khokhryakova, A. D.; Kolesnikova, D. M.; Korotkiy, S. A.; Lapukhin, E. G.; Moretti, M. I.; Popov, A. A.; Pouliasis, E.; Samus, N. N.; Spetsieri, Z.; Veselkov, S. A.; Volkov, K. V.; Yang, M.; Zubareva, A. M.

    2016-09-01

    Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1 m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampling patterns, photometric accuracies, and percentages of outlier measurements. The first index is the interquartile range (IQR) of magnitude measurements, sensitive to variability irrespective of a timescale and resistant to outliers. It can be complemented by the ratio of the lightcurve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on timescales longer than the typical time interval between observations. Variable objects have larger 1/η and/or IQR values than non-variable objects of similar brightness. Another approach to variability detection is to combine many variability indices using principal component analysis. We present 124 previously unknown variable stars found in the test data.
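
    Both recommended indices are straightforward to compute from a single lightcurve; a toy sketch on synthetic magnitudes (note that η is the von Neumann ratio, so 1/η is the variance divided by the mean square successive difference):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    mag = 15.0 + 0.02 * rng.normal(size=300)             # photometric noise
    mag += 0.1 * np.sin(np.linspace(0, 4 * np.pi, 300))  # slow variability signal

    q75, q25 = np.percentile(mag, [75, 25])
    iqr = q75 - q25                                      # outlier-resistant scatter

    msd = np.mean(np.diff(mag) ** 2)                     # mean square successive difference
    inv_eta = np.var(mag) / msd                          # 1/eta: long-timescale correlation

    print(f"IQR = {iqr:.3f} mag, 1/eta = {inv_eta:.2f}")
    ```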

  1. Variability Selected Low-Luminosity Active Galactic Nuclei in the 4 Ms Chandra Deep Field-South

    Science.gov (United States)

    Young, M.; Brandt, W. N.; Xue, Y. Q.; Paolillo, M.; Alexander, D. M.; Bauer, F. E.; Lehmer, B. D.; Luo, B.; Shemmer, O.; Schneider, D. P.; Vignali, C.

    2012-01-01

    The 4 Ms Chandra Deep Field-South (CDF-S) and other deep X-ray surveys have been highly effective at selecting active galactic nuclei (AGN). However, cosmologically distant low-luminosity AGN (LLAGN) have remained a challenge to identify due to significant contribution from the host galaxy. We identify long-term X-ray variability (≈ months-years, observed frame) in 20 of 92 CDF-S galaxies spanning redshifts ≈ 0.08-1.02 that do not meet other AGN selection criteria. We show that the observed variability cannot be explained by X-ray binary populations or ultraluminous X-ray sources, so the variability is most likely caused by accretion onto a supermassive black hole. The variable galaxies are not heavily obscured in general, with a stacked effective power-law photon index of Γ_stack ≈ 1.93 ± 0.13, and are therefore likely LLAGN. The LLAGN tend to lie a factor of ≈ 6-89 below the extrapolated linear variability-luminosity relation measured for luminous AGN. This may be explained by their lower accretion rates. Variability-independent black-hole mass and accretion-rate estimates for the variable galaxies show that they sample a significantly different black-hole mass-accretion-rate space, with masses a factor of 2.4 lower and accretion rates a factor of 22.5 lower than variable luminous AGNs at the same redshift. We find that an empirical model based on a universal broken power-law power spectral density function, where the break frequency depends on SMBH mass and accretion rate, roughly reproduces the shape, but not the normalization, of the variability-luminosity trends measured for variable galaxies and more luminous AGNs.

  2. Bayesian simultaneous equation models for the analysis of energy intake and partitioning in growing pigs

    DEFF Research Database (Denmark)

    Strathe, Anders Bjerring; Jørgensen, Henry; Kebreab, E;

    2012-01-01

    ABSTRACT SUMMARY The objective of the current study was to develop Bayesian simultaneous equation models for modelling energy intake and partitioning in growing pigs. A key feature of the Bayesian approach is that parameters are assigned prior distributions, which may reflect the current state of nature. In the models, rates of metabolizable energy (ME) intake, protein deposition (PD) and lipid deposition (LD) were treated as dependent variables accounting for residuals being correlated. Two complementary equation systems were used to model ME intake (MEI), PD and LD. Informative priors were ... genders (barrows, boars and gilts) selected on the basis of similar birth weight. The pigs were fed four diets based on barley, wheat and soybean meal supplemented with crystalline amino acids to meet or exceed Danish nutrient requirement standards. Nutrient balances and gas exchanges were measured at c

  3. Bayesian experimental design for the active nitridation of graphite by atomic nitrogen

    CERN Document Server

    Terejanu, Gabriel; Miki, Kenji

    2011-01-01

    The problem of optimal data collection to efficiently learn the model parameters of a graphite nitridation experiment is studied in the context of Bayesian analysis using both synthetic and real experimental data. The paper emphasizes that the optimal design can be obtained as a result of an information-theoretic sensitivity analysis. Thus, the preferred design is the one where the statistical dependence between the model parameters and observables is the highest possible. In this paper, the statistical dependence between random variables is quantified by mutual information and estimated using a k-nearest-neighbor-based approximation. It is shown that, by monitoring the inference process via measures such as entropy or Kullback-Leibler divergence, one can determine when to stop the data collection process. The methodology is applied to select the most informative designs on both a simulated data set and an experimental data set previously published in the literature. It is also shown that the sequential Bayesian ...

  4. Bayesian synthetic evaluation of multistage reliability growth with instant and delayed fix modes

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    In multistage reliability growth tests with instant and delayed fix modes, the failure data can be assumed to follow Weibull processes with different parameters at different stages. For the Weibull process within a stage, a concise posterior distribution form is obtained by proper selection of the prior distribution form and its parameters, thus simplifying the Bayesian analysis. In the multistage tests, an improvement factor is used to convert the posterior of one stage into the prior of the subsequent stage. The conversion criterion is carefully analyzed to determine the distribution parameters of the subsequent stage's variable reasonably. Based on these results, a new synthetic Bayesian evaluation program and algorithm framework is put forward to evaluate multistage reliability growth tests with instant and delayed fix modes. An example shows the effectiveness and flexibility of this method.

  5. Python Environment for Bayesian Learning: Inferring the Structure of Bayesian Networks from Knowledge and Data.

    Science.gov (United States)

    Shah, Abhik; Woolf, Peter

    2009-06-01

    In this paper, we introduce pebl, a Python library and application for learning Bayesian network structure from data and prior knowledge that provides features unmatched by alternative software packages: the ability to use interventional data, flexible specification of structural priors, modeling with hidden variables and exploitation of parallel processing. PMID:20161541

  6. Variability of levels of PM, black carbon and particle number concentration in selected European cities

    Directory of Open Access Journals (Sweden)

    C. Reche

    2011-03-01

    Full Text Available In many large cities of Europe standard air quality limit values of particulate matter (PM) are exceeded. Emissions from road traffic and biomass burning are frequently reported to be the major causes. As a consequence of these exceedances a large number of air quality plans, most of them focusing on traffic emissions reductions, have been implemented in the last decade. In spite of this implementation, a number of cities did not record a decrease of PM levels. Thus, is the efficiency of air quality plans overestimated? Or do we need a more specific metric to evaluate the impact of the above emissions on the levels of urban aerosols?

    This study shows the results of the interpretation of the 2009 variability of levels of PM, black carbon (BC), aerosol number concentration (N) and a number of gaseous pollutants in seven selected urban areas covering road traffic, urban background, urban-industrial, and urban-shipping environments from southern, central and northern Europe.

    The results showed that variations of PM and N levels do not always reflect the variation of the impact of road traffic emissions on urban aerosols. However, BC levels vary proportionally with those of traffic-related gaseous pollutants, such as CO, NO2 and NO. Due to this high correlation, one may suppose that monitoring the levels of these gaseous pollutants would be enough to extrapolate exposure to traffic-derived BC levels. However, the BC/CO, BC/NO2 and BC/NO ratios vary widely among the cities studied, as a function of distance to traffic emissions, vehicle fleet composition and the influence of other emission sources such as biomass burning. Thus, levels of BC should be measured at air quality monitoring sites.

    During traffic rush hours, a narrow variation in the N/BC ratio was evidenced, but a wide variation of this ratio was determined for the noon period. Although in central and northern Europe N and BC levels tend to vary

  7. Network-based group variable selection for detecting expression quantitative trait loci (eQTL

    Directory of Open Access Journals (Sweden)

    Zhang Xuegong

    2011-06-01

    Full Text Available Abstract Background Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression levels of genes. Penalized regression with a proper penalty is suitable for high-dimensional biological data. Its performance should be enhanced when we incorporate biological knowledge of the gene expression network and the linkage disequilibrium (LD) structure between loci in a high-noise background. Results We propose a network-based group variable selection (NGVS) method for QTL detection. Our method simultaneously maps highly correlated expression traits sharing the same biological function to marker sets formed by LD. By grouping markers, the complex joint activity of multiple SNPs can be considered and the dimensionality of the eQTL problem is reduced dramatically. In order to demonstrate the power and flexibility of our method, we used it to analyze two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets and treated the additive and dominant effects of each locus as a group: as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions of multiple SNPs. Conclusions The proposed NGVS method is appropriate for problems with high-dimensional data and a high-noise background. On the eQTL problem it outperforms the classical Lasso method, which does not consider biological knowledge. The introduction of proper gene expression and loci correlation information makes the detection of causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.

  8. QUASI-STELLAR OBJECT SELECTION ALGORITHM USING TIME VARIABILITY AND MACHINE LEARNING: SELECTION OF 1620 QUASI-STELLAR OBJECT CANDIDATES FROM MACHO LARGE MAGELLANIC CLOUD DATABASE

    International Nuclear Information System (INIS)

    We present a new quasi-stellar object (QSO) selection algorithm using a Support Vector Machine, a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events using 58 known QSOs, 1629 variable stars, and 4288 non-variables in the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ∼80% of known QSOs with a 25% false-positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) data set, which consists of 40 million light curves, and found 1620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false-positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
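
    The workflow the record describes (per-light-curve feature extraction, supervised SVM training, cross-validation) maps onto a few lines of scikit-learn. The arrays below are random stand-ins for the MACHO features and labels; only the pipeline shape is meant to be illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: rows = light curves, columns = extracted time-series
# features (e.g. period, amplitude, color, autocorrelation value).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
y = rng.integers(0, 2, size=500)          # 1 = QSO, 0 = non-QSO (hypothetical labels)

# RBF-kernel SVM; class_weight="balanced" guards against the heavy class
# imbalance noted in the record (58 QSOs vs. thousands of other sources).
model = make_pipeline(StandardScaler(),
                      SVC(kernel="rbf", class_weight="balanced"))
print(cross_val_score(model, X, y, cv=5).mean())
```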

  9. Computational statistics using the Bayesian Inference Engine

    CERN Document Server

    Weinberg, Martin D

    2012-01-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel-optimised software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organise and reuse expensive derived data. I describe key concepts that illustrate the power of Bayesian inference to address these needs and outline the computational challenge. The techniques presented are based on experience gained in modelling star-counts and stellar populations, analysing the morphology of galaxy images, and performing Bayesian investigations of semi-analytic models of galaxy formation. These inference problems require advanced Markov chain Monte Carlo (MCMC) algorithms that expedite sampling, mixing, and the analysis of the Bayesian posterior distribution. The BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. By providing a variety of statistical algorithms for all phases of the inference problem, a u...

  10. A Selective Bayesian Classifier Based on Change of Class Relevance Influence

    Institute of Scientific and Technical Information of China (English)

    程玉虎; 仝瑶瑶; 王雪松

    2011-01-01

    A selective Bayesian classifier based on change of class relevance influence (CCRI SBC) was proposed by introducing a regulator factor into an attribute selection method, namely maximum relevance and minimum redundancy (mRMR). The regulator factor is used to change the degree of influence that class relevance has on attribute selection, which avoids the redundant attributes that mRMR can introduce. In addition, a Bayesian information criterion is used to determine the optimal number of attributes automatically, which overcomes the arbitrariness of classification results easily caused by setting the number of attributes manually. To further make CCRI SBC applicable to continuous data, a discretization method, i.e., equal-frequency class-attribute interdependence maximization, is proposed, which has the advantages of high classification accuracy and short discretization time. Experimental results on UCI datasets show that the proposed method can effectively handle the classification of discrete or continuous, high-dimensional data.
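
    The regulator-factor idea can be sketched as a greedy mRMR search whose class-relevance term is scaled by a factor alpha. The function below is an illustrative reconstruction, not the paper's code; the mutual-information estimators and the toy alpha value are assumptions, and the number of attributes k is passed in directly where the paper selects it by BIC.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def ccri_mrmr(X, y, k, alpha=1.0):
    """Greedy mRMR with a regulator factor alpha on class relevance (sketch)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in remaining:
            redundancy = 0.0
            if selected:
                # Mean MI between candidate j and already-selected attributes.
                redundancy = np.mean([
                    mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                    for s in selected])
            score = alpha * relevance[j] - redundancy   # alpha < 1 damps class relevance
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected
```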

  11. Bayesian least squares deconvolution

    CERN Document Server

    Ramos, A Asensio

    2015-01-01

    Aims. To develop a fully Bayesian least squares deconvolution (LSD) that can be applied to the reliable detection of magnetic signals in noise-limited stellar spectropolarimetric observations using multiline techniques. Methods. We consider LSD under the Bayesian framework and we introduce a flexible Gaussian Process (GP) prior for the LSD profile. This prior allows the result to automatically adapt to the presence of signal. We exploit several linear algebra identities to accelerate the calculations. The final algorithm can deal with thousands of spectral lines in a few seconds. Results. We demonstrate the reliability of the method with synthetic experiments and we apply it to real spectropolarimetric observations of magnetic stars. We are able to recover the magnetic signals using a small number of spectral lines, together with the uncertainty at each velocity bin. This allows the user to consider if the detected signal is reliable. The code to compute the Bayesian LSD profile is freely available.

  12. Bayesian least squares deconvolution

    Science.gov (United States)

    Asensio Ramos, A.; Petit, P.

    2015-11-01

    Aims: We develop a fully Bayesian least squares deconvolution (LSD) that can be applied to the reliable detection of magnetic signals in noise-limited stellar spectropolarimetric observations using multiline techniques. Methods: We consider LSD under the Bayesian framework and we introduce a flexible Gaussian process (GP) prior for the LSD profile. This prior allows the result to automatically adapt to the presence of signal. We exploit several linear algebra identities to accelerate the calculations. The final algorithm can deal with thousands of spectral lines in a few seconds. Results: We demonstrate the reliability of the method with synthetic experiments and we apply it to real spectropolarimetric observations of magnetic stars. We are able to recover the magnetic signals using a small number of spectral lines, together with the uncertainty at each velocity bin. This allows the user to consider if the detected signal is reliable. The code to compute the Bayesian LSD profile is freely available.
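
    A hedged sketch of the linear-algebra core: with a line-pattern matrix M, white noise of variance sigma^2 and a GP prior z ~ N(0, K) on the profile, the LSD posterior is Gaussian in closed form. The toy M, kernel length scale and noise level below are invented for illustration; the paper's identities (e.g. Woodbury) would move the solve into the smaller velocity space, while here we solve in pixel space for clarity.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_vel = 400, 40
mask = rng.random((n_pix, n_vel)) < 0.05
M = mask * rng.uniform(0.1, 1.0, size=(n_pix, n_vel))    # toy line-pattern (depth) matrix
vel = np.arange(n_vel, dtype=float)
K = np.exp(-0.5 * (vel[:, None] - vel[None, :]) ** 2 / 3.0 ** 2)  # squared-exponential GP prior
sigma = 0.01
z_true = 0.02 * np.exp(-0.5 * (vel - 20) ** 2 / 4.0)     # hidden mean profile
v = M @ z_true + sigma * rng.standard_normal(n_pix)      # observed spectrum

# GP posterior mean and covariance of the LSD profile.
S = M @ K @ M.T + sigma ** 2 * np.eye(n_pix)
z_post = K @ M.T @ np.linalg.solve(S, v)
z_cov = K - K @ M.T @ np.linalg.solve(S, M @ K)          # per-bin uncertainty on the diagonal
print(float(z_post.max()), float(np.sqrt(np.diag(z_cov)).mean()))
```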

  13. Bayesian Adaptive Exploration

    CERN Document Server

    Loredo, T J

    2004-01-01

    I describe a framework for adaptive scientific exploration based on iterating an Observation--Inference--Design cycle that allows adjustment of hypotheses and observing protocols in response to the results of observation on-the-fly, as data are gathered. The framework uses a unified Bayesian methodology for the inference and design stages: Bayesian inference to quantify what we have learned from the available data and predict future data, and Bayesian decision theory to identify which new observations would teach us the most. When the goal of the experiment is simply to make inferences, the framework identifies a computationally efficient iterative ``maximum entropy sampling'' strategy as the optimal strategy in settings where the noise statistics are independent of signal properties. Results of applying the method to two ``toy'' problems with simulated data--measuring the orbit of an extrasolar planet, and locating a hidden one-dimensional object--show the approach can significantly improve observational eff...

  14. Bayesian Exploratory Factor Analysis

    DEFF Research Database (Denmark)

    Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.;

    2014-01-01

    This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...

  15. Learning Bayesian Networks from Data by Particle Swarm Optimization

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Learning a Bayesian network is an NP-hard problem. When the number of variables is large, the process of searching for an optimal network structure can be very time consuming and tends to return a structure which is only locally optimal. Particle swarm optimization (PSO) was introduced to the problem of learning Bayesian networks and a novel structure learning algorithm using PSO was proposed. To search the space of directed acyclic graphs efficiently, a discrete PSO algorithm designed specifically for structure learning was proposed based on the characteristics of Bayesian networks. The results of experiments show that the PSO-based algorithm converges quickly and can obtain better structures than genetic-algorithm-based approaches.

  16. Bayesian multiple target tracking

    CERN Document Server

    Streit, Roy L

    2013-01-01

    This second edition has undergone substantial revision from the 1999 first edition, recognizing that a lot has changed in the multiple target tracking field. One of the most dramatic changes is in the widespread use of particle filters to implement nonlinear, non-Gaussian Bayesian trackers. This book views multiple target tracking as a Bayesian inference problem. Within this framework it develops the theory of single target tracking, multiple target tracking, and likelihood ratio detection and tracking. In addition to providing a detailed description of a basic particle filter that implements
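
    A minimal bootstrap particle filter for a one-dimensional toy tracker gives the flavor of the nonlinear, non-Gaussian Bayesian trackers the book covers; the model and noise levels here are invented for illustration.

```python
import numpy as np

# Bootstrap particle filter for a 1-D random-walk target observed in
# Gaussian noise -- a toy stand-in for a Bayesian single-target tracker.
rng = np.random.default_rng(3)
T, n_part = 50, 1000
q, r = 0.5, 1.0                                # process / measurement noise std
x_true = np.cumsum(q * rng.standard_normal(T))
y_obs = x_true + r * rng.standard_normal(T)

particles = rng.standard_normal(n_part)
estimates = []
for y in y_obs:
    particles += q * rng.standard_normal(n_part)          # propagate through the motion model
    w = np.exp(-0.5 * ((y - particles) / r) ** 2)          # weight by measurement likelihood
    w /= w.sum()
    estimates.append(np.sum(w * particles))                # posterior mean estimate
    idx = rng.choice(n_part, size=n_part, p=w)             # multinomial resampling
    particles = particles[idx]

print(np.abs(np.array(estimates) - x_true).mean())         # mean tracking error
```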

  17. PLS-based and regularization-based methods for the selection of relevant variables in non-targeted metabolomics data

    Directory of Open Access Journals (Sweden)

    Renata Bujak

    2016-07-01

    Full Text Available Non-targeted metabolomics constitutes a part of systems biology and aims to determine many metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are multivariate and high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as the complexity of biological matrices. Proper selection of the variables which contribute to group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH studies). Orthogonal projections to latent structures-discriminant analysis (OPLS-DA), without and with multiple testing correction, as well as the least absolute shrinkage and selection operator (LASSO) were tested and compared. For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on VIP criteria using Pareto and UV scaling, respectively. In the case of the PH study, 217 and 320 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. In the RH study, the OPLS-DA model built with multiple testing correction selected 4 and 19 variables as statistically significant in terms of Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on VIP criteria in terms of Pareto and UV scaling, respectively. Additionally, the concept and fundaments of the least absolute shrinkage and selection operator (LASSO), with a bootstrap procedure evaluating the reproducibility of results, were demonstrated. In the RH and PH studies, the LASSO selected 14 and 4 variables with reproducibility between 99.3% and 100%. However, apart from the popularity of PLS-DA and OPLS-DA methods in metabolomics, it should be highlighted that they do not control type I or type II error, but only arbitrarily establish a cut-off value for PLS-DA loadings...
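
    The LASSO-with-bootstrap reproducibility check described above can be sketched in a few lines of scikit-learn. The data, penalty value and 0.99 stability cut-off below are illustrative stand-ins, not the study's settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.utils import resample

# Toy data: n << p, as is typical of non-targeted metabolomics matrices.
rng = np.random.default_rng(4)
X = rng.standard_normal((60, 200))
y = X[:, :5] @ np.ones(5) + 0.5 * rng.standard_normal(60)

counts = np.zeros(X.shape[1])
n_boot = 100
for b in range(n_boot):
    Xb, yb = resample(X, y, random_state=b)        # bootstrap resample of samples
    coef = Lasso(alpha=0.2).fit(Xb, yb).coef_
    counts += coef != 0                            # tally which variables are selected
reproducibility = counts / n_boot                  # selection frequency per variable
print(np.where(reproducibility > 0.99)[0])         # stably selected variables
```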

  18. Bayesian and frequentist inequality tests

    OpenAIRE

    David M. Kaplan; Zhuo, Longhao

    2016-01-01

    Bayesian and frequentist criteria are fundamentally different, but often posterior and sampling distributions are asymptotically equivalent (and normal). We compare Bayesian and frequentist hypothesis tests of inequality restrictions in such cases. For finite-dimensional parameters, if the null hypothesis is that the parameter vector lies in a certain half-space, then the Bayesian test has (frequentist) size $\alpha$; if the null hypothesis is any other convex subspace, then the Bayesian test...

  19. Predicting punching acceleration from selected strength and power variables in elite karate athletes: a multiple regression analysis.

    Science.gov (United States)

    Loturco, Irineu; Artioli, Guilherme Giannini; Kobal, Ronaldo; Gil, Saulo; Franchini, Emerson

    2014-07-01

    This study investigated the relationship between punching acceleration and selected strength and power variables in 19 professional karate athletes from the Brazilian National Team (9 men and 10 women; age, 23 ± 3 years; height, 1.71 ± 0.09 m; and body mass [BM], 67.34 ± 13.44 kg). Punching acceleration was assessed under 4 different conditions in a randomized order: (a) fixed distance aiming to attain maximum speed (FS), (b) fixed distance aiming to attain maximum impact (FI), (c) self-selected distance aiming to attain maximum speed, and (d) self-selected distance aiming to attain maximum impact. The selected strength and power variables were as follows: maximal dynamic strength in bench press and squat-machine, squat and countermovement jump height, mean propulsive power in bench throw and jump squat, and mean propulsive velocity in jump squat with 40% of BM. Upper- and lower-body power and maximal dynamic strength variables were positively correlated to punch acceleration in all conditions. Multiple regression analysis also revealed predictive variables: relative mean propulsive power in squat jump (W·kg-1), and maximal dynamic strength 1 repetition maximum in both bench press and squat-machine exercises. An impact-oriented instruction and a self-selected distance to start the movement seem to be crucial to reach the highest acceleration during punching execution. This investigation, while demonstrating strong correlations between punching acceleration and strength-power variables, also provides important information for coaches, especially for designing better training strategies to improve punching speed.

  20. On Bayesian Rules for Selecting 3PL Binary Items for Criterion-Referenced Interpretations and Creating Booklets for Bookmark Standard Setting.

    Science.gov (United States)

    Huynh, Huynh

    By noting that a Rasch or two parameter logistic (2PL) item belongs to the exponential family of random variables and that the probability density function (pdf) of the correct response (X=1) and the incorrect response (X=0) are symmetric with respect to the vertical line at the item location, it is shown that the conjugate prior for ability is…

  1. Culture, Organizational Learning and Selected Employee Background Variables in Small-Size Business Enterprises

    Science.gov (United States)

    Graham, Carroll M.; Nafukho, Fredrick Muyia

    2007-01-01

    Purpose: The purpose of this study is to determine the relationship between four independent variables educational level, longevity, type of enterprise, and gender and the dependent variable culture, as a dimension that explains organizational learning readiness in seven small-size business enterprises. Design/methodology/approach: An exploratory…

  2. Applied Music Teaching Behavior as a Function of Selected Personality Variables.

    Science.gov (United States)

    Schmidt, Charles P.

    1989-01-01

    Investigates the relationships among applied music teaching behaviors and personality variables as measured by the Myers-Briggs Type Indicator (MBTI). Suggests that personality variables may be important factors underlying four applied music teaching behaviors: approvals, rate of reinforcement, teacher model/performance, and pace. (LS)

  3. Exclusive breastfeeding practice in Nigeria: a bayesian stepwise regression analysis.

    Science.gov (United States)

    Gayawan, Ezra; Adebayo, Samson B; Chitekwe, Stanley

    2014-11-01

    Despite the importance of breast milk, the prevalence of exclusive breastfeeding (EBF) in Nigeria is far lower than what has been recommended for developing countries. Worse still, the practice has been on a downward trend in the country recently. This study was aimed at investigating the determinants and geographical variations of EBF in Nigeria. Any intervention programme would require a good knowledge of the factors that enhance the practice. A pooled data set from the Nigeria Demographic and Health Surveys conducted in 1999, 2003, and 2008 was analyzed using a Bayesian stepwise approach that involves simultaneous selection of variables and smoothing parameters. Further, the approach allows geographical variations at the highly disaggregated level of states to be investigated. Within a Bayesian context, appropriate priors are assigned to all the parameters and functions. Findings reveal that the education of women and their partners, place of delivery, mother's age at birth, and current age of the child are associated with increasing prevalence of EBF. However, visits for antenatal care during pregnancy are not associated with EBF in Nigeria. Further, results reveal considerable geographical variation in the practice of EBF. The likelihood of exclusively breastfeeding children is significantly higher in Kwara, Kogi, Osun, and Oyo states but lower in Jigawa, Katsina, and Yobe. Intensive interventions that can lead to improved practice are required in all states in Nigeria. The importance of breastfeeding needs to be emphasized to women during antenatal visits, as this can encourage and enhance the practice after delivery. PMID:24619227

  4. Bayesian Subset Modeling for High-Dimensional Generalized Linear Models

    KAUST Repository

    Liang, Faming

    2013-06-01

    This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online. © 2013 American Statistical Association.
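
    The marginal inclusion probability that drives the screening procedure can be made concrete on a small problem: weight every candidate model by exp(-BIC/2) and sum the weights of the models containing each predictor. This brute-force sketch is only feasible for small p and is not the BSR sampler itself; it merely makes the quantity being approximated explicit.

```python
import itertools
import numpy as np

def inclusion_probs(X, y):
    """Marginal inclusion probabilities by exhaustive enumeration (sketch)."""
    n, p = X.shape
    probs = np.zeros(p)
    models = list(itertools.chain.from_iterable(
        itertools.combinations(range(p), k) for k in range(p + 1)))
    weights = []
    for m in models:
        Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in m])
        resid = y - Xm @ np.linalg.lstsq(Xm, y, rcond=None)[0]
        bic = n * np.log(resid @ resid / n) + (len(m) + 1) * np.log(n)
        weights.append(np.exp(-0.5 * bic))            # BIC-based model weight
    weights = np.array(weights) / sum(weights)
    for w, m in zip(weights, models):
        probs[list(m)] += w                           # accumulate per-predictor mass
    return probs

# Toy example: predictors 0 and 3 are the true signals.
rng = np.random.default_rng(9)
X = rng.standard_normal((80, 8))
y = X[:, 0] - X[:, 3] + rng.standard_normal(80)
print(np.round(inclusion_probs(X, y), 2))
```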

  5. Variable selection for confounder control, flexible modeling and Collaborative Targeted Minimum Loss-based Estimation in causal inference

    Science.gov (United States)

    Schnitzer, Mireille E.; Lok, Judith J.; Gruber, Susan

    2015-01-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios. PMID:26226129
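
    A minimal IPTW sketch clarifies the pitfall the paper demonstrates: a covariate that only causes treatment (z below) still gets absorbed into a naively fitted propensity model and inflates the weights' variance. Data and effect sizes are invented; this is not the paper's simulation design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 2000
x = rng.standard_normal(n)                    # true confounder of treatment and outcome
z = rng.standard_normal(n)                    # pure cause of treatment only (an "instrument")
p_treat = 1 / (1 + np.exp(-(x + 2 * z)))
a = rng.binomial(1, p_treat)                  # treatment assignment
y = x + a + rng.standard_normal(n)            # true treatment effect is 1.0

# Naive propensity model includes z; dropping z would stabilize the weights.
covs = np.column_stack([x, z])
ps = LogisticRegression().fit(covs, a).predict_proba(covs)[:, 1]
w = a / ps + (1 - a) / (1 - ps)               # inverse probability of treatment weights

ate = (np.average(y[a == 1], weights=w[a == 1])
       - np.average(y[a == 0], weights=w[a == 0]))
print(round(ate, 2))
```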

  6. An examination of predictive variables toward graduation of minority students in science at a selected urban university

    Science.gov (United States)

    Hunter, Evelyn M. Irving

    1998-12-01

    The purpose of this study was to examine the relationship and predictive power of the variables gender, high school GPA, class rank, SAT scores, ACT scores, and socioeconomic status on the graduation rates of minority college students majoring in the sciences at a selected urban university. Data were examined on these variables as they related to minority students majoring in science. The population consisted of 101 minority college students who had majored in the sciences from 1986 to 1996 at an urban university in the southwestern region of Texas. A non-probability sampling procedure, the incidental sampling technique, was used in this study. A profile sheet was developed to record the information regarding the variables. The composite scores from SAT and ACT testing were used in the study. The dichotomous variables gender and socioeconomic status were dummy coded for analysis. For the gender variable, zero (0) indicated male and one (1) indicated female; likewise, zero (0) indicated high SES and one (1) indicated low SES. Two parametric procedures were used to analyze the data in this investigation: multiple correlation and multiple regression. Multiple correlation is a statistical technique that indicates the relationship between one variable and a combination of two or more other variables. The variables socioeconomic status and GPA were found to contribute significantly to the graduation rates of minority students majoring in all sciences when combined with chemistry (Hypotheses Two and Four). These variables accounted for 7% and 15%, respectively, of the variance in the graduation rates of minority students in the sciences and in chemistry. For Hypotheses One and Three, the predictor variables gender, high school GPA, SAT total scores, class rank, and socioeconomic status did not contribute significantly to the graduation rates of minority students in biology and pharmacy.

  7. Multiview Bayesian Correlated Component Analysis

    DEFF Research Database (Denmark)

    Kamronn, Simon Due; Poulsen, Andreas Trier; Hansen, Lars Kai

    2015-01-01

    Correlated component analysis as proposed by Dmochowski, Sajda, Dias, and Parra (2012) is a tool for investigating brain process similarity in the responses to multiple views of a given stimulus. Correlated components are identified under the assumption that the involved spatial networks are identical. Here we propose a hierarchical probabilistic model that can infer the level of universality in such multiview data, from completely unrelated representations, corresponding to canonical correlation analysis, to identical representations as in correlated component analysis. This new model, which we denote Bayesian correlated component analysis, evaluates favorably against three relevant algorithms in simulated data. A well-established benchmark EEG data set is used to further validate the new model and infer the variability of spatial representations across multiple subjects.

  8. Bayesian Kernel Mixtures for Counts.

    Science.gov (United States)

    Canale, Antonio; Dunson, David B

    2011-12-01

    Although Bayesian nonparametric mixture models for continuous data are well developed, there is a limited literature on related approaches for count data. A common strategy is to use a mixture of Poissons, which unfortunately is quite restrictive in not accounting for distributions having variance less than the mean. Other approaches include mixing multinomials, which requires finite support, and using a Dirichlet process prior with a Poisson base measure, which does not allow smooth deviations from the Poisson. As a broad class of alternative models, we propose to use nonparametric mixtures of rounded continuous kernels. An efficient Gibbs sampler is developed for posterior computation, and a simulation study is performed to assess performance. Focusing on the rounded Gaussian case, we generalize the modeling framework to account for multivariate count data, joint modeling with continuous and categorical variables, and other complications. The methods are illustrated through applications to a developmental toxicity study and marketing data. This article has supplementary material online. PMID:22523437
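
    The rounded-kernel idea is simple enough to demonstrate in a few lines: draw from a continuous kernel and round to a non-negative integer. A rounded Gaussian with a small scale is underdispersed (variance below the mean), which a Poisson mixture cannot achieve; the numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.normal(loc=5.0, scale=0.6, size=100_000)   # continuous latent draws
y = np.maximum(0, np.round(z)).astype(int)         # rounded to counts
print(y.mean(), y.var())                           # variance well below the mean
```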

  9. Prediction of road accidents: A Bayesian hierarchical approach

    DEFF Research Database (Denmark)

    Deublein, Markus; Schubert, Matthias; Adey, Bryan T.;

    2013-01-01

    -lognormal regression analysis taking into account correlations amongst multiple dependent model response variables and effects of discrete accident count data e.g. over-dispersion, and (3) Bayesian inference algorithms, which are applied by means of data mining techniques supported by Bayesian Probabilistic Networks...... in order to represent non-linearity between risk indicating and model response variables, as well as different types of uncertainties which might be present in the development of the specific models.Prior Bayesian Probabilistic Networks are first established by means of multivariate regression analysis...... of the observed frequencies of the model response variables, e.g. the occurrence of an accident, and observed values of the risk indicating variables, e.g. degree of road curvature. Subsequently, parameter learning is done using updating algorithms, to determine the posterior predictive probability distributions...

  10. Bayesian network learning for natural hazard assessments

    Science.gov (United States)

    Vogel, Kristin

    2016-04-01

    Even though quite different in occurrence and consequences, from a modelling perspective many natural hazards share similar properties and challenges. Their complex nature as well as lacking knowledge about their driving forces and potential effects make their analysis demanding. On top of the uncertainty about the modelling framework, inaccurate or incomplete event observations and the intrinsic randomness of the natural phenomenon add up to different interacting layers of uncertainty, which require a careful handling. Thus, for reliable natural hazard assessments it is crucial not only to capture and quantify involved uncertainties, but also to express and communicate uncertainties in an intuitive way. Decision-makers, who often find it difficult to deal with uncertainties, might otherwise return to familiar (mostly deterministic) proceedings. In the scope of the DFG research training group „NatRiskChange" we apply the probabilistic framework of Bayesian networks for diverse natural hazard and vulnerability studies. The great potential of Bayesian networks was already shown in previous natural hazard assessments. Treating each model component as random variable, Bayesian networks aim at capturing the joint distribution of all considered variables. Hence, each conditional distribution of interest (e.g. the effect of precautionary measures on damage reduction) can be inferred. The (in-)dependencies between the considered variables can be learned purely data driven or be given by experts. Even a combination of both is possible. By translating the (in-)dependences into a graph structure, Bayesian networks provide direct insights into the workings of the system and allow to learn about the underlying processes. Besides numerous studies on the topic, learning Bayesian networks from real-world data remains challenging. In previous studies, e.g. on earthquake induced ground motion and flood damage assessments, we tackled the problems arising with continuous variables
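
    As a toy illustration of the kind of model involved (structure, variable names and probabilities are all invented, not from the study), a three-node discrete Bayesian network supports exact inference by enumeration:

```python
from itertools import product

# Hypothetical hazard chain: Rain -> Flood -> Damage, with invented CPTs.
p_rain = {0: 0.8, 1: 0.2}
p_flood_given_rain = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.4, 1: 0.6}}
p_damage_given_flood = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.3, 1: 0.7}}

def joint(r, f, d):
    """Joint probability factorized along the network structure."""
    return p_rain[r] * p_flood_given_rain[r][f] * p_damage_given_flood[f][d]

# Diagnostic query by enumeration: P(Rain = 1 | Damage = 1).
num = sum(joint(1, f, 1) for f in (0, 1))
den = sum(joint(r, f, 1) for r, f in product((0, 1), repeat=2))
print(num / den)
```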

  11. Bayesian Dark Knowledge

    NARCIS (Netherlands)

    A. Korattikara; V. Rathod; K. Murphy; M. Welling

    2015-01-01

    We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities p(y|x, D), e.g., for applications involving bandits or active learning. One simple ap...

  12. Bayesian logistic regression analysis

    NARCIS (Netherlands)

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.

    2012-01-01

    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an...

  13. Bayesian Adaptive Exploration

    Science.gov (United States)

    Loredo, Thomas J.

    2004-04-01

    I describe a framework for adaptive scientific exploration based on iterating an Observation-Inference-Design cycle that allows adjustment of hypotheses and observing protocols in response to the results of observation on-the-fly, as data are gathered. The framework uses a unified Bayesian methodology for the inference and design stages: Bayesian inference to quantify what we have learned from the available data and predict future data, and Bayesian decision theory to identify which new observations would teach us the most. When the goal of the experiment is simply to make inferences, the framework identifies a computationally efficient iterative ``maximum entropy sampling'' strategy as the optimal strategy in settings where the noise statistics are independent of signal properties. Results of applying the method to two ``toy'' problems with simulated data-measuring the orbit of an extrasolar planet, and locating a hidden one-dimensional object-show the approach can significantly improve observational efficiency in settings that have well-defined nonlinear models. I conclude with a list of open issues that must be addressed to make Bayesian adaptive exploration a practical and reliable tool for optimizing scientific exploration.

  14. Subjective Bayesian Beliefs

    DEFF Research Database (Denmark)

    Antoniou, Constantinos; Harrison, Glenn W.; Lau, Morten I.;

    2015-01-01

    A large literature suggests that many individuals do not apply Bayes’ Rule when making decisions that depend on them correctly pooling prior information and sample data. We replicate and extend a classic experimental study of Bayesian updating from psychology, employing the methods of experimental...

  15. Bayesian Independent Component Analysis

    DEFF Research Database (Denmark)

    Winther, Ole; Petersen, Kaare Brandt

    2007-01-01

    In this paper we present an empirical Bayesian framework for independent component analysis. The framework provides estimates of the sources, the mixing matrix and the noise parameters, and is flexible with respect to choice of source prior and the number of sources and sensors. Inside the engine...

  16. Bayesian grid matching

    DEFF Research Database (Denmark)

    Hartelius, Karsten; Carstensen, Jens Michael

    2003-01-01

    A method for locating distorted grid structures in images is presented. The method is based on the theories of template matching and Bayesian image restoration. The grid is modeled as a deformable template. Prior knowledge of the grid is described through a Markov random field (MRF) model which...

  17. Oracle Efficient Variable Selection in Random and Fixed Effects Panel Data Models

    DEFF Research Database (Denmark)

    Kock, Anders Bredahl

    This paper generalizes the results for the Bridge estimator of Huang et al. (2008) to linear random and fixed effects panel data models which are allowed to grow in both dimensions. In particular we show that the Bridge estimator is oracle efficient. It can correctly distinguish between relevant and irrelevant variables and the asymptotic distribution of the estimators of the coefficients of the relevant variables is the same as if only these had been included in the model, i.e. as if an oracle had revealed the true model prior to estimation. In the case of more explanatory variables than...

  18. The effects of selective breeding against scrapie susceptibility on the genetic variability of the Latxa Black-Faced sheep breed

    Directory of Open Access Journals (Sweden)

    Legarra Andrés

    2006-09-01

    Full Text Available Abstract Breeding sheep populations for scrapie resistance could result in a loss of genetic variability. In this study, the effect on genetic variability of selection for increasing the ARR allele frequency was estimated in the Latxa breed. Two sources of information were used: pedigree and genetic polymorphisms (fifteen microsatellites). The results based on the genealogical information were conditioned by a low pedigree completeness level, which revealed the interest of also using the information provided by the molecular markers. The overall results suggest that no great negative effect on genetic variability can be expected in the short term in the analysed population from selecting only ARR/ARR males. The estimated average relationship of ARR/ARR males with reproductive females was similar to that of all available males whatever their genotype: 0.010 vs. 0.012 for the genealogical relationship and 0.257 vs. 0.296 for molecular coancestry, respectively. However, selection of only ARR/ARR males implied important losses of founder animals (87 percent) and low-frequency alleles (30 percent) in the ram population. The evaluation of mild selection strategies against scrapie susceptibility based on the use of some ARR heterozygous males was difficult because the genetic relationships estimated among animals differed when pedigree or molecular information was used, and the use of more molecular markers should be evaluated.

  19. Effects of musical tempo on physiological, affective, and perceptual variables and performance of self-selected walking pace.

    Science.gov (United States)

    Almeida, Flávia Angélica Martins; Nunes, Renan Felipe Hartmann; Ferreira, Sandro Dos Santos; Krinski, Kleverton; Elsangedy, Hassan Mohamed; Buzzachera, Cosme Franklin; Alves, Ragami Chaves; Gregorio da Silva, Sergio

    2015-06-01

    [Purpose] This study investigated the effects of musical tempo on physiological, affective, and perceptual responses as well as the performance of self-selected walking pace. [Subjects] The study included 28 adult women between 29 and 51 years old. [Methods] The subjects were divided into three groups: a no musical stimulation group (control), and 90 and 140 beats per minute musical tempo groups. Each subject underwent three experimental sessions: familiarization with the equipment, an incremental test to exhaustion, and a 30-min walk on a treadmill at a self-selected pace, respectively. During the self-selected walking session, physiological, perceptual, and affective variables were evaluated, and walking performance was evaluated at the end. [Results] There were no significant differences in physiological variables or affective response among groups. However, there were significant differences in perceptual response and walking performance among groups. [Conclusion] Fast music (140 beats per minute) promotes a higher rating of perceived exertion and greater performance in self-selected walking pace without significantly altering physiological variables or affective response. PMID:26180303

  20. Bayesian calibration for forensic age estimation.

    Science.gov (United States)

    Ferrante, Luigi; Skrami, Edlira; Gesuita, Rosaria; Cameriere, Roberto

    2015-05-10

    Forensic medicine is increasingly called upon to assess the age of individuals. Forensic age estimation is mostly required in relation to illegal immigration and identification of bodies or skeletal remains. A variety of age estimation methods are based on dental samples and use of regression models, where the age of an individual is predicted by morphological tooth changes that take place over time. From the medico-legal point of view, regression models, with age as the dependent random variable entail that age tends to be overestimated in the young and underestimated in the old. To overcome this bias, we describe a new full Bayesian calibration method (asymmetric Laplace Bayesian calibration) for forensic age estimation that uses asymmetric Laplace distribution as the probability model. The method was compared with three existing approaches (two Bayesian and a classical method) using simulated data. Although its accuracy was comparable with that of the other methods, the asymmetric Laplace Bayesian calibration appears to be significantly more reliable and robust in case of misspecification of the probability model. The proposed method was also applied to a real dataset of values of the pulp chamber of the right lower premolar measured on x-ray scans of individuals of known age. PMID:25645903

  1. Realisations of the Word-initial Variable (th) in Selected Late Middle English Northern Legal Documents

    OpenAIRE

    Adamczyk, Michał

    2016-01-01

    Synchronic variability in the area of phonetics, phonology, vocabulary, morphology and syntax is a natural feature of any language, including English. The existence of competing variants is in itself a fascinating phenomenon, but it is also a prerequisite for diachronic changes. This volume is a collection of studies which investigate variability from a contemporary and historical perspective, in both native and non-native varieties of English. The topics include Middle English spelling varia...

  2. Selection of complementary single-variable domains for building monoclonal antibodies to native proteins

    OpenAIRE

    Tanaka, Tomoyuki; Rabbitts, Terence H.

    2009-01-01

    Antibodies are now indispensable tools for all areas of cell biology and biotechnology as well as for diagnosis and therapy. Antigen-specific single immunoglobulin variable domains that bind to native antigens can be isolated and manipulated using yeast intracellular antibody capture technology but converting these to whole monoclonal antibody requires that complementary variable domains (VH or VL) bind to the same antigenic site. We describe a simple approach (CatcherAb) for specific isolati...

  3. Application of Bayesian decision theory to airborne gamma snow measurement

    Science.gov (United States)

    Bissell, V. C.

    1975-01-01

    Measured values of several variables are incorporated into the calculation of snow water equivalent as measured from an aircraft by snow attenuation of terrestrial gamma radiation. Bayesian decision theory provides a snow water equivalent measurement by taking into account the uncertainties in the individual measurement variables and filtering information about the measurement variables through prior notions of what the calculated variable (water equivalent) should be.

  4. Bayesian multimodel inference for dose-response studies

    Science.gov (United States)

    Link, W.A.; Albers, P.H.

    2007-01-01

    Statistical inference in dose–response studies is model-based: The analyst posits a mathematical model of the relation between exposure and response, estimates parameters of the model, and reports conclusions conditional on the model. Such analyses rarely include any accounting for the uncertainties associated with model selection. The Bayesian inferential system provides a convenient framework for model selection and multimodel inference. In this paper we briefly describe the Bayesian paradigm and Bayesian multimodel inference. We then present a family of models for multinomial dose–response data and apply Bayesian multimodel inferential methods to the analysis of data on the reproductive success of American kestrels (Falco sparverius) exposed to various sublethal dietary concentrations of methylmercury.

  5. Shape, sizing optimization and material selection based on mixed variables and genetic algorithm

    NARCIS (Netherlands)

    Tang, X.; Bassir, D.H.; Zhang, W.

    2010-01-01

    In this work, we explore simultaneous designs of materials selection and structural optimization. As the material selection turns out to be a discrete process that finds the optimal distribution of materials over the design domain, it cannot be performed with common gradient-based optimization metho...

  6. Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications*

    OpenAIRE

    Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E.

    2010-01-01

    Food authenticity studies are concerned with determining if food samples have been correctly labeled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give ...

  7. Bayesian Analysis of Dynamic Multivariate Models with Multiple Structural Breaks

    OpenAIRE

    Sugita, Katsuhiro

    2006-01-01

    This paper considers a vector autoregressive model or a vector error correction model with multiple structural breaks in any subset of parameters, using a Bayesian approach with Markov chain Monte Carlo simulation technique. The number of structural breaks is determined as a sort of model selection by the posterior odds. For a cointegrated model, cointegrating rank is also allowed to change with breaks. Bayesian approach by Strachan (Journal of Business and Economic Statistics 21 (2003) 185) ...

  8. One-Stage and Bayesian Two-Stage Optimal Designs for Mixture Models

    OpenAIRE

    Lin, Hefang

    1999-01-01

    In this research, Bayesian two-stage D-D optimal designs for mixture experiments with or without process variables under model uncertainty are developed. A Bayesian optimality criterion is used in the first stage to minimize the determinant of the posterior variances of the parameters. The second stage design is then generated according to an optimality procedure that collaborates with the improved model from first stage data. Our results show that the Bayesian two-stage D-D optimal design...

  9. Effectiveness of Shrinkage and Variable Selection Methods for the Prediction of Complex Human Traits using Data from Distantly Related Individuals

    Science.gov (United States)

    Pérez‐Rodríguez, Paulino; Veturi, Yogasudha; Simianer, Henner; de los Campos, Gustavo

    2015-01-01

    Summary Genome-wide association studies (GWAS) have detected large numbers of variants associated with complex human traits and diseases. However, the proportion of variance explained by GWAS-significant single nucleotide polymorphisms has usually been small. This brought interest in the use of whole-genome regression (WGR) methods. However, there has been limited research on the factors that affect the prediction accuracy (PA) of WGRs when applied to human data of distantly related individuals. Here, we examine, using real human genotypes and simulated phenotypes, how trait complexity, marker-quantitative trait loci (QTL) linkage disequilibrium (LD), and the model used affect the performance of WGRs. Our results indicated that the estimated rate of missing heritability is dependent on the extent of marker-QTL LD. However, this parameter was not greatly affected by trait complexity. Regarding PA, our results indicated that: (a) under perfect marker-QTL LD, WGR can achieve moderately high prediction accuracy, and with simple genetic architectures variable selection methods outperform shrinkage procedures; and (b) under imperfect marker-QTL LD, variable selection methods can achieve reasonably good PA with simple or moderately complex genetic architectures; however, the PA of these methods deteriorates as trait complexity increases, and with highly complex traits variable selection and shrinkage methods both performed poorly. This was confirmed with an analysis of human height. PMID:25600682

  10. Contributions of Selected Perinatal Variables to Seven-Year Psychological and Achievement Test Scores.

    Science.gov (United States)

    Henderson, N. B.; And Others

    Perinatal variables were used to predict 7-year outcome for 538 children, 32% Negro and 68% white. Mother's age, birthplace, education, occupation, marital status, neuropsychiatric status, family income, number supported, birth weight, one- and five-minute Apgar scores were regressed on 7-year Verbal, Performance and Full Scale IQ, Bender, Wide…

  11. Discipline in the Schools: The Relationship of Educators' Attitudes About Corporal Punishment to Selected Variables.

    Science.gov (United States)

    Parkay, Forrest W.; Conoley, Colleen

    The purpose of this study was twofold: (1) to determine educators' attitudes toward corporal punishment and its alternatives in a variety of school settings throughout the Southwest; and (2) to explore the relationships between respondents' attitudes and such independent variables as dogmatism, sex, experience, level of education, job description,…

  12. Cortical Response Variability as a Developmental Index of Selective Auditory Attention

    Science.gov (United States)

    Strait, Dana L.; Slater, Jessica; Abecassis, Victor; Kraus, Nina

    2014-01-01

    Attention induces synchronicity in neuronal firing for the encoding of a given stimulus to the exclusion of others. Recently, we reported decreased variability in scalp-recorded cortical evoked potentials to attended compared with ignored speech in adults. Here we aimed to determine the developmental time course for this neural index of auditory…

  13. The Multifaceted Variable Approach: Selection of Method in Solving Simple Linear Equations

    Science.gov (United States)

    Tahir, Salma; Cavanagh, Michael

    2010-01-01

    This paper presents a comparison of the solution strategies used by two groups of Year 8 students as they solved linear equations. The experimental group studied algebra following a multifaceted variable approach, while the comparison group used a traditional approach. Students in the experimental group employed different solution strategies,…

  14. Selecting both latent and explanatory variables in the PLS1 regression model

    OpenAIRE

    Lazraq, Aziz; Cléroux, Robert; Gauchi, Jean-Pierre

    2003-01-01

    In this paper, two inferential procedures for selecting the significant predictors in the PLS1 regression model are introduced. The significant PLS components are first obtained and the two predictor selection methods, called PLS–Forward and PLS–Bootstrap, are applied to the PLS model obtained. They are also compared empirically to two other methods that exist in the literature with respect to the quality of fit of the model and to their predictive ability. Although none of the four methods i...

  15. Stochastic back analysis of permeability coefficient using generalized Bayesian method

    Institute of Scientific and Technical Information of China (English)

    Zheng Guilan; Wang Yuan; Wang Fei; Yang Jian

    2008-01-01

    Owing to the fact that the conventional deterministic back analysis of the permeability coefficient cannot reflect the uncertainties of parameters, including the hydraulic head at the boundary, the permeability coefficient and the measured hydraulic head, a stochastic back analysis taking into account the uncertainties of these parameters was performed using the generalized Bayesian method. Based on the stochastic finite element method (SFEM) for a seepage field, the variable metric algorithm and the generalized Bayesian method, formulas for the stochastic back analysis of the permeability coefficient were derived. A case study of the seepage analysis of a sluice foundation was performed to illustrate the proposed method. The results indicate that, with the generalized Bayesian method, which considers the uncertainties of the measured hydraulic head, the permeability coefficient and the hydraulic head at the boundary, both the mean and the standard deviation of the permeability coefficient can be obtained, and the standard deviation is less than that obtained by the conventional Bayesian method. Therefore, the present method is valid and applicable.

  16. Bayesian community detection

    DEFF Research Database (Denmark)

    Mørup, Morten; Schmidt, Mikkel N

    2012-01-01

    Many networks of scientific interest naturally decompose into clusters or communities with comparatively fewer external than internal links; however, current Bayesian models of network communities do not exert this intuitive notion of communities. We formulate a nonparametric Bayesian model for community detection consistent with an intuitive definition of communities and present a Markov chain Monte Carlo procedure for inferring the community structure. A Matlab toolbox with the proposed inference procedure is available for download. On synthetic and real networks, our model detects communities consistent with ground truth, and on real networks, it outperforms existing approaches in predicting missing links. This suggests that community structure is an important structural property of networks that should be explicitly modeled.

  17. Evaluation of a Partial Genome Screening of Two Asthma Susceptibility Regions Using Bayesian Network Based Bayesian Multilevel Analysis of Relevance

    OpenAIRE

    Ildikó Ungvári; Gábor Hullám; Péter Antal; Petra Sz Kiszel; András Gézsi; Éva Hadadi; Viktor Virág; Gergely Hajós; András Millinghoffer; Adrienne Nagy; András Kiss; Semsei, Ágnes F.; Gergely Temesi; Béla Melegh; Péter Kisfali

    2012-01-01

    Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). Th...

  18. Bayesian Word Sense Induction

    OpenAIRE

    Brody, Samuel; Lapata, Mirella

    2009-01-01

    Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word’s contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samp...

  19. Bayesian Generalized Rating Curves

    OpenAIRE

    Helgi Sigurðarson

    2014-01-01

    A rating curve is a curve or a model that describes the relationship between water elevation, or stage, and discharge in an observation site in a river. The rating curve is fit from paired observations of stage and discharge. The rating curve then predicts discharge given observations of stage and this methodology is applied as stage is substantially easier to directly observe than discharge. In this thesis a statistical rating curve model is proposed working within the framework of Bayesian...
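
    For reference, the standard hydrological power-law form of a rating curve, which a Bayesian treatment would equip with priors on the parameters, is shown below; the thesis's exact parametrization may differ.

```latex
Q = a\,(h - c)^{b},
\qquad
\log Q_i = \log a + b\,\log(h_i - c) + \varepsilon_i,
\quad \varepsilon_i \sim N(0, \sigma^2),
```

    where $Q$ is discharge, $h$ is stage, $c$ is the stage of zero flow, and priors are placed on $(a, b, c, \sigma^2)$.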

  20. Efficient Bayesian Phase Estimation

    Science.gov (United States)

    Wiebe, Nathan; Granade, Chris

    2016-07-01

    We introduce a new method called rejection filtering that we use to perform adaptive Bayesian phase estimation. Our approach has several advantages: it is classically efficient, easy to implement, achieves Heisenberg limited scaling, resists depolarizing noise, tracks time-dependent eigenstates, recovers from failures, and can be run on a field programmable gate array. It also outperforms existing iterative phase estimation algorithms such as Kitaev's method.

  1. Bayesian theory and applications

    CERN Document Server

    Dellaportas, Petros; Polson, Nicholas G; Stephens, David A

    2013-01-01

    The development of hierarchical models and Markov chain Monte Carlo (MCMC) techniques forms one of the most profound advances in Bayesian analysis since the 1970s and provides the basis for advances in virtually all areas of applied and theoretical Bayesian statistics. This volume guides the reader along a statistical journey that begins with the basic structure of Bayesian theory, and then provides details on most of the past and present advances in this field. The book has a unique format. There is an explanatory chapter devoted to each conceptual advance followed by journal-style chapters that provide applications or further advances on the concept. Thus, the volume is both a textbook and a compendium of papers covering a vast range of topics. It is appropriate for a well-informed novice interested in understanding the basic approach, methods and recent applications. Because of its advanced chapters and recent work, it is also appropriate for a more mature reader interested in recent applications and devel...

  2. Bayesian Attractor Learning

    Science.gov (United States)

    Wiegerinck, Wim; Schoenaker, Christiaan; Duane, Gregory

    2016-04-01

    Recently, methods for model fusion by dynamically combining model components in an interactive ensemble have been proposed. In these proposals, fusion parameters have to be learned from data. One can view these systems as parametrized dynamical systems. We address the question of learnability of dynamical systems with respect to both short term (vector field) and long term (attractor) behavior. In particular we are interested in learning in the imperfect model class setting, in which the ground truth has a higher complexity than the models, e.g. due to unresolved scales. We take a Bayesian point of view and we define a joint log-likelihood that consists of two terms, one is the vector field error and the other is the attractor error, for which we take the L1 distance between the stationary distributions of the model and the assumed ground truth. In the context of linear models (like so-called weighted supermodels), and assuming a Gaussian error model in the vector fields, vector field learning leads to a tractable Gaussian solution. This solution can then be used as a prior for the next step, Bayesian attractor learning, in which the attractor error is used as a log-likelihood term. Bayesian attractor learning is implemented by elliptical slice sampling, a sampling method for systems with a Gaussian prior and a non Gaussian likelihood. Simulations with a partially observed driven Lorenz 63 system illustrate the approach.
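
    Under the stated assumptions (Gaussian vector-field error, L1 attractor term), the two-term objective described above can be written schematically as follows; the notation is ours, not necessarily the authors'.

```latex
\log L(\theta) \;=\;
-\frac{1}{2\sigma^2} \sum_{t} \bigl\| f_\theta(x_t) - \dot{x}_t \bigr\|^2
\;-\; \lambda \int \bigl| \pi_\theta(x) - \pi^{\mathrm{obs}}(x) \bigr| \, dx
\;+\; \mathrm{const},
```

    where $\pi_\theta$ and $\pi^{\mathrm{obs}}$ are the stationary densities of the model and of the assumed ground truth, $\sigma^2$ is the vector-field noise variance, and $\lambda$ trades off short-term against long-term fit.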

  3. Ultrahigh Dimensional Variable Selection for Interpolation of Point Referenced Spatial Data: A Digital Soil Mapping Case Study.

    Science.gov (United States)

    Fitzpatrick, Benjamin R; Lamb, David W; Mengersen, Kerrie

    2016-01-01

    Modern soil mapping is characterised by the need to interpolate point referenced (geostatistical) observations and the availability of large numbers of environmental characteristics for consideration as covariates to aid this interpolation. Modelling tasks of this nature also occur in other fields such as biogeography and environmental science. This analysis employs the Least Angle Regression (LAR) algorithm for fitting Least Absolute Shrinkage and Selection Operator (LASSO) penalized multiple linear regression models, and demonstrates the efficiency of the LAR algorithm at selecting covariates to aid the interpolation of geostatistical soil carbon observations. Where an exhaustive search of the models that could be constructed from 800 potential covariate terms and 60 observations would be prohibitively demanding, LASSO variable selection is accomplished with trivial computational investment. PMID:27603135
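
    The p >> n setting described here (800 candidate covariates, 60 observations) is exactly what scikit-learn's LAR-based LASSO solver handles cheaply; the synthetic data below is illustrative.

    ```python
    import numpy as np
    from sklearn.linear_model import LassoLarsCV

    rng = np.random.default_rng(2)

    # Synthetic stand-in: 60 sites, 800 candidate environmental covariates,
    # of which only 5 actually drive the (soil carbon) response.
    n, p = 60, 800
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = [2.0, -1.5, 1.0, 0.8, -0.6]
    y = X @ beta + 0.3 * rng.standard_normal(n)

    # LAR computes the whole LASSO path; cross-validation picks the penalty.
    model = LassoLarsCV(cv=5).fit(X, y)
    selected = np.flatnonzero(model.coef_)
    print(f"{selected.size} covariates selected:", selected[:10])
    ```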

  4. An alternative approach to approximate entropy threshold value (r) selection: application to heart rate variability and systolic blood pressure variability under postural challenge.

    Science.gov (United States)

    Singh, A; Saini, B S; Singh, D

    2016-05-01

    This study presents an alternative approach to approximate entropy (ApEn) threshold value (r) selection. There are two limitations of the traditional ApEn algorithm: (1) the occurrence of undefined conditional probability (CPu), where no template match is found, and (2) the use of a crisp tolerance (radius) threshold r. To overcome these limitations, CPu is substituted with an optimum bias setting ɛopt, which is found by varying ɛ from 1/(N - m) to 1 in increments of 0.05, where N is the length of the series and m is the embedding dimension. Furthermore, an alternative approach for the selection of r, based on binning the distance values obtained by template matching to calculate ApEnbin, is presented. It is observed that ApEnmax, ApEnchon and ApEnbin converge for ɛopt = 0.6 in 50 realizations (n = 50) of a random number series of N = 300. Similar analysis suggests ɛopt = 0.65 and ɛopt = 0.45 for 50 realizations each of fractional Brownian motion and MIX(P) series (Lu et al. in J Clin Monit Comput 22(1):23-29, 2008). ɛopt = 0.5 is suggested for heart rate variability (HRV) and systolic blood pressure variability (SBPV) signals obtained from 50 young healthy subjects in supine and upright positions. It is observed that (1) ApEnbin of HRV is lower than that of SBPV, (2) ApEnbin of HRV increases from supine to upright due to vagal inhibition and (3) ApEnbin of BPV decreases from supine to upright due to sympathetic activation. Moreover, a merit of ApEnbin is that it provides an alternative to the cumbersome ApEnmax procedure. PMID:26253284
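
    For reference, here is a standard approximate entropy implementation with a crisp tolerance r (the baseline whose limitations the paper addresses; the binning variant itself is not reproduced here):

    ```python
    import numpy as np

    def apen(x, m=2, r=0.2):
        """Approximate entropy of series x with embedding m and tolerance r*std."""
        x = np.asarray(x, dtype=float)
        tol = r * x.std()
        n = x.size

        def phi(m):
            # All length-m templates, compared under the Chebyshev distance.
            emb = np.array([x[i:i + m] for i in range(n - m + 1)])
            dist = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)
            # Fraction of templates within tolerance (self-matches included,
            # which is what keeps the conditional probability defined).
            c = (dist <= tol).mean(axis=1)
            return np.log(c).mean()

        return phi(m) - phi(m + 1)

    rng = np.random.default_rng(3)
    print("ApEn(white noise):", round(apen(rng.standard_normal(300)), 3))
    print("ApEn(sine wave)  :", round(apen(np.sin(np.linspace(0, 30, 300))), 3))
    ```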

  5. Identifying market segments in consumer markets: variable selection and data interpretation

    OpenAIRE

    Tonks, D G

    2004-01-01

    Market segmentation is often articulated as a process which displays the recognised features of classical rationalism, but in part convention, convenience, prior experience and the overarching impact of rhetoric will influence, if not determine, the outcomes of a segmentation exercise. Particular examples of this process are addressed critically in this paper, which concentrates on the issues of variable choice for multivariate approaches to market segmentation and also the methods used fo...

  6. COMPARISON OF SELECTED PSYCHOLOGICAL VARIABLES AMONG UNIVERSITY WOMEN FOOTBALL PLAYERS AT DIFFERENT TOPOGRAPHY AND POSITIONAL PLAY

    OpenAIRE

    Suganya. S

    2014-01-01

    The purpose of the study was to compare psychological variables such as anxiety, achievement motivation, self-concept, locus of control and team relationship among university women football players of the south and west zones playing as Defenders, Midfielders and Forwards. The requirements for the collection of data through the administration of questionnaires were explained to the subjects so as to avoid any ambiguity about the effort required on their part, and prior to the administration o...

  7. A scale-independent clustering method with automatic variable selection based on trees

    OpenAIRE

    Lynch, Sarah K.

    2014-01-01

    Approved for public release; distribution is unlimited. Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. Determining dissimilarity when observations have both continuous and categorical measurements can be difficult because each type of measurement must be approached differently. We introduce a new clustering method that u...

  8. GENOTYPIC VARIABILITY ESTIMATES OF AGRONOMIC TRAITS FOR SELECTION IN A SWEETPOTATO (IPOMOEA BATATAS POLYCROSS POPULATION IN PAPUA NEW GUINEA

    Directory of Open Access Journals (Sweden)

    Boney Wera

    2015-07-01

    A successful crop breeding program incorporating agronomic and consumer-preferred traits can be achieved by recognizing the existence and degree of variability among sweetpotato (Ipomoea batatas (L.) Lam.) genotypes. Understanding genetic variability, genotypic and phenotypic correlation and inheritance among agronomic traits is fundamental to the improvement of any crop. The study was carried out with the objective of estimating the genotypic variability and other yield-related traits of highlands sweetpotato in Papua New Guinea in a polycross population. A total of 8 sweetpotato genotypes derived from the polycross were considered in two cycles of replicated field experiments. Analysis of variance was computed to contrast the variability within the selected genotypes, based on high-yielding, β-carotene-rich, orange-fleshed sweetpotato. The results revealed significant differences among the genotypes. The genotypic coefficient of variation (GCV %) was lower than the phenotypic coefficient of variation (PCV %) for all traits studied. Relatively high genetic variance, along with high heritability and expected genetic advance, was observed in NMTN and ABYield. Harvest index (HI), scab and gall mite damage scores had heritabilities of 67%, 66% and 37%, respectively. Marketable tuber yield (MTYield) and total tuber yield (TTYield) had lower genetic variance, low heritability and low genetic advance. There is a need to investigate correlated inheritance among these traits. Selecting directly for yield improvement in a polycross population may not be very efficient, as indicated by the results. Therefore, it can be concluded that the variability for tuber yield within the sweetpotato genotypes collected from the polycross population at Aiyura Research Station is low and the extent of its yield improvement is narrow.

  9. Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

    CERN Document Server

    Richards, Joseph W; Brink, Henrik; Miller, Adam A; Bloom, Joshua S; Butler, Nathaniel R; James, J Berian; Long, James P; Rice, John

    2011-01-01

    Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL---where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up---i...
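
    Of the remedies listed, importance weighting is the simplest to sketch: training examples are re-weighted by an estimate of the test-to-training density ratio so the training loss mimics the test distribution. Everything below (densities, model, data) is an illustrative assumption, not the authors' pipeline.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)

    # Training set: bright/nearby objects (feature skewed low).
    # Test set: deeper survey (feature skewed high) -> sample selection bias.
    X_train = rng.normal(0.0, 1.0, (500, 1))
    y_train = (X_train[:, 0] + 0.5 * rng.standard_normal(500) > 0).astype(int)
    X_test = rng.normal(1.5, 1.0, (500, 1))

    # Estimate the density ratio p_test(x)/p_train(x) via a domain classifier.
    X_both = np.vstack([X_train, X_test])
    domain = np.r_[np.zeros(500), np.ones(500)]
    dom_clf = LogisticRegression().fit(X_both, domain)
    p = dom_clf.predict_proba(X_train)[:, 1]
    weights = p / (1 - p)            # odds = density-ratio estimate

    # Importance-weighted fit on the biased training data.
    model = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
    print("coef with importance weighting:", model.coef_.ravel())
    ```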

  10. Learning Bayesian networks using genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    Chen Fei; Wang Xiufeng; Rao Yimei

    2007-01-01

    A new method to evaluate the fitness of Bayesian networks against observed data is provided. The main advantage of this criterion is that it is suitable for both the complete and incomplete cases, while the others are not; moreover, it facilitates the computation greatly. In order to reduce the search space, the notion of equivalence class proposed by David Chickering is adopted. Instead of using that method directly, the novel criterion, variable ordering, and equivalence classes are combined; the proposed method thereby avoids some problems caused by the previous one. The genetic algorithm, which allows global convergence (lacking in most methods that search for Bayesian networks), is then applied to search for a good model in this space. To speed up convergence, the genetic algorithm is combined with a greedy algorithm. Finally, simulation shows the validity of the proposed approach.
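
    A generic genetic-algorithm loop of the kind used for such structure searches; the fitness function here is a stand-in (a real implementation would score candidate networks with the paper's criterion or, say, BIC), and all names are hypothetical.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    N_VARS = 5                       # nodes in the candidate networks

    def fitness(genome):
        """Placeholder network score; substitute a real BN scoring criterion."""
        adj = genome.reshape(N_VARS, N_VARS)
        return -np.abs(adj.sum() - N_VARS)   # toy: prefer ~N_VARS edges

    def evolve(pop_size=40, genome_len=N_VARS * N_VARS, gens=100, p_mut=0.02):
        pop = rng.integers(0, 2, (pop_size, genome_len))
        for _ in range(gens):
            scores = np.array([fitness(g) for g in pop])
            # Tournament selection of parents.
            idx = rng.integers(0, pop_size, (pop_size, 2))
            parents = pop[np.where(scores[idx[:, 0]] >= scores[idx[:, 1]],
                                   idx[:, 0], idx[:, 1])]
            # One-point crossover plus bit-flip mutation.
            cut = rng.integers(1, genome_len, pop_size)
            children = parents.copy()
            for i in range(0, pop_size - 1, 2):
                children[i, cut[i]:], children[i + 1, cut[i]:] = (
                    parents[i + 1, cut[i]:].copy(), parents[i, cut[i]:].copy())
            children ^= rng.random(children.shape) < p_mut
            pop = children
        best = max(pop, key=fitness)
        return best, fitness(best)

    genome, score = evolve()
    print("best score:", score)
    ```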

  11. Target selection of classical pulsating variables for space-based photometry

    CERN Document Server

    Plachy, E; Szabó, R; Kolenberg, K; Bányai, E

    2016-01-01

    In a few years the Kepler and TESS missions will provide ultra-precise photometry for thousands of RR Lyrae and hundreds of Cepheid stars. In the extended Kepler mission all targets are proposed through the Guest Observer (GO) Program, while the TESS space telescope will work with full-frame images and a ~15-16th magnitude brightness limit, with the possibility of short-cadence measurements for a limited number of pre-selected objects. This paper highlights some details of the enormous and important target selection work carried out by the members of Working Group 7 (WG#7) of the Kepler and TESS Asteroseismic Science Consortium.

  12. Computational statistics using the Bayesian Inference Engine

    Science.gov (United States)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible, object-oriented framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  13. COPD phenotypes on computed tomography and its correlation with selected lung function variables in severe patients

    Directory of Open Access Journals (Sweden)

    da Silva SMD

    2016-03-01

    Silvia Maria Doria da Silva, Ilma Aparecida Paschoal, Eduardo Mello De Capitani, Marcos Mello Moreira, Luciana Campanatti Palhares, Mônica Corso Pereira; Pneumology Service, Department of Internal Medicine, School of Medical Sciences, State University of Campinas (UNICAMP), Campinas, São Paulo, Brazil. Background: Computed tomography (CT) phenotypic characterization helps in understanding the clinical diversity of chronic obstructive pulmonary disease (COPD) patients, but its clinical relevance and its relationship with functional features are not clarified. Volumetric capnography (VC) uses the principle of gas washout and analyzes the pattern of CO2 elimination as a function of expired volume. The main variables analyzed were end-tidal concentration of carbon dioxide (ETCO2), slope of phase 2 (Slp2), and slope of phase 3 (Slp3) of the capnogram, the curve which represents the total amount of CO2 eliminated by the lungs during each breath. Objective: To investigate, in a group of patients with severe COPD, whether phenotypic analysis by CT could identify different subsets of patients, and whether there was an association between CT findings and functional variables. Subjects and methods: Sixty-five patients with COPD Gold III–IV were admitted for clinical evaluation, high-resolution CT, and functional evaluation (spirometry, 6-minute walk test [6MWT], and VC). The presence and profusion of tomographic findings were evaluated, and the patients were then classified as having an emphysema (EMP) or airway disease (AWD) phenotype. The EMP and AWD groups were compared; tomographic finding scores were evaluated against spirometric, 6MWT, and VC variables. Results: Bronchiectasis was found in 33.8% and peribronchial thickening in 69.2% of the 65 patients. Structural findings of the airways had no significant correlation with spirometric variables. Air trapping and EMP were strongly correlated with VC variables, but in opposite directions. There was some overlap between the EMP and AWD

  14. Characterization of Machine Variability and Progressive Heat Treatment in Selective Laser Melting of Inconel 718

    Science.gov (United States)

    Prater, Tracie; Tilson, Will; Jones, Zack

    2015-01-01

    The absence of an economy of scale in spaceflight hardware makes additive manufacturing an immensely attractive option for propulsion components. As additive manufacturing techniques are increasingly adopted by government and industry to produce propulsion hardware in human-rated systems, significant development efforts are needed to establish these methods as reliable alternatives to conventional subtractive manufacturing. One of the critical challenges facing powder bed fusion techniques in this application is variability between the machines used to perform builds. Even with robust process controls in place, it is possible for two machines operating at identical parameters with equivalent base materials to produce specimens with slightly different material properties. The machine variability study presented here evaluates 60 specimens of identical geometry built using the same parameters. 30 samples were produced on machine 1 (M1) and the other 30 samples were built on machine 2 (M2). Each 30-sample set was further subdivided into three subsets (with 10 specimens in each subset) to assess the effect of progressive heat treatment on machine variability. The three categories of post-processing were: stress relief; stress relief followed by hot isostatic press (HIP); and stress relief followed by HIP followed by heat treatment per AMS 5664. Each specimen (a round, smooth tensile specimen) was mechanically tested per ASTM E8. Two formal statistical techniques, hypothesis testing for equivalency of means and one-way analysis of variance (ANOVA), were applied to characterize the impact of machine variability and heat treatment on material properties including tensile stress, yield stress, modulus of elasticity, fracture elongation, and reduction of area. This work represents the type of development effort that is critical as NASA, academia, and the industrial base work collaboratively to establish a path to certification for additively manufactured parts. For future

  15. Bayesian optimization for materials design

    OpenAIRE

    Frazier, Peter I.; Wang, Jialei

    2015-01-01

    We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when materials designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian pro...
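
    A compact illustration of the loop the authors describe, using a Gaussian process surrogate and the expected-improvement criterion; the objective and all settings are toy assumptions, not the paper's case study.

    ```python
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    rng = np.random.default_rng(6)

    def objective(x):
        """Hypothetical expensive 'material property' to maximize."""
        return -(x - 0.65) ** 2 + 0.05 * np.sin(20 * x)

    grid = np.linspace(0, 1, 501)[:, None]
    X = rng.random((3, 1))                       # initial designs
    y = objective(X.ravel())

    for _ in range(15):                          # 15 sequential experiments
        gp = GaussianProcessRegressor(Matern(nu=2.5), normalize_y=True).fit(X, y)
        mu, sd = gp.predict(grid, return_std=True)
        best = y.max()
        # Expected improvement over the incumbent best observation.
        z = (mu - best) / np.maximum(sd, 1e-12)
        ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
        x_next = grid[np.argmax(ei)]
        X = np.vstack([X, [x_next]])
        y = np.append(y, objective(x_next[0]))

    print(f"best design x = {X[np.argmax(y), 0]:.3f}, value = {y.max():.4f}")
    ```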

  16. Empirically Driven Variable Selection for the Estimation of Causal Effects with Observational Data

    Science.gov (United States)

    Keller, Bryan; Chen, Jianshen

    2016-01-01

    Observational studies are common in educational research, where subjects self-select or are otherwise non-randomly assigned to different interventions (e.g., educational programs, grade retention, special education). Unbiased estimation of a causal effect with observational data depends crucially on the assumption of ignorability, which specifies…

  17. An Investigation of the Relation Between the Developmental Parabolic Curve and Selected Personality Variables.

    Science.gov (United States)

    Flugsrud, Marcia R.

    This study is designed to determine whether data obtained cross-sectionally from a sample of subjects in the middle childhood range on selected personality characteristics could be well described by a concave parabolic curve and thus linked to the closure behaviour elicited from the subjects. Specifically, the investigation seeks to determine if…

  18. Heterogeneous selection on a heritable temperament trait in a variable environment

    NARCIS (Netherlands)

    Quinn, John L.; Patrick, Samantha C.; Bouwhuis, Sandra; Wilkin, Teddy A.; Sheldon, Ben C.

    2009-01-01

    Temperament traits increasingly provide a focus for investigating the evolutionary ecology of behavioural variation. Here, we examine the underlying causes and selective consequences of individual variation in the temperament trait 'exploration behaviour in a novel environment' (EB, based on an 8

  19. American College Student Values: Their Relationship to Selected Personal and Academic Variables.

    Science.gov (United States)

    Ritter, Carolyn E.

    A 20-item chi-square test of independence was administered to a selected sample of college students that was stratified 50% male and 50% female. Male and female responses showed a significant difference on 18 of the 20 items. The 2 items on which attitudes of both sexes were the same were the role of government in business and a solution to the…

  20. The Relationship between Selected Body Composition Variables and Muscular Endurance in Women

    Science.gov (United States)

    Esco, Michael R.; Olson, Michele S.; Williford, Henry N.

    2010-01-01

    The primary purpose of this study was to determine if muscular endurance is affected by referenced waist circumference groupings, independent of body mass and subcutaneous abdominal fat, in women. This study also explored whether selected body composition measures were associated with muscular endurance. Eighty-four women were measured for height,…

  1. Identification of solid state fermentation degree with FT-NIR spectroscopy: Comparison of wavelength variable selection methods of CARS and SCARS

    Science.gov (United States)

    Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai

    2015-10-01

    The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by the FT-NIR spectroscopy technique was investigated in this study. Two wavelength variable selection methods, competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS), were employed to select the important wavelengths. PLS-DA was applied to calibrate identification models using the wavelength variables selected by CARS and SCARS. Experimental results showed that the numbers of wavelength variables selected by CARS and SCARS were 58 and 47, respectively, out of the 1557 original wavelength variables. Compared with full-spectrum PLS-DA, both wavelength variable selection methods enhanced the performance of the identification models. Moreover, compared with the CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results, with an identification rate of 91.43% in validation. The overall results demonstrate that a PLS-DA model constructed on wavelength variables chosen by a suitable variable selection method identifies solid state fermentation degree more accurately.
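
    A much-simplified sketch of the reweighted-sampling idea behind CARS: repeatedly fit a PLS model, rank wavelengths by the magnitude of their regression coefficients, and retain a shrinking subset. Real CARS adds Monte Carlo sampling and an exponential shrinkage schedule; the data here are synthetic.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(7)

    # Synthetic spectra: 80 samples x 200 wavelengths, 6 informative bands.
    n, p = 80, 200
    X = rng.standard_normal((n, p))
    true = rng.choice(p, 6, replace=False)
    y = X[:, true] @ rng.uniform(1, 2, 6) + 0.1 * rng.standard_normal(n)

    keep = np.arange(p)
    while keep.size > 10:                       # shrink the wavelength set
        pls = PLSRegression(n_components=3).fit(X[:, keep], y)
        coefs = np.abs(pls.coef_).ravel()
        order = np.argsort(coefs)[::-1]
        keep = keep[order[: int(keep.size * 0.8)]]   # drop weakest 20%

    print("retained wavelengths:", np.sort(keep))
    print("truly informative   :", np.sort(true))
    ```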

  2. The effect of aquatic plyometric training with and without resistance on selected physical fitness variables among volleyball players

    Directory of Open Access Journals (Sweden)

    K. KAMALAKKANNAN

    2011-06-01

    The purpose of this study is to analyze the effect of aquatic plyometric training, with and without the use of weights, on selected physical fitness variables among volleyball players. To achieve the purpose of this study, 36 physically active undergraduate volleyball players between 18 and 20 years of age volunteered as participants. The participants were randomly categorized into three groups of 12 each: a control group (CG), an aquatic plyometric training with weight group (APTWG), and an aquatic plyometric training without weight group (APTWOG). The subjects of the control group were not exposed to any training. Both experimental groups underwent their respective experimental treatment for 12 weeks, 3 days per week, with a single session on each day. Speed, endurance, and explosive power were measured as the dependent variables for this study. 36 days of experimental treatment were conducted for all the groups, and pre- and post-test data were collected. The collected data were analyzed using analysis of covariance (ANCOVA) followed by a Scheffé's post hoc test. The results revealed significant differences between groups on all the selected dependent variables. This study demonstrated that aquatic plyometric training can be one effective means for improving speed, endurance, and explosive power in volleyball players.

  3. Bayesian Posteriors Without Bayes' Theorem

    CERN Document Server

    Hill, Theodore P

    2012-01-01

    The classical Bayesian posterior arises naturally as the unique solution of several different optimization problems, without the necessity of interpreting data as conditional probabilities and then using Bayes' Theorem. For example, the classical Bayesian posterior is the unique posterior that minimizes the loss of Shannon information in combining the prior and the likelihood distributions. These results, direct corollaries of recent results about conflations of probability distributions, reinforce the use of Bayesian posteriors, and may help partially reconcile some of the differences between classical and Bayesian statistics.

  4. The relationship between selected variables and customer loyalty within an optometric practice environment

    Directory of Open Access Journals (Sweden)

    T. Van Vuuren

    2012-12-01

    Purpose: The purpose of the research that informed this article was to examine the relationship between customer satisfaction, trust, supplier image, commitment and customer loyalty within an optometric practice environment. Problem investigated: Optometric businesses need to adapt their strategies to enhance loyalty, as customer satisfaction is not enough to ensure loyalty and customer retention. An understanding of the variables influencing loyalty could help businesses within the optometric service environment to retain their customers and become more profitable. Methodology: The methodological approach followed was exploratory and quantitative in nature. The sample consisted of 357 customers who visited the practice twice or more over the previous six years. A structured questionnaire, with a five-point Likert scale, was fielded to gather the data. Descriptive and multiple regression analysis approaches were used to analyse the results. Collinearity statistics and Pearson's correlation coefficient were also calculated to determine which independent variable has the largest influence on customer loyalty. Findings and implications: The main finding is that customer satisfaction had the highest correlation with customer loyalty. The other independent variables, however, also appear to significantly influence customer loyalty within an optometric practice environment. The implication is that optometric practices need to focus on customer satisfaction, trust, supplier image and commitment when addressing the improvement of customer loyalty. Originality and value of the research: The article contributes to the improvement of customer loyalty within a service business environment, which could assist in facilitating larger market share, higher customer retention and greater profitability for the business over the long term.

  5. Selected topics in the classical theory of functions of a complex variable

    CERN Document Server

    Heins, Maurice

    2014-01-01

    Elegant and concise, this text is geared toward advanced undergraduate students acquainted with the theory of functions of a complex variable. The treatment presents such students with a number of important topics from the theory of analytic functions that may be addressed without erecting an elaborate superstructure. These include some of the theory's most celebrated results, which seldom find their way into a first course. After a series of preliminaries, the text discusses properties of meromorphic functions, the Picard theorem, and harmonic and subharmonic functions. Subsequent topics incl

  6. PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

    CERN Document Server

    Seldin, Yevgeny; Shawe-Taylor, John; Peters, Jan; Auer, Peter

    2011-01-01

    We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables bounding expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integrating the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many ...

  7. SOMBI: Bayesian identification of parameter relations in unstructured cosmological data

    CERN Document Server

    Frank, Philipp; Enßlin, Torsten A

    2016-01-01

    This work describes the implementation and application of a correlation determination method based on Self Organizing Maps and Bayesian Inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the Self Organizing Map neural network algorithm. Parameter relations are then revealed by means of Bayesian inference within the respective identified data clusters. Specifically, such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for the respective polynomial coefficients. To decide which polynomial order suffices to describe the correlation structures in the data, we add a model selection method, the Bayesian Information Criterion (BIC), to the analysis. The performance of the SOMBI algorithm is tested with mock data. As illustration we also provide ...
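
    The model-selection step is easy to illustrate: fit polynomials of increasing order and keep the one minimizing BIC = k ln n - 2 ln L, where for Gaussian residuals the log-likelihood term reduces (up to constants) to n times the log of the mean squared error. Data and orders below are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)

    # Synthetic cluster data: a cubic relation plus noise.
    x = rng.uniform(-2, 2, 200)
    y = 0.5 * x ** 3 - x + rng.normal(0, 0.4, x.size)

    def bic(order):
        coef = np.polyfit(x, y, order)
        resid = y - np.polyval(coef, x)
        n, k = x.size, order + 1
        # Gaussian log-likelihood up to constants: -n/2 * ln(MSE).
        return k * np.log(n) + n * np.log(np.mean(resid ** 2))

    scores = {m: bic(m) for m in range(1, 8)}
    best = min(scores, key=scores.get)
    print({m: round(s, 1) for m, s in scores.items()})
    print("BIC-selected polynomial order:", best)   # expect 3
    ```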

  8. Optimized diffusion of buck semen for saving genetic variability in selected dairy goat populations

    Directory of Open Access Journals (Sweden)

    Martin Pierre

    2011-02-01

    Background: Current research on quantitative genetics has provided efficient guidelines for the sustainable management of selected populations: genetic gain is maximized while the loss of genetic diversity is maintained at a reasonable rate. However, actual selection schemes are complex, especially for large domestic species, and they have to take into account many operational constraints. This paper deals with the actual selection of dairy goats, where the challenge is to optimize the diffusion of buck semen in the field. Three objectives are considered simultaneously: (i) natural service buck replacement (NSR); (ii) goat replacement (GR); (iii) semen distribution of young bucks to be progeny-tested. An appropriate optimization method is developed, which involves five analytical steps. Solutions are obtained by simulated annealing and the corresponding algorithms are presented in detail. Results: The whole procedure was tested on two French goat populations (Alpine and Saanen breeds), and the results presented in the abstract are based on the average of the two breeds. The procedure induced an immediate acceleration of genetic gain in comparison with the current annual genetic gain (0.15 genetic standard deviation units), as shown by two facts. First, the genetic level of replacement natural service (NS) bucks was predicted, 1.5 years ahead at the moment of reproduction, to be equivalent to that of the progeny-tested bucks in service born from the current breeding scheme. Second, the genetic level of replacement goats was much higher than that of their dams (0.86 units), which represented 6 years of selection, although dams were only 3 years older than their replacement daughters. This improved genetic gain could be achieved while decreasing inbreeding coefficients substantially. The inbreeding coefficient (in %) of NS bucks was lower than that of the progeny-tested bucks (-0.17). Goats were also less inbred than their dams (-0.67). Conclusions: It was possible to
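
    Simulated annealing, the solver named above, is easy to sketch generically: accept uphill moves with probability exp(-ΔE/T) under a cooling schedule. The objective, neighborhood move and schedule below are toy stand-ins, not the paper's five-step breeding-scheme optimization.

    ```python
    import numpy as np

    rng = np.random.default_rng(16)

    def energy(x):
        """Toy objective to minimize; replace with the breeding-scheme cost."""
        return (x ** 2).sum() + 3 * np.abs(np.sin(5 * x)).sum()

    x = rng.uniform(-3, 3, size=4)
    best, best_e = x.copy(), energy(x)
    T = 1.0
    for step in range(20_000):
        prop = x + rng.normal(0, 0.1, x.size)        # neighborhood move
        dE = energy(prop) - energy(x)
        if dE < 0 or rng.random() < np.exp(-dE / T):
            x = prop
            if energy(x) < best_e:
                best, best_e = x.copy(), energy(x)
        T *= 0.9997                                  # geometric cooling
    print("best energy:", round(best_e, 4), "at", np.round(best, 3))
    ```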

  9. Induction and selection of superior genetic variables of oil seed rape (brassica napus L.)

    International Nuclear Information System (INIS)

    Dry and uniform seeds of two rape seed varieties, Ganyou-5 and Tower, were subjected to different doses of gamma rays. Genetic variation in yield and yield components generated in M1 was studied in M2, and 30 useful variants were isolated from a large mutagenized population. The selected mutants were progeny-tested for stability of the characters in M3. Only five of the 30 progenies were found to be uniform and stable. Further selection was made in the segregating M3 progenies. Results on some of the promising mutants are reported. The effect of irradiation treatment was highly pronounced on pod length, seeds per pod and 1000-seed weight. The genetic changes thus induced would help to evolve high-yielding versions of different rape seed varieties under local environmental conditions. (author)

  10. Effect of Integrated Yoga Module on Selected Psychological Variables among Women with Anxiety Problem.

    Science.gov (United States)

    Parthasarathy, S; Jaiganesh, K; Duraisamy

    2014-01-01

    The implementation of yogic practices has proven benefits in both organic and psychological diseases. Forty-five women with anxiety, selected by a random sampling method, were divided into three groups. Experimental group I was subjected to asanas, relaxation and pranayama, while experimental group II was subjected to an integrated yoga module. The control group did not receive any intervention. Anxiety was measured by Taylor's Manifest Anxiety Scale before and after treatment. Frustration was measured through the Reaction to Frustration Scale. All data were entered into an Excel sheet and analysed with SPSS 16 software using analysis of covariance (ANCOVA). Selected yoga practices and asanas decreased anxiety and frustration scores, but treatment with an integrated yoga module resulted in a significant reduction of anxiety and frustration. To conclude, the practice of asanas and yoga decreased anxiety in women, and yoga as an integrated module significantly improved anxiety scores in young women with proven anxiety, without any ill effects.

  11. A nonparametric Bayesian method for estimating a response function

    OpenAIRE

    Brown, Scott; Meeden, Glen

    2012-01-01

    Consider the problem of estimating a response function which depends upon a non-stochastic independent variable under our control. The data are independent Bernoulli random variables where the probabilities of success are given by the response function at the chosen values of the independent variable. Here we present a nonparametric Bayesian method for estimating the response function. The only prior information assumed is that the response function can be well approximated by a mixture of st...

  12. Selection of area-level variables from administrative data: an intersectional approach to the study of place and child development.

    Science.gov (United States)

    Kershaw, Paul; Forer, Barry

    2010-05-01

    Given data limitations, neighborhood effects scholarship relies heavily on administrative data to measure area-level constructs. We provide new evidence to guide the selection of indicators from routinely collected sources, focusing on effects on early child development. Informed by an analytic paradigm attuned to the intersection of race, class, and sex, along with population-level data in British Columbia, Canada, our findings signal the need for greater precision when choosing variables in place of the now dominant approaches for measuring constructs like income/wealth, employment, family structure and race/ethnicity. We also provide new evidence about which area-level variables associate with the different domains of child development, as well as how area-level associations vary across urban and rural contexts. PMID:20089438

  13. Bayesian multivariate mixed-scale density estimation

    CERN Document Server

    Canale, Antonio

    2011-01-01

    Although univariate continuous density estimation has received abundant attention in the Bayesian nonparametrics literature, there is essentially no theory on multivariate mixed-scale density estimation. In this article, we consider a general framework to jointly model continuous, count and categorical variables under a nonparametric prior, which is induced through rounding latent variables having an unknown density with respect to Lebesgue measure. For the proposed class of priors, we provide sufficient conditions for large support, strong consistency and rates of posterior contraction. These conditions, which primarily relate to the prior on the latent variable density and heaviness of the tails for the observed continuous variables, allow one to convert sufficient conditions obtained in the setting of multivariate continuous density estimation to the mixed scale case. We provide new results in the multivariate continuous density estimation case, showing the Kullback-Leibler property and strong consistency...

  14. Bayesian prediction and adaptive sampling algorithms for mobile sensor networks online environmental field reconstruction in space and time

    CERN Document Server

    Xu, Yunfei; Dass, Sarat; Maiti, Tapabrata

    2016-01-01

    This brief introduces a class of problems and models for the prediction of the scalar field of interest from noisy observations collected by mobile sensor networks. It also introduces the problem of optimal coordination of robotic sensors to maximize the prediction quality subject to communication and mobility constraints either in a centralized or distributed manner. To solve such problems, fully Bayesian approaches are adopted, allowing various sources of uncertainties to be integrated into an inferential framework effectively capturing all aspects of variability involved. The fully Bayesian approach also allows the most appropriate values for additional model parameters to be selected automatically by data, and the optimal inference and prediction for the underlying scalar field to be achieved. In particular, spatio-temporal Gaussian process regression is formulated for robotic sensors to fuse multifactorial effects of observations, measurement noise, and prior distributions for obtaining the predictive di...

  15. Bayesian inference tools for inverse problems

    Science.gov (United States)

    Mohammad-Djafari, Ali

    2013-08-01

    In this paper, first the basics of Bayesian inference with a parametric model of the data are presented. Then, the extensions needed when dealing with inverse problems are given, in particular for linear models such as deconvolution or image reconstruction in Computed Tomography (CT). The main point of discussion is then the prior modeling of signals and images. A classification of these priors is presented: first separable and Markovian models, then simple or hierarchical models with hidden variables. For practical applications, we also need to consider the estimation of the hyperparameters. Finally, we see that we have to infer simultaneously the unknowns, the hidden variables and the hyperparameters. Very often, the expression of this joint posterior law is too complex to be handled directly; only rarely can we obtain analytical solutions for point estimators such as the maximum a posteriori (MAP) or posterior mean (PM). Three main tools can then be used: Laplace approximation (LAP), Markov Chain Monte Carlo (MCMC) and Bayesian Variational Approximations (BVA). To illustrate all these aspects, we consider a deconvolution problem where we know that the input signal is sparse and propose to use a Student-t prior. To handle the Bayesian computations with this model, we use the property that a Student-t distribution can be modelled via an infinite mixture of Gaussians, introducing hidden variables which are the variances. Then, the expression of the joint posterior of the input signal samples, the hidden variables (here the inverse variances of those samples) and the hyperparameters of the problem (for example the variance of the noise) is given. From this point, we present the joint maximization by alternate optimization and the three possible approximation methods. Finally, the proposed methodology is applied in different applications such as mass spectrometry, spectrum estimation of quasi-periodic biological signals and
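
    The Student-t-as-Gaussian-scale-mixture trick is standard and easy to verify numerically: draw a precision from a Gamma(ν/2, ν/2) and then a Gaussian with that precision, and the marginal is Student-t with ν degrees of freedom. A quick check (illustrative, not the paper's code):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(9)
    nu, n = 3.0, 200_000

    # Hierarchical sampling: lam ~ Gamma(nu/2, rate=nu/2), x | lam ~ N(0, 1/lam).
    lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # numpy scale = 1/rate
    x = rng.normal(0.0, 1.0 / np.sqrt(lam))

    # The marginal of x should match a Student-t with nu degrees of freedom.
    ks = stats.kstest(x, stats.t(df=nu).cdf)
    print(f"KS statistic vs t({nu:g}): {ks.statistic:.4f} (p = {ks.pvalue:.2f})")
    ```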

  16. Natural Selection and Genetic Drift: Neutral and adaptive genetic variability of hatchery versus wild populations in brown trout Salmo trutta

    Directory of Open Access Journals (Sweden)

    Tamara Schenekar

    2015-11-01

    Genetic drift and natural selection are two of the major forces shaping the genetic makeup of a population. Genetic drift reduces genetic variability through the random loss of alleles during the transition from one generation to the next; the smaller the population, the stronger genetic drift is. Natural selection favors the spread of specific alleles within a population over time, namely those alleles that are beneficial in the specific environment of that population; individuals carrying less advantageous alleles have a lower probability of surviving and reproducing. When establishing a captive population, very often only a small number of individuals is taken, and the number of breeding individuals used to maintain the population is limited, thus increasing the amount of genetic drift. On the other hand, the drastically different environment in captivity (artificial diet, higher individual density, altered pathogen pressure, etc.) may favor alleles that are maladaptive for individuals when they are released back into the wild, e.g. for stocking measures. We screened both neutral and adaptive genetic markers in order to assess the relative importance of genetic drift and selection pressure in wild and hatchery populations of Austrian brown trout. We confirm a strong positive selection pressure on an adaptive locus of the Major Histocompatibility Complex (MHC), with the signal of this selection pressure more pronounced in hatchery populations. This may stem either from stronger genetic drift in wild populations due to smaller effective population sizes, or from stronger directional selection in these wild populations, whereby only particular genetic variants prove to be adaptive in each specific environment. Therefore, the alleles arising from the hatchery selection regime may be detrimental in the wild, which can lead to lower survival rates of stocked fish in wild environments.

  17. Measure Transformer Semantics for Bayesian Machine Learning

    Science.gov (United States)

    Borgström, Johannes; Gordon, Andrew D.; Greenberg, Michael; Margetson, James; van Gael, Jurgen

    The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
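
    The sample/observe style of probabilistic programming the paper formalizes can be mimicked in a few lines with importance sampling: `sample` draws from a prior, `observe` multiplies in a likelihood weight, and posterior marginals are weighted averages. This is a deliberately naive interpreter, not the paper's measure-transformer semantics or its factor-graph backend.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(10)

    def run(program, n_particles=10_000):
        """Importance-sampling interpreter: returns weighted posterior samples."""
        samples, weights = [], []
        for _ in range(n_particles):
            logw = 0.0
            trace = {}
            def sample(name, dist):
                trace[name] = dist.rvs(random_state=rng)
                return trace[name]
            def observe(dist, value):
                nonlocal logw
                logw += dist.logpdf(value)     # weight by the likelihood
            program(sample, observe)
            samples.append(trace)
            weights.append(logw)
        w = np.exp(np.array(weights) - max(weights))
        return samples, w / w.sum()

    # Toy model: unknown mean with Gaussian prior, two noisy observations.
    def model(sample, observe):
        mu = sample("mu", stats.norm(0, 2))
        observe(stats.norm(mu, 1), 1.2)
        observe(stats.norm(mu, 1), 0.8)

    samples, w = run(model)
    post_mean = sum(wi * s["mu"] for s, wi in zip(samples, w))
    print("posterior mean of mu:", round(post_mean, 3))   # analytic ~ 0.889
    ```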

  18. Conflict Management Styles of Selected Managers and Their Relationship With Management and Organization Variables

    Directory of Open Access Journals (Sweden)

    Concepcion Martires

    1990-12-01

    This study sought to determine the relationship between the conflict management styles of managers and certain management and organization factors. A total of 462 top, middle, and lower managers from 72 companies participated in the study, which utilized the Thomas-Kilmann Conflict Mode Instrument. To facilitate the computation of the statistical data, a microcomputer and a software package were used. The majority of the managers in the 17 types of organization included in the study use the collaborative mode of managing conflict. This finding is congruent with the findings of past studies conducted on managers in the commercial banking, service, manufacturing, trading, advertising, appliance, investment house, and overseas recruitment industries, showing their high degree of objectivity and assertiveness of their own personal goals and of other people's concerns. The second dominant style, compromising, indicates their desire to share and search for solutions that result in satisfaction among conflicting parties. This finding is highly consistent with the strong Filipino value of smooth interpersonal relationships (SIR) as reflected and discussed in numerous studies of Filipino values. The chi-square tests showed independence between the managers' conflict management styles and each of the variables of sex, civil status, position level at work, work experience, type of corporation, and number of subordinates. This result is again congruent with those of past studies conducted in the Philippines. The past and present findings may imply that conflict management mode is a highly personal style that is not dependent on any of these variables. However, the chi-square tests show that management style is dependent on the manager's age and educational attainment.

  19. Applied Bayesian modelling

    CERN Document Server

    Congdon, Peter

    2014-01-01

    This book provides an accessible approach to Bayesian computing and data analysis, with an emphasis on the interpretation of real data sets. Following in the tradition of the successful first edition, this book aims to make a wide range of statistical modeling applications accessible using tested code that can be readily adapted to the reader's own applications. The second edition has been thoroughly reworked and updated to take account of advances in the field. A new set of worked examples is included. The novel aspect of the first edition was the coverage of statistical modeling using WinBU

  20. Computationally efficient Bayesian tracking

    Science.gov (United States)

    Aughenbaugh, Jason; La Cour, Brian

    2012-06-01

    In this paper, we describe the progress we have achieved in developing a computationally efficient, grid-based Bayesian fusion tracking system. In our approach, the probability surface is represented by a collection of multidimensional polynomials, each computed adaptively on a grid of cells representing state space. Time evolution is performed using a hybrid particle/grid approach and knowledge of the grid structure, while sensor updates use a measurement-based sampling method with a Delaunay triangulation. We present an application of this system to the problem of tracking a submarine target using a field of active and passive sonar buoys.
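
    A stripped-down, 1-D grid Bayes filter showing the predict/update cycle such a tracker performs (the real system uses adaptive multidimensional grids, polynomial representations, and sonar likelihoods; everything here is a toy stand-in):

    ```python
    import numpy as np

    rng = np.random.default_rng(11)

    cells = np.linspace(0, 100, 501)              # discretized 1-D state space
    belief = np.full(cells.size, 1 / cells.size)  # uniform prior over position

    def predict(belief, drift_cells=5, diffusion=2.0):
        """Time update: shift by the motion model, then blur for process noise."""
        shifted = np.roll(belief, drift_cells)    # 5 cells = 1.0 position unit
        kernel = np.exp(-0.5 * (np.arange(-10, 11) / diffusion) ** 2)
        blurred = np.convolve(shifted, kernel / kernel.sum(), mode="same")
        return blurred / blurred.sum()

    def update(belief, z, sigma=3.0):
        """Measurement update: multiply by sensor likelihood, renormalize."""
        lik = np.exp(-0.5 * ((cells - z) / sigma) ** 2)
        post = belief * lik
        return post / post.sum()

    true_pos = 20.0
    for step in range(10):
        true_pos += 1.0                           # target moves 1 unit per step
        belief = predict(belief)
        z = true_pos + rng.normal(0, 3.0)         # noisy sonar-like measurement
        belief = update(belief, z)

    print("MAP position estimate:", cells[np.argmax(belief)], "true:", true_pos)
    ```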

  1. Bayesian Geostatistical Design

    DEFF Research Database (Denmark)

    Diggle, Peter; Lophaven, Søren Nymand

    2006-01-01

    This paper describes the use of model-based geostatistics for choosing the set of sampling locations, collectively called the design, to be used in a geostatistical analysis. Two types of design situation are considered: retrospective design, which concerns the addition of sampling locations to, or deletion of locations from, an existing design, and prospective design, which consists of choosing positions for a new set of sampling locations. We propose a Bayesian design criterion which focuses on the goal of efficient spatial prediction whilst allowing for the fact that model...

  2. A DNA-based system for selecting and displaying the combined result of two input variables

    DEFF Research Database (Denmark)

    Liu, Huajie; Wang, Jianbang; Song, S;

    2015-01-01

    We demonstrate this capability in a DNA-based system that takes two input numbers, represented in DNA strands, and returns the result of their multiplication, writing this as a number in a display. Unlike a conventional calculator, this system operates by selecting the result from a library of solutions rather than through logic operations. The multiplicative example demonstrated here illustrates a much more general capability: to generate a unique output for any distinct pair of DNA inputs. The system thereby functions as a lookup table and could be a key component in future, more powerful data...

  3. The connection between selective referrals for radical cystectomy and radical prostatectomy and volume-outcome effects: an instrumental variables analysis.

    Science.gov (United States)

    Allareddy, Veerasathpurush; Ward, Marcia M; Wehby, George L; Konety, Badrinath R

    2012-01-01

    This study delineates the roles of "selective referrals" and "practice makes perfect" in the association between hospital procedure volume and in-hospital mortality for radical cystectomy and radical prostatectomy. This is a retrospective analysis of the Nationwide Inpatient Sample (years 2000-2004). All hospitalizations with primary procedure codes for radical cystectomy and radical prostatectomy were selected. The association between hospital procedure volume and in-hospital mortality was examined using generalized estimating equations and instrumental variables approaches. There was an inverse association between hospital procedure volume and in-hospital mortality for radical cystectomy (odds ratio = 0.57; 95% confidence interval = 0.38-0.87; P < 0.05), a pattern consistent with "practice makes perfect." PMID:22205768
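
    Two-stage least squares is the workhorse instrumental-variables estimator behind analyses like this one: regress the endogenous regressor (hospital volume) on the instrument, then regress the outcome on the fitted values. A hand-rolled sketch on synthetic data (variable names and the data-generating process are invented for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(12)
    n = 5_000

    # Unobserved severity confounds volume (via referrals) and mortality.
    severity = rng.standard_normal(n)
    instrument = rng.standard_normal(n)            # e.g., differential distance
    volume = 0.8 * instrument + 0.5 * severity + rng.standard_normal(n)
    mortality = -0.4 * volume + 0.9 * severity + rng.standard_normal(n)

    def ols(X, y):
        return np.linalg.lstsq(X, y, rcond=None)[0]

    X = np.column_stack([np.ones(n), volume])
    Z = np.column_stack([np.ones(n), instrument])

    # Naive OLS is biased toward zero here because severity raises both.
    print("OLS estimate :", ols(X, mortality)[1])

    # 2SLS: first stage predicts volume from the instrument,
    # second stage regresses mortality on predicted volume.
    volume_hat = Z @ ols(Z, volume)
    X2 = np.column_stack([np.ones(n), volume_hat])
    print("2SLS estimate:", ols(X2, mortality)[1])   # ~ -0.4, the causal effect
    ```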

  4. Technical note: Bayesian calibration of dynamic ruminant nutrition models.

    Science.gov (United States)

    Reed, K F; Arhonditsis, G B; France, J; Kebreab, E

    2016-08-01

    Mechanistic models of ruminant digestion and metabolism have advanced our understanding of the processes underlying ruminant animal physiology. Deterministic modeling practices ignore the inherent variation within and among individual animals and thus have no way to assess how sources of error influence model outputs. We introduce Bayesian calibration of mathematical models to address the need for robust mechanistic modeling tools that can accommodate error analysis by remaining within the bounds of data-based parameter estimation. For the purpose of prediction, the Bayesian approach generates a posterior predictive distribution that represents the current estimate of the value of the response variable, taking into account both the uncertainty about the parameters and model residual variability. Predictions are expressed as probability distributions, thereby conveying significantly more information than point estimates in regard to uncertainty. Our study illustrates some of the technical advantages of Bayesian calibration and discusses the future perspectives in the context of animal nutrition modeling.
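
    In its simplest form, Bayesian calibration wraps an MCMC sampler around the mechanistic model: propose parameters, run the model, and weight by prior and data likelihood; the posterior predictive then folds in both parameter uncertainty and residual variability. A toy random-walk Metropolis sketch (the "mechanistic model" here is a stand-in, not a ruminant digestion model):

    ```python
    import numpy as np

    rng = np.random.default_rng(13)

    def mechanistic_model(k, t):
        """Stand-in dynamic model: first-order digestion-like decay curve."""
        return 100 * (1 - np.exp(-k * t))

    # Hypothetical noisy observations from a system with true k = 0.3.
    t_obs = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    y_obs = mechanistic_model(0.3, t_obs) + rng.normal(0, 3, t_obs.size)

    def log_post(k, sigma=3.0):
        if k <= 0:
            return -np.inf
        resid = y_obs - mechanistic_model(k, t_obs)
        log_lik = -0.5 * np.sum((resid / sigma) ** 2)
        log_prior = -0.5 * ((k - 0.5) / 0.5) ** 2      # weak N(0.5, 0.5) prior
        return log_lik + log_prior

    # Random-walk Metropolis over the rate parameter k.
    k, chain = 0.5, []
    for _ in range(20_000):
        prop = k + rng.normal(0, 0.05)
        if np.log(rng.random()) < log_post(prop) - log_post(k):
            k = prop
        chain.append(k)
    post = np.array(chain[5_000:])                     # discard burn-in

    # Posterior predictive at a new time: parameter + residual variability.
    pred = mechanistic_model(post, 6.0) + rng.normal(0, 3.0, post.size)
    print(f"k: {post.mean():.3f} +/- {post.std():.3f}; "
          f"predictive at t=6: {pred.mean():.1f} +/- {pred.std():.1f}")
    ```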

  5. Genetic variability and selection for laticiferous system characters in Hevea brasiliensis

    Directory of Open Access Journals (Sweden)

    Paulo de Souza Gonçalves

    2005-09-01

    Six laticiferous system characters were investigated in 22 three-year-old, half-sib rubber tree [Hevea brasiliensis (Willd. ex Adr. de Juss.) Muell.-Arg.] progenies, evaluated at three sites (Votuporanga, Pindorama and Jaú, all in São Paulo State, Brazil). The traits examined were: average rubber yield (Pp), average bark thickness (Bt), number of latex vessel rings (Lv), average distance between consecutive latex vessel rings (Dc), density of latex vessels per 5 mm per ring averaged over all rings (Dd), and diameter of the latex vessels (Di). The joint analysis showed that the site effect and the progeny × site interaction were significant for all traits except Lv. Estimates of individual heritabilities across the three sites were high for Bt; moderate for Lv, Pp and Dc; low for Dd; and very low for Di. Genetic correlations in the joint analysis showed high positive correlations between Pp and the other traits. Selecting the best five progenies would result in a genetic gain of 24.91% for Pp, while selecting the best two plants within a progeny would result in a Pp genetic gain of 30.98%.

  6. Inference in hybrid Bayesian networks

    DEFF Research Database (Denmark)

    Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael;

    2009-01-01

    Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability techniques (like fault trees). The paper reviews the last decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability.

  7. Ecohydrological model parameter selection for stream health evaluation.

    Science.gov (United States)

    Woznicki, Sean A; Nejadhashemi, A Pouyan; Ross, Dennis M; Zhang, Zhen; Wang, Lizhu; Esfahanian, Abdol-Hossein

    2015-04-01

    Variable selection is a critical step in the development of empirical stream health prediction models. This study develops a framework for selecting important in-stream variables to predict four measures of biological integrity: total number of Ephemeroptera, Plecoptera, and Trichoptera (EPT) taxa, family index of biotic integrity (FIBI), Hilsenhoff biotic integrity (HBI), and fish index of biotic integrity (IBI). Over 200 flow regime and water quality variables were calculated using the Hydrologic Index Tool (HIT) and Soil and Water Assessment Tool (SWAT). Streams of the River Raisin watershed in Michigan were grouped using the Strahler stream classification system (orders 1-3 and orders 4-6), the k-means clustering technique (two clusters: C1 and C2), and all streams (one grouping). For each grouping, variable selection was performed using Bayesian variable selection, principal component analysis, and Spearman's rank correlation. Following selection of the best variable sets, models were developed to predict the measures of biological integrity using adaptive neuro-fuzzy inference systems (ANFIS), a technique well-suited to complex, nonlinear ecological problems. Multiple unique variable sets were identified, all of which differed by selection method and stream grouping. The final best models were mostly built using the Bayesian variable selection method. The most effective stream grouping method varied by health measure, although k-means clustering and grouping by stream order were always superior to models built without grouping. Commonly selected variables were related to streamflow magnitude, rate of change, and seasonal nitrate concentration. Each best model was effective in simulating stream health observations, with EPT taxa validation R2 ranging from 0.67 to 0.92, FIBI from 0.49 to 0.85, HBI from 0.56 to 0.75, and fish IBI at 0.99 for all best models. The comprehensive variable selection and modeling process proposed here is a robust method that extends our
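
    Of the three selection methods named, Spearman's rank correlation is the simplest to sketch: rank candidate predictors by absolute correlation with a biotic-integrity response and keep the top few. The data below are synthetic stand-ins for the HIT/SWAT variables.

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(14)

    # Synthetic stand-in: 50 stream reaches, 20 candidate hydrologic indices.
    X = rng.standard_normal((50, 20))
    ibi = 2 * X[:, 3] - 1.5 * X[:, 7] + rng.standard_normal(50)  # toy response

    # Rank predictors by absolute Spearman correlation with the response.
    rho = np.array([spearmanr(X[:, j], ibi)[0] for j in range(X.shape[1])])
    top = np.argsort(np.abs(rho))[::-1][:5]
    print("top predictors by |Spearman rho|:", top)   # expect 3 and 7 first
    ```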

  8. Joint variable and rank selection for parsimonious estimation of high dimensional matrices

    CERN Document Server

    Bunea, Florentina; Wegkamp, Marten

    2011-01-01

    This article is devoted to optimal dimension reduction methods for sparse, high dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by considering them jointly. For this, we first motivate a new class of sparse multivariate regression models, in which the coefficient matrix has low rank and zero rows or can be well approximated by such a matrix. Then, we introduce estimators that are based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators indeed adapt to the unknown matrix sparsity and have fast rates of convergence. We support our theoretica...
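
    A crude two-step heuristic in the spirit of joint row and rank selection: pick predictors by OLS row norms, then truncate the SVD of the restricted fit. The paper's actual estimator uses joint penalties with theoretical guarantees; everything below is an illustrative stand-in.

    ```python
    import numpy as np

    rng = np.random.default_rng(15)

    # Synthetic multivariate regression: coefficient matrix with rank 2
    # and only 5 nonzero rows out of 30 predictors.
    n, p, q = 100, 30, 10
    B = np.zeros((p, q))
    B[:5] = rng.standard_normal((5, 2)) @ rng.standard_normal((2, q))
    X = rng.standard_normal((n, p))
    Y = X @ B + 0.5 * rng.standard_normal((n, q))

    # Step 1 (row selection): keep predictors with large OLS row norms.
    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
    norms = np.linalg.norm(B_ols, axis=1)
    rows = np.flatnonzero(norms > 2 * np.median(norms))

    # Step 2 (rank reduction): truncate the SVD of the restricted fit.
    B_sub = np.linalg.lstsq(X[:, rows], Y, rcond=None)[0]
    U, s, Vt = np.linalg.svd(B_sub, full_matrices=False)
    r = 2
    B_hat = U[:, :r] * s[:r] @ Vt[:r]
    print("selected rows:", rows, "| retained rank:", r)
    ```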

  9. Application of SEAWAT to select variable-density and viscosity problems

    Science.gov (United States)

    Dausman, Alyssa M.; Langevin, Christian D.; Thorne, Danny T., Jr.; Sukop, Michael C.

    2010-01-01

    SEAWAT is a combined version of MODFLOW and MT3DMS, designed to simulate three-dimensional, variable-density, saturated groundwater flow. The most recent version of the SEAWAT program, SEAWAT Version 4 (or SEAWAT_V4), supports equations of state for fluid density and viscosity. In SEAWAT_V4, fluid density can be calculated as a function of one or more MT3DMS species, and optionally, fluid pressure. Fluid viscosity is calculated as a function of one or more MT3DMS species, and the program also includes additional functions for representing the dependence of fluid viscosity on temperature. This report documents testing of and experimentation with SEAWAT_V4 with six previously published problems that include various combinations of density-dependent flow due to temperature variations and/or concentration variations of one or more species. Some of the problems also include variations in viscosity that result from temperature differences in water and oil. Comparisons between the results of SEAWAT_V4 and other published results are generally consistent with one another, with minor differences considered acceptable.

  10. Spatial and temporal variability of microbes in selected soils at the Nevada Test Site

    Energy Technology Data Exchange (ETDEWEB)

    Angerer, J.P.; Winkel, V.K.; Ostler, W.K.; Hall, P.F.

    1993-12-31

    Large areas encompassing almost 800 hectares on the Nevada Test Site, Nellis Air Force Range and the Tonopah Test Range are contaminated with plutonium. Decontamination of plutonium from these sites may involve removal of plants and almost 370,000 cubic meters of soil. The soil may be subjected to a series of processes to remove plutonium. After decontamination, the soils will be returned to the site and revegetated. There is a paucity of information on the spatial and temporal distribution of microbes in soils of the Mojave and Great Basin Deserts. Therefore, this study was initiated to determine the biomass and diversity of microbes in soils prior to decontamination. Soils were collected to a depth of 10 cm along each of five randomly located 30-m transects at each of four sites. To ascertain spatial differences, soils were collected from beneath major shrubs and from associated interspaces. Soils were collected every three to four months to determine temporal (seasonal) differences in microbial parameters. Soils from beneath shrubs generally had greater active fungi and bacteria, and greater non-amended respiration than soils from interspaces. Temporal variability also was found; total and active fungi, and non-amended respiration were correlated with soil moisture at the time of sampling. Information from this study will aid in determining the effects of plutonium decontamination on soil microorganisms, and what measures, if any, will be required to restore microbial populations during revegetation of these sites.

  11. Impact of oil price shocks on selected macroeconomic variables in Nigeria

    International Nuclear Information System (INIS)

    The impact of oil price shocks on the macroeconomy has received a great deal of attention since the 1970s. Initially, many empirical studies found a significant negative effect of oil price shocks on GDP, but more recently, empirical studies have reported an insignificant relationship between oil shocks and the macroeconomy. A key feature of existing research is that it applies predominantly to advanced, oil-importing countries. For oil-exporting countries, different conclusions are expected, but this can only be ascertained empirically. This study conducts an empirical analysis of the effects of oil price shocks on a developing-country oil exporter - Nigeria. Our findings showed that oil price shocks do not have a major impact on most macroeconomic variables in Nigeria. The results of the Granger-causality tests, impulse response functions, and variance decomposition analysis all showed that different measures of linear and positive oil shocks have not caused output, government expenditure, inflation, or the real exchange rate. The tests support the existence of asymmetric effects of oil price shocks, because we find that negative oil shocks significantly cause output and the real exchange rate. (author)
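
    A minimal sketch of the kind of Granger-causality test reported here, using statsmodels on synthetic series (the study's actual data were Nigerian oil prices and macroeconomic aggregates):

      # Bivariate Granger-causality test (sketch; synthetic data).
      import numpy as np
      from statsmodels.tsa.stattools import grangercausalitytests

      rng = np.random.default_rng(2)
      oil = rng.normal(size=200).cumsum()                  # stand-in oil price series
      gdp = 0.3 * np.roll(oil, 1) + rng.normal(size=200)   # oil leads output by one period

      # Column order matters: the test asks whether the second column
      # Granger-causes the first.
      grangercausalitytests(np.column_stack([gdp, oil]), maxlag=2)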

  12. Bayesian Inference on Gravitational Waves

    Directory of Open Access Journals (Sweden)

    Asad Ali

    2015-12-01

    Full Text Available The Bayesian approach is increasingly becoming popular among the astrophysics data analysis communities. However, the Pakistan statistics communities are unaware of this fertile interaction between the two disciplines. Bayesian methods have been used to address astronomical problems since the very birth of Bayesian probability in the eighteenth century. Today, Bayesian methods for the detection and parameter estimation of gravitational waves have solid theoretical grounds and strong promise for realistic applications. This article aims to introduce the Pakistan statistics communities to the applications of Bayesian Monte Carlo methods in the analysis of gravitational wave data, with an overview of Bayesian signal detection and estimation methods and a demonstration through a couple of simplified examples.

  13. Classification of MALDI-MS imaging data of tissue microarrays using canonical correlation analysis-based variable selection.

    Science.gov (United States)

    Winderbaum, Lyron; Koch, Inge; Mittal, Parul; Hoffmann, Peter

    2016-06-01

    Applying MALDI-MS imaging to tissue microarrays (TMAs) provides access to proteomics data from large cohorts of patients in a cost- and time-efficient way, and opens the potential for applying this technology in clinical diagnosis. The complexity of these TMA data (high-dimensional, low sample size) presents challenges for statistical analysis, as classical methods typically require a nonsingular covariance matrix that cannot be obtained if the dimension is greater than the sample size. We use TMAs to collect data from endometrial primary carcinomas from 43 patients. Each patient has a lymph node metastasis (LNM) status of positive or negative, which we predict on the basis of the MALDI-MS imaging TMA data. We propose a variable selection approach based on canonical correlation analysis that explicitly uses the LNM information. We then apply linear discriminant analysis (LDA) to the selected variables only. Our method misclassifies 2.3-20.9% of patients by leave-one-out cross-validation and strongly outperforms LDA applied after reduction of the original data with principal component analysis. PMID:27028088
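
    For a binary label, a CCA-based ranking essentially reduces to correlating each variable with the class indicator; a simplified Python stand-in for the pipeline (selection, then LDA with leave-one-out cross-validation) might look like this, with synthetic data and hypothetical sizes. Note that in a rigorous evaluation the selection step should be nested inside the cross-validation folds:

      # Simplified stand-in for CCA-based variable selection followed by LDA.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import LeaveOneOut, cross_val_score

      rng = np.random.default_rng(3)
      X = rng.normal(size=(43, 2000))        # 43 patients x many m/z features
      y = rng.integers(0, 2, size=43)        # LNM status (0/1)
      X[y == 1, :20] += 1.0                  # plant 20 informative features

      corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])   # |corr| of each feature with the label
      selected = np.argsort(-corr)[:30]             # keep the 30 top-ranked variables

      acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, selected], y,
                            cv=LeaveOneOut()).mean()
      print("LOO accuracy:", acc)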

  14. Adaptation Strategies to Combating Climate Variability and Extremity among Farmers in Selected Farm Settlements in Oyo State, Nigeria

    Directory of Open Access Journals (Sweden)

    BOROKINI T.I

    2014-09-01

    Full Text Available The adverse effects of climate variability and extremes on agriculture in Africa have been widely reported. This calls for adaptive strategies in farming so as to reduce vulnerability and ensure food security. This study was therefore conducted to evaluate farmers' awareness of climate variability and their adaptation strategies in four selected farm settlements in Oyo State, Nigeria. Structured questionnaires were administered to 120 farmers using a stratified random sampling method. The results showed very high awareness of climate variability among the farmers. However, the majority of the farmers acquired their land by lease, and most still use local farm tools. Sole cropping, mixed cropping and crop rotation were the practices most commonly used by the farmers. The farmers reported crop pests and diseases, flooding, disappearance of bi-modal rainfall, increased temperature and drought in their farmlands, leading to increased poverty, higher production costs and poor crop harvests, as evidence of harsh climatic conditions. Adaptation strategies used by the farmers were changing planting dates, planting new varieties, intercropping and alternative income-generating activities. The farmers are encouraged to acquire more efficient farming systems and equipment, and to strongly consider other adaptation strategies such as agricultural insurance, agroforestry, water conservation methods, soil conservation farming, irrigation farming, organic farming and mechanized farming. Furthermore, land tenure policies that could constrain the farmers should be reviewed, and farmers should be given proper training.

  15. A conceptual framework for selecting the most appropriate variables for measuring hospital efficiency with a focus on Iranian public hospitals.

    Science.gov (United States)

    Afzali, Hossein Haji Ali; Moss, John R; Mahmood, Mohammad Afzal

    2009-05-01

    Over the past few decades, there has been increasing interest in the measurement of hospital efficiency in developing countries and in Iran. While the choice of measurement methods in hospital efficiency assessment has been widely debated in the literature, few authors have offered a framework to specify variables that reflect different hospital functions, the quality of the process of care and the effectiveness of hospital services. However, without knowledge of hospital objectives and all relevant functions, efficiency studies run the risk of making biased comparisons, particularly against hospitals that provide higher-quality services requiring the use of more resources. Undertaking an in-depth investigation of the multi-product nature of hospitals, the various hospital functions and the values of various stakeholders (patients, staff and community), with a focus on Iranian public hospitals, this study proposes a conceptual framework for selecting the most appropriate variables for measuring hospital efficiency using frontier-based techniques. This paper contributes to hospital efficiency studies by proposing a conceptual framework and incorporating a broader set of variables in Iran. This can enhance the validity of hospital efficiency studies using frontier-based methods in developing countries.

  16. Bayesian analysis of volcanic eruptions

    Science.gov (United States)

    Ho, Chih-Hsiang

    1990-10-01

    The simple Poisson model generally gives a good fit to many volcanoes for volcanic eruption forecasting. Nonetheless, empirical evidence suggests that volcanic activity in successive equal time periods tends to be more variable than a simple Poisson process with constant eruptive rate. An alternative model is therefore examined in which the eruptive rate (λ) for a given volcano or cluster(s) of volcanoes is described by a gamma distribution (prior) rather than treated as a constant value, as in the assumptions of a simple Poisson model. Bayesian analysis is performed to link the two distributions together to give the aggregate behavior of the volcanic activity. When the Poisson process is expanded to accommodate a gamma mixing distribution on λ, a consequence of this mixed (or compound) Poisson model is that the frequency distribution of eruptions in any given time period of equal length follows the negative binomial distribution (NBD). Applications of the proposed model and comparisons between the generalized model and the simple Poisson model are discussed based on the historical eruptive count data of volcanoes Mauna Loa (Hawaii) and Etna (Italy). Several relevant facts lead to the conclusion that the generalized model is preferable for practical use both in space and time.
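
    In standard notation, the gamma-Poisson mixture behind this result is: if N | λ ~ Poisson(λt) and λ ~ Gamma(α, β), then

      % Marginalizing the gamma prior over the Poisson rate yields the NBD.
      P(N = n) = \int_0^\infty \frac{(\lambda t)^n e^{-\lambda t}}{n!}
                 \cdot \frac{\beta^\alpha \lambda^{\alpha-1} e^{-\beta\lambda}}{\Gamma(\alpha)}\, d\lambda
               = \binom{n+\alpha-1}{n}
                 \left(\frac{\beta}{\beta+t}\right)^{\alpha}
                 \left(\frac{t}{\beta+t}\right)^{n},

    which is the negative binomial distribution referred to in the abstract.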

  17. Variability in the cadherin gene in an Ostrinia nubilalis strain selected for Cry1Ab resistance.

    Science.gov (United States)

    Bel, Yolanda; Siqueira, Herbert A A; Siegfried, Blair D; Ferré, Juan; Escriche, Baltasar

    2009-03-01

    Transgenic corn expressing Cry1Ab (a Bacillus thuringiensis toxin) is highly effective in the control of Ostrinia nubilalis. For its toxic action, Cry1Ab has to bind to specific insect midgut proteins. To date, resistance to a Cry1A toxin has been conferred by mutations in cadherin, a protein of the lepidopteran midgut membrane, in three Lepidoptera species. The involvement of cadherin in the resistance of an Ostrinia nubilalis colony (Europe-R) selected with Bacillus thuringiensis Cry1Ab protoxin was investigated. Several major mutations in the cadherin (cdh) gene were found, which introduced premature termination codons and/or large deletions (ranging from 1383 to 1701 bp). The contribution of these major mutations to the resistance was analyzed in resistant individuals that survived exposure to a high concentration of Cry1Ab protoxin. The results indicated that the presence of major mutations was drastically reduced in individuals that survived exposure. Previous inheritance experiments with the Europe-R strain indicated the involvement of more than one genetic locus and reduced amounts of the cadherin receptor. The results of the present work support a polygenic inheritance of resistance in the Europe-R strain, in which mutations in the cdh gene would contribute to resistance by means of an additive effect. PMID:19114103

  18. The influence of selected socio-demographic variables on symptoms occurring during the menopause

    Directory of Open Access Journals (Sweden)

    Marta Makara-Studzińska

    2015-02-01

    Full Text Available Introduction: Lifestyle, conditioned by socio-demographic and socio-economic factors, is considered the strongest determinant of people's health. The aim of this study was to evaluate the influence of selected socio-demographic factors on the kinds of symptoms occurring during menopause. Material and methods: The study group consisted of 210 women aged 45 to 65, not using hormone replacement therapy, staying at healthcare centers for rehabilitation treatment. The study was carried out in 2013-2014 in the Silesian, Podlaskie and Lesser Poland voivodeships. The set of tools consisted of the authors' own survey questionnaire and the Menopause Rating Scale (MRS). Results: The most common symptom in the group of studied women was a depressive mood, from the group of psychological symptoms, followed by physical and mental fatigue and discomfort connected with muscle and joint pain. The greatest intensity of symptoms was observed among women with the lowest level of education, those reporting an average or bad material situation, and unemployed women. Conclusions: An alarmingly high number of psychological symptoms was reported in the group of menopausal women, in particular among those of low socio-economic status. Having a career seems to reduce the risk of psychological symptoms. There is an urgent need for health promotion and prophylaxis in the group of menopausal women, and in many cases for the implementation of specialist psychological assistance.

  19. Variation in predator species abundance can cause variable selection pressure on warning signaling prey

    Science.gov (United States)

    Valkonen, Janne K; Nokelainen, Ossi; Niskanen, Martti; Kilpimaa, Janne; Björklund, Mats; Mappes, Johanna

    2012-01-01

    Predation pressure is expected to drive visual warning signals to evolve toward conspicuousness. However, the coloration of defended species varies tremendously and can in certain instances be considered more camouflaged than conspicuous. Recent theoretical studies suggest that variation in signal conspicuousness can be caused by variation (within or between species) in predators' willingness to attack defended prey, or by the breadth of the predators' signal generalization. If some predator species are capable of coping with the secondary defenses of their prey, selection can favor reduced prey signal conspicuousness via reduced detectability or recognition. In this study, we combine data collected during three large-scale field experiments to assess whether variation in avian predator species (red kite, black kite, common buzzard, short-toed eagle, and booted eagle) affects the predation pressure on warningly and non-warningly colored artificial snakes. Predation pressure varied among locations and, interestingly, where common buzzards were abundant, snakes possessing warning signaling were at a disadvantage. Our results indicate that the predator community can have important consequences for the evolution of warning signals. Predators that ignore the warning signal and defense can be the key to the maintenance of variation in warning signal architecture and the maintenance of inconspicuous signaling. PMID:22957197

  20. Status of police officers with regard to selected cardio-respiratory and body compositional fitness variables.

    Science.gov (United States)

    Stamford, B A; Weltman, A; Moffatt, R J; Fulco, C

    1978-01-01

    Physical performance and body composition characteristics of members (n = 75) and recruits (n = 61) of the Louisville Police Department (total n = 136) were assessed. Members were randomly selected males, ranged in age from 20 to 55 years, and were drawn from all ranks, from the newest inductee through and including the Chief of Police. Members between the ages of 20 and 29 years assigned to active duty possessed average cardio-respiratory fitness (Vo2max). With age, cardio-respiratory fitness decreased and body weight and body fatness progressively increased. Male and female recruits entering basic training also demonstrated average cardio-respiratory fitness. Significant (P less than .05) increases in Vo2max for males and females and decreases in body fatness (males) were found following 4 months of physically rigorous recruit training. Fifteen of the male recruits who completed training were retested following 1 year of active duty. During active duty, physical activity was limited to job requirements, with no additional physical training imposed. Cardio-respiratory fitness and body fatness reverted to pre-training levels. It was concluded that the physical demands associated with police work are too low to permit maintenance of physical fitness.

  1. A study of finite mixture model: Bayesian approach on financial time series data

    Science.gov (United States)

    Phoong, Seuk-Yen; Ismail, Mohd Tahir

    2014-07-01

    Recently, statisticians have emphasized fitting finite mixture models using Bayesian methods. A finite mixture model represents a statistical distribution as a mixture of component distributions, while the Bayesian method is the statistical approach used to fit the model. Bayesian methods are widely used because their asymptotic properties provide remarkable results; they also exhibit consistency, meaning that the parameter estimates are close to the predictive distributions. In the present paper, the number of components for the mixture model is chosen using the Bayesian Information Criterion. Identifying the number of components is important because a wrong choice may lead to invalid results. The Bayesian method is then utilized to fit the k-component mixture model in order to explore the relationship between rubber prices and stock market prices for Malaysia, Thailand, the Philippines and Indonesia. The results showed a negative relationship between rubber prices and stock market prices for all selected countries.
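
    A minimal sketch of BIC-based selection of the number of mixture components, using scikit-learn on synthetic data (the study's series are rubber and stock prices, not reproduced here):

      # Choose the number of Gaussian mixture components by BIC (sketch).
      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(4)
      x = np.concatenate([rng.normal(-1.0, 0.3, 300),
                          rng.normal(1.5, 0.6, 200)])[:, None]   # two-component data

      bics = {k: GaussianMixture(n_components=k, random_state=0).fit(x).bic(x)
              for k in range(1, 6)}
      print("BIC by k:", bics, "-> chosen k:", min(bics, key=bics.get))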

  2. Bayesian regression models outperform partial least squares methods for predicting milk components and technological properties using infrared spectral data.

    Science.gov (United States)

    Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G

    2015-11-01

    The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
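
    The study fitted its Bayesian models with the R package BGLR; a rough Python analogue of the Bayes RR calibration, on synthetic spectra, might look like this (sizes and noise levels are illustrative):

      # Bayesian ridge calibration of a milk trait from spectra (rough analogue).
      import numpy as np
      from sklearn.linear_model import BayesianRidge
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(5)
      X = rng.normal(size=(300, 1060))                  # samples x spectral points
      beta = np.zeros(1060)
      beta[::50] = rng.normal(size=22)                  # a few informative wavenumbers
      y = X @ beta + rng.normal(scale=0.5, size=300)    # e.g., a fatty-acid trait

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
      model = BayesianRidge().fit(X_tr, y_tr)
      print("validation R^2:", model.score(X_te, y_te))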

  3. Bayesian Integration of multiscale environmental data

    Energy Technology Data Exchange (ETDEWEB)

    2016-08-22

    The software is designed to efficiently integrate large multi-scale environmental datasets within a Bayesian framework. Suppose we need to estimate the spatial distribution of a variable X at high spatial resolution. The available data include (1) direct measurements Z of the unknowns at high resolution in a subset of the spatial domain (small spatial coverage), (2) measurements C of the unknowns at the median scale, and (3) measurements A of the unknowns at the coarsest scale but with large spatial coverage. The goal is to estimate the unknowns on the fine grid by conditioning on all the available data. We first consider all the unknowns as random variables and estimate the conditional probability distribution of those variables by conditioning on the limited high-resolution observations (Z). We then treat the estimated probability distribution as the prior distribution. Within the Bayesian framework, we combine the median- and large-scale measurements (C and A) through likelihood functions. Since we assume that all the relevant multivariate distributions are Gaussian, the resulting posterior distribution is a multivariate Gaussian distribution. The developed software provides numerical solutions for the posterior probability distribution. The software can be extended in several different ways to solve more general multi-scale data integration problems.
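
    Since all distributions are assumed Gaussian, each update takes the standard conjugate normal-normal form; a one-dimensional sketch (illustrative, not the software's actual interface):

      # Normal-normal conjugate update: prior from fine-scale data Z,
      # likelihood from a coarse observation (C or A).
      def gaussian_update(mu_prior, var_prior, obs, var_obs):
          """Return the posterior mean and variance of a Gaussian unknown."""
          w = var_prior / (var_prior + var_obs)
          mu_post = mu_prior + w * (obs - mu_prior)
          var_post = (1.0 - w) * var_prior
          return mu_post, var_post

      print(gaussian_update(mu_prior=2.0, var_prior=1.0, obs=2.8, var_obs=0.5))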

  4. Bayesian multi-QTL mapping for growth curve parameters

    DEFF Research Database (Denmark)

    Heuven, Henri C M; Janss, Luc L G

    2010-01-01

    segregating QTL using a Bayesian algorithm. Results: For each individual, a logistic growth curve was fitted and three latent variables, asymptote (ASYM), inflection point (XMID) and scaling factor (SCAL), were estimated. Applying an 'animal' model showed heritabilities of approximately 48...
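
    A sketch of the per-individual curve fit with the three latent parameters named above (synthetic weights; the asym/(1+exp((xmid-t)/scal)) parameterization is a common convention assumed here, not taken from the paper):

      # Fit a three-parameter logistic growth curve per individual (sketch).
      import numpy as np
      from scipy.optimize import curve_fit

      def logistic(t, asym, xmid, scal):
          return asym / (1.0 + np.exp((xmid - t) / scal))

      t = np.linspace(0, 70, 15)                 # e.g., age at weighing (days)
      rng = np.random.default_rng(6)
      w = logistic(t, 2500.0, 35.0, 8.0) + rng.normal(scale=30.0, size=t.size)

      (asym, xmid, scal), _ = curve_fit(logistic, t, w, p0=[2000.0, 30.0, 5.0])
      print(f"ASYM={asym:.0f}, XMID={xmid:.1f}, SCAL={scal:.1f}")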

  5. A Bayesian Approach for Analyzing Longitudinal Structural Equation Models

    Science.gov (United States)

    Song, Xin-Yuan; Lu, Zhao-Hua; Hser, Yih-Ing; Lee, Sik-Yum

    2011-01-01

    This article considers a Bayesian approach for analyzing a longitudinal 2-level nonlinear structural equation model with covariates, and mixed continuous and ordered categorical variables. The first-level model is formulated for measures taken at each time point nested within individuals for investigating their characteristics that are dynamically…

  6. A volatolomic approach for studying plant variability: the case of selected Helichrysum species (Asteraceae).

    Science.gov (United States)

    Giuliani, Claudia; Lazzaro, Lorenzo; Calamassi, Roberto; Calamai, Luca; Romoli, Riccardo; Fico, Gelsomina; Foggi, Bruno; Mariotti Lippi, Marta

    2016-10-01

    The species of Helichrysum sect. Stoechadina (Asteraceae) are well known for their secondary metabolite content and characteristic aromatic bouquets. In the wild, populations exhibit a wide phenotypic plasticity, which makes the circumscription of species and infraspecific ranks critical. Previous investigations on the Helichrysum italicum complex focused on a possible phytochemical typification based on hydrodistilled essential oils. The aims of this paper are three-fold: (i) to characterize the volatile profiles of different populations, (ii) to test how these profiles vary across populations, and (iii) to assess how phytochemical diversity may contribute to solving taxonomic problems. Nine selected Helichrysum populations, included within the H. italicum complex, Helichrysum litoreum and Helichrysum stoechas, were investigated. H. stoechas was chosen as the outgroup for validating the method. After collection in the wild, plants were cultivated under standard growing conditions for over one year. Annual leafy shoots were screened in the post-blooming period for emissions of volatile organic compounds (VOCs) by means of headspace solid-phase microextraction coupled with gas chromatography and mass spectrometry (HS-SPME-GC/MS). The VOC composition analysis revealed the production of 386 different compounds overall, with terpenes being the most represented compound class. Statistical data processing allowed the identification of indicator compounds that differentiate the single populations, revealing the influence of the geographical provenance area in determining the volatile profiles. These results suggested the potential use of VOCs as valuable diacritical characters for discriminating the Helichrysum populations. In addition, cross-validation analysis hinted at the potential of this volatolomic study for discriminating the Helichrysum species and subspecies, highlighting a general congruence with the current taxonomic treatment of the genus. The consistency

  7. Bayesian Recurrent Neural Network for Language Modeling.

    Science.gov (United States)

    Chien, Jen-Tzung; Ku, Yuan-Chu

    2016-02-01

    A language model (LM) calculates the probability of a word sequence and provides the solution to word prediction for a variety of information systems. A recurrent neural network (RNN) is powerful for learning the large-span dynamics of a word sequence in continuous space. However, the training of the RNN-LM is an ill-posed problem because of too many parameters from a large dictionary size and a high-dimensional hidden layer. This paper presents a Bayesian approach to regularize the RNN-LM and apply it to continuous speech recognition. We aim to penalize the too-complicated RNN-LM by compensating for the uncertainty of the estimated model parameters, which is represented by a Gaussian prior. The objective function in a Bayesian classification network is formed as the regularized cross-entropy error function. The regularized model is constructed not only by calculating the regularized parameters according to the maximum a posteriori criterion but also by estimating the Gaussian hyperparameter by maximizing the marginal likelihood. A rapid approximation to a Hessian matrix is developed to implement the Bayesian RNN-LM (BRNN-LM) by selecting a small set of salient outer-products. The proposed BRNN-LM achieves a sparser model than the RNN-LM. Experiments on different corpora show the robustness of system performance by applying the rapid BRNN-LM under different conditions.
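
    In standard notation, the regularized objective described here combines the cross-entropy of the word sequence with a Gaussian-prior penalty on the parameters, the hyperparameter being set by maximizing the marginal likelihood:

      % MAP objective for the Bayesian RNN-LM (standard form).
      E(\Theta) = -\sum_{t} \log P\left(w_t \mid w_{1:t-1}, \Theta\right)
                  + \frac{\alpha}{2} \lVert \Theta \rVert^2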

  8. Gold nanoparticles immobilized hydrophilic monoliths with variable functional modification for highly selective enrichment and on-line deglycosylation of glycopeptides.

    Science.gov (United States)

    Liang, Yu; Wu, Ci; Zhao, Qun; Wu, Qi; Jiang, Bo; Weng, Yejing; Liang, Zhen; Zhang, Lihua; Zhang, Yukui

    2015-11-01

    The poly (glycidyl methacrylate-co-poly (ethylene glycol) diacrylate) monoliths modified with gold nanoparticles, with advantages of enhanced reactive sites, good hydrophilicity and facile modification, were prepared as the matrix, followed by variable functionalization with cysteine and PNGase F for glycopeptide enrichment and on-line deglycosylation respectively. By the cysteine functionalized monolithic column, glycopeptides could be efficiently and selectively enriched with good reproducibility based on hydrophilic interaction chromatography (HILIC). Furthermore, the enrichment was specially achieved in weak alkaline environment, with 10 mM NH4HCO3 as the elution buffer, compatible with deglycosylation conditions. Therefore, the glycopeptides could be on-line deglycosylated with high efficiency and throughput by directly coupling the PNGase F functionalized monolithic column with the enrichment column during elution without the requirement of buffer exchange and pH adjustment. By such a method, within only 70-min pretreatment, 196 N-linked glycopeptides, corresponding to 122 glycoproteins, could be identified from 5 μg of human plasma with 14 high-abundant proteins removed, and the N-linked glycopeptides occupied 81% of all identified peptides, achieving to the best of our knowledge, the highest selectivity of HILIC-based methods. All the results demonstrated the high efficiency, selectivity and throughput of our proposed strategy for the large scale glycoproteome analysis. PMID:26572842

  9. Road network safety evaluation using Bayesian hierarchical joint model.

    Science.gov (United States)

    Wang, Jie; Huang, Helai

    2016-05-01

    Safety and efficiency are commonly regarded as two significant performance indicators of transportation systems. In practice, road network planning has focused on road capacity and transport efficiency, whereas the safety level of a road network has received little attention at the planning stage. This study develops a Bayesian hierarchical joint model for road network safety evaluation to help planners take traffic safety into account when planning a road network. The proposed model establishes relationships between road network risk and micro-level variables related to road entities and traffic volume, as well as socioeconomic, trip generation and network density variables at the macro level, which are generally used for long-term transportation plans. In addition, network spatial correlation between intersections and their connected road segments is also considered in the model. A road network is carefully selected in order to compare the proposed hierarchical joint model with a previous joint model and a negative binomial model. According to the results of the model comparison, the hierarchical joint model outperforms the joint model and the negative binomial model in terms of goodness-of-fit and predictive performance, which indicates the reasonableness of considering the hierarchical data structure in crash prediction and analysis. Moreover, both random effects at the TAZ level and the spatial correlation between intersections and their adjacent segments are found to be significant, supporting the employment of the hierarchical joint model as an alternative in road-network-level safety modeling as well.
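
    As a sketch of the simplest of the three compared models, a negative binomial crash-frequency regression can be fitted with statsmodels (synthetic sites; the covariates are hypothetical stand-ins for the paper's micro-level variables):

      # Negative binomial crash-frequency model (the baseline comparator; sketch).
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(7)
      aadt = rng.uniform(1e3, 3e4, 200)                     # traffic volume per site
      length = rng.uniform(0.2, 2.0, 200)                   # segment length (km)
      mu = np.exp(-6.0 + 0.8 * np.log(aadt) + 0.5 * np.log(length))
      crashes = rng.poisson(mu * rng.gamma(2.0, 0.5, 200))  # overdispersed counts

      X = sm.add_constant(np.column_stack([np.log(aadt), np.log(length)]))
      fit = sm.GLM(crashes, X, family=sm.families.NegativeBinomial()).fit()
      print(fit.params)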

  10. Darwinian Dynamics of Intratumoral Heterogeneity: Not Solely Random Mutations but Also Variable Environmental Selection Forces.

    Science.gov (United States)

    Lloyd, Mark C; Cunningham, Jessica J; Bui, Marilyn M; Gillies, Robert J; Brown, Joel S; Gatenby, Robert A

    2016-06-01

    that at least some of the molecular heterogeneity in cancer cells in tumors is governed by predictable regional variations in environmental selection forces, arguing against the assumption that cancer cells can evolve toward a local fitness maximum by random accumulation of mutations. Cancer Res; 76(11); 3136-44. ©2016 AACR. PMID:27009166

  11. An adaptive technique for multiscale approximate entropy (MAEbin) threshold (r) selection: application to heart rate variability (HRV) and systolic blood pressure variability (SBPV) under postural stress.

    Science.gov (United States)

    Singh, Amritpal; Saini, Barjinder Singh; Singh, Dilbag

    2016-06-01

    Multiscale approximate entropy (MAE) is used to quantify the complexity of a time series as a function of time scale τ. Selection of the approximate entropy (ApEn) tolerance threshold 'r' is based on either: (1) arbitrary selection in the recommended range (0.1-0.25 times the standard deviation of the time series); (2) finding the maximum ApEn (ApEnmax), i.e., the point where self-matches start to prevail over other matches, and choosing the corresponding 'r' (rmax) as the threshold; or (3) computing rchon by empirically finding the relation between rmax, the SD1/SD2 ratio and N using curve fitting, where SD1 and SD2 are the short-term and long-term variability of a time series, respectively. None of these methods is a gold standard for selection of 'r'. In our previous study [1], an adaptive procedure for selection of 'r' was proposed for approximate entropy (ApEn). In this paper, this is extended to multiple time scales using MAEbin and multiscale cross-MAEbin (XMAEbin). We applied this to simulations, i.e., 50 realizations (n = 50) of random number series, fractional Brownian motion (fBm) and MIX (P) [1] series of data length N = 300, and to short-term recordings of HRV and SBPV performed under postural stress from supine to standing. MAEbin and XMAEbin analysis was performed on laboratory-recorded data of 50 healthy young subjects experiencing postural stress from supine to upright. The study showed that (i) ApEnbin of HRV is higher than that of SBPV in the supine position but lower than that of SBPV in the upright position; (ii) ApEnbin of HRV decreases from supine, i.e., 1.7324 ± 0.112 (mean ± SD), to upright, i.e., 1.4916 ± 0.108, due to vagal inhibition; (iii) ApEnbin of SBPV increases from supine, i.e., 1.5535 ± 0.098, to upright, i.e., 1.6241 ± 0.101, due to sympathetic activation; (iv) individual and cross complexities of RRi and systolic blood pressure (SBP) series depend on the time scale under consideration; (v) XMAEbin calculated using ApEnmax is correlated with cross-MAE calculated using ApEn (0.1-0.26) in steps of 0
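
    For reference, a minimal implementation of the underlying ApEn(m, r) statistic in its standard form (the study's binning and multiscale extensions are not reproduced here):

      # Approximate entropy ApEn(m, r) of a series (standard form; sketch).
      import numpy as np

      def apen(x, m=2, r=0.2):
          """ApEn with embedding dimension m and tolerance r * SD(x)."""
          x = np.asarray(x, dtype=float)
          tol = r * x.std()

          def phi(m):
              n = len(x) - m + 1
              emb = np.array([x[i:i + m] for i in range(n)])
              # Chebyshev distances between all template pairs (self-matches included)
              d = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)
              return np.log((d <= tol).mean(axis=1)).mean()

          return phi(m) - phi(m + 1)

      rng = np.random.default_rng(8)
      print(apen(rng.normal(size=300)))   # larger values indicate a more irregular series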

  12. A bayesian mixed regression based prediction of quantitative traits from molecular marker and gene expression data.

    Directory of Open Access Journals (Sweden)

    Madhuchhanda Bhattacharjee

    Full Text Available Both molecular marker and gene expression data were considered, alone as well as jointly, as additive predictors for two pathogen-activity phenotypes in real recombinant inbred lines of soybean. For unobserved phenotype prediction, we used Bayesian hierarchical regression modeling, where the number of possible predictors in the model was controlled by the different selection strategies tested. Our initial findings were submitted to DREAM5 (the 5th Dialogue on Reverse Engineering Assessment and Methods challenge) and were judged to be the best in sub-challenge B3, wherein both functional genomic and genetic data were used to predict the phenotypes. In this work we further improve upon that work by considering various predictor selection strategies, and cross-validation was used to measure the accuracy of in-data and out-data predictions. The results from various model choices indicate that, for these data, the use of both data types (namely functional genomic and genetic) simultaneously improves out-data prediction accuracy. Adequate goodness-of-fit can easily be achieved with more complex models for both phenotypes, since the number of potential predictors is large and the sample size is not small. We also further studied gene-set enrichment (for the continuous phenotype) in the biological process in question and chromosomal enrichment of the gene set. The methodological contribution of this paper is the exploration of variable selection techniques to alleviate the problem of over-fitting. Different strategies based on the nature of the covariates were explored, and all methods were implemented under the Bayesian hierarchical modeling framework with indicator-based covariate selection. All models based on the careful variable selection procedure were found to produce significant results in permutation tests.

  13. Bayesian model-based approach for developing a river water quality index

    Science.gov (United States)

    Ali, Zalina Mohd; Ibrahim, Noor Akma; Mengersen, Kerrie; Shitan, Mahendran; Juahir, Hafizan

    2014-09-01

    Six main pollutants have previously been identified by expert opinion to determine river condition in Malaysia: Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), Chemical Oxygen Demand (COD), Suspended Solid (SS), potential of Hydrogen (pH) and Ammonia (AN). The selected variables, together with their respective weights, have been applied to calculate the water quality index of all rivers in Malaysia. However, the relative weights established in the DOE-WQI formula are subjective in nature and not unanimously agreed upon, as indicated by the different weights proposed for the same variables by various panels of experts. Focusing on the Langat River, a Bayesian model-based approach was introduced for the first time in this study to obtain new objective relative weights. The new weights used in the WQI calculation are shown to be capable of capturing similar distributions in water quality compared with the existing DOE-WQI.
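
    The aggregation itself is a weighted sum of pollutant sub-indices; a toy illustration (the weights below are placeholders, not the official DOE weights or the Bayesian weights derived in the study):

      # Weighted water-quality-index aggregation over the six pollutants (toy example).
      subindices = {"DO": 85.0, "BOD": 70.0, "COD": 65.0, "SS": 80.0, "pH": 90.0, "AN": 75.0}
      weights = {"DO": 0.20, "BOD": 0.18, "COD": 0.16, "SS": 0.16, "pH": 0.12, "AN": 0.18}

      wqi = sum(weights[k] * subindices[k] for k in subindices)   # weights sum to 1.0
      print(f"WQI = {wqi:.1f}")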

  14. Book review: Bayesian analysis for population ecology

    Science.gov (United States)

    Link, William A.

    2011-01-01

    Brian Dennis described the field of ecology as “fertile, uncolonized ground for Bayesian ideas.” He continued: “The Bayesian propagule has arrived at the shore. Ecologists need to think long and hard about the consequences of a Bayesian ecology. The Bayesian outlook is a successful competitor, but is it a weed? I think so.” (Dennis 2004)

  15. Lower Bound Bayesian Networks - An Efficient Inference of Lower Bounds on Probability Distributions in Bayesian Networks

    CERN Document Server

    Andrade, Daniel

    2012-01-01

    We present a new method to propagate lower bounds on conditional probability distributions in conventional Bayesian networks. Our method is guaranteed to provide outer approximations of the exact lower bounds. A key advantage is that we can use any available algorithms and tools for Bayesian networks to represent and infer lower bounds. This new method yields results that are provably exact for trees with binary variables, and results that are competitive with existing approximations in credal networks for all other network structures. Our method is not limited to a specific kind of network structure. Basically, it is also not restricted to a specific kind of inference, but we restrict our analysis to prognostic inference in this article. The computational complexity is superior to that of other existing approaches.

  16. A Bayesian observer model constrained by efficient coding can explain 'anti-Bayesian' percepts.

    Science.gov (United States)

    Wei, Xue-Xin; Stocker, Alan A

    2015-10-01

    Bayesian observer models provide a principled account of the fact that our perception of the world rarely matches physical reality. The standard explanation is that our percepts are biased toward our prior beliefs. However, reported psychophysical data suggest that this view may be simplistic. We propose a new model formulation based on efficient coding that is fully specified for any given natural stimulus distribution. The model makes two new and seemingly anti-Bayesian predictions. First, it predicts that perception is often biased away from an observer's prior beliefs. Second, it predicts that stimulus uncertainty differentially affects perceptual bias depending on whether the uncertainty is induced by internal or external noise. We found that both model predictions match reported perceptual biases in perceived visual orientation and spatial frequency, and were able to explain data that have not been explained before. The model is general and should prove applicable to other perceptual variables and tasks. PMID:26343249

  17. Bayesian Unsupervised Learning of DNA Regulatory Binding Regions

    Directory of Open Access Journals (Sweden)

    Jukka Corander

    2009-01-01

    positions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that targets to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which equals a variable length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy combined with recently introduced nonstandard stochastic computation tools yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.

  18. A Bayesian approach to estimating the prehepatic insulin secretion rate

    DEFF Research Database (Denmark)

    Andersen, Kim Emil; Højbjerre, Malene

    the time courses of insulin and C-peptide are subsequently used as known forcing functions. In this work we adopt a Bayesian graphical model to describe the unified model simultaneously. We develop a model that also accounts for both measurement error and process variability. The parameters are estimated by a Bayesian approach where efficient posterior sampling is made available through the use of Markov chain Monte Carlo methods. Hereby the ill-posed estimation problem inherent in the coupled differential equation model is regularized by the use of prior knowledge. The method is demonstrated on experimental...

  19. Bayesian model discrimination for glucose-insulin homeostasis

    DEFF Research Database (Denmark)

    Andersen, Kim Emil; Brooks, Stephen P.; Højbjerre, Malene

    In this paper we analyse a set of experimental data on a number of healthy and diabetic patients and discuss a variety of models for describing the physiological processes involved in glucose absorption and insulin secretion within the human body. We adopt a Bayesian approach which facilitates the reformulation of existing deterministic models as stochastic state-space models which properly account for both measurement and process variability. The analysis is further enhanced by Bayesian model discrimination techniques and model-averaged parameter estimation which fully accounts for model as well...

  20. Comparing Bayesian models for multisensory cue combination without mandatory integration

    OpenAIRE

    Beierholm, Ulrik R.; Shams, Ladan; Kording, Konrad P; Ma, Wei Ji

    2009-01-01

    Bayesian models of multisensory perception traditionally address the problem of estimating an underlying variable that is assumed to be the cause of the two sensory signals. The brain, however, has to solve a more general problem: it also has to establish which signals come from the same source and should be integrated, and which ones do not and should be segregated. In the last couple of years, a few models have been proposed to solve this problem in a Bayesian fashion. One of these ha...