Bayesian Variable Selection in Spatial Autoregressive Models
Jesus Crespo Cuaresma; Philipp Piribauer
2015-01-01
This paper compares the performance of Bayesian variable selection approaches for spatial autoregressive models. We present two alternative approaches which can be implemented using Gibbs sampling methods in a straightforward way and allow us to deal with the problem of model uncertainty in spatial autoregressive models in a flexible and computationally efficient way. In a simulation study we show that the variable selection approaches tend to outperform existing Bayesian model averaging tech...
Bayesian variable selection with spherically symmetric priors
De Kock, M B
2014-01-01
We propose that Bayesian variable selection for linear parametrisations with Gaussian iid likelihoods be based on the spherical symmetry of the diagonalised parameter space. This reduces the multidimensional parameter space problem to one dimension without the need for conjugate priors. Combining this likelihood with what we call the r-prior results in a framework in which we can derive closed forms for the evidence, posterior and characteristic function for four different r-priors, including the hyper-g prior and the Zellner-Siow prior, which are shown to be special cases of our r-prior. Two scenarios of a single variable dispersion parameter and of fixed dispersion are studied separately, and asymptotic forms comparable to the traditional information criteria are derived. In a simple simulation exercise, we find that model comparison based on our uniform r-prior appears to fare better than the current model comparison schemes.
A Bayesian variable selection procedure for ranking overlapping gene sets
DEFF Research Database (Denmark)
Skarman, Axel; Mahdi Shariati, Mohammad; Janss, Luc;
2012-01-01
... described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results: We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability...
Bayesian Variable Selection for Detecting Adaptive Genomic Differences Among Populations
Riebler, Andrea; Held, Leonhard; Stephan, Wolfgang
2008-01-01
We extend an Fst-based Bayesian hierarchical model, implemented via Markov chain Monte Carlo, for the detection of loci that might be subject to positive selection. This model divides the Fst-influencing factors into locus-specific effects, population-specific effects, and effects that are specific for the locus in combination with the population. We introduce a Bayesian auxiliary variable for each locus effect to automatically select nonneutral locus effects. As a by-product, the efficiency ...
Bayesian Variable Selection in Cost-Effectiveness Analysis
Directory of Open Access Journals (Sweden)
Miguel A. Negrín
2010-04-01
Linear regression models are often used to represent the cost and effectiveness of medical treatment. The covariates used may include sociodemographic variables, such as age, gender or race; clinical variables, such as initial health status, years of treatment or the existence of concomitant illnesses; and a binary variable indicating the treatment received. However, most studies estimate only one model, which usually includes all the covariates. This procedure ignores the question of uncertainty in model selection. In this paper, we examine four alternative Bayesian variable selection methods that have been proposed. In this analysis, we estimate the inclusion probability of each covariate in the real model conditional on the data. Variable selection can be useful for estimating incremental effectiveness and incremental cost, through Bayesian model averaging, as well as for subgroup analysis.
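As an illustration of the inclusion probabilities mentioned in the abstract above, the following is a minimal sketch and not the authors' method. It assumes that a log marginal likelihood is already available for every candidate model and that the prior over models is uniform unless stated otherwise. The function name `inclusion_probabilities`, the covariate names, and the toy log-evidence values are all hypothetical.

```python
import math


def inclusion_probabilities(log_evidence, log_prior=None):
    """Turn per-model log marginal likelihoods into posterior model
    probabilities and per-covariate posterior inclusion probabilities.

    log_evidence maps a frozenset of covariate names (one model) to its
    log marginal likelihood. A uniform model prior is assumed by default.
    """
    models = list(log_evidence)
    if log_prior is None:
        log_prior = {m: 0.0 for m in models}  # uniform prior (assumption)
    scores = {m: log_evidence[m] + log_prior[m] for m in models}

    # Normalise with the log-sum-exp trick to avoid underflow.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    model_probs = {m: math.exp(scores[m] - log_norm) for m in models}

    # Inclusion probability: total posterior mass of models containing the covariate.
    covariates = sorted({c for m in models for c in m})
    inclusion = {
        c: sum(p for m, p in model_probs.items() if c in m) for c in covariates
    }
    return model_probs, inclusion


# Made-up log evidences for the four models over two hypothetical covariates.
toy_log_evidence = {
    frozenset(): -120.0,
    frozenset({"age"}): -118.5,
    frozenset({"treatment"}): -112.0,
    frozenset({"age", "treatment"}): -112.7,
}
model_probs, inclusion = inclusion_probabilities(toy_log_evidence)
```

With these toy inputs most of the posterior mass sits on models containing `treatment`, so its inclusion probability is close to one, while `age` receives a much smaller value. The same model probabilities could serve as weights in Bayesian model averaging.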
Bayesian variable selection for detecting adaptive genomic differences among populations.
Riebler, Andrea; Held, Leonhard; Stephan, Wolfgang
2008-03-01
We extend an F(st)-based Bayesian hierarchical model, implemented via Markov chain Monte Carlo, for the detection of loci that might be subject to positive selection. This model divides the F(st)-influencing factors into locus-specific effects, population-specific effects, and effects that are specific for the locus in combination with the population. We introduce a Bayesian auxiliary variable for each locus effect to automatically select nonneutral locus effects. As a by-product, the efficiency of the original approach is improved by using a reparameterization of the model. The statistical power of the extended algorithm is assessed with simulated data sets from a Wright-Fisher model with migration. We find that the inclusion of model selection suggests a clear improvement in discrimination as measured by the area under the receiver operating characteristic (ROC) curve. Additionally, we illustrate and discuss the quality of the newly developed method on the basis of an allozyme data set of the fruit fly Drosophila melanogaster and a sequence data set of the wild tomato Solanum chilense. For data sets with small sample sizes, high mutation rates, and/or long sequences, however, methods based on nucleotide statistics should be preferred. PMID:18245358
Bayesian Biclustering on Discrete Data: Variable Selection Methods
Guo, Lei
2013-01-01
Biclustering is a technique for clustering rows and columns of a data matrix simultaneously. Over the past few years, we have seen its applications in biology-related fields, as well as in many data mining projects. As opposed to classical clustering methods, biclustering groups objects that are similar only on a subset of variables. Many biclustering algorithms on continuous data have emerged over the last decade. In this dissertation, we will focus on two Bayesian biclustering algorithms we...
Bayesian Variable Selection for Logistic Models Using Auxiliary Mixture Sampling
Tüchler, Regina
2006-01-01
The paper presents a Markov chain Monte Carlo algorithm for both variable and covariance selection in the context of logistic mixed effects models. This algorithm allows us to sample solely from standard densities, with no additional tuning being needed. We apply a stochastic search variable approach to select explanatory variables as well as to determine the structure of the random effects covariance matrix. For logistic mixed effects models prior determination of explanatory variables and ...
Steady-state priors and Bayesian variable selection in VAR forecasting
Louzis, Dimitrios P.
2015-01-01
This study proposes methods for estimating Bayesian vector autoregressions (VARs) with an automatic variable selection and an informative prior on the unconditional mean or steady-state of the system. We show that extant Gibbs sampling methods for Bayesian variable selection can be efficiently extended to incorporate prior beliefs on the steady-state of the economy. Empirical analysis, based on three major US macroeconomic time series, indicates that the out-of-sample forecasting accuracy of ...
Bayesian variable selection and data integration for biological regulatory networks
Jensen, Shane T; Chen, Guang; Stoeckert, Jr, Christian J.
2007-01-01
A substantial focus of research in molecular biology is gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three dat...
Lu, Zhaohua; Zhu, Hongtu; Knickmeyer, Rebecca C.; Sullivan, Patrick F.; Stephanie, Williams N.; Zou, Fei
2015-01-01
The power of genome-wide association studies (GWAS) for mapping complex traits with single SNP analysis may be undermined by modest SNP effect sizes, unobserved causal SNPs, correlation among adjacent SNPs, and SNP-SNP interactions. Alternative approaches for testing the association between a single SNP-set and individual phenotypes have been shown to be promising for improving the power of GWAS. We propose a Bayesian latent variable selection (BLVS) method to simultaneously model the joint a...
Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach. PMID:27314566
Locating disease genes using Bayesian variable selection with the Haseman-Elston method
Directory of Open Access Journals (Sweden)
He Qimei
2003-12-01
Background: We applied stochastic search variable selection (SSVS), a Bayesian model selection method, to the simulated data of Genetic Analysis Workshop 13. We used SSVS with the revisited Haseman-Elston method to find the markers linked to the loci determining change in cholesterol over time. To study gene-gene interaction (epistasis) and gene-environment interaction, we adopted prior structures that incorporate the relationships among the predictors. This allows SSVS to search the model space more efficiently and avoid the less likely models. Results: In applying SSVS, instead of looking at the posterior distribution of each of the candidate models, which is sensitive to the setting of the prior, we ranked the candidate variables (markers) according to their marginal posterior probability, which was shown to be more robust to the prior. Compared with traditional methods that consider one marker at a time, our method considers all markers simultaneously and obtains more favorable results. Conclusions: We showed that SSVS is a powerful method for identifying linked markers using the Haseman-Elston method, even for weak effects. SSVS is very effective because it does a smart search over the entire model space.
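The idea of ranking markers by marginal posterior inclusion probability, as described in the abstract above, can be sketched in a deliberately simplified setting. This is not the SSVS Gibbs sampler used in the paper. It assumes independent effect estimates with a known standard error and a point-mass spike-and-slab prior, so that each inclusion probability has a closed form. All names (`posterior_inclusion`, `rank_markers`, the marker labels) and all numeric values are hypothetical.

```python
import math


def normal_logpdf(x, var):
    """Log density of a zero-mean normal with variance var at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def posterior_inclusion(z, se, slab_sd, prior_incl):
    """Posterior probability that an effect is nonzero.

    Assumed model: z | beta ~ N(beta, se^2); beta = 0 with probability
    1 - prior_incl (spike), otherwise beta ~ N(0, slab_sd^2) (slab).
    Marginally, z ~ N(0, se^2) under the spike and
    z ~ N(0, se^2 + slab_sd^2) under the slab.
    """
    log_slab = math.log(prior_incl) + normal_logpdf(z, se ** 2 + slab_sd ** 2)
    log_spike = math.log(1.0 - prior_incl) + normal_logpdf(z, se ** 2)
    top = max(log_slab, log_spike)
    w_slab = math.exp(log_slab - top)
    w_spike = math.exp(log_spike - top)
    return w_slab / (w_slab + w_spike)


def rank_markers(estimates, se=1.0, slab_sd=3.0, prior_incl=0.1):
    """Sort markers by decreasing posterior inclusion probability."""
    probs = {
        name: posterior_inclusion(z, se, slab_sd, prior_incl)
        for name, z in estimates.items()
    }
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Made-up standardized effect estimates for four hypothetical markers.
toy_estimates = {"m1": 0.3, "m2": 4.1, "m3": -2.5, "m4": 0.9}
ranking = rank_markers(toy_estimates)
```

In this toy model the inclusion probability increases with the absolute size of the estimate, so the ranking follows |z|. In a full SSVS analysis the predictors are correlated and the probabilities are estimated from the fraction of MCMC draws in which each variable is included.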
A Bayesian integrative model for genetical genomics with spatially informed variable selection.
Cassese, Alberto; Guindani, Michele; Vannucci, Marina
2014-01-01
We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidates of associations between copy number variants and the transcriptional activity of target genes. Gene ontology (GO) analyses of our findings reveal enrichments in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation. PMID:25288877
On the use of pseudo-likelihoods in Bayesian variable selection.
Racugno, Walter; Salvan, Alessandra; Ventura, Laura
2005-01-01
In the presence of nuisance parameters, we discuss a one-parameter Bayesian analysis based on a pseudo-likelihood assuming a default prior distribution for the parameter of interest only. Although this way to proceed cannot always be considered as orthodox in the Bayesian perspective, it is of interest to evaluate whether the use of suitable pseudo-likelihoods may be proposed for Bayesian inference. Attention is focused in the context of regression models, in particular on inference about a s...
Bhadra, Anindya
2013-04-22
We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.
Bhadra, Anindya; Mallick, Bani K
2013-06-01
We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. PMID:23607608
Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina
2014-07-15
In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows us to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, thereby capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data. PMID:24650600
Berg, van den S.; Calus, M.P.L.; Meuwissen, T.H.E.; Wientjes, Y.C.J.
2015-01-01
Background: The use of information across populations is an attractive approach to increase the accuracy of genomic prediction for numerically small populations. However, accuracies of across population genomic prediction, in which reference and selection individuals are from different population
Bayesian variable order Markov models: Towards Bayesian predictive state representations
C. Dimitrakakis
2009-01-01
We present a Bayesian variable order Markov model that shares many similarities with predictive state representations. The resulting models are compact and much easier to specify and learn than classical predictive state representations. Moreover, we show that they significantly outperform a more st
Integer variables estimation problems: the Bayesian approach
Directory of Open Access Journals (Sweden)
G. Venuti
1997-06-01
In geodesy as well as in geophysics there are a number of examples where the unknown parameters are partly constrained to be integer numbers, while other parameters have a continuous range of possible values. In all such situations the ordinary least squares principle, with integer variates fixed to the most probable integer value, can lead to paradoxical results, due to the strong non-linearity of the manifold of admissible values. On the contrary an overall estimation procedure assigning the posterior distribution to all variables, discrete and continuous, conditional to the observed quantities, like the so-called Bayesian approach, has the advantage of weighting correctly the possible errors in choosing different sets of integer values, thus providing a more realistic and stable estimate even of the continuous parameters. In this paper, after a short recall of the basics of Bayesian theory in section 2, we present the natural Bayesian solution to the problem of assessing the estimable signal from noisy observations in section 3 and the Bayesian solution to cycle slips detection and repair for a stream of GPS measurements in section 4. An elementary synthetic example is discussed in section 3 to illustrate the theory presented and more elaborate, though synthetic, examples are discussed in section 4 where realistic streams of GPS observations, with cycle slips, are simulated and then back processed.
Larson, Nicholas B; McDonnell, Shannon; Albright, Lisa Cannon; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham; MacInnis, Robert; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J
2016-09-01
Rare variants (RVs) have been shown to be significant contributors to complex disease risk. By definition, these variants have very low minor allele frequencies and traditional single-marker methods for statistical analysis are underpowered for typical sequencing study sample sizes. Multimarker burden-type approaches attempt to identify aggregation of RVs across case-control status by analyzing relatively small partitions of the genome, such as genes. However, it is generally the case that the aggregative measure would be a mixture of causal and neutral variants, and these omnibus tests do not directly provide any indication of which RVs may be driving a given association. Recently, Bayesian variable selection approaches have been proposed to identify RV associations from a large set of RVs under consideration. Although these approaches have been shown to be powerful at detecting associations at the RV level, there are often computational limitations on the total quantity of RVs under consideration and compromises are necessary for large-scale application. Here, we propose a computationally efficient alternative formulation of this method using a probit regression approach specifically capable of simultaneously analyzing hundreds to thousands of RVs. We evaluate our approach to detect causal variation on simulated data and examine sensitivity and specificity in instances of high RV dimensionality as well as apply it to pathway-level RV analysis results from a prostate cancer (PC) risk case-control sequencing study. Finally, we discuss potential extensions and future directions of this work. PMID:27312771
Bayesian Model Averaging in the Instrumental Variable Regression Model
Gary Koop; Robert Leon Gonzalez; Rodney Strachan
2011-01-01
This paper considers the instrumental variable regression model when there is uncertainty about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainty can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very fl...
Bayesian auxiliary variable models for binary and multinomial regression
Holmes, C. C.; Held, L.
2006-01-01
In this paper we discuss auxiliary variable approaches to Bayesian binary and multinomial regression. These approaches are ideally suited to automated Markov chain Monte Carlo simulation. In the first part we describe a simple technique using joint updating that improves the performance of the conventional probit regression algorithm. In the second part we discuss auxiliary variable methods for inference in Bayesian logistic regression, including covariate set uncertainty. Fina...
Bayesian site selection for fast Gaussian process regression
Pourhabib, Arash
2014-02-05
Gaussian Process (GP) regression is a popular method in the field of machine learning and computer experiment designs; however, its ability to handle large data sets is hindered by the computational difficulty in inverting a large covariance matrix. Likelihood approximation methods were developed as a fast GP approximation, thereby reducing the computation cost of GP regression by utilizing a much smaller set of unobserved latent variables called pseudo points. This article reports a further improvement to the likelihood approximation methods by simultaneously deciding both the number and locations of the pseudo points. The proposed approach is a Bayesian site selection method where both the number and locations of the pseudo inputs are parameters in the model, and the Bayesian model is solved using a reversible jump Markov chain Monte Carlo technique. Through a number of simulated and real data sets, it is demonstrated that with appropriate priors chosen, the Bayesian site selection method can produce a good balance between computation time and prediction accuracy: it is fast enough to handle large data sets that a full GP is unable to handle, and it improves, quite often remarkably, the prediction accuracy, compared with the existing likelihood approximations. © 2014 Taylor and Francis Group, LLC.
Bayesian item selection in constrained adaptive testing using shadow tests
Veldkamp, Bernard P.
2010-01-01
Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specificati
Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests
Veldkamp, Bernard P.
2010-01-01
Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…
Improving randomness characterization through Bayesian model selection
R., Rafael Díaz-H; Martínez, Alí M Angulo; U'Ren, Alfred B; Hirsch, Jorge G; Marsili, Matteo; Castillo, Isaac Pérez
2016-01-01
Nowadays random number generation plays an essential role in technology with important applications in areas ranging from cryptography, which lies at the core of current communication protocols, to Monte Carlo methods, and other probabilistic algorithms. In this context, a crucial scientific endeavour is to develop effective methods that allow the characterization of random number generators. However, commonly employed methods either lack formality (e.g. the NIST test suite), or are inapplicable in principle (e.g. the characterization derived from the Algorithmic Theory of Information (ATI)). In this letter we present a novel method based on Bayesian model selection, which is both rigorous and effective, for characterizing randomness in a bit sequence. We derive analytic expressions for a model's likelihood which is then used to compute its posterior probability distribution. Our method proves to be more rigorous than NIST's suite and the Borel-Normality criterion and its implementation is straightforward. We...
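To make the idea of Bayesian model selection on a bit sequence concrete, here is a minimal sketch that compares only two models. It is not the model family used in the paper above, whose details are truncated in this abstract. Model "fair" fixes the probability of a 1 at 1/2. Model "biased" assumes i.i.d. bits with an unknown probability given a uniform prior, whose evidence is the Beta function B(k+1, n-k+1) for k ones in n bits. The function names and the test sequences are hypothetical.

```python
import math


def log_evidence_fair(bits):
    """Log marginal likelihood of the sequence under a fair coin."""
    return -len(bits) * math.log(2.0)


def log_evidence_biased(bits):
    """Log marginal likelihood under i.i.d. Bernoulli(p), p ~ Uniform(0, 1).

    The integral of p^k (1 - p)^(n - k) over p equals B(k + 1, n - k + 1)
    = k! (n - k)! / (n + 1)!.
    """
    n = len(bits)
    k = sum(bits)
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


def posterior_prob_fair(bits, prior_fair=0.5):
    """Posterior probability of the fair-coin model given the bits."""
    a = math.log(prior_fair) + log_evidence_fair(bits)
    b = math.log(1.0 - prior_fair) + log_evidence_biased(bits)
    top = max(a, b)
    wa = math.exp(a - top)
    wb = math.exp(b - top)
    return wa / (wa + wb)


balanced = [0, 1] * 50          # 50 ones in 100 bits
skewed = [1] * 90 + [0] * 10    # 90 ones in 100 bits
p_balanced = posterior_prob_fair(balanced)
p_skewed = posterior_prob_fair(skewed)
```

Note that this two-model comparison looks only at the count of ones, so the strictly alternating `balanced` sequence still looks "fair" here. Detecting such structure would need richer models, for example ones conditioning on preceding bits.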
Bayesian item selection in constrained adaptive testing using shadow tests
Bernard P. Veldkamp
2010-01-01
Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item selection process. The Shadow Test Approach is a general purpose algorithm for administering constrained CAT. In this paper it is shown how the approac...
Species selection on variability.
Lloyd, E. A.; Gould, S J
1993-01-01
Most analyses of species selection require emergent, as opposed to aggregate, characters at the species level. This "emergent character" approach tends to focus on the search for adaptations at the species level. Such an approach seems to banish the most potent evolutionary property of populations--variability itself--from arguments about species selection (for variation is an aggregate character). We wish, instead, to extend the legitimate domain of species selection to aggregate characters....
Lin, Lin; Chan, Cliburn; West, Mike
2016-01-01
We discuss the evaluation of subsets of variables for the discriminative evidence they provide in multivariate mixture modeling for classification. The novel development of Bayesian classification analysis presented is partly motivated by problems of design and selection of variables in biomolecular studies, particularly involving widely used assays of large-scale single-cell data generated using flow cytometry technology. For such studies and for mixture modeling generally, we define discriminative analysis that overlays fitted mixture models using a natural measure of concordance between mixture component densities, and define an effective and computationally feasible method for assessing and prioritizing subsets of variables according to their roles in discrimination of one or more mixture components. We relate the new discriminative information measures to Bayesian classification probabilities and error rates, and exemplify their use in Bayesian analysis of Dirichlet process mixture models fitted via Markov chain Monte Carlo methods as well as using a novel Bayesian expectation-maximization algorithm. We present a series of theoretical and simulated data examples to fix concepts and exhibit the utility of the approach, and compare with prior approaches. We demonstrate application in the context of automatic classification and discriminative variable selection in high-throughput systems biology using large flow cytometry datasets. PMID:26040910
Dissecting Magnetar Variability with Bayesian Hierarchical Models
Huppenkothen, Daniela; Brewer, Brendon J.; Hogg, David W.; Murray, Iain; Frean, Marcus; Elenbaas, Chris; Watts, Anna L.; Levin, Yuri; van der Horst, Alexander J.; Kouveliotou, Chryssa
2015-09-01
Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behavior, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favored models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture aftershocks. Using Markov Chain Monte Carlo sampling augmented with reversible jumps between models with different numbers of parameters, we characterize the posterior distributions of the model parameters and the number of components per burst. We relate these model parameters to physical quantities in the system, and show for the first time that the variability within a burst does not conform to predictions from ideas of self-organized criticality. We also examine how well the properties of the spikes fit the predictions of simplified cascade models for the different trigger mechanisms.
Dissecting magnetar variability with Bayesian hierarchical models
Huppenkothen, D; Hogg, D W; Murray, I; Frean, M; Elenbaas, C; Watts, A L; Levin, Y; van der Horst, A J; Kouveliotou, C
2015-01-01
Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behaviour, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favoured models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture afte...
Optimizing the Amount of Models Taken into Consideration During Model Selection in Bayesian Networks
Castelo, J.R.; Siebes, Arno
1999-01-01
Graphical model selection from data involves several difficulties. Especially challenging is the size of the space of models over which selection must be carried out, even for only a modest number of variables. The problem becomes more severe for graphical models in which some variables may be responses to others. This is the case for Bayesian networks, which are modeled by acyclic digraphs. In this paper we try to reduce the amount of models taken into...
Two-Stage Bayesian Model Averaging in Endogenous Variable Models.
Lenkoski, Alex; Eicher, Theo S; Raftery, Adrian E
2014-01-01
Economic modeling in the presence of endogeneity is subject to model uncertainty at both the instrument and covariate level. We propose a Two-Stage Bayesian Model Averaging (2SBMA) methodology that extends the Two-Stage Least Squares (2SLS) estimator. By constructing a Two-Stage Unit Information Prior in the endogenous variable model, we are able to efficiently combine established methods for addressing model uncertainty in regression models with the classic technique of 2SLS. To assess the validity of instruments in the 2SBMA context, we develop Bayesian tests of the identification restriction that are based on model averaged posterior predictive p-values. A simulation study showed that 2SBMA has the ability to recover structure in both the instrument and covariate set, and substantially improves the sharpness of resulting coefficient estimates in comparison to 2SLS using the full specification in an automatic fashion. Due to the increased parsimony of the 2SBMA estimate, the Bayesian Sargan test had a power of 50 percent in detecting a violation of the exogeneity assumption, while the method based on 2SLS using the full specification had negligible power. We apply our approach to the problem of development accounting, and find support not only for institutions, but also for geography and integration as development determinants, once both model uncertainty and endogeneity have been jointly addressed. PMID:24223471
Bayesian Model Selection for LISA Pathfinder
Karnesis, Nikolaos; Sopuerta, Carlos F; Gibert, Ferran; Armano, Michele; Audley, Heather; Congedo, Giuseppe; Diepholz, Ingo; Ferraioli, Luigi; Hewitson, Martin; Hueller, Mauro; Korsakova, Natalia; Plagnol, Eric; Vitale, Stefano
2013-01-01
The main goal of the LISA Pathfinder (LPF) mission is to fully characterize the acceleration noise models and to test key technologies for future space-based gravitational-wave observatories similar to the LISA/eLISA concept. The Data Analysis (DA) team has developed complex three-dimensional models of the LISA Technology Package (LTP) experiment on-board LPF. These models are used for simulations, but more importantly, they will be used for parameter estimation during flight operations. One of the tasks of the DA team is to identify the physical effects that contribute significantly to the properties of the instrument noise. One way to approach this problem is to recover the essential parameters of the LTP that describe the data. Thus, we want to define the simplest model that efficiently explains the observations. To do so within a Bayesian framework, one has to estimate the so-called Bayes factor between two competing models. In our analysis, we use three main different methods to estimate...
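As a toy illustration of the Bayes factor mentioned above (not the LTP noise models, and not any of the estimation methods used by the authors), the sketch below compares a fixed-parameter model with a one-parameter model for binomial data, where both evidences have closed forms. The function name and the coin-flip setting are assumptions made purely for illustration.

```python
import math

def bayes_factor_10(k, n):
    """Bayes factor of M1 (p ~ Uniform(0, 1)) against M0 (p = 0.5),
    given k successes in n Bernoulli trials. Hypothetical toy setting."""
    # Evidence under M0: the binomial likelihood evaluated at p = 0.5.
    evidence_m0 = math.comb(n, k) * 0.5 ** n
    # Evidence under M1: the binomial likelihood integrated over a
    # uniform prior on p, which equals 1 / (n + 1) for every k.
    evidence_m1 = 1.0 / (n + 1)
    return evidence_m1 / evidence_m0

# Balanced data favour the simpler model (factor below 1);
# lopsided data favour the model with the free parameter (factor above 1).
print(bayes_factor_10(5, 10))
print(bayes_factor_10(9, 10))
```

A factor below 1 for balanced data shows the preference for the simplest adequate model that the abstract describes: the extra free parameter is penalized unless the data require it.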
Evaluating variable selection methods for diagnosis of myocardial infarction.
Dreiseitl, S; Ohno-Machado, L; Vinterbo, S
1999-01-01
This paper evaluates the variable selection performed by several machine-learning techniques on a myocardial infarction data set. The focus of this work is to determine which of 43 input variables are considered relevant for prediction of myocardial infarction. The algorithms investigated were logistic regression (with stepwise, forward, and backward selection), backpropagation for multilayer perceptrons (input relevance determination), Bayesian neural networks (automatic relevance determination), and rough sets. An independent method (self-organizing maps) was then used to evaluate and visualize the different subsets of predictor variables. Results show good agreement on some predictors, but also variability among different methods; only one variable was selected by all models. PMID:10566358
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From them the Bayesian model is constructed in a progressive way. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way. PMID:26497359
Towards Distributed Bayesian Estimation: A Short Note on Selected Aspects
Czech Academy of Sciences Publication Activity Database
Dedecius, Kamil; Sečkárová, Vladimíra
Prague: Institute of Information Theory and Automation, 2011, pp. 67-72. ISBN 978-80-903834-6-3. [The 2nd International Workshop on Decision Making with Multiple Imperfect Decision Makers. Held in Conjunction with the 25th Annual Conference on Neural Information Processing Systems (NIPS 2011). Sierra Nevada (ES), 16.12.2011-16.12.2011] R&D Projects: GA ČR GA102/08/0567 Institutional research plan: CEZ:AV0Z10750506 Keywords : efficient estimation * a linear or nonlinear model * distributed estimation * Bayesian decision making Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2011/AS/dedecius-towards distributed bayesian estimation a short note on selected aspects.pdf
Bayesian predictive modeling for genomic based personalized treatment selection.
Ma, Junsheng; Stingo, Francesco C; Hobbs, Brian P
2016-06-01
Efforts to personalize medicine in oncology have been limited by reductive characterizations of the intrinsically complex underlying biological phenomena. Future advances in personalized medicine will rely on molecular signatures that derive from synthesis of multifarious interdependent molecular quantities requiring robust quantitative methods. However, highly parameterized statistical models when applied in these settings often require a prohibitively large database and are sensitive to proper characterizations of the treatment-by-covariate interactions, which in practice are difficult to specify and may be limited by generalized linear models. In this article, we present a Bayesian predictive framework that enables the integration of a high-dimensional set of genomic features with clinical responses and treatment histories of historical patients, providing a probabilistic basis for using the clinical and molecular information to personalize therapy for future patients. Our work represents one of the first attempts to define personalized treatment assignment rules based on large-scale genomic data. We use actual gene expression data acquired from The Cancer Genome Atlas in the settings of leukemia and glioma to explore the statistical properties of our proposed Bayesian approach for personalizing treatment selection. The method is shown to yield considerable improvements in predictive accuracy when compared to penalized regression approaches. PMID:26575856
Dynamic sensor action selection with Bayesian decision analysis
Kristensen, Steen; Hansen, Volker; Kondak, Konstantin
1998-10-01
The aim of this work is to create a framework for the dynamic planning of sensor actions for an autonomous mobile robot. The framework uses Bayesian decision analysis, i.e., a decision-theoretic method, to evaluate possible sensor actions and select the most appropriate ones given the available sensors and what is currently known about the state of the world. Since sensing changes the knowledge of the system and since the current state of the robot (task, position, etc.) determines what knowledge is relevant, the evaluation and selection of sensing actions is an ongoing process that effectively determines the behavior of the robot. The framework has been implemented on a real mobile robot and has been shown to control the sensor actions of the system in real time. In current work we are investigating methods to reduce or automatically generate the model information needed by the decision-theoretic method to select the appropriate sensor actions.
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
Scott, James G (doi:10.1214/10-AOS792)
2010-01-01
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham's-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
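The automatic multiplicity correction described above can be illustrated with a small sketch. It assumes one common textbook setup, which the abstract itself does not spell out: each of p candidate variables enters the model independently with inclusion probability w, and w is either fixed at 0.5 or given a Uniform(0, 1) prior and integrated out. The function names are hypothetical.

```python
import math

def model_prior_fixed_w(k, p, w=0.5):
    """Prior probability of one specific model containing k of p variables
    when the inclusion probability w is held fixed."""
    return w ** k * (1 - w) ** (p - k)

def model_prior_uniform_w(k, p):
    """Same quantity after integrating w over a Uniform(0, 1) prior:
    Beta(k + 1, p - k + 1) = 1 / ((p + 1) * C(p, k))."""
    return 1.0 / ((p + 1) * math.comb(p, k))

# Prior odds of a specific one-variable model against the null model.
for p in (10, 100, 1000):
    odds_fixed = model_prior_fixed_w(1, p) / model_prior_fixed_w(0, p)
    odds_uniform = model_prior_uniform_w(1, p) / model_prior_uniform_w(0, p)
    print(p, odds_fixed, odds_uniform)
```

Under this assumption the fixed-w odds stay at 1 no matter how many candidates are screened, whereas the integrated prior shrinks the odds to 1/p, which is the sense in which the penalty grows automatically with the number of variables considered.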
Bayesian modeling of ChIP-chip data using latent variables.
Wu, Mingqi
2009-10-26
BACKGROUND: The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although, due to the integrated consideration of uncertainty of the models and model parameters, Bayesian methods can potentially work better than the other two classes of methods, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU time because they involve MCMC simulations. RESULTS: In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length. CONCLUSION: The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results indicate that the
Bayesian modeling of ChIP-chip data using latent variables
Directory of Open Access Journals (Sweden)
Tian Yanan
2009-10-01
BACKGROUND: The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although, due to the integrated consideration of uncertainty of the models and model parameters, Bayesian methods can potentially work better than the other two classes of methods, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU time because they involve MCMC simulations. RESULTS: In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length. CONCLUSION: The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results
Feature Selection for Bayesian Evaluation of Trauma Death Risk
Jakaite, L
2008-01-01
In the last year more than 70,000 people have been brought to UK hospitals with serious injuries. Each time, a clinician has to urgently take a patient through a screening procedure to make a reliable decision on the trauma treatment. Typically, such a procedure comprises around 20 tests; however, the condition of a trauma patient remains very difficult to test properly. What happens if these tests are interpreted ambiguously and the information about the severity of the injury is misleading? A mistaken decision can be fatal: a mild treatment can put a patient at risk of dying from posttraumatic shock, while overtreatment can also cause death. How can we reduce the risk of death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screeni...
Family Background Variables as Instruments for Education in Income Regressions: A Bayesian Analysis
Hoogerheide, Lennart; Block, Joern H.; Thurik, Roy
2012-01-01
The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in case of moderate direct…
Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison
Directory of Open Access Journals (Sweden)
Aki Vehtari
2014-03-01
Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys, 6 (2012), 142–228. doi:10.1214/12-SS102.
Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition
Stephenson, Todd Andrew; Magimai.-Doss, Mathew; Bourlard, Hervé
2001-01-01
Standard hidden Markov models (HMMs), as used in automatic speech recognition (ASR), calculate their emission probabilities by an artificial neural network (ANN) or a Gaussian distribution conditioned on the hidden state variable, considering the emissions independent of any other variable in the model. Recent work showed the benefit of conditioning the emission distributions on a discrete auxiliary variable, which is observed in training and hidden in recognition. Related work has shown the ...
Directory of Open Access Journals (Sweden)
Guillaume Marrelec
The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
Kim, Junhan; Chan, Chi-kwan; Medeiros, Lia; Ozel, Feryal; Psaltis, Dimitrios
2016-01-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long baseline interferometer (VLBI) that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. We also apply our method to the early EHT data...
Multi-variable Echo State Network Optimized by Bayesian Regulation for Daily Peak Load Forecasting
Directory of Open Access Journals (Sweden)
Dongxiao Niu
2012-11-01
In this paper, a multi-variable echo state network trained with Bayesian regulation is developed for short-term load forecasting. We focus on the generalization of a new recurrent network; therefore, Bayesian regulation and the Levenberg-Marquardt algorithm are adopted to modify the output weights. The model is verified on data from a local power company in south China, and its performance is satisfactory. Traditional methods are applied to the same task for comparison. The simulation results lead to the conclusion that the proposed scheme is feasible, robust, and generalizes satisfactorily.
Modelling of Traffic Flow with Bayesian Autoregressive Model with Variable Partial Forgetting
Czech Academy of Sciences Publication Activity Database
Dedecius, Kamil; Nagy, Ivan; Hofman, Radek
Praha : ČVUT v Praze, 2011, pp. 1-11. [CTU Workshop 2011. Praha (CZ), 01.02.2011-01.02.2011] Other grants: ČVUT v Praze(CZ) SGS 10/099/OHK3/1T/16 Institutional research plan: CEZ:AV0Z10750506 Keywords : Bayesian modelling * traffic modelling Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2011/AS/dedecius-modelling of traffic flow with bayesian autoregressive model with variable partial forgetting.pdf
Bayesian approach to inverse problems for functions with a variable-index Besov prior
Jia, Junxiong; Peng, Jigen; Gao, Jinghuai
2016-08-01
The Bayesian approach has been adopted to solve inverse problems that reconstruct a function from noisy observations. Prior measures play a key role in the Bayesian method. Hence, many probability measures have been proposed, among which total variation (TV) is a well-known prior measure that can preserve sharp edges. However, it has two drawbacks, the staircasing effect and a lack of the discretization-invariant property. The variable-index TV prior has been proposed and analyzed in the area of image analysis for the former, and the Besov prior has been employed recently for the latter. To overcome both issues together, in this paper, we present a variable-index Besov prior measure, which is a non-Gaussian measure. Some useful properties of this new prior measure have been proven for functions defined on a torus. We have also generalized Bayesian inverse theory in infinite dimensions for our new setting. Finally, this theory has been applied to integer- and fractional-order backward diffusion problems. To the best of our knowledge, this is the first time that the Bayesian approach has been used for the fractional-order backward diffusion problem, which provides an opportunity to quantify its uncertainties.
Bayesian analysis of variable-order, reversible Markov chains
Bacallado, Sergio
2011-01-01
We define a conjugate prior for the reversible Markov chain of order $r$. The prior arises from a partially exchangeable reinforced random walk, in the same way that the Beta distribution arises from the exchangeable Pólya urn. An extension to variable-order Markov chains is also derived. We show the utility of this prior in testing the order and estimating the parameters of a reversible Markov model.
Characterizing the Aperiodic Variability of 3XMM Sources using Bayesian Blocks
Salvetti, D.; De Luca, A.; Belfiore, A.; Marelli, M.
2016-06-01
I will present the Bayesian blocks algorithm and its application to XMM sources, the statistical properties of the entire 3XMM sample, and a few interesting cases. While XMM-Newton is the best-suited instrument for the characterization of X-ray source variability, its most recent catalogue (3XMM) reports light curves only for the brightest sources and excludes periods of background flares from its analysis. One aim of the EXTraS ("Exploring the X-ray Transient and variable Sky") project is the characterization of aperiodic variability of as many 3XMM sources as possible on time scales shorter than the XMM observation. We adapted the original Bayesian blocks algorithm to account for background contamination, including soft proton flares. In addition, we characterized the short-term aperiodic variability by performing a number of statistical tests on all the Bayesian blocks light curves. The EXTraS catalogue and products will be released to the community in 2017, together with tools that will allow the user to replicate EXTraS results and extend them through the next decade.
Vrieze, Scott I.
2012-01-01
This article reviews the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) in model selection and the appraisal of psychological theory. The focus is on latent variable models, given their growing use in theory testing and construction. Theoretical statistical results in regression are discussed, and more important…
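For reference, the two criteria reviewed above have simple closed forms: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the sample size, and ln L the maximized log-likelihood. The sketch below applies them to two candidate models; the log-likelihood values and parameter counts are hypothetical placeholders, not results from the article.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits of two latent variable models to n = 500 observations.
n = 500
candidates = {
    "one-factor": {"log_likelihood": -1210.0, "k": 12},
    "two-factor": {"log_likelihood": -1202.0, "k": 17},
}
for name, fit in candidates.items():
    print(name, aic(fit["log_likelihood"], fit["k"]),
          bic(fit["log_likelihood"], fit["k"], n))
```

With these made-up numbers AIC prefers the larger model while BIC prefers the smaller one, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7.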
Bayesian natural selection and the evolution of perceptual systems.
Geisler, Wilson S.; Diehl, Randy L.
2002-01-01
In recent years, there has been much interest in characterizing statistical properties of natural stimuli in order to better understand the design of perceptual systems. A fruitful approach has been to compare the processing of natural stimuli in real perceptual systems with that of ideal observers derived within the framework of Bayesian statistical decision theory. While this form of optimization theory has provided a deeper understanding of the information contained in natural stimuli as w...
Implementation of upper limit calculation for a Poisson variable by Bayesian approach
Institute of Scientific and Technical Information of China (English)
ZHU Yong-Sheng
2008-01-01
The calculation of the Bayesian upper confidence limit for a Poisson variable, including both signal and background, with and without systematic uncertainties, has been formulated. A Fortran 77 routine, BPULE, has been developed to implement the calculation. The routine can account for systematic uncertainties in the background expectation and signal efficiency. The systematic uncertainties may be separately parameterized by a Gaussian, log-Gaussian, or flat probability density function (pdf). Some technical details of BPULE are discussed.
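The simplest case of such a calculation can be sketched in a few lines. The example below is not BPULE and ignores systematic uncertainties altogether; it assumes a flat prior on the signal mean s >= 0 and a precisely known background b, for which the posterior tail probability beyond s has a closed form in terms of finite Poisson sums. The function names are hypothetical.

```python
import math

def posterior_tail(s, n_obs, b):
    """P(signal > s | n_obs) for a flat prior on s >= 0 and known background b."""
    num = sum((s + b) ** k / math.factorial(k) for k in range(n_obs + 1))
    den = sum(b ** k / math.factorial(k) for k in range(n_obs + 1))
    return math.exp(-s) * num / den

def upper_limit(n_obs, b, cl=0.90, tol=1e-10):
    """Signal value s_up such that the posterior probability of s <= s_up is cl."""
    lo, hi = 0.0, 1.0
    # Expand the bracket until the tail probability drops below 1 - cl.
    while posterior_tail(hi, n_obs, b) > 1 - cl:
        hi *= 2
    # Bisect: the tail probability decreases monotonically in s.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_tail(mid, n_obs, b) > 1 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(upper_limit(0, 0.0))   # no events, no background
print(upper_limit(3, 1.5))   # three events on an expected background of 1.5
```

With zero observed events the limit under these assumptions is -ln(1 - cl) regardless of the background, which is a convenient sanity check for any implementation.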
Directory of Open Access Journals (Sweden)
D. Das
2014-04-01
Climate projections simulated by Global Climate Models (GCM) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application towards accurately assessing the effects of climate change on finer regional scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large scale climatic state and the regional or local features. A transfer function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe complex dynamic relationships between the predictors and predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lend themselves to domain relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.
Bayesian data fusion for spatial prediction of categorical variables in environmental sciences
Energy Technology Data Exchange (ETDEWEB)
Gengler, Sarah, E-mail: sarahgengler@gmail.com; Bogaert, Patrick, E-mail: sarahgengler@gmail.com [Earth and Life Institute, Environmental Sciences. Université catholique de Louvain, Croix du Sud 2/L7.05.16, B-1348 Louvain-la-Neuve (Belgium)
2014-12-05
First developed to predict continuous variables, Bayesian Maximum Entropy (BME) has become a complete framework in the context of space-time prediction since it has been extended to predict categorical variables and mixed random fields. This method proposes solutions to combine several sources of data whatever the nature of the information. However, the various attempts that were made to adapt the BME methodology to categorical variables and mixed random fields faced some limitations, such as a high computational burden. The main objective of this paper is to overcome this limitation by generalizing the Bayesian Data Fusion (BDF) theoretical framework to categorical variables, which is in some sense a simplification of the BME method through the convenient conditional independence hypothesis. The BDF methodology for categorical variables is first described and then applied to a practical case study: the estimation of soil drainage classes using a soil map and point observations in the sandy area of Flanders around the city of Mechelen (Belgium). The BDF approach is compared to BME along with more classical approaches, such as Indicator CoKriging (ICK) and logistic regression. Estimators are compared using various indicators, namely the Percentage of Correctly Classified locations (PCC) and the Average Highest Probability (AHP). Although the BDF methodology for categorical variables is in some sense a simplification of the BME approach, both methods lead to similar results and have strong advantages compared to ICK and logistic regression.
Using Bayesian Model Selection to Characterize Neonatal Eeg Recordings
Mitchell, Timothy J.
2009-12-01
The brains of premature infants must undergo significant maturation outside of the womb and are thus particularly susceptible to injury. Electroencephalographic (EEG) recordings are an important diagnostic tool in determining if a newborn's brain is functioning normally or if injury has occurred. However, interpreting the recordings is difficult and requires the skills of a trained electroencephalographer. Because these EEG specialists are rare, an automated interpretation of newborn EEG recordings would increase access to an important diagnostic tool for physicians. To automate this procedure, we employ Bayesian probability theory to compute the posterior probability for the EEG features of interest and use the results in a program designed to mimic EEG specialists. Specifically, we will be identifying waveforms of varying frequency and amplitude, as well as periods of flat recordings where brain activity is minimal.
Influences of variables on ship collision probability in a Bayesian belief network model
International Nuclear Information System (INIS)
The influences of the variables in a Bayesian belief network model for estimating the role of human factors on ship collision probability in the Gulf of Finland are studied for discovering the variables with the largest influences and for examining the validity of the network. The change in the so-called causation probability is examined while observing each state of the network variables and by utilizing sensitivity and mutual information analyses. Changing course in an encounter situation is the most influential variable in the model, followed by variables such as the Officer of the Watch's action, situation assessment, danger detection, personal condition and incapacitation. The least influential variables are the other distractions on bridge, the bridge view, maintenance routines and the officer's fatigue. In general, the methods are found to agree on the order of the model variables although some disagreements arise due to slightly dissimilar approaches to the concept of variable influence. The relative values and the ranking of variables based on the values are discovered to be more valuable than the actual numerical values themselves. Although the most influential variables seem to be plausible, there are some discrepancies between the indicated influences in the model and literature. Thus, improvements are suggested to the network.
Bayesian estimation in IRT models with missing values in background variables
Directory of Open Access Journals (Sweden)
Christian Aßmann
2015-12-01
Large-scale assessment studies typically aim at investigating the relationship between persons' competencies and explanatory variables. Individual competencies are often estimated by explicitly including explanatory background variables in corresponding Item Response Theory models. Since missing values in background variables inevitably occur, strategies to handle the uncertainty related to missing values in parameter estimation are required. We propose to adapt a Bayesian estimation strategy based on Markov Chain Monte Carlo techniques. Sampling from the posterior distribution of parameters is thereby enriched by sampling from the full conditional distribution of the missing values. We consider non-parametric as well as parametric approximations for the full conditional distributions of missing values, thus allowing for a flexible incorporation of metric as well as categorical background variables. We evaluate the validity of our approach with respect to statistical accuracy by a simulation study controlling the mechanism that generates the missing values. We show that the proposed Bayesian strategy allows for effective comparison of nested model specifications via gauging highest posterior density intervals of all involved model parameters. An illustration of the suggested approach uses data from the National Educational Panel Study on mathematical competencies of fifth grade students.
A survey of Bayesian predictive methods for model assessment, selection and comparison
Directory of Open Access Journals (Sweden)
Aki Vehtari
2012-01-01
To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.
DEFF Research Database (Denmark)
Burgess, Stephen; Thompson, Simon G; Andrews, G;
2010-01-01
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context of...... multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable. This...... can be implemented using a Bayesian approach, which is next extended to include multiple genetic markers. We then propose a hierarchical model for undertaking a meta-analysis of multiple studies, in which it is not necessary that the same genetic markers are measured in each study. This provides an...
Finding the Most Distant Quasars Using Bayesian Selection Methods
Mortlock, Daniel
2014-01-01
Quasars, the brightly glowing disks of material that can form around the super-massive black holes at the centres of large galaxies, are amongst the most luminous astronomical objects known and so can be seen at great distances. The most distant known quasars are seen as they were when the Universe was less than a billion years old (i.e., $\sim\!7\%$ of its current age). Such distant quasars are, however, very rare, and so are difficult to distinguish from the billions of other comparably-bright sources in the night sky. In searching for the most distant quasars in a recent astronomical sky survey (the UKIRT Infrared Deep Sky Survey, UKIDSS), there were $\sim\!10^3$ apparently plausible candidates for each expected quasar, far too many to reobserve with other telescopes. The solution to this problem was to apply Bayesian model comparison, making models of the quasar population and the dominant contaminating population (Galactic stars) to utilise the information content in the survey measurements. The result wa...
Applications of Bayesian Model Selection to Cosmological Parameters
Trotta, R
2005-01-01
Bayesian evidence is a tool for model comparison which can be used to decide whether the introduction of a new parameter is warranted by data. I show that the usual sampling statistic rejection tests for a null hypothesis can be misleading, since they do not take into account the information content of the data. I review the Laplace approximation and the Savage-Dickey density ratio to compute Bayes factors, which avoid the need of carrying out a computationally demanding multi-dimensional integration. I present a new procedure to forecast the Bayes factor of a future observation by computing the Expected Posterior Odds (ExPO). As an illustration, I consider three key parameters for our understanding of the cosmological concordance model: the spectral tilt of scalar perturbations, the spatial curvature of the Universe and a CDM isocurvature component to the initial conditions which is totally (anti)correlated with the adiabatic mode. I find that current data are not informative enough to draw a conclusion on t...
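The Savage-Dickey density ratio mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the cosmological analysis; it assumes a hypothetical Gaussian model with known noise level `sigma`, a nested model M0 that fixes the extra parameter at zero, and a Gaussian prior of width `tau` under the extended model M1. Under those assumptions the Bayes factor is the posterior density at zero divided by the prior density at zero, which the second function cross-checks against the closed-form evidences.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def savage_dickey_log_bf01(y, sigma, tau):
    """Log Bayes factor of M0 (theta = 0) against M1 (theta ~ N(0, tau^2)).

    Toy assumption: y_i ~ N(theta, sigma^2) with sigma known, so the
    posterior of theta under M1 is Gaussian and available in closed form.
    """
    n = len(y)
    post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)
    post_mean = post_var * sum(y) / sigma**2
    # Posterior density at the nested value divided by the prior density there.
    return normal_logpdf(0.0, post_mean, post_var) - normal_logpdf(0.0, 0.0, tau**2)


def direct_log_bf01(y, sigma, tau):
    """The same log Bayes factor from the analytical evidences of M0 and M1."""
    n = len(y)
    s = sum(y)
    ss = sum(v * v for v in y)
    log_ev0 = -0.5 * n * math.log(2.0 * math.pi * sigma**2) - 0.5 * ss / sigma**2
    log_ev1 = (
        log_ev0
        - 0.5 * math.log(1.0 + n * tau**2 / sigma**2)
        + 0.5 * tau**2 * s**2 / (sigma**2 * (sigma**2 + n * tau**2))
    )
    return log_ev0 - log_ev1


# Hypothetical data, chosen only for illustration.
data = [0.3, -0.1, 0.4, 0.2]
print(savage_dickey_log_bf01(data, sigma=1.0, tau=2.0))
print(direct_log_bf01(data, sigma=1.0, tau=2.0))
```

The two printed values agree up to rounding, which is the point of the density ratio: no integral over the parameter space of M1 is needed once the posterior density at the nested value is known.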
Energy Technology Data Exchange (ETDEWEB)
Placek, Ben; Knuth, Kevin H. [Physics Department, University at Albany (SUNY), Albany, NY 12222 (United States); Angerhausen, Daniel, E-mail: bplacek@albany.edu, E-mail: kknuth@albany.edu, E-mail: daniel.angerhausen@gmail.com [Department of Physics, Applied Physics, and Astronomy, Rensselaer Polytechnic Institute, Troy, NY 12180 (United States)
2014-11-10
EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian model selection, a unique aspect of EXONEST is the potential capability to distinguish between reflective and thermal contributions to the light curve. A case study is presented using Kepler data recorded from the transiting planet KOI-13b. By considering only the nontransiting portions of the light curve, we demonstrate that it is possible to estimate the photometrically relevant model parameters of KOI-13b. Furthermore, Bayesian model testing confirms that the orbit of KOI-13b has a detectable eccentricity.
Elsheikh, Ahmed H.
2014-02-01
A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM requires only forward model runs, so the simulator is used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems. © 2013 Elsevier Inc.
Variable selection by lasso-type methods
Directory of Open Access Journals (Sweden)
Sohail Chand
2011-09-01
Variable selection is an important property of shrinkage methods. The adaptive lasso is an oracle procedure and can do consistent variable selection. In this paper, we explain how the use of adaptive weights makes it possible for the adaptive lasso to satisfy the necessary and almost sufficient condition for consistent variable selection. We suggest a novel algorithm and give an important result: if predictors are normalised after the introduction of adaptive weights, the performance of the adaptive lasso becomes identical to that of the lasso.
A Gene Selection Algorithm using Bayesian Classification Approach
Alok Sharma; Kuldip K. Paliwal
2012-01-01
In this study, we propose a new feature (or gene) selection algorithm using a Bayes classification approach. The algorithm can find the gene subset crucial for the cancer classification problem. Problem statement: Gene identification plays an important role in the human cancer classification problem. Several feature selection algorithms have been proposed for analyzing and understanding influential genes using gene expression profiles. Approach: The feature selection algorithms aim to explore genes that are c...
Selecting AGN through variability in SN datasets
Boutsia, K.; Leibundgut, B.; Trevese, D.; Vagnetti, F.
2010-01-01
Variability is a main property of active galactic nuclei (AGN) and it was adopted as a selection criterion using multi epoch surveys conducted for the detection of supernovae (SNe). We have used two SN datasets. First we selected the AXAF field of the STRESS project, centered in the Chandra Deep Field South where, besides the deep X-ray surveys also various optical catalogs exist. Our method yielded 132 variable AGN candidates. We then extended our method including the dataset of the ESSENCE ...
Stochastic search variable selection for identifying multiple quantitative trait loci.
Yi, Nengjun; George, Varghese; Allison, David B
2003-07-01
In this article, we utilize stochastic search variable selection methodology to develop a Bayesian method for identifying multiple quantitative trait loci (QTL) for complex traits in experimental designs. The proposed procedure entails embedding multiple regression in a hierarchical normal mixture model, where latent indicators for all markers are used to identify the multiple markers. The markers with significant effects can be identified as those with higher posterior probability of inclusion in the model. A simple and easy-to-use Gibbs sampler is employed to generate samples from the joint posterior distribution of all unknowns including the latent indicators, genetic effects for all markers, and other model parameters. The proposed method was evaluated using simulated data and illustrated using a real data set. The results demonstrate that the proposed method works well under typical situations of most QTL studies in terms of number of markers and marker density. PMID:12871920
Variable selection: Current practice in epidemiological studies
S. Walter (Stefan); H.W. Tiemeier (Henning)
2009-01-01
Selection of covariates is among the most controversial and difficult tasks in epidemiologic analysis. Correct variable selection addresses the problem of confounding in etiologic research and allows unbiased estimation of probabilities in prognostic studies. The aim of this commentary i
Purposeful selection of variables in logistic regression
Directory of Open Access Journals (Sweden)
Williams David Keith
2008-12-01
Background: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods: In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results: We show that the advantage of this approach arises when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion: If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones, they should consider this macro as an alternative tool.
Variable Selection in Logistic Regression Model
Institute of Scientific and Technical Information of China (English)
ZHANG Shangli; ZHANG Lili; QIU Kuanmin; LU Ying; CAI Baigen
2015-01-01
Variable selection is one of the most important problems in pattern recognition. In linear regression models, there are many methods that can solve this problem, such as the least absolute shrinkage and selection operator (LASSO) and many improved LASSO methods, but there are few variable selection methods for generalized linear models. We study the variable selection problem in the logistic regression model. We propose a new variable selection method, the logistic elastic net, and prove that it has a grouping effect, which means that strongly correlated predictors tend to be in or out of the model together. The logistic elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the LASSO is not a very satisfactory variable selection method when p is much larger than n. The advantage and effectiveness of this method are demonstrated by real leukemia data and a simulation study.
Directory of Open Access Journals (Sweden)
Brentani Helena
2004-08-01
Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based on-line tool or as R language scripts at the supplemental web-site.
Emery, A. F.; Valenti, E.; Bardot, D.
2007-01-01
Parameter estimation is generally based upon the maximum likelihood approach and often involves regularization. Typically it is desired that the results be unbiased and of minimum variance. However, it is often better to accept biased estimates that have minimum mean square error. Bayesian inference is an attractive approach that achieves this goal and incorporates regularization automatically. More importantly, it permits us to analyse experiments in which both the system response and the independent variables (time, sensor position, experimental conditions, etc) are corrupted by noise and in which the model includes nuisance variables. This paper describes the use of Bayesian inference for an apparently simple experiment which is, in fact, fundamentally difficult and is compounded by a nuisance variable. By presenting this analysis we hope that members of the inverse community will see the value of applying Bayesian inference.
Within-subject consistency and between-subject variability in Bayesian reasoning strategies.
Cohen, Andrew L; Staub, Adrian
2015-09-01
It is well known that people tend to perform poorly when asked to determine a posterior probability on the basis of a base rate, true positive rate, and false positive rate. The present experiments assessed the extent to which individual participants nevertheless adopt consistent strategies in these Bayesian reasoning problems, and investigated the nature of these strategies. In two experiments, one laboratory-based and one internet-based, each participant completed 36 problems with factorially manipulated probabilities. Many participants applied consistent strategies involving use of only one of the three probabilities provided in the problem, or additive combination of two of the probabilities. There was, however, substantial variability across participants in which probabilities were taken into account. In the laboratory experiment, participants' eye movements were tracked as they read the problems. There was evidence of a relationship between information use and attention to a source of information. Participants' self-assessments of their performance, however, revealed little confidence that the strategies they applied were actually correct. These results suggest that the hypothesis of base rate neglect actually underestimates people's difficulty with Bayesian reasoning, but also suggest that participants are aware of their ignorance. PMID:26354671
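For reference, the normative answer to the kind of problem posed to participants follows directly from Bayes' theorem. The short sketch below shows the calculation; the function name and the numerical values are hypothetical and are not taken from the experiments.

```python
def posterior_probability(base_rate, true_positive_rate, false_positive_rate):
    """Return P(hypothesis | positive evidence) via Bayes' theorem."""
    hit = base_rate * true_positive_rate
    false_alarm = (1.0 - base_rate) * false_positive_rate
    return hit / (hit + false_alarm)


# Hypothetical problem: 1% base rate, 80% true positive rate,
# 9.6% false positive rate.
p = posterior_probability(0.01, 0.80, 0.096)
print(f"posterior = {p:.3f}")
```

With these illustrative numbers the posterior is below 0.08, far from any of the three input probabilities, which is why strategies that report a single input or add two of them can miss the correct answer by a wide margin.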
Barroso, L M A; Teodoro, P E; Nascimento, M; Torres, F E; Dos Santos, A; Corrêa, A M; Sagrilo, E; Corrêa, C C G; Silva, F A; Ceccon, G
2016-01-01
This study aimed to verify that a Bayesian approach could be used for the selection of upright cowpea genotypes with high adaptability and phenotypic stability, and the study also evaluated the efficiency of using informative and minimally informative a priori distributions. Six trials were conducted in randomized blocks, and the grain yield of 17 upright cowpea genotypes was assessed. To represent the minimally informative a priori distributions, a probability distribution with high variance was used, and a meta-analysis concept was adopted to represent the informative a priori distributions. Bayes factors were used to conduct comparisons between the a priori distributions. The Bayesian approach was effective for selection of upright cowpea genotypes with high adaptability and phenotypic stability using the Eberhart and Russell method. Bayes factors indicated that the use of informative a priori distributions provided more accurate results than minimally informative a priori distributions. PMID:26985961
Bayesian Model Selection and Prediction with Empirical Applications
Phillips, Peter C.B.
1992-01-01
This paper builds on some recent work by the author and Werner Ploberger (1991, 1994) on the development of "Bayes models" for time series and on the authors' model selection criterion "PIC." The PIC criterion is used in this paper to determine the lag order, the trend degree, and the presence or absence of a unit root in an autoregression with deterministic trend. A new forecast encompassing test for Bayes models is developed which allows one Bayes model to be compared with another on the ba...
QUASAR SELECTION BASED ON PHOTOMETRIC VARIABILITY
International Nuclear Information System (INIS)
We develop a method for separating quasars from other variable point sources using Sloan Digital Sky Survey (SDSS) Stripe 82 light-curve data for ∼ 10,000 variable objects. To statistically describe quasar variability, we use a damped random walk model parametrized by a damping timescale, τ, and an asymptotic amplitude (structure function), SF∞. With the aid of an SDSS spectroscopically confirmed quasar sample, we demonstrate that variability selection in typical extragalactic fields with low stellar density can deliver complete samples with reasonable purity (or efficiency, E). Compared to a selection method based solely on the slope of the structure function, the inclusion of the τ information boosts E from 60% to 75% while maintaining a highly complete sample (98%) even in the absence of color information. For a completeness of C = 90%, E is boosted from 80% to 85%. Conversely, C improves from 90% to 97% while maintaining E = 80% when imposing a lower limit on τ. With the aid of color selection, the purity can be further boosted to 96%, with C = 93%. Hence, selection methods based on variability will play an important role in the selection of quasars with data provided by upcoming large sky surveys, such as Pan-STARRS and the Large Synoptic Survey Telescope (LSST). For a typical (simulated) LSST cadence over 10 years and a photometric accuracy of 0.03 mag (achieved at i ∼ 22), C is expected to be 88% for a simple sample selection criterion of >100 days. In summary, given an adequate survey cadence, photometric variability provides an even better method than color selection for separating quasars from stars.
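As a rough illustration of the two parameters, the sketch below evaluates the structure function in the form commonly quoted for a damped random walk, SF(Δt) = SF∞ · sqrt(1 − exp(−|Δt|/τ)). This form is an assumption here, since the abstract does not state it, and the parameter values are hypothetical.

```python
import math


def drw_structure_function(dt, tau, sf_inf):
    """Structure function of a damped random walk at time lag dt.

    Assumed form: SF(dt) = sf_inf * sqrt(1 - exp(-|dt| / tau)), where tau is
    the damping timescale and sf_inf the asymptotic amplitude.
    """
    return sf_inf * math.sqrt(1.0 - math.exp(-abs(dt) / tau))


# Hypothetical parameters: tau in days, sf_inf in magnitudes.
for lag in (1.0, 10.0, 100.0, 1000.0, 10000.0):
    print(lag, drw_structure_function(lag, tau=200.0, sf_inf=0.2))
```

The curve rises with the lag and saturates at `sf_inf` once the lag is much longer than `tau`, which is why the two parameters carry separate information about a light curve.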
Variable and subset selection in PLS regression
DEFF Research Database (Denmark)
Høskuldsson, Agnar
2001-01-01
The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...... obtained by different methods. We also present an approach to orthogonal scatter correction. The procedures and comparisons are applied to industrial data. (C) 2001 Elsevier Science B.V. All rights reserved....
Heart rate variability estimation in photoplethysmography signals using Bayesian learning approach.
Alqaraawi, Ahmed; Alwosheel, Ahmad; Alasaad, Amr
2016-06-01
Heart rate variability (HRV) has become a marker for various health and disease conditions. Photoplethysmography (PPG) sensors integrated in wearable devices such as smart watches and phones are widely used to measure heart activities. HRV requires accurate estimation of the time interval between consecutive peaks in the PPG signal. However, the PPG signal is very sensitive to motion artefacts, which may lead to poor HRV estimation if false peaks are detected. In this Letter, the authors propose a probabilistic approach based on Bayesian learning to better estimate HRV from PPG signals recorded by wearable devices and enhance the performance of the automatic multi scale-based peak detection (AMPD) algorithm used for peak detection. The authors' experiments show that their approach enhances the performance of the AMPD algorithm in terms of a number of HRV-related metrics such as sensitivity, positive predictive value, and average temporal resolution. PMID:27382483
The selective bleed variable cycle engine
Nascimento, M. A. R.
1992-01-01
A new concept in aircraft propulsion is described in this work. In particular, a variable jet engine is investigated for supersonic ASTOVL aircraft. This engine is a Selective Bleed Variable Cycle, twin shaft turbofan. At low flight speeds the engine operates as a medium bypass turbofan. At supersonic cruise it operates as a low bypass turbofan without reheat. The performance of the engine and its components is analyzed using a novel matching procedure. Off-design engine performance characterist...
A new variable selection method for classification
Directory of Open Access Journals (Sweden)
Nuñez Letamendia,Laura
2007-01-01
Full Text Available This work proposes an “ad hoc” new method for variable selection in classification, specifically in Discriminant Analysis. This new method is based on the metaheuristic strategy Tabu Search. From a computational point of view variable selection is a NP-Hard problem and therefore there is no guarantee of finding the optimum solution (NP = Nondeterministic Polynomial Time. This means that when the size of the problem is large finding an optimum solution in practice is unfeasible. As found in other optimization problems, metaheuristic techniques have proved to be good at solving this type of problems. Although there are many references in the literature regarding selecting variables for their use in classification, there are very few key references on the selection of variables for their use in Discriminant Analysis. In fact, the most well-known statistical packages continue to use classic selection methods as Stepwise, Backward or Forward. After performing some tests it is found that Tabu Search obtains significantly better results than the Stepwise, Backward or Forward methods used by classic statistical packages.
Bayesian model selection applied to artificial neural networks used for water resources modeling
Kingston, Greer B.; Maier, Holger R.; Lambert, Martin F.
2008-04-01
Artificial neural networks (ANNs) have proven to be extremely valuable tools in the field of water resources engineering. However, one of the most difficult tasks in developing an ANN is determining the optimum level of complexity required to model a given problem, as there is no formal systematic model selection method. This paper presents a Bayesian model selection (BMS) method for ANNs that provides an objective approach for comparing models of varying complexity in order to select the most appropriate ANN structure. The approach uses Markov Chain Monte Carlo posterior simulations to estimate the evidence in favor of competing models and, in this study, three known methods for doing this are compared in terms of their suitability for being incorporated into the proposed BMS framework for ANNs. However, it is acknowledged that it can be particularly difficult to accurately estimate the evidence of ANN models. Therefore, the proposed BMS approach for ANNs incorporates a further check of the evidence results by inspecting the marginal posterior distributions of the hidden-to-output layer weights, which unambiguously indicate any redundancies in the hidden layer nodes. The fact that this check is available is one of the greatest advantages of the proposed approach over conventional model selection methods, which do not provide such a test and instead rely on the modeler's subjective choice of selection criterion. The advantages of a total Bayesian approach to ANN development, including training and model selection, are demonstrated on two synthetic and one real world water resources case study.
Schöniger, Anneli; Wöhling, Thomas; Samaniego, Luis; Nowak, Wolfgang
2014-12-01
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. The computation of this integral is highly challenging because it is as high-dimensional as the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes unfeasible for expensive models. (3) Approximations known as information criteria (ICs) such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively) yield contradicting results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where for some scenarios an exact analytical solution exists. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is a first-time benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible.
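The comparison described in this abstract can be mimicked on a toy problem where the evidence is known exactly. The sketch below is not the study's benchmark; it assumes a hypothetical one-parameter Gaussian model with known noise level `sigma` and a Gaussian prior of width `tau`, and compares a brute-force Monte Carlo estimate of the log evidence (prior samples, averaged likelihood) with the analytical value and with a BIC-style approximation.

```python
import math
import random


def log_likelihood(theta, y, sigma):
    """Gaussian log-likelihood of data y for mean theta and known sigma."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * sigma**2) + (v - theta) ** 2 / sigma**2)
        for v in y
    )


def log_evidence_monte_carlo(y, sigma, tau, n_samples=20000, seed=0):
    """Brute-force estimate: average the likelihood over draws from the prior."""
    rng = random.Random(seed)
    logs = [log_likelihood(rng.gauss(0.0, tau), y, sigma) for _ in range(n_samples)]
    peak = max(logs)  # log-sum-exp trick to avoid numerical underflow
    return peak + math.log(sum(math.exp(v - peak) for v in logs) / n_samples)


def log_evidence_exact(y, sigma, tau):
    """Closed-form log evidence for the conjugate prior theta ~ N(0, tau^2)."""
    n = len(y)
    s = sum(y)
    ss = sum(v * v for v in y)
    return (
        -0.5 * n * math.log(2.0 * math.pi * sigma**2)
        - 0.5 * math.log(1.0 + n * tau**2 / sigma**2)
        - 0.5 * (ss - tau**2 * s**2 / (sigma**2 + n * tau**2)) / sigma**2
    )


def log_evidence_bic(y, sigma):
    """BIC-style approximation: max log-likelihood minus (k / 2) * log(n), k = 1."""
    n = len(y)
    theta_hat = sum(y) / n
    return log_likelihood(theta_hat, y, sigma) - 0.5 * math.log(n)


# Hypothetical data, chosen only for illustration.
data = [0.9, 1.4, 0.6, 1.1, 1.3]
print("exact      ", log_evidence_exact(data, sigma=1.0, tau=2.0))
print("Monte Carlo", log_evidence_monte_carlo(data, sigma=1.0, tau=2.0))
print("BIC-style  ", log_evidence_bic(data, sigma=1.0))
```

The Monte Carlo estimate depends on the prior through the sampling step, whereas the BIC-style value ignores the prior altogether, so the two need not agree. Sampling from the prior also becomes inefficient as the number of parameters grows, which is the computational burden the abstract refers to.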
Variable Selection for Latent Dirichlet Allocation
Kim, Dongwoo; Oh, Alice
2012-01-01
In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant in forming the topics. We adopt a variable selection method widely used in statistical modeling as a dimension reduction tool and combine it with LDA. In this variable selection model for LDA (vsLDA), topics are multinomial distributions over a subset of the vocabulary, and by excluding words that are not informative for finding the latent topic structure of the corpus, vsLDA finds topics that are more robust and discriminative. We compare three models, vsLDA, LDA with symmetric priors, and LDA with asymmetric priors, on heldout likelihood, MCMC chain consistency, and document classification. The performance of vsLDA is better than symmetric LDA for likelihood and classification, better than asymmetric LDA for consistency and classification, and about the same in the other comparisons.
Coping with Trial-to-Trial Variability of Event Related Signals: A Bayesian Inference Approach
Ding, Mingzhou; Chen, Youghong; Knuth, Kevin H.; Bressler, Steven L.; Schroeder, Charles E.
2005-01-01
In electro-neurophysiology, single-trial brain responses to a sensory stimulus or a motor act are commonly assumed to result from the linear superposition of a stereotypic event-related signal (e.g. the event-related potential or ERP) that is invariant across trials and some ongoing brain activity often referred to as noise. To extract the signal, one performs an ensemble average of the brain responses over many identical trials to attenuate the noise. To date, this simple signal-plus-noise (SPN) model has been the dominant approach in cognitive neuroscience. Mounting empirical evidence has shown that the assumptions underlying this model may be overly simplistic. More realistic models have been proposed that account for the trial-to-trial variability of the event-related signal as well as the possibility of multiple differentially varying components within a given ERP waveform. The variable-signal-plus-noise (VSPN) model, which has been demonstrated to provide the foundation for separation and characterization of multiple differentially varying components, has the potential to provide a rich source of information for questions related to neural functions that complement the SPN model. Thus, being able to estimate the amplitude and latency of each ERP component on a trial-by-trial basis provides a critical link between the perceived benefits of the VSPN model and its many concrete applications. In this paper we describe a Bayesian approach to deal with this issue and the resulting strategy is referred to as the differentially Variable Component Analysis (dVCA). We compare the performance of dVCA on simulated data with Independent Component Analysis (ICA) and analyze neurobiological recordings from monkeys performing cognitive tasks.
International Nuclear Information System (INIS)
In making low-level radioactivity measurements of populations, it is commonly observed that a substantial portion of net results are negative. Furthermore, the observed variance of the measurement results arises from a combination of measurement uncertainty and population variability. This paper presents a method for disaggregating measurement uncertainty from population variability to produce a probability density function (PDF) of possibly true results. To do this, simple, justifiable, and reasonable assumptions are made about the relationship of the measurements to the measurands (the 'true values'). The measurements are assumed to be unbiased, that is, that their average value is the average of the measurands. Using traditional estimates of each measurement's uncertainty to disaggregate population variability from measurement uncertainty, a PDF of measurands for the population is produced. Then, using Bayes's theorem, the same assumptions, and all the data from the population of individuals, a prior PDF is computed for each individual's measurand. These PDFs are non-negative, and their average is equal to the average of the measurement results for the population. The uncertainty in these Bayesian posterior PDFs is all Berkson with no remaining classical component. The methods are applied to baseline bioassay data from the Hanford site. The data include 90Sr urinalysis measurements on 128 people, 137Cs in vivo measurements on 5,337 people, and 239Pu urinalysis measurements on 3,270 people. The method produces excellent results for the 90Sr and 137Cs measurements, since there are nonzero concentrations of these global fallout radionuclides in people who have not been occupationally exposed. The method does not work for the 239Pu measurements in non-occupationally exposed people because the population average is essentially zero.
Seidou, O.; Asselin, J. J.; Ouarda, T. B. M. J.
2007-08-01
Multivariate linear regression is one of the most popular modeling tools in hydrology and climate sciences for explaining the link between key variables. Piecewise linear regression is not always appropriate since the relationship may experience sudden changes due to climatic, environmental, or anthropogenic perturbations. To address this issue, a practical and general approach to the Bayesian analysis of the multivariate regression model is presented. The approach allows simultaneous single change point detection in a multivariate sample and can account for missing data in the response variables and/or in the explicative variables. It also improves on recently published change point detection methodologies by allowing a more flexible and thus more realistic prior specification for the existence of a change and the date of change as well as for the regression parameters. The estimation of all unknown parameters is achieved by Monte Carlo Markov chain simulations. It is shown that the developed approach is able to reproduce the results of Rasmussen (2001) as well as those of Perreault et al. (2000a, 2000b). Furthermore, two of the examples provided in the paper show that the proposed methodology can readily be applied to some problems that cannot be addressed by any of the above-mentioned approaches because of limiting model structure and/or restrictive prior assumptions. The first of these examples deals with single change point detection in the multivariate linear relationship between mean basin-scale precipitation at different periods of the year and the summer-autumn flood peaks of the Broadback River located in northern Quebec, Canada. The second one addresses the problem of missing data estimation with uncertainty assessment in multisite streamflow records with a possible simultaneous shift in mean streamflow values that occurred at an unknown date.
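The idea of a posterior distribution over the date of a single change can be sketched in a much simpler setting than the paper's multivariate regression. The example below assumes a univariate series with a shift in mean, known noise level `sigma`, a conjugate N(0, tau^2) prior on each segment mean, a uniform prior over change dates, and that a change is known to exist; none of these choices, nor the data or function names, come from the paper.

```python
import math

def segment_log_evidence(seg, sigma, tau):
    """Log marginal likelihood of one segment, mean integrated over N(0, tau^2)."""
    n, s1, s2 = len(seg), sum(seg), sum(v * v for v in seg)
    quad = s2 - tau ** 2 * s1 ** 2 / (sigma ** 2 + n * tau ** 2)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - 0.5 * math.log(1 + n * tau ** 2 / sigma ** 2)
            - quad / (2 * sigma ** 2))

def change_point_posterior(y, sigma=1.0, tau=10.0):
    """Posterior over k, the number of observations before the change."""
    ks = list(range(1, len(y)))
    logs = [segment_log_evidence(y[:k], sigma, tau)
            + segment_log_evidence(y[k:], sigma, tau) for k in ks]
    top = max(logs)  # shift before exponentiating to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}

# Illustrative series: mean near 0 for ten values, then near 3 for ten values.
series = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1,
          3.1, 2.8, 3.2, 3.0, 2.9, 3.3, 2.7, 3.1, 3.0, 2.9]
posterior = change_point_posterior(series)
best_k = max(posterior, key=posterior.get)
print(best_k, round(posterior[best_k], 3))
```

Because the segment means are integrated out analytically here, no Markov chain simulation is needed; the paper's more general model does require it.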
Variable Selection in Model-based Clustering: A General Variable Role Modeling
Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure
2008-01-01
The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...
Bayesian model selection without evidences: application to the dark energy equation-of-state
Hee, Sonke; Hobson, Mike P; Lasenby, Anthony N
2015-01-01
A method is presented for Bayesian model selection without explicitly computing evidences, by using a combined likelihood and introducing an integer model selection parameter $n$ so that Bayes factors, or more generally posterior odds ratios, may be read off directly from the posterior of $n$. If the total number of models under consideration is specified a priori, the full joint parameter space $(\theta, n)$ of the models is of fixed dimensionality and can be explored using standard MCMC or nested sampling methods, without the need for reversible jump MCMC techniques. The posterior on $n$ is then obtained by straightforward marginalisation. We demonstrate the efficacy of our approach by application to several toy models. We then apply it to constraining the dark energy equation-of-state using a free-form reconstruction technique. We show that $\Lambda$CDM is significantly favoured over all extensions, including the simple $w(z){=}{\rm constant}$ model.
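A toy version of sampling an integer model index alongside the parameters may help make the idea concrete. The sketch below is not the authors' implementation: it assumes two models for unit-variance Gaussian data (model 1 fixes the mean at zero, model 2 gives it a N(0, tau^2) prior), equal prior model probabilities, and a simple Gibbs sampler over the fixed-dimensional space (mu, n). Data and function names are illustrative.

```python
import math
import random

def exact_prob_model2(y, sigma, tau):
    """Analytic posterior probability of model 2 under equal model priors."""
    n, s1 = len(y), sum(y)
    log_bf = (-0.5 * math.log(1 + n * tau ** 2 / sigma ** 2)
              + tau ** 2 * s1 ** 2 / (2 * sigma ** 2 * (sigma ** 2 + n * tau ** 2)))
    return 1.0 / (1.0 + math.exp(-log_bf))

def sample_model_index(y, sigma, tau, steps=20000, seed=1):
    """Gibbs sampler over (mu, n); returns the fraction of steps with n == 2."""
    rng = random.Random(seed)
    n_obs, s1 = len(y), sum(y)
    post_var = 1.0 / (n_obs / sigma ** 2 + 1.0 / tau ** 2)
    post_mean = post_var * s1 / sigma ** 2
    mu, hits = 0.0, 0
    for _ in range(steps):
        # n | mu: log-likelihood ratio of model 2 (mean mu) to model 1 (mean 0)
        d = (mu * s1 - 0.5 * n_obs * mu ** 2) / sigma ** 2
        p2 = 1.0 / (1.0 + math.exp(min(-d, 700.0)))
        n = 2 if rng.random() < p2 else 1
        # mu | n: conjugate posterior under model 2; under model 1 the
        # likelihood ignores mu, so mu is redrawn from its prior
        if n == 2:
            mu = rng.gauss(post_mean, math.sqrt(post_var))
            hits += 1
        else:
            mu = rng.gauss(0.0, tau)
    return hits / steps

data = [0.2, 1.5, 0.9, -0.3, 1.1, 0.6, 1.8, 0.1, 0.7, 0.4]  # illustrative
estimate = sample_model_index(data, sigma=1.0, tau=1.0)
exact = exact_prob_model2(data, sigma=1.0, tau=1.0)
print(f"sampled P(n=2) {estimate:.3f}  exact {exact:.3f}")
```

The posterior odds are read off as `estimate / (1 - estimate)`, with no evidence integral computed by the sampler itself.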
Directory of Open Access Journals (Sweden)
Markus Krauss
Interindividual variability in anatomical and physiological properties results in significant differences in drug pharmacokinetics. The consideration of such pharmacokinetic variability supports optimal drug efficacy and safety for each single individual, e.g. by identification of individual-specific dosings. One clear objective in clinical drug development is therefore a thorough characterization of the physiological sources of interindividual variability. In this work, we present a Bayesian population physiologically-based pharmacokinetic (PBPK) approach for the mechanistically and physiologically realistic identification of interindividual variability. The consideration of a generic and highly detailed mechanistic PBPK model structure enables the integration of large amounts of prior physiological knowledge, which is then updated with new experimental data in a Bayesian framework. A covariate model integrates known relationships of physiological parameters to age, gender and body height. We further provide a framework for estimation of the a posteriori parameter dependency structure at the population level. The approach is demonstrated considering a cohort of healthy individuals and theophylline as an application example. The variability and co-variability of physiological parameters are specified within the population; respectively. Significant correlations are identified between population parameters and are applied for individual- and population-specific visual predictive checks of the pharmacokinetic behavior, which leads to improved results compared to present population approaches. In the future, the integration of a generic PBPK model into a hierarchical approach allows for extrapolations to other populations or drugs, while the Bayesian paradigm allows for an iterative application of the approach and thereby a continuous updating of physiological knowledge with new data. This will facilitate decision making e.g. from preclinical to
DEFF Research Database (Denmark)
Jensen, Finn Verner; Nielsen, Thomas Dyhre
2016-01-01
Mathematically, a Bayesian graphical model is a compact representation of the joint probability distribution for a set of variables. The most frequently used type of Bayesian graphical models are Bayesian networks. The structural part of a Bayesian graphical model is a graph consisting of nodes and...... largely due to the availability of efficient inference algorithms for answering probabilistic queries about the states of the variables in the network. Furthermore, to support the construction of Bayesian network models, learning algorithms are also available. We give an overview of the Bayesian network...
A Bayesian Network Approach for Offshore Risk Analysis Through Linguistic Variables
Institute of Scientific and Technical Information of China (English)
None
2007-01-01
This paper presents a new approach for offshore risk analysis that is capable of dealing with linguistic probabilities in Bayesian networks (BNs). In this paper, linguistic probabilities are used to describe occurrence likelihood of hazardous events that may cause possible accidents in offshore operations. In order to use fuzzy information, an f-weighted valuation function is proposed to transform linguistic judgements into crisp probability distributions which can be easily put into a BN to model causal relationships among risk factors. The use of linguistic variables makes it easier for human experts to express their knowledge, and the transformation of linguistic judgements into crisp probabilities can significantly reduce the cost of computing, modifying and maintaining a BN model. The flexibility of the method allows for multiple forms of information to be used to quantify model relationships, including formally assessed expert opinion when quantitative data are lacking, or when only qualitative or vague statements can be made. The model is a modular representation of uncertain knowledge arising from randomness, vagueness and ignorance. This makes the risk analysis of offshore engineering systems more functional and easier in many assessment contexts. Specifically, the proposed f-weighted valuation function takes into account not only the dominating values, but also the α-level values that are ignored by conventional valuation methods. A case study of the collision risk between a Floating Production, Storage and Off-loading (FPSO) unit and the authorised vessels due to human elements during operation is used to illustrate the application of the proposed model.
Gamma prior distribution selection for Bayesian analysis of failure rate and reliability
International Nuclear Information System (INIS)
It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure rate lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this report is to present a methodology that can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate lambda simultaneously satisfies the probability statements, P(lambda less than 1.0 x 10^-3) equals 0.50 and P(lambda less than 1.0 x 10^-5) equals 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure rate percentiles illustrated above, it is possible to use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t_0) less than 0.99) equals 0.50 and P(R(t_0) less than 0.99999) equals 0.95, for some operating time t_0. The report also includes graphs for selected percentiles which assist an engineer in applying the procedure. 28 figures, 16 tables.
Gamma prior distribution selection for Bayesian analysis of failure rate and reliability
Energy Technology Data Exchange (ETDEWEB)
Waller, R.A.; Johnson, M.M.; Waterman, M.S.; Martz, H.F. Jr.
1976-07-01
It is assumed that the phenomenon under study is such that the time-to-failure may be modeled by an exponential distribution with failure rate lambda. For Bayesian analyses of the assumed model, the family of gamma distributions provides conjugate prior models for lambda. Thus, an experimenter needs to select a particular gamma model to conduct a Bayesian reliability analysis. The purpose of this report is to present a methodology that can be used to translate engineering information, experience, and judgment into a choice of a gamma prior distribution. The proposed methodology assumes that the practicing engineer can provide percentile data relating to either the failure rate or the reliability of the phenomenon being investigated. For example, the methodology will select the gamma prior distribution which conveys an engineer's belief that the failure rate lambda simultaneously satisfies the probability statements, P(lambda less than 1.0 x 10^-3) equals 0.50 and P(lambda less than 1.0 x 10^-5) equals 0.05. That is, two percentiles provided by an engineer are used to determine a gamma prior model which agrees with the specified percentiles. For those engineers who prefer to specify reliability percentiles rather than the failure rate percentiles illustrated above, it is possible to use the induced negative-log gamma prior distribution which satisfies the probability statements, P(R(t_0) less than 0.99) equals 0.50 and P(R(t_0) less than 0.99999) equals 0.95, for some operating time t_0. The report also includes graphs for selected percentiles which assist an engineer in applying the procedure. 28 figures, 16 tables.
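Matching a gamma prior to two stated percentiles can be sketched numerically. The code below is an illustration only and is not the report's procedure (which relies on graphs): it assumes a shape/rate parameterisation, evaluates the gamma CDF with a power series, and uses nested bisection. The function names, the search bounds on the shape, and the fixed median condition are assumptions.

```python
import math

def gamma_cdf(x, shape, rate):
    """Regularized lower incomplete gamma P(shape, rate * x) via its power series."""
    z = rate * x
    if z <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))

def unit_quantile(p, shape):
    """Quantile of a unit-rate gamma, found by bisection on log(z)."""
    lo, hi = math.log(1e-300), math.log(200.0)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(math.exp(mid), shape, 1.0) < p:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def fit_gamma_prior(x_median, x_low, p_low=0.05, lo=0.05, hi=20.0):
    """Find (shape, rate) with P(lambda < x_median) = 0.5 and P(lambda < x_low) = p_low."""
    for _ in range(100):
        shape = 0.5 * (lo + hi)
        rate = unit_quantile(0.5, shape) / x_median  # pins the median exactly
        if gamma_cdf(x_low, shape, rate) > p_low:
            lo = shape  # too much mass below x_low: a larger shape is needed
        else:
            hi = shape
    shape = 0.5 * (lo + hi)
    return shape, unit_quantile(0.5, shape) / x_median

# The percentile statements quoted in the abstract above.
shape, rate = fit_gamma_prior(x_median=1.0e-3, x_low=1.0e-5)
print(shape, rate)
print(gamma_cdf(1.0e-3, shape, rate), gamma_cdf(1.0e-5, shape, rate))
```

The series used in `gamma_cdf` is adequate for the small arguments that arise here; a production implementation would more likely call a library routine for the incomplete gamma function.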
Maximum Likelihood Bayesian Averaging of Spatial Variability Models in Unsaturated Fractured Tuff
International Nuclear Information System (INIS)
Hydrologic analyses typically rely on a single conceptual-mathematical model. Yet hydrologic environments are open and complex, rendering them prone to multiple interpretations and mathematical descriptions. Adopting only one of these may lead to statistical bias and underestimation of uncertainty. Bayesian Model Averaging (BMA) provides an optimal way to combine the predictions of several competing models and to assess their joint predictive uncertainty. However, it tends to be computationally demanding and relies heavily on prior information about model parameters. We apply a maximum likelihood (ML) version of BMA (MLBMA) to seven alternative variogram models of log air permeability data from single-hole pneumatic injection tests in six boreholes at the Apache Leap Research Site (ALRS) in central Arizona. Unbiased ML estimates of variogram and drift parameters are obtained using Adjoint State Maximum Likelihood Cross Validation in conjunction with Universal Kriging and Generalized Least Squares. Standard information criteria provide an ambiguous ranking of the models, which does not justify selecting one of them and discarding all others as is commonly done in practice. Instead, we eliminate some of the models based on their negligibly small posterior probabilities and use the rest to project the measured log permeabilities by kriging onto a rock volume containing the six boreholes. We then average these four projections, and associated kriging variances, using the posterior probability of each model as weight. Finally, we cross-validate the results by eliminating from consideration all data from one borehole at a time, repeating the above process, and comparing the predictive capability of MLBMA with that of each individual model. We find that MLBMA is superior to any individual geostatistical model of log permeability among those we consider at the ALRS.
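The weighting and averaging step described above can be written out in a few lines. This is a generic sketch, assuming posterior model probabilities proportional to the prior times exp(-IC/2) for some information criterion, and the usual within-model plus between-model variance decomposition; the criterion values, predictions, and function names below are made up for illustration and are not taken from the study.

```python
import math

def posterior_model_weights(ic_values, priors=None):
    """Weights proportional to prior * exp(-0.5 * (IC - min IC))."""
    k = len(ic_values)
    priors = priors or [1.0 / k] * k
    best = min(ic_values)  # subtract the minimum to avoid underflow
    raw = [p * math.exp(-0.5 * (ic - best)) for ic, p in zip(ic_values, priors)]
    total = sum(raw)
    return [r / total for r in raw]

def average_predictions(means, variances, weights):
    """Model-averaged mean and variance (within-model plus between-model)."""
    mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, within + between

# Hypothetical criterion values and kriged predictions at one location.
ic = [100.0, 102.0, 120.0]
weights = posterior_model_weights(ic)
avg_mean, avg_var = average_predictions(
    means=[-14.0, -13.5, -12.0], variances=[0.4, 0.5, 0.6], weights=weights)
print([round(w, 4) for w in weights], round(avg_mean, 3), round(avg_var, 3))
```

A model whose criterion value is far above the minimum receives a negligible weight, which is the basis for dropping such models before averaging.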
Amene, E; Hanson, L A; Zahn, E A; Wild, S R; Döpfer, D
2016-07-01
The purpose of this study was to apply a novel statistical method for variable selection and a model-based approach for filling data gaps in mortality rates associated with foodborne diseases using the WHO Vital Registration mortality dataset. Correlation analysis and elastic net regularization methods were applied to drop redundant variables and to select the most meaningful subset of predictors. Whenever predictor data were missing, multiple imputation was used to fill in plausible values. Cluster analysis was applied to identify similar groups of countries based on the values of the predictors. Finally, a Bayesian hierarchical regression model was fit to the final dataset for predicting mortality rates. From 113 potential predictors, 32 were retained after correlation analysis. Out of these 32 predictors, eight with non-zero coefficients were selected using the elastic net regularization method. Based on the values of these variables, four clusters of countries were identified. The uncertainty of predictions was large for countries within clusters lacking mortality rates, and it was low for a cluster that had mortality rate information. Our results demonstrated that, using Bayesian hierarchical regression models, a data-driven clustering of countries and a meaningful subset of predictors can be used to fill data gaps in foodborne disease mortality. PMID:26785774
Schöniger, A.; Nowak, W.; Wöhling, T.
2013-12-01
Bayesian model averaging (BMA) combines the predictive capabilities of alternative conceptual models into a robust best estimate and allows the quantification of conceptual uncertainty. The individual models are weighted with their posterior probability according to Bayes' theorem. Despite this rigorous procedure, we see four obstacles to robust model ranking: (1) The weights inherit uncertainty related to measurement noise in the calibration data set, which may compromise the reliability of model ranking. (2) Posterior weights rank the models only relative to each other, but do not contain information about the absolute model performance. (3) There is a lack of objective methods to assess whether the suggested models are practically distinguishable or very similar to each other, i.e., whether the individual models explore different regions of the model space. (4) No theory for optimal design (OD) of experiments exists that explicitly aims at maximum-confidence model discrimination. The goal of our study is to overcome these four shortcomings. We determine the robustness of weights against measurement noise (1) by repeatedly perturbing the observed data with random measurement errors and analyzing the variability in the obtained weights. Realizing that model weights have a probability distribution of their own, we introduce an additional term into the overall prediction uncertainty analysis scheme which we call 'weighting uncertainty'. We further assess an 'absolute distance' in performance of the model set from the truth (2) as seen through the eyes of the data by interpreting statistics of Bayesian model evidence. This analysis is of great value for modellers to decide, if the modelling task can be satisfactorily carried out with the model(s) at hand, or if more effort should be invested in extending the set with better performing models. As a further prerequisite for robust model selection, we scrutinize the ability of BMA to distinguish between the models in
Bayesian Nonparametric Graph Clustering
Banerjee, Sayantan; Akbani, Rehan; Baladandayuthapani, Veerabhadran
2015-01-01
We present clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables. As opposed to standard approaches that assume known graph structures, we first estimate the edge structure of the unknown graph using Bayesian neighborhood selection approaches, wherein we account for the uncertainty of graphical structure learning through model-averaged estimates of the suitable parameters. Subsequently, we develop a nonparametric graph cluster...
Variables influencing victim selection in genocide.
Komar, Debra A
2008-01-01
While victims of racially motivated violence may be identified through observation of morphological features, those targeted because of their ethnic, religious, or national identity are not easily recognized. This study examines how perpetrators of genocide recognize their victims. Court documents, including indictments, witness statements, and testimony from the International Criminal Tribunals for Rwanda and the former Yugoslavia (FY) detail the interactions between victim and assailant. A total of 6012 decedents were included in the study; only 20.8% had been positively identified. Variables influencing victim selection in Rwanda included location, segregation, incitement, and prior relationship, while significant factors in FY were segregation, location, age/gender, and social data. Additional contributing factors in both countries included self-identification, victim behavior, linguistic or clothing evidence, and morphological features. Understanding the system of recognition used by perpetrators aids investigators tasked with establishing victim identity in such prosecutions. PMID:18005010
Selection of Trusted Service Providers by Enforcing Bayesian Analysis in iVCE
Institute of Scientific and Technical Information of China (English)
GU Bao-jun; LI Xiao-yong; WANG Wei-nong
2008-01-01
The initiative of internet-based virtual computing environment (iVCE) aims to provide the end users and applications with a harmonious, trustworthy and transparent integrated computing environment which will facilitate sharing and collaborating of network resources between applications. Trust management is an elementary component for iVCE. The uncertain and dynamic characteristics of iVCE necessitate the requirement for the trust management to be subjective, historical evidence based and context dependent. This paper presents a Bayesian analysis-based trust model, which aims to secure the active agents for selecting appropriate trusted services in iVCE. Simulations are made to analyze the properties of the trust model which show that the subjective prior information strongly influences trust evaluation and the model stimulates positive interactions.
Directory of Open Access Journals (Sweden)
Kok-Chin Khor
2009-01-01
Problem statement: Implementing a single or multiple classifiers that involve a Bayesian Network (BN) is a rising research interest in network intrusion detection domain. Approach: However, little attention has been given to evaluate the performance of BN classifiers before they could be implemented in a real system. In this research, we proposed a novel approach to select important features by utilizing two selected feature selection algorithms utilizing filter approach. Results: The selected features were further validated by domain experts where extra features were added into the final proposed feature set. We then constructed three types of BN namely, Naive Bayes Classifiers (NBC), Learned BN and Expert-elicited BN by utilizing a standard network intrusion dataset. The performance of each classifier was recorded. We found that there was no difference in overall performance of the BNs and therefore, concluded that the BNs performed equivalently well in detecting network attacks. Conclusion/Recommendations: The results of the study indicated that the BN built using the proposed feature set has fewer features but the performance was comparable to BNs built using other feature sets generated by the two algorithms.
Bayesian model selection validates a biokinetic model for zirconium processing in humans
Directory of Open Access Journals (Sweden)
Schmidl Daniel
2012-08-01
Abstract Background In radiation protection, biokinetic models for zirconium processing are of crucial importance in dose estimation and further risk analysis for humans exposed to this radioactive substance. They provide limiting values of detrimental effects and build the basis for applications in internal dosimetry, the prediction for radioactive zirconium retention in various organs as well as retrospective dosimetry. Multi-compartmental models are the tool of choice for simulating the processing of zirconium. Although easily interpretable, determining the exact compartment structure and interaction mechanisms is generally daunting. In the context of observing the dynamics of multiple compartments, Bayesian methods provide efficient tools for model inference and selection. Results We are the first to apply a Markov chain Monte Carlo approach to compute Bayes factors for the evaluation of two competing models for zirconium processing in the human body after ingestion. Based on in vivo measurements of human plasma and urine levels we were able to show that a recently published model is superior to the standard model of the International Commission on Radiological Protection. The Bayes factors were estimated by means of the numerically stable thermodynamic integration in combination with a recently developed copula-based Metropolis-Hastings sampler. Conclusions In contrast to the standard model the novel model predicts lower accretion of zirconium in bones. This results in lower levels of noxious doses for exposed individuals. Moreover, the Bayesian approach allows for retrospective dose assessment, including credible intervals for the initially ingested zirconium, in a significantly more reliable fashion than previously possible. All methods presented here are readily applicable to many modeling tasks in systems biology.
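Thermodynamic integration estimates the log evidence as the integral over an inverse temperature t in [0, 1] of the expected log-likelihood under the power posterior (likelihood raised to the power t, times the prior). The sketch below applies it to a toy Gaussian-mean model, not to the biokinetic models of the paper. It assumes unit noise variance, a N(0, tau^2) prior, a temperature ladder t_i = (i/K)^5, and exact Gaussian draws from each power posterior in place of the copula-based Metropolis-Hastings sampler used by the authors; all names and data are illustrative.

```python
import math
import random

def log_lik(mu, y):
    """Log-likelihood of unit-variance Gaussian data y with mean mu."""
    return -0.5 * len(y) * math.log(2 * math.pi) - 0.5 * sum((v - mu) ** 2 for v in y)

def log_evidence_exact(y, tau):
    """Closed-form log evidence for the prior mu ~ N(0, tau^2)."""
    n, s1, s2 = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * tau ** 2)
            - 0.5 * (s2 - tau ** 2 * s1 ** 2 / (1 + n * tau ** 2)))

def log_evidence_ti(y, tau, rungs=40, draws=2000, seed=2):
    """Trapezoidal thermodynamic integration over a power-law temperature ladder."""
    rng = random.Random(seed)
    n, s1 = len(y), sum(y)
    temps = [(i / rungs) ** 5 for i in range(rungs + 1)]
    expected = []
    for t in temps:
        # Power posterior is Gaussian here: precision t*n + 1/tau^2.
        var = 1.0 / (t * n + 1.0 / tau ** 2)
        mean = var * t * s1
        sd = math.sqrt(var)
        expected.append(sum(log_lik(rng.gauss(mean, sd), y)
                            for _ in range(draws)) / draws)
    return sum(0.5 * (expected[i] + expected[i + 1]) * (temps[i + 1] - temps[i])
               for i in range(rungs))

obs = [0.2, 1.5, 0.9, -0.3, 1.1, 0.6, 1.8, 0.1, 0.7, 0.4]  # illustrative
ti_value = log_evidence_ti(obs, tau=1.0)
exact_value = log_evidence_exact(obs, tau=1.0)
print(f"thermodynamic integration {ti_value:.3f}  exact {exact_value:.3f}")
```

A Bayes factor between two models is then the exponential of the difference of their two log-evidence estimates. The ladder concentrates rungs near t = 0, where the expected log-likelihood changes fastest.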
Variable selection in model-based discriminant analysis
Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure
2010-01-01
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three different roles for each possible predictor are considered: a variable can be a relevant classification predictor or not, and the irrelevant classification variables can be linearly dependent on a part of the relevant predictors or independent variables. This variable selection model was inspired by the model-based clustering mo...
International Nuclear Information System (INIS)
A general adaptive modeling algorithm for selection and validation of coarse-grained models of atomistic systems is presented. A Bayesian framework is developed to address uncertainties in parameters, data, and model selection. Algorithms for computing output sensitivities to parameter variances, model evidence and posterior model plausibilities for given data, and for computing what are referred to as Occam Categories in reference to a rough measure of model simplicity, make up components of the overall approach. Computational results are provided for representative applications.
Comparison of Two Gas Selection Methodologies: An Application of Bayesian Model Averaging
Energy Technology Data Exchange (ETDEWEB)
Renholds, Andrea S.; Thompson, Sandra E.; Anderson, Kevin K.; Chilton, Lawrence K.
2006-03-31
One goal of hyperspectral imagery analysis is the detection and characterization of plumes. Characterization includes identifying the gases in the plumes, which is a model selection problem. Two gas selection methods compared in this report are Bayesian model averaging (BMA) and minimum Akaike information criterion (AIC) stepwise regression (SR). Simulated spectral data from a three-layer radiance transfer model were used to compare the two methods. Test gases were chosen to span the types of spectra observed, which exhibit peaks ranging from broad to sharp. The size and complexity of the search libraries were varied. Background materials were chosen to either replicate a remote area of eastern Washington or feature many common background materials. For many cases, BMA and SR performed the detection task comparably in terms of the receiver operating characteristic curves. For some gases, BMA performed better than SR when the size and complexity of the search library increased. This is encouraging because we expect improved BMA performance upon incorporation of prior information on background materials and gases.
A Bayesian Approach to Service Selection for Secondary Users in Cognitive Radio Networks
Directory of Open Access Journals (Sweden)
Elaheh Homayounvala
2015-10-01
In cognitive radio networks where secondary users (SUs) use the time-frequency gaps of primary users' (PUs') licensed spectrum opportunistically, the experienced throughput of SUs depends not only on the traffic load of the PUs but also on the PUs' service type. Each service has its own pattern of channel usage, and if the SUs know the dominant pattern of primary channel usage, then they can make a better decision on choosing which service is better to be used at a specific time to get the best advantage of the primary channel, in terms of higher achievable throughput. However, for practical reasons it is difficult to inform SUs directly of the PUs' dominant services in each area. This paper proposes a learning mechanism embedded in SUs to sense the primary channel for a specific length of time. Upon sensing a free primary channel, this algorithm recommends that the SUs choose the best service in order to get the best performance, in terms of maximum achieved throughput and minimum experienced delay. The proposed learning mechanism is based on a Bayesian approach that can predict the performance of a requested service for a given SU. Simulation results show that this service selection method significantly outperforms blind opportunistic SU service selection.
International Nuclear Information System (INIS)
Highlights: • Deduce secondary structure content of intrinsically disordered proteins from IR spectra. • Bayesian analysis to infer conformations of disordered regions of proteins from IR. • Comparison of measured and calculated IR spectra to obtain thermodynamic weights. - Abstract: As it remains practically impossible to generate ergodic ensembles for large intrinsically disordered proteins (IDP) with molecular dynamics (MD) simulations, it becomes critical to compare spectroscopic characteristics of the theoretically generated ensembles to corresponding measurements. We develop a Bayesian framework to infer the ensemble properties of an IDP using a combination of conformations generated by MD simulations and its measured infrared spectrum. We performed 100 different MD simulations totaling more than 10 μs to characterize the conformational ensemble of α-synuclein, a prototypical IDP, in water. These conformations are clustered based on solvent accessibility and helical content. We compute the amide-I band for these clusters and predict the thermodynamic weights of each cluster given the measured amide-I band. Bayesian analysis produces a reproducible and non-redundant set of thermodynamic weights for each cluster, which can then be used to calculate the ensemble properties. In a rigorous validation, these weights reproduce measured chemical shifts
Bayesian model selection for testing the no-hair theorem with black hole ringdowns
Gossan, S; Sathyaprakash, B S
2011-01-01
General relativity predicts that a black hole that results from the merger of two compact stars (either black holes or neutron stars) is initially highly deformed but soon settles down to a quiescent state by emitting a superposition of quasi-normal modes (QNMs). The QNMs are damped sinusoids with characteristic frequencies and decay times that depend only on the mass and spin of the black hole and no other parameter - a statement of the no-hair theorem. In this paper we have examined the extent to which QNMs could be used to test the no-hair theorem with future ground- and space-based gravitational-wave detectors. We model departures from general relativity (GR) by introducing extra parameters which change the mode frequencies or decay times from their general relativistic values. With the aid of numerical simulations and Bayesian model selection, we assess the extent to which the presence of such a parameter could be inferred, and its value estimated. We find that it is harder to decipher the departure of d...
Schöniger, Anneli; Illman, Walter A.; Wöhling, Thomas; Nowak, Wolfgang
2015-12-01
Groundwater modelers face the challenge of how to assign representative parameter values to the studied aquifer. Several approaches are available to parameterize spatial heterogeneity in aquifer parameters. They differ in their conceptualization and complexity, ranging from homogeneous models to heterogeneous random fields. While it is common practice to invest more effort into data collection for models with a finer resolution of heterogeneities, there is a lack of advice on how much data is required to justify a certain level of model complexity. In this study, we propose to use concepts related to Bayesian model selection to identify this balance. We demonstrate our approach on the characterization of a heterogeneous aquifer via hydraulic tomography in a sandbox experiment (Illman et al., 2010). We consider four increasingly complex parameterizations of hydraulic conductivity: (1) Effective homogeneous medium, (2) geology-based zonation, (3) interpolation by pilot points, and (4) geostatistical random fields. First, we investigate the shift in justified complexity with increasing amount of available data by constructing a model confusion matrix. This matrix indicates the maximum level of complexity that can be justified given a specific experimental setup. Second, we determine which parameterization is most adequate given the observed drawdown data. Third, we test how the different parameterizations perform in a validation setup. The results of our test case indicate that aquifer characterization via hydraulic tomography does not necessarily require (or justify) a geostatistical description. Instead, a zonation-based model might be a more robust choice, but only if the zonation is geologically adequate.
Bayesian Credit Ratings (new version)
Paola Cerchiello; Paolo Giudici
2013-01-01
In this contribution we aim at improving ordinal variable selection in the context of causal models. In this regard, we propose an approach that provides a formal inferential tool to compare the explanatory power of each covariate, and, therefore, to select an effective model for classification purposes. Our proposed model is Bayesian nonparametric, and, thus, keeps the amount of model specification to a minimum. We consider the case in which information from the covariates is at the ordinal ...
ANALYSIS OF BAYESIAN CLASSIFIER ACCURACY
Directory of Open Access Journals (Sweden)
Felipe Schneider Costa
2013-01-01
The naïve Bayes classifier is considered one of the most effective classification algorithms today, competing with more modern and sophisticated classifiers. Despite being based on the unrealistic (naïve) assumption that all variables are independent given the output class, the classifier provides proper results. However, depending on the scenario (network structure, number of samples or training cases, number of variables), the network may not provide appropriate results. This study uses a variable selection process based on the chi-squared test to verify the existence of dependence between variables in the data model, in order to identify the reasons that prevent a Bayesian network from providing good performance. Unlike other existing work, a detailed analysis of the data is also proposed, as well as adjustments in the case of limit values between two adjacent classes. Furthermore, variable weights, calculated with the mutual information function, are used in the calculation of a posteriori probabilities. Tests were applied to both a naïve Bayesian network and a hierarchical Bayesian network. After testing, a significant reduction in error rate was observed. The naïve Bayesian network showed a drop in error rate from twenty-five percent to five percent relative to the initial results of the classification process. In the hierarchical network, there was not only a fifteen percent drop in the error rate, but the final result also came to zero.
Hug, Sabine Carolin
2015-01-01
In this thesis we use differential equations for mathematically representing biological processes. For this we have to infer the associated parameters for fitting the differential equations to measurement data. If the structure of the ODE itself is uncertain, model selection methods have to be applied. We refine several existing Bayesian methods, ranging from an adaptive scheme for the computation of high-dimensional integrals to multi-chain Metropolis-Hastings algorithms for high-dimensional...
Energy Technology Data Exchange (ETDEWEB)
Schoelzel, C. [Bonn Univ. (Germany). Meteorologisches Inst.
2006-07-01
This thesis presents the development of statistical climatological-botanical transfer functions in order to provide reconstructions of Holocene climate variability in the Near East region. Two classical concepts, the biomisation as well as the indicator taxa approach, are translated into a Bayesian network. Fossil pollen spectra of laminated sediments from the Ein Gedi location at the western shoreline of the Dead Sea and from the crater lake Birkat Ram in the northern Golan serve as proxy data, covering the past 10000 and 6500 years, respectively. The climatological variables are winter temperature, summer temperature, and annual precipitation, obtained from the 0.5 x 0.5 degree climatology CRU TS 1.0. The Bayesian biome model is based on the three main vegetation territories, the Mediterranean, the Irano-Turanian, and the Saharo-Arabian territory, which are digitized on the same grid as the climate data. From their spatial extent, a classification in the phase space is described by estimating the conditional probability for the existence of a certain biome given the climate. These biome specific likelihood functions are modelled by a generalised linear model, including second order monomials of the climate variables. A statistical mixture model is applied to the biome probabilities as estimated by the Ein Gedi data, resulting in a posterior probability density function for the three dimensional climate state vector. The indicator taxa model is based on the distribution of 15 Mediterranean taxa. Their spatial extent allows the taxon specific likelihood functions to be estimated. In this case, they are conditional probability density functions for the climate state vector given the existence of a certain taxon. In order to address the general problem of multivariate non-normally distributed populations, multivariate normal Copulas are used, which allow the creation of distribution functions with gamma as well as normal marginal distributions. Applying the model to the Birkat
Using Instrumental Variables Properly to Account for Selection Effects
Porter, Stephen R.
2012-01-01
Selection bias is problematic when evaluating the effects of postsecondary interventions on college students, and can lead to biased estimates of program effects. While instrumental variables can be used to account for endogeneity due to self-selection, current practice requires that all five assumptions of instrumental variables be met in order…
Directory of Open Access Journals (Sweden)
W David Walter
Bovine tuberculosis is a bacterial disease caused by Mycobacterium bovis in livestock and wildlife with hosts that include Eurasian badgers (Meles meles), brushtail possum (Trichosurus vulpecula), and white-tailed deer (Odocoileus virginianus). Risk-assessment efforts in Michigan have been initiated on farms to minimize interactions of cattle with wildlife hosts, but research on M. bovis on cattle farms has not investigated the spatial context of disease epidemiology. To incorporate spatially explicit data, initial likelihood of infection probabilities for cattle farms tested for M. bovis, prevalence of M. bovis in white-tailed deer, deer density, and environmental variables for each farm were modeled in a Bayesian hierarchical framework. We used geo-referenced locations of 762 cattle farms that have been tested for M. bovis, white-tailed deer prevalence, and several environmental variables that may lead to long-term survival and viability of M. bovis on farms and surrounding habitats (i.e., soil type, habitat type). Bayesian hierarchical analyses identified deer prevalence and proportion of sandy soil within our sampling grid as the most supported model. Analysis of cattle farms tested for M. bovis found that every 1% increase in sandy soil resulted in an increase in odds of infection by 4%. Our analysis revealed that the influence of prevalence of M. bovis in white-tailed deer was still a concern even after considerable efforts to prevent cattle interactions with white-tailed deer through on-farm mitigation and reduction in the deer population. Cattle farms test positive for M. bovis annually in our study area, suggesting that an environmental source either on farms or in the surrounding landscape may be contributing to new or re-infections with M. bovis. Our research provides an initial assessment of potential environmental factors that could be incorporated into additional modeling efforts as more knowledge of deer herd
Furtado-Junior, I; Abrunhosa, F A; Holanda, F C A F; Tavares, M C S
2016-06-01
Fishing selectivity of the mangrove crab Ucides cordatus on the north coast of Brazil can be defined as the fisherman's ability to capture and select individuals of a certain size or sex (or a combination of these factors), which suggests an empirical selectivity. Considering this hypothesis, we calculated the selectivity curves for male and female crabs using the logit function of the logistic model in the formulation. The Bayesian inference consisted of obtaining the posterior distribution by applying the Markov chain Monte Carlo (MCMC) method in R using the OpenBUGS, BRugs, and R2WinBUGS libraries. The estimated average carapace width at selection for males and females, compared with previous studies reporting the average carapace width at sexual maturity, allows us to confirm the hypothesis that most mature individuals do not suffer from fishing pressure, thus ensuring their sustainability. PMID:26934154
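A logistic selectivity curve of the kind described above can be sketched as follows. This is a minimal illustration only: the function names and the intercept and slope values are hypothetical, they are not the estimates reported in the study, and no MCMC fitting is attempted here.

```python
import math

def selectivity(width_mm, intercept, slope):
    """Logistic selectivity: probability that a crab of this carapace width is retained.

    The logit of the retention probability is assumed linear in carapace width.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * width_mm)))

def width_at_half_selection(intercept, slope):
    """Carapace width at which the retention probability equals 0.5."""
    return -intercept / slope

# Hypothetical parameter values, chosen only to make the example concrete.
INTERCEPT = -30.0
SLOPE = 0.5

w50 = width_at_half_selection(INTERCEPT, SLOPE)
for width in (50.0, w50, 70.0):
    print(f"width={width:.1f} mm  selectivity={selectivity(width, INTERCEPT, SLOPE):.3f}")
```

In a Bayesian fit, the intercept and slope would carry posterior distributions, so the width at 50% selection would be summarized from posterior draws rather than computed from single values as above.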
Cha, YoonKyung; Soon Park, Seok; Won Lee, Hye; Stow, Craig A.
2016-01-01
Modeling to accurately predict river phytoplankton distribution and abundance is important in water quality and resource management. Nevertheless, the complex nature of eutrophication processes in highly connected river systems makes the task challenging. To model dynamics of river phytoplankton, represented by chlorophyll a (Chl a) concentration, we propose a Bayesian hierarchical model that explicitly accommodates seasonality and upstream-downstream spatial gradient in the structure. The utility of our model is demonstrated with an application to the Nakdong River (South Korea), which is a eutrophic, intensively regulated river, but functions as an irreplaceable water source for more than 13 million people. Chl a is modeled with two manageable factors, river flow, and total phosphorus (TP) concentration. Our model results highlight the importance of taking seasonal and spatial context into account when describing flow regimes and phosphorus delivery in rivers. A contrasting positive Chl a-flow relationship across stations versus negative Chl a-flow slopes that arose when Chl a was modeled on a station-month basis is an illustration of Simpson's paradox, which necessitates modeling Chl a-flow relationships decomposed into seasonal and spatial components. Similar Chl a-TP slopes among stations and months suggest that, with the flow effect removed, positive TP effects on Chl a are uniform regardless of the season and station in the river. Our model prediction successfully captured the shift in the spatial and monthly patterns of Chl a.
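The Simpson's paradox mentioned above (a positive pooled relationship alongside negative within-group slopes) can be reproduced with a few synthetic numbers. The sketch below uses made-up values and hypothetical group names; it is not the Nakdong River data and it uses ordinary least squares rather than the hierarchical model of the study.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic (flow, Chl a) pairs for two hypothetical stations.
groups = {
    "station_A": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    "station_B": ([4.0, 5.0, 6.0], [8.0, 7.0, 6.0]),
}

# Slope within each station: negative in both.
within_slopes = {name: ols_slope(x, y) for name, (x, y) in groups.items()}

# Slope after pooling all stations: positive.
pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]
pooled_slope = ols_slope(pooled_x, pooled_y)

print(within_slopes, pooled_slope)
```

The sign flip arises because the between-station difference in level dominates the pooled fit, which is why a model that decomposes the relationship by station and season is needed.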
Research on Some Questions About Selection of Independent Variables
Institute of Scientific and Technical Information of China (English)
TAO Jing-xuan
2002-01-01
The paper studies four methods for the selection of independent variables in multivariate analysis. In general, the advanced statistical method and the backward statistical method cannot be relied on to obtain the best subset of independent variables, possibly because they are affected by the order of the variables or the associations among them. When multicollinearity is present in a set of explanatory variables (an abnormal state), these methods are not effective, although stepwise regression and the optimum selection method over all subsets are widely used. For this case, the paper proposes a new method that combines deleting variables with ingredient analysis and that can be used in research and scientific practice. An important characteristic of this paper is that it gives examples to support each conclusion.
A Variable-Selection Heuristic for K-Means Clustering.
Brusco, Michael J.; Cradit, J. Dennis
2001-01-01
Presents a variable selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. Subjected the heuristic to Monte Carlo testing across more than 2,200 datasets. Results indicate that the heuristic is extremely effective at eliminating masking variables. (SLD)
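The adjusted Rand index used above to measure cluster recovery can be computed from pair counts. The following is a small sketch of the standard formula; the function name is illustrative and the code is not taken from the article.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs of items placed together in both partitions.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # value expected under chance agreement
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (index - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # same partition, relabelled
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that cut across each other
print(same, crossed)
```

A variable selection heuristic in this spirit would compare the index for partitions obtained from different variable subsets; how the article does so is not specified in the abstract.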
A numeric comparison of variable selection algorithms for supervised learning
International Nuclear Information System (INIS)
Datasets in modern High Energy Physics (HEP) experiments are often described by dozens or even hundreds of input variables. Reducing a full variable set to a subset that most completely represents information about data is therefore an important task in analysis of HEP data. We compare various variable selection algorithms for supervised learning using several datasets such as, for instance, imaging gamma-ray Cherenkov telescope (MAGIC) data found at the UCI repository. We use classifiers and variable selection methods implemented in the statistical package StatPatternRecognition (SPR), a free open-source C++ package developed in the HEP community (http://sourceforge.net/projects/statpatrec/). For each dataset, we select a powerful classifier and estimate its learning accuracy on variable subsets obtained by various selection algorithms. When possible, we also estimate the CPU time needed for the variable subset selection. The results of this analysis are compared with those published previously for these datasets using other statistical packages such as R and Weka. We show that the most accurate, yet slowest, method is a wrapper algorithm known as generalized sequential forward selection ('Add N Remove R') implemented in SPR.
Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal
Rao, J. N. K.
2011-01-01
According to Hansen, Madow and Tepping [J. Amer. Statist. Assoc. 78 (1983) 776--793], "Probability sampling designs and randomization inference are widely accepted as the standard approach in sample surveys." In this article, reasons are advanced for the wide use of this design-based approach, particularly by federal agencies and other survey organizations conducting complex large scale surveys on topics related to public policy. Impact of Bayesian methods in survey sampling is also discussed...
Directory of Open Access Journals (Sweden)
Rodríguez-Prieto Víctor
2012-08-01
Background: Bovine tuberculosis (bTB) is a chronic infectious disease mainly caused by Mycobacterium bovis. Although eradication is a priority for the European authorities, bTB remains active or even increasing in many countries, causing significant economic losses. The integral consideration of epidemiological factors is crucial to more cost-effectively allocate control measures. The aim of this study was to identify the nature and extent of the association between TB distribution and a list of potential risk factors regarding cattle, wild ungulates and environmental aspects in Ciudad Real, a Spanish province with one of the highest TB herd prevalences. Results: We used a Bayesian mixed effects multivariable logistic regression model to predict TB occurrence in either domestic or wild mammals per municipality in 2007 by using information from the previous year. The municipal TB distribution and endemicity was clustered in the western part of the region and clearly overlapped with the explanatory variables identified in the final model: (1) incident cattle farms, (2) number of years of veterinary inspection of big game hunting events, (3) prevalence in wild boar, (4) number of sampled cattle, (5) persistent bTB-infected cattle farms, (6) prevalence in red deer, (7) proportion of beef farms, and (8) farms devoted to bullfighting cattle. Conclusions: The combination of these eight variables in the final model highlights the importance of the persistence of the infection in the hosts, surveillance efforts and some cattle management choices in the circulation of M. bovis in the region. The spatial distribution of these variables, together with particular Mediterranean features that favour the wildlife-livestock interface, may explain the M. bovis persistence in this region. Sanitary authorities should allocate efforts towards specific areas and epidemiological situations where the wildlife-livestock interface seems to critically hamper the definitive b
Variable selection and estimation for longitudinal survey data
Wang, Li
2014-09-01
There is wide interest in studying longitudinal surveys where sample subjects are observed successively over time. Longitudinal surveys have been used in many areas today, for example, in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure when the correct submodel was known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples are illustrated to show the usefulness of the proposed methodology under various model settings and sampling designs. © 2014 Elsevier Inc.
DEFF Research Database (Denmark)
2010-01-01
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context...... an overall estimate of the causal relationship between the phenotype and the outcome, and an assessment of its heterogeneity across studies. As an example, we estimate the causal relationship of blood concentrations of C-reactive protein on fibrinogen levels using data from 11 studies. These methods provide...... a flexible framework for efficient estimation of causal relationships derived from multiple studies. Issues discussed include weak instrument bias, analysis of binary outcome data such as disease risk, missing genetic data, and the use of haplotypes....
Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao
2014-10-01
In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/. PMID:25083512
Ball, Jessica Lynne
Light Detection and Ranging (LiDAR) data has shown great potential to estimate spatially explicit forest variables, including above-ground biomass, stem density, tree height, and more. Due to its ability to garner information about the vertical and horizontal structure of forest canopies effectively and efficiently, LiDAR sensors have played a key role in the development of operational air and space-borne instruments capable of gathering information about forest structure at regional, continental, and global scales. Combining LiDAR datasets with field-based validation measurements to build predictive models is becoming an attractive solution to the problem of quantifying and mapping forest structure for private forest land owners and local, state, and federal government entities alike. As with any statistical model using spatially indexed data, the potential to violate modeling assumptions resulting from spatial correlation is high. This thesis explores several different modeling frameworks that aim to accommodate correlation structures within model residuals. The development is motivated using LiDAR and forest inventory datasets. Special attention is paid to estimation and propagation of parameter and model uncertainty through to prediction units. Inference follows a Bayesian statistical paradigm. Results suggest the proposed frameworks help ensure model assumptions are met and prediction performance can be improved by pursuing spatially enabled models.
Bayesian feature weighting for unsupervised learning, with application to object recognition
Carbonetto, Peter; De Freitas, Nando; Gustafson, Paul; Thompson, Natalie
2003-01-01
We present a method for variable selection/weighting in an unsupervised learning context using Bayesian shrinkage. The model parameters and cluster assignments can be computed simultaneously using an efficient EM algorithm. Applying our Bayesian shrinkage model to a complex problem in object recognition (Duygulu, Barnard, de Freitas and Forsyth 2002), our experiments yield good results.
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-01-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables producing similar acoustical properties with different sets of control strategies. To explain how the Central Nervous System selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of...
Directory of Open Access Journals (Sweden)
Mabaso Musawenkosi LH
2007-09-01
Background: Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana. Results: Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Those correlated with higher-ranking relatives of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally, a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country. Conclusion: We have
Variable Selection in the Partially Linear Errors-in-Variables Models for Longitudinal Data
Institute of Scientific and Technical Information of China (English)
Yi-ping YANG; Liu-gen XUE; Wei-hu CHENG
2012-01-01
This paper proposes a new approach for variable selection in partially linear errors-in-variables (EV) models for longitudinal data by penalizing appropriate estimating functions. We apply the SCAD penalty to simultaneously select significant variables and estimate unknown parameters. The rate of convergence and the asymptotic normality of the resulting estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure. A new algorithm is proposed for solving penalized estimating equations. The asymptotic results are augmented by a simulation study.
Sparse covariance thresholding for high-dimensional variable selection
Daye, X. Jessie Jeng And Z. John
2010-01-01
In high-dimensions, many variable selection methods, such as the lasso, are often limited by excessive variability and rank deficiency of the sample covariance matrix. Covariance sparsity is a natural phenomenon in high-dimensional applications, such as microarray analysis, image processing, etc., in which a large number of predictors are independent or weakly correlated. In this paper, we propose the covariance-thresholded lasso, a new class of regression methods that can utilize covariance ...
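The idea of sparsifying the sample covariance matrix before regression can be sketched as below. The abstract does not state which thresholding rule the authors use, so the hard-thresholding rule, the function names, the threshold value and the data here are all assumptions made for illustration; the subsequent lasso step is not shown.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of observations (one row each)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

def hard_threshold(cov, tau):
    """Set off-diagonal entries with absolute value below tau to zero.

    Diagonal entries (variances) are always kept.
    """
    p = len(cov)
    return [
        [cov[i][j] if i == j or abs(cov[i][j]) >= tau else 0.0 for j in range(p)]
        for i in range(p)
    ]

# Made-up data: columns 0 and 1 are strongly related, column 2 is nearly unrelated.
rows = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]
cov = sample_covariance(rows)
sparse_cov = hard_threshold(cov, tau=0.1)
for row in sparse_cov:
    print(["%.3f" % v for v in row])
```

Small, noisy covariances with the third column are zeroed while the strong covariance between the first two columns survives, which is the kind of sparsity the abstract describes as natural in high-dimensional applications.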
Bayesian model selection framework for identifying growth patterns in filamentous fungi.
Lin, Xiao; Terejanu, Gabriel; Shrestha, Sajan; Banerjee, Sourav; Chanda, Anindya
2016-06-01
This paper describes a rigorous methodology for quantification of model errors in fungal growth models. This is essential to choose the model that best describes the data and guide modeling efforts. Mathematical modeling of growth of filamentous fungi is necessary in fungal biology for gaining systems level understanding on hyphal and colony behaviors in different environments. A critical challenge in the development of these mathematical models arises from the indeterminate nature of their colony architecture, which is a result of processing diverse intracellular signals induced in response to a heterogeneous set of physical and nutritional factors. There exists a practical gap in connecting fungal growth models with measurement data. Here, we address this gap by introducing the first unified computational framework based on Bayesian inference that can quantify individual model errors and rank the statistical models based on their descriptive power against data. We show that this Bayesian model comparison is just a natural formalization of Occam's razor. The application of this framework is discussed in comparing three models in the context of synthetic data generated from a known true fungal growth model. This framework of model comparison achieves a trade-off between data fitness and model complexity and the quantified model error not only helps in calibrating and comparing the models, but also in making better predictions and guiding model refinements. PMID:27000772
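The statement that Bayesian model comparison formalizes Occam's razor can be seen in a toy problem unrelated to fungal growth. The sketch below compares a fixed fair-coin model with a more flexible model that has a uniform prior on the coin bias; both the example and the function names are illustrative assumptions, not part of the paper's framework.

```python
from math import factorial

def evidence_fair_coin(heads, tails):
    """Marginal likelihood of one observed sequence under a fair coin (no free parameter)."""
    return 0.5 ** (heads + tails)

def evidence_unknown_bias(heads, tails):
    """Marginal likelihood under a uniform prior on the bias p.

    Integral of p^heads * (1 - p)^tails over [0, 1] = heads! tails! / (heads + tails + 1)!
    """
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)

def bayes_factor_simple_vs_flexible(heads, tails):
    """Values above 1 favour the simpler fair-coin model."""
    return evidence_fair_coin(heads, tails) / evidence_unknown_bias(heads, tails)

balanced = bayes_factor_simple_vs_flexible(5, 5)  # about 2.7: the simpler model is preferred
lopsided = bayes_factor_simple_vs_flexible(9, 1)  # about 0.11: the flexible model is preferred
print(balanced, lopsided)
```

The flexible model can always fit at least as well at its best parameter value, but its evidence is averaged over the whole prior, so it is penalized unless the data actually require the extra freedom. That is the trade-off between data fit and model complexity described in the abstract.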
A New Statistic for Variable Selection in Questionnaire Analysis
Institute of Scientific and Technical Information of China (English)
ZHANG Jun-hua; FANG Wei-wu
2001-01-01
In this paper, a new statistic is proposed for variable selection which is one of the important problems in analysis of questionnaire data. Contrasting to other methods, the approach introduced here can be used not only for two groups of samples but can also be easily generalized to the multi-group case.
Directory of Open Access Journals (Sweden)
Mark I Rowley
We present novel Bayesian methods for the analysis of exponential decay data that exploit the evidence carried by every detected decay event and enable robust extension to advanced processing. Our algorithms are presented in the context of fluorescence lifetime imaging microscopy (FLIM), and particular attention has been paid to modelling the time-domain system (based on time-correlated single photon counting) with unprecedented accuracy. We present estimates of decay parameters for mono- and bi-exponential systems, offering up to a factor of two improvement in accuracy compared to previous popular techniques. Results of the analysis of synthetic and experimental data are presented, and areas where the superior precision of our techniques can be exploited in Förster Resonance Energy Transfer (FRET) experiments are described. Furthermore, we demonstrate two advanced processing methods: decay model selection to choose between differing models such as mono- and bi-exponential, and the simultaneous estimation of instrument and decay parameters.
Directory of Open Access Journals (Sweden)
Pierre eBerthet
2012-10-01
Several studies have shown a strong involvement of the basal ganglia (BG) in action selection and dopamine dependent learning. The dopaminergic signal to striatum, the input stage of the BG, has been commonly described as coding a reward prediction error (RPE), i.e. the difference between the predicted and actual reward. The RPE has been hypothesized to be critical in the modulation of the synaptic plasticity in cortico-striatal synapses in the direct and indirect pathway. We developed an abstract computational model of the BG, with a dual pathway structure functionally corresponding to the direct and indirect pathways, and compared its behaviour to biological data as well as other reinforcement learning models. The computations in our model are inspired by Bayesian inference, and the synaptic plasticity changes depend on a three factor Hebbian-Bayesian learning rule based on co-activation of pre- and post-synaptic units and on the value of the RPE. The model builds on a modified Actor-Critic architecture and implements the direct (Go) and the indirect (NoGo) pathway, as well as the reward prediction (RP) system, acting in a complementary fashion. We investigated the performance of the model system when different configurations of the Go, NoGo and RP system were utilized, e.g. using only the Go, NoGo, or RP system, or combinations of those. Learning performance was investigated in several types of learning paradigms, such as learning-relearning, successive learning, stochastic learning, reversal learning and a two-choice task. The RPE and the activity of the model during learning were similar to monkey electrophysiological and behavioural data. Our results, however, show that there is not a unique best way to configure this BG model to handle well all the learning paradigms tested. We thus suggest that an agent might dynamically configure its action selection mode, possibly depending on task characteristics and also on how much time is available.
Berthet, Pierre; Hellgren-Kotaleski, Jeanette; Lansner, Anders
2012-01-01
Several studies have shown a strong involvement of the basal ganglia (BG) in action selection and dopamine dependent learning. The dopaminergic signal to striatum, the input stage of the BG, has been commonly described as coding a reward prediction error (RPE), i.e., the difference between the predicted and actual reward. The RPE has been hypothesized to be critical in the modulation of the synaptic plasticity in cortico-striatal synapses in the direct and indirect pathway. We developed an abstract computational model of the BG, with a dual pathway structure functionally corresponding to the direct and indirect pathways, and compared its behavior to biological data as well as other reinforcement learning models. The computations in our model are inspired by Bayesian inference, and the synaptic plasticity changes depend on a three factor Hebbian-Bayesian learning rule based on co-activation of pre- and post-synaptic units and on the value of the RPE. The model builds on a modified Actor-Critic architecture and implements the direct (Go) and the indirect (NoGo) pathway, as well as the reward prediction (RP) system, acting in a complementary fashion. We investigated the performance of the model system when different configurations of the Go, NoGo, and RP system were utilized, e.g., using only the Go, NoGo, or RP system, or combinations of those. Learning performance was investigated in several types of learning paradigms, such as learning-relearning, successive learning, stochastic learning, reversal learning and a two-choice task. The RPE and the activity of the model during learning were similar to monkey electrophysiological and behavioral data. Our results, however, show that there is not a unique best way to configure this BG model to handle well all the learning paradigms tested. We thus suggest that an agent might dynamically configure its action selection mode, possibly depending on task characteristics and also on how much time is available. PMID:23060764
International Nuclear Information System (INIS)
We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ∼30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability τ, and driving amplitudes on short timescales σ̂. Imposing cuts on minimum τ and σ̂ allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of γ-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E ≥ 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other γ-ray blazars and is likely to be the γ-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is ∼3 years in the rest frame of the jet, in contrast with the ∼320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.
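The damped random walk named in the abstract above is commonly written as an Ornstein-Uhlenbeck process. The sketch below simulates such a light curve at arbitrary sampling times. It is only an illustration under stated assumptions: the function name, the `sigma_hat` argument (standing in for the short-timescale driving amplitude) and the convention that the long-run variance equals `tau * sigma_hat**2 / 2` are choices made here, and the abstract does not confirm which parameterization the authors used.

```python
import math
import random

def simulate_drw(times, tau, sigma_hat, mean=0.0, seed=0):
    """Simulate a damped random walk at the given (sorted) times.

    Assumed convention: the asymptotic variance is tau * sigma_hat**2 / 2.
    """
    rng = random.Random(seed)
    var_inf = 0.5 * tau * sigma_hat ** 2
    # Draw the first point from the stationary distribution.
    values = [mean + rng.gauss(0.0, math.sqrt(var_inf))]
    for t_prev, t_next in zip(times, times[1:]):
        # Exact conditional update, valid for irregular time gaps.
        decay = math.exp(-(t_next - t_prev) / tau)
        step_sd = math.sqrt(var_inf * (1.0 - decay ** 2))
        values.append(mean + (values[-1] - mean) * decay + rng.gauss(0.0, step_sd))
    return values

# Hypothetical, irregularly sampled epochs in days.
epochs = [0.0, 3.0, 4.5, 20.0, 21.0, 60.0]
light_curve = simulate_drw(epochs, tau=100.0, sigma_hat=0.1, mean=18.0)
```

Because the update is the exact conditional distribution rather than an Euler step, it stays valid for the sparse, uneven cadence typical of survey light curves.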
Barcella, William; Iorio, Maria De; Baio, Gianluca; Malone-Lee, James
2016-04-15
Lower urinary tract symptoms can indicate the presence of urinary tract infection (UTI), a condition that if it becomes chronic requires expensive and time consuming care as well as leading to reduced quality of life. Detecting the presence and gravity of an infection from the earliest symptoms is then highly valuable. Typically, white blood cell (WBC) count measured in a sample of urine is used to assess UTI. We consider clinical data from 1341 patients in their first visit in which UTI (i.e. WBC ≥ 1) is diagnosed. In addition, for each patient, a clinical profile of 34 symptoms was recorded. In this paper, we propose a Bayesian nonparametric regression model based on the Dirichlet process prior aimed at providing the clinicians with a meaningful clustering of the patients based on both the WBC (response variable) and possible patterns within the symptoms profiles (covariates). This is achieved by assuming a probability model for the symptoms as well as for the response variable. To identify the symptoms most associated to UTI, we specify a spike and slab base measure for the regression coefficients: this induces dependence of symptoms selection on cluster assignment. Posterior inference is performed through Markov Chain Monte Carlo methods. PMID:26536840
Portfolio Selection Based on Distance between Fuzzy Variables
Directory of Open Access Journals (Sweden)
Weiyi Qian
2014-01-01
This paper studies the portfolio selection problem in a fuzzy environment. We introduce a new simple method in which the distance between fuzzy variables is used to measure the divergence of fuzzy investment return from a prior one. Firstly, two new mathematical models are proposed by expressing divergence as distance, investment return as expected value, and risk as variance and semivariance, respectively. Secondly, the crisp forms of the new models are also provided for different types of fuzzy variables. Finally, several numerical examples are given to illustrate the effectiveness of the proposed approach.
A Bayesian Optimisation Algorithm for the Nurse Scheduling Problem
Jingpeng, Li
2008-01-01
A Bayesian optimization algorithm for the nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse's assignment. Unlike our previous work that used GAs to implement implicit learning, the learning in the proposed algorithm is explicit, i.e. eventually we will be able to identify and mix building blocks directly. The Bayesian optimization algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated, i.e. in our case, a new rule string has been obtained. Another set of rule strings will be generated in this way, some of which will replace previous strings based on fitness selection. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again usin...
Directory of Open Access Journals (Sweden)
Henry de-Graft Acquah
2013-03-01
Alternative formulations of the Bayesian Information Criteria provide a basis for choosing between competing methods for detecting price asymmetry. However, very little is understood about their performance in the asymmetric price transmission modelling framework. In addressing this issue, this paper introduces and applies parametric bootstrap techniques to evaluate the ability of Bayesian Information Criteria (BIC) and Draper's Information Criteria (DIC) in discriminating between alternative asymmetric price transmission models under various error and sample size conditions. The results of the bootstrap simulations indicate that model selection performance depends on bootstrap sample size and the amount of noise in the data generating process. The Bayesian criterion clearly identifies the true asymmetric model out of different competing models in the presence of bootstrap samples. Draper's Information Criteria (DIC; Draper, 1995) outperforms BIC at either larger bootstrap sample size or lower noise level.
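For orientation, the standard Bayesian information criterion is BIC = k ln(n) - 2 ln(L), where L is the maximized likelihood, k the number of fitted parameters and n the sample size. The sketch below evaluates it for least-squares fits under an assumed Gaussian error model. The function name and the numbers are hypothetical, and the sketch does not reproduce the paper's bootstrap design or Draper's criterion.

```python
import math

def gaussian_bic(rss, n, k):
    """BIC of a least-squares fit with Gaussian errors.

    rss: residual sum of squares, n: observations, k: fitted parameters.
    Additive constants shared by all models on the same data are dropped.
    """
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical fits of two competing price transmission models
# to the same 100 observations.
bic_symmetric = gaussian_bic(rss=120.0, n=100, k=3)
bic_asymmetric = gaussian_bic(rss=118.0, n=100, k=5)

# The model with the smaller BIC is preferred.
preferred = "symmetric" if bic_symmetric < bic_asymmetric else "asymmetric"
```

In this made-up case the two extra parameters of the asymmetric model reduce the residual sum of squares too little to offset the ln(n) penalty, so the symmetric model is preferred.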
The selection and application of variable order differential operators
Ramirez, Lynnette E. S.
2009-01-01
This work demonstrates the practicality of using variable order (VO) derivative operators for modeling the dynamics of complex systems. First we review the various candidate VO integral and derivative operator definitions proposed in the literature. We select a definition that is appropriate for physical modeling based on the following criteria: the VO operator must be able to return all intermediate values between 0 and 1 that correspond to the argument of the order of differe...
Variable Selection for Marginal Longitudinal Generalized Linear Models
Eva Cantoni; Joanna Mills Flemming; Elvezio Ronchetti
2003-01-01
Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this paper we propose a generalized version of Mallows's Cp (GCp) suitable for use with both parametric and nonparametric models. GCp provides an estimate of a measure of model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in ...
Filament winding cylinders. III - Selection of the process variables
Lee, Soo-Yong; Springer, George S.
1990-01-01
By using the Lee-Springer filament winding model, temperatures, degrees of cure, viscosities, stresses, strains, fiber tensions, fiber motions, and void diameters were calculated in graphite-epoxy composite cylinders during the winding and subsequent curing. The results demonstrate the type of information which can be generated by the model. It is shown, in reference to these results, how the model, and the corresponding WINDTHICK code, can be used to select the appropriate process variables.
Mahalanobis distance and variable selection to optimize dose response
International Nuclear Information System (INIS)
A battery of statistical techniques is combined to improve detection of low-level dose response. First, Mahalanobis distances are used to classify objects as normal or abnormal. Then the proportion classified abnormal is regressed on dose. Finally, a subset of regressor variables is selected which maximizes the slope of the dose response line. Use of the techniques is illustrated by application to mouse sperm damaged by low doses of x-rays.
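The first two steps described above (distance-based classification, then the proportion classified abnormal) can be sketched for the two-variable case, where the covariance matrix can be inverted in closed form. This is a minimal illustration under assumptions: the function names, the cutoff and the data are hypothetical, and the abstract does not state how the original study set its classification threshold.

```python
import math

def mahalanobis_2d(point, reference):
    """Mahalanobis distance of a 2-D point from a reference (control) sample."""
    n = len(reference)
    mx = sum(p[0] for p in reference) / n
    my = sum(p[1] for p in reference) / n
    # Sample covariance entries of the reference group.
    sxx = sum((p[0] - mx) ** 2 for p in reference) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in reference) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in reference) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form with the closed-form inverse of the 2x2 covariance.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

def proportion_abnormal(points, reference, cutoff):
    """Fraction of points whose distance from the reference exceeds cutoff."""
    flags = [mahalanobis_2d(p, reference) > cutoff for p in points]
    return sum(flags) / len(flags)

controls = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
dosed = [(1.0, 1.0), (3.0, 1.0), (5.0, 5.0)]
fraction = proportion_abnormal(dosed, controls, cutoff=2.0)
```

Repeating the last call for each dose group gives the proportions that would then be regressed on dose.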
Multi-scale inference of interaction rules in animal groups using Bayesian model selection.
Directory of Open Access Journals (Sweden)
Richard P Mann
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture the observed locality of interactions. Traditional self-propelled particle models fail to capture the fine scale dynamics of the system. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics, while maintaining a biologically plausible perceptual range. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
Multi-scale inference of interaction rules in animal groups using Bayesian model selection.
Directory of Open Access Journals (Sweden)
Richard P Mann
2012-01-01
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture fine scale rules of interaction, which are primarily mediated by physical contact. Conversely, the Markovian self-propelled particle model captures the fine scale rules of interaction but fails to reproduce global dynamics. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
Martínez, Isabel; Wiegand, Thorsten; Camarero, J Julio; Batllori, Enric; Gutiérrez, Emilia
2011-05-01
Alpine tree-line ecotones are characterized by marked changes at small spatial scales that may result in a variety of physiognomies. A set of alternative individual-based models was tested with data from four contrasting Pinus uncinata ecotones in the central Spanish Pyrenees to reveal the minimal subset of processes required for tree-line formation. A Bayesian approach combined with Markov chain Monte Carlo methods was employed to obtain the posterior distribution of model parameters, allowing the use of model selection procedures. The main features of real tree lines emerged only in models considering nonlinear responses in individual rates of growth or mortality with respect to the altitudinal gradient. Variation in tree-line physiognomy reflected mainly changes in the relative importance of these nonlinear responses, while other processes, such as dispersal limitation and facilitation, played a secondary role. Different nonlinear responses also determined the presence or absence of krummholz, in agreement with recent findings highlighting a different response of diffuse and abrupt or krummholz tree lines to climate change. The method presented here can be widely applied in individual-based simulation models and will turn model selection and evaluation in this type of models into a more transparent, effective, and efficient exercise. PMID:21508601
Bayesian Variable Selection to identify QTL affecting a simulated quantitative trait
Schurink, A.; Janss, L.L.G.; Heuven, H.C.M.
2012-01-01
Background Recent developments in genetic technology and methodology enable accurate detection of QTL and estimation of breeding values, even in individuals without phenotypes. The QTL-MAS workshop offers the opportunity to test different methods to perform a genome-wide association study on simulat
Bayesian Variable Selection to identify QTL affecting a simulated quantitative trait
Schurink, A.; Janss, L.L.G.; Heuven, H.C.M.
2012-01-01
Abstract Background: Recent developments in genetic technology and methodology enable accurate detection of QTL and estimation of breeding values, even in individuals without phenotypes. The QTL-MAS workshop offers the opportunity to test different methods to perform a genome-wide association study
Embryologic changes in rabbit lines selected for litter size variability.
García, M L; Blasco, A; Argente, M J
2016-09-15
A divergent selection experiment on litter size variability was carried out. Correlated response in early embryo survival, embryonic development, size of embryos, and size of embryonic coats after four generations of selection was estimated. A total of 429 embryos from 51 high-line females and 648 embryos from 80 low-line females were used in the experiment. The traits studied were percentage of normal embryos, embryo diameter, zona pellucida thickness, and mucin coat thickness. Traits were measured at 24, 48, and 72 hours postcoitum (hpc); mucin coat thickness was only measured at 48 and 72 hpc. The embryos were classified as zygotes or two-cell embryos at 24 hpc; 16-cell embryos or early morulae at 48 hpc; and early morulae, compacted morulae, or blastocyst at 72 hpc. At 24 hpc, the percentage of normal embryos in the high line was lower than in the low line (-2.5%), and embryos in the high line showed 10% higher zona pellucida thickness than those of the low line. No differences in percentage of zygotes or two-cell embryos were found. At 48 hpc, the high-line embryos were less developed, with a higher percentage of 16-cell embryos (23.4%) and a lower percentage of early morulae (-23.4%). At 72 hpc, high-line embryos continued to be less developed, showing higher percentages of early morulae and compact morulae and lower percentages of blastocyst (-1.8%). No differences in embryo diameter or mucin coat thickness were found at any time. In conclusion, selection for litter size variability has consequences on early embryonic survival and development, with embryos presenting a lower state of development and a lower percentage of normal embryos in the line selected for higher variability. PMID:27207473
Isoenzymatic variability in tropical maize populations under reciprocal recurrent selection
Directory of Open Access Journals (Sweden)
Pinto Luciana Rossini
2003-01-01
Maize (Zea mays L.) is one of the crops in which the genetic variability has been extensively studied at isoenzymatic loci. The genetic variability of the maize populations BR-105 and BR-106, and the synthetics IG-3 and IG-4, obtained after one cycle of a high-intensity reciprocal recurrent selection (RRS), was investigated at seven isoenzymatic loci. A total of twenty alleles were identified, and most of the private alleles were found in the BR-106 population. One cycle of reciprocal recurrent selection (RRS) caused reductions of 12% in the number of alleles in both populations. Changes in allele frequencies were also observed between populations and synthetics, mainly for the Est 2 locus. Populations presented similar values for the number of alleles per locus, percentage of polymorphic loci, and observed and expected heterozygosities. A decrease of the genetic variation values was observed for the synthetics as a consequence of genetic drift effects and reduction of the effective population sizes. The distribution of the genetic diversity within and between populations revealed that most of the diversity was maintained within them, i.e. BR-105 x BR-106 (G_ST = 3.5%) and IG-3 x IG-4 (G_ST = 4.0%). The genetic distances between populations and synthetics increased approximately 21%. An increase in the genetic divergence between the populations occurred without limiting new selection procedures.
A Selective Overview of Variable Selection in High Dimensional Feature Space.
Fan, Jianqing; Lv, Jinchi
2010-01-01
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
Directory of Open Access Journals (Sweden)
Kaski Kimmo
2007-05-01
Background: A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum. Results: A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG), 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra. Conclusion: The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics.
Estimation and variable selection for generalized additive partial linear models
Wang, Li
2011-08-01
We study generalized additive partial linear models, proposing the use of polynomial spline smoothing for estimation of nonparametric functions, and deriving quasi-likelihood based estimators for the linear parameters. We establish asymptotic normality for the estimators of the parametric components. The procedure avoids solving large systems of equations as in kernel-based procedures and thus results in gains in computational simplicity. We further develop a class of variable selection procedures for the linear parameters by employing a nonconcave penalized quasi-likelihood, which is shown to have an asymptotic oracle property. Monte Carlo simulations and an empirical example are presented for illustration. © Institute of Mathematical Statistics, 2011.
Penalized maximum likelihood estimation and variable selection in geostatistics
Chu, Tingjin; Wang, Haonan; 10.1214/11-AOS919
2012-01-01
We consider the problem of selecting covariates in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE) that enables simultaneous variable selection and parameter estimation is developed and, for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample sizes, we propose penalized maximum covariance-tapered likelihood estimation (PMLE$_{\mathrm{T}}$) and its one-step sparse estimation (OSE$_{\mathrm{T}}$). General forms of penalty functions with an emphasis on smoothly clipped absolute deviation are used for penalized maximum likelihood. Theoretical properties of PMLE and OSE, as well as their approximations PMLE$_{\mathrm{T}}$ and OSE$_{\mathrm{T}}$ using covariance tapering, are derived, including consistency, sparsity, asymptotic normality and the oracle properties. For covariance tapering, a by-product of our theoretical results is consistency and asymptotic normal...
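The smoothly clipped absolute deviation (SCAD) penalty emphasized in the abstract above has a simple piecewise form. The sketch below evaluates the penalty in its usual textbook form; the function name is hypothetical, the default a = 3.7 is the conventional choice in the SCAD literature, and nothing here reproduces the spatial likelihood or the tapering used in the paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for a single coefficient theta (lam > 0, a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear part near zero, which produces sparsity.
        return lam * t
    if t <= a * lam:
        # Quadratic blend that gradually flattens the penalty.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Constant part: large coefficients are not shrunk further.
    return lam * lam * (a + 1.0) / 2.0

small = scad_penalty(0.5, lam=1.0)    # linear region
large = scad_penalty(10.0, lam=1.0)   # constant region
```

The flat tail is the reason for using a non-concave penalty: large coefficients are left nearly unbiased, which underlies the oracle properties referred to in the abstract.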
Secondary eclipses in the CoRoT light curves: A homogeneous search based on Bayesian model selection
Parviainen, Hannu; Belmonte, Juan Antonio
2012-01-01
We aim to identify and characterize secondary eclipses in the original light curves of all published CoRoT planets using uniform detection and evaluation criteria. Our analysis is based on a Bayesian model selection between two competing models: one with and one without an eclipse signal. The search is carried out by mapping the Bayes factor in favor of the eclipse model as a function of the eclipse center time, after which the characterization of plausible eclipse candidates is done by estimating the posterior distributions of the eclipse model parameters using Markov Chain Monte Carlo. We discover statistically significant eclipse events for two planets, CoRoT-6b and CoRoT-11b, and for one brown dwarf, CoRoT-15b. We also find marginally significant eclipse events passing our plausibility criteria for CoRoT-3b, 13b, 18b, and 21b. The previously published CoRoT-1b and CoRoT-2b eclipses are also confirmed.
The Tabu Search Procedure: An Alternative to the Variable Selection Methods
Mills, Jamie, D.; Olejnik, Stephen, F.; Marcoulides, George, A.
2005-01-01
The effectiveness of the Tabu variable selection algorithm, to identify predictor variables related to a criterion variable, is compared with the stepwise variable selection method and the all possible regression approach. Considering results obtained from previous research, Tabu is more successful in identifying relevant variables than the…
Villforth, Carolin; Koekemoer, Anton M.; Grogin, Norman A.
2010-01-01
Variability is a property shared by practically all AGN. This makes variability selection a possible technique for identifying AGN. Given that variability selection makes no prior assumption about spectral properties, it is a powerful technique for detecting both low-luminosity AGN in which the host galaxy emission is dominating and AGN with unusual spectral properties. In this paper, we will discuss and test different statistical methods for the detection of variability in sparsely sampled d...
Identifying relevant positions in proteins by Critical Variable Selection.
Grigolon, Silvia; Franz, Silvio; Marsili, Matteo
2016-06-21
Evolution in its course has found a variety of solutions to the same optimisation problem. The advent of high-throughput genomic sequencing has made available extensive data from which, in principle, one can infer the underlying structure on which biological functions rely. In this paper, we present a new method aimed at the extraction of sites encoding structural and functional properties from a set of protein primary sequences, namely a multiple sequence alignment. The method, called critical variable selection, is based on the idea that subsets of relevant sites correspond to subsequences that occur with a particularly broad frequency distribution in the dataset. By applying this algorithm to in silico sequences, to the response regulator receiver and to the voltage sensor domain of ion channels, we show that this procedure recovers not only the information encoded in single site statistics and pairwise correlations but also captures dependencies going beyond pairwise correlations. The method proposed here is complementary to statistical coupling analysis, in that the most relevant sites predicted by the two methods differ markedly. We find robust and consistent results for datasets as small as few hundred sequences that reveal a hidden hierarchy of sites that are consistent with the present knowledge on biologically relevant sites and evolutionary dynamics. This suggests that critical variable selection is capable of identifying a core of sites encoding functional and structural information in a multiple sequence alignment. PMID:26974515
Sudre, Carole H; Cardoso, M Jorge; Bouvy, Willem H; Biessels, Geert Jan; Barnes, Josephine; Ourselin, Sebastien
2015-10-01
In neuroimaging studies, pathologies can present themselves as abnormal intensity patterns. Thus, solutions for detecting abnormal intensities are currently under investigation. As each patient is unique, an unbiased and biologically plausible model of pathological data would have to be able to adapt to the subject's individual presentation. Such a model would provide the means for a better understanding of the underlying biological processes and improve one's ability to define pathologically meaningful imaging biomarkers. With this aim in mind, this work proposes a hierarchical fully unsupervised model selection framework for neuroimaging data which enables the distinction between different types of abnormal image patterns without pathological a priori knowledge. Its application on simulated and clinical data demonstrated the ability to detect abnormal intensity clusters, resulting in a competitive to improved behavior in white matter lesion segmentation when compared to three other freely-available automated methods. PMID:25850086
International Nuclear Information System (INIS)
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm-1) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic techniques
Finley, A. O.; Banerjee, S.; Cook, B. D.
2010-12-01
Recent advances in remote sensing, specifically waveform Light Detection and Ranging (LiDAR) sensors, provide the data needed to quantify forest variables at a fine spatial resolution over large domains. Of particular interest is LiDAR data from NASA's Laser Vegetation Imaging Sensor (LVIS), upcoming Deformation, Ecosystem Structure, and Dynamics of Ice (DESDynI) missions, and NSF's National Ecological Observatory Network planned Airborne Observation Platform. A central challenge to using these data is to couple field measurements of forest variables (e.g., species, indices of structural complexity, light competition, or drought stress) with the high-dimensional LiDAR signal through a model, which allows prediction of the tree-level variables at locations where only the remotely sensed data are available. It is common to model the high-dimensional signal vector as a mixture of a relatively small number of Gaussian distributions. The parameters from these Gaussian distributions, or indices derived from the parameters, can then be used as regressors in a regression model. These approaches retain only a small amount of information contained in the signal. Further, it is not known a priori which features of the signal explain the most variability in the response variables. It is possible to fully exploit the information in the signal by treating it as an object; thus, we define a framework to couple a spatial latent factor model with forest variables using a fully Bayesian functional spatial data analysis. Our proposed modeling framework explicitly: 1) reduces the dimensionality of signals in an optimal way (i.e., preserves the information that describes the maximum variability in response variable); 2) propagates uncertainty in data and parameters through to prediction; and 3) acknowledges and leverages spatial dependence among the regressors and model residuals to meet statistical assumptions and improve prediction. The proposed modeling framework is
Wöhling, T.; Schöniger, A.; Geiges, A.; Nowak, W.; Gayler, S.
2013-12-01
The objective selection of appropriate models for realistic simulations of coupled soil-plant processes is a challenging task since the processes are complex, not fully understood at larger scales, and highly non-linear. Also, comprehensive data sets are scarce, and measurements are uncertain. In the past decades, a variety of different models have been developed that exhibit a wide range of complexity regarding their approximation of processes in the coupled model compartments. We present a method for evaluating experimental design for maximum confidence in the model selection task. The method considers uncertainty in parameters, measurements and model structures. Advancing the ideas behind Bayesian Model Averaging (BMA), we analyze the changes in posterior model weights and posterior model choice uncertainty when more data are made available. This allows assessing the power of different data types, data densities and data locations in identifying the best model structure from among a suite of plausible models. The models considered in this study are the crop models CERES, SUCROS, GECROS and SPASS, which are coupled to identical routines for simulating soil processes within the modelling framework Expert-N. The four models considerably differ in the degree of detail at which crop growth and root water uptake are represented. Monte-Carlo simulations were conducted for each of these models considering their uncertainty in soil hydraulic properties and selected crop model parameters. Using a Bootstrap Filter (BF), the models were then conditioned on field measurements of soil moisture, matric potential, leaf-area index, and evapotranspiration rates (from eddy-covariance measurements) during a vegetation period of winter wheat at a field site at the Swabian Alb in Southwestern Germany. Following our new method, we derived model weights when using all data or different subsets thereof. We discuss to which degree the posterior mean outperforms the prior mean and all
Bayesian model averaging to explore the worth of data for soil-plant model selection and prediction
Wöhling, Thomas; Schöniger, Anneli; Gayler, Sebastian; Nowak, Wolfgang
2015-04-01
A Bayesian model averaging (BMA) framework is presented to evaluate the worth of different observation types and experimental design options for (1) more confidence in model selection and (2) for increased predictive reliability. These two modeling tasks are handled separately because model selection aims at identifying the most appropriate model with respect to a given calibration data set, while predictive reliability aims at reducing uncertainty in model predictions through constraining the plausible range of both models and model parameters. For that purpose, we pursue an optimal design of measurement framework that is based on BMA and that considers uncertainty in parameters, measurements, and model structures. We apply this framework to select between four crop models (the vegetation components of CERES, SUCROS, GECROS, and SPASS), which are coupled to identical routines for simulating soil carbon and nitrogen turnover, soil heat and nitrogen transport, and soil water movement. An ensemble of parameter realizations was generated for each model using Monte-Carlo simulation. We assess each model's plausibility by determining its posterior weight, which signifies the probability to have generated a given experimental data set. Several BMA analyses were conducted for different data packages with measurements of soil moisture, evapotranspiration (ETa), and leaf area index (LAI). The posterior weights resulting from the different BMA runs were compared to the weight distribution of a reference run with all data types to investigate the utility of different data packages and monitoring design options in identifying the most appropriate model in the ensemble. We found that different (combinations of) data types support different models and none of the four crop models outperforms all others under all data scenarios. The best model discrimination was observed for those data where the competing models disagree the most. The data worth for reducing prediction
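The posterior model weights mentioned in this abstract follow from Bayes' rule applied at the model level: each weight is proportional to the model's prior probability times its marginal likelihood. The following is a minimal sketch of that normalisation step only. The function name and the log-evidence values are hypothetical and purely illustrative; they are not taken from the study.

```python
import math

def posterior_model_weights(log_evidences, prior_weights=None):
    """Normalise prior-weighted marginal likelihoods into BMA weights.

    log_evidences: log marginal likelihood of the data under each model.
    prior_weights: prior model probabilities (uniform if omitted).
    """
    k = len(log_evidences)
    if prior_weights is None:
        prior_weights = [1.0 / k] * k
    log_post = [le + math.log(pw) for le, pw in zip(log_evidences, prior_weights)]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical log evidences for four competing models (illustrative numbers).
weights = posterior_model_weights([-120.3, -118.9, -125.0, -119.4])
print(weights)
```

Working on the log scale is a common design choice here, because marginal likelihoods of long data records are often too small to represent directly as floating-point numbers.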
Institute of Scientific and Technical Information of China (English)
Peixin ZHAO
2013-01-01
In this paper, we consider the variable selection for the parametric components of varying coefficient partially linear models with censored data. By constructing a penalized auxiliary vector ingeniously, we propose an empirical likelihood based variable selection procedure, and show that it is consistent and satisfies the sparsity. The simulation studies show that the proposed variable selection method is workable.
Noncausal Bayesian Vector Autoregression
DEFF Research Database (Denmark)
Lanne, Markku; Luoto, Jani
We propose a Bayesian inferential procedure for the noncausal vector autoregressive (VAR) model that is capable of capturing nonlinearities and incorporating effects of missing variables. In particular, we devise a fast and reliable posterior simulator that yields the predictive distribution as a...
Institute of Scientific and Technical Information of China (English)
Zheng-yan Lin; Yu-ze Yuan
2012-01-01
Semiparametric models with a diverging number of predictors arise in many contemporary scientific areas. Variable selection for these models consists of two components: model selection for non-parametric components and selection of significant variables for the parametric portion. In this paper, we consider a variable selection procedure by combining basis function approximation with SCAD penalty. The proposed procedure simultaneously selects significant variables in the parametric components and the nonparametric components. With appropriate selection of tuning parameters, we establish the consistency and sparseness of this procedure.
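For readers unfamiliar with the SCAD penalty named in this abstract, the sketch below evaluates the standard smoothly clipped absolute deviation penalty of Fan and Li for a single coefficient. The function name is hypothetical, and the default a = 3.7 is the conventional choice from the general literature, not a value stated in this abstract.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient theta with tuning parameters lam > 0, a > 2."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the lasso (L1) penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2

for value in (0.0, 0.5, 1.0, 2.0, 3.7, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

The flat region for large coefficients is what distinguishes SCAD from the lasso: small effects are shrunk to zero while large effects are left nearly unbiased.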
Birth order and selected work-related personality variables.
Phillips, A S; Bedeian, A G; Mossholder, K W; Touliatos, J
1988-12-01
A possible link between birth order and various individual characteristics (e.g., intelligence, potential eminence, need for achievement, sociability) has been suggested by personality theorists such as Adler for over a century. The present study examines whether birth order is associated with selected personality variables that may be related to various work outcomes. 3 of 7 hypotheses were supported and the effect sizes for these were small. Firstborns scored significantly higher than later borns on measures of dominance, good impression, and achievement via conformity. No differences between firstborns and later borns were found in managerial potential, work orientation, achievement via independence, and sociability. The study's sample consisted of 835 public, government, and industrial accountants responding to a national US survey of accounting professionals. The nature of the sample may have been partially responsible for the results obtained. Its homogeneity may have caused any birth order effects to wash out. It can be argued that successful membership in the accountancy profession requires internalization of a set of prescribed rules and standards. It may be that accountants as a group are locked in to a behavioral framework. Any differentiation would result from spurious interpersonal differences, not from predictable birth-order related characteristics. A final interpretation is that birth order effects are nonexistent or statistical artifacts. Given the present data and particularistic sample, however, the authors have insufficient information from which to draw such a conclusion. PMID:12281942
Flood quantile estimation at ungauged sites by Bayesian networks
Mediero, L.; Santillán, D.; Garrote, L.
2012-04-01
Estimating flood quantiles at a site for which no observed measurements are available is essential for water resources planning and management. Ungauged sites have no observations about the magnitude of floods, but some site and basin characteristics are known. The most common technique used is the multiple regression analysis, which relates physical and climatic basin characteristics to flood quantiles. Regression equations are fitted from flood frequency data and basin characteristics at gauged sites. Regression equations are a rigid technique that assumes linear relationships between variables and cannot take the measurement errors into account. In addition, the prediction intervals are estimated in a very simplistic way from the variance of the residuals in the estimated model. Bayesian networks are a probabilistic computational structure taken from the field of Artificial Intelligence, which have been widely and successfully applied to many scientific fields like medicine and informatics, but application to the field of hydrology is recent. Bayesian networks infer the joint probability distribution of several related variables from observations through nodes, which represent random variables, and links, which represent causal dependencies between them. A Bayesian network is more flexible than regression equations, as it captures non-linear relationships between variables. In addition, the probabilistic nature of Bayesian networks allows taking the different sources of estimation uncertainty into account, as they give a probability distribution as result. A homogeneous region in the Tagus Basin was selected as case study. A regression equation was fitted taking the basin area, the annual maximum 24-hour rainfall for a given recurrence interval and the mean height as explanatory variables. Flood quantiles at ungauged sites were estimated by Bayesian networks. Bayesian networks need to be learnt from a sufficiently large data set. As observational data are reduced, a
The SEDs, Host Galaxies and Environments of Variability Selected AGN in GOODS-S
Villforth, Carolin; Sarajedini, Vicki; Koekemoer, Anton
2012-01-01
Variability selection has been proposed as a powerful tool for identifying both low-luminosity AGN and those with unusual SEDs. However, a systematic study of sources selected in such a way has been lacking. In this paper, we present the multi-wavelength properties of the variability selected AGN in GOODS South. We demonstrate that variability selection indeed reliably identifies AGN, predominantly of low luminosity. We find contamination from stars as well as a very small sample of sources t...
Variable Selection for Varying-Coefficient Models with Missing Response at Random
Institute of Scientific and Technical Information of China (English)
Pei Xin ZHAO; Liu Gen XUE
2011-01-01
In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for varying-coefficient models with missing response at random. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the optimal convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.
Bayesian Analysis Made Simple An Excel GUI for WinBUGS
Woodward, Philip
2011-01-01
From simple NLMs to complex GLMMs, this book describes how to use the GUI for WinBUGS - BugsXLA - an Excel add-in written by the author that allows a range of Bayesian models to be easily specified. With case studies throughout, the text shows how to routinely apply even the more complex aspects of model specification, such as GLMMs, outlier robust models, random effects Emax models, auto-regressive errors, and Bayesian variable selection. It provides brief, up-to-date discussions of current issues in the practical application of Bayesian methods. The author also explains how to obtain free so
Lesaffre, Emmanuel
2012-01-01
The growth of biostatistics has been phenomenal in recent years and has been marked by considerable technical innovation in both methodology and computational practicality. One area that has experienced significant growth is Bayesian methods. The growing use of Bayesian methodology has taken place partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. In addition, computational advances have allowed for more complex models to be fitted routinely to realistic data sets. Through examples, exercises and a combination of introd
Bayesian modeling using WinBUGS
Ntzoufras, Ioannis
2009-01-01
A hands-on introduction to the principles of Bayesian modeling using WinBUGS Bayesian Modeling Using WinBUGS provides an easily accessible introduction to the use of WinBUGS programming techniques in a variety of Bayesian modeling settings. The author provides an accessible treatment of the topic, offering readers a smooth introduction to the principles of Bayesian modeling with detailed guidance on the practical implementation of key principles. The book begins with a basic introduction to Bayesian inference and the WinBUGS software and goes on to cover key topics, including: Markov Chain Monte Carlo algorithms in Bayesian inference Generalized linear models Bayesian hierarchical models Predictive distribution and model checking Bayesian model and variable evaluation Computational notes and screen captures illustrate the use of both WinBUGS as well as R software to apply the discussed techniques. Exercises at the end of each chapter allow readers to test their understanding of the presented concepts and all ...
DEFF Research Database (Denmark)
Widyas, Nuzul; Jensen, Just; Nielsen, Vivi Hunnicke
selected downwards and three lines were kept as controls. Bayesian statistical methods are used to estimate the genetic variance components. Mixed model analysis is modified including mutation effect following the methods by Wray (1990). DIC was used to compare the model. Models including mutation effect...... have better fit compared to the model with only additive effect. Mutation as direct effect contributes 3.18% of the total phenotypic variance. While in the model with interactions between additive and mutation, it contributes 1.43% as direct effect and 1.36% as interaction effect of the total variance...
Institute of Scientific and Technical Information of China (English)
Pei Xin ZHAO; Liu Gen XUE
2011-01-01
In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for semiparametric varying-coefficient partially linear models with missing response at random. The proposed procedure simultaneously selects significant variables in parametric components and nonparametric components. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.
Draper, D.
2001-01-01
© 2012 Springer Science+Business Media, LLC. All rights reserved. Article Outline: Glossary Definition of the Subject and Introduction The Bayesian Statistical Paradigm Three Examples Comparison with the Frequentist Statistical Paradigm Future Directions Bibliography
Gilkey, Kelly M.; Myers, Jerry G.; McRae, Michael P.; Griffin, Elise A.; Kallrui, Aditya S.
2012-01-01
The Exploration Medical Capability project is creating a catalog of risk assessments using the Integrated Medical Model (IMM). The IMM is a software-based system intended to assist mission planners in preparing for spaceflight missions by helping them to make informed decisions about medical preparations and supplies needed for combating and treating various medical events using Probabilistic Risk Assessment. The objective is to use statistical analyses to inform the IMM decision tool with estimated probabilities of medical events occurring during an exploration mission. Because data regarding astronaut health are limited, Bayesian statistical analysis is used. Bayesian inference combines prior knowledge, such as data from the general U.S. population, the U.S. Submarine Force, or the analog astronaut population located at the NASA Johnson Space Center, with observed data for the medical condition of interest. The posterior results reflect the best evidence for specific medical events occurring in flight. Bayes theorem provides a formal mechanism for combining available observed data with data from similar studies to support the quantification process. The IMM team performed Bayesian updates on the following medical events: angina, appendicitis, atrial fibrillation, atrial flutter, dental abscess, dental caries, dental periodontal disease, gallstone disease, herpes zoster, renal stones, seizure, and stroke.
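The kind of Bayesian update described in this abstract can be illustrated with a conjugate Beta-Binomial model, in which prior knowledge is encoded as a Beta distribution and combined with observed event counts. This is a minimal sketch under that assumption: the abstract does not state which likelihood or prior family the IMM team used, and the function names and all numbers below are hypothetical.

```python
def beta_binomial_update(prior_alpha, prior_beta, events, trials):
    """Conjugate update of a Beta(prior_alpha, prior_beta) prior on an event probability.

    events: number of observed occurrences; trials: number of observation units.
    Returns the parameters of the Beta posterior.
    """
    post_alpha = prior_alpha + events
    post_beta = prior_beta + (trials - events)
    return post_alpha, post_beta

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical example: a prior worth 1000 pseudo-observations with mean 0.002,
# updated with 3 events observed in 500 units (illustrative numbers only).
alpha, beta = beta_binomial_update(2, 998, events=3, trials=500)
print(alpha, beta, beta_mean(alpha, beta))
```

With limited observed data, the posterior mean stays close to the prior mean, which is why the choice of prior population matters in sparse-data settings such as this one.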
Relationships of Selected Personal and Social Variables in Conforming Judgment
Long, Huey B.
1970-01-01
To help determine relationships between certain personality variables and conforming judgment, and differences in conforming judgments among differently structured groups, prison inmates were studied for the personality variables of IQ (California Capacity Questionnaire), agreement response set (Couch and Kenniston Scale), and dogmatism (Form E,…
Bayesian Peak Picking for NMR Spectra
Cheng, Yichen
2014-02-01
Protein structure determination is a very important topic in structural genomics, which helps people to understand varieties of biological functions such as protein-protein interactions, protein–DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) has often been used to determine the three-dimensional structures of protein in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is cast as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using a Bayesian method.
Ciarleglio, Maria M; Arendt, Christopher D; Makuch, Robert W; Peduzzi, Peter N
2015-03-01
Specification of the treatment effect that a clinical trial is designed to detect (θA) plays a critical role in sample size and power calculations. However, no formal method exists for using prior information to guide the choice of θA. This paper presents a hybrid classical and Bayesian procedure for choosing an estimate of the treatment effect to be detected in a clinical trial that formally integrates prior information into this aspect of trial design. The value of θA is found that equates the pre-specified frequentist power and the conditional expected power of the trial. The conditional expected power averages the traditional frequentist power curve using the conditional prior distribution of the true unknown treatment effect θ as the averaging weight. The Bayesian prior distribution summarizes current knowledge of both the magnitude of the treatment effect and the strength of the prior information through the assumed spread of the distribution. By using a hybrid classical and Bayesian approach, we are able to formally integrate prior information on the uncertainty and variability of the treatment effect into the design of the study, mitigating the risk that the power calculation will be overly optimistic while maintaining a frequentist framework for the final analysis. The value of θA found using this method may be written as a function of the prior mean μ0 and standard deviation τ0, with a unique relationship for a given ratio of μ0/τ0. Results are presented for Normal, Uniform, and Gamma priors for θ. PMID:25583273
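The averaging step described in this abstract, in which the frequentist power curve is weighted by a prior on the treatment effect, can be sketched numerically. The code below assumes a one-sided z-test, a Normal prior, and that "conditional" means restricting the prior to positive effects. These are illustrative assumptions rather than details given in the abstract, and the sketch does not solve for θA itself. All names and numbers are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def power(theta, se, z_alpha=1.6449):
    """Power of a one-sided z-test (alpha = 0.05) at true effect theta."""
    return norm_cdf(theta / se - z_alpha)

def conditional_expected_power(mu0, tau0, se, n_grid=20000):
    """Average the power curve over a Normal(mu0, tau0) prior restricted to theta > 0."""
    upper = mu0 + 8.0 * tau0          # truncate the integral far into the tail
    h = upper / n_grid
    num = 0.0
    den = 0.0
    for i in range(n_grid + 1):
        theta = i * h
        w = 0.5 if i in (0, n_grid) else 1.0   # trapezoid rule weights
        dens = norm_pdf(theta, mu0, tau0)
        num += w * power(theta, se) * dens
        den += w * dens
    return num / den                   # the grid spacing h cancels in the ratio

mu0, tau0 = 1.0, 0.5                   # hypothetical prior mean and standard deviation
se = mu0 / (1.6449 + 1.2816)           # standard error giving 90% power at theta = mu0
print(power(mu0, se), conditional_expected_power(mu0, tau0, se))
```

In this hypothetical setting the prior-averaged power is noticeably lower than the 90% power computed at the prior mean, which illustrates the abstract's point that a conventional power calculation can be overly optimistic when uncertainty about the effect is ignored.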
Random Forests for Ordinal Response Data: Prediction and Variable Selection
Janitza, Silke; Tutz, Gerhard; Boulesteix, Anne-Laure
2014-01-01
The random forest method is a commonly used tool for classification with high-dimensional data that is able to rank candidate predictors through its inbuilt variable importance measures (VIMs). It can be applied to various kinds of regression problems including nominal, metric and survival response variables. While classification and regression problems using random forest methodology have been extensively investigated in the past, there seems to be a lack of literature on handling ordinal re...
Regression Analysis with Block Missing Values and Variables Selection
Directory of Open Access Journals (Sweden)
Chien-Pai Han
2011-07-01
We consider a regression model when a block of observations is missing, i.e. there is a group of observations with all the explanatory variables or covariates observed and another set of observations with only a block of the variables observed. We propose an estimator of the regression coefficients that is a combination of two estimators, one based on the observations with no missing variables, and the other based on all observations after deleting the block of variables with missing values. The proposed combined estimator will be compared with the uncombined estimators. If the experimenter suspects that the variables with missing values may be deleted, a preliminary test will be performed to resolve the uncertainty. If the preliminary test of the null hypothesis that the regression coefficients of the variables with missing values equal zero is accepted, then only the data with no missing values are used for estimating the regression coefficients. Otherwise the combined estimator is used. This gives a preliminary test estimator. The properties of the preliminary test estimator and comparisons of the estimators are studied by a Monte Carlo study.
Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.
2013-01-01
In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.
Bayesian phylogeography finds its roots.
Directory of Open Access Journals (Sweden)
Philippe Lemey
2009-09-01
As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in a Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will make an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.
A Simple Method for Variable Selection in Regression with Respect to Treatment Selection
Directory of Open Access Journals (Sweden)
Lacey Gunter
2011-09-01
In this paper, we compare the method of Gunter et al. (2011) for variable selection in treatment comparison analysis (an approach to regression analysis where treatment-covariate interactions are deemed important) with a simple stepwise selection method that we introduce. The stepwise method has several advantages, most notably its generalization to regression models that are not necessarily linear, its simplicity and its intuitive nature. We show that the new simple method works surprisingly well compared to the more complex method when compared in the linear regression framework. We use four generative models (explicitly detailed in the paper) for the simulations and compare spuriously identified interactions and, where applicable (generative models 3 and 4), correctly identified interactions. We also apply the new method to logistic regression and Poisson regression and illustrate its performance in Table 2 in the paper. The simple method can be applied to other types of regression models including various other generalized linear models, Cox proportional hazard models and nonlinear models.
Adaptive Dynamic Bayesian Networks
Energy Technology Data Exchange (ETDEWEB)
Ng, B M
2007-10-26
A discrete-time Markov process can be compactly modeled as a dynamic Bayesian network (DBN)--a graphical model with nodes representing random variables and directed edges indicating causality between variables. Each node has a probability distribution, conditional on the variables represented by the parent nodes. A DBN's graphical structure encodes fixed conditional dependencies between variables. But in real-world systems, conditional dependencies between variables may be unknown a priori or may vary over time. Model errors can result if the DBN fails to capture all possible interactions between variables. Thus, we explore the representational framework of adaptive DBNs, whose structure and parameters can change from one time step to the next: a distribution's parameters and its set of conditional variables are dynamic. This work builds on recent work in nonparametric Bayesian modeling, such as hierarchical Dirichlet processes, infinite-state hidden Markov networks and structured priors for Bayes net learning. In this paper, we will explain the motivation for our interest in adaptive DBNs, show how popular nonparametric methods are combined to formulate the foundations for adaptive DBNs, and present preliminary results.
Tibbetts, Elizabeth A
2004-01-01
The ability to recognize individuals is common in animals; however, we know little about why the phenotypic variability necessary for individual recognition has evolved in some animals but not others. One possibility is that natural selection favours variability in some social contexts but not in others. Polistes fuscatus wasps have variable facial and abdominal markings used for individual recognition within their complex societies. Here, I explore whether social behaviour can select for var...
VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES
Institute of Scientific and Technical Information of China (English)
(no author listed)
2006-01-01
A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Variable selection in functional data classification: a maxima-hunting proposal
Berrendero, José R.; Cuevas, Antonio; Torrecilla, José L.
2013-01-01
Variable selection is considered in the setting of supervised binary classification with functional data $\{X(t),\ t\in[0,1]\}$. By "variable selection" we mean any dimension-reduction method which replaces the whole trajectory $\{X(t),\ t\in[0,1]\}$ with a low-dimensional vector $(X(t_1),\ldots,X(t_k))$ while keeping a similar classification error. Our proposal for variable selection is based on the idea of selecting the local maxima $(t_1,\ldots,t_k)$ of the function ${\mathcal V}_...
Directory of Open Access Journals (Sweden)
Gil Luiz HS
2010-11-01
Abstract Background Plasmodium vivax malaria is a major public health challenge in Latin America, Asia and Oceania, with 130-435 million clinical cases per year worldwide. Invasion of host blood cells by P. vivax mainly depends on a type I membrane protein called Duffy binding protein (PvDBP). The erythrocyte-binding motif of PvDBP is a 170 amino-acid stretch located in its cysteine-rich region II (PvDBPII), which is the most variable segment of the protein. Methods To test whether diversifying natural selection has shaped the nucleotide diversity of PvDBPII in Brazilian populations, this region was sequenced in 122 isolates from six different geographic areas. A Bayesian method was applied to test for the action of natural selection under a population genetic model that incorporates recombination. The analysis was integrated with a structural model of PvDBPII, and T- and B-cell epitopes were localized on the 3-D structure. Results The results suggest that: (i) recombination plays an important role in determining the haplotype structure of PvDBPII, and (ii) PvDBPII appears to contain neutrally evolving codons as well as codons evolving under natural selection. Diversifying selection preferentially acts on sites identified as epitopes, particularly on amino acid residues 417, 419, and 424, which show strong linkage disequilibrium. Conclusions This study shows that some polymorphisms of PvDBPII are present near the erythrocyte-binding domain and might serve to elude antibodies that inhibit cell invasion. Therefore, these polymorphisms should be taken into account when designing vaccines aimed at eliciting antibodies to inhibit erythrocyte invasion.
High-Dimensional Non-Linear Variable Selection through Hierarchical Kernel Learning
Bach, Francis
2009-01-01
We consider the problem of high-dimensional non-linear variable selection for supervised learning. Our approach is based on performing linear selection among exponentially many appropriately defined positive definite kernels that characterize non-linear interactions between the original variables. To select efficiently from these many kernels, we use the natural hierarchical structure of the problem to extend the multiple kernel learning framework to kernels that can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a graph-adapted sparsity-inducing norm, in polynomial time in the number of selected kernels. Moreover, we study the consistency of variable selection in high-dimensional settings, showing that under certain assumptions, our regularization framework allows a number of irrelevant variables which is exponential in the number of observations. Our simulations on synthetic datasets and datasets from the UCI repository show state-of-the-art pre...
Optical variability of X-ray-selected QSOs
International Nuclear Information System (INIS)
Photometric data for ten X-ray-selected quasistellar objects have been obtained from archival records of the Rosemary Hill Observatory. Reliable magnitudes were obtained for seven of the ten sources and six displayed optical variations significant at the 95 percent confidence level or greater. One source appeared to exhibit optically violent behavior. Light curves and photographic magnitudes are presented and discussed. 22 references
GOUR BANDYOPADHYAY
2013-01-01
The study uses correlation and regression analysis to examine the impact of Non Performing Assets (NPA) on selected banking variables in two Public Sector Banks (PSBs) in India. Initially to examine degree of association between the strategic banking variables identified, simple correlation co-efficients have been computed and their significance examined. For the purpose of examining impact of NPA on the profitability and other strategic banking variables including time variable, simple li...
Bayesian Methods and Universal Darwinism
Campbell, John
2010-01-01
Bayesian methods since the time of Laplace have been understood by their practitioners as closely aligned to the scientific method. Indeed a recent champion of Bayesian methods, E. T. Jaynes, titled his textbook on the subject Probability Theory: the Logic of Science. Many philosophers of science including Karl Popper and Donald Campbell have interpreted the evolution of Science as a Darwinian process consisting of a 'copy with selective retention' algorithm abstracted from Darwin's theory of...
Cameron, Ewan
2013-01-01
In the second paper of this series we extend our Bayesian reanalysis of the evidence for a cosmic variation of the fine structure constant to the semi-parametric modelling regime. By adopting a mixture of Dirichlet processes prior for the unexplained errors in each instrumental subgroup of the benchmark quasar dataset we go some way towards freeing our model selection procedure from the apparent subjectivity of a fixed distributional form. Despite the infinite-dimensional domain of the error hierarchy so constructed we are able to demonstrate a recursive scheme for marginal likelihood estimation with prior-sensitivity analysis directly analogous to that presented in Paper I, thereby allowing the robustness of our posterior Bayes factors to hyper-parameter choice and model specification to be readily verified. In the course of this work we elucidate various similarities between unexplained error problems in the seemingly disparate fields of astronomy and clinical meta-analysis, and we highlight a number of sop...
Estimating a positive false discovery rate for variable selection in pharmacogenetic studies.
Li, Lang; Hui, Siu; Pennello, Gene; Desta, Zeruesenay; Todd, Skaar; Nguyen, Anne; Flockhart, David
2007-01-01
Selecting predictors to optimize the outcome prediction is an important statistical method. However, it usually ignores the false positives in the selected predictors. In this paper, we develop a positive false discovery rate (pFDR) estimate for a conventional step-wise forward variable selection procedure. We propose two views of a variable selection process, an overall and an individual test. An interesting feature of the overall test is that its power of selecting non-null predictors increases with the proportion of non-null predictors among all candidate predictors. Data analysis is illustrated with a pharmacogenetics example. PMID:17885872
DEFF Research Database (Denmark)
Ödman, Peter; Johansen, C.L.; Olsson, L.;
2010-01-01
of biomass and substrate (casamino acids) concentrations, respectively. The effect of combination of fluorescence and gas analyzer data as well as of different variable selection methods was investigated. Improved prediction models were obtained by combination of data from the two sensors and by...... variable selection using a genetic algorithm, interval PLS, and the principal variables method, respectively. A stepwise variable elimination method was applied to the three-way fluorescence data, resulting in simpler and more accurate N-PLS models. The prediction models were validated using leave...
Kirstein, Roland
2005-01-01
This paper presents a modification of the inspection game: The "Bayesian Monitoring" model rests on the assumption that judges are interested in enforcing compliant behavior and making correct decisions. They may base their judgements on an informative but imperfect signal which can be generated costlessly. In the original inspection game, monitoring is costly and generates a perfectly informative signal. While the inspection game has only one mixed strategy equilibrium, three Perfect Bayesia...
Variability-based active galactic nucleus selection using image subtraction in the SDSS and LSST era
Energy Technology Data Exchange (ETDEWEB)
Choi, Yumi; Gibson, Robert R.; Becker, Andrew C.; Ivezić, Željko; Connolly, Andrew J.; Ruan, John J.; Anderson, Scott F. [Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195 (United States); MacLeod, Chelsea L., E-mail: ymchoi@astro.washington.edu [Physics Department, U.S. Naval Academy, 572 Holloway Road, Annapolis, MD 21402 (United States)
2014-02-10
With upcoming all-sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based active galactic nucleus (AGN) selection will enable the construction of highly complete catalogs with minimum contamination. In this study, we generate g-band difference images and construct light curves (LCs) for QSO/AGN candidates listed in Sloan Digital Sky Survey Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels at identifying variable sources embedded in complex or blended emission regions such as Type II AGNs and other low-luminosity AGNs that may be omitted from traditional photometric or spectroscopic catalogs. To separate QSOs/AGNs from other sources using our difference image LCs, we explore several LC statistics and parameterize optical variability by the characteristic damping timescale (τ) and variability amplitude. By virtue of distinguishable variability parameters of AGNs, we are able to select them with high completeness of 93.4% and efficiency (i.e., purity) of 71.3%. Based on optical variability, we also select highly variable blazar candidates, whose infrared colors are consistent with known blazars. One-third of them are also radio detected. With the X-ray selected AGN candidates, we probe the optical variability of X-ray detected optically extended sources using their difference image LCs for the first time. A combination of optical variability and X-ray detection enables us to select various types of host-dominated AGNs. Contrary to the AGN unification model prediction, two Type II AGN candidates (out of six) show detectable variability on long-term timescales like typical Type I AGNs. This study will provide a baseline for future optical variability studies of extended sources.
Resting high frequency heart rate variability selectively predicts cooperative behavior.
Beffara, Brice; Bret, Amélie G; Vermeulen, Nicolas; Mermillod, Martial
2016-10-01
This study explores whether the vagal connection between the heart and the brain is involved in prosocial behaviors. The Polyvagal Theory postulates that vagal activity underlies prosocial tendencies. Even if several results suggest that vagal activity is associated with prosocial behaviors, none of them used behavioral measures of prosociality to establish this relationship. We recorded the resting state vagal activity (reflected by High Frequency Heart Rate Variability, HF-HRV) of 48 (42 suitable for analysis) healthy human adults and measured their level of cooperation during a hawk-dove game. We also manipulated the consequence of mutual defection in the hawk-dove game (severe vs. moderate). Results show that HF-HRV is positively and linearly related to cooperation level, but only when the consequence of mutual defection is severe (compared to moderate). This supports the view that (i) prosocial behaviors are likely to be underpinned by vagal functioning and (ii) the physiological disposition to cooperate interacts with environmental context. We discuss these results within the theoretical framework of the Polyvagal Theory. PMID:27343804
3D Bayesian contextual classifiers
DEFF Research Database (Denmark)
Larsen, Rasmus
2000-01-01
We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.
Galea, J. M.; Ruge, D.; Buijink, A.; Bestmann, S.; Rothwell, J. C.
2013-01-01
Action selection describes the high-level process which selects between competing movements. In animals, behavioural variability is critical for the motor exploration required to select the action which optimizes reward and minimizes cost/punishment, and is guided by dopamine (DA). The aim of this study was to test in humans whether low-level movement parameters are affected by punishment and reward in ways similar to high-level action selection. Moreover, we addressed the proposed dependence...
Variable selection methods in PLS regression - a comparison study on metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach;
integrated approach. Due to the high number of variables in data sets (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need to be related. Variable selection (or removal of irrelevant....... The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using LC-MS based metabolomic approach. References 1. Lê Cao KA, Rossouw D, Robert-Granié C, Besse P: A Sparse PLS for Variable Selection when......Partial least squares regression (PLSR) has been applied to various fields such as psychometrics, consumer science, econometrics and process control. Recently it has been applied to metabolomics based data sets (GC/LC-MS, NMR) and proven to be a very powerful in situations with many variables...
Discriminative variable selection for clustering with the sparse Fisher-EM algorithm
Bouveyron, Charles
2012-01-01
The interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through $\ell_{1}$-type penalizations. Experimental comparisons with existing approach...
Self-selection for personality variables among healthy volunteers.
Pieters, M S; Jennekens-Schinkel, A; Schoemaker, H C; Cohen, A F
1992-01-01
1. Healthy student volunteers (n = 103) participating in ongoing clinical pharmacological research completed the Dutch Personality Inventory (DPI), the Dutch version of the Spielberger State-Trait Anxiety Inventory (STAI-DY) and the Dutch version of the Sensation Seeking Scale (SSS). 2. The volunteers were more extrovert (P < 0.001), more flexible (P < 0.001), more tolerant or less impulsive (P < 0.001), had more self-confidence and initiative (P < 0.001), and were more satisfied and optimistic (P < 0.01) when compared with the general norm. When compared with a student norm, volunteers had lower levels of state (P < 0.001) and trait (P < 0.05) anxiety. The general sensation seeking tendency of volunteers was higher than in the student norm group (P < 0.001). The volunteers had a greater tendency to thrill-and-adventure-seeking (P < 0.001) and to disinhibition (P < 0.01). 3. Hence, volunteers were a selected sample of the total population of students. This may influence the interpretation of pharmacokinetic and pharmacodynamic parameters. 4. Personality screening should be added to the screening procedures for volunteers. PMID:1540478
EFFECT OF ASANA PRACTICES AND BRISK WALKING ON SELECTED PSYCHOLOGICAL VARIABLES AMONG DIABETIC WOMEN
Sabarinathan, J; D. Sakthignanavel
2015-01-01
The purpose of the study was to find out the effect of asana practices and brisk walking on selected psychological variables among diabetic women. The study was conducted on sixty diabetic women. Totally three groups, namely, control and Experimental group I &II consisting of 20 diabetic women who underwent eight weeks practice in selected asana practices and brisk walking whereas the control group did not undergo any type of training. The psychological variables in anxiety, se...
Stock market reaction to selected macroeconomic variables in the Nigerian economy
Abraham, Terfa Williams
2011-01-01
This study examines the relationship between the stock market and selected macroeconomic variables in Nigeria. The all share index was used as a proxy for the stock market while inflation, interest and exchange rates were the macroeconomic variables selected. Employing an error correction model, it was found that a significant negative short run relationship exists between the stock market and the minimum rediscounting rate (MRR), implying that a decrease in the MRR would improve the performanc...
Predictive modeling with high-dimensional data streams: an on-line variable selection approach
McWilliams, Brian; Montana, Giovanni
2009-01-01
In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition which can be efficiently updated on-line and in an adaptive fashion. Simul...
Sparse partial least squares for on-line variable selection in multivariate data streams
McWilliams, Brian; Montana, Giovanni
2009-01-01
In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition which can be efficiently updated on-line and in an adaptive fashion. Simulation results based on art...
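As a rough illustration of how a sparse singular value decomposition can select variables for one latent factor, the sketch below alternates a soft-thresholding step with a normalisation step on a small matrix M, which stands in for a cross-covariance matrix such as X^T Y. This is a generic batch scheme, not the on-line algorithm of the abstract; the names soft_threshold and sparse_rank1, the penalty lam and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry of v towards zero by lam; small entries become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_rank1(M, lam, n_iter=100):
    """Rank-1 factor of M with a sparse unit-norm left vector u (assumed scheme)."""
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    v = vt[0]                       # start from the leading right singular vector
    u = np.zeros(M.shape[0])
    for _ in range(n_iter):
        u_new = soft_threshold(M @ v, lam)
        norm = np.linalg.norm(u_new)
        if norm == 0:               # penalty too large: every loading was shrunk away
            return u_new, v
        u = u_new / norm
        v = M.T @ u
        v = v / np.linalg.norm(v)
    return u, v

# Toy stand-in for X^T Y: rows 0 and 1 carry signal, the remaining rows are near zero.
M = np.outer([5.0, 4.0, 0.1, -0.1, 0.05], [1.0, 1.0])
u, v = sparse_rank1(M, lam=1.0)
selected = np.flatnonzero(u)        # indices of variables kept for this factor
```

The nonzero entries of u mark the variables retained for the factor; a larger lam keeps fewer of them.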
Variability-selected low luminosity AGNs in the SA57 and in the CDFS
Vagnetti, F; Trevese, D
2009-01-01
Low Luminosity Active Galactic Nuclei (LLAGNs) are contaminated by the light of their host galaxies, thus they cannot be detected by the usual colour techniques. For this reason their evolution in cosmic time is poorly known. Variability is a property shared by virtually all active galactic nuclei, and it was adopted as a criterion to select them using multi epoch surveys. Here we report on two variability surveys in different sky areas, the Selected Area 57 and the Chandra Deep Field South.
M. Dhanalakshmi; Grace Helina; Senthilkumar
2015-01-01
The aim of this study was to find out the comparative effects of aerobics and resistance exercises on selected physiological variables among obese children. To achieve the purpose, 60 Obese children, whose BMI was greater than 30 kg/m2 were randomly selected and assigned into three groups, aerobics exercises group (AEG), resistance training group (RTG) and Control group (CG) consisting of 20 in each. After assessing physiological variables, forced vital capacity and resting heart rate init...
Variable selection in multiple linear regression: The influence of individual cases
Directory of Open Access Journals (Sweden)
SJ Steel
2007-12-01
The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calculated. It is argued that the selection procedure may be improved by taking the selection influence of individual data cases into account.
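One possible reading of the selection influence described above can be sketched as follows: fit every subset of predictors, take the smallest AIC, then repeat with each case left out and record the relative change. The exact definition used in the paper may differ; the Gaussian AIC formula, the all-subsets search, the function names and the simulated data (with an artificial outlier at index 5) are assumptions made for illustration.

```python
import itertools
import numpy as np

def aic_linear(X, y, cols):
    """Gaussian AIC (up to a constant) of an OLS fit with intercept on the given columns."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def best_subset_aic(X, y):
    """Smallest AIC over all subsets of predictors, including the empty subset."""
    p = X.shape[1]
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(p), k) for k in range(p + 1))
    return min(aic_linear(X, y, cols) for cols in subsets)

def selection_influence(X, y):
    """Relative change in the best AIC when each case is omitted in turn."""
    full = best_subset_aic(X, y)
    keep = np.ones(len(y), dtype=bool)
    influence = np.empty(len(y))
    for i in range(len(y)):
        keep[i] = False
        influence[i] = (best_subset_aic(X[keep], y[keep]) - full) / abs(full)
        keep[i] = True
    return influence

# Simulated data (hypothetical): only the first predictor matters; case 5 is an outlier.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 2.0 * X[:, 0] + rng.normal(size=30)
y[5] += 25.0
influence = selection_influence(X, y)
```

Cases with a large absolute value in influence are those whose removal changes the selection criterion the most. The all-subsets search is only practical for a handful of predictors.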
Hill Steven M; Neve Richard M; Bayani Nora; Kuo Wen-Lin; Ziyad Safiyyah; Spellman Paul T; Gray Joe W; Mukherjee Sach
2012-01-01
Abstract Background An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary informatio...
Zhu, Xiang-Wei; Xin, Yan-Jun; Ge, Hui-Lin
2015-04-27
Variable selection is of crucial significance in QSAR modeling since it increases the model predictive ability and reduces noise. The selection of the right variables is far more complicated than the development of predictive models. In this study, eight continuous and categorical data sets were employed to explore the applicability of two distinct variable selection methods: random forests (RF) and the least absolute shrinkage and selection operator (LASSO). Variable selection was performed (1) by using recursive random forests to rule out a quarter of the least important descriptors at each iteration and (2) by using LASSO modeling with 10-fold inner cross-validation to tune its penalty λ for each data set. Along with regular statistical parameters of model performance, we proposed the highest pairwise correlation rate, average pairwise Pearson's correlation coefficient, and Tanimoto coefficient to evaluate the optimal variable sets selected by RF and LASSO in an extensive way. Results showed that variable selection could allow a tremendous reduction of noisy descriptors (at most 96% with the RF method in this study) and apparently enhance the model's predictive performance as well. Furthermore, random forests tended to gather important predictors without restricting their pairwise correlation, which is contrary to LASSO. The mutual exclusion of highly correlated variables in LASSO modeling tends to skip important variables that are highly related to response endpoints and thus undermine the model's predictive performance. The optimal variables selected by RF share low similarity with those selected by LASSO (e.g., the Tanimoto coefficients were smaller than 0.20 in seven out of eight data sets). We found that the differences between RF and LASSO predictive performances mainly resulted from the variables selected by different strategies rather than the learning algorithms. Our study showed that the right selection of variables is more important than the learning algorithm for modeling. We hope
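The Tanimoto coefficient used above to compare the two selected variable sets can be illustrated with a minimal sketch, assuming the usual set-based form (size of the intersection divided by size of the union); the abstract does not spell out the exact formula. The descriptor names and the convention for two empty sets are hypothetical.

```python
def tanimoto(selected_a, selected_b):
    """Tanimoto (Jaccard) coefficient between two sets of selected descriptors."""
    a, b = set(selected_a), set(selected_b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty selections count as identical
    return len(a & b) / len(union)

# Hypothetical descriptor names, for illustration only.
rf_selected = ["logP", "MW", "TPSA", "nHDon"]
lasso_selected = ["logP", "nRotB", "AromRings"]
similarity = tanimoto(rf_selected, lasso_selected)  # 1 shared descriptor out of 6 distinct
```

A value below 0.20, as reported for seven of the eight data sets, means that fewer than one in five of the distinct descriptors chosen by either method was chosen by both.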
Probability and Bayesian statistics
1987-01-01
This book contains selected and refereed contributions to the "International Symposium on Probability and Bayesian Statistics" which was organized to celebrate the 80th birthday of Professor Bruno de Finetti at his birthplace Innsbruck in Austria. Since Professor de Finetti died in 1985 the symposium was dedicated to the memory of Bruno de Finetti and took place at Igls near Innsbruck from 23 to 26 September 1986. Some of the papers are published especially because of their relationship to Bruno de Finetti's scientific work. The evolution of stochastics shows growing importance of probability as coherent assessment of numerical values as degrees of belief in certain events. This is the basis for Bayesian inference in the sense of modern statistics. The contributions in this volume cover a broad spectrum ranging from foundations of probability across psychological aspects of formulating subjective probability statements, abstract measure theoretical considerations, contributions to theoretical statistics an...
Bessiere, Pierre; Ahuactzin, Juan Manuel; Mekhnacha, Kamel
2013-01-01
Probability as an Alternative to Boolean Logic. While logic is the mathematical foundation of rational reasoning and the fundamental principle of computing, it is restricted to problems where information is both complete and certain. However, many real-world problems, from financial investments to email filtering, are incomplete or uncertain in nature. Probability theory and Bayesian computing together provide an alternative framework to deal with incomplete and uncertain data. Decision-Making Tools and Methods for Incomplete and Uncertain Data. Emphasizing probability as an alternative to Boolean
Henry de-Graft Acquah; Joseph Acquah
2013-01-01
Alternative formulations of the Bayesian Information Criteria provide a basis for choosing between competing methods for detecting price asymmetry. However, very little is understood about their performance in the asymmetric price transmission modelling framework. In addressing this issue, this paper introduces and applies parametric bootstrap techniques to evaluate the ability of Bayesian Information Criteria (BIC) and Draper's Information Criteria (DIC) in discriminating between alternative...
The Time Domain Spectroscopic Survey: Variable Object Selection and Anticipated Results
Morganson, Eric; Anderson, Scott F; Ruan, John J; Myers, Adam D; Eracleous, Michael; Kelly, Brandon; Badenes, Carlos; Banados, Eduardo; Blanton, Michael R; Bershady, Matthew A; Borissova, Jura; Brandt, William Nielsen; Burgett, William S; Chambers, Kenneth; Draper, Peter W; Davenport, James R A; Flewelling, Heather; Garnavich, Peter; Hawley, Suzanne L; Hodapp, Klaus W; Isler, Jedidah C; Kaiser, Nick; Kinemuchi, Karen; Kudritzki, Rolf P; Metcalfe, Nigel; Morgan, Jeffrey S; Paris, Isabelle; Parvizi, Mahmoud; Poleski, Radoslaw; Price, Paul A; Salvato, Mara; Shanks, Tom; Schlafly, Eddie F; Schneider, Donald P; Shen, Yue; Stassun, Keivan; Tonry, John T; Walter, Fabian; Waters, Chris Z
2015-01-01
We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is an SDSS-IV eBOSS subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and AGN) across 7,500 square degrees selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate (KDE) analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4,000 of which will be RR Lyrae stars which may be used as outer Milky Way probes. The variability-sele...
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection
DEFF Research Database (Denmark)
Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald;
2013-01-01
is often used by chemometricians. In order to evaluate the predictive ability of both methods, cross model validation was implemented. The performance of both methods was assessed using FTIR spectroscopic data, on the one hand, and a set of simulated data. The stability of the variable selection procedures...... was highlighted by the frequency of the selection of each variable in the cross model validation segments. Computationally, Jack-knife PLSR was much faster than Sparse PLSR. But while it was found that both methods have more or less the same predictive ability, Sparse PLSR turned out to be generally very stable......The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PLSR...
A bootstrapping soft shrinkage approach for variable selection in chemical modeling.
Deng, Bai-Chuan; Yun, Yong-Huan; Cao, Dong-Sheng; Yin, Yu-Long; Wang, Wei-Ting; Lu, Hong-Mei; Luo, Qian-Yi; Liang, Yi-Zeng
2016-02-18
In this study, a new variable selection method called bootstrapping soft shrinkage (BOSS) is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined based on the absolute values of regression coefficients. WBS is applied according to the weights to generate sub-models, and MPA is used to analyze the sub-models to update the weights of the variables. The optimization procedure follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The optimal variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected. The method was tested on three groups of near infrared (NIR) spectroscopic datasets, i.e. corn datasets, diesel fuels datasets and soy datasets. Three high performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss. PMID:26826688
Directory of Open Access Journals (Sweden)
GOUR BANDYOPADHYAY
2013-12-01
The study uses correlation and regression analysis to examine the impact of Non Performing Assets (NPA) on selected banking variables in two Public Sector Banks (PSBs) in India. Initially, to examine the degree of association between the strategic banking variables identified, simple correlation coefficients have been computed and their significance examined. To examine the impact of NPA on profitability and other strategic banking variables, including a time variable, simple linear regression and multiple regression (as appropriate) have been attempted. To diagnose multicollinearity in the multiple regressions, the tolerance factor (TOL) and the variance inflating factor (VIF) have also been computed and compared with standard values. The study reveals that NPA has a statistically significant negative impact on profitability and a statistically significant impact on a few strategic banking variables in respect of the two selected PSBs.
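The multicollinearity diagnostics named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definitions TOL_j = 1 - R_j^2 and VIF_j = 1 / TOL_j, where R_j^2 comes from regressing predictor j on the remaining predictors; the function name and the use of NumPy are illustrative assumptions, not taken from the study.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (VIF list, TOL list), one entry per column of the predictor matrix X.

    Assumed definitions: TOL_j = 1 - R_j^2 and VIF_j = 1 / TOL_j, where R_j^2 is
    obtained by regressing column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs, tols = [], []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add intercept column
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        tol = ss_res / ss_tot  # equals 1 - R_j^2
        tols.append(tol)
        vifs.append(1.0 / tol if tol > 0 else float("inf"))
    return vifs, tols

# Hypothetical data: the third predictor is almost a copy of the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
vifs, tols = vif_and_tolerance(np.column_stack([x1, x2, x3]))
print("VIF:", vifs)
print("TOL:", tols)
```

A commonly quoted rule of thumb treats VIF values above about 10 (TOL below about 0.1) as a sign of problematic collinearity; whether the study used that threshold is not stated in the abstract.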
De Smet, Tom; Struys, Michel M. R. F.; Neckebroek, Martine M.; Van den Hauwe, Kristof; Bonte, Sjoert; Mortier, Eric P.
2008-01-01
BACKGROUND: Closed-loop control of the hypnotic component of anesthesia has been proposed in an attempt to optimize drug delivery. Here, we introduce a newly developed Bayesian-based, patient-individualized, model-based, adaptive control method for bispectral index (BIS) guided propofol infusion int
Refining gene signatures: a Bayesian approach
Directory of Open Access Journals (Sweden)
Labbe Aurélie
2009-12-01
Background: In high density arrays, the identification of relevant genes for disease classification is complicated by not only the curse of dimensionality but also the highly correlated nature of the array data. In this paper, we are interested in the question of how many and which genes should be selected for a disease class prediction. Our work consists of a Bayesian supervised statistical learning approach to refine gene signatures with a regularization which penalizes for the correlation between the variables selected. Results: Our simulation results show that we can most often recover the correct subset of genes that predict the class as compared to other methods, even when accuracy and subset size remain the same. On real microarray datasets, we show that our approach can refine gene signatures to obtain either the same or better predictive performance than other existing methods with a smaller number of genes. Conclusions: Our novel Bayesian approach includes a prior which penalizes highly correlated features in model selection and is able to extract key genes in the highly correlated context of microarray data. The methodology in the paper is described in the context of microarray data, but can be applied to any array data (such as micro RNA, for example) as a first step towards predictive modeling of cancer pathways. A user-friendly software implementation of the method is available.
COMPARISON OF SELECTED PHYSIOLOGICAL VARIABLES OF PLAYERS BELONGING TO VARIOUS DISTANCE RUNNERS
Satpal Yadav; Arvind S.Sajwan; Ankan Sinha
2009-01-01
The purpose of the study was to compare selected physiological variables, namely maximum oxygen consumption, vital capacity, resting heart rate and hemoglobin content, among various distance runners. The subjects were selected from the male athletes of Gwalior district among short, middle and long distance runners. Ten (10) male athletes from each group, namely the short, middle and long distance groups, were selected as the subjects for the study. Selec...
EFFECT OF PLYOMETRIC TRAINING ON SELECTED SKILL PERFORMANCE VARIABLES AMONG FEMALE HOCKEY PLAYERS
G. VASANTHI; P. Y. Sivachandran
2014-01-01
The purpose of the study was to find out the effect of plyometric training on selected skill performance variables among female hockey players. To achieve the purpose of the present study, thirty female hockey players were randomly selected from PKR Women College of Arts and Science and Gopi Arts and Science College, Erode district, Tamilnadu, India and their age ranged from 18 to 21 years. The selected subjects were divided into two groups of fifteen subjects in each. Group I ...
A survey of variable selection methods in two Chinese epidemiology journals
Directory of Open Access Journals (Sweden)
Lynn Henry S
2010-09-01
Background: Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods: Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis, e.g. stepwise or individual significance testing of model coefficients; C - first bivariate analyses, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria like prior knowledge or personal judgment. Results: Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163) via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44%) used stepwise procedures, 89 (41%) tested individual regression coefficients, but 33 (15%) did not mention how variables were selected. Sixty percent (58/97) of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions: The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals.
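One reason multiple testing of bivariate associations at the 0.05 level is considered problematic can be shown with a short worked calculation. The sketch below assumes independent tests with every null hypothesis true, a simplification that the review itself does not state; the function name is illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n_tests independent
    tests, assuming every null hypothesis is true (a simplifying assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Chance of at least one spurious "significant" candidate variable.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3))
```

Under these assumptions, screening ten candidate variables at the 5% level already gives roughly a 40% chance that at least one is selected by chance alone.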
Wu, Rui-mei; Zhao, Jie-wen; Chen, Quan-sheng; Huang, Xing-yi
2011-07-01
This paper studies the feasibility of determining the taste quality of green tea using FT-NIR spectroscopy combined with variable selection methods. Chemistry evaluation, as the reference measurement, was used to measure the total taste scores of green tea infusion. First, synergy interval PLS (siPLS) was implemented to select efficient spectral regions from SNV preprocessed spectra; then, optimal variables were selected using a genetic algorithm (GA) from the spectral regions selected by siPLS, and the optimal model was achieved with Rp = 0.8908 and RMSEP = 4.66 in the prediction set when 38 variables and 6 PLS factors were included. Experimental results showed that the performance of the siPLS-GA model was superior to that of the others. This study demonstrated that NIR spectra could be used successfully to measure the taste quality of green tea and that the siPLS-GA algorithm is superior to other algorithms in developing NIR spectral regression models. PMID:21942023
Directory of Open Access Journals (Sweden)
Woosik Jang
2015-01-01
Since the 1970s, revenues generated by Korean contractors in international construction have increased rapidly, exceeding USD 70 billion per year in recent years. However, Korean contractors face significant risks from market uncertainty and sensitivity to economic volatility and technical difficulties. As the volatility of these risks threatens project profitability, approximately 15% of bad projects were found to account for 74% of losses from the same international construction sector. Anticipating bad projects via preemptive risk management can better prevent losses so that contractors can enhance the efficiency of bidding decisions during the early stages of a project cycle. In line with these objectives, this paper examines the effect of such factors on the degree of project profitability. The Naïve Bayesian classifier is applied to identify a good project screening tool, which increases practical applicability using binomial variables with limited information that is obtainable in the early stages. The proposed model produced superior classification results that adequately reflect contractor views of risk. It is anticipated that when users apply the proposed model based on their own knowledge and expertise, overall firm profit rates will increase as a result of early abandonment of bad projects as well as the prioritization of good projects before final bidding decisions are made.
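How a naive Bayesian classifier works on binary (yes/no) variables can be sketched as follows. This is a generic Bernoulli naive Bayes with Laplace smoothing, not the paper's model: the feature meanings, labels, smoothing constant and toy data are all hypothetical.

```python
import math

def fit_bernoulli_nb(X, y, smoothing=1.0):
    """Fit a naive Bayes model on 0/1 features.

    X: list of equal-length lists of 0/1 values; y: list of class labels.
    Returns {label: (prior, [P(feature_j = 1 | label), ...])}.
    """
    model = {}
    n_features = len(X[0])
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(y)
        # Laplace-smoothed estimate of P(feature_j = 1 | label)
        probs = [
            (sum(r[j] for r in rows) + smoothing) / (len(rows) + 2.0 * smoothing)
            for j in range(n_features)
        ]
        model[label] = (prior, probs)
    return model

def predict_proba(model, x):
    """Posterior class probabilities for one 0/1 feature vector x."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, pj in zip(x, probs):
            score += math.log(pj if xj else 1.0 - pj)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    total = sum(unnorm.values())
    return {lab: v / total for lab, v in unnorm.items()}

# Hypothetical early-stage indicators, e.g. [unfamiliar_market, tight_schedule].
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
y = ["bad", "bad", "bad", "good", "good", "good"]
model = fit_bernoulli_nb(X, y)
print(predict_proba(model, [1, 1]))
```

The independence assumption lets each binary indicator contribute one multiplicative factor, which is why such a classifier can be used with the limited information available before bidding.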
Variable selectivity and the role of nutritional quality in food selection by a planktonic rotifer
International Nuclear Information System (INIS)
To investigate the potential for selective feeding to enhance fitness, I test the hypothesis that an herbivorous zooplankter selects those food items that best support its reproduction. Under this hypothesis, growth and reproduction on selected food items should be higher than on less preferred items. The hypothesis is not supported. In situ selectivity by the rotifer Keratella taurocephala for Cryptomonas relative to Chlamydomonas goes through a seasonal cycle, in apparent response to fluctuating Cryptomonas populations. However, reproduction on a unialgal diet of Cryptomonas is consistently high and similar to that on Chlamydomonas. Oocystis, which also supports reproduction equivalent to that supported by Chlamydomonas, is sometimes rejected by K. taurocephala. In addition, K. taurocephala does not discriminate between Merismopedia and Chlamydomonas even though Merismopedia supports virtually no reproduction by the rotifer. Selection by K. taurocephala does not simply maximize the intake of food items that yield high reproduction. Selectivity is a complex, dynamic process, one function of which may be the exploitation of locally or seasonally abundant foods. (author)
Bayesian Approach to Handling Informative Sampling
Sikov, Anna
2015-01-01
In the case of informative sampling the sampling scheme explicitly or implicitly depends on the response variable. As a result, the sample distribution of the response variable cannot be used for making inference about the population. In this research I investigate the problem of informative sampling from the Bayesian perspective. Application of the Bayesian approach permits solving the problems which arise due to the complexity of the models used for handling informative sampling. The main...
Kuiper, Rebecca M; Nederhoff, Tim; Klugkist, Irene
2015-05-01
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration-based set of hypotheses containing equality constraints on the means, or a theory-based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory-based hypotheses) has advantages over exploration (i.e., examining all possible equality-constrained hypotheses). Furthermore, examining reasonable order-restricted hypotheses has more power to detect the true effect/non-null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory-based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number). PMID:24975402
Universal Darwinism as a process of Bayesian inference
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment". Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description clo...
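The claimed equivalence can be made concrete with one well-known case, offered here as an illustration rather than as the paper's own derivation: the discrete-time replicator equation has the same algebraic form as Bayes' theorem when type frequencies play the role of prior probabilities and fitnesses play the role of likelihoods. Function names and numbers below are illustrative.

```python
def bayes_update(prior, likelihood):
    """Posterior_i = prior_i * likelihood_i / sum_j(prior_j * likelihood_j)."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return [p * l / evidence for p, l in zip(prior, likelihood)]

def replicator_step(freqs, fitness):
    """Next-generation frequency_i = freq_i * fitness_i / mean fitness."""
    mean_fitness = sum(f * w for f, w in zip(freqs, fitness))
    return [f * w / mean_fitness for f, w in zip(freqs, fitness)]

# Three hypothetical types (or hypotheses) with equal starting weight.
start = [1 / 3, 1 / 3, 1 / 3]
fitness = [1.0, 2.0, 3.0]

posterior = bayes_update(start, fitness)
next_gen = replicator_step(start, fitness)
print(posterior)
print(next_gen)  # same numbers: the two update rules coincide term by term
```

In this reading, mean fitness corresponds to the model evidence (the normalizing constant), and repeated generations correspond to sequential updating.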
Bayesian Methods and Universal Darwinism
Campbell, John
2010-01-01
Bayesian methods since the time of Laplace have been understood by their practitioners as closely aligned to the scientific method. Indeed a recent champion of Bayesian methods, E. T. Jaynes, titled his textbook on the subject Probability Theory: the Logic of Science. Many philosophers of science including Karl Popper and Donald Campbell have interpreted the evolution of Science as a Darwinian process consisting of a 'copy with selective retention' algorithm abstracted from Darwin's theory of Natural Selection. Arguments are presented for an isomorphism between Bayesian Methods and Darwinian processes. Universal Darwinism, as the term has been developed by Richard Dawkins, Daniel Dennett and Susan Blackmore, is the collection of scientific theories which explain the creation and evolution of their subject matter as due to the operation of Darwinian processes. These subject matters span the fields of atomic physics, chemistry, biology and the social sciences. The principle of Maximum Entropy states that system...
Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.
2016-04-01
The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.
Ross, Cody T; Strimling, Pontus; Ericksen, Karen Paige; Lindenfors, Patrik; Mulder, Monique Borgerhoff
2016-06-01
We present formal evolutionary models for the origins and persistence of the practice of Female Genital Modification (FGMo). We then test the implications of these models using normative cross-cultural data on FGMo in Africa and Bayesian phylogenetic methods that explicitly model adaptive evolution. Empirical evidence provides some support for the findings of our evolutionary models that the de novo origins of the FGMo practice should be associated with social stratification, and that social stratification should place selective pressures on the adoption of FGMo; these results, however, are tempered by the finding that FGMo has arisen in many cultures that have no social stratification, and that forces operating orthogonally to stratification appear to play a more important role in the cross-cultural distribution of FGMo. To explain these cases, one must consider cultural evolutionary explanations in conjunction with behavioral ecological ones. We conclude with a discussion of the implications of our study for policies designed to end the practice of FGMo. PMID:26846688
Definition of Valid Proteomic Biomarkers: A Bayesian Solution
Harris, Keith; Girolami, Mark; Mischak, Harald
Clinical proteomics is suffering from high hopes generated by reports on apparent biomarkers, most of which could not be later substantiated via validation. This has brought into focus the need for improved methods of finding a panel of clearly defined biomarkers. To examine this problem, urinary proteome data was collected from healthy adult males and females, and analysed to find biomarkers that differentiated between genders. We believe that models that incorporate sparsity in terms of variables are desirable for biomarker selection, as proteomics data typically contains a huge number of variables (peptides) and few samples, making the selection process potentially unstable. This suggests the application of a two-level hierarchical Bayesian probit regression model for variable selection which assumes a prior that favours sparseness. The classification performance of this method is shown to improve on that of the Probabilistic K-Nearest Neighbour model.
Variable selection in the explorative analysis of several data blocks in metabolomics
DEFF Research Database (Denmark)
Karaman, İbrahim; Nørskov, Natalja; Yde, Christian Clement;
...with many variables for the purpose of reducing over-fitting problems and providing useful interpretation tools. These tools have excellent possibilities for giving a graphical overview of sample and variation patterns. They handle co-linearity in an efficient way and make it possible to use different highly correlated data sets in one integrated approach. Due to the high number of variables in data sets from metabolomics (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need to be related. Tools for the handling of mental overflow minimising false discovery rates both by using statistical and biological validation in an integrative approach are needed. In this paper different strategies for variable selection were considered with respect to false discovery and the possibility......
Variable selection in PLSR and extensions to a multi-block setting for metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach;
When applying LC-MS or NMR spectroscopy in metabolomics studies, high-dimensional data are generated and effective tools for variable selection are needed in order to detect the important metabolites. Methods based on sparsity combined with PLSR have recently attracted attention in the field of genomics [1]. They became quickly well established in the field of statistics because a close relationship to elastic net has been established. In sparse variable selection combined with PLSR, a soft thresholding is applied on each loading weight separately. In the field of chemometrics Jack-knifing has...... -block situation. Thereby the close relationship to elastic net remains established. [1] K. A. Lê Cao, D. Rossouw, C. Robert-Granié, and P. Besse, A sparse PLS for variable selection when integrating omics data, Statistical Applications in Genetics and Molecular Biology, 7 (2008). [2] F. Westad and H. Martens......
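The soft thresholding of loading weights mentioned in this abstract can be sketched with the standard operator sign(w) * max(|w| - lambda, 0). The threshold value and the example weights below are hypothetical, and how the authors choose the threshold is not stated in the abstract.

```python
def soft_threshold(weights, lam):
    """Apply sign(w) * max(|w| - lam, 0) to each loading weight separately.

    Weights with |w| <= lam become exactly zero (the variable is dropped);
    larger weights are shrunk toward zero by lam.
    """
    shrunk = []
    for w in weights:
        magnitude = abs(w) - lam
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if w > 0 else -magnitude)
    return shrunk

# Hypothetical loading weights for four variables.
print(soft_threshold([0.9, -0.2, 0.05, -0.6], lam=0.25))
```

Because small weights are set exactly to zero, the variables attached to them drop out of the component, which is what makes the resulting PLSR model sparse.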
Variability-based AGN selection using image subtraction in the SDSS and LSST era
Choi, Yumi; Gibson, Robert R.; Becker, Andrew C.; Ivezić, Željko; Connolly, Andrew J.; MacLeod, Chelsea L.; Ruan, John J.; Anderson, Scott F
2013-01-01
With upcoming all sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based AGN selection will enable the construction of highly-complete catalogs with minimum contamination. In this study, we generate $g$-band difference images and construct light curves for QSO/AGN candidates listed in SDSS Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels...
Geographic Elements Selection Algorithm Based on Quadtree in Variable-scale Visualization
Hao Guo; Feixiang Chen; Junjie Peng
2013-01-01
In order to balance the demands of local and global visualization in data acquisition, this paper adopts variable-scale visualization technology and uses quadrangular frustum pyramid projection to show geographic information continuously on a mobile device. In addition, the geographic elements in the variable-scale transition region are crowded because the scale changes continuously. In order to solve this problem, this paper presents a quadtree-based geographic elements selection al...
Zuber, Verena
2012-01-01
In this thesis, we address the identification of biomarkers in high-dimensional omics data. The identification of valid biomarkers is especially relevant for personalized medicine that depends on accurate prediction rules. Moreover, biomarkers elucidate the provenance of disease, or molecular changes related to disease. From a statistical point of view the identification of biomarkers is best cast as variable selection. In particular, we refer to variables as the molecular attributes under in...
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection
Chen, Lisha
2012-12-01
The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
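To illustrate how a penalty that treats each row of the coefficient matrix as a group removes whole predictors, here is a minimal sketch of the row-wise group soft-thresholding operator, the proximal operator of lambda times the sum of row Euclidean norms. It is a generic building block for group-lasso type penalties and is not claimed to be either of the paper's two algorithms; the matrix and lambda are hypothetical.

```python
import math

def row_group_soft_threshold(B, lam):
    """Shrink each row of B (a list of lists) toward zero as one group.

    A row b is scaled by max(0, 1 - lam / ||b||_2), so a row whose norm is
    at most lam becomes all zeros and the corresponding predictor is dropped
    from every response simultaneously.
    """
    result = []
    for row in B:
        norm = math.sqrt(sum(b * b for b in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        result.append([scale * b for b in row])
    return result

# Rows = predictors, columns = responses (hypothetical coefficients).
B = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
for row in row_group_soft_threshold(B, lam=1.0):
    print(row)
```

The first row (norm 5) is shrunk but kept, while the second row (norm 0.5) is zeroed for both responses at once, which is the row-wise selection behaviour the abstract describes.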
Seleção de variáveis em QSAR Variable selection in QSAR
Directory of Open Access Journals (Sweden)
Márcia Miguel Castro Ferreira
2002-05-01
The process of building mathematical models in quantitative structure-activity relationship (QSAR) studies is generally limited by the size of the dataset from which variables are selected. For huge datasets, the task of selecting a given number of variables that produces the best linear model can be enormous, if not unfeasible. In this case, some methods can be used to separate good parameter combinations from bad ones. In this paper three methodologies are analyzed: systematic search, genetic algorithm and chemometric methods. These methods are presented and discussed through practical examples.
International Nuclear Information System (INIS)
Five different methods are compared for selecting the most important variables with a view to classifying high energy physics events with neural networks. The different methods are: the F-test, Principal Component Analysis (PCA), a decision tree method: CART, weight evaluation, and Optimal Cell Damage (OCD). The neural networks use the variables selected with the different methods. We compare the percentages of events properly classified by each neural network. The learning set and the test set are the same for all the neural networks. (author)
The use of vector bootstrapping to improve variable selection precision in Lasso models.
Laurin, Charles; Boomsma, Dorret; Lubke, Gitta
2016-08-01
The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping. PMID:27248122
Current Debates on Variability in Child Welfare Decision-Making: A Selected Literature Review
Directory of Open Access Journals (Sweden)
Emily Keddell
2014-11-01
This article considers selected drivers of decision variability in child welfare decision-making and explores current debates in relation to these drivers. Covering the related influences of national orientation, risk and responsibility, inequality and poverty, evidence-based practice, constructions of abuse and its causes, domestic violence and cognitive processes, it discusses the literature with regard to how each of these influences decision variability. It situates these debates in relation to the ethical issue of variability and the equity issues that variability raises. I propose that, despite the ecological complexity that drives decision variability, improving internal (within-country) decision consistency is still a valid goal. It may be that the use of annotated case examples, kind learning systems, and continued commitments to the social justice issues of inequality and individualisation can contribute to this goal.
Long-term Optical Variability of Radio-Selected Quasars from the FIRST Survey
Helfand, D J; Willman, B; White, R L; Becker, R H; Price, T; Gregg, M D; McMahon, R G; Helfand, David J.; Stone, Remington P.S.; Willman, Beth; White, Richard L.; Becker, Robert H.; Price, Trevor; Gregg, Michael D.; Mahon, Richard G. Mc
2001-01-01
We have obtained single-epoch optical photometry for 201 quasars, taken from the FIRST Bright Quasar Survey, which span a wide range in radio loudness. Comparison with the magnitudes of these objects on the POSS-I plates provides by far the largest sample of long-term variability amplitudes for radio-selected quasars yet produced. We find the quasars to be more variable in the blue than in the red band, consistent with work on optically selected samples. The previously noted trend of decreasing variability with increasing optical luminosity applies only to radio-quiet objects. Furthermore, we do not confirm a rise in variability amplitude with redshift, nor do we see any dependence on radio flux or luminosity. The variability over a radio-optical flux ratio range spanning a factor of 60,000 from radio-quiet to extreme radio-loud objects is largely constant, although there is a suggestion of greater variability in the extreme radio-loud objects. We demonstrate the importance of Malmquist bias in variability st...
Effects of Some Selected Socio-demographic Variables on Male Migrants in Bangladesh
Rafiqul Islam; Abdur Rokib
2009-01-01
The purpose of this study is to identify the intensity of the effects of various socio-economic and demographic factors on migration. The data were collected by a multi-stage sampling technique at Meherpur sadar thana in Meherpur district, Bangladesh. This paper shows exact causal links between various selected socioeconomic and demographic variables. A multivariate technique, path analysis, has been used to find out the direct, indirect, total and implied effects of the selected socio-econo...
Ohmann, C; Künneke, M.; Zaczyk, R.; Thon, K.; Lorenz, Wilfried
1986-01-01
In this paper two problems of computer-aided diagnosis with 'independence Bayes' were investigated: selection of variables and monotonicity in performance as the number of measurements is increased. Using prospective data from patients with upper gastrointestinal bleeding, the stepwise forward selection approach maximizing the apparent diagnostic accuracy was analysed with respect to different kinds of bias in estimation of the true diagnostic accuracy and to the stability of the number and t...
Walk, Nathan; Ralph, Timothy C; Lam, Ping Koy
2011-01-01
We extend the security proof for continuous variable quantum key distribution protocols using post selection to account for arbitrary eavesdropping attacks by employing the concept of an equivalent protocol where the post-selection is implemented as a projective quantum measurement. We demonstrate that the security can be calculated using only experimentally accessible quantities and finally explicitly evaluate the performance for the case of a noisy Gaussian channel in the limit of unbounded key length.
Sun, Xu-dong; Zhou, Ming-xing; Sun, Yi-ze
2016-07-01
Investigations were initiated to develop near infrared (NIR) techniques coupled with a variable selection method to rapidly measure cotton content in blend fabrics of cotton and polyester. Multiplicative scatter correction (MSC), smoothing, first derivative (1Der), second derivative (2Der) and their combinations were employed to preprocess the spectra. Monte Carlo uninformative variables elimination (MCUVE), successive projections algorithm (SPA), and genetic algorithm (GA) were compared for choosing characteristic variables associated with cotton content distributions. One hundred and thirty-five and fifty-nine samples were used to calibrate models and assess the performance of the models, respectively. By comparing the performance of partial least squares (PLS) regression models on new samples, the optimal model of cotton content was obtained with the spectral pretreatment method 2Der-Smooth-MSC and the variable selection method MCUVE-SPA-PLS. The correlation coefficient of prediction (rp) and root mean square error of prediction (RMSEP) were 0.988% and 2.100%, respectively. The results suggest that the NIR technique combined with the MCUVE-SPA variable selection method has significant potential to quantitatively analyze cotton content in blend fabrics of cotton and polyester; moreover, it could indicate the related spectral contributions.
A QSAR Study of Environmental Estrogens Based on a Novel Variable Selection Method
Directory of Open Access Journals (Sweden)
Aiqian Zhang
2012-05-01
A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals which are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model with a novel variable selection method has been proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI) with leave-multiple-out cross validation (LMOCV) to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD) principles for regulation of QSAR acceptability were fully considered, such as using an unambiguous multiple-linear regression (MLR) algorithm to build the model, using several validation methods to assess the performance of the model, defining the applicability domain and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that the VSMVI is an effective, feasible and practical tool for rapid screening of the best subset from large sets of molecular descriptors.
Knowledge of the spatial variability of soil properties in agricultural fields is important for implementing various precision agricultural management practices. This paper examines spatial variation of selected soil physical and chemical properties and explores their spatial correlation in the A ho...
A Robust Supervised Variable Selection for Noisy High-Dimensional Data
Czech Academy of Sciences Publication Activity Database
Kalina, Jan; Schlenker, Anna
2015-01-01
Vol. 2015, Article 320385 (2015), pp. 1-10. ISSN 2314-6133 R&D Projects: GA ČR GA13-17187S Institutional support: RVO:67985807 Keywords: dimensionality reduction * variable selection * robustness Subject RIV: BA - General Mathematics Impact factor: 1.579, year: 2014
Random forest (RF) is popular in ecological and environmental modeling, in part, because of its insensitivity to correlated predictors and resistance to overfitting. Although variable selection has been proposed to improve both performance and interpretation of RF models, it is u...
Meta-Statistics for Variable Selection: The R Package BioMark
Directory of Open Access Journals (Sweden)
Ron Wehrens
2012-11-01
Biomarker identification is an ever more important topic in the life sciences. With the advent of measurement methodologies based on microarrays and mass spectrometry, thousands of variables are routinely being measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for this often leads to a lack of power. In addition, choosing α cutoff levels remains somewhat arbitrary. Also in a regression context, a model depending on few but relevant variables will be more accurate and precise, and easier to interpret biologically. We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, presents a data-dependent selection threshold for significance, instead of a cookbook value of α = 0.05. It is applicable in all cases where two groups are compared. The second, stability selection, is more general, and can also be applied in a regression context. This approach uses repeated subsampling of the data in order to assess the variability of the model coefficients and selects those that remain consistently important. It is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
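The stability selection idea described in this abstract (repeated subsampling, then keeping the variables that are selected consistently) can be illustrated with a minimal Python sketch. This is not the BioMark implementation, which is an R package. The function names, the Welch t-statistic used as the base selector, the subsample fraction of 0.7 and the selection-frequency threshold of 0.8 are all assumptions chosen for illustration.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic comparing the means of two lists of numbers."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def stability_selection(group1, group2, top_k=1, n_rounds=100, frac=0.7,
                        threshold=0.8, seed=0):
    """Return (selected variable indices, selection frequency per variable).

    group1 and group2 are lists of samples; each sample is a list of variables.
    In every round a random subsample of each group is drawn, the top_k
    variables by |t| are recorded, and variables whose selection frequency
    reaches the threshold are reported as stable.
    """
    rng = random.Random(seed)
    n_vars = len(group1[0])
    counts = [0] * n_vars
    for _ in range(n_rounds):
        s1 = rng.sample(group1, max(2, int(frac * len(group1))))
        s2 = rng.sample(group2, max(2, int(frac * len(group2))))
        scores = []
        for j in range(n_vars):
            t = welch_t([row[j] for row in s1], [row[j] for row in s2])
            scores.append((abs(t), j))
        for _, j in sorted(scores, reverse=True)[:top_k]:
            counts[j] += 1
    freqs = [c / n_rounds for c in counts]
    selected = [j for j, f in enumerate(freqs) if f >= threshold]
    return selected, freqs


# Hypothetical demo data: 5 variables, only variable 0 differs between groups.
rng = random.Random(42)
controls = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20)]
cases = [[rng.gauss(5 if j == 0 else 0, 1) for j in range(5)] for _ in range(20)]
selected, freqs = stability_selection(controls, cases)
print(selected, freqs)
```

Reporting selection frequencies rather than a single ranking is what makes the result less sensitive to any one subsample; the threshold then plays the role that a fixed α cutoff plays in classical testing.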
Effect Of Pranayama On Selected Physiological Variables Of Male Physical Education Students
Shivendra Dubey; M. K. Singh
2013-01-01
The objective of the study was to determine the effects of Pranayama on selected physiological variables of male physical education students of P.G.College, Upardaha, Baraut, Allahabad (U.P), India. The subjects for this study were randomly selected from the Department of Physical Education at P.G.College, Upardaha, Baraut, Allahabad (U.P), India. A total of 40 male physical education students were selected as subjects for this study. All the subjects were randomly divided into 2 groups. Prana...
Bayesian artificial intelligence
Korb, Kevin B
2010-01-01
Updated and expanded, Bayesian Artificial Intelligence, Second Edition provides a practical and accessible introduction to the main concepts, foundation, and applications of Bayesian networks. It focuses on both the causal discovery of networks and Bayesian inference procedures. Adopting a causal interpretation of Bayesian networks, the authors discuss the use of Bayesian networks for causal modeling. They also draw on their own applied research to illustrate various applications of the technology. New to the Second Edition: New chapter on Bayesian network classifiers; New section on object-oriente
Knowledge-based variable selection for learning rules from proteomic data
Directory of Open Access Journals (Sweden)
Hogan William R
2009-09-01
Background: The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In particular, we use the Empirical Proteomics Ontology Knowledge Base (EPO-KB), which contains previously identified and validated proteomic biomarkers, to select m/z values in a proteomic dataset prior to analysis to increase performance. Results: We show that using EPO-KB as a pre-processing method, specifically selecting all biomarkers found only in the biofluid of the proteomic dataset, reduces the dimensionality by 95% and provides a statistically significantly greater increase in performance over no variable selection and random variable selection. Conclusion: Knowledge-based variable selection, even with a sparsely-populated resource such as the EPO-KB, increases overall performance of rule-learning for disease classification from high-dimensional proteomic mass spectra.
Directory of Open Access Journals (Sweden)
Deletić Nebojša
2009-01-01
This paper deals with 31 SSD lines from ZP-Syn-1 C0 and 37 from ZP-Syn-1 C3 maize populations. After line selection and seed multiplication in the first year of the study, the trials were set during two years in Kruševac and Zemun Polje, in an RCB design with three replications. Additive and phenotypic variances of yield components were calculated, and the narrowing of genetic variability was estimated by multivariate cluster analysis. The differences in additive and phenotypic variances between the cycles were significant for ear length only and highly significant for grain row number per ear and for percent of root and stalk lodged plants. That is, a significant narrowing of additive and phenotypic variance occurred only for those three traits, and selection did not significantly change the variability of the other traits. However, according to cluster analysis, distances among genotypes and groups in the zero selection cycle were approximately double those in the third one, but group definition was better in the third selection cycle. This indirectly suggests a narrowing of total variability after three cycles of recurrent selection.
Peerbhay, Kabir Yunus; Mutanga, Onisimo; Ismail, Riyad
2014-01-01
Tree species information is important for forest inventory management and supports decisions related to the composition and distribution of forest resources. However, traditional methods of obtaining such information involve time consuming and cost intensive ground-based methods. Hyperspectral data offer an alternative source for obtaining information related to forest inventory. Utilizing Airborne Imaging Spectrometer for Applications Eagle hyperspectral data (393 to 994 nm), this study compares the utility of two partial least squares (PLS)-based methods for the classification of three commercial Pinus tree species. Results indicate that the sparse partial least squares discriminant analysis (SPLS-DA) method performed variable selection and dimension reduction successfully to produce an overall accuracy of 80.21%. In comparison, the PLS-DA method and variable importance in the projection (VIP) selected bands produced an overall accuracy of 71.88%. The most effective bands selected by PLS-DA and VIP coincided within the visible region of the spectrum (393 to 700 nm). However, SPLS-DA selected fewer wavebands within the blue (415 to 483 nm), green (515 to 565 nm), and red regions (674 to 694 nm) to confirm the importance of the visible in discriminating tree species. Overall, this study shows the potential of SPLS-DA to perform simultaneous variable selection and dimension reduction of hyperspectral remotely sensed data resulting in improved classification accuracies.
Penalized variable selection procedure for Cox models with semiparametric relative risk
Du, Pang; Liang, Hua; 10.1214/09-AOS780
2010-01-01
We study the Cox models with semiparametric relative risk, which can be partially linear with one nonparametric component, or multiple additive or nonadditive nonparametric components. A penalized partial likelihood procedure is proposed to simultaneously estimate the parameters and select variables for both the parametric and the nonparametric parts. Two penalties are applied sequentially. The first penalty, governing the smoothness of the multivariate nonlinear covariate effect function, provides a smoothing spline ANOVA framework that is exploited to derive an empirical model selection tool for the nonparametric part. The second penalty, either the smoothly-clipped-absolute-deviation (SCAD) penalty or the adaptive LASSO penalty, achieves variable selection in the parametric part. We show that the resulting estimator of the parametric part possesses the oracle property, and that the estimator of the nonparametric part achieves the optimal rate of convergence. The proposed procedures are shown to work well i...
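For readers unfamiliar with the two penalties named in this abstract, the following Python sketch evaluates the SCAD penalty in its standard piecewise form and the adaptive LASSO penalty as a weighted L1 norm. It is an illustration only and not the authors' estimation procedure. The function names are hypothetical, the default `a=3.7` is the value commonly recommended in the SCAD literature, and `gamma=1.0` is an assumed default for the adaptive weights.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for a single coefficient (requires lam > 0 and a > 2).

    Linear like the LASSO near zero, quadratic in the transition zone,
    and constant beyond a * lam so that large coefficients are not shrunk.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def adaptive_lasso_penalty(beta, beta_init, lam, gamma=1.0):
    """Adaptive LASSO penalty: lam * sum(|beta_j| / |beta_init_j| ** gamma).

    beta_init is an initial consistent estimate (assumed nonzero here);
    small initial estimates receive large weights and are penalized harder.
    """
    return lam * sum(abs(b) / abs(b0) ** gamma for b, b0 in zip(beta, beta_init))


# Small coefficients are penalized like the LASSO; large ones hit a plateau.
print(scad_penalty(0.5, lam=1.0), scad_penalty(10.0, lam=1.0))
print(adaptive_lasso_penalty([0.5, 2.0], [0.25, 2.0], lam=1.0))
```

The plateau beyond `a * lam` is what allows large effects to be estimated with little bias, which is the intuition behind the oracle property mentioned in the abstract.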
Macroeconomic Forecasts in Models with Bayesian Averaging of Classical Estimates
Directory of Open Access Journals (Sweden)
Piotr Białowolski
2012-03-01
The aim of this paper is to construct a forecasting model oriented on predicting basic macroeconomic variables, namely: the GDP growth rate, the unemployment rate, and the consumer price inflation. In order to select the set of the best regressors, Bayesian Averaging of Classical Estimates (BACE) is employed. The models are atheoretical (i.e. they do not reflect causal relationships postulated by the macroeconomic theory) and the role of regressors is played by business and consumer tendency survey-based indicators. Additionally, survey-based indicators are included with a lag that makes it possible to forecast the variables of interest (GDP, unemployment, and inflation) for the four forthcoming quarters without the need to make any additional assumptions concerning the values of predictor variables in the forecast period. BACE is a method allowing for a full and controlled overview of all econometric models which can be obtained out of a particular set of regressors. In this paper the authors describe the method of generating a family of econometric models and the procedure for selection of a final forecasting model. Verification of the procedure is performed by means of out-of-sample forecasts of main economic variables for the quarters of 2011. The accuracy of the forecasts implies that there is still a need to search for new solutions in the atheoretical modelling.
Quilty, John; Adamowski, Jan; Khalil, Bahaa; Rathinasamy, Maheswaran
2016-03-01
The input variable selection problem has recently garnered much interest in the time series modeling community, especially within water resources applications, demonstrating that information theoretic (nonlinear)-based input variable selection algorithms such as partial mutual information (PMI) selection (PMIS) provide an improved representation of the modeled process when compared to linear alternatives such as partial correlation input selection (PCIS). PMIS is a popular algorithm for water resources modeling problems considering nonlinear input variable selection; however, this method requires the specification of two nonlinear regression models, each with parametric settings that greatly influence the selected input variables. Other attempts to develop input variable selection methods using conditional mutual information (CMI) (an analog to PMI) have been formulated under different parametric pretenses such as k nearest-neighbor (KNN) statistics or kernel density estimates (KDE). In this paper, we introduce a new input variable selection method based on CMI that uses a nonparametric multivariate continuous probability estimator based on Edgeworth approximations (EA). We improve the EA method by considering the uncertainty in the input variable selection procedure by introducing a bootstrap resampling procedure that uses rank statistics to order the selected input sets; we name our proposed method bootstrap rank-ordered CMI (broCMI). We demonstrate the superior performance of broCMI when compared to CMI-based alternatives (EA, KDE, and KNN), PMIS, and PCIS input variable selection algorithms on a set of seven synthetic test problems and a real-world urban water demand (UWD) forecasting experiment in Ottawa, Canada.
Zhu, Bo; Zhu, Miao; Jiang, Jicai; Niu, Hong; Wang, Yanhui; Wu, Yang; Xu, Lingyang; Chen, Yan; Zhang, Lupei; Gao, Xue; Gao, Huijiang; Liu, Jianfeng; Li, Junya
2016-01-01
Three conventional Bayesian approaches (BayesA, BayesB and BayesCπ) have been demonstrated to be powerful in predicting genomic merit for complex traits in livestock. A priori, these Bayesian models assume that the non-zero SNP effects (marginally) follow a t-distribution depending on two fixed hyperparameters, degrees of freedom and scale parameters. In this study, we performed genomic prediction in Chinese Simmental beef cattle and treated degrees of freedom and scale parameters as unknown with inappropriate priors. Furthermore, we compared the modified methods (BayesFA, BayesFB and BayesFCπ) with their corresponding counterparts using simulation datasets. We found that the modified methods with distribution assumed to the two hyperparameters were beneficial for improving the predictive accuracy. Our results showed that the predictive accuracies of the modified methods were slightly higher than those of their counterparts especially for traits with low heritability and a small number of QTLs. Moreover, cross-validation analysis for three traits, namely carcass weight, live weight and tenderloin weight, in 1136 Simmental beef cattle suggested that predictive accuracy of BayesFCπ noticeably outperformed BayesCπ with the highest increase (3.8%) for live weight using the cohort masking cross-validation. PMID:27139889
Directory of Open Access Journals (Sweden)
Bo Zhu
Three conventional Bayesian approaches (BayesA, BayesB and BayesCπ) have been demonstrated to be powerful in predicting genomic merit for complex traits in livestock. A priori, these Bayesian models assume that the non-zero SNP effects (marginally) follow a t-distribution depending on two fixed hyperparameters, degrees of freedom and scale parameters. In this study, we performed genomic prediction in Chinese Simmental beef cattle and treated degrees of freedom and scale parameters as unknown with inappropriate priors. Furthermore, we compared the modified methods (BayesFA, BayesFB and BayesFCπ) with their corresponding counterparts using simulation datasets. We found that the modified methods with distribution assumed to the two hyperparameters were beneficial for improving the predictive accuracy. Our results showed that the predictive accuracies of the modified methods were slightly higher than those of their counterparts especially for traits with low heritability and a small number of QTLs. Moreover, cross-validation analysis for three traits, namely carcass weight, live weight and tenderloin weight, in 1136 Simmental beef cattle suggested that predictive accuracy of BayesFCπ noticeably outperformed BayesCπ with the highest increase (3.8%) for live weight using the cohort masking cross-validation.
International Nuclear Information System (INIS)
We have developed a comprehensive, Bayesian, PBPK model-based analysis of the population toxicokinetics of trichloroethylene (TCE) and its metabolites in mice, rats, and humans, considering a wider range of physiological, chemical, in vitro, and in vivo data than any previously published analysis of TCE. The toxicokinetics of the 'population average,' its population variability, and their uncertainties are characterized in an approach that strives to be maximally transparent and objective. Estimates of experimental variability and uncertainty were also included in this analysis. The experimental database was expanded to include virtually all available in vivo toxicokinetic data, which permitted, in rats and humans, the specification of separate datasets for model calibration and evaluation. The total combination of these approaches and PBPK analysis provides substantial support for the model predictions. In addition, we feel confident that the approach employed also yields an accurate characterization of the uncertainty in metabolic pathways for which available data were sparse or relatively indirect, such as GSH conjugation and respiratory tract metabolism. Key conclusions from the model predictions include the following: (1) as expected, TCE is substantially metabolized, primarily by oxidation at doses below saturation; (2) GSH conjugation and subsequent bioactivation in humans appear to be 10- to 100-fold greater than previously estimated; and (3) mice had the greatest rate of respiratory tract oxidative metabolism as compared to rats and humans. In a situation such as TCE in which there is large database of studies coupled with complex toxicokinetics, the Bayesian approach provides a systematic method of simultaneously estimating model parameters and characterizing their uncertainty and variability. However, care needs to be taken in its implementation to ensure biological consistency, transparency, and objectivity.
Mendoza, D. L.; Stuart, A. L.; Dagne, G.; Yu, H.
2013-12-01
Uncertainties in emissions estimates are known to be one of the primary sources of uncertainty in calculating concentrations and subsequent exposure estimates. Despite continued improvement in the accuracy of emissions downscaling, the quantification of uncertainties is necessary in order to generate a representative emissions product. Bayesian data assimilation is a promising approach to uncertainty estimation when used to calibrate model results with measurement data. This study discusses an emissions inventory and concentration estimates for carbon monoxide (CO), fine particle (PM2.5) black carbon, and nitrogen oxides (NOx) for the city of Fort Collins, Colorado. The development of a Bayesian framework for updating estimates of emissions and concentrations in multiple stages, using measurement data, is also presented. The emissions inventory was constructed using the 2008 National Emissions Inventory (NEI). The spatial and temporal allocation methods from the Emission Modeling Clearinghouse data set are used to downscale the NEI data from annual and county-level resolution for point, nonpoint, and nonroad sources. Onroad mobile source emissions were estimated by combining a bottom-up emissions calculation approach (using emission factors and activities) for large roadway links within Fort Collins with a top-down spatial allocation approach for other roadways. Vehicle activity data for road links were obtained from local 2009 travel demand model results and automatic traffic recorder (ATR) data. The CALPUFF Gaussian puff dispersion model was used to estimate air pollutant concentrations. Hourly, 1.33 km x 1.33 km MM5 meteorological data was used to capture temporal variability in transport. Distributions of concentrations are obtained for spatial locations and time spans using a Monte Carlo sampling approach. Data for ensemble members are sampled from distributions defined from the emissions inventory and meteorological data. Modeled concentrations of CO, PM2
Stochastic model updating utilizing Bayesian approach and Gaussian process model
Wan, Hua-Ping; Ren, Wei-Xin
2016-03-01
Stochastic model updating (SMU) has been increasingly applied in quantifying structural parameter uncertainty from response variability. SMU for parameter uncertainty quantification refers to the problem of inverse uncertainty quantification (IUQ), which is a nontrivial task. An inverse problem solved with optimization usually brings about the issues of gradient computation, ill-conditioning, and non-uniqueness. Moreover, the uncertainty present in the response makes the inverse problem more complicated. In this study, the Bayesian approach is adopted in SMU for parameter uncertainty quantification. The prominent strength of the Bayesian approach for the IUQ problem is that it solves the problem in a straightforward manner, which enables it to avoid the previous issues. However, when applied to engineering structures that are modeled with a high-resolution finite element model (FEM), the Bayesian approach is still computationally expensive, since the commonly used Markov chain Monte Carlo (MCMC) method for Bayesian inference requires a large number of model runs to guarantee convergence. Herein we reduce the computational cost in two ways. On the one hand, the fast-running Gaussian process model (GPM) is utilized to approximate the time-consuming high-resolution FEM. On the other hand, the advanced MCMC method using the delayed rejection adaptive Metropolis (DRAM) algorithm, which incorporates a local adaptive strategy with a global adaptive strategy, is employed for Bayesian inference. In addition, we propose the use of variance-based global sensitivity analysis (GSA) in parameter selection to exclude non-influential parameters from the calibration parameters, which yields a reduced-order model and thus further alleviates the computational burden. A simulated aluminum plate and a real-world complex cable-stayed pedestrian bridge are presented to illustrate the proposed framework and verify its feasibility.
X-ray variability in a complete sample of Soft X-ray selected AGN
D. Grupe; Thomas, H. -C.; Beuermann, K.
2000-01-01
We present ROSAT All-Sky Survey and ROSAT pointed observations (PSPC and HRI) of a complete sample of 113 bright soft X-ray AGN selected from the ROSAT Bright Source Catalog. We compare these observations in order to search for extreme cases of flux and spectral X-ray variability - X-ray transient AGN. Three definite transients and one transient candidate are found. The other sources show amplitude variations typically by factors of 2-3 on timescales of years. We found that the variability st...
Morokuma, Tomoki; Doi, Mamoru; Yasuda, Naoki; Akiyama, Masayuki; Sekiguchi, Kazuhiro; Furusawa, Hisanori; Ueda, Yoshihiro; Totani, Tomonori; Oda, Takeshi; Nagao, Tohru; Kashikawa, Nobunari; Murayama, Takashi; Ouchi, Masami; Watson, Mike G.
2007-01-01
We present the properties of active galactic nuclei (AGN) selected by optical variability in the Subaru/XMM-Newton Deep Field (SXDF). Based on the locations of variable components and light curves, 211 optically variable AGN were reliably selected. We made three AGN samples; X-ray detected optically non-variable AGN (XA), X-ray detected optically variable AGN (XVA), and X-ray undetected optically variable AGN (VA). In the VA sample, we found a bimodal distribution of the ratio between the var...
Gelman, Andrew; Stern, Hal S; Dunson, David B; Vehtari, Aki; Rubin, Donald B
2013-01-01
FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear
Yuan, Ying; MacKinnon, David P.
2009-01-01
This article proposes Bayesian analysis of mediation effects. Compared to conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian mediation analysis, inference is straightforward and exact, which makes it appealing for studies with small samples. Third, the Bayesian approach is conceptua...
International Nuclear Information System (INIS)
Variability is a property shared by practically all active galactic nuclei (AGNs). This makes variability selection a possible technique for identifying AGNs. Given that variability selection makes no prior assumption about spectral properties, it is a powerful technique for detecting both low-luminosity AGNs in which the host galaxy emission is dominating and AGNs with unusual spectral properties. In this paper, we will discuss and test different statistical methods for the detection of variability in sparsely sampled data that allow full control over the false positive rates. We will apply these methods to the GOODS North and South fields and present a catalog of variable sources in the z band in both GOODS fields. Out of the 11,931 objects checked, we find 155 variable sources at a significance level of 99.9%, corresponding to about 1.3% of all objects. After rejection of stars and supernovae, 139 variability-selected AGNs remain. Their magnitudes reach down as faint as 25.5 mag in z. Spectroscopic redshifts are available for 22 of the variability-selected AGNs, ranging from 0.046 to 3.7. The absolute magnitudes in the rest-frame z band range from ∼-18 to -24, reaching substantially fainter than the typical luminosities probed by traditional X-ray and spectroscopic AGN selection in these fields. Therefore, this is a powerful technique for future exploration of the evolution of the faint end of the AGN luminosity function up to high redshifts.
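One simple way to test a sparsely sampled light curve for variability while keeping the false positive rate under control is to compare its chi-square statistic against a simulated null distribution built from the measurement errors. The sketch below illustrates that general idea only; the abstract does not specify which statistics the paper uses, so the choice of a chi-square statistic, the Monte Carlo calibration, the function names and the example magnitudes are all assumptions.

```python
import random


def chi2_constant(mags, errs):
    """Chi-square of a light curve against its error-weighted mean."""
    weights = [1.0 / e ** 2 for e in errs]
    mean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    return sum(((m - mean) / e) ** 2 for m, e in zip(mags, errs))


def variability_p_value(mags, errs, n_sim=5000, seed=0):
    """Monte Carlo p-value for the null hypothesis of a non-variable source.

    Simulated light curves contain only Gaussian noise with the quoted
    errors; the p-value is the share of simulations whose chi-square is at
    least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = chi2_constant(mags, errs)
    hits = 0
    for _ in range(n_sim):
        fake = [rng.gauss(0.0, e) for e in errs]
        if chi2_constant(fake, errs) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical five-epoch light curves with 0.05 mag errors.
errs = [0.05] * 5
p_quiet = variability_p_value([20.0, 20.01, 19.99, 20.0, 20.005], errs)
p_variable = variability_p_value([20.0, 20.5, 19.6, 20.3, 19.8], errs)
# A source is flagged at the 99.9% significance level when p < 0.001.
print(p_quiet, p_variable, p_variable < 0.001)
```

Because the null distribution is simulated from each object's own error bars, the threshold translates directly into an expected number of false positives: at p < 0.001, roughly one spurious detection per thousand non-variable objects tested.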
Bayesian Games with Intentions
Bjorndahl, Adam; Halpern, Joseph Y.; Pass, Rafael
2016-01-01
We show that standard Bayesian games cannot represent the full spectrum of belief-dependent preferences. However, by introducing a fundamental distinction between intended and actual strategies, we remove this limitation. We define Bayesian games with intentions, generalizing both Bayesian games and psychological games, and prove that Nash equilibria in psychological games correspond to a special class of equilibria as defined in our setting.
Calibration Variable Selection and Natural Zero Determination for Semispan and Canard Balances
Ulbrich, Norbert M.
2013-01-01
Independent calibration variables for the characterization of semispan and canard wind tunnel balances are discussed. It is shown that the variable selection for a semispan balance is determined by the location of the resultant normal and axial forces that act on the balance. These two forces are the first and second calibration variable. The pitching moment becomes the third calibration variable after the normal and axial forces are shifted to the pitch axis of the balance. Two geometric distances, i.e., the rolling and yawing moment arms, are the fourth and fifth calibration variable. They are traditionally substituted by corresponding moments to simplify the use of calibration data during a wind tunnel test. A canard balance is related to a semispan balance. It also only measures loads on one half of a lifting surface. However, the axial force and yawing moment are of no interest to users of a canard balance. Therefore, its calibration variable set is reduced to the normal force, pitching moment, and rolling moment. The combined load diagrams of the rolling and yawing moment for a semispan balance are discussed. They may be used to illustrate connections between the wind tunnel model geometry, the test section size, and the calibration load schedule. Then, methods are reviewed that may be used to obtain the natural zeros of a semispan or canard balance. In addition, characteristics of three semispan balance calibration rigs are discussed. Finally, basic requirements for a full characterization of a semispan balance are reviewed.
Bayesian segmentation of hyperspectral images
Mohammadpour, Adel; Mohammad-Djafari, Ali
2007-01-01
In this paper we consider the problem of joint segmentation of hyperspectral images in the Bayesian framework. The proposed approach is based on a Hidden Markov Modeling (HMM) of the images with common segmentation, or equivalently with common hidden classification label variables which is modeled by a Potts Markov Random Field. We introduce an appropriate Markov Chain Monte Carlo (MCMC) algorithm to implement the method and show some simulation results.
Bayesian segmentation of hyperspectral images
Mohammadpour, Adel; Féron, Olivier; Mohammad-Djafari, Ali
2004-11-01
In this paper we consider the problem of joint segmentation of hyperspectral images in the Bayesian framework. The proposed approach is based on a Hidden Markov Modeling (HMM) of the images with common segmentation, or equivalently with common hidden classification label variables which is modeled by a Potts Markov Random Field. We introduce an appropriate Markov Chain Monte Carlo (MCMC) algorithm to implement the method and show some simulation results.
Bayesian Network--Response Regression
WANG, LU; Durante, Daniele; Dunson, David B.
2016-01-01
There is an increasing interest in learning how human brain networks vary with continuous traits (e.g., personality, cognitive abilities, neurological disorders), but flexible procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which combines low-rank factorizations and Gaussian process priors to allow flexible shifts of the conditional expectation for a network-valued random variable across the feature space, while including subject-specific random eff...
Inference in hybrid Bayesian networks
DEFF Research Database (Denmark)
Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael;
2009-01-01
and reliability block diagrams). However, limitations in the BNs' calculation engine have prevented BNs from becoming equally popular for domains containing mixtures of both discrete and continuous variables (so-called hybrid domains). In this paper we focus on these difficulties, and summarize some of the last...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....
Genuer, Robin; Toussile, Wilson
2011-01-01
Malaria control strategies aiming at reducing disease transmission intensity may impact both oocyst intensity and infection prevalence in the mosquito vector. Thus far, mathematical models failed to identify a clear relationship between Plasmodium falciparum gametocytes and their infectiousness to mosquitoes. Natural isolates of gametocytes are genetically diverse and biologically complex. Infectiousness to mosquitoes relies on multiple parameters such as density, sex-ratio, maturity, parasite genotypes and host immune factors. In this article, we investigated how density and genetic diversity of gametocytes impact on the success of transmission in the mosquito vector. We analyzed data for which the number of covariates plus attendant interactions is at least of the order of the sample size, precluding usage of classical models such as general linear models. We then considered the variable importance from random forests to address the problem of selecting the most influential variables. The selected covariates were ...
Directory of Open Access Journals (Sweden)
Guillermo A. Cañadas
2010-12-01
Burnout syndrome has a high incidence among professional healthcare and social workers. This leads to deterioration in the quality of their working life and affects their health, the organization where they work and, via their clients, society itself. Given these serious effects, many studies have investigated this construct and identified groups at increased risk of the syndrome. The present work has 2 main aims: to compare burnout levels in potential risk groups among professional healthcare workers; and to compare them using standard and Bayesian statistical analysis. The sample consisted of 108 psycho-social care workers based at 2 centers run by the Granada Council in Spain. All participants, anonymously and individually, filled in a booklet that included questions on personal information and the Spanish adaptation of the Maslach Burnout Inventory (MBI). Standard and Bayesian analysis of variance were used to identify the risk factors associated with different levels of burnout. It was found that the information provided by the Bayesian procedure complemented that provided by the standard procedure.
Variability-selected active galactic nuclei from supernova search in the Chandra deep field south
Trevese, D.; Boutsia, K.; Vagnetti, F.; Cappellaro, E.; Puccetti, S.
2008-01-01
Variability is a property shared by virtually all active galactic nuclei (AGNs), and was adopted as a criterion for their selection using data from multi epoch surveys. Low Luminosity AGNs (LLAGNs) are contaminated by the light of their host galaxies, and cannot therefore be detected by the usual colour techniques. For this reason, their evolution in cosmic time is poorly known. Consistency with the evolution derived from X-ray detected samples has not been clearly established so far, also be...
Floral variability in selected species of the genus Coelogyne Lindl., Orchidaceae
Directory of Open Access Journals (Sweden)
Romuald Kosina
2015-05-01
Correlations of the lip characters in the Coelogyne flower proved a synchronised development of this organ. The lip is a very interspecifically variable organ. A numerical taxonomy approach made it possible to select in an ordination space some extreme species, based on a description of lip morphology, Coelogyne salmonicolor versus C. fuliginosa and C. quinquelamellata versus C. nitida. A hybrid C. lawrenceana × mooreana appeared to be close to its paternal species.
Floral variability in selected species of the genus Coelogyne Lindl., Orchidaceae
Romuald Kosina; Marta Szkudlarek
2015-01-01
Correlations of the lip characters in the Coelogyne flower proved a synchronised development of this organ. The lip is a very interspecifically variable organ. A numerical taxonomy approach made it possible to select in an ordination space some extreme species, based on a description of lip morphology, Coelogyne salmonicolor versus C. fuliginosa and C. quinquelamellata versus C. nitida. A hybrid C. lawrenceana × mooreana appeared to be close to its paternal species.
Alpkaya, Ufuk
2013-01-01
The purpose of this study is to determine the influence of gymnastics training integrated with physical education courses on selected motor performance variables in seven year old girls. Subjects were divided into two groups: (1) control group (N=15, X=7.56 plus or minus 0.46 year old); (2) gymnastics group (N=16, X=7.60 plus or minus 0.50 year…
Variable Selection for Generalized Linear Mixed Models by L1-Penalized Estimation
Groll, Andreas
2011-01-01
Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to few covariates, because the presence of many predictors yields unstable estimates. The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized loglikelihood, yielding models with r...
Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy
Foster, Dean P.; Stine, Robert A.
2001-01-01
We develop and illustrate a methodology for fitting models to large, complex data sets. The methodology uses standard regression techniques that make few assumptions about the structure of the data. We accomplish this with three small modifications to stepwise regression: (1) We add interactions to capture non-linearities and indicator functions to capture missing values; (2) We exploit modern decision theoretic variable selection criteria; and (3) We estimate standard error using a conservat...
Li, Caiyan; Li, Hongzhe
2010-01-01
Graphs and networks are common ways of depicting biological information. In biology, many different biological processes are represented by graphs, such as regulatory networks, metabolic pathways and protein--protein interaction networks. This kind of a priori use of graphs is a useful supplement to the standard numerical data such as microarray gene expression data. In this paper we consider the problem of regression analysis and variable selection when the covariates are linked on a graph. ...
Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
Directory of Open Access Journals (Sweden)
C. Fernandez-Lozano
2013-01-01
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.
C. Fernandez-Lozano; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.
The problem of variable selection for financial distress: applying GRASP metaheuristics
LAURA MARTA NUÑEZ
2004-01-01
We use the GRASP procedure to select a subset of financial ratios that are then used to estimate a model of logistic regression to anticipate financial distress on a sample of Spanish firms. The algorithm we suggest is designed "ad-hoc" for this type of variables. Reducing dimensionality has several advantages such as reducing the cost of data acquisition, better understanding of the final classification model, and increasing the efficiency and the efficacy. The application of the GRASP proce...
Quantum Inference on Bayesian Networks
Yoder, Theodore; Low, Guang Hao; Chuang, Isaac
2014-03-01
Because quantum physics is naturally probabilistic, it seems reasonable to expect physical systems to describe probabilities and their evolution in a natural fashion. Here, we use quantum computation to speed up sampling from a graphical probability model, the Bayesian network. A specialization of this sampling problem is approximate Bayesian inference, where the distribution on query variables is sampled given the values e of evidence variables. Inference is a key part of modern machine learning and artificial intelligence tasks, but is known to be NP-hard. Classically, a single unbiased sample is obtained from a Bayesian network on n variables with at most m parents per node in time $O(nm\,P(e)^{-1})$, depending critically on P(e), the probability the evidence might occur in the first place. However, by implementing a quantum version of rejection sampling, we obtain a square-root speedup, taking $O(n2^{m}P(e)^{-1/2})$ time per sample. The speedup is the result of amplitude amplification, which is proving to be broadly applicable in sampling and machine learning tasks. In particular, we provide an explicit and efficient circuit construction that implements the algorithm without the need for oracle access.
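The classical baseline mentioned in this abstract, rejection sampling, can be illustrated with a minimal sketch. The two-node network (Rain -> WetGrass) and all of its probabilities are hypothetical values chosen for illustration and do not come from the paper. The sketch only shows why the classical cost per accepted sample grows like 1/P(e); it does not implement the quantum algorithm.

```python
import random

# Hypothetical two-node network: Rain -> WetGrass (illustrative numbers only).
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}


def sample_network(rng):
    """Draw one joint sample by sampling each node given its parent."""
    rain = rng.random() < P_RAIN
    wet = rng.random() < P_WET_GIVEN_RAIN[rain]
    return rain, wet


def rejection_sample(rng, n_accept):
    """Collect n_accept samples of Rain that agree with evidence WetGrass=True."""
    accepted, trials = [], 0
    while len(accepted) < n_accept:
        trials += 1
        rain, wet = sample_network(rng)
        if wet:  # discard every sample that contradicts the evidence
            accepted.append(rain)
    return accepted, trials


rng = random.Random(0)
accepted, trials = rejection_sample(rng, 5000)

# Exact values for comparison: P(e) = 0.2*0.9 + 0.8*0.1 = 0.26
p_evidence = P_RAIN * P_WET_GIVEN_RAIN[True] + (1 - P_RAIN) * P_WET_GIVEN_RAIN[False]
posterior_estimate = sum(accepted) / len(accepted)  # estimates P(Rain | WetGrass)
trials_per_sample = trials / len(accepted)          # close to 1 / P(e)
```

Each accepted sample needs about 1/P(e) joint draws on average, which is the factor that amplitude amplification is reported to reduce to roughly the square root.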
Pérez, Noel; Guevara, Miguel A.; Silva, Augusto
2013-02-01
This work addresses the issue of variable selection within the context of breast cancer classification with mammography. A comprehensive repository of feature vectors was used including a hybrid subset gathering image-based and clinical features. It aimed to gather experimental evidence of variable selection in terms of cardinality and type, and to find a classification scheme that provides the best performance over the Area Under Receiver Operating Characteristics Curve (AUC) scores using the ranked features subset. We evaluated and classified a total of 300 subsets of features formed by the application of Chi-Square Discretization, Information-Gain, One-Rule and RELIEF methods in association with Feed-Forward Backpropagation Neural Network (FFBP), Support Vector Machine (SVM) and Decision Tree J48 (DTJ48) Machine Learning Algorithms (MLA) for a comparative performance evaluation based on AUC scores. A variable selection analysis was performed for Single-View Ranking and Multi-View Ranking groups of features. Features subsets representing Microcalcifications (MCs), Masses and both MCs and Masses lesions achieved AUC scores of 0.91, 0.954 and 0.934 respectively. Experimental evidence demonstrated that classification performance was improved by combining image-based and clinical features. The most important clinical and image-based features were StromaDistortion and Circularity respectively. Other features, less important but worth using because of their consistency, were Contrast, Perimeter, Microcalcification, Correlation and Elongation.
DEFF Research Database (Denmark)
Mauricio Iglesias, Miguel; Valverde Perez, Borja; Sin, Gürkan
Selecting the right controlled variables in a bioprocess is challenging since the objectives of the process (yields, product or substrate concentration) are difficult to relate with a given actuator. We apply here process control tools that can be used to assist in the selection of controlled...... variables to the case of the SHARON-Anammox process for autotrophic nitrogen removal....
The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition.
Zhang, Ying; Bilbao, Aivett; Bruderer, Tobias; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard; Varesio, Emmanuel
2015-10-01
As tryptic peptides and metabolites are not equally distributed along the mass range, the probability of cross fragment ion interference is higher in certain windows when fixed Q1 SWATH windows are applied. We evaluated the benefits of utilizing variable Q1 SWATH windows with regards to selectivity improvement. Variable windows based on equalizing the distribution of either the precursor ion population (PIP) or the total ion current (TIC) within each window were generated by an in-house software, swathTUNER. These two variable Q1 SWATH window strategies outperformed, with respect to quantification and identification, the basic approach using a fixed window width (FIX) for proteomic profiling of human monocyte-derived dendritic cells (MDDCs). Thus, 13.8 and 8.4% additional peptide precursors, which resulted in 13.1 and 10.0% more proteins, were confidently identified by SWATH using the strategy PIP and TIC, respectively, in the MDDC proteomic sample. On the basis of the spectral library purity score, some improvement warranted by variable Q1 windows was also observed, albeit to a lesser extent, in the metabolomic profiling of human urine. We show that the novel concept of "scheduled SWATH" proposed here, which incorporates (i) variable isolation windows and (ii) precursor retention time segmentation further improves both peptide and metabolite identifications. PMID:26302369
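The idea of sizing isolation windows so that each holds a similar share of the precursor ion population can be sketched with simple quantile edges. This is a minimal illustration under stated assumptions: the function name, the synthetic m/z list, and the quantile rule are hypothetical and are not taken from swathTUNER, whose actual algorithm is not described in the abstract.

```python
def equal_population_windows(precursor_mz, n_windows):
    """Return (low, high) m/z windows that each hold about the same number
    of precursors, using quantile positions of the sorted m/z values."""
    mz = sorted(precursor_mz)
    n = len(mz)
    edges = [mz[0]]
    for k in range(1, n_windows):
        edges.append(mz[(k * n) // n_windows])  # k-th quantile boundary
    edges.append(mz[-1])
    return list(zip(edges[:-1], edges[1:]))


# Synthetic precursor list: dense between 400 and 479 m/z, sparse up to 690 m/z.
dense = [400.0 + i * 1.0 for i in range(80)]
sparse = [500.0 + i * 10.0 for i in range(20)]
windows = equal_population_windows(dense + sparse, 4)
widths = [high - low for low, high in windows]
```

With this synthetic input the windows come out narrow where precursors are dense and wide where they are sparse, which is the behaviour that a fixed window width cannot provide.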
Selection for increased variability of seed protein and lysine contents in wheat
International Nuclear Information System (INIS)
Two projects within the Seibersdorf Laboratory programme for increased seed protein and lysine contents in wheat are described. The first experiment was started in 1971 with Svenno wheat. Selections for an increased protein content, measured by the modified Udy procedure developed at Seibersdorf, have yielded populations deviating from the unselected control means. However, due to large variation in the control population, the selection responses after mutagen treatment did not exceed that in the control. In the second experiment with line Mex 22A and initiated in 1973 it was possible to increase the variability in mutagenized populations when compared with selected control populations. Selections based upon DBC protein and % N have resulted in increased protein content in the selected populations and lines. Attempts to select for increased lysine content of seed protein based upon DBC value or upon ratios of DBC/N, DBC/proline or N/proline have not so far been successful. Further attempts are being based upon video-densitometry assays of lysine as a proportion of total amino acids. (author)
Social variables exert selective pressures in the evolution and form of primate mimetic musculature.
Burrows, Anne M; Li, Ly; Waller, Bridget M; Micheletta, Jerome
2016-04-01
Mammals use their faces in social interactions more so than any other vertebrates. Primates are an extreme among most mammals in their complex, direct, lifelong social interactions and their frequent use of facial displays is a means of proximate visual communication with conspecifics. The available repertoire of facial displays is primarily controlled by mimetic musculature, the muscles that move the face. The form of these muscles is, in turn, limited by and influenced by phylogenetic inertia but here we use examples, both morphological and physiological, to illustrate the influence that social variables may exert on the evolution and form of mimetic musculature among primates. Ecomorphology is concerned with the adaptive responses of morphology to various ecological variables such as diet, foliage density, predation pressures, and time of day activity. We present evidence that social variables also exert selective pressures on morphology, specifically using mimetic muscles among primates as an example. Social variables include group size, dominance 'style', and mating systems. We present two case studies to illustrate the potential influence of social behavior on adaptive morphology of mimetic musculature in primates: (1) gross morphology of the mimetic muscles around the external ear in closely related species of macaque (Macaca mulatta and Macaca nigra) characterized by varying dominance styles and (2) comparative physiology of the orbicularis oris muscle among select ape species. This muscle is used in both facial displays/expressions and in vocalizations/human speech. We present qualitative observations of myosin fiber-type distribution in this muscle of siamang (Symphalangus syndactylus), chimpanzee (Pan troglodytes), and human to demonstrate the potential influence of visual and auditory communication on muscle physiology. In sum, ecomorphologists should be aware of social selective pressures as well as ecological ones, and that observed morphology might
Richards, Joseph W.; Starr, Dan L.; Brink, Henrik; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; James, J. Berian; Long, James P.; Rice, John
2012-01-01
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because (1) standard assumptions for machine-learned model selection procedures break down and (2) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting, co-training, and active learning (AL). We argue that AL—where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up—is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and Optical Gravitational Lensing Experiment, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a Web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply AL to classify variable stars in the All Sky Automated Survey, finding dramatic improvement in our agreement with the ASAS Catalog of Variable Stars, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
Bayesian estimation of turbulent motion
Héas, P.; Herzet, C.; Mémin, E.; Heitz, D.; P. D. Mininni
2013-01-01
Based on physical laws describing the multi-scale structure of turbulent flows, this article proposes a regularizer for fluid motion estimation from an image sequence. Regularization is achieved by imposing some scale invariance property between histograms of motion increments computed at different scales. By reformulating this problem from a Bayesian perspective, an algorithm is proposed to jointly estimate motion, regularization hyper-parameters, and to select the ...
Space Shuttle RTOS Bayesian Network
Morris, A. Terry; Beling, Peter A.
2001-01-01
With shrinking budgets and the requirements to increase reliability and operational life of the existing orbiter fleet, NASA has proposed various upgrades for the Space Shuttle that are consistent with national space policy. The cockpit avionics upgrade (CAU), a high priority item, has been selected as the next major upgrade. The primary functions of cockpit avionics include flight control, guidance and navigation, communication, and orbiter landing support. Secondary functions include the provision of operational services for non-avionics systems such as data handling for the payloads and caution and warning alerts to the crew. Recently, a process to select the optimal commercial-off-the-shelf (COTS) real-time operating system (RTOS) for the CAU was conducted by United Space Alliance (USA) Corporation, which is a joint venture between Boeing and Lockheed Martin, the prime contractor for space shuttle operations. In order to independently assess the RTOS selection, NASA has used the Bayesian network-based scoring methodology described in this paper. Our two-stage methodology addresses the issue of RTOS acceptability by incorporating functional, performance and non-functional software measures related to reliability, interoperability, certifiability, efficiency, correctness, business, legal, product history, cost and life cycle. The first stage of the methodology involves obtaining scores for the various measures using a Bayesian network. The Bayesian network incorporates the causal relationships between the various and often competing measures of interest while also assisting the inherently complex decision analysis process with its ability to reason under uncertainty. The structure and selection of prior probabilities for the network are extracted from experts in the field of real-time operating systems. Scores for the various measures are computed using Bayesian probability. In the second stage, multi-criteria trade-off analyses are performed between the scores
Variable selection for distribution-free models for longitudinal zero-inflated count responses.
Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M
2016-07-20
Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing these important and timely issues in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26844819
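For readers unfamiliar with the SCAD penalty named in this abstract, a minimal sketch of the penalty function in its standard published form (Fan and Li, 2001) follows. The function name and the default a = 3.7 are conventional choices assumed here; the abstract does not state which tuning values the authors used, and this block does not reproduce their estimating-equation method.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient.

    Linear (lasso-like) near zero, quadratic in the middle region,
    and constant beyond a*lam, so large coefficients are not shrunk further.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


# Illustrative values with lam = 1.0 and the conventional a = 3.7
small = scad_penalty(0.5, 1.0)   # lasso-like region: lam * |theta|
large = scad_penalty(10.0, 1.0)  # flat region: lam**2 * (a + 1) / 2
```

The flat region is the feature that separates SCAD from an L1 penalty: once a coefficient exceeds a*lam, the penalty stops growing, which reduces the bias on large effects.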
Effect of recurrent selection on the variability of the UENF-14 popcorn population
Directory of Open Access Journals (Sweden)
Rodrigo Moreira Ribeiro
2016-07-01
This study aimed to evaluate the effect of recurrent selection on the genetic variability of UENF-14 population after six selections. Two hundred and ten half-sib families were evaluated in two environments in the state of Rio de Janeiro, using incomplete randomized blocks design with treatments arranged in replication within “Sets”. There was significant effect for Families within the “Set” (F/S, proving that there is enough genetic variability to be exploited in the popcorn breeding program of UENF. The significance for the source of variation Environment (E shows that the environments were distinct enough to promote differences between the evaluated characteristics. It was found that for both characteristics of greatest interest, GY and PE, the magnitude of the additive variance remains with close values in advanced cycles of UENF-14 population, indicating that variability remains, with no evidence of decreases in advanced cycles. This is concluded by the longevity of UENF breeding program.
DEFF Research Database (Denmark)
Dashab, Golam Reza; Kadri, Naveen Kumar; Mahdi Shariati, Mohammad;
2012-01-01
) Mixed model analysis (MMA), 2) Random haplotype model (RHM), 3) Genealogy-based mixed model (GENMIX), and 4) Bayesian variable selection (BVS). The data consisted of phenotypes of 2000 animals from 20 sire families and were genotyped with 9990 SNPs on five chromosomes. Results: Out of the eight...
Ngeow, Chow Choong; Chang, D.; Pan, K.; Chung, T.; Koptelova, E.; TAOS Collaboration
2010-05-01
The Taiwan-American Occultation Survey (TAOS) project aims to find Kuiper Belt Objects (KBOs) and measure their size distribution using the occultation technique. The TAOS project employed four 20-inch wide-field (F/1.9, 3 degree-squared FOV) telescopes, equipped with a 2K x 2K CCD, to simultaneously monitor the same patch of the sky. All four TAOS telescopes, which can be operated automatically, were located at the Lulin Observatory in central Taiwan. The TAOS project has been continuously taking data since 2005. In addition to finding KBOs, the dense sampling strategy employed in TAOS can also be used to find variable stars. We report the search for variable stars from selected TAOS fields at this Meeting. For example, we found about 50 candidate variables (out of 2600 stars) in TAOS 60 Field (RA: 04h48m00s, DEC: +20d46m20s, with limiting magnitudes about 15 mag at S/N=10), including three previously known variables, using sigma deviation and Stetson's J-index methods. The available data in this field spanned about 150 days in time. However, TAOS observations were conducted using a customized filter. We therefore initiated a followup program to observe and construct the light curves of these candidate variables in the BVRI bands, using the Lulin's One-Meter telescope, Lulin's SLT telescope (16-inch aperture) and 32-inch telescope from the Tenagra II Observatory. The multi-band optical followup observation will help improve the classification of these candidates and estimate their BVRI mean magnitudes and colors as well as extinction. This will enable a wide range of research in astrophysics for these variables. We also present our preliminary results based on the first season of the followup observations. CCN acknowledges the support from NSC 98-2112-M-008-013-MY3.
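The Stetson J-index mentioned in this abstract can be illustrated with a simplified sketch. It assumes the single-observation form of the index with unit weights and an inverse-variance weighted mean; the published index uses a robust mean and can pair near-simultaneous observations. The function name and the synthetic light curves are hypothetical, and nothing here reproduces the TAOS pipeline.

```python
import math


def stetson_j_single(mags, errs):
    """Simplified Stetson J for unpaired observations with unit weights.

    delta_i = sqrt(n / (n - 1)) * (m_i - mean) / sigma_i
    P_i     = delta_i**2 - 1
    J       = mean of sign(P_i) * sqrt(|P_i|)
    """
    n = len(mags)
    inv_var = [1.0 / (e * e) for e in errs]
    mean = sum(w * m for w, m in zip(inv_var, mags)) / sum(inv_var)
    scale = math.sqrt(n / (n - 1))
    total = 0.0
    for m, e in zip(mags, errs):
        delta = scale * (m - mean) / e
        p = delta * delta - 1.0
        total += math.copysign(math.sqrt(abs(p)), p)
    return total / n


errs = [0.01] * 20
# Synthetic constant star: scatter of half the quoted error.
constant = [15.0 + 0.005 * (-1) ** i for i in range(20)]
# Synthetic variable star: sinusoid with amplitude ten times the quoted error.
variable = [15.0 + 0.1 * math.sin(2 * math.pi * i / 20) for i in range(20)]

j_constant = stetson_j_single(constant, errs)
j_variable = stetson_j_single(variable, errs)
```

In this form a star whose scatter is consistent with its photometric errors gives J near or below zero, while variability well above the errors gives a clearly positive J, so a threshold on J can flag candidates.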
A Bayesian approach to linear regression in astronomy
Sereno, Mauro
2015-01-01
Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical modeling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in the form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modeling. The R-package LIRA (LInear Regression in Astronomy) is made available to perform the regression.
Nikoloulopoulos, Aristidis K
2016-06-30
The method of generalized estimating equations (GEE) is popular in the biostatistics literature for analyzing longitudinal binary and count data. It assumes a generalized linear model for the outcome variable, and a working correlation among repeated measurements. In this paper, we introduce a viable competitor: the weighted scores method for generalized linear model margins. We weight the univariate score equations using a working discretized multivariate normal model that is a proper multivariate model. Because the weighted scores method is a parametric method based on likelihood, we propose composite likelihood information criteria as an intermediate step for model selection. The same criteria can be used for both correlation structure and variable selection. Simulation studies and the application example show that our method outperforms other existing model selection methods in GEE. From the example, it can be seen that our methods not only improve on GEE in terms of interpretability and efficiency but also can change the inferential conclusions with respect to GEE. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26822854
Effect of Selected Organic Acids on Cadmium Sorption by Variable-and Permanent-Charge Soils
Institute of Scientific and Technical Information of China (English)
HU Hong-Qing; LIU Hua-Liang; HE Ji-Zheng; HUANG Qiao-Yun
2007-01-01
Batch equilibrium experiments were conducted to investigate cadmium (Cd) sorption by two permanent-charge soils, a yellow-cinnamon soil and a yellow-brown soil, and two variable-charge soils, a red soil and a latosol, with addition of selected organic acids (acetate, tartrate, and citrate). Results showed that with an increase in acetate concentrations from 0 to 3.0 mmol L-1, Cd sorption percentage by the yellow-cinnamon soil, the yellow-brown soil, and the latosol decreased. The sorption percentage of Cd by the yellow-cinnamon soil and generally the yellow-brown soil (permanent-charge soils) decreased with an increase in tartrate concentration, but increased at low tartrate concentrations for the red soil and the latosol. Curves of percentage of Cd sorption for citrate were similar to those for tartrate. For the variable-charge soils with tartrate and citrate, there were obvious peaks in Cd sorption percentage. These peaks, where organic acids had maximum influence, changed with soil type, and were at a higher organic acid concentration for the variable-charge soils than for the permanent-charge soils. Addition of cadmium after tartrate adsorption resulted in a higher sorption increase for the variable-charge soils than for the permanent-charge soils. When tartrate and Cd solution were added together, sorption of Cd decreased with tartrate concentration for the yellow-brown soil, but increased at low tartrate concentrations and then decreased with tartrate concentration for the red soil and the latosol.
Variable selection based on entropic criterion and its application to the debris-flow triggering
Chen, C; Tseng, C Y; Chen, Chien-chih; Dong, Jia-Jyun; Tseng, Chih-Yuan
2006-01-01
We propose a new data analysis scheme, the method of minimum entropy analysis (MEA). MEA provides a quantitative criterion for selecting the variables relevant to modeling the physical system of interest. The method can be easily extended to various kinds of geophysical/geological data analysis, where the many available measurements, relevant or irrelevant, may obscure the understanding of a highly complicated physical system such as the triggering of debris flows. After demonstrating and testing the MEA method, we apply it to a dataset of debris-flow occurrences in Taiwan and successfully identify three variables relevant to the triggering of the observed debris-flow events caused by the 1996 Typhoon Herb: the hydrological form factor, the number of landslides, and the area of landslides.
Implementation of Phonetic Context Variable Length Unit Selection Module for Malay Text to Speech
Directory of Open Access Journals (Sweden)
Tian-Swee Tan
2008-01-01
Problem statement: The main problem with current Malay Text-To-Speech (MTTS) synthesis systems is the poor quality of the generated speech, caused by the inability of traditional TTS systems to provide multiple choices of unit for generating more accurate synthesized speech. Approach: This study proposes a phonetic context variable length unit selection MTTS system that is capable of providing more natural and accurate unit selection for synthesized speech. It implements a phonetic context algorithm for unit selection for MTTS. A unit selection method without phonetic context may select speech units from different sources, which degrades the quality of concatenation. This study proposes a design of the speech corpus and the unit selection method according to phonetic context, so that a string of continuous phonemes can be selected from the same source instead of individual phonemes from different sources. This further reduces the number of concatenation points and increases the quality of concatenation. The speech corpus was transcribed according to phonetic context to preserve the phonetic information. The method uses word-based concatenation: it first searches the speech corpus for the target word and, if the word is found, uses it for concatenation. If the word does not exist, it constructs the word from a phoneme sequence. Results: The system was tested with 40 participants in a Mean Opinion Score (MOS) listening test; the average ratings for naturalness, pronunciation and intelligibility were 3.9, 4.1 and 3.9, respectively. Conclusion/Recommendation: Through this study, a first version of a corpus-based MTTS has been designed; it has improved the naturalness, pronunciation and intelligibility of synthetic speech. However, it still has shortcomings that need to be addressed, such as a prosody module to support phrasing analysis and intonation of the input text to match with the waveform modifier.
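The word-first lookup with phoneme fallback described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function `select_units`, the dictionaries `word_units`, `phoneme_units` and `lexicon`, and the unit labels are all hypothetical stand-ins for a transcribed speech corpus.

```python
def select_units(words, word_units, phoneme_units, lexicon):
    """Pick corpus units for each word, falling back to phoneme units."""
    selected = []
    for word in words:
        if word in word_units:
            # Whole word recorded in the corpus: one unit, no internal joins.
            selected.append(word_units[word])
        else:
            # Word missing: build it from its phoneme sequence instead.
            selected.extend(phoneme_units[p] for p in lexicon[word])
    return selected


# Hypothetical toy corpus; the labels stand in for waveform segments.
word_units = {"saya": "W:saya"}
phoneme_units = {"m": "P:m", "a": "P:a", "k": "P:k", "n": "P:n"}
lexicon = {"makan": ["m", "a", "k", "a", "n"]}

units = select_units(["saya", "makan"], word_units, phoneme_units, lexicon)
print(units)
```

In this toy case the first word is served by a single word-level unit, while the second needs five phoneme units and therefore four internal concatenation points, which is the cost the phonetic-context design tries to reduce.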
An Approach with Support Vector Machine using Variable Features Selection on Breast Cancer Prognosis
Directory of Open Access Journals (Sweden)
Sandeep Chaurasia
2013-09-01
Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of machine learning. In this paper we use a support vector machine classifier to construct a model for breast cancer survivability prediction. We apply both 5-fold and 10-fold cross-validation of variable selection on the input feature vectors, and measure performance through bio-learning class performance in terms of AUC, specificity and sensitivity. The SVM performs much better than the other machine learning classifiers.
Soft Sensing Modelling Based on Optimal Selection of Secondary Variables and Its Application
Institute of Scientific and Technical Information of China (English)
Qi Li; Cheng Shao
2009-01-01
The composition of the distillation column is a very important quality value in refineries; unfortunately, few hardware sensors are available to measure distillation compositions on-line. In this paper, a novel method using sensitivity matrix analysis and kernel ridge regression (KRR) to implement on-line soft sensing of distillation compositions is proposed. In this approach, sensitivity matrix analysis is used to select the most suitable secondary variables as the soft sensor's inputs, and KRR is used to build the composition soft sensor. Application to a simulated distillation column demonstrates the effectiveness of the method.
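As a rough illustration of the regression step named above, the following sketch fits a kernel ridge regression by solving (K + lam*I) alpha = y with a Gaussian kernel. It is not the authors' soft sensor: the kernel choice, the values of `gamma` and `lam`, and the synthetic "secondary variables" and "composition" are all assumptions made for the example, and the sensitivity-matrix selection step is not shown.

```python
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def krr_fit(X, y, gamma, lam):
    """Return dual coefficients alpha of kernel ridge regression."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def krr_predict(X, alpha, x_new, gamma):
    """Predict the target at x_new from the training set and alpha."""
    return sum(a * rbf(x, x_new, gamma) for a, x in zip(alpha, X))


# Synthetic data: two made-up secondary variables and a made-up composition.
X = [(t / 10.0, math.cos(t / 10.0)) for t in range(30)]
y = [math.sin(2.0 * x0) + 0.5 * x1 for x0, x1 in X]

alpha = krr_fit(X, y, gamma=2.0, lam=1e-3)
fitted = [krr_predict(X, alpha, x, gamma=2.0) for x in X]
rmse = math.sqrt(sum((f - t) ** 2 for f, t in zip(fitted, y)) / len(y))
print(rmse)
```

The ridge term `lam` keeps the linear system well conditioned; in practice both `gamma` and `lam` would be tuned, for example by cross-validation.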
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Gezari, S.; Martin, D. C.; Forster, K.; Neill, J. D.; Huber, M.; Heckman, T.; Bianchi, L.; Morrissey, P.; Neff, S. G.; Seibert, M.; Schiminovich, D.; Wyder, T. K.; Burgett, W. S.; Chambers, K. C.; Kaiser, N.; Magnier, E. A.; Price, P. A.; Tonry, J. L.
2013-01-01
We present the selection and classification of over a thousand ultraviolet (UV) variable sources discovered in approximately 40 deg^2 of GALEX Time Domain Survey (TDS) NUV images observed with a cadence of 2 days and a baseline of observations of approximately 3 years. The GALEX TDS fields were designed to be in spatial and temporal coordination with the Pan-STARRS1 Medium Deep Survey, which provides deep optical imaging and simultaneous optical transient detections via image differencing. We characterize the GALEX photometric errors empirically as a function of mean magnitude, and select sources that vary at the 5 sigma level in at least one epoch. We measure the statistical properties of the UV variability, including the structure function on timescales of days and years. We report classifications for the GALEX TDS sample using a combination of optical host colors and morphology, UV light curve characteristics, and matches to archival X-ray, and spectroscopy catalogs. We classify 62% of the sources as active galaxies (358 quasars and 305 active galactic nuclei), and 10% as variable stars (including 37 RR Lyrae, 53 M dwarf flare stars, and 2 cataclysmic variables). We detect a large-amplitude tail in the UV variability distribution for M-dwarf flare stars and RR Lyrae, reaching up to |Δm| = 4.6 mag and 2.9 mag, respectively. The mean amplitude of the structure function for quasars on year timescales is five times larger than observed at optical wavelengths. The remaining unclassified sources include UV-bright extragalactic transients, two of which have been spectroscopically confirmed to be a young core-collapse supernova and a flare from the tidal disruption of a star by dormant supermassive black hole. We calculate a surface density for variable sources in the UV with NUV less than 23 mag and |Δm| greater than 0.2 mag of approximately 8.0, 7.7, and 1.8 deg^-2 for quasars, active galactic nuclei, and RR Lyrae stars
Energy Technology Data Exchange (ETDEWEB)
Gezari, S. [Department of Astronomy, University of Maryland, College Park, MD 20742-2421 (United States); Martin, D. C.; Forster, K.; Neill, J. D.; Morrissey, P.; Wyder, T. K. [Astronomy Department, California Institute of Technology, MC 249-17, 1200 East California Boulevard, Pasadena, CA 91125 (United States); Huber, M.; Burgett, W. S.; Chambers, K. C.; Kaiser, N.; Magnier, E. A.; Tonry, J. L. [Institute for Astronomy, University of Hawaii, 2680 Woodlawn Drive, Honolulu, HI 96822 (United States); Heckman, T.; Bianchi, L. [Department of Physics and Astronomy, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218 (United States); Neff, S. G. [Laboratory for Astronomy and Solar Physics, NASA Goddard Space Flight Center, Greenbelt, MD 20771 (United States); Seibert, M. [Observatories of the Carnegie Institute of Washington, Pasadena, CA 90095 (United States); Schiminovich, D. [Department of Astronomy, Columbia University, New York, NY 10027 (United States); Price, P. A., E-mail: suvi@astro.umd.edu [Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544 (United States)
2013-03-20
We present the selection and classification of over a thousand ultraviolet (UV) variable sources discovered in ~40 deg^2 of GALEX Time Domain Survey (TDS) NUV images observed with a cadence of 2 days and a baseline of observations of ~3 years. The GALEX TDS fields were designed to be in spatial and temporal coordination with the Pan-STARRS1 Medium Deep Survey, which provides deep optical imaging and simultaneous optical transient detections via image differencing. We characterize the GALEX photometric errors empirically as a function of mean magnitude, and select sources that vary at the 5σ level in at least one epoch. We measure the statistical properties of the UV variability, including the structure function on timescales of days and years. We report classifications for the GALEX TDS sample using a combination of optical host colors and morphology, UV light curve characteristics, and matches to archival X-ray, and spectroscopy catalogs. We classify 62% of the sources as active galaxies (358 quasars and 305 active galactic nuclei), and 10% as variable stars (including 37 RR Lyrae, 53 M dwarf flare stars, and 2 cataclysmic variables). We detect a large-amplitude tail in the UV variability distribution for M-dwarf flare stars and RR Lyrae, reaching up to |Δm| = 4.6 mag and 2.9 mag, respectively. The mean amplitude of the structure function for quasars on year timescales is five times larger than observed at optical wavelengths. The remaining unclassified sources include UV-bright extragalactic transients, two of which have been spectroscopically confirmed to be a young core-collapse supernova and a flare from the tidal disruption of a star by dormant supermassive black hole. We calculate a surface density for variable sources in the UV with NUV < 23 mag and |Δm| > 0.2 mag of ~8.0, 7.7, and 1.8 deg^-2 for quasars, active galactic nuclei, and RR Lyrae stars, respectively. We also calculate a surface density rate in the
Gezari, S.; Martin, D. C.; Forster, K.; Neill, J. D.; Huber, M.; Heckman, T.; Bianchi, L.; Morrissey, P.; Neff, S. G.; Seibert, M.; Schiminovich, D.; Wyder, T. K.; Burgett, W. S.; Chambers, K. C.; Kaiser, N.; Magnier, E. A.; Price, P. A.; Tonry, J. L.
2013-03-01
We present the selection and classification of over a thousand ultraviolet (UV) variable sources discovered in ~40 deg^2 of GALEX Time Domain Survey (TDS) NUV images observed with a cadence of 2 days and a baseline of observations of ~3 years. The GALEX TDS fields were designed to be in spatial and temporal coordination with the Pan-STARRS1 Medium Deep Survey, which provides deep optical imaging and simultaneous optical transient detections via image differencing. We characterize the GALEX photometric errors empirically as a function of mean magnitude, and select sources that vary at the 5σ level in at least one epoch. We measure the statistical properties of the UV variability, including the structure function on timescales of days and years. We report classifications for the GALEX TDS sample using a combination of optical host colors and morphology, UV light curve characteristics, and matches to archival X-ray, and spectroscopy catalogs. We classify 62% of the sources as active galaxies (358 quasars and 305 active galactic nuclei), and 10% as variable stars (including 37 RR Lyrae, 53 M dwarf flare stars, and 2 cataclysmic variables). We detect a large-amplitude tail in the UV variability distribution for M-dwarf flare stars and RR Lyrae, reaching up to |Δm| = 4.6 mag and 2.9 mag, respectively. The mean amplitude of the structure function for quasars on year timescales is five times larger than observed at optical wavelengths. The remaining unclassified sources include UV-bright extragalactic transients, two of which have been spectroscopically confirmed to be a young core-collapse supernova and a flare from the tidal disruption of a star by dormant supermassive black hole. We calculate a surface density for variable sources in the UV with NUV < 23 mag and |Δm| > 0.2 mag of ~8.0, 7.7, and 1.8 deg^-2 for quasars, active galactic nuclei, and RR Lyrae stars, respectively. We also calculate a surface density rate in the UV for transient sources, using the effective survey time at the
International Nuclear Information System (INIS)
We present the selection and classification of over a thousand ultraviolet (UV) variable sources discovered in ∼40 deg^2 of GALEX Time Domain Survey (TDS) NUV images observed with a cadence of 2 days and a baseline of observations of ∼3 years. The GALEX TDS fields were designed to be in spatial and temporal coordination with the Pan-STARRS1 Medium Deep Survey, which provides deep optical imaging and simultaneous optical transient detections via image differencing. We characterize the GALEX photometric errors empirically as a function of mean magnitude, and select sources that vary at the 5σ level in at least one epoch. We measure the statistical properties of the UV variability, including the structure function on timescales of days and years. We report classifications for the GALEX TDS sample using a combination of optical host colors and morphology, UV light curve characteristics, and matches to archival X-ray, and spectroscopy catalogs. We classify 62% of the sources as active galaxies (358 quasars and 305 active galactic nuclei), and 10% as variable stars (including 37 RR Lyrae, 53 M dwarf flare stars, and 2 cataclysmic variables). We detect a large-amplitude tail in the UV variability distribution for M-dwarf flare stars and RR Lyrae, reaching up to |Δm| = 4.6 mag and 2.9 mag, respectively. The mean amplitude of the structure function for quasars on year timescales is five times larger than observed at optical wavelengths. The remaining unclassified sources include UV-bright extragalactic transients, two of which have been spectroscopically confirmed to be a young core-collapse supernova and a flare from the tidal disruption of a star by dormant supermassive black hole. We calculate a surface density for variable sources in the UV with NUV < 23 mag and |Δm| > 0.2 mag of ∼8.0, 7.7, and 1.8 deg^-2 for quasars, active galactic nuclei, and RR Lyrae stars, respectively. We also calculate a surface density rate in the UV for transient sources, using the effective survey time at
Rubin, Donald B.
1981-01-01
The Bayesian bootstrap is the Bayesian analogue of the bootstrap. Instead of simulating the sampling distribution of a statistic estimating a parameter, the Bayesian bootstrap simulates the posterior distribution of the parameter; operationally and inferentially the methods are quite similar. Because both methods of drawing inferences are based on somewhat peculiar model assumptions and the resulting inferences are generally sensitive to these assumptions, neither method should be applied wit...
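A minimal sketch of the idea, for the posterior of a mean: where the ordinary bootstrap resamples the data with replacement, the Bayesian bootstrap keeps every observation and draws a set of Dirichlet(1, ..., 1) weights for them on each replication. The function name, the toy data and the number of draws below are illustrative choices, not taken from the abstract.

```python
import random


def bayesian_bootstrap_mean(data, draws=2000, seed=0):
    """Draw from the Bayesian-bootstrap posterior of the mean of `data`."""
    rng = random.Random(seed)
    posterior = []
    for _ in range(draws):
        # Dirichlet(1, ..., 1) weights: unit-rate exponentials, normalised.
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        posterior.append(sum(gi * x for gi, x in zip(g, data)) / total)
    return posterior


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # toy observations
posterior = bayesian_bootstrap_mean(data)
posterior_mean = sum(posterior) / len(posterior)
print(posterior_mean)
```

Each draw is a weighted average of the observed values, so the simulated posterior can never place mass outside the range of the data; this is one of the model assumptions the abstract describes as somewhat peculiar.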
Czech Academy of Sciences Publication Activity Database
Pudil, Pavel; Somol, Petr; Střítecký, R.
Lhasa, Tibet, China : California Polytechnic State University, USA, 2007 - (Lee, T.; Liu, Y.; Zhao, X.), s. 1-18 ISSN 1539-2023. - (Series of Information & Management Sciences). [6th Int. Conf. on Information and Management Sciences. Lhasa, Tibet (CN), 01.07.2007-06.07.2007] R&D Projects: GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Grant ostatní: GA MŠk(CZ) 2C06019; GA ČR(CZ) GA402/03/1310 Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection * decision making * pattern recognition Subject RIV: BD - Theory of Information http://library.utia.cas.cz/separaty/2007/ro/pudil-methodology of selecting the most informative variables for decision - making .pdf
Palanque-Delabrouille, N; Yèche, Ch; Pâris, I; Petitjean, P; Burtin, E; Dawson, K; McGreer, I; Myers, A D; Rossi, G; Schlegel, D; Schneider, D; Streblyanska, A; Tinker, J
2015-01-01
The SDSS-IV/eBOSS has an extensive quasar program that combines several selection methods. Among these, the photometric variability technique provides highly uniform samples, unaffected by the redshift bias of traditional optical-color selections, when $z= 2.7 - 3.5$ quasars cross the stellar locus or when host galaxy light affects quasar colors at $z 2.2$. Both models are constrained to be continuous at $z=2.2$. They present a flattening of the bright-end slope at large redshift. The LEDE model indicates a reduction of the break density with increasing redshift, but the evolution of the break magnitude depends on the parameterization. The models are in excellent accord, predicting quasar counts that agree within 0.3\% (resp., 1.1\%) to $g<22.5$ (resp., $g<23$). The models are also in good agreement over the entire redshift range with models from previous studies.
Bayesian statistics an introduction
Lee, Peter M
2012-01-01
Bayesian Statistics is the school of thought that combines prior beliefs with the likelihood of a hypothesis to arrive at posterior beliefs. The first edition of Peter Lee’s book appeared in 1989, but the subject has moved ever onwards, with increasing emphasis on Monte Carlo based techniques. This new fourth edition looks at recent techniques such as variational methods, Bayesian importance sampling, approximate Bayesian computation and Reversible Jump Markov Chain Monte Carlo (RJMCMC), providing a concise account of the way in which the Bayesian approach to statistics develops as wel
Understanding Computational Bayesian Statistics
Bolstad, William M
2011-01-01
A hands-on introduction to computational statistics from a Bayesian point of view. Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic
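One standard way to draw samples when only the shape of the posterior is known is a random-walk Metropolis sampler, sketched below. This is a generic illustration and not code from the book: the target `log_shape`, the step size, the chain length and the burn-in are assumptions chosen so that the true answer (mean 3, variance 1) is known.

```python
import math
import random


def metropolis(log_shape, x0, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis using only an unnormalised log density."""
    rng = random.Random(seed)
    x, lp = x0, log_shape(x0)
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        lp_prop = log_shape(proposal)
        # Accept with probability min(1, shape(proposal) / shape(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples


def log_shape(x):
    # Unnormalised log posterior: a Normal(3, 1) without its constant.
    return -0.5 * (x - 3.0) ** 2


chain = metropolis(log_shape, x0=0.0, steps=20000)
kept = chain[2000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
print(mean, var)
```

The normalising constant cancels in the acceptance ratio, which is why the formula for the shape alone is enough; posterior summaries are then computed directly from the retained samples.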
Impact of perennial energy crops income variability on the crop selection of risk averse farmers
International Nuclear Information System (INIS)
The UK Government policy is for the area of perennial energy crops in the UK to expand significantly. For this to be achievable, farmers need to choose these crops in preference to conventional rotations. This paper looks at the potential level and variability of perennial energy crop incomes and their relation to incomes from conventional arable crops. Assuming energy crop prices are correlated with oil prices, the results suggest that incomes from energy crops are not well correlated with conventional arable crop incomes. A farm-scale mathematical programming model is then used to attempt to understand the effect on risk-averse farmers' crop selection. The inclusion of risk reduces the energy crop price required for the selection of these crops. However, yields towards the highest of those predicted in the UK are still required to make them an optimal choice, suggesting that only a small area of energy crops would be expected to be grown within the UK. This must be regarded as a tentative conclusion, primarily because of the high sensitivity found to crop yields, which leads to a proposal for further work applying the model with spatially disaggregated data. - Highlights: ► Energy crop and conventional crop incomes suggested as uncorrelated. ► Diversification effect of energy crops investigated for a risk averse farmer. ► Energy crops indicated as optimal selection only on highest yielding UK sites. ► Large establishment grant rates to substantially alter crop selections.
Interplay of Periodic, Cyclic and Stochastic Variability in Selected Areas of the H-R Diagram
Sterken, C.
2003-03-01
In 1999, the Ministry of the Flemish Community (Department for Science) allotted a research grant in the framework Bilateral scientific and technological cooperation to a consortium of four astronomical institutes, viz. Vrije Universiteit Brussel (Observational Astronomy), University of Concepcion (Department of Astronomy), Katholieke Universiteit Leuven (Astronomical Institute) and Royal Observatory of Belgium. The project, Long-Term Photometry of Variables, consolidates two decades of scientific collaboration between the Flemish and Chilean partners in the field of long-term monitoring of variable stars (Be stars, cataclysmic variables, S Dor and LBV stars, O stars, WR stars and pulsating main-sequence stars). The allotted grant intended to achieve intensive observing of several key objects selected among some of the most interesting variable stars of the southern hemisphere. The purpose of the present workshop was to comply with the Government's requirement to organise a scientific conference in Flanders in order to debate the scientific outcome of the project along with a broader discussion of related scientific aspects. However, the diversity of research topics dealt with during the course of this project makes it impossible to present a deep view of the field in workshop format. Instead, we decided to concentrate on one common attribute that we all encounter when studying above-mentioned classes of variable stars: the underlying periods and cycles, the methods used to find these cycles, and the associated aspects of interpretation of the oscillations found. The workshop format was a series of invited papers on key topics, supplemented with a number of contributed papers and posters, and ample time for discussions. The meeting consisted of seven sessions, each with a specific theme, viz. history (history of astronomy in Chile, variable star research, etc.), sky surveys, regular variations, non-regular and transient phenomena, chaos (observations), methods and
Approach to the Correlation Discovery of Chinese Linguistic Parameters Based on Bayesian Method
Institute of Scientific and Technical Information of China (English)
WANG Wei(王玮); CAI LianHong(蔡莲红)
2003-01-01
The Bayesian approach is an important method in statistics. The Bayesian belief network is a powerful tool for knowledge representation and reasoning under conditions of uncertainty. It is a graphical model that encodes probabilistic relationships among variables of interest. In this paper, an approach to Bayesian network construction is given for discovering the relationships among Chinese linguistic parameters in a corpus.
Directory of Open Access Journals (Sweden)
Yong-Hong Zhang
2015-05-01
Assessing the human placental barrier permeability of drugs is very important to guarantee drug safety during pregnancy. The quantitative structure–activity relationship (QSAR) method was used as an effective assessment tool for studying the placental transfer of drugs, while in vitro human placental perfusion is the most widely used method. In this study, the partial least squares (PLS) variable selection and modeling procedure was used to pick out optimal descriptors from a pool of 620 descriptors of 65 compounds and to simultaneously develop a QSAR model between the descriptors and the placental barrier permeability expressed by the clearance indices (CI). The model was subjected to internal validation by cross-validation and y-randomization and to external validation by predicting the CI values of 19 compounds. The model was shown to be robust and to have good predictive potential (r2 = 0.9064, RMSE = 0.09, q2 = 0.7323, rp2 = 0.7656, RMSP = 0.14). The mechanistic interpretation of the final model was based on the descriptors with high variable importance in projection values. Using the PLS procedure, we can rapidly and effectively select optimal descriptors and thus construct a model with good stability and predictability. This analysis can provide an effective tool for the high-throughput screening of the placental barrier permeability of drugs.
The GALEX Time Domain Survey I. Selection and Classification of Over a Thousand UV Variable Sources
Gezari, S; Forster, K; Neill, J D; Huber, M; Heckman, T; Bianchi, L; Morrissey, P; Neff, S G; Seibert, M; Schiminovich, D; Wyder, T K; Burgett, W S; Chambers, K C; Kaiser, N; Magnier, E A; Price, P A; Tonry, J L
2013-01-01
We present the selection and classification of over a thousand ultraviolet (UV) variable sources discovered in ~ 40 deg^2 of GALEX Time Domain Survey (TDS) NUV images observed with a cadence of 2 days and a baseline of observations of ~ 3 years. The GALEX TDS fields were designed to be in spatial and temporal coordination with the Pan-STARRS1 Medium Deep Survey, which provides deep optical imaging and simultaneous optical transient detections via image differencing. We characterize the GALEX photometric errors empirically as a function of mean magnitude, and select sources that vary at the 5σ level in at least one epoch. We measure the statistical properties of the UV variability, including the structure function on timescales of days and years. We report classifications for the GALEX TDS sample using a combination of optical host colors and morphology, UV light curve characteristics, and matches to archival X-ray, and spectroscopy catalogs. We classify 62% of the sources as active galaxies (358 quasars ...
Directory of Open Access Journals (Sweden)
Pawłowski Dominik
2014-12-01
Full Text Available Geostatistical methods for 2D and 3D modelling spatial variability of selected physicochemical properties of biogenic sediments were applied to a small valley mire in order to identify the processes that lead to the formation of various types of peat. A sequential Gaussian simulation was performed to reproduce the statistical distribution of the input data (pH and organic matter and their semivariances, as well as to honouring of data values, yielding more ‘realistic’ models that show microscale spatial variability, despite the fact that the input sample cores were sparsely distributed in the X-Y space of the study area. The stratigraphy of peat deposits in the Ldzań mire shows a record of long-term evolution of water conditions, which is associated with the variability in water supply over time. Ldzań is a fen (a rheotrophic mire with a through-flow of groundwater. Additionally, the vicinity of the Grabia River is marked by seasonal inundations of the southwest part of the mire and increased participation of mineral matter in the peat. In turn, the upper peat layers of some of the central part of Ldzań mire are rather spongy, and these peat-forming phytocoenoses probably formed during permanent waterlogging.
Attention in a bayesian framework
DEFF Research Database (Denmark)
Whiteley, Louise Emma; Sahani, Maneesh
2012-01-01
include both selective phenomena, where attention is invoked by cues that point to particular stimuli, and integrative phenomena, where attention is invoked dynamically by endogenous processing. However, most previous Bayesian accounts of attention have focused on describing relatively simple experimental...... settings, where cues shape expectations about a small number of upcoming stimuli and thus convey "prior" information about clearly defined objects. While operationally consistent with the experiments it seeks to describe, this view of attention as prior seems to miss many essential elements of both its......The behavioral phenomena of sensory attention are thought to reflect the allocation of a limited processing resource, but there is little consensus on the nature of the resource or why it should be limited. Here we argue that a fundamental bottleneck emerges naturally within Bayesian models of...
Bayesian priors for transiting planets
Kipping, David M
2016-01-01
As astronomers push towards discovering ever-smaller transiting planets, it is increasingly common to deal with low signal-to-noise ratio (SNR) events, where the choice of priors plays an influential role in Bayesian inference. In the analysis of exoplanet data, the selection of priors is often treated as a nuisance, with observers typically defaulting to uninformative distributions. Such treatments miss a key strength of the Bayesian framework, especially in the low SNR regime, where even weak a priori information is valuable. When estimating the parameters of a low-SNR transit, two key pieces of information are known: (i) the planet has the correct geometric alignment to transit and (ii) the transit event exhibits sufficient signal-to-noise to have been detected. These represent two forms of observational bias. Accordingly, when fitting transits, the model parameter priors should not follow the intrinsic distributions of said terms, but rather those of both the intrinsic distributions and the observational ...
Universal Darwinism As a Process of Bayesian Inference.
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature. PMID:27375438
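As a small illustration of the equivalence described in the abstract above, the sketch below treats type frequencies as prior probabilities and relative fitness as likelihoods. Under that reading, one generation of discrete selection has the same form as Bayes' theorem. The function name and all numbers are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def bayes_update(prior, likelihood):
    """Return the posterior: prior * likelihood, renormalised to sum to 1."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalised)  # marginal likelihood, read here as mean fitness
    return [u / evidence for u in unnormalised]

# Hypothetical population: three types with equal starting frequencies.
frequencies = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical relative fitness of each type, playing the role of a likelihood.
fitness = [1.0, 2.0, 3.0]

# Frequencies after one round of selection, i.e. the "posterior".
next_generation = bayes_update(frequencies, fitness)
```

Feeding `next_generation` back in as the next prior repeats the update, which is how evidence (or selection) accumulates over generations in this picture.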
Li, Junming; Jin, Meijun; Xu, Zheng
2016-01-01
With the rapid industrial development and urbanization in China over the past three decades, PM2.5 pollution has become a severe environmental problem that threatens public health. Due to its unbalanced development and intrinsic topography features, the distribution of PM2.5 concentrations over China is spatially heterogeneous. In this study, we explore the spatiotemporal variations of PM2.5 pollution in China and four great urban areas from 1998 to 2014. A space-time Bayesian hierarchy model is employed to analyse PM2.5 pollution. The results show that a stable “3-Clusters” spatial PM2.5 pollution pattern has formed. The mean and 90% quantile of the PM2.5 concentrations in China have increased significantly, with annual increases of 0.279 μg/m3 (95% CI: 0.083−0.475) and 0.735 μg/m3 (95% CI: 0.261−1.210), respectively. The area with a PM2.5 pollution level of more than 70 μg/m3 has increased significantly, with an annual increase of 0.26 percentage points. Two regions in particular, the North China Plain and Sichuan Basin, are experiencing the largest amounts of PM2.5 pollution. The polluted areas, with a high local magnitude of more than 1.0 relative to the overall PM2.5 concentration, affect an area with a human population of 949 million, which corresponded to 69.3% of the total population in 2010. North and south differentiation occurs in the urban areas of the Jingjinji and Yangtze Delta, and circular and radial gradient differentiation occur in the urban areas of the Cheng-Yu and Pearl Deltas. The spatial heterogeneity of the urban Jingjinji group is the strongest. Eighteen cities located in the Yangtze Delta urban group, including Shanghai and Nanjing, have experienced high PM2.5 concentrations and faster local trends of increasing PM2.5. The percentage of exposure to PM2.5 concentrations greater than 70 μg/m3 and 100 μg/m3 is increasing significantly. PMID:27490557
DEFF Research Database (Denmark)
Christensen, Lars P.B.; Larsen, Jan
2006-01-01
A general Variational Bayesian framework for iterative data and parameter estimation for coherent detection is introduced as a generalization of the EM-algorithm. Explicit solutions are given for MIMO channel estimation with Gaussian prior and noise covariance estimation with inverse-Wishart prior....... Simulation of a GSM-like system provides empirical proof that the VBEM-algorithm is able to provide better performance than the EM-algorithm. However, if the posterior distribution is highly peaked, the VBEM-algorithm approaches the EM-algorithm and the gain disappears. The potential gain is therefore...
Optical variability of infrared power law-selected galaxies & X-ray sources in the goods-south field
Directory of Open Access Journals (Sweden)
Alison Klesman
2007-01-01
We investigate the use of optical variability over 6 months to identify AGNs in the GOODS-South field. Photometry was performed on a sample of 24 infrared power law-selected AGN candidates and 104 X-ray sources with optical counterparts. We find that while the majority of variable objects are unobscured AGN, 30% of IR-only selected candidates show evidence of AGN via optical variability.
Directory of Open Access Journals (Sweden)
Ildikó Ungvári
Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses a Bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to estimate the posterior probability that a variable is directly relevant or that its association is only mediated. With frequentist methods one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43 (1.2-1.8); p = 3×10^-4). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics. In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with the asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance.
Frühwirth-Schnatter, Sylvia
1990-01-01
In the paper at hand we apply it to Bayesian statistics to obtain "Fuzzy Bayesian Inference". In the subsequent sections we will discuss a fuzzy valued likelihood function, Bayes' theorem for both fuzzy data and fuzzy priors, a fuzzy Bayes' estimator, fuzzy predictive densities and distributions, and fuzzy H.P.D. regions. (author's abstract)
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
Smail, Linda
2016-06-01
The basic task of any probabilistic inference system in Bayesian networks is computing the posterior probability distribution for a subset or subsets of random variables, given values or evidence for some other variables from the same Bayesian network. Many methods and algorithms have been developed for exact and approximate inference in Bayesian networks. This work compares two exact inference methods in Bayesian networks, the Lauritzen-Spiegelhalter algorithm and the successive restrictions algorithm, from the perspective of computational efficiency. The two methods were applied for comparison to a Chest Clinic Bayesian Network. Results indicate that the successive restrictions algorithm is more computationally efficient than the Lauritzen-Spiegelhalter algorithm.
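To make the inference task in the abstract above concrete, the following sketch computes a posterior by brute-force enumeration of the joint distribution. This is neither the Lauritzen-Spiegelhalter algorithm nor the successive restrictions algorithm; it is only the baseline computation that such algorithms organize more efficiently. The three-node chain and every probability in it are hypothetical and are not the Chest Clinic network's actual values.

```python
from itertools import product

# Hypothetical three-node chain: Smoking -> Cancer -> XRay (illustrative numbers).
P_SMOKING = {True: 0.3, False: 0.7}
P_CANCER_GIVEN_SMOKING = {True: 0.1, False: 0.01}  # P(cancer=True | smoking)
P_XRAY_GIVEN_CANCER = {True: 0.9, False: 0.2}      # P(xray=True | cancer)

def joint(smoking, cancer, xray):
    """Joint probability of one full assignment, via the chain factorisation."""
    p = P_SMOKING[smoking]
    p_cancer = P_CANCER_GIVEN_SMOKING[smoking]
    p *= p_cancer if cancer else 1 - p_cancer
    p_xray = P_XRAY_GIVEN_CANCER[cancer]
    p *= p_xray if xray else 1 - p_xray
    return p

def posterior_cancer_given_xray(xray=True):
    """P(cancer=True | xray) by summing the joint over the hidden variable."""
    weights = {True: 0.0, False: 0.0}
    for smoking, cancer in product([True, False], repeat=2):
        weights[cancer] += joint(smoking, cancer, xray)
    return weights[True] / (weights[True] + weights[False])

posterior = posterior_cancer_given_xray(xray=True)
```

Enumeration scales exponentially with the number of variables, which is why the relative efficiency of exact algorithms is worth comparing at all.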
Song, Yunquan; Lin, Lu; Jian, Ling
2016-07-01
The single-index varying-coefficient model is an important mathematical modeling method for nonlinear phenomena in science and engineering. In this paper, we develop a variable selection method for high-dimensional single-index varying-coefficient models using a shrinkage idea. The proposed procedure can simultaneously select significant nonparametric components and parametric components. Under defined regularity conditions, with appropriate selection of tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Moreover, due to the robustness of the check loss function to outliers in the finite samples, our proposed variable selection method is more robust than the ones based on the least squares criterion. Finally, the method is illustrated with numerical simulations.
Scaling Bayesian network discovery through incremental recovery
Castelo, J.R.; Siebes, A.P.J.M.
1999-01-01
Bayesian networks are a type of graphical model that, e.g., allows one to analyze the interaction among the variables in a database. A well-known problem with the discovery of such models from a database is the "problem of high-dimensionality". That is, the discovery of a network from a database w
International Nuclear Information System (INIS)
Pesticide-induced changes were assessed in thirty-two subjects who had attempted suicide. Farmers and their families were recorded as the group attempting suicide most frequently. The values obtained for seven biochemical variables in hospitalized subjects (average age 29 years) were compared with those of the same number of age-matched normal volunteers. The results revealed major differences in the mean values of the selected parameters. The mean differences calculated for alkaline phosphatase (178.7 mu/l), bilirubin (7.5 mg/dl), GPT (59.2 mu/l) and glucose (38.6 mg/dl) show values higher than in the controls, which indicates hepatotoxicity induced by the pesticides in individuals attempting suicide. An increase in serum creatinine and urea indicated renal malfunction that could be linked with pesticide-induced nephrotoxicity among them. (author)
Bayesian analysis of factors associated with fibromyalgia syndrome subjects
Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie
2015-01-01
Factors contributing to movement-related fear were assessed by Russek, et al. 2014 for subjects with Fibromyalgia (FM) based on data collected by a national internet survey of community-based individuals. The study focused on the variables Activities-Specific Balance Confidence scale (ABC), Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), Pain, work status and physical activity dependent from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits the same data with a Bayesian analysis where appropriate priors were introduced for the variables selected in Russek's paper.
Energy Technology Data Exchange (ETDEWEB)
Akroyd, Duane [Department of Adult and Community College Education, College of Education, Campus Box 7801, North Carolina State University, Raleigh, NC 27695 (United States)], E-mail: duane_akroyd@ncsu.edu; Legg, Jeff [Department of Radiologic Sciences, Virginia Commonwealth University, Richmond, VA 23284 (United States); Jackowski, Melissa B. [Division of Radiologic Sciences, University of North Carolina School of Medicine 27599 (United States); Adams, Robert D. [Department of Radiation Oncology, University of North Carolina School of Medicine 27599 (United States)
2009-05-15
The purpose of this study was to examine the impact of selected organizational factors and the leadership behavior of supervisors on radiation therapists' commitment to their organizations. The population for this study consists of all full time clinical radiation therapists registered by the American Registry of Radiologic Technologists (ARRT) in the United States. A random sample of 800 radiation therapists was obtained from the ARRT for this study. Questionnaires were mailed to all participants and measured organizational variables, a managerial leadership variable and three components of organizational commitment (affective, continuance and normative). It was determined that organizational support and the leadership behavior of supervisors each had a significant and positive effect on normative and affective commitment of radiation therapists, and each of the models predicted over 40% of the variance in radiation therapists' organizational commitment. This study examined radiation therapists' commitment to their organizations and found that affective (emotional attachment to the organization) and normative (feelings of obligation to the organization) commitments were more important than continuance commitment (awareness of the costs of leaving the organization). This study can help radiation oncology administrators and physicians to understand the values their radiation therapy employees hold that are predictive of their commitment to the organization. A crucial result of the study is the importance of the perceived support of the organization and the leadership skills of managers/supervisors on radiation therapists' commitment to the organization.
Interdependency of selected metabolic variables in an animal model of metabolic syndrome.
Mellouk, Zoheir; Sener, Abdullah; Yahia, Dalila Ait; Malaisse, Willy J
2014-10-01
In the present study, the correlation between the percentage of glycated hemoglobin, taken as representative of changes in glucose homeostasis, and selected variables was investigated. Rats were treated for 8 weeks with diets containing 64% starch and 5% sunflower oil or containing 64% D-fructose mixed with: 5% sunflower oil; 3.4% sunflower oil and 1.6% salmon oil; or 3.4% sunflower oil and 1.6% safflower oil. Positive correlations were found between glycated hemoglobin and plasma albumin, urea, creatinine, phospholipids, triglycerides and total cholesterol, liver cholesterol, triglyceride and phospholipid content, and the plasma, liver, heart, kidney, soleus muscle and visceral adipose tissue content of thiobarbituric acid reactive substances, carbonyl derivatives and hydroperoxides. Inversely, negative correlations were observed between glycated hemoglobin and plasma calcium, iron and HDL-cholesterol concentrations, liver, heart, kidney, soleus muscle and visceral adipose tissue superoxide dismutase and catalase activity; as well as plasma, liver, heart, kidney, soleus muscle and visceral adipose tissue nitric oxide content. Only the liver glucokinase activity and liver, heart, kidney, soleus muscle and visceral adipose tissue glutathione reductase activity failed to display a significant correlation with glycated hemoglobin. These findings confirm the hypothesis that there is a close association between glucose homeostasis and other variables when considering the effects of long-chain polyunsaturated ω3 and ω6 fatty acids in rats with fructose-induced metabolic syndrome. PMID:25187839
International Nuclear Information System (INIS)
This study was conducted during the two winter seasons of 2004/2005 and 2005/2006 at the experimental farm belonging to the Plant Research Department, Nuclear Research Centre, AEA, Egypt. The aim of this study is to determine the effect of gamma rays (150, 200 and 250 Gy) on the means of yield and its attributes for an exotic wheat variety (vir-25) and the induction of genetic variability that permits visual selection through the irradiated populations, as well as to determine differences in seed protein patterns between the vir-25 parent variety and some selectants in the M2 generation. The results showed that the different doses of gamma rays had a non-significant effect on the mean value of yield/plant and a significant effect on the mean values of its attributes. On the other hand, considerable genetic variability was generated as a result of applying gamma irradiation. The highest amount of induced genetic variability was detected for number of grains/spike, spike length and number of spikes/plant. Additionally, these three traits exhibited strong association with grain yield/plant; hence, they were used as criteria for selection. Some variant plants were selected from the 250 Gy radiation treatment, with 2-10 spikes per plant. These variant plants exhibited increases in spike length and number of grains/spike. The results also revealed that the protein electrophoresis patterns varied in the number and position of bands from one genotype to another, and various genotypes share bands with molecular weights 31.4 and 3.2 KD. Many bands were found to be specific to the genotype, and the nine wheat mutants were characterized by the presence of bands of molecular weights: 151.9, 125.7, 14.1 and 5.7 KD at M-1; 67.4, 21.7 and 8.2 at M-2; 99.7 KD at M-3; 136.1, 97.6, 49.8, 27.9 and 20.6 KD at M-4; 135.2, 95.3 and 28.1 KD at M-5; 135.5, 67.7, 47.1, 32.3, 21.9 and 9.6 KD at M-6; 126.1, 112.1, 103.3, 58.8, 20.9 and 12.1 KD at M-7; 127.7, 116.6, 93.9, 55.0 and 47.4 KD at M-8; 141.7, 96.1, 79.8, 68.9, 42.1, 32.7, 22.0 and 13
Selection of AGN candidates in the GOODS-South Field through SPITZER/MIPS 24 microns variability
García-González, Judit; Alonso-Herrero, Almudena; Pérez-González, Pablo G.; Hernán-Caballero, Antonio; Sarajedini, Vicki L.; Villar, Víctor
2014-01-01
We present a study of galaxies showing mid-infrared variability in the deepest Spitzer/MIPS 24 μm surveys in the GOODS-South field. We divide the dataset in epochs and subepochs to study the long-term (months-years) and the short-term (days) variability. We use a χ²-statistics method to select AGN candidates with a probability ≤ 1% that the observed variability is due to statistical errors alone. We find 39 (1.7% of the parent sample) sources that show long-term variability and...
Sariful Isalm; S. Thirumalaikumar
2014-01-01
The purpose of the study is to find out the effect of functional and aerobic training on selected fitness and performance variables among football players at college level. A pre-test and post-test randomized group design was applied to this research. Sixty college men football players from Kolkata city were randomly selected and assigned to four equal groups. Each group consisted of fifteen subjects. A pre-test was conducted for all sixty subjects on selected fit...
International Nuclear Information System (INIS)
Monitoring population numbers is important for assessing trends and meeting various legislative mandates. However, sampling across time introduces a temporal aspect to survey design in addition to the spatial one. For instance, a sample that is initially representative may lose this attribute if there is a shift in numbers and/or spatial distribution in the underlying population that is not reflected in later sampled plots. Plot selection methods that account for this temporal variability will produce the best trend estimates. Consequently, I used simulation to compare bias and relative precision of estimates of population change among stratified and unstratified sampling designs based on permanent, temporary, and partial replacement plots under varying levels of spatial clustering, density, and temporal shifting of populations. Permanent plots produced more precise estimates of change than temporary plots across all factors. Further, permanent plots performed better than partial replacement plots except for high density (5 and 10 individuals per plot) and 25% - 50% shifts in the population. Stratified designs always produced less precise estimates of population change for all three plot selection methods, and often produced biased change estimates and greatly inflated variance estimates under sampling with partial replacement. Hence, stratification that remains fixed across time should be avoided when monitoring populations that are likely to exhibit large changes in numbers and/or spatial distribution during the study period. Key words: bias; change estimation; monitoring; permanent plots; relative precision; sampling with partial replacement; temporary plots
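The precision advantage of permanent plots reported above can be illustrated with a very small simulation. The sketch below is not the author's simulation: it covers only permanent versus temporary plots (no partial replacement, stratification, clustering or population shifts), and the population size, count ranges and function name are hypothetical. It relies on the fact that revisiting the same plots cancels most of the plot-to-plot variation when estimating change.

```python
import random
import statistics

def simulate(n_plots=200, sample_size=20, replicates=2000, seed=1):
    """Return (variance with permanent plots, variance with temporary plots)
    of the estimated change in mean count between two survey times."""
    rng = random.Random(seed)
    # Hypothetical population: spatially variable counts at time 1 ...
    time1 = [rng.randint(0, 20) for _ in range(n_plots)]
    # ... and a small plot-level change by time 2, so counts stay correlated.
    time2 = [max(0, c + rng.choice([-1, 0, 1, 2])) for c in time1]
    plots = range(n_plots)
    permanent, temporary = [], []
    for _ in range(replicates):
        s1 = rng.sample(plots, sample_size)
        s2 = rng.sample(plots, sample_size)  # independent second sample
        mean1 = statistics.fmean([time1[i] for i in s1])
        # Permanent plots: the same plots are measured at both times.
        permanent.append(statistics.fmean([time2[i] for i in s1]) - mean1)
        # Temporary plots: a fresh sample is drawn at time 2.
        temporary.append(statistics.fmean([time2[i] for i in s2]) - mean1)
    return statistics.variance(permanent), statistics.variance(temporary)

var_permanent, var_temporary = simulate()
```

In this setup the change estimate from permanent plots has a much smaller variance, because it depends only on the variability of the plot-level changes rather than on the variability of the counts themselves.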
Evolutionary feature selection to estimate forest stand variables using LiDAR
Garcia-Gutierrez, Jorge; Gonzalez-Ferreiro, Eduardo; Riquelme-Santos, Jose C.; Miranda, David; Dieguez-Aranda, Ulises; Navarro-Cerrillo, Rafael M.
2014-02-01
Light detection and ranging (LiDAR) has become an important tool in forestry. LiDAR-derived models are mostly developed by means of multiple linear regression (MLR) after stepwise selection of predictors. An increasing interest in machine learning and evolutionary computation has recently arisen to improve regression use in LiDAR data processing. Although evolutionary machine learning has already proven to be suitable for regression, evolutionary computation may also be applied to improve parametric models such as MLR. This paper provides a hybrid approach based on joint use of MLR and a novel genetic algorithm for the estimation of the main forest stand variables. We show a comparison between our genetic approach and other common methods of selecting predictors. The results obtained from several LiDAR datasets with different pulse densities in two areas of the Iberian Peninsula indicate that genetic algorithms perform better than the other methods statistically. Preliminary studies suggest that a lack of parametric conditions in field data and possible misuse of parametric tests may be the main reasons for the better performance of the genetic algorithm. This research confirms the findings of previous studies that outline the importance of evolutionary computation in the context of LiDAR analysis of forest data, especially when the size of fieldwork datasets is reduced.
Palanque-Delabrouille, N.; Magneville, Ch.; Yèche, Ch.; Pâris, I.; Petitjean, P.; Burtin, E.; Dawson, K.; McGreer, I.; Myers, A. D.; Rossi, G.; Schlegel, D.; Schneider, D.; Streblyanska, A.; Tinker, J.
2016-03-01
The extended Baryon Oscillation Spectroscopic Survey of the Sloan Digital Sky Survey (SDSS-IV/eBOSS) has an extensive quasar program that combines several selection methods. Among these, the photometric variability technique provides highly uniform samples, which are unaffected by the redshift bias of traditional optical-color selections, when z = 2.7-3.5 quasars cross the stellar locus or when host galaxy light affects quasar colors at z 2.2. Both models are constrained to be continuous at z = 2.2. They present a flattening of the bright-end slope at high redshift. The LEDE model indicates a reduction of the break density with increasing redshift, but the evolution of the break magnitude depends on the parameterization. The models are in excellent accord, predicting quasar counts that agree within 0.3% (resp., 1.1%) to g< 22.5 (resp., g< 23). The models are also in good agreement over the entire redshift range with models from previous studies.
Modeling operational risks of the nuclear industry with Bayesian networks
International Nuclear Information System (INIS)
Basically, planning a new industrial plant requires information on the industrial management, regulations, site selection, definition of initial and planned capacity, and on the estimation of the potential demand. However, this is far from enough to assure the success of an industrial enterprise. Unexpected and extremely damaging events may occur that deviate from the original plan. The so-called operational risks are not only in the system, equipment, process or human (technical or managerial) failures. They are also in intentional events such as frauds and sabotage, or extreme events like terrorist attacks or radiological accidents, and even in public reaction to perceived environmental or future generation impacts. For the nuclear industry, it is a challenge to identify and to assess the operational risks and their various sources. Early identification of operational risks can help in preparing contingency plans, or in delaying the decision to invest in or to approve a project that can, at an extreme, affect the public perception of nuclear energy. A major problem in modeling operational risk losses is the lack of internal data that are essential, for example, to apply the loss distribution approach. As an alternative, methods that consider qualitative and subjective information can be applied, for example, fuzzy logic, neural networks, system dynamics or Bayesian networks. An advantage of applying Bayesian networks to model operational risk is the possibility to include expert opinions and variables of interest, to structure the model via causal dependencies among these variables, and to specify subjective prior and conditional probability distributions at each step or network node. This paper suggests a classification of operational risks in industry and discusses the benefits and obstacles of the Bayesian networks approach to model those risks. (author)
Yi Qian; Hui Xie
2011-01-01
In marketing applications, it is common that some key covariates in a regression model, such as marketing mix variables or consumer profiles, are subject to missingness. The convenient method that excludes the consumers with missingness in any covariate can result in a substantial loss of efficiency and may lead to strong selection bias in the estimation of consumer preferences and sensitivities. To solve these problems, we propose a new Bayesian distribution-free approach, which can ensure t...
Bayesian Image Reconstruction Based on Voronoi Diagrams
Cabrera, G F; Hitschfeld, N
2007-01-01
We present a Bayesian Voronoi image reconstruction technique (VIR) for interferometric data. Bayesian analysis applied to the inverse problem allows us to derive the a-posteriori probability of a novel parameterization of interferometric images. We use a variable Voronoi diagram as our model in place of the usual fixed pixel grid. A quantization of the intensity field allows us to calculate the likelihood function and a-priori probabilities. The Voronoi image is optimized including the number of polygons as free parameters. We apply our algorithm to deconvolve simulated interferometric data. Residuals, restored images and chi^2 values are used to compare our reconstructions with fixed grid models. VIR has the advantage of modeling the image with few parameters, obtaining a better image from a Bayesian point of view.
Karakaya, Gulsah; Taormina, Riccardo; Galelli, Stefano; Damla Ahipasaoglu, Selin
2015-04-01
Input Variable Selection (IVS) is an essential step in hydrological modelling problems, since it allows determining the optimal subset of input variables from a large set of candidates to characterize a preselected output. Interestingly, most of the existing IVS algorithms select a single subset, or, at most, one subset of input variables for each cardinality level, thus overlooking the fact that, for a given cardinality, there can be several subsets with similar information content. In this study, we develop a novel IVS approach specifically conceived to account for this issue. The approach is based on the formulation of a four-objective optimization problem that aims at minimizing the number of selected variables and maximizing the prediction accuracy of a data-driven model, while optimizing two entropy-based measures of relevance and redundancy. The redundancy measure ensures that the cross-dependence between the variables in a subset is minimized, while the relevance measure guarantees that the information content of each subset is maximized. In addition to the capability of selecting equally informative subsets, the approach is characterized by two other properties, namely 1) the capability of handling nonlinear interactions between the candidate input variables and preselected output, and 2) computational efficiency. These properties are guaranteed by the adoption of Extreme Learning Machine and Borg MOEA as data-driven model and heuristic optimization procedure, respectively. The approach is demonstrated on a long-term streamflow prediction problem, with the input dataset including both hydro-meteorological variables and climate indices representing dominant modes of climate variability. Results show that the availability of several equally informative subsets allows 1) determining the relative importance of each candidate input, thus supporting the understanding of the underlying physical processes, and 2) finding a better trade-off between multiple
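The abstract above does not state which entropy-based measures are used, so the sketch below substitutes plain mutual information as an assumed stand-in for both relevance and redundancy. The function names, the averaging scheme and the toy data are all hypothetical; the point is only to show how a redundant input can be told apart from a more complementary one.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def relevance(subset, output):
    """Average mutual information between each selected input and the output."""
    return sum(mutual_information(x, output) for x in subset) / len(subset)

def redundancy(subset):
    """Average mutual information over all pairs of selected inputs."""
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    if not pairs:
        return 0.0
    return sum(mutual_information(a, b) for a, b in pairs) / len(pairs)

# Hypothetical discretised candidate inputs and output.
rain = [0, 0, 1, 1, 0, 1, 0, 1]
rain_copy = list(rain)            # fully redundant with `rain`
index = [0, 1, 0, 1, 0, 1, 0, 1]  # only weakly related to `rain`
flow = [0, 0, 1, 1, 0, 1, 0, 1]   # output to be predicted

redundant_pair = redundancy([rain, rain_copy])
complementary_pair = redundancy([rain, index])
```

A multi-objective search would then trade these two quantities off against subset size and model accuracy; here the duplicated input simply scores as maximally redundant while the second pair scores much lower.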
Bujak, Renata; Daghir-Wojtkowiak, Emilia; Kaliszan, Roman; Markuszewski, Michał J
2016-01-01
Non-targeted metabolomics constitutes a part of systems biology and aims at determining numerous metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as the complexity of biological matrices. Therefore, a proper selection of variables which contribute to group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study). The orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without and with multiple testing correction, as well as the least absolute shrinkage and selection operator (LASSO) with bootstrapping, were tested and compared. For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on the VIP criteria using Pareto and UV scaling, respectively. For the PH study, 217 and 320 variables were selected based on the VIP criteria using Pareto and UV scaling, respectively. In the RH study, the OPLS-DA model built after correcting for multiple testing selected 4 and 19 variables in terms of Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on the VIP criteria in terms of Pareto and UV scaling, respectively. In the RH and PH study, the LASSO selected 14 and 4 variables with reproducibility between 99.3 and 100%, respectively. In the light of PLS-based models, the larger the search space the higher the probability of developing models that fit the training data well with simultaneous poor predictive performance on the validation set. The LASSO offers potential improvements over standard linear regression due to the presence of the constraint, which promotes sparse solutions. This paper is the first one to date
Granade, Christopher; Cory, D G
2015-01-01
In recent years, Bayesian methods have been proposed as a solution to a wide range of issues in quantum state and process tomography. State-of-the-art Bayesian tomography solutions suffer from three problems: numerical intractability, a lack of informative prior distributions, and an inability to track time-dependent processes. Here, we solve all three problems. First, we use modern statistical methods, as pioneered by Huszár and Houlsby and by Ferrie, to make Bayesian tomography numerically tractable. Our approach allows for practical computation of Bayesian point and region estimators for quantum states and channels. Second, we propose the first informative priors on quantum states and channels. Finally, we develop a method that allows online tracking of time-dependent states and estimates the drift and diffusion processes affecting a state. We provide source code and animated visual examples for our methods.
Dimensionality reduction in Bayesian estimation algorithms
Directory of Open Access Journals (Sweden)
G. W. Petty
2013-03-01
An idealized synthetic database loosely resembling 3-channel passive microwave observations of precipitation against a variable background is employed to examine the performance of a conventional Bayesian retrieval algorithm. For this dataset, algorithm performance is found to be poor owing to an irreconcilable conflict between the need to find matches in the dependent database versus the need to exclude inappropriate matches. It is argued that the likelihood of such conflicts increases sharply with the dimensionality of the observation space of real satellite sensors, which may utilize 9 to 13 channels to retrieve precipitation, for example. An objective method is described for distilling the relevant information content from N real channels into a much smaller number (M) of pseudochannels while also regularizing the background (geophysical plus instrument) noise component. The pseudochannels are linear combinations of the original N channels obtained via a two-stage principal component analysis of the dependent dataset. Bayesian retrievals based on a single pseudochannel applied to the independent dataset yield striking improvements in overall performance. The differences between the conventional Bayesian retrieval and reduced-dimensional Bayesian retrieval suggest that a major potential problem with conventional multichannel retrievals – whether Bayesian or not – lies in the common but often inappropriate assumption of diagonal error covariance. The dimensional reduction technique described herein avoids this problem by, in effect, recasting the retrieval problem in a coordinate system in which the desired covariance is lower-dimensional, diagonal, and unit magnitude.
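The abstract does not spell out the two stages, so the following is a rough sketch under an assumed reading: stage one whitens the channels with the covariance of background-only samples, and stage two runs an ordinary PCA on the whitened signal-bearing samples and keeps the leading M directions. The function name `pseudochannel_transform` and the split of the data into `background` and `signal` arrays are hypothetical. NumPy is used for the eigendecompositions.

```python
import numpy as np


def pseudochannel_transform(background, signal, m):
    """Return an (m, n) matrix mapping n channels to m pseudochannels.

    background: (samples, n) observations containing only background noise
    signal:     (samples, n) observations that also contain the signal
    """
    # Stage 1: whiten, so that the background covariance becomes the identity.
    noise_cov = np.cov(background, rowvar=False)
    evals, evecs = np.linalg.eigh(noise_cov)
    whiten = (evecs / np.sqrt(evals)).T  # rows: eigenvectors scaled by 1/sqrt(eigenvalue)

    # Stage 2: PCA of the whitened signal-bearing samples, keep the top m directions.
    white_signal = signal @ whiten.T
    s_evals, s_evecs = np.linalg.eigh(np.cov(white_signal, rowvar=False))
    top = s_evecs[:, np.argsort(s_evals)[::-1][:m]]

    # The second stage is an orthogonal rotation, so the background covariance
    # of the resulting pseudochannels stays diagonal with unit magnitude.
    return top.T @ whiten
```

Applying the returned matrix to an observation vector gives its pseudochannel values. Under this reading, the background covariance in pseudochannel space is the M-by-M identity, which matches the "lower-dimensional, diagonal, and unit magnitude" description.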
Bayesian exploratory factor analysis
Gabriella Conti; Sylvia Frühwirth-Schnatter; James Heckman; Rémi Piatek
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study c...
Bayesian Exploratory Factor Analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study co...
Bayesian Exploratory Factor Analysis
Gabriella Conti; Sylvia Fruehwirth-Schnatter; Heckman, James J.; Remi Piatek
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo s...
Bayesian exploratory factor analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo st...
Bayesian exploratory factor analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study co...
Carbonetto, Peter; Kisynski, Jacek; De Freitas, Nando; Poole, David L
2012-01-01
The Bayesian Logic (BLOG) language was recently developed for defining first-order probability models over worlds with unknown numbers of objects. It handles important problems in AI, including data association and population estimation. This paper extends BLOG by adopting generative processes over function spaces - known as nonparametrics in the Bayesian literature. We introduce syntax for reasoning about arbitrary collections of objects, and their properties, in an intuitive manner. By expl...
Bayesian default probability models
Andrlíková, Petra
2014-01-01
This paper proposes a methodology for default probability estimation for low default portfolios, where the statistical inference may become troublesome. The author suggests using logistic regression models with the Bayesian estimation of parameters. The piecewise logistic regression model and Box-Cox transformation of the credit risk score are used to derive the estimates of probability of default, which extends the work by Neagu et al. (2009). The paper shows that the Bayesian models are more acc...
Ershadi, Ali
2013-05-01
The influence of uncertainty in land surface temperature, air temperature, and wind speed on the estimation of sensible heat flux is analyzed using a Bayesian inference technique applied to the Surface Energy Balance System (SEBS) model. The Bayesian approach allows for an explicit quantification of the uncertainties in input variables: a source of error generally ignored in surface heat flux estimation. An application using field measurements from the Soil Moisture Experiment 2002 is presented. The spatial variability of selected input meteorological variables in a multitower site is used to formulate the prior estimates for the sampling uncertainties, and the likelihood function is formulated assuming Gaussian errors in the SEBS model. Land surface temperature, air temperature, and wind speed were estimated by sampling their posterior distribution using a Markov chain Monte Carlo algorithm. Results verify that Bayesian-inferred air temperature and wind speed were generally consistent with those observed at the towers, suggesting that local observations of these variables were spatially representative. Uncertainties in the land surface temperature appear to have the strongest effect on the estimated sensible heat flux, with Bayesian-inferred values differing by up to ±5°C from the observed data. These differences suggest that the footprint of the in situ measured land surface temperature is not representative of the larger-scale variability. As such, these measurements should be used with caution in the calculation of surface heat fluxes and highlight the importance of capturing the spatial variability in the land surface temperature: particularly, for remote sensing retrieval algorithms that use this variable for flux estimation.
Legarra Andrés; Parada Analia; Alfonso Leopoldo; Ugarte Eva; Arana Ana
2006-01-01
Breeding sheep populations for scrapie resistance could result in a loss of genetic variability. In this study, the effect on genetic variability of selection for increasing the ARR allele frequency was estimated in the Latxa breed. Two sources of information were used, pedigree and genetic polymorphisms (fifteen microsatellites). The results based on the genealogical information were conditioned by a low pedigree completeness level that revealed the interest of also using the inform...
Selection of AGN candidates in the GOODS-South Field through Spitzer/MIPS 24 μm variability
García González, Judit; Alonso Herrero, Almudena; Pérez González, Pablo Guillermo; Hernán Caballero, Antonio; Sarajedini, Vicki L.; Villar, Victor
2015-01-01
We present a study of galaxies showing mid-infrared variability in data taken in the deepest Spitzer/MIPS 24 μm surveys in the Great Observatory Origins Deep Survey South field. We divide the data set in epochs and subepochs to study the long-term (months-years) and the short-term (days) variability. We use a χ²-statistics method to select active galactic nucleus (AGN) candidates with a probability
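The abstract is cut off before the selection threshold is given, so the following only sketches the standard χ² statistic for testing a light curve against the hypothesis of constant flux. Whether the authors used exactly this form is an assumption, and the function name `chi2_constant_flux` is hypothetical.

```python
def chi2_constant_flux(fluxes, errors):
    """Chi-square of a light curve against its error-weighted mean flux."""
    weights = [1.0 / (e * e) for e in errors]
    mean = sum(w * f for w, f in zip(weights, fluxes)) / sum(weights)
    return sum(((f - mean) / e) ** 2 for f, e in zip(fluxes, errors))
```

For a non-variable source with well-estimated errors, the statistic is expected to be close to the number of epochs minus one. Values far above that indicate variability.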
Rathnamma; Gandhi, R
2014-01-01
The present study aimed to examine the selected physical fitness variables of speed, agility and explosive power, and the psychological variables of sports competition anxiety, sports achievement motivation and self-confidence among intercollegiate and inter-university athletes. For the study, sixty male athletes were selected, 30 intercollegiate and 30 inter-university, from three colleges and three universities. The eac...
Bayesian Methods and Universal Darwinism
Campbell, John
2009-12-01
Bayesian methods since the time of Laplace have been understood by their practitioners as closely aligned to the scientific method. Indeed, a recent champion of Bayesian methods, E. T. Jaynes, titled his textbook on the subject Probability Theory: the Logic of Science. Many philosophers of science, including Karl Popper and Donald Campbell, have interpreted the evolution of science as a Darwinian process consisting of a 'copy with selective retention' algorithm abstracted from Darwin's theory of natural selection. Arguments are presented for an isomorphism between Bayesian methods and Darwinian processes. Universal Darwinism, as the term has been developed by Richard Dawkins, Daniel Dennett and Susan Blackmore, is the collection of scientific theories which explain the creation and evolution of their subject matter as due to the operation of Darwinian processes. These subject matters span the fields of atomic physics, chemistry, biology and the social sciences. The principle of Maximum Entropy states that systems will evolve to states of highest entropy subject to the constraints of scientific law. This principle may be inverted to provide illumination as to the nature of scientific law. Our best cosmological theories suggest the universe contained much less complexity during the period shortly after the Big Bang than it does at present. The scientific subject matter of atomic physics, chemistry, biology and the social sciences has been created since that time. An explanation is proposed for the existence of this subject matter as due to the evolution of constraints in the form of adaptations imposed on Maximum Entropy. It is argued these adaptations were discovered and instantiated through the operations of a succession of Darwinian processes.
Variable Selection for Functional Logistic Regression in fMRI Data Analysis
Directory of Open Access Journals (Sweden)
Nedret BILLOR
2015-03-01
This study was motivated by a classification problem in Functional Magnetic Resonance Imaging (fMRI), a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects and we assume that there is a smooth, underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the best things about the fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization (group lasso) for the functional logistic regression model where the response is binary and represents two separate classes; the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
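The group lasso penalizes the sum of the Euclidean norms of coefficient groups, so a whole group is either kept or set to zero together. As a minimal illustration, assuming each group holds the basis coefficients of one functional predictor (the abstract does not state the grouping), the block-wise shrinkage step used by proximal and block coordinate descent solvers can be sketched as follows. The name `group_soft_threshold` is hypothetical, and this is not the authors' fitting algorithm.

```python
import math


def group_soft_threshold(coefs, threshold):
    """Shrink one coefficient group toward zero by `threshold` in Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= threshold:
        # The whole group is removed, i.e. the predictor is not selected.
        return [0.0] * len(coefs)
    scale = 1.0 - threshold / norm
    return [scale * c for c in coefs]
```

A group with norm 5 and threshold 1 is scaled by 0.8, while any group with norm at most 1 is dropped entirely. This is how the penalty performs variable selection at the level of whole predictors rather than single coefficients.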
Variation in Age and Training on Selected Biochemical Variables of Indian Hockey Players
Directory of Open Access Journals (Sweden)
I. Manna
2010-04-01
The present study aimed to determine the variation with age and training in biochemical variables of elite Indian hockey players. A total of 120 hockey players who volunteered for the study were equally divided (n=30) into 4 groups: under 16 years (14-15 yrs); under 19 years (16-18 yrs); under 23 years (19-22 yrs); and senior (23-30 yrs). The training sessions were divided into 3 phases: Transition Phase (TP), Preparatory Phase (PP), and Competitive Phase (CP). The training programme consisted of aerobic, anaerobic and skill training, and took 4 hours in morning and evening sessions, 5 days/week. Selected biochemical parameters were measured and data were analyzed by two-way ANOVA and post hoc tests. The mean values of haemoglobin (Hb), total cholesterol (TC), triglyceride (TG), high density lipoprotein cholesterol (HDL-C) and low density lipoprotein cholesterol (LDL-C) increased significantly (P<0.05) with the advancing age of the players. A significant increase (P<0.05) in serum urea, uric acid and HDL-C and a significant decrease (P<0.05) in Hb, TC, TG and LDL-C were noted in PP and CP when compared with TP. The present study would provide useful information for biochemical monitoring of the training of hockey players.
An experiment on selecting most informative variables in socio-economic data
Directory of Open Access Journals (Sweden)
L. Jenkins
2014-01-01
In many studies where data are collected on several variables, there is a motivation to find if fewer variables would provide almost as much information. Variance of a variable about its mean is the common statistical measure of information content, and that is used here. We are interested whether the variability in one variable is sufficiently correlated with that in one or more of the other variables that the first variable is redundant. We wish to find one or more ‘principal variables’ that sufficiently reflect the information content in all the original variables. The paper explains the method of principal variables and reports experiments using the technique to see if just a few variables are sufficient to reflect the information in 11 socioeconomic variables on 130 countries from a World Bank (WB) database. While the method of principal variables is highly successful in a statistical sense, the WB data varies greatly from year to year, demonstrating that fewer variables would be inadequate for this data.
Comparative performance of selected variability detection techniques in photometric time series data
Sokolovsky, K V; Karampelas, A; Antipin, S V; Bellas-Velidis, I; Benni, P; Bonanos, A Z; Burdanov, A Y; Derlopa, S; Hatzidimitriou, D; Khokhryakova, A D; Kolesnikova, D M; Korotkiy, S A; Lapukhin, E G; Moretti, M I; Popov, A A; Pouliasis, E; Samus, N N; Spetsieri, Z; Veselkov, S A; Volkov, K V; Yang, M; Zubareva, A M
2016-01-01
Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampli...
Optical Variability of Infrared Power Law-Selected Galaxies & X-ray Sources in the GOODS-South Field
Alison Klesman; Vicki Sarajedini
2007-01-01
We investigate the use of optical variability to identify and study Active Galactic Nuclei (AGN) in the GOODS-South field. A sample of 22 mid-infrared power law sources and 102 X-ray sources with optical counterparts in the HST ACS images were selected. Each object is classified with a variability significance value related to the standard deviation of its magnitude in five epochs separated by 45-day intervals. The variability significance is compared to the optical, mid-IR, and X-ray propert...
Boutsia, K; Trevese, D; Vagnetti, F
2009-01-01
Luminous AGNs are usually selected by their non-stellar colours or their X-ray emission. Colour selection cannot be used to select low-luminosity AGNs, since their emission is dominated by the host galaxy. Objects with low X-ray to optical ratio escape even the deepest X-ray surveys performed so far. In a previous study we presented a sample of candidates selected through optical variability in the Chandra Deep Field South, where repeated optical observations were performed for the STRESS supernova survey. We obtained new optical spectroscopy for a sample of variability selected candidates with the ESO NTT telescope. We analysed the new spectra, together with those existing in the literature and studied the distribution of the objects in U-B and B-V colours, optical and X-ray luminosity, and variability amplitude. A large fraction (17/27) of the observed candidates are broad-line luminous AGNs, confirming the efficiency of variability in detecting quasars. We detect: i) extended objects which would have escap...
Directory of Open Access Journals (Sweden)
R. Annadurai
2014-11-01
The purpose of the study was to investigate the effect of an SAQ training programme on selected physical fitness variables and skill performance of junior volleyball players. For this purpose, thirty junior volleyball players (N = 30) were randomly selected from the Coimbatore inter-district level school volleyball tournament conducted by the GKD Matriculation Hr.Sec School in the academic year 2012. The respondents' ages ranged from 12 to 14 years, and they were divided into two equal groups of fifteen subjects: Group I, the experimental group, underwent SAQ training, and Group II acted as the control group and did not undergo any specific training programme. The experimental group trained on alternate days each week for six weeks. Data were collected before and after the six-week training period. The criterion variables were speed, agility, quickness, serving ability and passing ability. The 't' ratio was used to analyze the data. The study shows that the selected physical fitness variables and skill performance of junior volleyball players improved significantly due to the SAQ training programme.
Elicitation of prior distributions of variable-selection problems in regression
Garthwaite, Paul H.; Dickey, James M.
1992-01-01
This paper addresses the problem of quantifying expert opinion about a normal linear regression model when there is uncertainty as to which independent variables should be included in the model. Opinion is modeled as a mixture of natural conjugate prior distributions with each distribution in the mixture corresponding to a different subset of the independent variables. It is shown that for certain values of the independent variables, the predictive distribution of the dependent variable simpl...
Bayesian synthetic evaluation of multistage reliability growth with instant and delayed fix modes
Institute of Scientific and Technical Information of China (English)
None
2008-01-01
In the multistage reliability growth tests with instant and delayed fix modes, the failure data can be assumed to follow Weibull processes with different parameters at different stages. For the Weibull process within a stage, by the proper selection of prior distribution form and the parameters, a concise posterior distribution form is obtained, thus simplifying the Bayesian analysis. In the multistage tests, the improvement factor is used to convert the posterior of one stage to the prior of the subsequent stage. The conversion criterion is carefully analyzed to determine the distribution parameters of the subsequent stage's variable reasonably. Based on the mentioned results, a new synthetic Bayesian evaluation program and algorithm framework is put forward to evaluate the multistage reliability growth tests with instant and delayed fix modes. The example shows the effectiveness and flexibility of this method.
Variability Selected Low-Luminosity Active Galactic Nuclei in the 4 Ms Chandra Deep Field-South
Young, M.; Brandt, W. N.; Xue, Y. Q.; Paolillo, D. M.; Alexander, F. E.; Bauer, F. E.; Lehmer, B. D.; Luo, B.; Shemmer, O.; Schneider, D. P.; Vignail, C.
2012-01-01
The 4 Ms Chandra Deep Field-South (CDF-S) and other deep X-ray surveys have been highly effective at selecting active galactic nuclei (AGN). However, cosmologically distant low-luminosity AGN (LLAGN) have remained a challenge to identify due to significant contribution from the host galaxy. We identify long-term X-ray variability (≈month-years, observed frame) in 20 of 92 CDF-S galaxies spanning redshifts z ≈ 0.08-1.02 that do not meet other AGN selection criteria. We show that the observed variability cannot be explained by X-ray binary populations or ultraluminous X-ray sources, so the variability is most likely caused by accretion onto a supermassive black hole (SMBH). The variable galaxies are not heavily obscured in general, with a stacked effective power-law photon index of Γstack ≈ 1.93 ± 0.13, and are therefore likely LLAGN. The LLAGN tend to lie a factor of ≈6-80 below the extrapolated linear variability-luminosity relation measured for luminous AGN. This may be explained by their lower accretion rates. Variability-independent black-hole mass and accretion-rate estimates for variable galaxies show that they sample a significantly different black hole mass-accretion-rate space, with masses a factor of 2.4 lower and accretion rates a factor of 22.5 lower than variable luminous AGNs at the same redshift. We find that an empirical model based on a universal broken power-law power spectral density function, where the break frequency depends on SMBH mass and accretion rate, roughly reproduces the shape, but not the normalization, of the variability-luminosity trends measured for variable galaxies and more luminous AGNs.
VARIABILITY-SELECTED LOW-LUMINOSITY ACTIVE GALACTIC NUCLEI IN THE 4 Ms CHANDRA DEEP FIELD-SOUTH
International Nuclear Information System (INIS)
The 4 Ms Chandra Deep Field-South (CDF-S) and other deep X-ray surveys have been highly effective at selecting active galactic nuclei (AGNs). However, cosmologically distant low-luminosity AGNs (LLAGNs) have remained a challenge to identify due to significant contribution from the host galaxy. We identify long-term X-ray variability (∼month-years, observed frame) in 20 of 92 CDF-S galaxies spanning redshifts z ≈ 0.08-1.02 that do not meet other AGN selection criteria. We show that the observed variability cannot be explained by X-ray binary populations or ultraluminous X-ray sources, so the variability is most likely caused by accretion onto a supermassive black hole (SMBH). The variable galaxies are not heavily obscured in general, with a stacked effective power-law photon index of Γstack ≈ 1.93 ± 0.13, and are therefore likely LLAGNs. The LLAGNs tend to lie a factor of ≈6-80 below the extrapolated linear variability-luminosity relation measured for luminous AGNs. This may be explained by their lower accretion rates. Variability-independent black hole mass and accretion-rate estimates for variable galaxies show that they sample a significantly different black hole mass-accretion-rate space, with masses a factor of 2.4 lower and accretion rates a factor of 22.5 lower than variable luminous AGNs at the same redshift. We find that an empirical model based on a universal broken power-law power spectral density function, where the break frequency depends on SMBH mass and accretion rate, roughly reproduces the shape, but not the normalization, of the variability-luminosity trends measured for variable galaxies and more luminous AGNs.
R. Annadurai; N. Sathish Babu
2014-01-01
The purpose of the study was to investigate the effect of an SAQ training programme on selected physical fitness variables and skill performance of junior volleyball players. For this purpose, thirty junior volleyball players (N = 30) were randomly selected from the Coimbatore inter-district level school volleyball tournament conducted by the GKD Matriculation Hr.Sec School in the academic year 2012. The respondents' ages ranged from 12 to 14 years and they were divided into two equal...
Comparative performance of selected variability detection techniques in photometric time series data
Sokolovsky, K. V.; Gavras, P.; Karampelas, A.; Antipin, S. V.; Bellas-Velidis, I.; Benni, P.; Bonanos, A. Z.; Burdanov, A. Y.; Derlopa, S.; Hatzidimitriou, D.; Khokhryakova, A. D.; Kolesnikova, D. M.; Korotkiy, S. A.; Lapukhin, E. G.; Moretti, M. I.; Popov, A. A.; Pouliasis, E.; Samus, N. N.; Spetsieri, Z.; Veselkov, S. A.; Volkov, K. V.; Yang, M.; Zubareva, A. M.
2016-09-01
Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1 m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampling patterns, photometric accuracies, and percentages of outlier measurements. The first index is the interquartile range (IQR) of magnitude measurements, sensitive to variability irrespective of a timescale and resistant to outliers. It can be complemented by the ratio of the lightcurve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on timescales longer than the typical time interval between observations. Variable objects have larger 1/η and/or IQR values than non-variable objects of similar brightness. Another approach to variability detection is to combine many variability indices using principal component analysis. We present 124 previously unknown variable stars found in the test data.
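The two indices named in the abstract can be sketched directly from their verbal definitions. This is a minimal illustration and not the authors' code: the function names are hypothetical, the quartiles use the default method of Python's `statistics.quantiles`, and the sample variance with an N-1 denominator is an assumption about normalization.

```python
import statistics


def iqr(mags):
    """Interquartile range of the magnitude measurements."""
    q1, _, q3 = statistics.quantiles(mags, n=4)
    return q3 - q1


def inverse_eta(mags):
    """1/eta: lightcurve variance over the mean square successive difference.

    `mags` must be ordered in time.
    """
    variance = statistics.variance(mags)
    mssd = sum((b - a) ** 2 for a, b in zip(mags, mags[1:])) / (len(mags) - 1)
    return variance / mssd
```

A smooth trend such as 0, 1, ..., 7 gives 1/η = 6, whereas the same number of points alternating between 0 and 1 gives a value below 1. This illustrates why 1/η responds to variability on timescales longer than the sampling interval, while the IQR does not depend on the time ordering at all.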
International Nuclear Information System (INIS)
We present a new quasi-stellar object (QSO) selection algorithm using a Support Vector Machine, a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events using 58 known QSOs, 1629 variable stars, and 4288 non-variables in the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ∼80% of known QSOs with a 25% false-positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) data set, which consists of 40 million light curves, and found 1620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false-positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
Directory of Open Access Journals (Sweden)
C. Reche
2011-03-01
In many large cities of Europe standard air quality limit values of particulate matter (PM) are exceeded. Emissions from road traffic and biomass burning are frequently reported to be the major causes. As a consequence of these exceedances a large number of air quality plans, most of them focusing on traffic emissions reductions, have been implemented in the last decade. In spite of this implementation, a number of cities did not record a decrease of PM levels. Thus, is the efficiency of air quality plans overestimated? Or do we need a more specific metric to evaluate the impact of the above emissions on the levels of urban aerosols?
This study shows the results of the interpretation of the 2009 variability of levels of PM, black carbon (BC), aerosol number concentration (N) and a number of gaseous pollutants in seven selected urban areas covering road traffic, urban background, urban-industrial, and urban-shipping environments from southern, central and northern Europe.
The results showed that variations of PM and N levels do not always reflect the variation of the impact of road traffic emissions on urban aerosols. However, BC levels vary proportionally with those of traffic related gaseous pollutants, such as CO, NO2 and NO. Due to this high correlation, one may suppose that monitoring the levels of these gaseous pollutants would be enough to extrapolate exposure to traffic-derived BC levels. However, the BC/CO, BC/NO2 and BC/NO ratios vary widely among the cities studied, as a function of distance to traffic emissions, vehicle fleet composition and the influence of other emission sources such as biomass burning. Thus, levels of BC should be measured at air quality monitoring sites.
During traffic rush hours, a narrow variation in the N/BC ratio was evidenced, but a wide variation of this ratio was determined for the noon period. Although in central and northern Europe N and BC levels tend to vary
Network-based group variable selection for detecting expression quantitative trait loci (eQTL)
Directory of Open Access Journals (Sweden)
Zhang Xuegong
2011-06-01
Background: Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression level of genes. Penalized regression with a proper penalty is suitable for high-dimensional biological data. Its performance should be enhanced when we incorporate biological knowledge of the gene expression network and the linkage disequilibrium (LD) structure between loci in a high-noise background. Results: We propose a network-based group variable selection (NGVS) method for QTL detection. Our method simultaneously maps highly correlated expression traits sharing the same biological function to marker sets formed by LD. By grouping markers, complex joint activity of multiple SNPs can be considered and the dimensionality of the eQTL problem is reduced dramatically. In order to demonstrate the power and flexibility of our method, we used it to analyze two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets and treated the additive and dominant effect of each locus as a group: as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions of multiple SNPs. Conclusions: The proposed NGVS method is appropriate for problems with high-dimensional data and a high-noise background. On the eQTL problem it outperforms the classical Lasso method, which does not consider biological knowledge. Introduction of proper gene expression and loci correlation information makes detecting causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.
Bayesian Test of Significance for Conditional Independence: The Multinomial Model
de Morais Andrade, Pablo; Stern, Julio; de Bragança Pereira, Carlos
2014-03-01
Conditional independence tests (CI tests) have received special attention lately in Machine Learning and Computational Intelligence related literature as an important indicator of the relationship among the variables used by their models. In the field of Probabilistic Graphical Models (PGM), which includes Bayesian Network (BN) models, CI tests are especially important for the task of learning the PGM structure from data. In this paper, we propose the Full Bayesian Significance Test (FBST) for tests of conditional independence for discrete datasets. FBST is a powerful Bayesian test for precise hypotheses, as an alternative to frequentist significance tests (characterized by the calculation of the p-value).
Learning Bayesian Networks from Data by Particle Swarm Optimization
Institute of Scientific and Technical Information of China (English)
(none listed)
2006-01-01
Learning a Bayesian network is an NP-hard problem. When the number of variables is large, the search for the optimal network structure can be very time consuming and tends to return a structure that is only locally optimal. Particle swarm optimization (PSO) was introduced to the problem of learning Bayesian networks, and a novel structure learning algorithm using PSO was proposed. To search the space of directed acyclic graphs efficiently, a discrete PSO algorithm designed specifically for structure learning was proposed based on the characteristics of Bayesian networks. The experimental results show that the PSO-based algorithm converges quickly and obtains better structures than algorithms based on genetic algorithms.
International Nuclear Information System (INIS)
Purpose: Multivariate modeling of complications after radiotherapy is frequently used in conjunction with data driven variable selection. This study quantifies the risk of overfitting in a data driven modeling method using bootstrapping for data with typical clinical characteristics, and estimates the minimum amount of data needed to obtain models with relatively high predictive power. Materials and methods: To facilitate repeated modeling and cross-validation with independent datasets for the assessment of true predictive power, a method was developed to generate simulated data with statistical properties similar to real clinical data sets. Characteristics of three clinical data sets from radiotherapy treatment of head and neck cancer patients were used to simulate data with set sizes between 50 and 1000 patients. A logistic regression method using bootstrapping and forward variable selection was used for complication modeling, resulting for each simulated data set in a selected number of variables and an estimated predictive power. The true optimal number of variables and true predictive power were calculated using cross-validation with very large independent data sets. Results: For all simulated data set sizes the number of variables selected by the bootstrapping method was on average close to the true optimal number of variables, but showed considerable spread. Bootstrapping is more accurate in selecting the optimal number of variables than the AIC and BIC alternatives, but this did not translate into a significant difference of the true predictive power. The true predictive power asymptotically converged toward a maximum predictive power for large data sets, and the estimated predictive power converged toward the true predictive power. More than half of the potential predictive power is gained after approximately 200 samples. Our simulations demonstrated severe overfitting (a predictive power lower than that of predicting 50% probability) in a number of small
Variable Selection for Inflation : A Pseudo Out-of-sample Approach
Selen Baser Andic; Fethi Ogunc
2015-01-01
In this paper, we analyze the forecasting properties of a wide variety of variables for Turkish inflation, and thereby pin down the ones producing robust forecasts periodically. Defining the lag structure of a variable in two different ways, we determine the non-leading forecasters and leading indicators of inflation. We employ a pseudo out-of-sample approach and compare the forecasting performance of each variable ex-post with the benchmark model. We measure forecast errors over forecast hor...
Bayesian least squares deconvolution
Asensio Ramos, A.; Petit, P.
2015-11-01
Aims: We develop a fully Bayesian least squares deconvolution (LSD) that can be applied to the reliable detection of magnetic signals in noise-limited stellar spectropolarimetric observations using multiline techniques. Methods: We consider LSD under the Bayesian framework and we introduce a flexible Gaussian process (GP) prior for the LSD profile. This prior allows the result to automatically adapt to the presence of signal. We exploit several linear algebra identities to accelerate the calculations. The final algorithm can deal with thousands of spectral lines in a few seconds. Results: We demonstrate the reliability of the method with synthetic experiments and we apply it to real spectropolarimetric observations of magnetic stars. We are able to recover the magnetic signals using a small number of spectral lines, together with the uncertainty at each velocity bin. This allows the user to consider if the detected signal is reliable. The code to compute the Bayesian LSD profile is freely available.
Bayesian least squares deconvolution
Ramos, A Asensio
2015-01-01
Aims. To develop a fully Bayesian least squares deconvolution (LSD) that can be applied to the reliable detection of magnetic signals in noise-limited stellar spectropolarimetric observations using multiline techniques. Methods. We consider LSD under the Bayesian framework and we introduce a flexible Gaussian Process (GP) prior for the LSD profile. This prior allows the result to automatically adapt to the presence of signal. We exploit several linear algebra identities to accelerate the calculations. The final algorithm can deal with thousands of spectral lines in a few seconds. Results. We demonstrate the reliability of the method with synthetic experiments and we apply it to real spectropolarimetric observations of magnetic stars. We are able to recover the magnetic signals using a small number of spectral lines, together with the uncertainty at each velocity bin. This allows the user to consider if the detected signal is reliable. The code to compute the Bayesian LSD profile is freely available.
Loredo, T J
2004-01-01
I describe a framework for adaptive scientific exploration based on iterating an Observation-Inference-Design cycle that allows adjustment of hypotheses and observing protocols in response to the results of observation on-the-fly, as data are gathered. The framework uses a unified Bayesian methodology for the inference and design stages: Bayesian inference to quantify what we have learned from the available data and predict future data, and Bayesian decision theory to identify which new observations would teach us the most. When the goal of the experiment is simply to make inferences, the framework identifies a computationally efficient iterative "maximum entropy sampling" strategy as the optimal strategy in settings where the noise statistics are independent of signal properties. Results of applying the method to two "toy" problems with simulated data (measuring the orbit of an extrasolar planet, and locating a hidden one-dimensional object) show the approach can significantly improve observational eff...
A Non-Parametric Bayesian Method for Inferring Hidden Causes
Wood, Frank; Griffiths, Thomas; Ghahramani, Zoubin
2012-01-01
We present a non-parametric Bayesian approach to structure learning with hidden causes. Previous Bayesian treatments of this problem define a prior over the number of hidden causes and use algorithms such as reversible jump Markov chain Monte Carlo to move between solutions. In contrast, we assume that the number of hidden causes is unbounded, but only a finite number influence observable variables. This makes it possible to use a Gibbs sampler to approximate the distribution over causal stru...
Modelling biogeochemical cycles in forest ecosystems: a Bayesian approach
Bagnara, Maurizio
2015-01-01
Forest models are tools for explaining and predicting the dynamics of forest ecosystems. They simulate forest behavior by integrating information on the underlying processes in trees, soil and atmosphere. Bayesian calibration is the application of probability theory to parameter estimation. It is a method, applicable to all models, that quantifies output uncertainty and identifies key parameters and variables. This study aims at testing the Bayesian procedure for calibration to different t...
Bayesian and frequentist inequality tests
David M. Kaplan; Zhuo, Longhao
2016-01-01
Bayesian and frequentist criteria are fundamentally different, but often posterior and sampling distributions are asymptotically equivalent (and normal). We compare Bayesian and frequentist hypothesis tests of inequality restrictions in such cases. For finite-dimensional parameters, if the null hypothesis is that the parameter vector lies in a certain half-space, then the Bayesian test has (frequentist) size $\alpha$; if the null hypothesis is any other convex subspace, then the Bayesian test...
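The half-space case can be illustrated in one dimension. The following is a minimal sketch, assuming a scalar parameter, the null hypothesis theta <= 0, and a posterior that is approximately normal with mean `theta_hat` and standard deviation `se`. These names and the decision rule "reject when the posterior probability of the null is at most alpha" are illustrative assumptions, not taken from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_rejects(theta_hat, se, alpha=0.05):
    """Reject H0: theta <= 0 when its posterior probability is at most alpha.

    Assumes the posterior is approximately N(theta_hat, se**2).
    """
    posterior_prob_h0 = normal_cdf((0.0 - theta_hat) / se)
    return posterior_prob_h0 <= alpha


def freq_rejects(theta_hat, se, alpha=0.05):
    """One-sided z-test of H0: theta <= 0, based on the p-value."""
    p_value = 1.0 - normal_cdf(theta_hat / se)
    return p_value <= alpha


if __name__ == "__main__":
    for estimate in (-1.0, 0.0, 1.0, 1.5, 1.7, 2.0, 3.0):
        print(estimate, bayes_rejects(estimate, 1.0), freq_rejects(estimate, 1.0))
```

Under these assumptions the posterior probability of the null equals the one-sided p-value, so the two rules make the same decision and the Bayesian test inherits size alpha.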
Exclusive breastfeeding practice in Nigeria: a bayesian stepwise regression analysis.
Gayawan, Ezra; Adebayo, Samson B; Chitekwe, Stanley
2014-11-01
Despite the importance of breast milk, the prevalence of exclusive breastfeeding (EBF) in Nigeria is far lower than what has been recommended for developing countries. Worse still, the practice has been on a downward trend in the country recently. This study was aimed at investigating the determinants and geographical variations of EBF in Nigeria. Any intervention programme would require a good knowledge of factors that enhance the practice. A pooled data set from the Nigeria Demographic and Health Surveys conducted in 1999, 2003, and 2008 was analyzed using a Bayesian stepwise approach that involves simultaneous selection of variables and smoothing parameters. Further, the approach allows for geographical variations at a highly disaggregated level of states to be investigated. Within a Bayesian context, appropriate priors are assigned on all the parameters and functions. Findings reveal that education of women and their partners, place of delivery, mother's age at birth, and current age of child are associated with increasing prevalence of EBF. However, visits for antenatal care during pregnancy are not associated with EBF in Nigeria. Further, results reveal considerable geographical variations in the practice of EBF. The likelihood of exclusively breastfeeding children is significantly higher in Kwara, Kogi, Osun, and Oyo states but lower in Jigawa, Katsina, and Yobe. Intensive interventions that can lead to improved practice are required in all states in Nigeria. The importance of breastfeeding needs to be emphasized to women during antenatal visits as this can encourage and enhance the practice after delivery. PMID:24619227
Bayesian Subset Modeling for High-Dimensional Generalized Linear Models
Liang, Faming
2013-06-01
This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online. © 2013 American Statistical Association.
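For orientation, the extended Bayesian information criterion mentioned above can be sketched as follows. The abstract does not state the formula, so this assumes the commonly cited form: ordinary BIC plus a penalty of 2·gamma·log C(p, k) on the number of models of size k. The function name and arguments are illustrative.

```python
import math


def ebic(log_likelihood, k, n, p, gamma=1.0):
    """Extended BIC for a model using k of p candidate predictors, n observations.

    Assumed form: -2 log L + k log n + 2 * gamma * log C(p, k).
    gamma = 0 recovers the ordinary BIC.
    """
    log_model_count = math.log(math.comb(p, k))
    return -2.0 * log_likelihood + k * math.log(n) + 2.0 * gamma * log_model_count


if __name__ == "__main__":
    # Hypothetical fit: 3 of 1000 predictors selected, 200 observations.
    print(ebic(-250.0, k=3, n=200, p=1000, gamma=0.0))
    print(ebic(-250.0, k=3, n=200, p=1000, gamma=1.0))
```

The extra term grows with the number of candidate predictors p, which is what makes the criterion more conservative than BIC in high-dimensional settings.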
Extended Bayesian Information Criteria for Gaussian Graphical Models
Foygel, Rina
2010-01-01
Gaussian graphical models with sparsity in the inverse covariance matrix are of significant interest in many modern applications. For the problem of recovering the graphical structure, information criteria provide useful optimization objectives for algorithms searching through sets of graphs or for selection of tuning parameters of other methods such as the graphical lasso, which is a likelihood penalization technique. In this paper we establish the consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow. Compared to earlier work on the regression case, our treatment allows for growth in the number of non-zero parameters in the true model, which is necessary in order to cover connected graphs. We demonstrate the performance of this criterion on simulated data when used in conjunction with the graphical lasso, and verify that the criterion indeed performs better than either cross-validation or the ordi...
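A rough sketch of how such a criterion could be used to pick a graph from a graphical lasso path follows. It assumes the edge-count form of the criterion, -2·loglik + |E|·log n + 4·gamma·|E|·log p, which the abstract itself does not state. The candidate graphs and their log-likelihoods are made-up numbers for illustration only.

```python
import math


def ebic_graph(log_likelihood, num_edges, n, p, gamma=0.5):
    """Assumed extended BIC for a Gaussian graphical model with num_edges edges."""
    return (-2.0 * log_likelihood
            + num_edges * math.log(n)
            + 4.0 * gamma * num_edges * math.log(p))


def select_graph(candidates, n, p, gamma=0.5):
    """Return the (name, log_likelihood, num_edges) tuple with the smallest criterion."""
    return min(candidates, key=lambda c: ebic_graph(c[1], c[2], n, p, gamma))


# Hypothetical candidates, e.g. fits at three penalty levels of a graphical lasso.
CANDIDATES = [("sparse", -520.0, 5), ("medium", -500.0, 12), ("dense", -495.0, 40)]

if __name__ == "__main__":
    print(select_graph(CANDIDATES, n=100, p=20, gamma=0.5)[0])
    print(select_graph(CANDIDATES, n=100, p=20, gamma=0.0)[0])
```

With these toy numbers the ordinary BIC (gamma = 0) prefers the medium graph, while gamma = 0.5 moves the choice to the sparser one.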
Bayesian multiple target tracking
Streit, Roy L
2013-01-01
This second edition has undergone substantial revision from the 1999 first edition, recognizing that a lot has changed in the multiple target tracking field. One of the most dramatic changes is in the widespread use of particle filters to implement nonlinear, non-Gaussian Bayesian trackers. This book views multiple target tracking as a Bayesian inference problem. Within this framework it develops the theory of single target tracking, multiple target tracking, and likelihood ratio detection and tracking. In addition to providing a detailed description of a basic particle filter that implements
Bayesian Exploratory Factor Analysis
DEFF Research Database (Denmark)
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.;
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the...... corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...
Schnitzer, Mireille E.; Lok, Judith J.; Gruber, Susan
2015-01-01
This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios. PMID:26226129
Hunter, Evelyn M. Irving
1998-12-01
The purpose of this study was to examine the relationship and predictive power of the variables gender, high school GPA, class rank, SAT scores, ACT scores, and socioeconomic status on the graduation rates of minority college students majoring in the sciences at a selected urban university. Data were examined on these variables as they related to minority students majoring in science. The population consisted of 101 minority college students who had majored in the sciences from 1986 to 1996 at an urban university in the southwestern region of Texas. A non-probability sampling procedure, the incidental sampling technique, was used in this study. A profile sheet was developed to record the information regarding the variables. The composite scores from SAT and ACT testing were used in the study. The dichotomous variables gender and socioeconomic status were dummy coded for analysis. For the gender variable, zero (0) indicated male, and one (1) indicated female. Additionally, zero (0) indicated high SES, and one (1) indicated low SES. Two parametric procedures were used to analyze the data in this investigation: multiple correlation and multiple regression. Multiple correlation is a statistical technique that indicates the relationship between one variable and a combination of two or more other variables. The variables socioeconomic status and GPA were found to contribute significantly to the graduation rates of minority students majoring in all sciences when combined with chemistry (Hypotheses Two and Four). These variables accounted for 7% and 15% of the respective variance in the graduation rates of minority students in the sciences and in chemistry. For Hypotheses One and Three, the predictor variables gender, high school GPA, SAT Total Scores, class rank, and socioeconomic status did not contribute significantly to the graduation rates of minority students in biology and pharmacy.
Applied Music Teaching Behavior as a Function of Selected Personality Variables.
Schmidt, Charles P.
1989-01-01
Investigates the relationships among applied music teaching behaviors and personality variables as measured by the Myers-Briggs Type Indicator (MBTI). Suggests that personality variables may be important factors underlying four applied music teaching behaviors: approvals, rate of reinforcement, teacher model/performance, and pace. (LS)
Caimi, Florentino J.
1981-01-01
This study investigated relationships between eight motivational variables of the high school band director and three criteria of directing success: ensemble musicianship; ensemble music performance; and students' ratings of their director. Three motivational variables--and school size--were found to be significant predictors of band directing…
Huynh, Huynh
By noting that a Rasch or two parameter logistic (2PL) item belongs to the exponential family of random variables and that the probability density function (pdf) of the correct response (X=1) and the incorrect response (X=0) are symmetric with respect to the vertical line at the item location, it is shown that the conjugate prior for ability is…
The study on the spam filtering technology based on Bayesian algorithm
Wang Chunping
2013-01-01
This paper analyzed spam filtering technology, carried out a detailed study of Naive Bayes algorithm, and proposed the improved Naive Bayesian mail filtering technology. Improvement can be seen in text selection as well as feature extraction. The general Bayesian text classification algorithm mostly takes information gain and cross-entropy algorithm in feature selection. Through the principle of Bayesian analysis, it was found that the characteristics distribution is closely related to the ab...
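As background, a baseline Naive Bayes mail filter can be sketched in a few lines. This is the standard multinomial form with add-one (Laplace) smoothing, not the improved variant the paper proposes. The whitespace tokenizer, the function names and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Build word and document counts from (label, text) pairs; labels are 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for label, text in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def log_score(text, label, model):
    """Unnormalized log posterior of `label` given the words in `text`."""
    word_counts, doc_counts, vocab = model
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    for word in text.lower().split():
        if word in vocab:  # words never seen in training are ignored
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return score


def classify(text, model):
    """Return the label with the higher unnormalized log posterior."""
    return max(("spam", "ham"), key=lambda label: log_score(text, label, model))


# Toy training data, invented for this example.
TOY_MESSAGES = [
    ("spam", "win money now"),
    ("spam", "cheap money offer"),
    ("ham", "meeting at noon"),
    ("ham", "lunch meeting tomorrow"),
]

if __name__ == "__main__":
    model = train(TOY_MESSAGES)
    print(classify("cheap money", model))
    print(classify("lunch meeting", model))
```

Feature selection (for example by information gain, as the abstract mentions) would be applied before this step by restricting the vocabulary to the retained words.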
Prediction of road accidents: A Bayesian hierarchical approach
DEFF Research Database (Denmark)
Deublein, Markus; Schubert, Matthias; Adey, Bryan T.;
2013-01-01
-lognormal regression analysis taking into account correlations amongst multiple dependent model response variables and effects of discrete accident count data e.g. over-dispersion, and (3) Bayesian inference algorithms, which are applied by means of data mining techniques supported by Bayesian Probabilistic Networks...... in order to represent non-linearity between risk indicating and model response variables, as well as different types of uncertainties which might be present in the development of the specific models. Prior Bayesian Probabilistic Networks are first established by means of multivariate regression analysis...... of the observed frequencies of the model response variables, e.g. the occurrence of an accident, and observed values of the risk indicating variables, e.g. degree of road curvature. Subsequently, parameter learning is done using updating algorithms, to determine the posterior predictive probability distributions......
Control of Complex Systems Using Bayesian Networks and Genetic Algorithm
Marwala, Tshilidzi
2007-01-01
A method based on Bayesian neural networks and a genetic algorithm is proposed to control the fermentation process. The relationship between input and output variables is modelled using a Bayesian neural network that is trained using the hybrid Monte Carlo method. A feedback loop based on a genetic algorithm is used to change the input variables so that the output variables are as close to the desired target as possible without loss of confidence in the prediction that the neural network gives. The proposed procedure is found to reduce the distance between the desired target and measured outputs significantly.
An Explanation Mechanism for Bayesian Inferencing Systems
Norton, Steven W.
2013-01-01
Explanation facilities are a particularly important feature of expert system frameworks. It is an area in which traditional rule-based expert system frameworks have had mixed results. While explanations about control are well handled, facilities are needed for generating better explanations concerning knowledge base content. This paper approaches the explanation problem by examining the effect an event has on a variable of interest within a symmetric Bayesian inferencing system. We argue that...
Bayesian network learning for natural hazard assessments
Vogel, Kristin
2016-04-01
Even though quite different in occurrence and consequences, from a modelling perspective many natural hazards share similar properties and challenges. Their complex nature, as well as the lack of knowledge about their driving forces and potential effects, makes their analysis demanding. On top of the uncertainty about the modelling framework, inaccurate or incomplete event observations and the intrinsic randomness of the natural phenomenon add up to different interacting layers of uncertainty, which require careful handling. Thus, for reliable natural hazard assessments it is crucial not only to capture and quantify involved uncertainties, but also to express and communicate uncertainties in an intuitive way. Decision-makers, who often find it difficult to deal with uncertainties, might otherwise return to familiar (mostly deterministic) proceedings. In the scope of the DFG research training group "NatRiskChange" we apply the probabilistic framework of Bayesian networks for diverse natural hazard and vulnerability studies. The great potential of Bayesian networks was already shown in previous natural hazard assessments. Treating each model component as a random variable, Bayesian networks aim at capturing the joint distribution of all considered variables. Hence, each conditional distribution of interest (e.g. the effect of precautionary measures on damage reduction) can be inferred. The (in-)dependencies between the considered variables can be learned purely from data or be given by experts. Even a combination of both is possible. By translating the (in-)dependencies into a graph structure, Bayesian networks provide direct insights into the workings of the system and allow us to learn about the underlying processes. Despite numerous studies on the topic, learning Bayesian networks from real-world data remains challenging. In previous studies, e.g. on earthquake induced ground motion and flood damage assessments, we tackled the problems arising with continuous variables
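To make the idea of inferring a conditional distribution from the joint concrete, here is a minimal sketch of a three-node discrete network (flood severity and precaution as parents of damage), queried by brute-force enumeration. The structure and every probability in it are invented for illustration and are not taken from the study.

```python
from itertools import product

# Hypothetical network: flood -> damage <- precaution. All numbers are made up.
P_FLOOD = {"severe": 0.2, "mild": 0.8}
P_PRECAUTION = {"yes": 0.3, "no": 0.7}
P_HIGH_DAMAGE = {  # P(damage = "high" | flood, precaution)
    ("severe", "yes"): 0.4,
    ("severe", "no"): 0.9,
    ("mild", "yes"): 0.05,
    ("mild", "no"): 0.2,
}


def joint(flood, precaution, damage):
    """Joint probability as the product of the network's local distributions."""
    p_high = P_HIGH_DAMAGE[(flood, precaution)]
    p_damage = p_high if damage == "high" else 1.0 - p_high
    return P_FLOOD[flood] * P_PRECAUTION[precaution] * p_damage


def conditional(query, evidence):
    """P(query | evidence) by enumeration; both arguments map variable name to value."""
    numerator = 0.0
    denominator = 0.0
    for flood, precaution, damage in product(P_FLOOD, P_PRECAUTION, ("high", "low")):
        world = {"flood": flood, "precaution": precaution, "damage": damage}
        if all(world[name] == value for name, value in evidence.items()):
            p = joint(flood, precaution, damage)
            denominator += p
            if all(world[name] == value for name, value in query.items()):
                numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(conditional({"damage": "high"}, {"precaution": "yes"}))
    print(conditional({"damage": "high"}, {"precaution": "no"}))
    print(conditional({"flood": "severe"}, {"damage": "high"}))
```

The same function answers both predictive queries (damage given precaution) and diagnostic ones (flood severity given observed damage). Enumeration is only practical for small discrete networks like this one.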
Bayesian Kernel Mixtures for Counts.
Canale, Antonio; Dunson, David B
2011-12-01
Although Bayesian nonparametric mixture models for continuous data are well developed, there is a limited literature on related approaches for count data. A common strategy is to use a mixture of Poissons, which unfortunately is quite restrictive in not accounting for distributions having variance less than the mean. Other approaches include mixing multinomials, which requires finite support, and using a Dirichlet process prior with a Poisson base measure, which does not allow smooth deviations from the Poisson. As a broad class of alternative models, we propose to use nonparametric mixtures of rounded continuous kernels. An efficient Gibbs sampler is developed for posterior computation, and a simulation study is performed to assess performance. Focusing on the rounded Gaussian case, we generalize the modeling framework to account for multivariate count data, joint modeling with continuous and categorical variables, and other complications. The methods are illustrated through applications to a developmental toxicity study and marketing data. This article has supplementary material online. PMID:22523437
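A single rounded Gaussian kernel shows why this construction can handle variance below the mean, which a Poisson kernel cannot. The sketch below assumes one particular rounding rule: a latent normal value y* maps to 0 if y* <= 0 and to j if j - 1 < y* <= j. The thresholds and function names are assumptions for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma**2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def rounded_gaussian_pmf(j, mu, sigma):
    """P(y = j) for a latent y* ~ N(mu, sigma**2) rounded to a count.

    Assumed rule: y = 0 if y* <= 0, and y = j if j - 1 < y* <= j for j >= 1.
    """
    upper = normal_cdf(j, mu, sigma)
    lower = normal_cdf(j - 1, mu, sigma) if j >= 1 else 0.0
    return upper - lower


def mean_and_variance(mu, sigma, max_count=200):
    """Mean and variance of the rounded count, truncating the support at max_count."""
    probs = [rounded_gaussian_pmf(j, mu, sigma) for j in range(max_count + 1)]
    mean = sum(j * p for j, p in enumerate(probs))
    variance = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
    return mean, variance


if __name__ == "__main__":
    mean, variance = mean_and_variance(mu=5.5, sigma=0.3)
    print(mean, variance)  # the variance is far below the mean
```

A mixture of such kernels, with weights given a nonparametric prior, is the model class the abstract proposes; only the kernel is sketched here.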
Oracle Efficient Variable Selection in Random and Fixed Effects Panel Data Models
DEFF Research Database (Denmark)
Kock, Anders Bredahl
This paper generalizes the results for the Bridge estimator of Huang et al. (2008) to linear random and fixed effects panel data models which are allowed to grow in both dimensions. In particular we show that the Bridge estimator is oracle efficient. It can correctly distinguish between relevant...... and irrelevant variables and the asymptotic distribution of the estimators of the coefficients of the relevant variables is the same as if only these had been included in the model, i.e. as if an oracle had revealed the true model prior to estimation. In the case of more explanatory variables than...
Directory of Open Access Journals (Sweden)
Legarra Andrés
2006-09-01
Breeding sheep populations for scrapie resistance could result in a loss of genetic variability. In this study, the effect on genetic variability of selection for increasing the ARR allele frequency was estimated in the Latxa breed. Two sources of information were used: pedigree and genetic polymorphisms (fifteen microsatellites). The results based on the genealogical information were conditioned by a low pedigree completeness level, which revealed the interest of also using the information provided by the molecular markers. The overall results suggest that no great negative effect on genetic variability can be expected in the short term in the population analysed from selection of only ARR/ARR males. The estimated average relationship of ARR/ARR males with reproductive females was similar to that of all available males whatever their genotype: 0.010 vs. 0.012 for genealogical relationship and 0.257 vs. 0.296 for molecular coancestry, respectively. However, selection of only ARR/ARR males implied important losses in founder animals (87 percent) and low frequency alleles (30 percent) in the ram population. The evaluation of mild selection strategies against scrapie susceptibility based on the use of some ARR heterozygous males was difficult because the genetic relationships estimated among animals differed when pedigree or molecular information was used, and the use of more molecular markers should be evaluated.
Almeida, Flávia Angélica Martins; Nunes, Renan Felipe Hartmann; Ferreira, Sandro Dos Santos; Krinski, Kleverton; Elsangedy, Hassan Mohamed; Buzzachera, Cosme Franklin; Alves, Ragami Chaves; Gregorio da Silva, Sergio
2015-06-01
[Purpose] This study investigated the effects of musical tempo on physiological, affective, and perceptual responses as well as the performance of self-selected walking pace. [Subjects] The study included 28 adult women between 29 and 51 years old. [Methods] The subjects were divided into three groups: a no musical stimulation group (control), and 90 and 140 beats per minute musical tempo groups. Each subject underwent three experimental sessions: familiarization with the equipment, an incremental test to exhaustion, and a 30-min walk on a treadmill at a self-selected pace. During the self-selected walking session, physiological, perceptual, and affective variables were evaluated, and walking performance was evaluated at the end. [Results] There were no significant differences in physiological variables or affective response among groups. However, there were significant differences in perceptual response and walking performance among groups. [Conclusion] Fast music (140 beats per minute) promotes a higher rating of perceived exertion and greater performance in self-selected walking pace without significantly altering physiological variables or affective response. PMID:26180303
Bayesian Geostatistical Design
DEFF Research Database (Denmark)
Diggle, Peter; Lophaven, Søren Nymand
2006-01-01
locations to, or deletion of locations from, an existing design, and prospective design, which consists of choosing positions for a new set of sampling locations. We propose a Bayesian design criterion which focuses on the goal of efficient spatial prediction whilst allowing for the fact that model...
Czech Academy of Sciences Publication Activity Database
Krejsa, Jiří; Věchet, S.
Bratislava: Slovak University of Technology in Bratislava, 2010, pp. 217-222. ISBN 978-80-227-3353-3. [Robotics in Education. Bratislava (SK), 16.09.2010-17.09.2010] Institutional research plan: CEZ:AV0Z20760514 Keywords: mobile robot localization * bearing only beacons * Bayesian filters Subject RIV: JD - Computer Applications, Robotics
DEFF Research Database (Denmark)
Antoniou, Constantinos; Harrison, Glenn W.; Lau, Morten I.;
2015-01-01
A large literature suggests that many individuals do not apply Bayes’ Rule when making decisions that depend on them correctly pooling prior information and sample data. We replicate and extend a classic experimental study of Bayesian updating from psychology, employing the methods of experimenta...
Bayesian Independent Component Analysis
DEFF Research Database (Denmark)
Winther, Ole; Petersen, Kaare Brandt
2007-01-01
In this paper we present an empirical Bayesian framework for independent component analysis. The framework provides estimates of the sources, the mixing matrix and the noise parameters, and is flexible with respect to choice of source prior and the number of sources and sensors. Inside the engine...
Loredo, Thomas J.
2004-04-01
I describe a framework for adaptive scientific exploration based on iterating an Observation-Inference-Design cycle that allows adjustment of hypotheses and observing protocols in response to the results of observation on-the-fly, as data are gathered. The framework uses a unified Bayesian methodology for the inference and design stages: Bayesian inference to quantify what we have learned from the available data and predict future data, and Bayesian decision theory to identify which new observations would teach us the most. When the goal of the experiment is simply to make inferences, the framework identifies a computationally efficient iterative "maximum entropy sampling" strategy as the optimal strategy in settings where the noise statistics are independent of signal properties. Results of applying the method to two "toy" problems with simulated data (measuring the orbit of an extrasolar planet, and locating a hidden one-dimensional object) show the approach can significantly improve observational efficiency in settings that have well-defined nonlinear models. I conclude with a list of open issues that must be addressed to make Bayesian adaptive exploration a practical and reliable tool for optimizing scientific exploration.
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
International Nuclear Information System (INIS)
A study using revealed preference surveys and psychological tests was conducted. Key psychological variables of behavior involved in the choice of transportation mode in a population sample of the Metropolitan Area of the Valle de Aburra were detected. The experiment used the random utility theory for discrete choice models and reasoned action in order to assess beliefs. This was used as a tool for analysis of the psychological variables using the sixteen personality factor questionnaire (16PF test). In addition to the revealed preference surveys, two other surveys were carried out: one with socio-economic characteristics and the other with latent indicators. This methodology allows for an integration of discrete choice models and latent variables. The integration makes the model operational and quantifies the unobservable psychological variables. The most relevant result obtained was that anxiety affects the choice of urban transportation mode and shows that physiological alterations, as well as problems in perception and beliefs, can affect the decision-making process.
Tanaka, Tomoyuki; Rabbitts, Terence H.
2009-01-01
Antibodies are now indispensable tools for all areas of cell biology and biotechnology as well as for diagnosis and therapy. Antigen-specific single immunoglobulin variable domains that bind to native antigens can be isolated and manipulated using yeast intracellular antibody capture technology but converting these to whole monoclonal antibody requires that complementary variable domains (VH or VL) bind to the same antigenic site. We describe a simple approach (CatcherAb) for specific isolati...
Application of Bayesian decision theory to airborne gamma snow measurement
Bissell, V. C.
1975-01-01
Measured values of several variables are incorporated into the calculation of snow water equivalent as measured from an aircraft by snow attenuation of terrestrial gamma radiation. Bayesian decision theory provides a snow water equivalent measurement by taking into account the uncertainties in the individual measurement variables and filtering information about the measurement variables through prior notions of what the calculated variable (water equivalent) should be.
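The idea of filtering a noisy measurement through a prior can be illustrated with a minimal sketch. The abstract does not give the paper's actual model, so the Gaussian prior, the Gaussian measurement error, the function name `posterior_normal`, and all numbers below are assumptions chosen for illustration only. Under squared-error loss the Bayes estimate is the posterior mean, which is a precision-weighted average of the prior mean and the measurement.

```python
def posterior_normal(prior_mean, prior_var, obs, obs_var):
    """Conjugate update: Gaussian prior combined with one Gaussian measurement.

    Returns the posterior mean and variance. The posterior mean is the
    Bayes estimate under squared-error loss.
    """
    precision = 1.0 / prior_var + 1.0 / obs_var      # precisions add
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var


# Hypothetical values (cm of water equivalent), not taken from the paper.
mean, var = posterior_normal(prior_mean=10.0, prior_var=4.0, obs=14.0, obs_var=4.0)
print(mean, var)
```

With equal prior and measurement variances the estimate falls halfway between the prior mean and the measurement, and the posterior variance is half of either one; a more precise measurement would pull the estimate closer to the observed value.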
Asymptotic Properties of Criteria for Selection of Variables in Multiple Regression
Nishii, Ryuei
1984-01-01
In normal linear regression analysis, many model selection rules proposed from various viewpoints are available. For the information criteria AIC, FPE, $C_p$, PSS and BIC, the asymptotic distribution of the selected model and the asymptotic quadratic risk based on each criterion are explicitly obtained.
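For readers unfamiliar with these criteria, the following is a minimal sketch of how two of them, AIC and BIC, rank candidate regression models. It uses the common Gaussian forms n ln(RSS/n) + 2k and n ln(RSS/n) + k ln n, which drop an additive constant shared by all models; the function name and the candidate models with their residual sums of squares are hypothetical and do not come from the paper.

```python
import math


def gaussian_aic_bic(rss, n, k):
    """AIC and BIC for a normal linear model, up to a constant common to all models.

    rss: residual sum of squares, n: number of observations,
    k: number of fitted regression coefficients.
    """
    fit = n * math.log(rss / n)
    return fit + 2 * k, fit + k * math.log(n)


# Hypothetical candidates: name -> (residual sum of squares, number of coefficients).
n_obs = 50
candidates = {"x1": (120.0, 2), "x1+x2": (80.0, 3), "x1+x2+x3": (79.0, 4)}

scores = {name: gaussian_aic_bic(rss, n_obs, k) for name, (rss, k) in candidates.items()}
best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])
print(best_aic, best_bic)
```

Because ln n exceeds 2 once n is at least 8, BIC penalises each extra coefficient more heavily than AIC, which is the source of their different asymptotic behaviour in model selection.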
Shape, sizing optimization and material selection based on mixed variables and genetic algorithm
Tang, X.; Bassir, D.H.; Zhang, W.
2010-01-01
In this work, we explore simultaneous designs of materials selection and structural optimization. As the material selection turns out to be a discrete process that finds the optimal distribution of materials over the design domain, it cannot be performed with common gradient-based optimization metho
Bayesian Analysis of Dynamic Multivariate Models with Multiple Structural Breaks
Sugita, Katsuhiro
2006-01-01
This paper considers a vector autoregressive model or a vector error correction model with multiple structural breaks in any subset of parameters, using a Bayesian approach with Markov chain Monte Carlo simulation technique. The number of structural breaks is determined as a sort of model selection by the posterior odds. For a cointegrated model, cointegrating rank is also allowed to change with breaks. Bayesian approach by Strachan (Journal of Business and Economic Statistics 21 (2003) 185) ...
Comparison of Bayesian Sample Size Criteria: ACC, ALC, and WOC
Cao, Jing; Lee, J. Jack; Alber, Susan
2009-01-01
A challenge for implementing performance based Bayesian sample size determination is selecting which of several methods to use. We compare three Bayesian sample size criteria: the average coverage criterion (ACC) which controls the coverage rate of fixed length credible intervals over the predictive distribution of the data, the average length criterion (ALC) which controls the length of credible intervals with a fixed coverage rate, and the worst outcome criterion (WOC) which ensures the des...
One-Stage and Bayesian Two-Stage Optimal Designs for Mixture Models
Lin, Hefang
1999-01-01
In this research, Bayesian two-stage D-D optimal designs for mixture experiments with or without process variables under model uncertainty are developed. A Bayesian optimality criterion is used in the first stage to minimize the determinant of the posterior variances of the parameters. The second stage design is then generated according to an optimality procedure that collaborates with the improved model from first stage data. Our results show that the Bayesian two-stage D-D optimal design...
Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E.
2010-01-01
Food authenticity studies are concerned with determining if food samples have been correctly labeled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give ...
K. KAMALAKKANNAN; DR. KAUKAB AZEEM; Dr.C.Arumugam
2011-01-01
The purpose of this study is to analyze the effect of aquatic plyometric training with and without the use of weights on selected physical fitness variables among volleyball players. To achieve the purpose of this study, 36 physically active undergraduate volleyball players between 18 and 20 years of age volunteered as participants. The participants were randomly categorized into three groups of 12 each: a control group (CG), an aquatic plyometric training with weight group (APTWG), and an aquati...
A stochastic analysis of terrain evaluation variables for path selection. [roving vehicle navigation
Donohue, J. G.; Shen, C. N.
1978-01-01
A stochastic analysis was performed on the variables associated with the characteristics of the terrain encountered by a roving system with an autonomous navigation system. A laser rangefinder is employed to detect terrain features at ranges up to 75 m. Analytic expressions and a numerical scheme were developed to calculate the variance of data on these four variables: (1) body clearance, (2) in-path slope, (3) tilt slope, and (4) wheel deviation. The variance is due to noise in the range data. It was found that the standard deviation of these terrain variables is large enough to warrant the use of a safety margin to aid the roving vehicle in avoiding high risk areas.
X-ray spectral variability of LINERs selected from the Palomar sample
Hernández-García, L.; González-Martín, O.; Masegosa, J.; Márquez, I.
2014-09-01
Context. Variability is a general property of active galactic nuclei (AGN). The way in which these changes occur at X-rays is not yet clearly understood. In the particular case of low-ionization nuclear emission line region (LINER) nuclei, variations on the timescales from months to years have been found for some objects, but the main driver of these changes is still debated. Aims: The main purpose of this work is to investigate the X-ray variability in LINERs, including the main driver of these variations, and to search for possible differences between type 1 and 2 objects. Methods: We examined the 18 LINERs in the Palomar sample with data retrieved from the Chandra and/or XMM-Newton archives that correspond to observations gathered at different epochs. All the spectra for the same object were fitted simultaneously to study long-term variations. The nature of the variability patterns were studied by allowing different parameters to vary during the spectral fit. Whenever possible, short-term variations from the analysis of the light curves and long-term UV variability were studied. Results: Short-term variations are not reported in X-rays. Three LINERs are classified as non-AGN candidates in X-rays, all of them are Compton-thick candidates; none of them show variations at these frequencies, and two of them vary in the UV. Long-term X-ray variations were analyzed in 12 out of 15 AGN candidates; about half of them showed variability (7 out of the 12). At UV frequencies, most of the AGN candidates with available data are variable (five out of six). Thus, 13 AGN candidates are analyzed at UV and/or X-rays, ten of which are variable at least in one energy band. None of the three objects that do not vary in X-rays have available UV data. This means that variability on long-timescales is very common in LINERs. These X-ray variations are mainly driven by changes in the nuclear power, while changes in absorptions are found only for NGC 1052. We do not find any difference
Ildikó Ungvári; Gábor Hullám; Péter Antal; Petra Sz Kiszel; András Gézsi; Éva Hadadi; Viktor Virág; Gergely Hajós; András Millinghoffer; Adrienne Nagy; András Kiss; Semsei, Ágnes F.; Gergely Temesi; Béla Melegh; Péter Kisfali
2012-01-01
Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses, we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). Th...
Potts, Richard; Faith, J Tyler
2015-10-01
Interaction of orbital insolation cycles defines a predictive model of alternating phases of high- and low-climate variability for tropical East Africa over the past 5 million years. This model, which is described in terms of climate variability stages, implies repeated increases in landscape/resource instability and intervening periods of stability in East Africa. It predicts eight prolonged (>192 kyr) eras of intensified habitat instability (high variability stages) in which hominin evolutionary innovations are likely to have occurred, potentially by variability selection. The prediction that repeated shifts toward high climate variability affected paleoenvironments and evolution is tested in three ways. In the first test, deep-sea records of northeast African terrigenous dust flux (Sites 721/722) and eastern Mediterranean sapropels (Site 967A) show increased and decreased variability in concert with predicted shifts in climate variability. These regional measurements of climate dynamics are complemented by stratigraphic observations in five basins with lengthy stratigraphic and paleoenvironmental records: the mid-Pleistocene Olorgesailie Basin, the Plio-Pleistocene Turkana and Olduvai Basins, and the Pliocene Tugen Hills sequence and Hadar Basin--all of which show that highly variable landscapes inhabited by hominin populations were indeed concentrated in predicted stages of prolonged high climate variability. Second, stringent null-model tests demonstrate a significant association of currently known first and last appearance datums (FADs and LADs) of the major hominin lineages, suites of technological behaviors, and dispersal events with the predicted intervals of prolonged high climate variability. Palynological study in the Nihewan Basin, China, provides a third test, which shows the occupation of highly diverse habitats in eastern Asia, consistent with the predicted increase in adaptability in dispersing Oldowan hominins. Integration of fossil, archeological
Sensor fault diagnosis using Bayesian belief networks
International Nuclear Information System (INIS)
This paper describes a method based on Bayesian belief networks (BBNs) for sensor fault detection, isolation, classification, and accommodation (SFDIA). For this purpose, a BBN uses three basic types of nodes to represent the information associated with each sensor: (1) sensor-reading nodes that represent the mechanisms by which the information is communicated to the BBN, (2) sensor-status nodes that convey the status of the corresponding sensors at any given time, and (3) process-variable nodes that are a conceptual representation of the actual values of the process variables, which are unknown
A Bayesian Concept Learning Approach to Crowdsourcing
DEFF Research Database (Denmark)
Viappiani, Paolo Renato; Zilles, Sandra; Hamilton, Howard J.;
2011-01-01
We develop a Bayesian approach to concept learning for crowdsourcing applications. A probabilistic belief over possible concept definitions is maintained and updated according to (noisy) observations from experts, whose behaviors are modeled using discrete types. We propose recommendation...... techniques, inference methods, and query selection strategies to assist a user charged with choosing a configuration that satisfies some (partially known) concept. Our model is able to simultaneously learn the concept definition and the types of the experts. We evaluate our model with simulations, showing...
The Multifaceted Variable Approach: Selection of Method in Solving Simple Linear Equations
Tahir, Salma; Cavanagh, Michael
2010-01-01
This paper presents a comparison of the solution strategies used by two groups of Year 8 students as they solved linear equations. The experimental group studied algebra following a multifaceted variable approach, while the comparison group used a traditional approach. Students in the experimental group employed different solution strategies,…
Smith, Marcia
2013-01-01
The purpose of the study was to determine the degree to which academic and demographic variables affected the ACT results used in determining college readiness. This quantitative research study followed a non-experimental correlational design. A multiple regression was used to analyze archival data to determine the impact the combined Arkansas…
Parkay, Forrest W.; Conoley, Colleen
The purpose of this study was twofold: (1) to determine educators' attitudes toward corporal punishment and its alternatives in a variety of school settings throughout the Southwest; and (2) to explore the relationships between respondents' attitudes and such independent variables as dogmatism, sex, experience, level of education, job description,…
Cortical Response Variability as a Developmental Index of Selective Auditory Attention
Strait, Dana L.; Slater, Jessica; Abecassis, Victor; Kraus, Nina
2014-01-01
Attention induces synchronicity in neuronal firing for the encoding of a given stimulus at the exclusion of others. Recently, we reported decreased variability in scalp-recorded cortical evoked potentials to attended compared with ignored speech in adults. Here we aimed to determine the developmental time course for this neural index of auditory…
Temporal variability of selected chemical and physical properties of topsoil of three soil types
Czech Academy of Sciences Publication Activity Database
Jirků, V.; Kodešová, R.; Nikodem, A.; Mühlhanselová, M.; Žigová, Anna
2013-01-01
Vol. 15 (2013). ISSN 1607-7962. [EGU General Assembly /10./. 07.04.2013-12.04.2013, Vienna] R&D Projects: GA ČR GA526/08/0434 Institutional support: RVO:67985831 Keywords: soil properties * soil types * temporal variability Subject RIV: DF - Soil Science http://meetingorganizer.copernicus.org/EGU2013/EGU2013-7650-1.pdf
Fitzpatrick, Benjamin R; Lamb, David W; Mengersen, Kerrie
2016-01-01
Modern soil mapping is characterised by the need to interpolate point referenced (geostatistical) observations and the availability of large numbers of environmental characteristics for consideration as covariates to aid this interpolation. Modelling tasks of this nature also occur in other fields such as biogeography and environmental science. This analysis employs the Least Angle Regression (LAR) algorithm for fitting Least Absolute Shrinkage and Selection Operator (LASSO) penalized Multiple Linear Regression models. This analysis demonstrates the efficiency of the LAR algorithm at selecting covariates to aid the interpolation of geostatistical soil carbon observations. Where an exhaustive search of the models that could be constructed from 800 potential covariate terms and 60 observations would be prohibitively demanding, LASSO variable selection is accomplished with trivial computational investment. PMID:27603135
Bayesian Magic in Asteroseismology
Kallinger, T.
2015-09-01
Only a few years ago asteroseismic observations were so rare that scientists had plenty of time to work on individual data sets. They could tune their algorithms in any possible way to squeeze out the last bit of information. Nowadays this is impossible. With missions like MOST, CoRoT, and Kepler we basically drown in new data every day. To handle this in a sufficient way statistical methods become more and more important. This is why Bayesian techniques started their triumph march across asteroseismology. I will go with you on a journey through Bayesian Magic Land, that brings us to the sea of granulation background, the forest of peakbagging, and the stony alley of model comparison.
Approximate Bayesian recursive estimation
Czech Academy of Sciences Publication Activity Database
Kárný, Miroslav
2014-01-01
Vol. 285, No. 1 (2014), pp. 100-111. ISSN 0020-0255 R&D Projects: GA ČR GA13-13502S Institutional support: RVO:67985556 Keywords: Approximate parameter estimation * Bayesian recursive estimation * Kullback–Leibler divergence * Forgetting Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 4.038, year: 2014 http://library.utia.cas.cz/separaty/2014/AS/karny-0425539.pdf
Bayesian Benchmark Dose Analysis
Fang, Qijun; Piegorsch, Walter W.; Barnes, Katherine Y.
2014-01-01
An important objective in environmental risk assessment is estimation of minimum exposure levels, called Benchmark Doses (BMDs) that induce a pre-specified Benchmark Response (BMR) in a target population. Established inferential approaches for BMD analysis typically involve one-sided, frequentist confidence limits, leading in practice to what are called Benchmark Dose Lower Limits (BMDLs). Appeal to Bayesian modeling and credible limits for building BMDLs is far less developed, however. Indee...
Bayesian Generalized Rating Curves
Helgi Sigurðarson 1985
2014-01-01
A rating curve is a curve or a model that describes the relationship between water elevation, or stage, and discharge in an observation site in a river. The rating curve is fit from paired observations of stage and discharge. The rating curve then predicts discharge given observations of stage and this methodology is applied as stage is substantially easier to directly observe than discharge. In this thesis a statistical rating curve model is proposed working within the framework of Bayesian...
Heteroscedastic Treed Bayesian Optimisation
Assael, John-Alexander M.; Wang, Ziyu; Shahriari, Bobak; De Freitas, Nando
2014-01-01
Optimising black-box functions is important in many disciplines, such as tuning machine learning models, robotics, finance and mining exploration. Bayesian optimisation is a state-of-the-art technique for the global optimisation of black-box functions which are expensive to evaluate. At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions. However, in many cases a single Gaussian process is not flexible enough to capture non-stat...
Efficient Bayesian Phase Estimation
Wiebe, Nathan; Granade, Chris
2016-07-01
We introduce a new method called rejection filtering that we use to perform adaptive Bayesian phase estimation. Our approach has several advantages: it is classically efficient, easy to implement, achieves Heisenberg limited scaling, resists depolarizing noise, tracks time-dependent eigenstates, recovers from failures, and can be run on a field programmable gate array. It also outperforms existing iterative phase estimation algorithms such as Kitaev's method.
Brody, Samuel; Lapata, Mirella
2009-01-01
Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word’s contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samp...
Bayesian Neural Word Embedding
Barkan, Oren
2016-01-01
Recently, several works in the domain of natural language processing presented successful methods for word embedding. Among them, the Skip-gram (SG) with negative sampling, known also as Word2Vec, advanced the state-of-the-art of various linguistics tasks. In this paper, we propose a scalable Bayesian neural word embedding algorithm that can be beneficial to general item similarity tasks as well. The algorithm relies on a Variational Bayes solution for the SG objective and a detailed step by ...
Doubly sparse factor models for unifying feature transformation and feature selection
Energy Technology Data Exchange (ETDEWEB)
Katahira, Kentaro; Okanoya, Kazuo; Okada, Masato [ERATO, Okanoya Emotional Information Project, Japan Science Technology Agency, Saitama (Japan); Matsumoto, Narihisa; Sugase-Miyamoto, Yasuko, E-mail: okada@k.u-tokyo.ac.j [Human Technology Research Institute, National Institute of Advanced Industrial Science and Technology, Ibaraki (Japan)
2010-06-01
A number of unsupervised learning methods for high-dimensional data are largely divided into two groups based on their procedures, i.e., (1) feature selection, which discards irrelevant dimensions of the data, and (2) feature transformation, which constructs new variables by transforming and mixing over all dimensions. We propose a method that both selects and transforms features in a common Bayesian inference procedure. Our method imposes a doubly automatic relevance determination (ARD) prior on the factor loading matrix. We propose a variational Bayesian inference for our model and demonstrate the performance of our method on both synthetic and real data.
Directory of Open Access Journals (Sweden)
Boney Wera
2015-07-01
A successful crop breeding program incorporating agronomic and consumer preferred traits can be achieved by recognizing the existence and degree of variability among sweetpotato (Ipomoea batatas (L.) Lam.) genotypes. Understanding genetic variability, genotypic and phenotypic correlation and inheritance among agronomic traits is fundamental to improvement of any crop. The study was carried out with the objective of estimating the genotypic variability and other yield related traits of highlands sweetpotato in Papua New Guinea in a polycross population. A total of 8 genotypes of sweetpotato derived from the polycross were considered in two cycles of replicated field experiments. Analysis of Variance was computed to contrast the variability within the selected genotypes based on high yielding β-carotene rich orange-fleshed sweetpotato. The results revealed significant differences among the genotypes. Genotypic coefficient of variation (GCV %) was lower than phenotypic coefficient of variation (PCV %) for all traits studied. Relatively high genetic variance, along with high heritability and expected genetic advances, were observed in NMTN and ABYield. Harvest index (HI), scab and gall mite damage scores had heritability of 67%, 66% and 37% respectively. Marketable tuber yield (MTYield) and total tuber yield (TTYield) had lower genetic variance, low heritability and low genetic advance. There is a need to investigate correlated inheritance among these traits. Selecting directly for yield improvement in a polycross population may not be very efficient, as indicated by the results. Therefore, it can be concluded that the variability within sweetpotato genotypes collected from the polycross population in Aiyura Research Station for tuber yield is low and the extent of its yield improvement is narrow.
Singh, A; Saini, B S; Singh, D
2016-05-01
This study presents an alternative approach to approximate entropy (ApEn) threshold value (r) selection. There are two limitations of traditional ApEn algorithm: (1) the occurrence of undefined conditional probability (CPu) where no template match is found and (2) use of a crisp tolerance (radius) threshold 'r'. To overcome these limitations, CPu is substituted with optimum bias setting ɛ opt which is found by varying ɛ from (1/N - m) to 1 in the increments of 0.05, where N is the length of the series and m is the embedding dimension. Furthermore, an alternative approach for selection of r based on binning the distance values obtained by template matching to calculate ApEnbin is presented. It is observed that ApEnmax, ApEnchon and ApEnbin converge for ɛ opt = 0.6 in 50 realizations (n = 50) of random number series of N = 300. Similar analysis suggests ɛ opt = 0.65 and ɛ opt = 0.45 for 50 realizations each of fractional Brownian motion and MIX(P) series (Lu et al. in J Clin Monit Comput 22(1):23-29, 2008). ɛ opt = 0.5 is suggested for heart rate variability (HRV) and systolic blood pressure variability (SBPV) signals obtained from 50 young healthy subjects under supine and upright position. It is observed that (1) ApEnbin of HRV is lower than SBPV, (2) ApEnbin of HRV increases from supine to upright due to vagal inhibition and (3) ApEnbin of BPV decreases from supine to upright due to sympathetic activation. Moreover, merit of ApEnbin is that it provides an alternative to the cumbersome ApEnmax procedure. PMID:26253284
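As background for the quantities discussed above, the following is a minimal sketch of the standard approximate entropy definition, which counts self-matches so that the logarithm is always defined. It is not the authors' ApEnbin or bias-setting procedure. The tolerance `r` is treated here as an absolute distance (in practice it is often set relative to the standard deviation of the series), and the test series and seed are illustrative assumptions.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.2):
    """Standard approximate entropy with embedding dimension m and tolerance r."""
    n = len(series)

    def phi(dim):
        # All overlapping templates of length `dim`.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of `a` (self-match included).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


rng = random.Random(0)                      # fixed seed, illustrative only
regular = [0.0, 1.0] * 20                   # perfectly periodic series
noisy = [rng.random() for _ in range(100)]  # uniform random series

apen_regular = approximate_entropy(regular)
apen_noisy = approximate_entropy(noisy)
print(apen_regular, apen_noisy)
```

A periodic series yields a value close to zero, while an irregular series yields a clearly larger value, which is the property the threshold-selection methods in the abstract aim to estimate reliably.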
Wiegerinck, Wim; Schoenaker, Christiaan; Duane, Gregory
2016-04-01
Recently, methods for model fusion by dynamically combining model components in an interactive ensemble have been proposed. In these proposals, fusion parameters have to be learned from data. One can view these systems as parametrized dynamical systems. We address the question of learnability of dynamical systems with respect to both short term (vector field) and long term (attractor) behavior. In particular we are interested in learning in the imperfect model class setting, in which the ground truth has a higher complexity than the models, e.g. due to unresolved scales. We take a Bayesian point of view and we define a joint log-likelihood that consists of two terms, one is the vector field error and the other is the attractor error, for which we take the L1 distance between the stationary distributions of the model and the assumed ground truth. In the context of linear models (like so-called weighted supermodels), and assuming a Gaussian error model in the vector fields, vector field learning leads to a tractable Gaussian solution. This solution can then be used as a prior for the next step, Bayesian attractor learning, in which the attractor error is used as a log-likelihood term. Bayesian attractor learning is implemented by elliptical slice sampling, a sampling method for systems with a Gaussian prior and a non Gaussian likelihood. Simulations with a partially observed driven Lorenz 63 system illustrate the approach.
Bayesian theory and applications
Dellaportas, Petros; Polson, Nicholas G; Stephens, David A
2013-01-01
The development of hierarchical models and Markov chain Monte Carlo (MCMC) techniques forms one of the most profound advances in Bayesian analysis since the 1970s and provides the basis for advances in virtually all areas of applied and theoretical Bayesian statistics. This volume guides the reader along a statistical journey that begins with the basic structure of Bayesian theory, and then provides details on most of the past and present advances in this field. The book has a unique format. There is an explanatory chapter devoted to each conceptual advance followed by journal-style chapters that provide applications or further advances on the concept. Thus, the volume is both a textbook and a compendium of papers covering a vast range of topics. It is appropriate for a well-informed novice interested in understanding the basic approach, methods and recent applications. Because of its advanced chapters and recent work, it is also appropriate for a more mature reader interested in recent applications and devel...
Learning Bayesian networks using genetic algorithm
Institute of Scientific and Technical Information of China (English)
Chen Fei; Wang Xiufeng; Rao Yimei
2007-01-01
A new method to evaluate the fitness of Bayesian networks according to the observed data is provided. The main advantage of this criterion is that it is suitable for both the complete and incomplete cases, while the others are not. Moreover, it greatly facilitates the computation. In order to reduce the search space, the notion of equivalence class proposed by David Chickering is adopted. Instead of using the method directly, the novel criterion, variable ordering, and equivalence class are combined; moreover, the proposed method avoids some problems caused by the previous one. Later, the genetic algorithm, which allows global convergence that most methods of searching for Bayesian networks lack, is applied to search for a good model in this space. To speed up the convergence, the genetic algorithm is combined with the greedy algorithm. Finally, the simulation shows the validity of the proposed approach.
Richards, Joseph W; Brink, Henrik; Miller, Adam A; Bloom, Joshua S; Butler, Nathaniel R; James, J Berian; Long, James P; Rice, John
2011-01-01
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL---where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up---i...
Identifying market segments in consumer markets: variable selection and data interpretation
Tonks, D G
2004-01-01
Market segmentation is often articulated as being a process which displays the recognised features of classical rationalism but in part; convention, convenience, prior experience and the overarching impact of rhetoric will influence if not determine the outcomes of a segmentation exercise. Particular examples of this process are addressed critically in this paper which concentrates on the issues of variable choice for multivariate approaches to market segmentation and also the methods used fo...
A scale-independent clustering method with automatic variable selection based on trees
Lynch, Sarah K.
2014-01-01
Approved for public release; distribution is unlimited. Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. Determining dissimilarity when observations have both continuous and categorical measurements can be difficult because each type of measurement must be approached differently. We introduce a new clustering method that u...
International Nuclear Information System (INIS)
Dose predictions for the ingestion of 90Sr and 137Cs, using aquatic and terrestrial food chain transport models similar to those in the Nuclear Regulatory Commission's Regulatory Guide 1.109, are evaluated through estimating the variability of model parameters and determining the effect of this variability on model output. The variability in the predicted dose equivalent is determined using analytical and numerical procedures. In addition, a detailed discussion is included on 90Sr dosimetry. The overall estimates of uncertainty are most relevant to conditions where site-specific data are unavailable and when model structure and parameter estimates are unbiased. Based on the comparisons performed in this report, it is concluded that the use of the generic default parameters in Regulatory Guide 1.109 will usually produce conservative dose estimates that exceed the 90th percentile of the predicted distribution of dose equivalents. An exception is the meat pathway for 137Cs, in which use of generic default values results in a dose estimate at the 24th percentile. Among the terrestrial pathways of exposure, the non-leafy vegetable pathway is the most important for 90Sr. For 90Sr, the parameters for soil retention, soil-to-plant transfer, and internal dosimetry contribute most significantly to the variability in the predicted dose for the combined exposure to all terrestrial pathways. For 137Cs, the meat transfer coefficient, the mass interception factor for pasture forage, and the ingestion dose factor are the most important parameters. The freshwater finfish bioaccumulation factor is the most important parameter for the dose prediction of 90Sr and 137Cs transported over the water-fish-man pathway.
Suganya. S
2014-01-01
The purpose of the study was to compare the psychological variables such as anxiety, achievement motivation, self concept, locus of control and team relationship among the university women football players of south and west zone Defenders, Midfielders and Forwards. The requirements for the collection of data through administration of questionnaires were explained to the subjects so as to avoid any ambiguity of the effort required on their part and prior to the administration o...
Directory of Open Access Journals (Sweden)
Sariful Isalm
2014-04-01
The purpose of the study was to find out the effect of functional and aerobic training on selected fitness and performance variables among football players at college level. A pre-test and post-test randomized group design was applied. Sixty college men football players from Kolkatta city were randomly selected and assigned to four equal groups of fifteen subjects each. A pre-test was conducted for all sixty subjects on the selected variables, namely agility, explosive power and playing ability. Playing ability was measured by expert rating. These initial scores formed the pre-test scores of the subjects. Experimental group I was exposed to functional training (FTG), experimental group II was exposed to aerobic training (ATG), experimental group III was exposed to combined functional and aerobic training (CFAT), and the control group was not exposed to any experimental training other than their regular daily activities. The experimental period lasted 12 weeks. After the experimental treatment, all sixty subjects were measured on the selected fitness variables and playing ability. These final scores formed the post-test scores of the subjects. The pre-test and post-test scores were subjected to statistical analysis using analysis of covariance (ANCOVA) to find out the significance of the mean differences; whenever the 'F' ratio for the adjusted test was found to be significant, Scheffe's post hoc test was used. In all cases the 0.05 level of significance was fixed to test the hypotheses. The results of the study reveal that the experimental trainings significantly improved the fitness variables, namely agility, explosive power, and playing ability, of the football players.
Baltic sea algae analysis using Bayesian spatial statistics methods
Directory of Open Access Journals (Sweden)
Eglė Baltmiškytė
2013-03-01
Spatial statistics is the field of statistics that deals with the analysis of spatially distributed data. Recently, Bayesian methods have often been applied to statistical data analysis. A spatial data model for predicting algae quantity in the Baltic Sea is built and described in this article. Black Carrageen is the dependent variable, and depth, sand, pebble and boulders are the independent variables in the described model. Two models with different covariance functions (Gaussian and exponential) are built to determine which fits best for predicting algae quantity. Unknown model parameters are estimated and the Bayesian kriging posterior predictive distribution is computed in the OpenBUGS modeling environment using Bayesian spatial statistics methods.
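The difference between the two spatial dependence structures compared in the abstract above (Gaussian and exponential) can be sketched as follows. The powered-exponential parameterization, the parameter names `sigma2`, `phi` and `kappa`, and the function names are assumptions chosen for illustration; the article's exact parameterization is not given in the abstract.

```python
import math


def powered_exponential_cov(d, sigma2, phi, kappa):
    """Covariance at distance d: sigma2 * exp(-(phi * d) ** kappa).

    sigma2 is the variance at distance zero, phi a decay rate and
    kappa the power (1 gives the exponential, 2 the Gaussian form).
    """
    return sigma2 * math.exp(-((phi * d) ** kappa))


def exponential_cov(d, sigma2=1.0, phi=1.0):
    """Exponential covariance: correlation decays linearly in the exponent."""
    return powered_exponential_cov(d, sigma2, phi, kappa=1.0)


def gaussian_cov(d, sigma2=1.0, phi=1.0):
    """Gaussian covariance: smoother near zero, faster decay at long range."""
    return powered_exponential_cov(d, sigma2, phi, kappa=2.0)


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 2.0):
        print(d, round(exponential_cov(d), 4), round(gaussian_cov(d), 4))
```

Under this parameterization the two functions agree at distance 0 and at distance 1/phi. The Gaussian form gives higher correlation at shorter distances and lower correlation at longer ones, which is why the choice can change kriging predictions.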
Target selection of classical pulsating variables for space-based photometry
Plachy, E; Szabó, R; Kolenberg, K; Bányai, E
2016-01-01
In a few years the Kepler and TESS missions will provide ultra-precise photometry for thousands of RR Lyrae and hundreds of Cepheid stars. In the extended Kepler mission all targets are proposed in the Guest Observer (GO) Program, while the TESS space telescope will work with full frame images and a ~15-16th mag brightness limit with the possibility of short cadence measurements for a limited number of pre-selected objects. This paper highlights some details of the enormous and important work of the target selection process made by the members of Working Group 7 (WG#7) of the Kepler and TESS Asteroseismic Science Consortium.
Target selection of classical pulsating variables for space-based photometry
Plachy, E.; Molnar, L.; Szabo, R.; Kolenberg, K.; Banyai, E.
2016-05-01
In a few years the Kepler and TESS missions will provide ultra-precise photometry for thousands of RR Lyrae and hundreds of Cepheid stars. In the extended Kepler mission all targets are proposed in the Guest Observer (GO) Program, while the TESS space telescope will work with full frame images and a ~15-16th mag brightness limit with the possibility of short cadence measurements for a limited number of pre-selected objects. This paper highlights some details of the enormous and important work of the target selection process made by the members of Working Group 7 (WG#7) of the Kepler and TESS Asteroseismic Science Consortium.
Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai
2015-10-01
The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by the FT-NIR spectroscopy technique was investigated in this study. Two wavelength variable selection methods, competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS), were employed to select the important wavelengths. PLS-DA was applied to calibrate identification models using the wavelength variables selected by CARS and SCARS for identification of solid state fermentation degree. Experimental results showed that the numbers of wavelength variables selected by CARS and SCARS were 58 and 47, respectively, from the 1557 original wavelength variables. Compared with the results of full-spectrum PLS-DA, both wavelength variable selection methods could enhance the performance of the identification models. Meanwhile, compared with the CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results, with an identification rate of 91.43% in the validation process. The overall results sufficiently demonstrate that a PLS-DA model constructed using wavelength variables selected by a proper wavelength variable selection method can identify solid state fermentation degree more accurately.
Fast, fully Bayesian spatiotemporal inference for fMRI data.
Musgrove, Donald R; Hughes, John; Eberly, Lynn E
2016-04-01
We propose a spatial Bayesian variable selection method for detecting blood oxygenation level dependent activation in functional magnetic resonance imaging (fMRI) data. Typical fMRI experiments generate large datasets that exhibit complex spatial and temporal dependence. Fitting a full statistical model to such data can be so computationally burdensome that many practitioners resort to fitting oversimplified models, which can lead to lower quality inference. We develop a full statistical model that permits efficient computation. Our approach eases the computational burden in two ways. We partition the brain into 3D parcels, and fit our model to the parcels in parallel. Voxel-level activation within each parcel is modeled as regressions located on a lattice. Regressors represent the magnitude of change in blood oxygenation in response to a stimulus, while a latent indicator for each regressor represents whether the change is zero or non-zero. A sparse spatial generalized linear mixed model captures the spatial dependence among indicator variables within a parcel and for a given stimulus. The sparse SGLMM permits considerably more efficient computation than does the spatial model typically employed in fMRI. Through simulation we show that our parcellation scheme performs well in various realistic scenarios. Importantly, indicator variables on the boundary between parcels do not exhibit edge effects. We conclude by applying our methodology to data from a task-based fMRI experiment. PMID:26553916
Prater, T.; Tilson, W.; Jones, Z.
2015-01-01
The absence of an economy of scale in spaceflight hardware makes additive manufacturing an immensely attractive option for propulsion components. As additive manufacturing techniques are increasingly adopted by government and industry to produce propulsion hardware in human-rated systems, significant development efforts are needed to establish these methods as reliable alternatives to conventional subtractive manufacturing. One of the critical challenges facing powder bed fusion techniques in this application is variability between machines used to perform builds. Even with implementation of robust process controls, it is possible for two machines operating at identical parameters with equivalent base materials to produce specimens with slightly different material properties. The machine variability study presented here evaluates 60 specimens of identical geometry built using the same parameters. 30 samples were produced on machine 1 (M1) and the other 30 samples were built on machine 2 (M2). Each of the 30-sample sets was further subdivided into three subsets (with 10 specimens in each subset) to assess the effect of progressive heat treatment on machine variability. The three categories for post-processing were: stress relief, stress relief followed by hot isostatic press (HIP), and stress relief followed by HIP followed by heat treatment per AMS 5664. Each specimen (a round, smooth tensile) was mechanically tested per ASTM E8. Two formal statistical techniques, hypothesis testing for equivalency of means and one-way analysis of variance (ANOVA), were applied to characterize the impact of machine variability and heat treatment on six material properties: tensile stress, yield stress, modulus of elasticity, fracture elongation, and reduction of area. This work represents the type of development effort that is critical as NASA, academia, and the industrial base work collaboratively to establish a path to certification for additively manufactured parts. For future
Directory of Open Access Journals (Sweden)
da Silva SMD
2016-03-01
Silvia Maria Doria da Silva, Ilma Aparecida Paschoal, Eduardo Mello De Capitani, Marcos Mello Moreira, Luciana Campanatti Palhares, Mônica Corso Pereira. Pneumology Service, Department of Internal Medicine, School of Medical Sciences, State University of Campinas (UNICAMP), Campinas, São Paulo, Brazil. Background: Computed tomography (CT) phenotypic characterization helps in understanding the clinical diversity of chronic obstructive pulmonary disease (COPD) patients, but its clinical relevance and its relationship with functional features are not clarified. Volumetric capnography (VC) uses the principle of gas washout and analyzes the pattern of CO2 elimination as a function of expired volume. The main variables analyzed were end-tidal concentration of carbon dioxide (ETCO2), slope of phase 2 (Slp2), and slope of phase 3 (Slp3) of the capnogram, the curve which represents the total amount of CO2 eliminated by the lungs during each breath. Objective: To investigate, in a group of patients with severe COPD, if the phenotypic analysis by CT could identify different subsets of patients, and if there was an association of CT findings and functional variables. Subjects and methods: Sixty-five patients with COPD Gold III–IV were admitted for clinical evaluation, high-resolution CT, and functional evaluation (spirometry, 6-minute walk test [6MWT], and VC). The presence and profusion of tomography findings were evaluated, and later, the patients were identified as having emphysema (EMP) or airway disease (AWD) phenotype. EMP and AWD groups were compared; tomography findings scores were evaluated versus spirometric, 6MWT, and VC variables. Results: Bronchiectasis was found in 33.8% and peribronchial thickening in 69.2% of the 65 patients. Structural findings of airways had no significant correlation with spirometric variables. Air trapping and EMP were strongly correlated with VC variables, but in opposite directions. There was some overlap between the EMP and AWD
Directory of Open Access Journals (Sweden)
K. KAMALAKKANNAN
2011-06-01
The purpose of this study is to analyze the effect of aquatic plyometric training with and without the use of weights on selected physical fitness variables among volleyball players. To achieve the purpose of this study, 36 physically active undergraduate volleyball players between 18 and 20 years of age volunteered as participants. The participants were randomly categorized into three groups of 12 each: a control group (CG), an aquatic plyometric training with weight group (APTWG), and an aquatic plyometric training without weight group (APTWOG). The subjects of the control group were not exposed to any training. Both experimental groups underwent their respective experimental treatment for 12 weeks, 3 days per week and a single session on each day. Speed, endurance, and explosive power were measured as the dependent variables for this study. 36 days of experimental treatment were conducted for all the groups, and pre and post data were collected. The collected data were analyzed using an analysis of covariance (ANCOVA) followed by a Scheffé's post hoc test. The results revealed significant differences between groups on all the selected dependent variables. This study demonstrated that aquatic plyometric training can be one effective means for improving speed, endurance, and explosive power in volleyball players.
Unbounded Bayesian Optimization via Regularization
Shahriari, Bobak; Bouchard-Côté, Alexandre; De Freitas, Nando
2015-01-01
Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning. Currently, the established Bayesian optimization practice requires a user-defined bounding box which is assumed to contain the optimizer. However, when little is known about the probed objective function, it can be difficult to prescribe such bounds. In this work we modify the standard Bayesian optimization framework in a principled way to allow automatic resizing of t...
Bayesian optimization for materials design
Frazier, Peter I.; Wang, Jialei
2015-01-01
We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when materials designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian pro...
Using Instrumental Variables to Account for Selection Effects in Research on First-Year Programs
Pike, Gary R.; Hansen, Michele J.; Lin, Ching-Hui
2011-01-01
The widespread popularity of programs for first-year students is due, in large part, to studies showing that participation in first-year programs is significantly related to students' academic success. Because students choose to participate in first-year programs, self-selection effects prevent researchers from making causal claims about the…
Cummings, Merrilyn N.; Bell, Camille G.
1979-01-01
A study was conducted to determine if students with varying grade point averages, self-concept, internal-external locus of control, and self-directedness, selected and performed differentially under two approaches to competency-based teacher education (teacher-directed and student-directed instruction). (JH)
Empirically Driven Variable Selection for the Estimation of Causal Effects with Observational Data
Keller, Bryan; Chen, Jianshen
2016-01-01
Observational studies are common in educational research, where subjects self-select or are otherwise non-randomly assigned to different interventions (e.g., educational programs, grade retention, special education). Unbiased estimation of a causal effect with observational data depends crucially on the assumption of ignorability, which specifies…
The study of variability and strain selection in Streptomyces atroolivaceus. III
International Nuclear Information System (INIS)
Mutants of Streptomyces atroolivaceus blocked in the biosynthesis of mithramycin were isolated both by natural selection and after treatment with mutagenic factors (UV and gamma rays, nitrous acid). Both physical factors were more effective than nitrous acid. The selection was complicated by the high instability of isolates, out of which 20 to 80% (depending on their origin) reverted spontaneously to the parent type. Primary screening (selection of morphological variants and determination of their activity using the method of agar blocks) made it possible to detect only potentially non-productive strains; however, the final selection always had to be made under submerged conditions. Fifty-four stable non-productive mutants were divided, according to results of the chromatographic analysis, into five groups differing in the production of the six biologically inactive metabolites. The mutants did not accumulate chromomycinone, chromocyclomycin and chromocyclin. On mixed cultivation none of the pairs of mutants was capable of the cosynthesis of mithramycin or of new compounds differing from standard metabolites. Possible causes of the above results are discussed. (author)
The Relationship between Selected Body Composition Variables and Muscular Endurance in Women
Esco, Michael R.; Olson, Michele S.; Williford, Henry N.
2010-01-01
The primary purpose of this study was to determine if muscular endurance is affected by referenced waist circumference groupings, independent of body mass and subcutaneous abdominal fat, in women. This study also explored whether selected body composition measures were associated with muscular endurance. Eighty-four women were measured for height,…
SOMBI: Bayesian identification of parameter relations in unstructured cosmological data
Frank, Philipp; Enßlin, Torsten A
2016-01-01
This work describes the implementation and application of a correlation determination method based on Self Organizing Maps and Bayesian Inference (SOMBI). SOMBI aims to automatically identify relations between different observed parameters in unstructured cosmological or astrophysical surveys by automatically identifying data clusters in high-dimensional datasets via the Self Organizing Map neural network algorithm. Parameter relations are then revealed by means of a Bayesian inference within respective identified data clusters. Specifically such relations are assumed to be parametrized as a polynomial of unknown order. The Bayesian approach results in a posterior probability distribution function for respective polynomial coefficients. To decide which polynomial order suffices to describe correlation structures in data, we include a method for model selection, the Bayesian Information Criterion, to the analysis. The performance of the SOMBI algorithm is tested with mock data. As illustration we also provide ...
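The model selection step mentioned in the abstract above, choosing a polynomial order with the Bayesian Information Criterion, can be sketched as below. This is an illustrative sketch only and not the SOMBI code: the Gaussian-noise form of the BIC, the normal-equation solver, and the helper names `fit_polynomial`, `bic` and `make_quadratic_data` are assumptions.

```python
import math
import random


def fit_polynomial(x, y, order):
    """Least-squares polynomial coefficients (lowest degree first)."""
    n = order + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def bic(x, y, order):
    """BIC up to an additive constant, assuming i.i.d. Gaussian noise."""
    coeffs = fit_polynomial(x, y, order)
    n = len(x)
    rss = sum((yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
              for xi, yi in zip(x, y))
    n_params = order + 2  # polynomial coefficients plus the noise variance
    return n * math.log(rss / n) + n_params * math.log(n)


def make_quadratic_data(n=200, seed=1):
    """Mock data from a quadratic relation with small Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [1.0 + 0.5 * xi + 2.0 * xi ** 2 + rng.gauss(0.0, 0.1) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = make_quadratic_data()
    scores = {order: bic(x, y, order) for order in range(6)}
    print(min(scores, key=scores.get), scores)
```

The `log(n)` penalty per parameter is what discourages orders above the one that generated the data. On the mock data the constant and linear fits score far worse than the quadratic one.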
Xu, Yunfei; Dass, Sarat; Maiti, Tapabrata
2016-01-01
This brief introduces a class of problems and models for the prediction of the scalar field of interest from noisy observations collected by mobile sensor networks. It also introduces the problem of optimal coordination of robotic sensors to maximize the prediction quality subject to communication and mobility constraints either in a centralized or distributed manner. To solve such problems, fully Bayesian approaches are adopted, allowing various sources of uncertainties to be integrated into an inferential framework effectively capturing all aspects of variability involved. The fully Bayesian approach also allows the most appropriate values for additional model parameters to be selected automatically by data, and the optimal inference and prediction for the underlying scalar field to be achieved. In particular, spatio-temporal Gaussian process regression is formulated for robotic sensors to fuse multifactorial effects of observations, measurement noise, and prior distributions for obtaining the predictive di...
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
Seldin, Yevgeny; Shawe-Taylor, John; Peters, Jan; Auer, Peter
2011-01-01
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integration of the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many ...
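For reference, the Hoeffding-Azuma inequality that the abstract above builds on can be stated in its standard textbook form. The notation here (martingale differences Z_i, bounds c_i, sum M_n) is assumed for illustration and is not taken from the paper.

```latex
% Assumed notation (not from the abstract): Z_1, ..., Z_n is a martingale
% difference sequence with |Z_i| <= c_i almost surely, and M_n = \sum_i Z_i.
\[
  \Pr\left( M_n \ge \varepsilon \right)
  \;\le\; \exp\!\left( -\frac{\varepsilon^2}{2 \sum_{i=1}^{n} c_i^2} \right),
  \qquad
  \Pr\left( |M_n| \ge \varepsilon \right)
  \;\le\; 2 \exp\!\left( -\frac{\varepsilon^2}{2 \sum_{i=1}^{n} c_i^2} \right).
\]
```

The bound depends only on the ranges c_i and not on the variances of the increments, which gives a sense of why an alternative concentration tool might yield tighter results in some settings.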
An Evaluation of Select Test Variables Potentially Affecting Acute Oil Toxicity.
Echols, Brandi S; Smith, A; Gardinali, P; Rand, G
2016-02-01
In the wake of the Deepwater Horizon incident (2010) in the Gulf of Mexico, an abundance of research studies have been performed, but the methodologies used have varied, making comparisons and replication difficult. In this study, acute toxicity tests with mysids and inland silversides were performed to examine the effect of different variables on test results. The toxicity test variables evaluated in this study included (1) open versus closed static test chambers, (2) natural versus artificial diluent, (3) aerated versus nonaerated test solution, and (4) low versus medium mixing energy for the water-accommodated fraction (WAF). Tests using natural or artificial diluent showed no difference in either toxicity test or analytical chemistry results. Based on median lethal concentrations (LC50) of WAFs of unweathered oil (MASS), LC50 values from mysid tests performed in closed chambers were approximately 41% lower than LC50 values from open-chamber studies, possibly a result of the presence of low-molecular weight volatile aromatics (i.e., naphthalenes). This research also showed that using a medium-energy WAF (with a 20–25% vortex) increases the number of chemical components compared with low-energy WAF, thus affecting the composition of the exposure media and increasing toxicity. The comparison of toxic units as a measure of the potential toxicity of fresh and weathered oils showed that weathered oils (e.g., Juniper, CTC) are less toxic than the unweathered MASS oil. In the event of future oil spills, these variables should be considered to ensure that data regarding the potential toxicity and environmental risk are of good quality and reproducible. PMID:26467150
A nonparametric Bayesian method for estimating a response function
Brown, Scott; Meeden, Glen
2012-01-01
Consider the problem of estimating a response function which depends upon a non-stochastic independent variable under our control. The data are independent Bernoulli random variables where the probabilities of success are given by the response function at the chosen values of the independent variable. Here we present a nonparametric Bayesian method for estimating the response function. The only prior information assumed is that the response function can be well approximated by a mixture of st...
International Nuclear Information System (INIS)
Laser-induced breakdown spectroscopy has been used to obtain spectral fingerprints from live bacterial specimens from thirteen distinct taxonomic bacterial classes representative of five bacterial genera. By taking sums, ratios, and complex ratios of measured atomic emission line intensities three unique sets of independent variables (models) were constructed to determine which choice of independent variables provided optimal genus-level classification of unknown specimens utilizing a discriminant function analysis. A model composed of 80 independent variables constructed from simple and complex ratios of the measured emission line intensities was found to provide the greatest sensitivity and specificity. This model was then used in a partial least squares discriminant analysis to compare the performance of this multivariate technique with a discriminant function analysis. The partial least squares discriminant analysis possessed a higher true positive rate, possessed a higher false positive rate, and was more effective at distinguishing between highly similar spectra from closely related bacterial genera. This suggests it may be the preferred multivariate technique in future species-level or strain-level classifications. - Highlights: • Laser-induced breakdown spectroscopy was used to classify bacteria by genus. • We examine three different independent variable down selection models. • A PLS-DA returned higher rates of true positives than a DFA. • A PLS-DA returned higher rates of false positives than a DFA. • A PLS-DA was better able to discriminate similar spectra compared to DFA
Directory of Open Access Journals (Sweden)
Czyż, Piotr
2014-10-01
Aim. The aim of this study was to carry out a comparative analysis of selected psychopathological and personality variables in patients with allergic and non-allergic asthma, as well as to attempt to determine the significance and strength of these variables in the clinical picture of both forms of the disease. Methods. In all patients structured anamnesis, basic spirometry, and dyspnea measurement were carried out. The level of anxiety was determined using Spielberger's questionnaire. The intensity of depression was evaluated with Beck's Inventory. Neuroticism and extroversion-introversion were assessed by Eysenck's Inventory. The I-E scale was used to determine the perception of the locus of control. Results. No significant differences in psychopathological and personality variables were found between the two types of asthma. Gender differentiated patients with respect to psychopathology. The intensity of extroversion correlated with the duration of the disease. In the case of neuroticism, the clinical form of the disease was associated with a blurring of the differences between genders. The intensity of dyspnea and the spirometric results correlated with the psychological background of the disease. Conclusions. No significant differences in psychopathology and personality dimensions were found between the groups of patients with allergic and non-allergic asthma, although psychological variables are associated with the course of asthma in adults.
Selected topics in the classical theory of functions of a complex variable
Heins, Maurice
2014-01-01
Elegant and concise, this text is geared toward advanced undergraduate students acquainted with the theory of functions of a complex variable. The treatment presents such students with a number of important topics from the theory of analytic functions that may be addressed without erecting an elaborate superstructure. These include some of the theory's most celebrated results, which seldom find their way into a first course. After a series of preliminaries, the text discusses properties of meromorphic functions, the Picard theorem, and harmonic and subharmonic functions. Subsequent topics incl
Induction and selection of superior genetic variables of oil seed rape (Brassica napus L.)
International Nuclear Information System (INIS)
Dry and uniform seeds of two rape seed varieties, Ganyou-5 and Tower, were subjected to different doses of gamma rays. Genetic variation in yield and yield components generated in M1 was studied in M2, and 30 useful variants were isolated from a large mutagenized population. The selected mutants were progeny tested for stability of the characters in M3. Only five of the 30 progenies were found to be uniform and stable. Further selection was made in the segregating M3 progenies. Results on some of the promising mutants are reported. The effect of irradiation treatment was highly pronounced on pod length, seeds per pod and 1000-seed weight. The genetic changes thus induced would help to evolve high yielding versions of different rape seed varieties under local environmental conditions. (author)
Bayesian multivariate mixed-scale density estimation
Canale, Antonio
2011-01-01
Although univariate continuous density estimation has received abundant attention in the Bayesian nonparametrics literature, there is essentially no theory on multivariate mixed-scale density estimation. In this article, we consider a general framework to jointly model continuous, count and categorical variables under a nonparametric prior, which is induced through rounding latent variables having an unknown density with respect to Lebesgue measure. For the proposed class of priors, we provide sufficient conditions for large support, strong consistency and rates of posterior contraction. These conditions, which primarily relate to the prior on the latent variable density and heaviness of the tails for the observed continuous variables, allow one to convert sufficient conditions obtained in the setting of multivariate continuous density estimation to the mixed-scale case. We provide new results in the multivariate continuous density estimation case, showing the Kullback-Leibler property and strong consistency...