WorldWideScience

Sample records for sample-selection estimation model

  1. Robust inference in sample selection models

    KAUST Repository

    Zhelonkin, Mikhail; Genton, Marc G.; Ronchetti, Elvezio

    2015-01-01

    The problem of non-random sample selectivity often occurs in practice in many fields. The classical estimators introduced by Heckman are the backbone of the standard statistical analysis of these models. However, these estimators are very sensitive to small deviations from the distributional assumptions which are often not satisfied in practice. We develop a general framework to study the robustness properties of estimators and tests in sample selection models. We derive the influence function and the change-of-variance function of Heckman's two-stage estimator, and we demonstrate the non-robustness of this estimator and its estimated variance to small deviations from the model assumed. We propose a procedure for robustifying the estimator, prove its asymptotic normality and give its asymptotic variance. Both cases with and without an exclusion restriction are covered. This allows us to construct a simple robust alternative to the sample selection bias test. We illustrate the use of our new methodology in an analysis of ambulatory expenditures and we compare the performance of the classical and robust methods in a Monte Carlo simulation study.
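
    The classical two-stage procedure criticized above is easy to reproduce, which makes the robustness issue concrete. Below is a minimal sketch of Heckman's two-step estimator on simulated data (probit selection equation, then OLS with the inverse Mills ratio); the data-generating values and variable names are illustrative, and this is the non-robust baseline rather than the robustified estimator proposed in the paper.

```python
# Minimal sketch of Heckman's classical two-step estimator on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 2))            # selection covariates; z[:, 1] is the exclusion restriction
x = z[:, 0]                            # outcome covariate
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T

selected = (0.5 + z @ np.array([1.0, 1.0]) + u) > 0   # selection equation
y = 1.0 + 2.0 * x + e                                  # outcome equation
y_obs = np.where(selected, y, np.nan)                  # outcome observed only if selected

# Step 1: probit for selection, then the inverse Mills ratio at the fitted linear predictor.
probit = sm.Probit(selected.astype(int), sm.add_constant(z)).fit(disp=0)
xb = sm.add_constant(z) @ probit.params
imr = norm.pdf(xb) / norm.cdf(xb)

# Step 2: OLS on the selected subsample with the inverse Mills ratio as an extra regressor.
X2 = sm.add_constant(np.column_stack([x[selected], imr[selected]]))
ols = sm.OLS(y_obs[selected], X2).fit()
print(ols.params)   # intercept, slope, and the selection-bias (IMR) coefficient
```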

  2. Robust inference in sample selection models

    KAUST Repository

    Zhelonkin, Mikhail

    2015-11-20

    The problem of non-random sample selectivity often occurs in practice in many fields. The classical estimators introduced by Heckman are the backbone of the standard statistical analysis of these models. However, these estimators are very sensitive to small deviations from the distributional assumptions which are often not satisfied in practice. We develop a general framework to study the robustness properties of estimators and tests in sample selection models. We derive the influence function and the change-of-variance function of Heckman's two-stage estimator, and we demonstrate the non-robustness of this estimator and its estimated variance to small deviations from the model assumed. We propose a procedure for robustifying the estimator, prove its asymptotic normality and give its asymptotic variance. Both cases with and without an exclusion restriction are covered. This allows us to construct a simple robust alternative to the sample selection bias test. We illustrate the use of our new methodology in an analysis of ambulatory expenditures and we compare the performance of the classical and robust methods in a Monte Carlo simulation study.

  3. Optimal Selection of the Sampling Interval for Estimation of Modal Parameters by an ARMA- Model

    DEFF Research Database (Denmark)

    Kirkegaard, Poul Henning

    1993-01-01

    Optimal selection of the sampling interval for estimation of the modal parameters by an ARMA-model for a white-noise-loaded structure modelled as a single-degree-of-freedom linear mechanical system is considered. An analytical solution for an optimal uniform sampling interval, which is optimal...

  4. High-dimensional model estimation and model selection

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    I will review concepts and algorithms from high-dimensional statistics for linear model estimation and model selection. I will particularly focus on the so-called p>>n setting where the number of variables p is much larger than the number of samples n. I will focus mostly on regularized statistical estimators that produce sparse models. Important examples include the LASSO and its matrix extension, the Graphical LASSO, and more recent non-convex methods such as the TREX. I will show the applicability of these estimators in a diverse range of scientific applications, such as sparse interaction graph recovery and high-dimensional classification and regression problems in genomics.
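
    As a concrete companion to the topics above, the sketch below runs a cross-validated LASSO in a p >> n problem and a Graphical LASSO for sparse interaction-graph recovery, using scikit-learn; the simulated data, dimensions, and variable names are placeholders, and the TREX and other non-convex methods mentioned above are not covered here.

```python
# Sparse estimation in the p >> n regime: LASSO regression and Graphical LASSO.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(1)
n, p = 50, 500                          # many more variables than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                          # only five truly active variables
y = X @ beta + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)          # L1 penalty chosen by cross-validation
print("selected variables:", np.flatnonzero(lasso.coef_))

# Sparse inverse-covariance (interaction graph) estimate on a low-dimensional block.
glasso = GraphicalLassoCV().fit(X[:, :10])
print("estimated precision matrix shape:", glasso.precision_.shape)
```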

  5. Sample size estimation and sampling techniques for selecting a representative sample

    Directory of Open Access Journals (Sweden)

    Aamir Omair

    2014-01-01

    Introduction: The purpose of this article is to provide a general understanding of the concepts of sampling as applied to health-related research. Sample Size Estimation: It is important to select a representative sample in quantitative research in order to be able to generalize the results to the target population. The sample should be of the required sample size and must be selected using an appropriate probability sampling technique. There are many hidden biases which can adversely affect the outcome of the study. Important factors to consider for estimating the sample size include the size of the study population, the confidence level, the expected proportion of the outcome variable (for categorical variables) or the standard deviation of the outcome variable (for numerical variables), and the required precision (margin of accuracy) of the study. The greater the required precision, the larger the required sample size. Sampling Techniques: The probability sampling techniques applied in health-related research include simple random sampling, systematic random sampling, stratified random sampling, cluster sampling, and multistage sampling. These are recommended over nonprobability sampling techniques because the results of the study can then be generalized to the target population.
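
    For the categorical-outcome case described above, the required sample size follows the standard formula n = z^2 p (1 - p) / d^2, where p is the expected proportion and d the margin of error; a small sketch follows (the function name and example numbers are illustrative).

```python
# Sample size needed to estimate a single proportion with a given precision.
import math
from scipy.stats import norm

def sample_size_proportion(p, margin, confidence=0.95):
    """Return n such that a proportion near p is estimated to within +/- margin."""
    z = norm.ppf(1 - (1 - confidence) / 2)      # about 1.96 for 95% confidence
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Example: expected prevalence 30%, +/-5% precision, 95% confidence -> 323 subjects.
print(sample_size_proportion(0.30, 0.05))
```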

  6. Accounting for animal movement in estimation of resource selection functions: sampling and data analysis.

    Science.gov (United States)

    Forester, James D; Im, Hae Kyung; Rathouz, Paul J

    2009-12-01

    Patterns of resource selection by animal populations emerge as a result of the behavior of many individuals. Statistical models that describe these population-level patterns of habitat use can miss important interactions between individual animals and characteristics of their local environment; however, identifying these interactions is difficult. One approach to this problem is to incorporate models of individual movement into resource selection models. To do this, we propose a model for step selection functions (SSF) that is composed of a resource-independent movement kernel and a resource selection function (RSF). We show that standard case-control logistic regression may be used to fit the SSF; however, the sampling scheme used to generate control points (i.e., the definition of availability) must be accommodated. We used three sampling schemes to analyze simulated movement data and found that ignoring sampling and the resource-independent movement kernel yielded biased estimates of selection. The level of bias depended on the method used to generate control locations, the strength of selection, and the spatial scale of the resource map. Using empirical or parametric methods to sample control locations produced biased estimates under stronger selection; however, we show that the addition of a distance function to the analysis substantially reduced that bias. Assuming a uniform availability within a fixed buffer yielded strongly biased selection estimates that could be corrected by including the distance function but remained inefficient relative to the empirical and parametric sampling methods. As a case study, we used location data collected from elk in Yellowstone National Park, USA, to show that selection and bias may be temporally variable. Because under constant selection the amount of bias depends on the scale at which a resource is distributed in the landscape, we suggest that distance always be included as a covariate in SSF analyses. This approach to
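
    The core fitting idea, case-control logistic regression with a distance (step-length) covariate added to reduce bias, can be sketched as follows. This is a deliberately simplified, unconditional version on simulated used/available points; real SSF analyses condition on each observed step (conditional logistic regression), and all parameter values and names here are illustrative.

```python
# Simplified case-control logistic regression for a step selection function,
# with a habitat covariate and a step-length (distance) covariate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n_steps, n_controls = 400, 10

used_hab = rng.normal(loc=1.0, size=n_steps)                  # used steps fall in better habitat
used_dist = rng.exponential(scale=100.0, size=n_steps)        # movement kernel of the animal
ctrl_hab = rng.normal(loc=0.0, size=n_steps * n_controls)     # available (control) locations
ctrl_dist = rng.exponential(scale=150.0, size=n_steps * n_controls)

y = np.r_[np.ones(n_steps), np.zeros(n_steps * n_controls)]   # 1 = used, 0 = available
X = sm.add_constant(np.column_stack([np.r_[used_hab, ctrl_hab],
                                     np.r_[used_dist, ctrl_dist]]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)   # habitat selection coefficient and the distance-function term
```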

  7. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    International Nuclear Information System (INIS)

    Elsheikh, Ahmed H.; Wheeler, Mary F.; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.
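
    To make the nested-sampling core of the algorithm concrete, here is a toy evidence calculation on a two-dimensional problem. The constrained-sampling step uses a plain random walk instead of the HMC-with-ensemble-gradients step described above, and the Gaussian likelihood, uniform prior, and tuning constants are all illustrative.

```python
# Toy nested sampling: estimate the Bayesian evidence of a 2-D Gaussian likelihood
# under a uniform prior on [-1, 1]^2.
import numpy as np

rng = np.random.default_rng(2)

def log_like(theta):                               # isotropic Gaussian, sigma = 0.1
    return -0.5 * np.sum((theta / 0.1) ** 2) - np.log(2 * np.pi * 0.1 ** 2)

n_live, n_iter = 100, 1200
live = rng.uniform(-1, 1, size=(n_live, 2))        # live points drawn from the prior
live_ll = np.array([log_like(t) for t in live])

log_z = -np.inf
log_shell = np.log(1.0 - np.exp(-1.0 / n_live))    # log width of the first prior-volume shell

for i in range(n_iter):
    worst = np.argmin(live_ll)
    # Accumulate evidence: Z += L_worst * (X_i - X_{i+1}), with X_i ~ exp(-i / n_live).
    log_z = np.logaddexp(log_z, live_ll[worst] + log_shell - i / n_live)

    # Constrained replacement: short random walk from another live point, accepting only
    # moves above the current likelihood threshold (the step the paper performs with HMC).
    threshold = live_ll[worst]
    idx = (worst + 1 + rng.integers(n_live - 1)) % n_live
    cur, cur_ll = live[idx].copy(), live_ll[idx]
    for _ in range(20):
        prop = np.clip(cur + 0.05 * rng.normal(size=2), -1, 1)
        prop_ll = log_like(prop)
        if prop_ll > threshold:
            cur, cur_ll = prop, prop_ll
    live[worst], live_ll[worst] = cur, cur_ll

# For this toy problem the true log-evidence is about log(1/4), i.e. roughly -1.39.
print("estimated log-evidence:", round(float(log_z), 3))
```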

  8. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    Energy Technology Data Exchange (ETDEWEB)

    Elsheikh, Ahmed H., E-mail: aelsheikh@ices.utexas.edu [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom); Wheeler, Mary F. [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Hoteit, Ibrahim [Department of Earth Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal (Saudi Arabia)

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.

  9. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems. © 2013 Elsevier Inc.

  10. On a Robust MaxEnt Process Regression Model with Sample-Selection

    Directory of Open Access Journals (Sweden)

    Hea-Jung Kim

    2018-04-01

    In a regression analysis, a sample-selection bias arises when a dependent variable is partially observed as a result of the sample selection. This study introduces a Maximum Entropy (MaxEnt) process regression model that assumes a MaxEnt prior distribution for its nonparametric regression function and finds that the MaxEnt process regression model includes the well-known Gaussian process regression (GPR) model as a special case. Then, this special MaxEnt process regression model, i.e., the GPR model, is generalized to obtain a robust sample-selection Gaussian process regression (RSGPR) model that deals with non-normal data in the sample selection. Various properties of the RSGPR model are established, including the stochastic representation, distributional hierarchy, and magnitude of the sample-selection bias. These properties are used in the paper to develop a hierarchical Bayesian methodology to estimate the model. This involves a simple and computationally feasible Markov chain Monte Carlo algorithm that avoids analytical or numerical derivatives of the log-likelihood function of the model. The performance of the RSGPR model in terms of sample-selection bias correction, robustness to non-normality, and prediction is demonstrated through simulation results that attest to its good finite-sample performance.

  11. An Improved Nested Sampling Algorithm for Model Selection and Assessment

    Science.gov (United States)

    Zeng, X.; Ye, M.; Wu, J.; WANG, D.

    2017-12-01

    Multimodel strategy is a general approach for treating model structure uncertainty in recent research. The unknown groundwater system is represented by several plausible conceptual models. Each alternative conceptual model is assigned a weight that represents the plausibility of that model. In the Bayesian framework, the posterior model weight is computed as the product of the model prior weight and the marginal likelihood (also termed the model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. The nested sampling estimator (NSE) is a newly proposed algorithm for marginal likelihood estimation. The implementation of NSE comprises searching the parameter space gradually from the low-likelihood area to the high-likelihood area, and this evolution is carried out iteratively via a local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood functions. To improve the performance of NSE, it is feasible to integrate a more efficient and elaborate sampling algorithm, DREAMzs, into the local sampling step. In addition, to overcome the computational burden of the large number of repeated model executions required for marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build surrogates for the original groundwater model.

  12. Semiparametric efficient and robust estimation of an unknown symmetric population under arbitrary sample selection bias

    KAUST Repository

    Ma, Yanyuan

    2013-09-01

    We propose semiparametric methods to estimate the center and shape of a symmetric population when a representative sample of the population is unavailable due to selection bias. We allow an arbitrary sample selection mechanism determined by the data collection procedure, and we do not impose any parametric form on the population distribution. Under this general framework, we construct a family of consistent estimators of the center that is robust to population model misspecification, and we identify the efficient member that reaches the minimum possible estimation variance. The asymptotic properties and finite sample performance of the estimation and inference procedures are illustrated through theoretical analysis and simulations. A data example is also provided to illustrate the usefulness of the methods in practice. © 2013 American Statistical Association.

  13. Nested sampling algorithm for subsurface flow model selection, uncertainty quantification, and nonlinear calibration

    KAUST Repository

    Elsheikh, A. H.

    2013-12-01

    Calibration of subsurface flow models is an essential step for managing ground water aquifers, designing contaminant remediation plans, and maximizing recovery from hydrocarbon reservoirs. We investigate an efficient sampling algorithm known as nested sampling (NS), which can simultaneously sample the posterior distribution for uncertainty quantification, and estimate the Bayesian evidence for model selection. Model selection statistics, such as the Bayesian evidence, are needed to choose or assign different weights to different models of different levels of complexity. In this work, we report the first successful application of nested sampling for calibration of several nonlinear subsurface flow problems. The estimated Bayesian evidence by the NS algorithm is used to weight different parameterizations of the subsurface flow models (prior model selection). The results of the numerical evaluation implicitly enforced Occam's razor, where simpler models with fewer parameters are favored over complex models. The proper level of model complexity was automatically determined based on the information content of the calibration data and the data mismatch of the calibrated model.

  14. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.; Wheeler, Mary Fanett; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using

  15. Estimation of a multivariate mean under model selection uncertainty

    Directory of Open Access Journals (Sweden)

    Georges Nguefack-Tsague

    2014-05-01

    Model selection uncertainty would occur if we selected a model based on one data set and subsequently applied it for statistical inferences, because the "correct" model would not be selected with certainty. When the selection and inference are based on the same dataset, some additional problems arise due to the correlation of the two stages (selection and inference). In this paper model selection uncertainty is considered and model averaging is proposed. The proposal is related to the theory of James and Stein of estimating more than three parameters from independent normal observations. We suggest that a model averaging scheme taking into account the selection procedure could be more appropriate than model selection alone. Some properties of this model averaging estimator are investigated; in particular, we show using Stein's results that it is a minimax estimator and can outperform Stein-type estimators.
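
    The James-Stein phenomenon invoked above is easy to demonstrate numerically: for three or more normal means, shrinking the usual estimator dominates it in total squared-error risk. A minimal sketch, with arbitrary dimension and replicate counts:

```python
# James-Stein shrinkage versus the maximum likelihood estimator of a normal mean vector.
import numpy as np

rng = np.random.default_rng(3)
p, reps = 10, 20000
theta = rng.normal(size=p)                        # fixed true mean vector

x = theta + rng.normal(size=(reps, p))            # one noisy observation per replicate
mle = x
js = (1 - (p - 2) / np.sum(x ** 2, axis=1, keepdims=True)) * x   # shrink toward zero

risk_mle = np.mean(np.sum((mle - theta) ** 2, axis=1))   # close to p
risk_js = np.mean(np.sum((js - theta) ** 2, axis=1))     # smaller for p >= 3
print(f"MLE risk {risk_mle:.2f}  vs  James-Stein risk {risk_js:.2f}")
```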

  16. Estimating the residential demand function for natural gas in Seoul with correction for sample selection bias

    International Nuclear Information System (INIS)

    Yoo, Seung-Hoon; Lim, Hea-Jin; Kwak, Seung-Jun

    2009-01-01

    Over the last twenty years, the consumption of natural gas in Korea has increased dramatically. This increase has mainly resulted from the rise of consumption in the residential sector. The main objective of the study is to estimate households' demand function for natural gas by applying a sample selection model using data from a survey of households in Seoul. The results show that there exists a selection bias in the sample and that failure to correct for sample selection bias distorts the mean estimate of the demand for natural gas downward by 48.1%. In addition, according to the estimation results, the size of the house, the dummy variable for dwelling in an apartment, the dummy variable for having a bed in an inner room, and the household's income all have positive relationships with the demand for natural gas. On the other hand, the size of the family and the price of gas negatively contribute to the demand for natural gas. (author)

  17. Sampling point selection for energy estimation in the quasicontinuum method

    NARCIS (Netherlands)

    Beex, L.A.A.; Peerlings, R.H.J.; Geers, M.G.D.

    2010-01-01

    The quasicontinuum (QC) method reduces computational costs of atomistic calculations by using interpolation between a small number of so-called repatoms to represent the displacements of the complete lattice and by selecting a small number of sampling atoms to estimate the total potential energy of

  18. Climate Change and Agricultural Productivity in Sub-Saharan Africa: A Spatial Sample Selection Model

    NARCIS (Netherlands)

    Ward, P.S.; Florax, R.J.G.M.; Flores-Lagunes, A.

    2014-01-01

    Using spatially explicit data, we estimate a cereal yield response function using a recently developed estimator for spatial error models when endogenous sample selection is of concern. Our results suggest that yields across Sub-Saharan Africa will decline with projected climatic changes, and that

  19. Failure Probability Estimation Using Asymptotic Sampling and Its Dependence upon the Selected Sampling Scheme

    Directory of Open Access Journals (Sweden)

    Martinásková Magdalena

    2017-12-01

    The article examines the use of Asymptotic Sampling (AS) for the estimation of failure probability. The AS algorithm requires samples of multidimensional Gaussian random vectors, which may be obtained by many alternative means that influence the performance of the AS method. Several reliability problems (test functions) have been selected in order to test AS with various sampling schemes: (i) Monte Carlo designs; (ii) LHS designs optimized using the Periodic Audze-Eglājs (PAE) criterion; (iii) designs prepared using Sobol' sequences. All results are compared with the exact failure probability value.
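
    The three uniform designs being compared can be generated and mapped to Gaussian vectors as follows; the sketch uses scipy.stats.qmc, the dimensions are arbitrary, and the Periodic Audze-Eglājs optimization of the LHS design is not implemented here (a plain Latin Hypercube stands in for it).

```python
# Three ways to generate the multidimensional Gaussian samples required by Asymptotic Sampling.
import numpy as np
from scipy.stats import norm, qmc

dim, n = 4, 256
rng = np.random.default_rng(4)

u_mc = rng.uniform(size=(n, dim))                             # (i)  crude Monte Carlo
u_lhs = qmc.LatinHypercube(d=dim, seed=4).random(n)           # (ii) Latin Hypercube design
u_sobol = qmc.Sobol(d=dim, scramble=True, seed=4).random(n)   # (iii) Sobol' sequence

# Map each uniform design to standard Gaussian vectors via the inverse normal CDF.
gauss = {name: norm.ppf(u) for name, u in
         [("MC", u_mc), ("LHS", u_lhs), ("Sobol", u_sobol)]}
print({name: g.shape for name, g in gauss.items()})
```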

  20. The genealogy of samples in models with selection.

    Science.gov (United States)

    Neuhauser, C; Krone, S M

    1997-02-01

    We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models, DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case.

  1. Variable selection and estimation for longitudinal survey data

    KAUST Repository

    Wang, Li

    2014-09-01

    There is wide interest in studying longitudinal surveys where sample subjects are observed successively over time. Longitudinal surveys have been used in many areas today, for example, in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure when the correct submodel was known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples are illustrated to show the usefulness of the proposed methodology under various model settings and sampling designs. © 2014 Elsevier Inc.

  2. Working covariance model selection for generalized estimating equations.

    Science.gov (United States)

    Carey, Vincent J; Wang, You-Gan

    2011-11-20

    We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice. Copyright © 2011 John Wiley & Sons, Ltd.

  3. Efficiently adapting graphical models for selectivity estimation

    DEFF Research Database (Denmark)

    Tzoumas, Kostas; Deshpande, Amol; Jensen, Christian S.

    2013-01-01

    …cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate…

  4. Evaluation of sampling strategies to estimate crown biomass

    Directory of Open Access Journals (Sweden)

    Krishna P Poudel

    2015-01-01

    Background: Depending on tree and site characteristics, crown biomass accounts for a significant portion of the total aboveground biomass in the tree. Crown biomass estimation is useful for different purposes including evaluating the economic feasibility of crown utilization for energy production or forest products, fuel load assessments and fire management strategies, and wildfire modeling. However, crown biomass is difficult to predict because of the variability within and among species and sites. Thus the allometric equations used for predicting crown biomass should be based on data collected with precise and unbiased sampling strategies. In this study, we evaluate the performance of different sampling strategies for estimating crown biomass and the effect of sample size on those estimates. Methods: Using data collected from 20 destructively sampled trees, we evaluated 11 different sampling strategies using six evaluation statistics: bias, relative bias, root mean square error (RMSE), relative RMSE, amount of biomass sampled, and relative biomass sampled. We also evaluated the performance of the selected sampling strategies when different numbers of branches (3, 6, 9, and 12) are selected from each tree. A tree-specific log-linear model with branch diameter and branch length as covariates was used to obtain individual branch biomass. Results: Compared to all other methods, stratified sampling with the probability-proportional-to-size estimation technique produced better results when three or six branches per tree were sampled. However, systematic sampling with the ratio estimation technique was the best when at least nine branches per tree were sampled. Under the stratified sampling strategy, selecting an unequal number of branches per stratum produced approximately similar results to simple random sampling, but further decreased RMSE when information on branch diameter was used in the design and estimation phases. Conclusions: Use of

  5. Sample selection and taste correlation in discrete choice transport modelling

    DEFF Research Database (Denmark)

    Mabit, Stefan Lindhard

    2008-01-01

    …explain counterintuitive results in value of travel time estimation. However, the results also point at the difficulty of finding suitable instruments for the selection mechanism. Taste heterogeneity is another important aspect of discrete choice modelling. Mixed logit models are designed to capture … the question for a broader class of models. It is shown that the original result may be somewhat generalised. Another question investigated is whether mode choice operates as a self-selection mechanism in the estimation of the value of travel time. The results show that self-selection can at least partly … of taste correlation in willingness-to-pay estimation are presented. The first contribution addresses how to incorporate taste correlation in the estimation of the value of travel time for public transport. Given a limited dataset, the approach taken is to use theory on the value of travel time as guidance…

  6. National HIV prevalence estimates for sub-Saharan Africa: controlling selection bias with Heckman-type selection models

    Science.gov (United States)

    Hogan, Daniel R; Salomon, Joshua A; Canning, David; Hammitt, James K; Zaslavsky, Alan M; Bärnighausen, Till

    2012-01-01

    Objectives Population-based HIV testing surveys have become central to deriving estimates of national HIV prevalence in sub-Saharan Africa. However, limited participation in these surveys can lead to selection bias. We control for selection bias in national HIV prevalence estimates using a novel approach, which unlike conventional imputation can account for selection on unobserved factors. Methods For 12 Demographic and Health Surveys conducted from 2001 to 2009 (N=138 300), we predict HIV status among those missing a valid HIV test with Heckman-type selection models, which allow for correlation between infection status and participation in survey HIV testing. We compare these estimates with conventional ones and introduce a simulation procedure that incorporates regression model parameter uncertainty into confidence intervals. Results Selection model point estimates of national HIV prevalence were greater than unadjusted estimates for 10 of 12 surveys for men and 11 of 12 surveys for women, and were also greater than the majority of estimates obtained from conventional imputation, with significantly higher HIV prevalence estimates for men in Cote d'Ivoire 2005, Mali 2006 and Zambia 2007. Accounting for selective non-participation yielded 95% confidence intervals around HIV prevalence estimates that are wider than those obtained with conventional imputation by an average factor of 4.5. Conclusions Our analysis indicates that national HIV prevalence estimates for many countries in sub-Saharan Africa are more uncertain than previously thought, and may be underestimated in several cases, underscoring the need for increasing participation in HIV surveys. Heckman-type selection models should be included in the set of tools used for routine estimation of HIV prevalence. PMID:23172342

  7. A Heckman Selection- t Model

    KAUST Repository

    Marchenko, Yulia V.

    2012-03-01

    Sample selection arises often in practice as a result of the partial observability of the outcome of interest in a study. In the presence of sample selection, the observed data do not represent a random sample from the population, even after controlling for explanatory variables. That is, data are missing not at random. Thus, standard analysis using only complete cases will lead to biased results. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. The method was criticized in the literature because of its sensitivity to the normality assumption. In practice, data, such as income or expenditure data, often violate the normality assumption because of heavier tails. We first establish a new link between sample selection models and recently studied families of extended skew-elliptical distributions. Then, this allows us to introduce a selection-t (SLt) model, which models the error distribution using a Student's t distribution. We study its properties and investigate the finite-sample performance of the maximum likelihood estimators for this model. We compare the performance of the SLt model to the conventional Heckman selection-normal (SLN) model and apply it to analyze ambulatory expenditures. Unlike the SLN model, our analysis using the SLt model provides statistical evidence for the existence of sample selection bias in these data. We also investigate the performance of the test for sample selection bias based on the SLt model and compare it with the performances of several tests used with the SLN model. Our findings indicate that the latter tests can be misleading in the presence of heavy-tailed data. © 2012 American Statistical Association.

  8. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.

    Science.gov (United States)

    Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen

    2011-04-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.

  9. Consistency in Estimation and Model Selection of Dynamic Panel Data Models with Fixed Effects

    Directory of Open Access Journals (Sweden)

    Guangjie Li

    2015-07-01

    We examine the relationship between consistent parameter estimation and model selection for autoregressive panel data models with fixed effects. We find that the transformation of fixed effects proposed by Lancaster (2002) does not necessarily lead to consistent estimation of common parameters when some true exogenous regressors are excluded. We propose a data dependent way to specify the prior of the autoregressive coefficient and argue for comparing different model specifications before parameter estimation. Model selection properties of Bayes factors and the Bayesian information criterion (BIC) are investigated. When model uncertainty is substantial, we recommend the use of Bayesian Model Averaging to obtain point estimators with lower root mean squared errors (RMSE). We also study the implications of different levels of inclusion probabilities by simulations.
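
    The model-averaging step can be illustrated with BIC weights on a simple autoregression; this is a stand-in for the paper's Bayesian treatment of fixed effects, and the data, candidate set, and weighting by exp(-BIC/2) are illustrative simplifications.

```python
# BIC-weighted model averaging of the AR coefficient across candidate regressor sets.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=(n, 2))
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):                           # AR(1) with one relevant exogenous regressor
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t, 0] + rng.normal()

ylag, Y, X = y[:-1], y[1:], x[1:]
candidates = [()] + [c for r in (1, 2) for c in itertools.combinations(range(2), r)]

ar_estimates, bics = [], []
for cols in candidates:
    design = sm.add_constant(np.column_stack([ylag] + [X[:, j] for j in cols]))
    res = sm.OLS(Y, design).fit()
    ar_estimates.append(res.params[1])          # coefficient on the lagged outcome
    bics.append(res.bic)

w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()
print("model weights:", np.round(w, 3))
print("BMA estimate of the AR coefficient:", round(float(np.dot(w, ar_estimates)), 3))
```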

  10. Comparing interval estimates for small sample ordinal CFA models.

    Science.gov (United States)

    Natesan, Prathiba

    2015-01-01

    Robust maximum likelihood (RML) and asymptotically generalized least squares (AGLS) methods have been recommended for fitting ordinal structural equation models. Studies show that some of these methods underestimate standard errors. However, these studies have not investigated the coverage and bias of interval estimates. An estimate with a reasonable standard error could still be severely biased. This can only be known by systematically investigating the interval estimates. The present study compares Bayesian, RML, and AGLS interval estimates of factor correlations in ordinal confirmatory factor analysis models (CFA) for small sample data. Six sample sizes, 3 factor correlations, and 2 factor score distributions (multivariate normal and multivariate mildly skewed) were studied. Two Bayesian prior specifications, informative and relatively less informative, were studied. Undercoverage of confidence intervals and underestimation of standard errors was common in non-Bayesian methods. Underestimated standard errors may lead to inflated Type-I error rates. Non-Bayesian intervals were more positively biased than negatively biased; that is, most intervals that did not contain the true value were greater than the true value. Some non-Bayesian methods had non-converging and inadmissible solutions for small samples and non-normal data. Bayesian empirical standard error estimates for informative and relatively less informative priors were closer to the average standard errors of the estimates. The coverage of Bayesian credibility intervals was closer to what was expected, with overcoverage in a few cases. Although some Bayesian credibility intervals were wider, they reflected the nature of statistical uncertainty that comes with the data (e.g., small sample). Bayesian point estimates were also more accurate than non-Bayesian estimates. The results illustrate the importance of analyzing coverage and bias of interval estimates, and how ignoring interval estimates can be misleading.

  11. A model for estimating the minimum number of offspring to sample in studies of reproductive success.

    Science.gov (United States)

    Anderson, Joseph H; Ward, Eric J; Carlson, Stephanie M

    2011-01-01

    Molecular parentage permits studies of selection and evolution in fecund species with cryptic mating systems, such as fish, amphibians, and insects. However, there exists no method for estimating the number of offspring that must be assigned parentage to achieve robust estimates of reproductive success when only a fraction of offspring can be sampled. We constructed a 2-stage model that first estimated the mean (μ) and variance (v) in reproductive success from published studies on salmonid fishes and then sampled offspring from reproductive success distributions simulated from the μ and v estimates. Results provided strong support for modeling salmonid reproductive success via the negative binomial distribution and suggested that few offspring samples are needed to reject the null hypothesis of uniform offspring production. However, the sampled reproductive success distributions deviated significantly (χ2 goodness-of-fit test p value < 0.05) from the true reproductive success distribution at rates often >0.05 and as high as 0.24, even when hundreds of offspring were assigned parentage. In general, reproductive success patterns were less accurate when offspring were sampled from cohorts with larger numbers of parents and greater variance in reproductive success. Our model can be reparameterized with data from other species and will aid researchers in planning reproductive success studies by providing explicit sampling targets required to accurately assess reproductive success.
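
    The second stage of the approach can be mimicked with a short simulation: draw per-parent reproductive success from a negative binomial, thin it to represent assigning parentage to only a fraction of offspring, and test the null of uniform offspring production. The mean, variance, sampling fraction, and cohort size below are illustrative, not the values estimated in the paper.

```python
# Simulate offspring sampling from a negative binomial reproductive success distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, var = 4.0, 20.0                        # assumed mean and variance of reproductive success
p = mu / var                               # numpy parameterisation: variance = mu / p
r = mu * p / (1 - p)

n_parents, sample_fraction, reps = 50, 0.5, 2000
reject = 0
for _ in range(reps):
    true_rs = rng.negative_binomial(r, p, size=n_parents)    # true offspring per parent
    sampled_rs = rng.binomial(true_rs, sample_fraction)      # offspring assigned parentage
    if sampled_rs.sum() < n_parents:                         # too few offspring for the test
        continue
    chi2, pval = stats.chisquare(sampled_rs)                 # null: uniform production
    reject += pval < 0.05

print(f"proportion of simulations rejecting uniform production: {reject / reps:.2f}")
```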

  12. PARAMETER ESTIMATION AND MODEL SELECTION FOR INDOOR ENVIRONMENTS BASED ON SPARSE OBSERVATIONS

    Directory of Open Access Journals (Sweden)

    Y. Dehbi

    2017-09-01

    This paper presents a novel method for the parameter estimation and model selection for the reconstruction of indoor environments based on sparse observations. While most approaches for the reconstruction of indoor models rely on dense observations, we predict scenes of the interior with high accuracy in the absence of indoor measurements. We use a model-based top-down approach and incorporate strong but profound prior knowledge. The latter includes probability density functions for model parameters and sparse observations such as room areas and the building footprint. The floorplan model is characterized by linear and bi-linear relations with discrete and continuous parameters. We focus on the stochastic estimation of model parameters based on a topological model derived by combinatorial reasoning in a first step. A Gauss-Markov model is applied for estimation and simulation of the model parameters. Symmetries are represented and exploited during the estimation process. Background knowledge as well as observations are incorporated in a maximum likelihood estimation and model selection is performed with AIC/BIC. The likelihood is also used for the detection and correction of potential errors in the topological model. Estimation results are presented and discussed.

  13. Parameter Estimation and Model Selection for Indoor Environments Based on Sparse Observations

    Science.gov (United States)

    Dehbi, Y.; Loch-Dehbi, S.; Plümer, L.

    2017-09-01

    This paper presents a novel method for the parameter estimation and model selection for the reconstruction of indoor environments based on sparse observations. While most approaches for the reconstruction of indoor models rely on dense observations, we predict scenes of the interior with high accuracy in the absence of indoor measurements. We use a model-based top-down approach and incorporate strong but profound prior knowledge. The latter includes probability density functions for model parameters and sparse observations such as room areas and the building footprint. The floorplan model is characterized by linear and bi-linear relations with discrete and continuous parameters. We focus on the stochastic estimation of model parameters based on a topological model derived by combinatorial reasoning in a first step. A Gauss-Markov model is applied for estimation and simulation of the model parameters. Symmetries are represented and exploited during the estimation process. Background knowledge as well as observations are incorporated in a maximum likelihood estimation and model selection is performed with AIC/BIC. The likelihood is also used for the detection and correction of potential errors in the topological model. Estimation results are presented and discussed.
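
    The AIC/BIC step generalizes beyond floorplans; the sketch below scores competing models by penalized log-likelihood for a simple curve-fitting problem (the polynomial candidates and Gaussian noise model are illustrative, not the paper's indoor models).

```python
# Model selection with AIC and BIC from Gaussian profile log-likelihoods.
import numpy as np

def aic_bic(loglik, k, n):
    """AIC = 2k - 2 logL;  BIC = k log(n) - 2 logL."""
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

rng = np.random.default_rng(7)
n = 120
x = np.linspace(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)        # the true relation is linear

for degree in (1, 2, 3, 4):
    coef = np.polyfit(x, y, degree)
    resid = y - np.polyval(coef, x)
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # profile log-likelihood
    k = degree + 2                                        # polynomial coefficients + noise variance
    aic, bic = aic_bic(loglik, k, n)
    print(f"degree {degree}: AIC = {aic:.1f}  BIC = {bic:.1f}")
```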

  14. Effects of sample size on estimates of population growth rates calculated with matrix models.

    Directory of Open Access Journals (Sweden)

    Ian J Fiske

    BACKGROUND: Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda-Jensen's Inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated if sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. METHODOLOGY/PRINCIPAL FINDINGS: Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. CONCLUSIONS/SIGNIFICANCE: We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.

  15. Effects of sample size on estimates of population growth rates calculated with matrix models.

    Science.gov (United States)

    Fiske, Ian J; Bruna, Emilio M; Bolker, Benjamin M

    2008-08-28

    Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda-Jensen's Inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated if sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography. We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
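
    The sampling-variance mechanism is straightforward to reproduce: estimate vital rates from finite samples, assemble the projection matrix, and compare its dominant eigenvalue with the value computed from the true rates. The three-stage life cycle and all rates below are invented for illustration and are not the study's data.

```python
# Bias in lambda from sampling variance of vital rates in a 3-stage matrix model.
import numpy as np

rng = np.random.default_rng(8)

true_surv = np.array([0.5, 0.7, 0.9])    # stage-specific survival (low survival in stage 1)
true_grow = np.array([0.5, 0.3, 0.0])    # probability of advancing a stage, given survival
true_fec = np.array([0.0, 1.0, 3.0])     # per-capita fecundity into stage 1

def build_matrix(surv, grow, fec):
    A = np.zeros((3, 3))
    A[0, :] += fec                                      # reproduction
    for j in range(3):
        A[min(j + 1, 2), j] += surv[j] * grow[j]        # growth to the next stage
        A[j, j] += surv[j] * (1 - grow[j])              # stasis within the stage
    return A

def dominant_lambda(A):
    return np.max(np.real(np.linalg.eigvals(A)))

true_lambda = dominant_lambda(build_matrix(true_surv, true_grow, true_fec))

for n in (25, 50, 200, 1000):                           # individuals sampled per stage
    lams = []
    for _ in range(2000):
        surv_hat = rng.binomial(n, true_surv) / n       # estimated survival
        grow_hat = rng.binomial(n, true_grow) / n       # estimated growth
        fec_hat = rng.poisson(true_fec * n) / n         # estimated fecundity
        lams.append(dominant_lambda(build_matrix(surv_hat, grow_hat, fec_hat)))
    print(f"n = {n:4d}  mean lambda = {np.mean(lams):.3f}  bias = {np.mean(lams) - true_lambda:+.4f}")
```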

  16. Observed Characteristics and Teacher Quality: Impacts of Sample Selection on a Value Added Model

    Science.gov (United States)

    Winters, Marcus A.; Dixon, Bruce L.; Greene, Jay P.

    2012-01-01

    We measure the impact of observed teacher characteristics on student math and reading proficiency using a rich dataset from Florida. We expand upon prior work by accounting directly for nonrandom attrition of teachers from the classroom in a sample selection framework. We find evidence that sample selection is present in the estimation of the…

  17. Optimal time points sampling in pathway modelling.

    Science.gov (United States)

    Hu, Shiyan

    2004-01-01

    Modelling cellular dynamics based on experimental data is at the heart of systems biology. Considerable progress has been made in dynamic pathway modelling as well as the related parameter estimation. However, few approaches give consideration to the issue of optimal sampling time selection for parameter estimation. Time course experiments in molecular biology rarely produce large and accurate data sets, and the experiments involved are usually time consuming and expensive. Therefore, approximating parameters for models from only a few available sampling points is of significant practical value. For signal transduction, the sampling intervals are usually not evenly distributed and are based on heuristics. In this paper, we investigate an approach to guide the process of selecting time points in an optimal way so as to minimize the variance of parameter estimates. In the method, we first formulate the problem as a nonlinear constrained optimization problem via maximum likelihood estimation. We then modify and apply a quantum-inspired evolutionary algorithm, which combines the advantages of both quantum computing and evolutionary computing, to solve the optimization problem. The new algorithm does not suffer from the morass of selecting good initial values and getting stuck in local optima, as usually accompanies conventional numerical optimization techniques. The simulation results indicate the soundness of the new method.

  18. Using maximum entropy modeling for optimal selection of sampling sites for monitoring networks

    Science.gov (United States)

    Stohlgren, Thomas J.; Kumar, Sunil; Barnett, David T.; Evangelista, Paul H.

    2011-01-01

    Environmental monitoring programs must efficiently describe state shifts. We propose using maximum entropy modeling to select dissimilar sampling sites to capture environmental variability at low cost, and demonstrate a specific application: sample site selection for the Central Plains domain (453,490 km2) of the National Ecological Observatory Network (NEON). We relied on four environmental factors: mean annual temperature and precipitation, elevation, and vegetation type. A “sample site” was defined as a 20 km × 20 km area (equal to NEON’s airborne observation platform [AOP] footprint), within which each 1 km2 cell was evaluated for each environmental factor. After each model run, the most environmentally dissimilar site was selected from all potential sample sites. The iterative selection of eight sites captured approximately 80% of the environmental envelope of the domain, an improvement over stratified random sampling and simple random designs for sample site selection. This approach can be widely used for cost-efficient selection of survey and monitoring sites.

  19. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage

    DEFF Research Database (Denmark)

    Yang, Ziheng; Nielsen, Rasmus

    2008-01-01

    Current models of codon substitution are formulated at the levels of nucleotide substitution and do not explicitly consider the separate effects of mutation and selection. They are thus incapable of inferring whether mutation or selection is responsible for evolution at silent sites. Here we implement a few population genetics models of codon substitution that explicitly consider mutation bias and natural selection at the DNA level. Selection on codon usage is modeled by introducing codon-fitness parameters, which, together with mutation-bias parameters, predict optimal codon frequencies … codon usage in mammals. Estimates of selection coefficients nevertheless suggest that selection on codon usage is weak and most mutations are nearly neutral. The sensitivity of the analysis to the assumed mutation model is discussed.

  20. Estimating species – area relationships by modeling abundance and frequency subject to incomplete sampling

    Science.gov (United States)

    Yamaura, Yuichi; Connor, Edward F.; Royle, Andy; Itoh, Katsuo; Sato, Kiyoshi; Taki, Hisatomo; Mishima, Yoshio

    2016-01-01

    Models and data used to describe species–area relationships confound sampling with ecological process as they fail to acknowledge that estimates of species richness arise due to sampling. This compromises our ability to make ecological inferences from and about species–area relationships. We develop and illustrate hierarchical community models of abundance and frequency to estimate species richness. The models we propose separate sampling from ecological processes by explicitly accounting for the fact that sampled patches are seldom completely covered by sampling plots and that individuals present in the sampling plots are imperfectly detected. We propose a multispecies abundance model in which community assembly is treated as the summation of an ensemble of species-level Poisson processes and estimate patch-level species richness as a derived parameter. We use sampling process models appropriate for specific survey methods. We propose a multispecies frequency model that treats the number of plots in which a species occurs as a binomial process. We illustrate these models using data collected in surveys of early-successional bird species and plants in young forest plantation patches. Results indicate that only mature forest plant species deviated from the constant density hypothesis, but the null model suggested that the deviations were too small to alter the form of species–area relationships. Nevertheless, results from simulations clearly show that the aggregate pattern of individual species density–area relationships and occurrence probability–area relationships can alter the form of species–area relationships. The plant community model estimated that only half of the species present in the regional species pool were encountered during the survey. The modeling framework we propose explicitly accounts for sampling processes so that ecological processes can be examined free of sampling artefacts. Our modeling approach is extensible and could be applied

  1. Infusion and sampling site effects on two-pool model estimates of leucine metabolism

    International Nuclear Information System (INIS)

    Helland, S.J.; Grisdale-Helland, B.; Nissen, S.

    1988-01-01

    To assess the effect of the site of isotope infusion on estimates of leucine metabolism, infusions of alpha-[4,5-3H]ketoisocaproate (KIC) and [U-14C]leucine were made into the left or right ventricles of sheep and pigs. Blood was sampled from the opposite ventricle. In both species, left ventricular infusions resulted in significantly lower specific radioactivities (SA) of [14C]leucine and [3H]KIC. [14C]KIC SA was found to be insensitive to infusion and sampling sites. [14C]KIC was, in addition, found to be equal to the SA of [14C]leucine only during the left heart infusions. Therefore, [14C]KIC SA was used as the only estimate of [14C] SA in the equations for the two-pool model. This model eliminated the influence of the site of infusion and blood sampling on the estimates of leucine entry and reduced the impact on the estimates of proteolysis and oxidation. This two-pool model could not compensate for the underestimation of transamination reactions occurring during the traditional venous isotope infusion and arterial blood sampling

  2. Identification and estimation of nonlinear models using two samples with nonclassical measurement errors

    KAUST Repository

    Carroll, Raymond J.

    2010-05-01

    This paper considers identification and estimation of a general nonlinear Errors-in-Variables (EIV) model using two samples. Both samples consist of a dependent variable, some error-free covariates, and an error-prone covariate, for which the measurement error has unknown distribution and could be arbitrarily correlated with the latent true values; and neither sample contains an accurate measurement of the corresponding true variable. We assume that the regression model of interest - the conditional distribution of the dependent variable given the latent true covariate and the error-free covariates - is the same in both samples, but the distributions of the latent true covariates vary with observed error-free discrete covariates. We first show that the general latent nonlinear model is nonparametrically identified using the two samples when both could have nonclassical errors, without either instrumental variables or independence between the two samples. When the two samples are independent and the nonlinear regression model is parameterized, we propose sieve Quasi Maximum Likelihood Estimation (Q-MLE) for the parameter of interest, and establish its root-n consistency and asymptotic normality under possible misspecification, and its semiparametric efficiency under correct specification, with easily estimated standard errors. A Monte Carlo simulation and a data application are presented to show the power of the approach.

  3. COPS model estimates of LLEA availability near selected reactor sites

    International Nuclear Information System (INIS)

    Berkbigler, K.P.

    1979-11-01

    The COPS computer model has been used to estimate local law enforcement agency (LLEA) officer availability in the neighborhood of selected nuclear reactor sites. The results of these analyses are presented in both graphic and tabular form in this report.

  4. Conditional estimation of exponential random graph models from snowball sampling designs

    NARCIS (Netherlands)

    Pattison, Philippa E.; Robins, Garry L.; Snijders, Tom A. B.; Wang, Peng

    2013-01-01

    A complete survey of a network in a large population may be prohibitively difficult and costly. So it is important to estimate models for networks using data from various network sampling designs, such as link-tracing designs. We focus here on snowball sampling designs, designs in which the members

  5. Model Selection and Risk Estimation with Applications to Nonlinear Ordinary Differential Equation Systems

    DEFF Research Database (Denmark)

    Mikkelsen, Frederik Vissing

    Broadly speaking, this thesis is devoted to model selection applied to ordinary differential equations and risk estimation under model selection. A model selection framework was developed for modelling time course data by ordinary differential equations. The framework is accompanied by the R software package, episode. This package incorporates a collection of sparsity inducing penalties into two types of loss functions: a squared loss function relying on numerically solving the equations and an approximate loss function based on inverse collocation methods. The goal of this framework is to provide effective computational tools for estimating unknown structures in dynamical systems, such as gene regulatory networks, which may be used to predict downstream effects of interventions in the system. A recommended algorithm based on the computational tools is presented and thoroughly tested in various...

  6. Estimating nonlinear selection gradients using quadratic regression coefficients: double or nothing?

    Science.gov (United States)

    Stinchcombe, John R; Agrawal, Aneil F; Hohenlohe, Paul A; Arnold, Stevan J; Blows, Mark W

    2008-09-01

    The use of regression analysis has been instrumental in allowing evolutionary biologists to estimate the strength and mode of natural selection. Although directional and correlational selection gradients are equal to their corresponding regression coefficients, quadratic regression coefficients must be doubled to estimate stabilizing/disruptive selection gradients. Based on a sample of 33 papers published in Evolution between 2002 and 2007, at least 78% of papers have not doubled quadratic regression coefficients, leading to an appreciable underestimate of the strength of stabilizing and disruptive selection. Proper treatment of quadratic regression coefficients is necessary for estimation of fitness surfaces and contour plots, canonical analysis of the gamma matrix, and modeling the evolution of populations on an adaptive landscape.
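
    As a concrete illustration of the doubling rule described above, the sketch below fits a quadratic regression of relative fitness on a standardized trait using simulated (hypothetical) data and reports the quadratic selection gradient as twice the fitted quadratic coefficient.

```python
# Illustrative sketch (hypothetical data): the stabilizing/disruptive
# selection gradient is twice the fitted quadratic regression coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
z = rng.normal(size=200)                                            # standardized trait
w = 1.0 + 0.1 * z - 0.15 * z**2 + rng.normal(scale=0.2, size=200)   # fitness
w = w / w.mean()                                                    # relative fitness

X = sm.add_constant(np.column_stack([z, z**2]))
fit = sm.OLS(w, X).fit()

beta = fit.params[1]         # directional selection gradient
gamma = 2 * fit.params[2]    # quadratic gradient = 2 x quadratic coefficient
print(f"directional gradient beta = {beta:.3f}")
print(f"quadratic gradient gamma  = {gamma:.3f} (not {fit.params[2]:.3f})")
```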

  7. N-mix for fish: estimating riverine salmonid habitat selection via N-mixture models

    Science.gov (United States)

    Som, Nicholas A.; Perry, Russell W.; Jones, Edward C.; De Juilio, Kyle; Petros, Paul; Pinnix, William D.; Rupert, Derek L.

    2018-01-01

    Models that formulate mathematical linkages between fish use and habitat characteristics are applied for many purposes. For riverine fish, these linkages are often cast as resource selection functions with variables including depth and velocity of water and distance to nearest cover. Ecologists are now recognizing the role that detection plays in observing organisms, and failure to account for imperfect detection can lead to spurious inference. Herein, we present a flexible N-mixture model to associate habitat characteristics with the abundance of riverine salmonids that simultaneously estimates detection probability. Our formulation has the added benefits of accounting for demographic variation and generating probabilistic statements regarding intensity of habitat use. In addition to the conceptual benefits, model application to data from the Trinity River, California, yields interesting results. Detection was estimated to vary among surveyors, but there was little spatial or temporal variation. Additionally, a weaker effect of water depth on resource selection is estimated than that reported by previous studies not accounting for detection probability. N-mixture models show great promise for applications to riverine resource selection.
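
    The sketch below illustrates the basic N-mixture idea with simulated (hypothetical) data: latent Poisson abundances, repeated binomial counts, and joint maximum likelihood estimation of mean abundance and detection probability by summing the likelihood over feasible latent abundances. It is a minimal illustration, not the authors' covariate-rich formulation.

```python
# A minimal N-mixture sketch (hypothetical data): repeated counts y[i, j] at
# site i on visit j, latent abundance N_i ~ Poisson(lambda),
# observed counts y_ij ~ Binomial(N_i, p).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)
n_sites, n_visits, lam_true, p_true = 60, 4, 3.0, 0.5
N = rng.poisson(lam_true, size=n_sites)
y = rng.binomial(N[:, None], p_true, size=(n_sites, n_visits))

K = 60  # upper bound for the latent abundance sum

def neg_log_lik(theta):
    lam, p = np.exp(theta[0]), 1 / (1 + np.exp(-theta[1]))
    ll = 0.0
    for i in range(n_sites):
        Ns = np.arange(y[i].max(), K + 1)
        # P(N) * prod_j P(y_ij | N, p), summed over feasible N
        lik_N = stats.poisson.pmf(Ns, lam)
        for j in range(n_visits):
            lik_N = lik_N * stats.binom.pmf(y[i, j], Ns, p)
        ll += np.log(lik_N.sum())
    return -ll

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
lam_hat, p_hat = np.exp(res.x[0]), 1 / (1 + np.exp(-res.x[1]))
print(f"lambda_hat = {lam_hat:.2f} (true {lam_true}), p_hat = {p_hat:.2f} (true {p_true})")
```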

  8. Systematic sampling of discrete and continuous populations: sample selection and the choice of estimator

    Science.gov (United States)

    Harry T. Valentine; David L. R. Affleck; Timothy G. Gregoire

    2009-01-01

    Systematic sampling is easy, efficient, and widely used, though it is not generally recognized that a systematic sample may be drawn from the population of interest with or without restrictions on randomization. The restrictions or the lack of them determine which estimators are unbiased, when using the sampling design as the basis for inference. We describe the...

  9. Estimation and model selection of semiparametric multivariate survival functions under general censorship.

    Science.gov (United States)

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2010-07-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.

  10. Estimation and variable selection for generalized additive partial linear models

    KAUST Repository

    Wang, Li

    2011-08-01

    We study generalized additive partial linear models, proposing the use of polynomial spline smoothing for estimation of nonparametric functions, and deriving quasi-likelihood based estimators for the linear parameters. We establish asymptotic normality for the estimators of the parametric components. The procedure avoids solving large systems of equations as in kernel-based procedures and thus results in gains in computational simplicity. We further develop a class of variable selection procedures for the linear parameters by employing a nonconcave penalized quasi-likelihood, which is shown to have an asymptotic oracle property. Monte Carlo simulations and an empirical example are presented for illustration. © Institute of Mathematical Statistics, 2011.

  11. Selecting the best stable isotope mixing model to estimate grizzly bear diets in the Greater Yellowstone Ecosystem.

    Science.gov (United States)

    Hopkins, John B; Ferguson, Jake M; Tyers, Daniel B; Kurle, Carolyn M

    2017-01-01

    Past research indicates that whitebark pine seeds are a critical food source for Threatened grizzly bears (Ursus arctos) in the Greater Yellowstone Ecosystem (GYE). In recent decades, whitebark pine forests have declined markedly due to pine beetle infestation, invasive blister rust, and landscape-level fires. To date, no study has reliably estimated the contribution of whitebark pine seeds to the diets of grizzlies through time. We used stable isotope ratios (expressed as δ13C, δ15N, and δ34S values) measured in grizzly bear hair and their major food sources to estimate the diets of grizzlies sampled in Cooke City Basin, Montana. We found that stable isotope mixing models that included different combinations of stable isotope values for bears and their foods generated similar proportional dietary contributions. Estimates generated by our top model suggest that whitebark pine seeds (35±10%) and other plant foods (56±10%) were more important than meat (9±8%) to grizzly bears sampled in the study area. Stable isotope values measured in bear hair collected elsewhere in the GYE and North America support our conclusions about plant-based foraging. We recommend that researchers consider model selection when estimating the diets of animals using stable isotope mixing models. We also urge researchers to use the new statistical framework described here to estimate the dietary responses of grizzlies to declines in whitebark pine seeds and other important food sources through time in the GYE (e.g., cutthroat trout), as such information could be useful in predicting how the population will adapt to future environmental change.
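
    As a simplified illustration of the mixing idea (not the Bayesian mixing models compared in the study), the sketch below recovers diet proportions from three isotope tracers by constrained least squares; all source and consumer values are hypothetical placeholders.

```python
# A simplified deterministic mixing sketch (hypothetical isotope values):
# estimate diet proportions that sum to one by constrained least squares.
import numpy as np
from scipy.optimize import minimize

# Rows: sources; columns: d13C, d15N, d34S (all values hypothetical)
sources = np.array([
    [-22.0, 3.0, 4.0],   # whitebark pine seeds
    [-26.0, 2.0, 3.0],   # other plant foods
    [-20.0, 8.0, 7.0],   # meat
])
mixture = np.array([-24.2, 3.1, 3.7])   # consumer (bear hair), hypothetical

def loss(f):
    return np.sum((sources.T @ f - mixture) ** 2)

cons = ({"type": "eq", "fun": lambda f: f.sum() - 1.0},)
bounds = [(0.0, 1.0)] * 3
res = minimize(loss, x0=np.full(3, 1 / 3), bounds=bounds, constraints=cons)
for name, prop in zip(["pine seeds", "other plants", "meat"], res.x):
    print(f"{name:12s} {prop:.2f}")
```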

  12. Soybean yield modeling using bootstrap methods for small samples

    Energy Technology Data Exchange (ETDEWEB)

    Dalposso, G.A.; Uribe-Opazo, M.A.; Johann, J.A.

    2016-11-01

    One of the problems that occurs when working with regression models concerns the sample size: since the statistical methods used in inferential analyses are asymptotic, a small sample may compromise the analysis because the estimates will be biased. An alternative is to use the bootstrap methodology, which in its non-parametric version does not need to guess or know the probability distribution that generated the original sample. In this work we used a small set of soybean yield data and physical and chemical soil properties to determine a multiple linear regression model. Bootstrap methods were used for variable selection, identification of influential points and determination of confidence intervals for the model parameters. The results showed that the bootstrap methods enabled us to select the physical and chemical soil properties that were significant in the construction of the soybean yield regression model, to construct confidence intervals for the parameters, and to identify the points that had a strong influence on the estimated parameters. (Author)
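
    A minimal case-resampling bootstrap for a small regression sample might look like the sketch below; the data are simulated placeholders rather than the soybean data, and the percentile intervals stand in for the confidence intervals described above.

```python
# A minimal non-parametric (case-resampling) bootstrap sketch for a small
# regression sample; the data are simulated placeholders.
import numpy as np

rng = np.random.default_rng(3)
n = 25                                   # deliberately small sample
x = rng.uniform(0, 10, size=(n, 2))      # two soil covariates (hypothetical)
y = 2.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(scale=1.0, size=n)

def ols_coefs(xm, yv):
    X = np.column_stack([np.ones(len(yv)), xm])
    return np.linalg.lstsq(X, yv, rcond=None)[0]

B = 2000
boot = np.empty((B, 3))
for b in range(B):
    idx = rng.integers(0, n, size=n)     # resample cases with replacement
    boot[b] = ols_coefs(x[idx], y[idx])

ci = np.percentile(boot, [2.5, 97.5], axis=0)   # percentile confidence intervals
for j, name in enumerate(["intercept", "b1", "b2"]):
    print(f"{name:9s} 95% CI: [{ci[0, j]:6.3f}, {ci[1, j]:6.3f}]")
```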

  13. Selecting the best stable isotope mixing model to estimate grizzly bear diets in the Greater Yellowstone Ecosystem.

    Directory of Open Access Journals (Sweden)

    John B Hopkins

    Full Text Available Past research indicates that whitebark pine seeds are a critical food source for Threatened grizzly bears (Ursus arctos) in the Greater Yellowstone Ecosystem (GYE). In recent decades, whitebark pine forests have declined markedly due to pine beetle infestation, invasive blister rust, and landscape-level fires. To date, no study has reliably estimated the contribution of whitebark pine seeds to the diets of grizzlies through time. We used stable isotope ratios (expressed as δ13C, δ15N, and δ34S values) measured in grizzly bear hair and their major food sources to estimate the diets of grizzlies sampled in Cooke City Basin, Montana. We found that stable isotope mixing models that included different combinations of stable isotope values for bears and their foods generated similar proportional dietary contributions. Estimates generated by our top model suggest that whitebark pine seeds (35±10%) and other plant foods (56±10%) were more important than meat (9±8%) to grizzly bears sampled in the study area. Stable isotope values measured in bear hair collected elsewhere in the GYE and North America support our conclusions about plant-based foraging. We recommend that researchers consider model selection when estimating the diets of animals using stable isotope mixing models. We also urge researchers to use the new statistical framework described here to estimate the dietary responses of grizzlies to declines in whitebark pine seeds and other important food sources through time in the GYE (e.g., cutthroat trout), as such information could be useful in predicting how the population will adapt to future environmental change.

  14. SU-E-I-46: Sample-Size Dependence of Model Observers for Estimating Low-Contrast Detection Performance From CT Images

    International Nuclear Information System (INIS)

    Reiser, I; Lu, Z

    2014-01-01

    Purpose: Recently, task-based assessment of diagnostic CT systems has attracted much attention. Detection task performance can be estimated using human observers, or mathematical observer models. While most models are well established, considerable bias can be introduced when performance is estimated from a limited number of image samples. Thus, the purpose of this work was to assess the effect of sample size on bias and uncertainty of two channelized Hotelling observers and a template-matching observer. Methods: The image data used for this study consisted of 100 signal-present and 100 signal-absent regions-of-interest, which were extracted from CT slices. The experimental conditions included two signal sizes and five different x-ray beam current settings (mAs). Human observer performance for these images was determined in 2-alternative forced choice experiments. These data were provided by the Mayo clinic in Rochester, MN. Detection performance was estimated from three observer models, including channelized Hotelling observers (CHO) with Gabor or Laguerre-Gauss (LG) channels, and a template-matching observer (TM). Different sample sizes were generated by randomly selecting a subset of image pairs, (N=20,40,60,80). Observer performance was quantified as proportion of correct responses (PC). Bias was quantified as the relative difference of PC for 20 and 80 image pairs. Results: For n=100, all observer models predicted human performance across mAs and signal sizes. Bias was 23% for CHO (Gabor), 7% for CHO (LG), and 3% for TM. The relative standard deviation, σ(PC)/PC at N=20 was highest for the TM observer (11%) and lowest for the CHO (Gabor) observer (5%). Conclusion: In order to make image quality assessment feasible in the clinical practice, a statistically efficient observer model, that can predict performance from few samples, is needed. Our results identified two observer models that may be suited for this task

  15. Principal Stratification in sample selection problems with non normal error terms

    DEFF Research Database (Denmark)

    Rocci, Roberto; Mellace, Giovanni

    The aim of the paper is to relax distributional assumptions on the error terms, often imposed in parametric sample selection models to estimate causal effects, when plausible exclusion restrictions are not available. Within the principal stratification framework, we approximate the true distribut...... an application to the Job Corps training program....

  16. Some remarks on estimating a covariance structure model from a sample correlation matrix

    OpenAIRE

    Maydeu Olivares, Alberto; Hernández Estrada, Adolfo

    2000-01-01

    A popular model in structural equation modeling involves a multivariate normal density with a structured covariance matrix that has been categorized according to a set of thresholds. In this setup one may estimate the covariance structure parameters from the sample tetrachoric/polychoric correlations but only if the covariance structure is scale invariant. Doing so when the covariance structure is not scale invariant results in estimating a more restricted covariance structure than the one i...

  17. Population genetics inference for longitudinally-sampled mutants under strong selection.

    Science.gov (United States)

    Lacerda, Miguel; Seoighe, Cathal

    2014-11-01

    Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright-Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright-Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright-Fisher model. Copyright © 2014 by the Genetics Society of America.
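
    For intuition, the sketch below simulates a discrete Wright-Fisher trajectory under strong selection with hypothetical parameter values; this is the regime in which the diffusion approximations discussed above break down and the alternative approximation is recommended.

```python
# A minimal discrete Wright-Fisher sketch with selection (hypothetical
# parameter values), illustrating the trajectories targeted by likelihood
# inference when selection is too strong for a diffusion approximation.
import numpy as np

rng = np.random.default_rng(4)
N_e = 10_000        # effective population size
s = 0.3             # strong selection coefficient favouring the mutant
mu = 1e-5           # one-way mutation rate towards the mutant allele
generations = 60
freq = 0.01         # initial mutant frequency

trajectory = [freq]
for _ in range(generations):
    # deterministic change from selection, then mutation
    p_sel = freq * (1 + s) / (1 + s * freq)
    p_next = p_sel + (1 - p_sel) * mu
    # binomial sampling of 2*N_e gametes (genetic drift)
    freq = rng.binomial(2 * N_e, p_next) / (2 * N_e)
    trajectory.append(freq)

print(np.round(trajectory[::10], 3))
```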

  18. A quick method based on SIMPLISMA-KPLS for simultaneously selecting outlier samples and informative samples for model standardization in near infrared spectroscopy

    Science.gov (United States)

    Li, Li-Na; Ma, Chang-Ming; Chang, Ming; Zhang, Ren-Cheng

    2017-12-01

    A novel method based on SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) and Kernel Partial Least Squares (KPLS), named SIMPLISMA-KPLS, is proposed in this paper for the simultaneous selection of outlier samples and informative samples. It is a fast algorithm for model standardization (also known as model transfer) in near infrared (NIR) spectroscopy. NIR data from corn samples analyzed for protein content are used to evaluate the proposed method. Piecewise direct standardization (PDS) is employed for model transfer, and SIMPLISMA-PDS-KPLS and KS-PDS-KPLS are compared in terms of the prediction accuracy for protein content and the calculation speed of each algorithm. The conclusions are that SIMPLISMA-KPLS can be used as an alternative sample selection method for model transfer. Although its accuracy is similar to that of Kennard-Stone (KS), it differs from KS in that it employs concentration information in the selection procedure. This ensures that analyte information is involved in the analysis and that the spectra (X) of the selected samples are related to the concentration (y). It can also be used to eliminate outlier samples simultaneously through validation of the calibration. The running-time statistics show that the sample selection process is faster when using KPLS. The speed of the SIMPLISMA-KPLS algorithm is beneficial for improving the speed of online measurement using NIR spectroscopy.
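
    For reference, the sketch below implements the Kennard-Stone rule that the study uses as its benchmark; it selects samples that are maximally spread in spectral space and, unlike SIMPLISMA-KPLS, uses no concentration information. The spectra matrix is a random placeholder.

```python
# A minimal Kennard-Stone sketch (the benchmark method, not SIMPLISMA-KPLS);
# X is a hypothetical spectra matrix, one row per sample.
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    selected = list(np.unravel_index(np.argmax(d), d.shape))    # two most distant
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_select:
        # pick the remaining sample farthest from its nearest selected sample
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        nxt = remaining[int(np.argmax(min_d))]
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

X = np.random.default_rng(5).normal(size=(80, 700))   # 80 spectra x 700 wavelengths
print(kennard_stone(X, 10))
```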

  19. Low-sampling-rate ultra-wideband channel estimation using equivalent-time sampling

    KAUST Repository

    Ballal, Tarig

    2014-09-01

    In this paper, a low-sampling-rate scheme for ultra-wideband channel estimation is proposed. The scheme exploits multiple observations generated by transmitting multiple pulses. In the proposed scheme, P pulses are transmitted to produce channel impulse response estimates at a desired sampling rate, while the ADC samples at a rate that is P times slower. To avoid loss of fidelity, the number of sampling periods (based on the desired rate) in the inter-pulse interval is restricted to be co-prime with P. This condition is affected when clock drift is present and the transmitted pulse locations change. To handle this case, and to achieve an overall good channel estimation performance, without using prior information, we derive an improved estimator based on the bounded data uncertainty (BDU) model. It is shown that this estimator is related to the Bayesian linear minimum mean squared error (LMMSE) estimator. Channel estimation performance of the proposed sub-sampling scheme combined with the new estimator is assessed in simulation. The results show that high reduction in sampling rate can be achieved. The proposed estimator outperforms the least squares estimator in almost all cases, while in the high SNR regime it also outperforms the LMMSE estimator. In addition to channel estimation, a synchronization method is also proposed that utilizes the same pulse sequence used for channel estimation. © 2014 IEEE.

  20. On the selection of ordinary differential equation models with application to predator-prey dynamical models.

    Science.gov (United States)

    Zhang, Xinyu; Cao, Jiguo; Carroll, Raymond J

    2015-03-01

    We consider model selection and estimation in a context where there are competing ordinary differential equation (ODE) models, and all the models are special cases of a "full" model. We propose a computationally inexpensive approach that employs statistical estimation of the full model, followed by a combination of a least squares approximation (LSA) and the adaptive Lasso. We show the resulting method, here called the LSA method, to be an (asymptotically) oracle model selection method. The finite sample performance of the proposed LSA method is investigated with Monte Carlo simulations, in which we examine the percentage of selecting true ODE models, the efficiency of the parameter estimation compared to simply using the full and true models, and coverage probabilities of the estimated confidence intervals for ODE parameters, all of which have satisfactory performances. Our method is also demonstrated by selecting the best predator-prey ODE to model a lynx and hare population dynamical system among some well-known and biologically interpretable ODE models. © 2014, The International Biometric Society.

  1. Variable selection for confounder control, flexible modeling and Collaborative Targeted Minimum Loss-based Estimation in causal inference

    Science.gov (United States)

    Schnitzer, Mireille E.; Lok, Judith J.; Gruber, Susan

    2015-01-01

    This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of Collaborative Targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low-and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while Targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios. PMID:26226129
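
    The consequence of adjusting for a pure cause of the exposure can be seen in a small simulation such as the sketch below (a schematic illustration, not the paper's study design): adding an instrument-like variable Z to the propensity model leaves the IPTW estimate roughly unbiased but inflates its variance.

```python
# A small simulation sketch of the IPTW issue described above: adding a pure
# cause of exposure (an instrument Z) to the propensity model inflates the
# variance of the weighted estimator. All parameter values are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)

def iptw_estimate(covs, a, y):
    ps = sm.Logit(a, sm.add_constant(covs)).fit(disp=0).predict()
    w = a / ps + (1 - a) / (1 - ps)                 # inverse probability weights
    return np.average(y[a == 1], weights=w[a == 1]) - \
           np.average(y[a == 0], weights=w[a == 0])

est_c, est_cz = [], []
for _ in range(500):
    n = 500
    c = rng.normal(size=n)                          # true confounder
    z = rng.normal(size=n)                          # pure cause of exposure only
    a = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * c + 1.5 * z))))
    y = 1.0 * a + 1.0 * c + rng.normal(size=n)      # true effect = 1
    est_c.append(iptw_estimate(c[:, None], a, y))
    est_cz.append(iptw_estimate(np.column_stack([c, z]), a, y))

print(f"PS with C only : mean {np.mean(est_c):.2f}, sd {np.std(est_c):.2f}")
print(f"PS with C and Z: mean {np.mean(est_cz):.2f}, sd {np.std(est_cz):.2f}")
```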

  2. Maximum likelihood estimation and EM algorithm of Copas-like selection model for publication bias correction.

    Science.gov (United States)

    Ning, Jing; Chen, Yong; Piao, Jin

    2017-07-01

    Publication bias occurs when the published research results are systematically unrepresentative of the population of studies that have been conducted, and is a potential threat to meaningful meta-analysis. The Copas selection model provides a flexible framework for correcting estimates and offers considerable insight into the publication bias. However, maximizing the observed likelihood under the Copas selection model is challenging because the observed data contain very little information on the latent variable. In this article, we study a Copas-like selection model and propose an expectation-maximization (EM) algorithm for estimation based on the full likelihood. Empirical simulation studies show that the EM algorithm and its associated inferential procedure performs well and avoids the non-convergence problem when maximizing the observed likelihood. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  3. Bayesian Model Selection under Time Constraints

    Science.gov (United States)

    Hoege, M.; Nowak, W.; Illman, W. A.

    2017-12-01

    Bayesian model selection (BMS) provides a consistent framework for rating and comparing models in multi-model inference. In cases where models of vastly different complexity compete with each other, we also face vastly different computational runtimes of such models. For instance, time series of a quantity of interest can be simulated by an autoregressive process model that takes even less than a second for one run, or by a partial differential equations-based model with runtimes up to several hours or even days. The classical BMS is based on a quantity called Bayesian model evidence (BME). It determines the model weights in the selection process and resembles a trade-off between bias of a model and its complexity. However, in practice, the runtime of models is another weight relevant factor for model selection. Hence, we believe that it should be included, leading to an overall trade-off problem between bias, variance and computing effort. We approach this triple trade-off from the viewpoint of our ability to generate realizations of the models under a given computational budget. One way to obtain BME values is through sampling-based integration techniques. We argue with the fact that more expensive models can be sampled much less under time constraints than faster models (in straight proportion to their runtime). The computed evidence in favor of a more expensive model is statistically less significant than the evidence computed in favor of a faster model, since sampling-based strategies are always subject to statistical sampling error. We present a straightforward way to include this misbalance into the model weights that are the basis for model selection. Our approach follows directly from the idea of insufficient significance. It is based on a computationally cheap bootstrapping error estimate of model evidence and is easy to implement. The approach is illustrated in a small synthetic modeling study.

  4. Maximum likelihood estimation for Cox's regression model under nested case-control sampling

    DEFF Research Database (Denmark)

    Scheike, Thomas Harder; Juul, Anders

    2004-01-01

    Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM-algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I was associated with ischemic heart disease. The study was based on a population of 3784 Danes and 231 cases of ischemic heart disease where controls were matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used...

  5. Reserves' potential of sedimentary basin: modeling and estimation; Potentiel de reserves d'un bassin petrolier: modelisation et estimation

    Energy Technology Data Exchange (ETDEWEB)

    Lepez, V

    2002-12-01

    The aim of this thesis is to build a statistical model of the distribution of oil and gas field sizes in a given sedimentary basin, for both the fields that exist in the subsoil and those which have already been discovered. Estimating all the parameters of the model, via estimation of the density of the observations by model selection of piecewise polynomials with penalized maximum likelihood techniques, makes it possible to provide estimates of the total number of fields which are yet to be discovered, by size class. We assume that the set of underground field sizes is an i.i.d. sample from an unknown population following a Levy-Pareto law with unknown parameter. The set of already discovered fields is a 'size-biased' sub-sample drawn without replacement from the former. The associated inclusion probabilities are to be estimated. We prove that the probability density of the observations is the product of the underlying density and an unknown weighting function representing the sampling bias. For an arbitrary partition of the size interval (called a model), the analytical solutions of likelihood maximization allow us to estimate both the parameter of the underlying Levy-Pareto law and the weighting function, which is assumed to be piecewise constant and based upon the partition. We add a monotonicity constraint on the latter, taking into account the fact that the bigger a field, the higher its probability of being discovered. Horvitz-Thompson-like estimators finally give the conclusion. We then allow our partitions to vary inside several classes of models and prove a model selection theorem which aims at selecting the best partition within a class, in terms of both the Kullback and Hellinger risks of the associated estimator. We conclude with simulations and various applications to real data from sedimentary basins on four continents, in order to illustrate theoretical as well as practical aspects of our model. (author)

  6. Model to Estimate Monthly Time Horizons for Application of DEA in Selection of Stock Portfolio and for Maintenance of the Selected Portfolio

    Directory of Open Access Journals (Sweden)

    José Claudio Isaias

    2015-01-01

    Full Text Available In the selection of stock portfolios, one type of analysis that has shown good results is Data Envelopment Analysis (DEA). It has, however, been shown to have gaps regarding its estimates of the monthly time horizon of data collection for the selection of stock portfolios and of the monthly time horizon for the maintenance of a selected portfolio. To better estimate these horizons, this study proposes a binary mathematical programming model that minimizes squared errors; this model is the paper's main contribution. The model's results are validated by simulating the estimated annual return indexes of a portfolio that uses both estimated horizons and of other portfolios that do not use these horizons. The simulation shows that portfolios using both estimated horizons have higher indexes, on average 6.99% per year. Hypothesis tests confirm the statistically significant superiority of the indexes produced by the proposed mathematical model. The model's indexes are also compared with those of portfolios that use just one of the estimated horizons; here the indexes of the dual-horizon portfolios outperform the single-horizon portfolios, though with a lower percentage of statistically significant superiority.

  7. Two-step variable selection in quantile regression models

    Directory of Open Access Journals (Sweden)

    FAN Yali

    2015-06-01

    Full Text Available We propose a two-step variable selection procedure for high dimensional quantile regressions, in which the dimension of the covariates, pn, is much larger than the sample size n. In the first step, we apply an ℓ1 penalty, and we demonstrate that this first-step penalized estimator with the LASSO penalty can reduce the model from ultra-high dimensional to one whose size has the same order as that of the true model, and that the selected model covers the true model. The second step excludes the remaining irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.

  8. Limited sampling strategy models for estimating the AUC of gliclazide in Chinese healthy volunteers.

    Science.gov (United States)

    Huang, Ji-Han; Wang, Kun; Huang, Xiao-Hui; He, Ying-Chun; Li, Lu-Jin; Sheng, Yu-Cheng; Yang, Juan; Zheng, Qing-Shan

    2013-06-01

    The aim of this work is to reduce the cost of the sampling required to estimate the area under the gliclazide plasma concentration versus time curve within 60 h (AUC0-60t). Limited sampling strategy (LSS) models were established and validated by multiple regression using 4 or fewer gliclazide concentration values. Absolute prediction error (APE), root mean square error (RMSE) and visual prediction checks were used as criteria. The results of Jack-Knife validation showed that 10 (25.0%) of the 40 LSS models based on the regression analysis were not within an APE of 15% using one concentration-time point, whereas 90.2, 91.5 and 92.4% of the 40 LSS models were capable of prediction using 2, 3 and 4 points, respectively. Limited sampling strategies were thus developed and validated for estimating the AUC0-60t of gliclazide. This study indicates that an 80 mg dosage regimen enabled accurate prediction of AUC0-60t by the LSS model, and that 12, 6, 4 and 2 h after administration are the key sampling times. The combination of (12, 2 h), (12, 8, 2 h) or (12, 8, 4, 2 h) can be chosen as sampling hours for predicting AUC0-60t in practical applications, according to requirements.
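
    A limited sampling strategy of this kind reduces, in essence, to a multiple regression of the full AUC on a few well-chosen concentration-time points. The sketch below illustrates the idea with simulated (hypothetical) concentrations and the APE/RMSE criteria mentioned above.

```python
# A minimal limited-sampling sketch with simulated (hypothetical) data:
# regress the full AUC0-60t on concentrations at a few key time points.
import numpy as np

rng = np.random.default_rng(7)
n = 40
# Hypothetical concentrations at 2, 8 and 12 h and the "true" AUC0-60t
c2 = rng.lognormal(1.5, 0.3, n)
c8 = rng.lognormal(1.2, 0.3, n)
c12 = rng.lognormal(0.9, 0.3, n)
auc = 5 * c2 + 12 * c8 + 20 * c12 + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), c2, c8, c12])
beta, *_ = np.linalg.lstsq(X, auc, rcond=None)
auc_pred = X @ beta

ape = 100 * np.abs(auc_pred - auc) / auc            # absolute prediction error (%)
rmse = np.sqrt(np.mean((auc_pred - auc) ** 2))
print(f"Samples within 15% APE: {(ape <= 15).mean():.0%}, RMSE = {rmse:.1f}")
```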

  9. Optimal difference-based estimation for partially linear models

    KAUST Repository

    Zhou, Yuejin; Cheng, Yebin; Dai, Wenlin; Tong, Tiejun

    2017-01-01

    Difference-based methods have attracted increasing attention for analyzing partially linear models in the recent literature. In this paper, we first propose to solve the optimal sequence selection problem in difference-based estimation for the linear component. To achieve the goal, a family of new sequences and a cross-validation method for selecting the adaptive sequence are proposed. We demonstrate that the existing sequences are only extreme cases in the proposed family. Secondly, we propose a new estimator for the residual variance by fitting a linear regression method to some difference-based estimators. Our proposed estimator achieves the asymptotic optimal rate of mean squared error. Simulation studies also demonstrate that our proposed estimator performs better than the existing estimator, especially when the sample size is small and the nonparametric function is rough.
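
    The flavour of difference-based estimation can be seen in the minimal sketch below, which uses the simplest first-order difference sequence to estimate the residual variance around a smooth trend; the paper's optimal sequences and partially linear setting generalize this idea.

```python
# A minimal sketch of a first-order difference-based variance estimator for
# the noise level around a smooth trend (illustrative only; the paper studies
# more general difference sequences and partially linear models).
import numpy as np

rng = np.random.default_rng(8)
n = 200
t = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)   # true sigma = 0.3

# Successive differences approximately remove the smooth trend, leaving
# mostly noise: Var(y_{i+1} - y_i) ~ 2 * sigma^2 for a smooth mean function.
sigma2_hat = np.sum(np.diff(y) ** 2) / (2 * (n - 1))
print(f"sigma_hat = {np.sqrt(sigma2_hat):.3f} (true 0.300)")
```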

  10. Optimal difference-based estimation for partially linear models

    KAUST Repository

    Zhou, Yuejin

    2017-12-16

    Difference-based methods have attracted increasing attention for analyzing partially linear models in the recent literature. In this paper, we first propose to solve the optimal sequence selection problem in difference-based estimation for the linear component. To achieve the goal, a family of new sequences and a cross-validation method for selecting the adaptive sequence are proposed. We demonstrate that the existing sequences are only extreme cases in the proposed family. Secondly, we propose a new estimator for the residual variance by fitting a linear regression method to some difference-based estimators. Our proposed estimator achieves the asymptotic optimal rate of mean squared error. Simulation studies also demonstrate that our proposed estimator performs better than the existing estimator, especially when the sample size is small and the nonparametric function is rough.

  11. Automated sample plan selection for OPC modeling

    Science.gov (United States)

    Casati, Nathalie; Gabrani, Maria; Viswanathan, Ramya; Bayraktar, Zikri; Jaiswal, Om; DeMaris, David; Abdo, Amr Y.; Oberschmidt, James; Krause, Andreas

    2014-03-01

    It is desirable to reduce the time required to produce metrology data for calibration of Optical Proximity Correction (OPC) models while maintaining or improving the quality of the data collected, with regard to how well the data represent the types of patterns that occur in real circuit designs. Previous work based on clustering in geometry and/or image parameter space has shown some benefit over strictly manual or intuitive selection, but leads to arbitrary pattern exclusion or selection which may not be the best representation of the product. Framing pattern selection as an optimization problem, which co-optimizes a number of objective functions reflecting modelers' insight and expertise, has been shown to produce models of quality equivalent to the traditional plan of record (POR) set, but in less time.

  12. Optimal complex exponentials BEM and channel estimation in doubly selective channel

    International Nuclear Information System (INIS)

    Song, Lijun; Lei, Xia; Yu, Feng; Jin, Maozhu

    2016-01-01

    Over doubly selective channels, an optimal complex exponential BEM (CE-BEM) is required to characterize the transmission in the transform domain, in order to reduce the huge number of parameters that must be estimated when the impulse response is estimated directly in the time domain. This paper proposes an improved CE-BEM to alleviate the high-frequency sampling error caused by the conventional CE-BEM. On the one hand, exploiting the improved CE-BEM, the sampling points lie in the Doppler spread spectrum and the maximum sampling frequency equals the maximum Doppler shift. On the other hand, we optimize the basis functions and the basis dimension of the CE-BEM, and obtain a closed-form solution of the EM-based channel estimation differential operator by exploiting the optimal BEM. Finally, the numerical results and theoretical analysis show that the basis dimension depends mainly on the maximum Doppler shift and the signal-to-noise ratio (SNR); for a fixed number of pilot symbols, a higher basis dimension gives a smaller modeling error but reduced accuracy of parameter estimation, which implies that a trade-off must be reached between the modeling error and the accuracy of parameter estimation, and that the choice of basis function influences how accurately the Doppler spread spectrum is described once the basis dimension has been identified.

  13. Sworn testimony of the model evidence: Gaussian Mixture Importance (GAME) sampling

    Science.gov (United States)

    Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.

    2017-07-01

    What is the "best" model? The answer to this question lies in part in the eyes of the beholder, nevertheless a good model must blend rigorous theory with redeeming qualities such as parsimony and quality of fit. Model selection is used to make inferences, via weighted averaging, from a set of K candidate models, Mk, k = 1, …, K, and help identify which model is most supported by the observed data, Ỹ = (ỹ1, …, ỹn). Here, we introduce a new and robust estimator of the model evidence, p(Ỹ|Mk), which acts as a normalizing constant in the denominator of Bayes' theorem and provides a single quantitative measure of relative support for each hypothesis that integrates model accuracy, uncertainty, and complexity. However, p(Ỹ|Mk) is analytically intractable for most practical modeling problems. Our method, coined GAussian Mixture importancE (GAME) sampling, uses bridge sampling of a mixture distribution fitted to samples of the posterior model parameter distribution derived from MCMC simulation. We benchmark the accuracy and reliability of GAME sampling by application to a diverse set of multivariate target distributions (up to 100 dimensions) with known values of p(Ỹ|Mk) and to hypothesis testing using numerical modeling of the rainfall-runoff transformation of the Leaf River watershed in Mississippi, USA. These case studies demonstrate that GAME sampling provides robust and unbiased estimates of the evidence at a relatively small computational cost, outperforming commonly used estimators. The GAME sampler is implemented in the MATLAB package of DREAM and simplifies considerably scientific inquiry through hypothesis testing and model selection.

  14. Implementing Generalized Additive Models to Estimate the Expected Value of Sample Information in a Microsimulation Model: Results of Three Case Studies.

    Science.gov (United States)

    Rabideau, Dustin J; Pei, Pamela P; Walensky, Rochelle P; Zheng, Amy; Parker, Robert A

    2018-02-01

    The expected value of sample information (EVSI) can help prioritize research but its application is hampered by computational infeasibility, especially for complex models. We investigated an approach by Strong and colleagues to estimate EVSI by applying generalized additive models (GAM) to results generated from a probabilistic sensitivity analysis (PSA). For 3 potential HIV prevention and treatment strategies, we estimated life expectancy and lifetime costs using the Cost-effectiveness of Preventing AIDS Complications (CEPAC) model, a complex patient-level microsimulation model of HIV progression. We fitted a GAM (a flexible regression model that estimates the functional form as part of the model fitting process) to the incremental net monetary benefits obtained from the CEPAC PSA. For each case study, we calculated the expected value of partial perfect information (EVPPI) using both the conventional nested Monte Carlo approach and the GAM approach. EVSI was calculated using the GAM approach. For all 3 case studies, the GAM approach consistently gave similar estimates of EVPPI compared with the conventional approach. The EVSI behaved as expected: it increased and converged to EVPPI for larger sample sizes. For each case study, generating the PSA results for the GAM approach required 3 to 4 days on a shared cluster, after which EVPPI and EVSI across a range of sample sizes were evaluated in minutes. The conventional approach required approximately 5 weeks for the EVPPI calculation alone. Estimating EVSI using the GAM approach with results from a PSA dramatically reduced the time required to conduct a computationally intense project, which would otherwise have been impractical. Using the GAM approach, we can efficiently provide policy makers with EVSI estimates, even for complex patient-level microsimulation models.
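
    The regression-based estimator can be sketched as follows: fit a flexible regression of net benefit on the parameter(s) of interest for each strategy, then compare the mean of the per-sample maxima of the fitted values with the maximum of the mean net benefits. The code below uses spline regression as a simple stand-in for a GAM and simulated placeholders for the PSA output.

```python
# A minimal sketch of the regression-based (Strong et al.) EVPPI estimator
# applied to PSA output, using spline regression as a stand-in for a GAM;
# the PSA samples below are simulated placeholders, not CEPAC output.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n = 5000
theta = rng.normal(size=n)                       # parameter of interest from the PSA
eps = rng.normal(scale=2.0, size=(n, 2))         # remaining PSA uncertainty
# Net monetary benefit of two strategies (hypothetical model)
nb = np.column_stack([1.0 + 0.5 * theta, 1.2 - 0.8 * theta]) + eps

# Fit one flexible regression of net benefit on theta per strategy
fitted = np.column_stack([
    make_pipeline(SplineTransformer(n_knots=10), LinearRegression())
    .fit(theta[:, None], nb[:, d]).predict(theta[:, None])
    for d in range(nb.shape[1])
])

# EVPPI = E_theta[max_d E(NB_d | theta)] - max_d E(NB_d)
evppi = fitted.max(axis=1).mean() - nb.mean(axis=0).max()
print(f"EVPPI for theta ~= {evppi:.3f}")
```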

  15. Optimal covariance selection for estimation using graphical models

    OpenAIRE

    Vichik, Sergey; Oshman, Yaakov

    2011-01-01

    We consider a problem encountered when trying to estimate a Gaussian random field using a distributed estimation approach based on Gaussian graphical models. Because of constraints imposed by estimation tools used in Gaussian graphical models, the a priori covariance of the random field is constrained to embed conditional independence constraints among a significant number of variables. The problem is, then: given the (unconstrained) a priori covariance of the random field, and the conditiona...

  16. Reserves' potential of sedimentary basin: modeling and estimation; Potentiel de reserves d'un bassin petrolier: modelisation et estimation

    Energy Technology Data Exchange (ETDEWEB)

    Lepez, V.

    2002-12-01

    The aim of this thesis is to build a statistical model of the distribution of oil and gas field sizes in a given sedimentary basin, for both the fields that exist in the subsoil and those which have already been discovered. Estimating all the parameters of the model, via estimation of the density of the observations by model selection of piecewise polynomials with penalized maximum likelihood techniques, makes it possible to provide estimates of the total number of fields which are yet to be discovered, by size class. We assume that the set of underground field sizes is an i.i.d. sample from an unknown population following a Levy-Pareto law with unknown parameter. The set of already discovered fields is a 'size-biased' sub-sample drawn without replacement from the former. The associated inclusion probabilities are to be estimated. We prove that the probability density of the observations is the product of the underlying density and an unknown weighting function representing the sampling bias. For an arbitrary partition of the size interval (called a model), the analytical solutions of likelihood maximization allow us to estimate both the parameter of the underlying Levy-Pareto law and the weighting function, which is assumed to be piecewise constant and based upon the partition. We add a monotonicity constraint on the latter, taking into account the fact that the bigger a field, the higher its probability of being discovered. Horvitz-Thompson-like estimators finally give the conclusion. We then allow our partitions to vary inside several classes of models and prove a model selection theorem which aims at selecting the best partition within a class, in terms of both the Kullback and Hellinger risks of the associated estimator. We conclude with simulations and various applications to real data from sedimentary basins on four continents, in order to illustrate theoretical as well as practical aspects of our model. (author)

  17. Post-model selection inference and model averaging

    Directory of Open Access Journals (Sweden)

    Georges Nguefack-Tsague

    2011-07-01

    Full Text Available Although model selection is routinely used in practice nowadays, little is known about its precise effects on any subsequent inference that is carried out. The same goes for the effects induced by the closely related technique of model averaging. This paper is concerned with the use of the same data first to select a model and then to carry out inference, in particular point estimation and point prediction. The properties of the resulting estimator, called a post-model-selection estimator (PMSE), are hard to derive. Using selection criteria such as hypothesis testing, AIC, BIC, HQ and Cp, we illustrate that, in terms of risk function, no single PMSE dominates the others. The same conclusion holds more generally for any penalised likelihood information criterion. We also compare various model averaging schemes and show that no single one dominates the others in terms of risk function. Since PMSEs can be regarded as a special case of model averaging, with 0-1 random weights, we propose a connection between the two theories, in the frequentist approach, by taking account of the selection procedure when performing model averaging. We illustrate the point by simulating a simple linear regression model.

  18. System health monitoring using multiple-model adaptive estimation techniques

    Science.gov (United States)

    Sifford, Stanley Ryan

    Monitoring system health for fault detection and diagnosis by tracking system parameters concurrently with state estimates is approached using a new multiple-model adaptive estimation (MMAE) method. This novel method is called GRid-based Adaptive Parameter Estimation (GRAPE). GRAPE expands existing MMAE methods by using new techniques to sample the parameter space. GRAPE expands on MMAE with the hypothesis that sample models can be applied and resampled without relying on a predefined set of models. GRAPE is initially implemented in a linear framework using Kalman filter models. A more generalized GRAPE formulation is presented using extended Kalman filter (EKF) models to represent nonlinear systems. GRAPE can handle both time invariant and time varying systems as it is designed to track parameter changes. Two techniques are presented to generate parameter samples for the parallel filter models. The first approach is called selected grid-based stratification (SGBS). SGBS divides the parameter space into equally spaced strata. The second approach uses Latin Hypercube Sampling (LHS) to determine the parameter locations and minimize the total number of required models. LHS is particularly useful when the parameter dimensions grow. Adding more parameters does not require the model count to increase for LHS. Each resample is independent of the prior sample set other than the location of the parameter estimate. SGBS and LHS can be used for both the initial sample and subsequent resamples. Furthermore, resamples are not required to use the same technique. Both techniques are demonstrated for both linear and nonlinear frameworks. The GRAPE framework further formalizes the parameter tracking process through a general approach for nonlinear systems. These additional methods allow GRAPE to either narrow the focus to converged values within a parameter range or expand the range in the appropriate direction to track the parameters outside the current parameter range boundary
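
    The Latin hypercube step described above can be sketched with scipy's quasi-Monte Carlo utilities: the number of sampled parameter sets stays fixed as the parameter dimension grows. The bounds and dimensions below are hypothetical.

```python
# A minimal sketch of Latin Hypercube Sampling of a parameter space, the
# sampling idea used to place parallel filter models; bounds and dimensions
# are hypothetical.
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=10)        # three uncertain parameters
unit_samples = sampler.random(n=8)                # 8 models regardless of dimension

lower = np.array([0.1, 1.0, 0.01])                # assumed parameter bounds
upper = np.array([2.0, 5.0, 0.10])
param_sets = qmc.scale(unit_samples, lower, upper)

for k, p in enumerate(param_sets):
    print(f"model {k}: {np.round(p, 3)}")
```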

  19. Hybrid Cubature Kalman filtering for identifying nonlinear models from sampled recording: Estimation of neuronal dynamics.

    Science.gov (United States)

    Madi, Mahmoud K; Karameh, Fadi N

    2017-01-01

    Kalman filtering methods have long been regarded as efficient adaptive Bayesian techniques for estimating hidden states in models of linear dynamical systems under Gaussian uncertainty. Recent advents of the Cubature Kalman filter (CKF) have extended this efficient estimation property to nonlinear systems, and also to hybrid nonlinear problems where by the processes are continuous and the observations are discrete (continuous-discrete CD-CKF). Employing CKF techniques, therefore, carries high promise for modeling many biological phenomena where the underlying processes exhibit inherently nonlinear, continuous, and noisy dynamics and the associated measurements are uncertain and time-sampled. This paper investigates the performance of cubature filtering (CKF and CD-CKF) in two flagship problems arising in the field of neuroscience upon relating brain functionality to aggregate neurophysiological recordings: (i) estimation of the firing dynamics and the neural circuit model parameters from electric potentials (EP) observations, and (ii) estimation of the hemodynamic model parameters and the underlying neural drive from BOLD (fMRI) signals. First, in simulated neural circuit models, estimation accuracy was investigated under varying levels of observation noise (SNR), process noise structures, and observation sampling intervals (dt). When compared to the CKF, the CD-CKF consistently exhibited better accuracy for a given SNR, sharp accuracy increase with higher SNR, and persistent error reduction with smaller dt. Remarkably, CD-CKF accuracy shows only a mild deterioration for non-Gaussian process noise, specifically with Poisson noise, a commonly assumed form of background fluctuations in neuronal systems. Second, in simulated hemodynamic models, parametric estimates were consistently improved under CD-CKF. Critically, time-localization of the underlying neural drive, a determinant factor in fMRI-based functional connectivity studies, was significantly more accurate

  20. Hybrid Cubature Kalman filtering for identifying nonlinear models from sampled recording: Estimation of neuronal dynamics

    Science.gov (United States)

    2017-01-01

    Kalman filtering methods have long been regarded as efficient adaptive Bayesian techniques for estimating hidden states in models of linear dynamical systems under Gaussian uncertainty. Recent advents of the Cubature Kalman filter (CKF) have extended this efficient estimation property to nonlinear systems, and also to hybrid nonlinear problems where by the processes are continuous and the observations are discrete (continuous-discrete CD-CKF). Employing CKF techniques, therefore, carries high promise for modeling many biological phenomena where the underlying processes exhibit inherently nonlinear, continuous, and noisy dynamics and the associated measurements are uncertain and time-sampled. This paper investigates the performance of cubature filtering (CKF and CD-CKF) in two flagship problems arising in the field of neuroscience upon relating brain functionality to aggregate neurophysiological recordings: (i) estimation of the firing dynamics and the neural circuit model parameters from electric potentials (EP) observations, and (ii) estimation of the hemodynamic model parameters and the underlying neural drive from BOLD (fMRI) signals. First, in simulated neural circuit models, estimation accuracy was investigated under varying levels of observation noise (SNR), process noise structures, and observation sampling intervals (dt). When compared to the CKF, the CD-CKF consistently exhibited better accuracy for a given SNR, sharp accuracy increase with higher SNR, and persistent error reduction with smaller dt. Remarkably, CD-CKF accuracy shows only a mild deterioration for non-Gaussian process noise, specifically with Poisson noise, a commonly assumed form of background fluctuations in neuronal systems. Second, in simulated hemodynamic models, parametric estimates were consistently improved under CD-CKF. Critically, time-localization of the underlying neural drive, a determinant factor in fMRI-based functional connectivity studies, was significantly more accurate

  1. A simple numerical model to estimate the effect of coal selection on pulverized fuel burnout

    Energy Technology Data Exchange (ETDEWEB)

    Sun, J.K.; Hurt, R.H.; Niksa, S.; Muzio, L.; Mehta, A.; Stallings, J. [Brown University, Providence, RI (USA). Division Engineering

    2003-06-01

    The amount of unburned carbon in ash is an important performance characteristic in commercial boilers fired with pulverized coal. Unburned carbon levels are known to be sensitive to fuel selection, and there is great interest in methods of estimating the burnout propensity of coals based on proximate and ultimate analysis - the only fuel properties readily available to utility practitioners. A simple numerical model is described that is specifically designed to estimate the effects of coal selection on burnout in a way that is useful for commercial coal screening. The model is based on a highly idealized description of the combustion chamber but employs detailed descriptions of the fundamental fuel transformations. The model is validated against data from laboratory and pilot-scale combustors burning a range of international coals, and then against data obtained from full-scale units during periods of coal switching. The validated model form is then used in a series of sensitivity studies to explore the role of various individual fuel properties that influence burnout.

  2. A GMM-Based Test for Normal Disturbances of the Heckman Sample Selection Model

    Directory of Open Access Journals (Sweden)

    Michael Pfaffermayr

    2014-10-01

    Full Text Available The Heckman sample selection model relies on the assumption of normal and homoskedastic disturbances. However, before considering more general, alternative semiparametric models that do not need the normality assumption, it seems useful to test this assumption. Following Meijer and Wansbeek (2007), the present contribution derives a GMM-based pseudo-score LM test on whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. The test is easy to calculate and in Monte Carlo simulations it shows good performance for sample sizes of 1000 or larger.

  3. A Model Based Approach to Sample Size Estimation in Recent Onset Type 1 Diabetes

    Science.gov (United States)

    Bundy, Brian; Krischer, Jeffrey P.

    2016-01-01

    Area under the curve (AUC) C-peptide following a 2-hour mixed meal tolerance test, measured from baseline to 12 months after enrollment in 481 individuals enrolled in 5 prior TrialNet studies of recent onset type 1 diabetes, was modelled to produce estimates of its rate of loss and variance. Age at diagnosis and baseline C-peptide were found to be significant predictors, and adjusting for these in an ANCOVA resulted in estimates with lower variance. Using these results as planning parameters for new studies results in a nearly 50% reduction in the target sample size. The modelling also produces an expected C-peptide that can be used in Observed vs. Expected calculations to estimate the presumption of benefit in ongoing trials. PMID:26991448

  4. Finite mixture model: A maximum likelihood estimation approach on time series data

    Science.gov (United States)

    Yen, Phoong Seuk; Ismail, Mohd Tahir; Hamzah, Firdaus Mohamad

    2014-09-01

    Recently, statisticians have emphasized fitting finite mixture models by maximum likelihood estimation because of its desirable asymptotic properties: the estimator is consistent as the sample size increases to infinity and is therefore asymptotically unbiased. Moreover, the parameter estimates obtained by maximum likelihood have the smallest variance compared with other statistical methods as the sample size increases. Maximum likelihood estimation is therefore adopted in this paper to fit a two-component mixture model in order to explore the relationship between rubber price and exchange rate for Malaysia, Thailand, the Philippines and Indonesia. The results show a negative relationship between rubber price and exchange rate for all selected countries.
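
    As an illustration of maximum likelihood fitting of a two-component mixture (not the paper's data or code), a Gaussian mixture can be fitted with scikit-learn's EM-based GaussianMixture; the synthetic series below is a hypothetical stand-in.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic stand-in for, e.g., monthly price changes (not the paper's data)
x = np.concatenate([rng.normal(-1.0, 0.5, 300),
                    rng.normal(2.0, 1.0, 700)]).reshape(-1, 1)

# Fit a two-component Gaussian mixture by maximum likelihood (EM algorithm)
gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
print("weights:", gmm.weights_)
print("means:  ", gmm.means_.ravel())
print("sigmas: ", np.sqrt(gmm.covariances_).ravel())
```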

  5. Mixed Frequency Data Sampling Regression Models: The R Package midasr

    Directory of Open Access Journals (Sweden)

    Eric Ghysels

    2016-08-01

    When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within the MIDAS regression framework put forward by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess the forecasting accuracy of the MIDAS regression model, and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.

  6. Estimating the encounter rate variance in distance sampling

    Science.gov (United States)

    Fewster, R.M.; Buckland, S.T.; Burnham, K.P.; Borchers, D.L.; Jupp, P.E.; Laake, J.L.; Thomas, L.

    2009-01-01

    The dominant source of variance in line transect sampling is usually the encounter rate variance. Systematic survey designs are often used to reduce the true variability among different realizations of the design, but estimating the variance is difficult and estimators typically approximate the variance by treating the design as a simple random sample of lines. We explore the properties of different encounter rate variance estimators under random and systematic designs. We show that a design-based variance estimator improves upon the model-based estimator of Buckland et al. (2001, Introduction to Distance Sampling. Oxford: Oxford University Press, p. 79) when transects are positioned at random. However, if populations exhibit strong spatial trends, both estimators can have substantial positive bias under systematic designs. We show that poststratification is effective in reducing this bias. © 2008, The International Biometric Society.
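
    A common design-based encounter rate variance estimator, which treats the k transect lines as a simple random sample, can be written in a few lines. This is a generic sketch and not necessarily the exact estimator variant evaluated in the paper.

```python
import numpy as np

def encounter_rate_var(n_i, l_i):
    """Variance of the encounter rate n/L, treating the k transects as a
    simple random sample of lines (a common design-based estimator)."""
    n_i, l_i = np.asarray(n_i, float), np.asarray(l_i, float)
    k, L, n = len(l_i), l_i.sum(), n_i.sum()
    er = n / L  # overall encounter rate
    return k / (L**2 * (k - 1)) * np.sum(l_i**2 * (n_i / l_i - er)**2)

# Hypothetical example: detections and lengths (km) for 5 transects
print(encounter_rate_var([4, 7, 2, 9, 5], [1.2, 2.0, 0.8, 2.5, 1.5]))
```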

  7. An econometric method for estimating population parameters from non-random samples: An application to clinical case finding.

    Science.gov (United States)

    Burger, Rulof P; McLaren, Zoë M

    2017-09-01

    The problem of sample selection complicates the process of drawing inference about populations. Selective sampling arises in many real world situations when agents such as doctors and customs officials search for targets with high values of a characteristic. We propose a new method for estimating population characteristics from these types of selected samples. We develop a model that captures key features of the agent's sampling decision. We use a generalized method of moments with instrumental variables and maximum likelihood to estimate the population prevalence of the characteristic of interest and the agents' accuracy in identifying targets. We apply this method to tuberculosis (TB), which is the leading infectious disease cause of death worldwide. We use a national database of TB test data from South Africa to examine testing for multidrug resistant TB (MDR-TB). Approximately one quarter of MDR-TB cases was undiagnosed between 2004 and 2010. The official estimate of 2.5% is therefore too low, and MDR-TB prevalence is as high as 3.5%. Signal-to-noise ratios are estimated to be between 0.5 and 1. Our approach is widely applicable because of the availability of routinely collected data and abundance of potential instruments. Using routinely collected data to monitor population prevalence can guide evidence-based policy making. Copyright © 2017 John Wiley & Sons, Ltd.

  8. A geostatistical estimation of zinc grade in bore-core samples

    International Nuclear Information System (INIS)

    Starzec, A.

    1987-01-01

    Possibilities and preliminary results of geostatistical interpretation of the XRF determination of zinc in bore-core samples are considered. For the spherical model of the variogram, the estimation variance of grade in a disk-shaped sample (estimated from the grade measured on the sample circumference) is calculated. Variograms of zinc grade in core samples are presented and examples of the grade estimation are discussed. 4 refs., 7 figs., 1 tab. (author)

  9. Estimating abundance of mountain lions from unstructured spatial sampling

    Science.gov (United States)

    Russell, Robin E.; Royle, J. Andrew; Desimone, Richard; Schwartz, Michael K.; Edwards, Victoria L.; Pilgrim, Kristy P.; Mckelvey, Kevin S.

    2012-01-01

    Mountain lions (Puma concolor) are often difficult to monitor because of their low capture probabilities, extensive movements, and large territories. Methods for estimating the abundance of this species are needed to assess population status, determine harvest levels, evaluate the impacts of management actions on populations, and derive conservation and management strategies. Traditional mark–recapture methods do not explicitly account for differences in individual capture probabilities due to the spatial distribution of individuals in relation to survey effort (or trap locations). However, recent advances in the analysis of capture–recapture data have produced methods estimating abundance and density of animals from spatially explicit capture–recapture data that account for heterogeneity in capture probabilities due to the spatial organization of individuals and traps. We adapt recently developed spatial capture–recapture models to estimate density and abundance of mountain lions in western Montana. Volunteers and state agency personnel collected mountain lion DNA samples in portions of the Blackfoot drainage (7,908 km2) in west-central Montana using 2 methods: snow back-tracking mountain lion tracks to collect hair samples and biopsy darting treed mountain lions to obtain tissue samples. Overall, we recorded 72 individual capture events, including captures both with and without tissue sample collection and hair samples resulting in the identification of 50 individual mountain lions (30 females, 19 males, and 1 unknown sex individual). We estimated lion densities from 8 models containing effects of distance, sex, and survey effort on detection probability. Our population density estimates ranged from a minimum of 3.7 mountain lions/100 km2 (95% CI 2.3–5.7) under the distance only model (including only an effect of distance on detection probability) to 6.7 (95% CI 3.1–11.0) under the full model (including effects of distance, sex, survey effort, and

  10. Varying Coefficient Panel Data Model in the Presence of Endogenous Selectivity and Fixed Effects

    OpenAIRE

    Malikov, Emir; Kumbhakar, Subal C.; Sun, Yiguo

    2013-01-01

    This paper considers a flexible panel data sample selection model in which (i) the outcome equation is permitted to take a semiparametric, varying coefficient form to capture potential parameter heterogeneity in the relationship of interest, (ii) both the outcome and (parametric) selection equations contain unobserved fixed effects and (iii) selection is generalized to a polychotomous case. We propose a two-stage estimator. Given consistent parameter estimates from the selection equation obta...

  11. Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs

    Directory of Open Access Journals (Sweden)

    Faqir Muhammad

    2007-01-01

    In this study, different sampling designs are compared using the HIES data of North West Frontier Province (NWFP) for 2001-02 and 1998-99, collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators is also assessed using the bootstrap and the jackknife. HIES adopts a two-stage stratified random sample design: in the first stage, enumeration blocks and villages are treated as primary sampling units (PSUs), and the sample PSUs are selected with probability proportional to size; secondary sampling units (SSUs), i.e. households, are then selected by systematic sampling with a random start. HIES used a single study variable. We compare the HIES technique with several other designs: stratified simple random sampling, stratified systematic sampling, stratified ranked set sampling, and stratified two-phase sampling. Ratio and regression methods were applied with two study variables, income (y) and household size (x), and the jackknife and bootstrap were used for variance estimation. Simple random sampling with sample sizes of 462 to 561 gave moderate variances under both the jackknife and the bootstrap. Systematic sampling also yielded moderate variance with a sample size of 467. Under the jackknife with systematic sampling, the variance of the regression estimator exceeded that of the ratio estimator for sample sizes of 467 to 631; at a sample size of 952 the variance of the ratio estimator became greater than that of the regression estimator. Ranked set sampling proved the most efficient design: with the jackknife and bootstrap it gave the minimum variance even at the smallest sample size (467). Two-phase sampling performed poorly, and the multi-stage sampling applied by HIES gave large variances, especially when used with a single study variable.
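
    The ratio and regression estimators referred to above have simple closed forms; the sketch below illustrates them on synthetic data (not the HIES data), with the auxiliary totals treated as hypothetical known quantities.

```python
import numpy as np

def ratio_estimator(y, x, X_total):
    """Ratio estimate of the population total of y using the known auxiliary total X_total."""
    return y.mean() / x.mean() * X_total

def regression_estimator(y, x, X_mean, N):
    """Simple linear-regression estimate of the population total of y."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return N * (y.mean() + b * (X_mean - x.mean()))

# Toy example: y = household income, x = household size (not HIES data)
rng = np.random.default_rng(2)
x = rng.poisson(5, 200) + 1.0
y = 1500 * x + rng.normal(0, 2000, 200)
N, X_total = 10_000, 60_000
print(ratio_estimator(y, x, X_total))
print(regression_estimator(y, x, X_total / N, N))
```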

  12. Effects of model complexity and priors on estimation using sequential importance sampling/resampling for species conservation

    Science.gov (United States)

    Dunham, Kylee; Grand, James B.

    2016-01-01

    We examined the effects of complexity and priors on the accuracy of models used to estimate ecological and observational processes, and to make predictions regarding population size and structure. State-space models are useful for estimating complex, unobservable population processes and making predictions about future populations based on limited data. To better understand the utility of state space models in evaluating population dynamics, we used them in a Bayesian framework and compared the accuracy of models with differing complexity, with and without informative priors using sequential importance sampling/resampling (SISR). Count data were simulated for 25 years using known parameters and observation process for each model. We used kernel smoothing to reduce the effect of particle depletion, which is common when estimating both states and parameters with SISR. Models using informative priors estimated parameter values and population size with greater accuracy than their non-informative counterparts. While the estimates of population size and trend did not suffer greatly in models using non-informative priors, the algorithm was unable to accurately estimate demographic parameters. This model framework provides reasonable estimates of population size when little to no information is available; however, when information on some vital rates is available, SISR can be used to obtain more precise estimates of population size and process. Incorporating model complexity such as that required by structured populations with stage-specific vital rates affects precision and accuracy when estimating latent population variables and predicting population dynamics. These results are important to consider when designing monitoring programs and conservation efforts requiring management of specific population segments.
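
    A bare-bones sequential importance sampling/resampling (SISR) filter for a toy random-walk state-space model illustrates the propagate-weight-resample cycle described above; it is a generic sketch, not the structured population model or kernel-smoothed algorithm of the paper.

```python
import numpy as np

def sisr(y, n_particles=1000, sigma_proc=0.5, sigma_obs=1.0):
    """SISR for a toy model: x_t = x_{t-1} + process noise, y_t = x_t + obs noise."""
    rng = np.random.default_rng(0)
    particles = rng.normal(0, 1, n_particles)
    estimates = []
    for obs in y:
        # Propagate particles through the process model
        particles = particles + rng.normal(0, sigma_proc, n_particles)
        # Weight by the observation likelihood
        w = np.exp(-0.5 * ((obs - particles) / sigma_obs) ** 2)
        w /= w.sum()
        estimates.append(np.sum(w * particles))
        # Resample to combat weight degeneracy
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(estimates)

# Simulated observations from the same toy model
y = np.cumsum(np.random.default_rng(1).normal(0, 0.5, 50)) + \
    np.random.default_rng(2).normal(0, 1.0, 50)
print(sisr(y)[:5])
```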

  13. Adaptive measurement selection for progressive damage estimation

    Science.gov (United States)

    Zhou, Wenfan; Kovvali, Narayan; Papandreou-Suppappola, Antonia; Chattopadhyay, Aditi; Peralta, Pedro

    2011-04-01

    Noise and interference in sensor measurements degrade the quality of data and have a negative impact on the performance of structural damage diagnosis systems. In this paper, a novel adaptive measurement screening approach is presented to automatically select the most informative measurements and use them intelligently for structural damage estimation. The method is implemented efficiently in a sequential Monte Carlo (SMC) setting using particle filtering. The noise suppression and improved damage estimation capability of the proposed method is demonstrated by an application to the problem of estimating progressive fatigue damage in an aluminum compact-tension (CT) sample using noisy PZT sensor measurements.

  14. A model-based approach to sample size estimation in recent onset type 1 diabetes.

    Science.gov (United States)

    Bundy, Brian N; Krischer, Jeffrey P

    2016-11-01

    Area under the curve (AUC) C-peptide following a 2-h mixed meal tolerance test, measured from baseline to 12 months after enrolment in 498 individuals enrolled in five prior TrialNet studies of recent onset type 1 diabetes, was modelled to produce estimates of its rate of loss and variance. Age at diagnosis and baseline C-peptide were found to be significant predictors, and adjusting for these in an ANCOVA resulted in estimates with lower variance. Using these results as planning parameters for new studies results in a nearly 50% reduction in the target sample size. The modelling also produces an expected C-peptide that can be used in observed versus expected calculations to estimate the presumption of benefit in ongoing trials. Copyright © 2016 John Wiley & Sons, Ltd.

  15. A sampling strategy for estimating plot average annual fluxes of chemical elements from forest soils

    NARCIS (Netherlands)

    Brus, D.J.; Gruijter, de J.J.; Vries, de W.

    2010-01-01

    A sampling strategy for estimating spatially averaged annual element leaching fluxes from forest soils is presented and tested in three Dutch forest monitoring plots. In this method sampling locations and times (days) are selected by probability sampling. Sampling locations were selected by

  16. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection.

    Science.gov (United States)

    Zeng, Xueqiang; Luo, Gang

    2017-12-01

    Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.

  17. Comparison of sampling techniques for Bayesian parameter estimation

    Science.gov (United States)

    Allison, Rupert; Dunkley, Joanna

    2014-02-01

    The posterior probability distribution for a set of model parameters encodes all that the data have to tell us in the context of a given model; it is the fundamental quantity for Bayesian parameter estimation. In order to infer the posterior probability distribution we have to decide how to explore parameter space. Here we compare three prescriptions for how parameter space is navigated, discussing their relative merits. We consider Metropolis-Hastings sampling, nested sampling and affine-invariant ensemble Markov chain Monte Carlo (MCMC) sampling. We focus on their performance on toy-model Gaussian likelihoods and on a real-world cosmological data set. We outline the sampling algorithms themselves and elaborate on performance diagnostics such as convergence time, scope for parallelization, dimensional scaling, requisite tunings and suitability for non-Gaussian distributions. We find that nested sampling delivers high-fidelity estimates for posterior statistics at low computational cost, and should be adopted in favour of Metropolis-Hastings in many cases. Affine-invariant MCMC is competitive when computing clusters can be utilized for massive parallelization. Affine-invariant MCMC and existing extensions to nested sampling naturally probe multimodal and curving distributions.
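
    Of the three samplers compared above, random-walk Metropolis-Hastings is the simplest to sketch; the toy one-dimensional example below is illustrative only and is not tuned in the way the paper's cosmological runs would require.

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings sampler for a 1-D posterior."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    lp = log_post(x)
    for _ in range(n_samples):
        prop = x + rng.normal(0, step)        # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        samples.append(x)
    return np.array(samples)

# Toy Gaussian "posterior" with mean 3 and standard deviation 2
draws = metropolis_hastings(lambda t: -0.5 * ((t - 3.0) / 2.0) ** 2, x0=0.0)
print(draws.mean(), draws.std())
```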

  18. Instance Selection for Classifier Performance Estimation in Meta Learning

    OpenAIRE

    Marcin Blachnik

    2017-01-01

    Building an accurate prediction model is challenging and requires appropriate model selection. This process is very time consuming but can be accelerated with meta-learning–automatic model recommendation by estimating the performances of given prediction models without training them. Meta-learning utilizes metadata extracted from the dataset to effectively estimate the accuracy of the model in question. To achieve that goal, metadata descriptors must be gathered efficiently and must be inform...

  19. Maximum likelihood estimation for Cox's regression model under nested case-control sampling

    DEFF Research Database (Denmark)

    Scheike, Thomas; Juul, Anders

    2004-01-01

    Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazard...

  20. A Note on the Large Sample Properties of Estimators Based on Generalized Linear Models for Correlated Pseudo-observations

    DEFF Research Database (Denmark)

    Jacobsen, Martin; Martinussen, Torben

    2016-01-01

    Pseudo-values have proven very useful in censored data analysis in complex settings such as multi-state models. The approach was originally suggested by Andersen et al., Biometrika, 90, 2003, 335, who also suggested estimating standard errors using classical generalized estimating equation results. These results were studied more formally in Graw et al., Lifetime Data Anal., 15, 2009, 241, which derived some key results based on a second-order von Mises expansion. However, results concerning large sample properties of estimates based on regression models for pseudo-values still seem unclear. In this paper, we study these large sample properties in the simple setting of survival probabilities and show that the estimating function can be written as a U-statistic of second order, giving rise to an additional term that does not vanish asymptotically. We further show that previously advocated standard error

  1. Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions

    DEFF Research Database (Denmark)

    Tzoumas, Kostas; Deshpande, Amol; Jensen, Christian S.

    2011-01-01

    the attributes in the database into small, usually two-dimensional distributions. We describe several optimizations that can make selectivity estimation highly efficient, and we present a complete implementation inside PostgreSQL’s query optimizer. Experimental results indicate an order of magnitude better...

  2. Testing the normality assumption in the sample selection model with an application to travel demand

    NARCIS (Netherlands)

    van der Klaauw, B.; Koning, R.H.

    2003-01-01

    In this article we introduce a test for the normality assumption in the sample selection model. The test is based on a flexible parametric specification of the density function of the error terms in the model. This specification follows a Hermite series with bivariate normality as a special case.

  3. Testing the normality assumption in the sample selection model with an application to travel demand

    NARCIS (Netherlands)

    van der Klaauw, B.; Koning, R.H.

    In this article we introduce a test for the normality assumption in the sample selection model. The test is based on a flexible parametric specification of the density function of the error terms in the model. This specification follows a Hermite series with bivariate normality as a special case.

  4. Estimating cross-validatory predictive p-values with integrated importance sampling for disease mapping models.

    Science.gov (United States)

    Li, Longhai; Feng, Cindy X; Qiu, Shi

    2017-06-30

    An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming because one needs to rerun a Markov chain Monte Carlo analysis for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called integrated importance sampling (iIS), for estimating LOOCV predictive p-values with only Markov chain samples drawn from the posterior based on a full data set. The key step in iIS is that we integrate away the latent variables associated with the test observation with respect to their conditional distribution without reference to the actual observation. By following the general theory for importance sampling, the formula used by iIS can be proved to be equivalent to the LOOCV predictive p-value. We compare iIS with three other existing methods in the literature on two disease mapping datasets. Our empirical results show that the predictive p-values estimated with iIS are almost identical to the predictive p-values estimated with actual LOOCV and outperform those given by the three existing methods, namely, the posterior predictive checking, the ordinary importance sampling, and the ghosting method by Marshall and Spiegelhalter (2003). Copyright © 2017 John Wiley & Sons, Ltd.

  5. An Improvement to Interval Estimation for Small Samples

    Directory of Open Access Journals (Sweden)

    SUN Hui-Ling

    2017-02-01

    Because it is difficult to determine the probability distribution underlying a small sample, traditional probability theory is ill-suited to parameter estimation for small samples, and the Bayes Bootstrap method is commonly used in practice. The Bayes Bootstrap has its own limitations, however, and this article presents an improvement to it: the sample is enlarged by numerical simulation without changing the characteristics of the original small sample, which allows accurate interval estimates to be obtained for small samples. Finally, Monte Carlo simulation is used to model specific small-sample problems, demonstrating the effectiveness and practicability of the improved Bootstrap method.
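
    A plain percentile bootstrap interval (not the improved Bayes Bootstrap proposed in the record) shows the basic resampling idea for small samples; the data below are hypothetical.

```python
import numpy as np

def bootstrap_ci(sample, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of a small sample."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    boot = np.array([stat(rng.choice(sample, size=len(sample), replace=True))
                     for _ in range(n_boot)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

# Hypothetical small sample of measurements
small_sample = [9.8, 10.4, 9.9, 10.1, 10.6, 9.7, 10.2]
print(bootstrap_ci(small_sample))
```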

  6. Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    KAUST Repository

    Maadooliat, Mehdi

    2015-10-21

    This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.

  7. Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    KAUST Repository

    Maadooliat, Mehdi; Zhou, Lan; Najibi, Seyed Morteza; Gao, Xin; Huang, Jianhua Z.

    2015-01-01

    This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.

  8. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
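
    Variance inflation factors (VIFs), used above for screening, can be computed by regressing each explanatory variable on the others; the sketch below is a generic implementation with a hypothetical nearly-collinear design, not the study's data.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: regress x_j on the remaining columns and
    return 1 / (1 - R_j^2). VIFs above roughly 10 are a common multicollinearity flag."""
    X = np.asarray(X, float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)
print(variance_inflation_factors(np.column_stack([x1, x2, x3])))
```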

  9. Gender Wage Gap : A Semi-Parametric Approach With Sample Selection Correction

    NARCIS (Netherlands)

    Picchio, M.; Mussida, C.

    2010-01-01

    Sizeable gender differences in employment rates are observed in many countries. Sample selection into the workforce might therefore be a relevant issue when estimating gender wage gaps. This paper proposes a new semi-parametric estimator of densities in the presence of covariates which incorporates

  10. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: the one-sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one-sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because only slight differences in the error (presented through the relative residuals, RR) were obtained. Estimation of the significance of the differences in the RR was achieved using the Wilcoxon signed rank test, where the linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.

  11. Small area estimation (SAE) model: Case study of poverty in West Java Province

    Science.gov (United States)

    Suhartini, Titin; Sadik, Kusman; Indahwati

    2016-02-01

    This paper compares direct estimation with the indirect/small area estimation (SAE) model. Model selection included resolving multicollinearity among the auxiliary variables, either by retaining only non-collinear variables or by applying principal components (PC). The parameters of concern were the proportions of agricultural-venture poor households and of agricultural poor households at the area level in West Java Province. These parameters can be estimated either directly or by SAE. Direct estimation was problematic: three areas even had counts of zero and could not be estimated directly because of the small sample size. The proportion of agricultural-venture poor households was 19.22% and that of agricultural poor households 46.79%. The best model for agricultural-venture poor households was obtained by retaining only non-collinear variables, and the best model for agricultural poor households by applying PC. SAE performed better than direct estimation for both proportions at the area level in West Java Province. Small area estimation thus overcomes the small sample size problem and yields small-area estimates with higher accuracy and better precision than the direct estimator.

  12. Linking resource selection and mortality modeling for population estimation of mountain lions in Montana

    Science.gov (United States)

    Robinson, Hugh S.; Ruth, Toni K.; Gude, Justin A.; Choate, David; DeSimone, Rich; Hebblewhite, Mark; Matchett, Marc R.; Mitchell, Michael S.; Murphy, Kerry; Williams, Jim

    2015-01-01

    To be most effective, the scale of wildlife management practices should match the range of a particular species’ movements. For this reason, combined with our inability to rigorously or regularly census mountain lion populations, several authors have suggested that mountain lions be managed in a source-sink or metapopulation framework. We used a combination of resource selection functions, mortality estimation, and dispersal modeling to estimate cougar population levels in Montana statewide and potential population level effects of planned harvest levels. Between 1980 and 2012, 236 independent mountain lions were collared and monitored for research in Montana. From these data we used 18,695 GPS locations collected during winter from 85 animals to develop a resource selection function (RSF), and 11,726 VHF and GPS locations from 142 animals along with the locations of 6343 mountain lions harvested from 1988 to 2011 to validate the RSF model. Our RSF model validated well in all portions of the State, although it appeared to perform better in Montana Fish, Wildlife and Parks (MFWP) Regions 1, 2, 4 and 6 than in Regions 3, 5, and 7. Our mean RSF-based population estimate for the total population (kittens, juveniles, and adults) of mountain lions in Montana in 2005 was 3926, with almost 25% of the entire population in MFWP Region 1. Estimates based on high and low reference population estimates produce a possible range of 2784 to 5156 mountain lions statewide. Based on a range of possible survival rates we estimated the mountain lion population in Montana to be stable to slightly increasing between 2005 and 2010 with lambda ranging from 0.999 (SD = 0.05) to 1.02 (SD = 0.03). We believe these population growth rates to be a conservative estimate of true population growth. Our model suggests that proposed changes to female harvest quotas for 2013–2015 will result in an annual statewide population decline of 3% and shows that, due to reduced dispersal, changes to

  13. Model-based estimation of finite population total in stratified sampling

    African Journals Online (AJOL)

    The work presented in this paper concerns the estimation of the finite population total under a model-based framework. A nonparametric regression approach as a method of estimating the finite population total is explored. The asymptotic properties of the estimators based on nonparametric regression are also developed under ...

  14. Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection.

    Science.gov (United States)

    Kim, Sanghong; Kano, Manabu; Nakagawa, Hiroshi; Hasebe, Shinji

    2011-12-15

    Development of quality estimation models using near infrared spectroscopy (NIRS) and multivariate analysis has been accelerated as a process analytical technology (PAT) tool in the pharmaceutical industry. Although linear regression methods such as partial least squares (PLS) are widely used, they cannot always achieve high estimation accuracy because physical and chemical properties of a measuring object have a complex effect on NIR spectra. In this research, locally weighted PLS (LW-PLS) which utilizes a newly defined similarity between samples is proposed to estimate active pharmaceutical ingredient (API) content in granules for tableting. In addition, a statistical wavelength selection method which quantifies the effect of API content and other factors on NIR spectra is proposed. LW-PLS and the proposed wavelength selection method were applied to real process data provided by Daiichi Sankyo Co., Ltd., and the estimation accuracy was improved by 38.6% in root mean square error of prediction (RMSEP) compared to the conventional PLS using wavelengths selected on the basis of variable importance on the projection (VIP). The results clearly show that the proposed calibration modeling technique is useful for API content estimation and is superior to the conventional one. Copyright © 2011 Elsevier B.V. All rights reserved.

  15. Robust estimation for partially linear models with large-dimensional covariates.

    Science.gov (United States)

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2013-10-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of [Formula: see text], where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.

  16. How many dinosaur species were there? Fossil bias and true richness estimated using a Poisson sampling model.

    Science.gov (United States)

    Starrfelt, Jostein; Liow, Lee Hsiang

    2016-04-05

    The fossil record is a rich source of information about biological diversity in the past. However, the fossil record is not only incomplete but has also inherent biases due to geological, physical, chemical and biological factors. Our knowledge of past life is also biased because of differences in academic and amateur interests and sampling efforts. As a result, not all individuals or species that lived in the past are equally likely to be discovered at any point in time or space. To reconstruct temporal dynamics of diversity using the fossil record, biased sampling must be explicitly taken into account. Here, we introduce an approach that uses the variation in the number of times each species is observed in the fossil record to estimate both sampling bias and true richness. We term our technique TRiPS (True Richness estimated using a Poisson Sampling model) and explore its robustness to violation of its assumptions via simulations. We then venture to estimate sampling bias and absolute species richness of dinosaurs in the geological stages of the Mesozoic. Using TRiPS, we estimate that 1936 (1543-2468) species of dinosaurs roamed the Earth during the Mesozoic. We also present improved estimates of species richness trajectories of the three major dinosaur clades: the sauropodomorphs, ornithischians and theropods, casting doubt on the Jurassic-Cretaceous extinction event and demonstrating that all dinosaur groups are subject to considerable sampling bias throughout the Mesozoic. © 2016 The Authors.
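
    In the spirit of TRiPS (but not the authors' published implementation), true richness can be estimated from per-species occurrence counts by fitting a zero-truncated Poisson and correcting the observed species count for undetected species; the counts below are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

def trips_like_richness(counts):
    """Estimate true richness from per-species occurrence counts, assuming counts
    follow a Poisson distribution and unobserved species simply have count 0
    (a simplified sketch in the spirit of TRiPS, not the published code)."""
    counts = np.asarray(counts, float)
    mean_obs = counts.mean()                       # mean of the zero-truncated Poisson
    # MLE of the Poisson rate lambda solves lam / (1 - exp(-lam)) = mean_obs
    lam = brentq(lambda l: l / (1 - np.exp(-l)) - mean_obs, 1e-9, 100.0)
    p_detect = 1 - np.exp(-lam)                    # probability a species is seen at all
    n_obs = len(counts)
    return n_obs / p_detect, lam, p_detect

counts = [1, 1, 2, 1, 3, 1, 1, 2, 5, 1, 1, 2]      # toy occurrence counts
richness, lam, p = trips_like_richness(counts)
print(f"estimated richness = {richness:.1f}, lambda = {lam:.2f}, detection prob = {p:.2f}")
```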

  17. Instance Selection for Classifier Performance Estimation in Meta Learning

    Directory of Open Access Journals (Sweden)

    Marcin Blachnik

    2017-11-01

    Building an accurate prediction model is challenging and requires appropriate model selection. This process is very time consuming but can be accelerated with meta-learning: automatic model recommendation by estimating the performances of given prediction models without training them. Meta-learning utilizes metadata extracted from the dataset to effectively estimate the accuracy of the model in question. To achieve that goal, metadata descriptors must be gathered efficiently and must be informative to allow the precise estimation of prediction accuracy. In this paper, a new type of metadata descriptors is analyzed. These descriptors are based on the compression level obtained from the instance selection methods at the data-preprocessing stage. To verify their suitability, two types of experiments on real-world datasets have been conducted. In the first one, 11 instance selection methods were examined in order to validate the compression–accuracy relation for three classifiers: k-nearest neighbors (kNN), support vector machine (SVM), and random forest. From this analysis, two methods are recommended (instance-based learning type 2 (IB2) and edited nearest neighbor (ENN)), which are then compared with the state-of-the-art metaset descriptors. The obtained results confirm that the two suggested compression-based meta-features help to predict the accuracy of the base model much more accurately than the state-of-the-art solution.

  18. Calibration model maintenance in melamine resin production: Integrating drift detection, smart sample selection and model adaptation.

    Science.gov (United States)

    Nikzad-Langerodi, Ramin; Lughofer, Edwin; Cernuda, Carlos; Reischer, Thomas; Kantner, Wolfgang; Pawliczek, Marcin; Brandstetter, Markus

    2018-07-12

    selection of samples by active learning (AL) used for subsequent model adaptation is advantageous compared to passive (random) selection in case that a drift leads to persistent prediction bias allowing more rapid adaptation at lower reference measurement rates. Fully unsupervised adaptation using FLEXFIS-PLS could improve predictive accuracy significantly for light drifts but was not able to fully compensate for prediction bias in case of significant lack of fit w.r.t. the latent variable space. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Vibration and acoustic frequency spectra for industrial process modeling using selective fusion multi-condition samples and multi-source features

    Science.gov (United States)

    Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen

    2018-01-01

    Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by fusing valued information selectively from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct mechanical vibration and acoustic frequency spectra of a data-driven industrial process parameter model based on selective fusion multi-condition samples and multi-source features. Multi-layer SEN (MLSEN) strategy is used to simulate the domain expert cognitive process. Genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing the selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.

  20. Estimation of reference intervals from small samples: an example using canine plasma creatinine.

    Science.gov (United States)

    Geffré, A; Braun, J P; Trumel, C; Concordet, D

    2009-12-01

    According to international recommendations, reference intervals should be determined from at least 120 reference individuals, which often are impossible to achieve in veterinary clinical pathology, especially for wild animals. When only a small number of reference subjects is available, the possible bias cannot be known and the normality of the distribution cannot be evaluated. A comparison of reference intervals estimated by different methods could be helpful. The purpose of this study was to compare reference limits determined from a large set of canine plasma creatinine reference values, and large subsets of this data, with estimates obtained from small samples selected randomly. Twenty sets each of 120 and 27 samples were randomly selected from a set of 1439 plasma creatinine results obtained from healthy dogs in another study. Reference intervals for the whole sample and for the large samples were determined by a nonparametric method. The estimated reference limits for the small samples were minimum and maximum, mean +/- 2 SD of native and Box-Cox-transformed values, 2.5th and 97.5th percentiles by a robust method on native and Box-Cox-transformed values, and estimates from diagrams of cumulative distribution functions. The whole sample had a heavily skewed distribution, which approached Gaussian after Box-Cox transformation. The reference limits estimated from small samples were highly variable. The closest estimates to the 1439-result reference interval for 27-result subsamples were obtained by both parametric and robust methods after Box-Cox transformation but were grossly erroneous in some cases. For small samples, it is recommended that all values be reported graphically in a dot plot or histogram and that estimates of the reference limits be compared using different methods.
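
    One of the estimators compared above, mean ± 2 SD computed on Box-Cox-transformed values and back-transformed to the original scale, can be sketched as follows; the 27 "creatinine" values are simulated stand-ins, not the study data.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

def reference_interval_boxcox(values):
    """Parametric reference interval (mean +/- 2 SD) computed on Box-Cox
    transformed values and back-transformed to the original scale."""
    values = np.asarray(values, float)
    transformed, lmbda = stats.boxcox(values)     # requires strictly positive data
    lo = transformed.mean() - 2 * transformed.std(ddof=1)
    hi = transformed.mean() + 2 * transformed.std(ddof=1)
    return inv_boxcox(lo, lmbda), inv_boxcox(hi, lmbda)

# Toy right-skewed "creatinine" values from 27 reference subjects (not real data)
rng = np.random.default_rng(4)
creatinine = rng.lognormal(mean=4.5, sigma=0.25, size=27)
print(reference_interval_boxcox(creatinine))
```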

  1. Estimates of live-tree carbon stores in the Pacific Northwest are sensitive to model selection

    Science.gov (United States)

    Susanna L. Melson; Mark E. Harmon; Jeremy S. Fried; James B. Domingo

    2011-01-01

    Estimates of live-tree carbon stores are influenced by numerous uncertainties. One of them is model-selection uncertainty: one has to choose among multiple empirical equations and conversion factors that can be plausibly justified as locally applicable to calculate the carbon store from inventory measurements such as tree height and diameter at breast height (DBH)....

  2. Estimation of river and stream temperature trends under haphazard sampling

    Science.gov (United States)

    Gray, Brian R.; Lyubchich, Vyacheslav; Gel, Yulia R.; Rogala, James T.; Robertson, Dale M.; Wei, Xiaoqiao

    2015-01-01

    Long-term temporal trends in water temperature in rivers and streams are typically estimated under the assumption of evenly-spaced space-time measurements. However, sampling times and dates associated with historical water temperature datasets and some sampling designs may be haphazard. As a result, trends in temperature may be confounded with trends in time or space of sampling which, in turn, may yield biased trend estimators and thus unreliable conclusions. We address this concern using multilevel (hierarchical) linear models, where time effects are allowed to vary randomly by day and date effects by year. We evaluate the proposed approach by Monte Carlo simulations with imbalance, sparse data and confounding by trend in time and date of sampling. Simulation results indicate unbiased trend estimators while results from a case study of temperature data from the Illinois River, USA conform to river thermal assumptions. We also propose a new nonparametric bootstrap inference on multilevel models that allows for a relatively flexible and distribution-free quantification of uncertainties. The proposed multilevel modeling approach may be elaborated to accommodate nonlinearities within days and years when sampling times or dates typically span temperature extremes.

  3. Optimal Tuner Selection for Kalman-Filter-Based Aircraft Engine Performance Estimation

    Science.gov (United States)

    Simon, Donald L.; Garg, Sanjay

    2011-01-01

    An emerging approach in the field of aircraft engine controls and system health management is the inclusion of real-time, onboard models for the inflight estimation of engine performance variations. This technology, typically based on Kalman-filter concepts, enables the estimation of unmeasured engine performance parameters that can be directly utilized by controls, prognostics, and health-management applications. A challenge that complicates this practice is the fact that an aircraft engine's performance is affected by its level of degradation, generally described in terms of unmeasurable health parameters such as efficiencies and flow capacities related to each major engine module. Through Kalman-filter-based estimation techniques, the level of engine performance degradation can be estimated, given that there are at least as many sensors as health parameters to be estimated. However, in an aircraft engine, the number of sensors available is typically less than the number of health parameters, presenting an under-determined estimation problem. A common approach to address this shortcoming is to estimate a subset of the health parameters, referred to as model tuning parameters. The problem/objective is to optimally select the model tuning parameters to minimize Kalman-filter-based estimation error. A tuner selection technique has been developed that specifically addresses the under-determined estimation problem, where there are more unknown parameters than available sensor measurements. A systematic approach is applied to produce a model tuning parameter vector of appropriate dimension to enable estimation by a Kalman filter, while minimizing the estimation error in the parameters of interest. Tuning parameter selection is performed using a multi-variable iterative search routine that seeks to minimize the theoretical mean-squared estimation error of the Kalman filter. This approach can significantly reduce the error in onboard aircraft engine parameter estimation

  4. Sampling and estimating recreational use.

    Science.gov (United States)

    Timothy G. Gregoire; Gregory J. Buhyoff

    1999-01-01

    Probability sampling methods applicable to estimate recreational use are presented. Both single- and multiple-access recreation sites are considered. One- and two-stage sampling methods are presented. Estimation of recreational use is presented in a series of examples.

  5. A comparison of small-area estimation techniques to estimate selected stand attributes using LiDAR-derived auxiliary variables

    Science.gov (United States)

    Michael E. Goerndt; Vicente J. Monleon; Hailemariam. Temesgen

    2011-01-01

    One of the challenges often faced in forestry is the estimation of forest attributes for smaller areas of interest within a larger population. Small-area estimation (SAE) is a set of techniques well suited to estimation of forest attributes for small areas in which the existing sample size is small and auxiliary information is available. Selected SAE methods were...

  6. Are we under-estimating the association between autism symptoms?: The importance of considering simultaneous selection when using samples of individuals who meet diagnostic criteria for an autism spectrum disorder.

    Science.gov (United States)

    Murray, Aja Louise; McKenzie, Karen; Kuenssberg, Renate; O'Donnell, Michael

    2014-11-01

    The magnitude of symptom inter-correlations in diagnosed individuals has contributed to the evidence that autism spectrum disorder (ASD) is a fractionable disorder. Such correlations may substantially under-estimate the population correlations among symptoms due to simultaneous selection on the areas of deficit required for diagnosis. Using statistical simulations of this selection mechanism, we provide estimates of the extent of this bias, given different levels of population correlation between symptoms. We then use real data to compare domain inter-correlations in the Autism Spectrum Quotient, in those with ASD versus a combined ASD and non-ASD sample. Results from both studies indicate that samples restricted to individuals with a diagnosis of ASD potentially substantially under-estimate the magnitude of association between features of ASD.

  7. Estimation of AUC or Partial AUC under Test-Result-Dependent Sampling.

    Science.gov (United States)

    Wang, Xiaofei; Ma, Junling; George, Stephen; Zhou, Haibo

    2012-01-01

    The area under the ROC curve (AUC) and partial area under the ROC curve (pAUC) are summary measures used to assess the accuracy of a biomarker in discriminating true disease status. The standard sampling approach used in biomarker validation studies is often inefficient and costly, especially when ascertaining the true disease status is costly and invasive. To improve efficiency and reduce the cost of biomarker validation studies, we consider a test-result-dependent sampling (TDS) scheme, in which subject selection for determining the disease state is dependent on the result of a biomarker assay. We first estimate the test-result distribution using data arising from the TDS design. With the estimated empirical test-result distribution, we propose consistent nonparametric estimators for AUC and pAUC and establish the asymptotic properties of the proposed estimators. Simulation studies show that the proposed estimators have good finite sample properties and that the TDS design yields more efficient AUC and pAUC estimates than a simple random sampling (SRS) design. A data example based on an ongoing cancer clinical trial is provided to illustrate the TDS design and the proposed estimators. This work can find broad applications in design and analysis of biomarker validation studies.
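
    Under simple random sampling, the nonparametric AUC reduces to the Mann-Whitney statistic; the sketch below shows that baseline estimator only and does not implement the test-result-dependent sampling (TDS) correction proposed in the record.

```python
import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """Nonparametric AUC estimate: the probability that a randomly chosen
    diseased subject scores higher than a randomly chosen non-diseased one
    (ties counted as 1/2). Assumes a simple random sample, not a TDS design."""
    scores_pos = np.asarray(scores_pos, float)
    scores_neg = np.asarray(scores_neg, float)
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

# Hypothetical biomarker scores for diseased and non-diseased subjects
print(auc_mann_whitney([2.1, 3.4, 2.8, 4.0], [1.0, 2.0, 1.5, 2.1, 0.7]))
```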

  8. Iterative importance sampling algorithms for parameter estimation

    OpenAIRE

    Morzfeld, Matthias; Day, Marcus S.; Grout, Ray W.; Pau, George Shu Heng; Finsterle, Stefan A.; Bell, John B.

    2016-01-01

    In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov Chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is ...

  9. Environmental risk assessment of selected organic chemicals based on TOC test and QSAR estimation models.

    Science.gov (United States)

    Chi, Yulang; Zhang, Huanteng; Huang, Qiansheng; Lin, Yi; Ye, Guozhu; Zhu, Huimin; Dong, Sijun

    2018-02-01

    Environmental risks of organic chemicals are largely determined by their persistence, bioaccumulation, and toxicity (PBT) and by their physicochemical properties. Major regulations in different countries and regions identify chemicals according to their bioconcentration factor (BCF) and octanol-water partition coefficient (Kow), which frequently correlates strongly with the sediment sorption coefficient (Koc). Half-life or degradability is crucial for evaluating the persistence of chemicals. Quantitative structure activity relationship (QSAR) estimation models are indispensable for predicting environmental fate and health effects in the absence of field- or laboratory-based data. In this study, 39 chemicals of high concern were chosen for half-life testing based on total organic carbon (TOC) degradation, and two widely used QSAR estimation models (i.e., EPI Suite and PBT Profiler) were adopted for environmental risk evaluation. The experimental results and the estimates from the two models were compared on the basis of water solubility, Kow, Koc, BCF and half-life. Environmental risk assessment of the selected compounds was achieved by combining experimental data and estimation models. It was concluded that both EPI Suite and PBT Profiler were fairly accurate in estimating the physicochemical properties and degradation half-lives for water, soil, and sediment. However, the experimental and estimated half-lives were still not fully consistent. This points to deficiencies of the prediction models and to the need to combine experimental data with predicted results when evaluating the environmental fate and risks of pollutants. Copyright © 2016. Published by Elsevier B.V.

  10. Monte Carlo simulation for uncertainty estimation on structural data in implicit 3-D geological modeling, a guide for disturbance distribution selection and parameterization

    Science.gov (United States)

    Pakyuz-Charrier, Evren; Lindsay, Mark; Ogarko, Vitaliy; Giraud, Jeremie; Jessell, Mark

    2018-04-01

    Three-dimensional (3-D) geological structural modeling aims to determine geological information in a 3-D space using structural data (foliations and interfaces) and topological rules as inputs. This is necessary in any project in which the properties of the subsurface matters; they express our understanding of geometries in depth. For that reason, 3-D geological models have a wide range of practical applications including but not restricted to civil engineering, the oil and gas industry, the mining industry, and water management. These models, however, are fraught with uncertainties originating from the inherent flaws of the modeling engines (working hypotheses, interpolator's parameterization) and the inherent lack of knowledge in areas where there are no observations combined with input uncertainty (observational, conceptual and technical errors). Because 3-D geological models are often used for impactful decision-making it is critical that all 3-D geological models provide accurate estimates of uncertainty. This paper's focus is set on the effect of structural input data measurement uncertainty propagation in implicit 3-D geological modeling. This aim is achieved using Monte Carlo simulation for uncertainty estimation (MCUE), a stochastic method which samples from predefined disturbance probability distributions that represent the uncertainty of the original input data set. MCUE is used to produce hundreds to thousands of altered unique data sets. The altered data sets are used as inputs to produce a range of plausible 3-D models. The plausible models are then combined into a single probabilistic model as a means to propagate uncertainty from the input data to the final model. In this paper, several improved methods for MCUE are proposed. The methods pertain to distribution selection for input uncertainty, sample analysis and statistical consistency of the sampled distribution. Pole vector sampling is proposed as a more rigorous alternative than dip vector
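
    A highly simplified sketch of the MCUE idea is given below: a single hypothetical foliation measurement (dip direction/dip) is perturbed many times using normal disturbance distributions with assumed standard deviations; each perturbed realisation would feed one run of the implicit modelling engine. The record's recommended pole-vector (spherical) sampling is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(3)

dip_dir, dip = 135.0, 40.0                 # hypothetical foliation measurement (degrees)
sd_dir, sd_dip, n_real = 5.0, 3.0, 1000    # assumed measurement uncertainties, realisations

# Draw perturbed realisations from the disturbance distributions.
dir_samples = (dip_dir + rng.normal(0.0, sd_dir, n_real)) % 360.0
dip_samples = np.clip(dip + rng.normal(0.0, sd_dip, n_real), 0.0, 90.0)

# Each row is one altered input; the resulting plausible models would later be
# merged into a single probabilistic model.
perturbed = np.column_stack([dir_samples, dip_samples])
print(perturbed[:5])
```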

  11. Monte Carlo simulation for uncertainty estimation on structural data in implicit 3-D geological modeling, a guide for disturbance distribution selection and parameterization

    Directory of Open Access Journals (Sweden)

    E. Pakyuz-Charrier

    2018-04-01

    Full Text Available Three-dimensional (3-D) geological structural modeling aims to determine geological information in a 3-D space using structural data (foliations and interfaces) and topological rules as inputs. This is necessary in any project in which the properties of the subsurface matters; they express our understanding of geometries in depth. For that reason, 3-D geological models have a wide range of practical applications including but not restricted to civil engineering, the oil and gas industry, the mining industry, and water management. These models, however, are fraught with uncertainties originating from the inherent flaws of the modeling engines (working hypotheses, interpolator's parameterization) and the inherent lack of knowledge in areas where there are no observations combined with input uncertainty (observational, conceptual and technical errors). Because 3-D geological models are often used for impactful decision-making it is critical that all 3-D geological models provide accurate estimates of uncertainty. This paper's focus is set on the effect of structural input data measurement uncertainty propagation in implicit 3-D geological modeling. This aim is achieved using Monte Carlo simulation for uncertainty estimation (MCUE), a stochastic method which samples from predefined disturbance probability distributions that represent the uncertainty of the original input data set. MCUE is used to produce hundreds to thousands of altered unique data sets. The altered data sets are used as inputs to produce a range of plausible 3-D models. The plausible models are then combined into a single probabilistic model as a means to propagate uncertainty from the input data to the final model. In this paper, several improved methods for MCUE are proposed. The methods pertain to distribution selection for input uncertainty, sample analysis and statistical consistency of the sampled distribution. Pole vector sampling is proposed as a more rigorous alternative than

  12. inverse gaussian model for small area estimation via gibbs sampling

    African Journals Online (AJOL)

    ADMIN

    For example, MacGibbon and Tomberlin (1989) have considered estimating small area rates and binomial parameters using empirical Bayes methods. Stroud (1991) used a hierarchical Bayes approach for univariate natural exponential families with quadratic variance functions in sample survey applications, while Chaubey ...

  13. The Influence of Mark-Recapture Sampling Effort on Estimates of Rock Lobster Survival.

    Directory of Open Access Journals (Sweden)

    Ziya Kordjazi

    Full Text Available Five annual capture-mark-recapture surveys on Jasus edwardsii were used to evaluate the effect of sample size and fishing effort on the precision of estimated survival probability. Datasets of different numbers of individual lobsters (ranging from 200 to 1,000 lobsters) were created by random subsampling from each annual survey. This process of random subsampling was also used to create 12 datasets of different levels of effort based on three levels of the number of traps (15, 30 and 50 traps per day) and four levels of the number of sampling-days (2, 4, 6 and 7 days). The most parsimonious Cormack-Jolly-Seber (CJS) model for estimating survival probability shifted from a constant model towards sex-dependent models with increasing sample size and effort. A sample of 500 lobsters or 50 traps used on four consecutive sampling-days was required for obtaining precise survival estimations for males and females, separately. Reduced sampling effort of 30 traps over four sampling days was sufficient if a survival estimate for both sexes combined was sufficient for management of the fishery.

  14. The Influence of Mark-Recapture Sampling Effort on Estimates of Rock Lobster Survival

    Science.gov (United States)

    Kordjazi, Ziya; Frusher, Stewart; Buxton, Colin; Gardner, Caleb; Bird, Tomas

    2016-01-01

    Five annual capture-mark-recapture surveys on Jasus edwardsii were used to evaluate the effect of sample size and fishing effort on the precision of estimated survival probability. Datasets of different numbers of individual lobsters (ranging from 200 to 1,000 lobsters) were created by random subsampling from each annual survey. This process of random subsampling was also used to create 12 datasets of different levels of effort based on three levels of the number of traps (15, 30 and 50 traps per day) and four levels of the number of sampling-days (2, 4, 6 and 7 days). The most parsimonious Cormack-Jolly-Seber (CJS) model for estimating survival probability shifted from a constant model towards sex-dependent models with increasing sample size and effort. A sample of 500 lobsters or 50 traps used on four consecutive sampling-days was required for obtaining precise survival estimations for males and females, separately. Reduced sampling effort of 30 traps over four sampling days was sufficient if a survival estimate for both sexes combined was sufficient for management of the fishery. PMID:26990561

  15. Stability Analysis for Li-Ion Battery Model Parameters and State of Charge Estimation by Measurement Uncertainty Consideration

    Directory of Open Access Journals (Sweden)

    Shifei Yuan

    2015-07-01

    Full Text Available Accurate estimation of model parameters and state of charge (SoC) is crucial for the lithium-ion battery management system (BMS). In this paper, the stability of the model parameter and SoC estimation under measurement uncertainty is evaluated by three different factors: (i) sampling periods of 1/0.5/0.1 s; (ii) current sensor precisions of ±5/±50/±500 mA; and (iii) voltage sensor precisions of ±1/±2.5/±5 mV. Firstly, the numerical model stability analysis and parametric sensitivity analysis for battery model parameters are conducted under sampling frequencies of 1–50 Hz. A theoretical perturbation analysis is performed to assess the effect of current/voltage measurement uncertainty on model parameter variation. Secondly, the impact of the three factors on the model parameter and SoC estimation was evaluated with the federal urban driving sequence (FUDS) profile. The bias correction recursive least square (CRLS) and adaptive extended Kalman filter (AEKF) algorithms were adopted to estimate the model parameters and SoC jointly. Finally, the simulation results were compared and some insightful findings were drawn. For the given battery model and parameter estimation algorithm, the sampling period and the current/voltage sampling accuracy had a non-negligible effect on the estimated model parameters. This research reveals the influence of measurement uncertainty on model parameter estimation and provides guidelines for selecting a reasonable sampling period and current/voltage sensor sampling precisions in engineering applications.
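
    As a generic stand-in for the recursive parameter-identification step, the sketch below runs a plain recursive least squares (RLS) update with a forgetting factor on a simulated first-order system; the true parameters, noise level and sampling setup are all assumed and do not reproduce the CRLS/AEKF scheme of the record.

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=0.999):
    """One recursive least squares update with forgetting factor lam."""
    phi = phi.reshape(-1, 1)
    K = P @ phi / (lam + phi.T @ P @ phi)            # gain vector
    theta = theta + (K * (y - phi.T @ theta)).ravel()
    P = (P - K @ phi.T @ P) / lam
    return theta, P

# Simulated first-order system y_k = a*y_{k-1} + b*u_k + noise (toy battery-like model).
rng = np.random.default_rng(4)
a_true, b_true = 0.95, 0.02
u = rng.normal(0.0, 1.0, 2000)
y = np.zeros_like(u)
for k in range(1, len(u)):
    y[k] = a_true * y[k - 1] + b_true * u[k] + rng.normal(0.0, 1e-3)

theta, P = np.zeros(2), np.eye(2) * 1e3
for k in range(1, len(u)):
    theta, P = rls_step(theta, P, np.array([y[k - 1], u[k]]), y[k])
print(theta)    # should approach [0.95, 0.02]
```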

  16. The effect of selection on genetic parameter estimates

    African Journals Online (AJOL)

    Unknown

    A simulation study was carried out to investigate the effect of selection on the estimation of genetic ... The model contained a fixed effect, random genetic and random ...

  17. Inverse Gaussian model for small area estimation via Gibbs sampling

    African Journals Online (AJOL)

    We present a Bayesian method for estimating small area parameters under an inverse Gaussian model. The method is extended to estimate small area parameters for finite populations. The Gibbs sampler is proposed as a mechanism for implementing the Bayesian paradigm. We illustrate the method by application to ...

  18. Improving snow density estimation for mapping SWE with Lidar snow depth: assessment of uncertainty in modeled density and field sampling strategies in NASA SnowEx

    Science.gov (United States)

    Raleigh, M. S.; Smyth, E.; Small, E. E.

    2017-12-01

    The spatial distribution of snow water equivalent (SWE) is not sufficiently monitored with either remotely sensed or ground-based observations for water resources management. Recent applications of airborne Lidar have yielded basin-wide mapping of SWE when combined with a snow density model. However, in the absence of snow density observations, the uncertainty in these SWE maps is dominated by uncertainty in modeled snow density rather than in Lidar measurement of snow depth. Available observations tend to have a bias in physiographic regime (e.g., flat open areas) and are often insufficient in number to support testing of models across a range of conditions. Thus, there is a need for targeted sampling strategies and controlled model experiments to understand where and why different snow density models diverge. This will enable identification of robust model structures that represent dominant processes controlling snow densification, in support of basin-scale estimation of SWE with remotely-sensed snow depth datasets. The NASA SnowEx mission is a unique opportunity to evaluate sampling strategies of snow density and to quantify and reduce uncertainty in modeled snow density. In this presentation, we present initial field data analyses and modeling results over the Colorado SnowEx domain in the 2016-2017 winter campaign. We detail a framework for spatially mapping the uncertainty in snowpack density, as represented across multiple models. Leveraging the modular SUMMA model, we construct a series of physically-based models to assess systematically the importance of specific process representations to snow density estimates. We will show how models and snow pit observations characterize snow density variations with forest cover in the SnowEx domains. Finally, we will use the spatial maps of density uncertainty to evaluate the selected locations of snow pits, thereby assessing the adequacy of the sampling strategy for targeting uncertainty in modeled snow density.

  19. An open-population hierarchical distance sampling model

    Science.gov (United States)

    Sollmann, Rahel; Gardner, Beth; Chandler, Richard B.; Royle, J. Andrew; Sillett, T. Scott

    2015-01-01

    Modeling population dynamics while accounting for imperfect detection is essential to monitoring programs. Distance sampling allows estimating population size while accounting for imperfect detection, but existing methods do not allow for direct estimation of demographic parameters. We develop a model that uses temporal correlation in abundance arising from underlying population dynamics to estimate demographic parameters from repeated distance sampling surveys. Using a simulation study motivated by designing a monitoring program for island scrub-jays (Aphelocoma insularis), we investigated the power of this model to detect population trends. We generated temporally autocorrelated abundance and distance sampling data over six surveys, using population rates of change of 0.95 and 0.90. We fit the data generating Markovian model and a mis-specified model with a log-linear time effect on abundance, and derived post hoc trend estimates from a model estimating abundance for each survey separately. We performed these analyses for varying number of survey points. Power to detect population changes was consistently greater under the Markov model than under the alternatives, particularly for reduced numbers of survey points. The model can readily be extended to more complex demographic processes than considered in our simulations. This novel framework can be widely adopted for wildlife population monitoring.

  20. An open-population hierarchical distance sampling model.

    Science.gov (United States)

    Sollmann, Rahel; Gardner, Beth; Chandler, Richard B; Royle, J Andrew; Sillett, T Scott

    2015-02-01

    Modeling population dynamics while accounting for imperfect detection is essential to monitoring programs. Distance sampling allows estimating population size while accounting for imperfect detection, but existing methods do not allow for estimation of demographic parameters. We develop a model that uses temporal correlation in abundance arising from underlying population dynamics to estimate demographic parameters from repeated distance sampling surveys. Using a simulation study motivated by designing a monitoring program for Island Scrub-Jays (Aphelocoma insularis), we investigated the power of this model to detect population trends. We generated temporally autocorrelated abundance and distance sampling data over six surveys, using population rates of change of 0.95 and 0.90. We fit the data generating Markovian model and a mis-specified model with a log-linear time effect on abundance, and derived post hoc trend estimates from a model estimating abundance for each survey separately. We performed these analyses for varying numbers of survey points. Power to detect population changes was consistently greater under the Markov model than under the alternatives, particularly for reduced numbers of survey points. The model can readily be extended to more complex demographic processes than considered in our simulations. This novel framework can be widely adopted for wildlife population monitoring.

  1. Network Model-Assisted Inference from Respondent-Driven Sampling Data.

    Science.gov (United States)

    Gile, Krista J; Handcock, Mark S

    2015-06-01

    Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population.
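
    For orientation, the sketch below implements the widely used inverse-degree (Volz-Heckathorn-type) design-based RDS estimator of a population mean; the record's model-assisted estimator refines this idea by leveraging a working network model, which is not reproduced here. The data are hypothetical.

```python
import numpy as np

def rds_inverse_degree_mean(y, degree):
    """Weight each respondent by the inverse of their reported network degree,
    a common proxy for the (unknown) RDS selection probability."""
    y, degree = np.asarray(y, float), np.asarray(degree, float)
    w = 1.0 / degree
    return np.sum(w * y) / np.sum(w)

# Hypothetical RDS sample: binary outcome (e.g., infection status) and reported degrees.
y = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])
deg = np.array([12, 3, 5, 20, 8, 2, 4, 6, 15, 3])
print(rds_inverse_degree_mean(y, deg))   # compare with the unweighted mean, y.mean()
```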

  2. Augmented switching linear dynamical system model for gas concentration estimation with MOX sensors in an open sampling system.

    Science.gov (United States)

    Di Lello, Enrico; Trincavelli, Marco; Bruyninckx, Herman; De Laet, Tinne

    2014-07-11

    In this paper, we introduce a Bayesian time series model approach for gas concentration estimation using Metal Oxide (MOX) sensors in an Open Sampling System (OSS). Our approach focuses on compensating for the slow response of MOX sensors while concurrently solving the problem of estimating the gas concentration in the OSS. The proposed Augmented Switching Linear System model allows all the sources of uncertainty arising at each step of the problem to be included in a single coherent probabilistic formulation. In particular, the problem of detecting on-line the current sensor dynamical regime and estimating the underlying gas concentration under environmental disturbances and noisy measurements is formulated and solved as a statistical inference problem. Our model improves on the state of the art, in which system modeling approaches had already been introduced but provided only an indirect relative measure proportional to the gas concentration and ignored the problem of modeling uncertainty. Our approach is validated experimentally, and its performance in terms of speed and quality of the gas concentration estimation is compared with that obtained using a photo-ionization detector.

  3. A guide to developing resource selection functions from telemetry data using generalized estimating equations and generalized linear mixed models

    Directory of Open Access Journals (Sweden)

    Nicola Koper

    2012-03-01

    Full Text Available Resource selection functions (RSF) are often developed using satellite (ARGOS) or Global Positioning System (GPS) telemetry datasets, which provide a large amount of highly correlated data. We discuss and compare the use of generalized linear mixed-effects models (GLMM) and generalized estimating equations (GEE) for using this type of data to develop RSFs. GLMMs directly model differences among caribou, while GEEs depend on an adjustment of the standard error to compensate for correlation of data points within individuals. Empirical standard errors, rather than model-based standard errors, must be used with either GLMMs or GEEs when developing RSFs. There are several important differences between these approaches; in particular, GLMMs are best for producing parameter estimates that predict how management might influence individuals, while GEEs are best for predicting how management might influence populations. As the interpretation, value, and statistical significance of both types of parameter estimates differ, it is important that users select the appropriate analytical method. We also outline the use of k-fold cross validation to assess fit of these models. Both GLMMs and GEEs hold promise for developing RSFs as long as they are used appropriately.
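
    A minimal GEE fit for a used/available resource selection function with statsmodels is sketched below (file and column names are hypothetical); the empirical "sandwich" standard errors come with the GEE machinery, and the GLMM counterpart would instead add a random intercept per animal using dedicated mixed-model software.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical telemetry data: one row per used/available point, with a binary
# response `used`, habitat covariates, and an `animal_id` clustering variable.
df = pd.read_csv("telemetry_rsf.csv")

gee = smf.gee("used ~ ndvi + elev", groups="animal_id", data=df,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.summary())   # population-averaged coefficients with empirical SEs
```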

  4. A Bayesian random effects discrete-choice model for resource selection: Population-level selection inference

    Science.gov (United States)

    Thomas, D.L.; Johnson, D.; Griffith, B.

    2006-01-01

    Modeling the probability of use of land units characterized by discrete and continuous measures, we present a Bayesian random-effects model to assess resource selection. This model provides simultaneous estimation of both individual- and population-level selection. Deviance information criterion (DIC), a Bayesian alternative to AIC that is sample-size specific, is used for model selection. Aerial radiolocation data from 76 adult female caribou (Rangifer tarandus) and calf pairs during 1 year on an Arctic coastal plain calving ground were used to illustrate models and assess population-level selection of landscape attributes, as well as individual heterogeneity of selection. Landscape attributes included elevation, NDVI (a measure of forage greenness), and land cover-type classification. Results from the first of a 2-stage model-selection procedure indicated that there is substantial heterogeneity among cow-calf pairs with respect to selection of the landscape attributes. In the second stage, selection of models with heterogeneity included indicated that at the population-level, NDVI and land cover class were significant attributes for selection of different landscapes by pairs on the calving ground. Population-level selection coefficients indicate that the pairs generally select landscapes with higher levels of NDVI, but the relationship is quadratic. The highest rate of selection occurs at values of NDVI less than the maximum observed. Results for land cover-class selection coefficients indicate that wet sedge, moist sedge, herbaceous tussock tundra, and shrub tussock tundra are selected at approximately the same rate, while alpine and sparsely vegetated landscapes are selected at a lower rate. Furthermore, the variability in selection by individual caribou for moist sedge and sparsely vegetated landscapes is large relative to the variability in selection of other land cover types. The example analysis illustrates that, while sometimes computationally intense, a

  5. A simple algorithm to estimate genetic variance in an animal threshold model using Bayesian inference Genetics Selection Evolution 2010, 42:29

    DEFF Research Database (Denmark)

    Ødegård, Jørgen; Meuwissen, Theo HE; Heringstad, Bjørg

    2010-01-01

    Background In the genetic analysis of binary traits with one observation per animal, animal threshold models frequently give biased heritability estimates. In some cases, this problem can be circumvented by fitting sire- or sire-dam models. However, these models are not appropriate in cases where individual records exist on parents. Therefore, the aim of our study was to develop a new Gibbs sampling algorithm for a proper estimation of genetic (co)variance components within an animal threshold model framework. Methods In the proposed algorithm, individuals are classified as either "informative ... (... records exist for the parents). Furthermore, the new algorithm showed much faster Markov chain mixing properties for genetic parameters (similar to the sire-dam model). Conclusions The new algorithm to estimate genetic parameters via Gibbs sampling solves the bias problems typically occurring in animal ...

  6. TH-EF-BRA-08: A Novel Technique for Estimating Volumetric Cine MRI (VC-MRI) From Multi-Slice Sparsely Sampled Cine Images Using Motion Modeling and Free Form Deformation

    International Nuclear Information System (INIS)

    Harris, W; Yin, F; Wang, C; Chang, Z; Cai, J; Zhang, Y; Ren, L

    2016-01-01

    Purpose: To develop a technique to estimate on-board VC-MRI using multi-slice sparsely-sampled cine images, patient prior 4D-MRI, motion-modeling and free-form deformation for real-time 3D target verification of lung radiotherapy. Methods: A previous method has been developed to generate on-board VC-MRI by deforming prior MRI images based on a motion model (MM) extracted from prior 4D-MRI and a single-slice on-board 2D-cine image. In this study, free-form deformation (FD) was introduced to correct for errors in the MM when large anatomical changes exist. Multiple-slice sparsely-sampled on-board 2D-cine images located within the target are used to improve both the estimation accuracy and temporal resolution of VC-MRI. The on-board 2D-cine MRIs are acquired at 20–30 frames/s by sampling only 10% of the k-space on a Cartesian grid, with 85% of that taken at the central k-space. The method was evaluated using XCAT (computerized patient model) simulation of lung cancer patients with various anatomical and respirational changes from prior 4D-MRI to onboard volume. The accuracy was evaluated using Volume-Percent-Difference (VPD) and Center-of-Mass-Shift (COMS) of the estimated tumor volume. Effects of region-of-interest (ROI) selection, 2D-cine slice orientation, slice number and slice location on the estimation accuracy were evaluated. Results: VCMRI estimated using 10 sparsely-sampled sagittal 2D-cine MRIs achieved VPD/COMS of 9.07±3.54%/0.45±0.53mm among all scenarios based on estimation with ROI_MM-ROI_FD. The FD optimization improved estimation significantly for scenarios with anatomical changes. Using ROI-FD achieved better estimation than global-FD. Changing the multi-slice orientation to axial, coronal, and axial/sagittal orthogonal reduced the accuracy of VCMRI to VPD/COMS of 19.47±15.74%/1.57±2.54mm, 20.70±9.97%/2.34±0.92mm, and 16.02±13.79%/0.60±0.82mm, respectively. Reducing the number of cines to 8 enhanced temporal resolution of VC-MRI by 25% while

  7. TH-EF-BRA-08: A Novel Technique for Estimating Volumetric Cine MRI (VC-MRI) From Multi-Slice Sparsely Sampled Cine Images Using Motion Modeling and Free Form Deformation

    Energy Technology Data Exchange (ETDEWEB)

    Harris, W; Yin, F; Wang, C; Chang, Z; Cai, J; Zhang, Y; Ren, L [Duke University Medical Center, Durham, NC (United States)

    2016-06-15

    Purpose: To develop a technique to estimate on-board VC-MRI using multi-slice sparsely-sampled cine images, patient prior 4D-MRI, motion-modeling and free-form deformation for real-time 3D target verification of lung radiotherapy. Methods: A previous method has been developed to generate on-board VC-MRI by deforming prior MRI images based on a motion model (MM) extracted from prior 4D-MRI and a single-slice on-board 2D-cine image. In this study, free-form deformation (FD) was introduced to correct for errors in the MM when large anatomical changes exist. Multiple-slice sparsely-sampled on-board 2D-cine images located within the target are used to improve both the estimation accuracy and temporal resolution of VC-MRI. The on-board 2D-cine MRIs are acquired at 20–30 frames/s by sampling only 10% of the k-space on a Cartesian grid, with 85% of that taken at the central k-space. The method was evaluated using XCAT (computerized patient model) simulation of lung cancer patients with various anatomical and respirational changes from prior 4D-MRI to onboard volume. The accuracy was evaluated using Volume-Percent-Difference (VPD) and Center-of-Mass-Shift (COMS) of the estimated tumor volume. Effects of region-of-interest (ROI) selection, 2D-cine slice orientation, slice number and slice location on the estimation accuracy were evaluated. Results: VCMRI estimated using 10 sparsely-sampled sagittal 2D-cine MRIs achieved VPD/COMS of 9.07±3.54%/0.45±0.53mm among all scenarios based on estimation with ROI-MM-ROI-FD. The FD optimization improved estimation significantly for scenarios with anatomical changes. Using ROI-FD achieved better estimation than global-FD. Changing the multi-slice orientation to axial, coronal, and axial/sagittal orthogonal reduced the accuracy of VCMRI to VPD/COMS of 19.47±15.74%/1.57±2.54mm, 20.70±9.97%/2.34±0.92mm, and 16.02±13.79%/0.60±0.82mm, respectively. Reducing the number of cines to 8 enhanced temporal resolution of VC-MRI by 25% while

  8. Monte Carlo Simulation Of The Portfolio-Balance Model Of Exchange Rates: Finite Sample Properties Of The GMM Estimator

    OpenAIRE

    Hong-Ghi Min

    2011-01-01

    Using Monte Carlo simulation of the portfolio-balance model of exchange rates, we report finite sample properties of the GMM estimator for testing over-identifying restrictions in the simultaneous equations model. The F-form of Sargan's statistic performs better than its chi-squared form, while Hansen's GMM statistic has the smallest bias.

  9. An efficient modularized sample-based method to estimate the first-order Sobol' index

    International Nuclear Information System (INIS)

    Li, Chenzhao; Mahadevan, Sankaran

    2016-01-01

    Sobol' index is a prominent methodology in global sensitivity analysis. This paper aims to directly estimate the Sobol' index based only on available input–output samples, even if the underlying model is unavailable. For this purpose, a new method to calculate the first-order Sobol' index is proposed. The innovation is that the conditional variance and mean in the formula of the first-order index are calculated at an unknown but existing location of model inputs, instead of an explicit user-defined location. The proposed method is modularized in two aspects: 1) index calculations for different model inputs are separate and use the same set of samples; and 2) model input sampling, model evaluation, and index calculation are separate. Due to this modularization, the proposed method is capable of computing the first-order index if only input–output samples are available but the underlying model is unavailable, and its computational cost is not proportional to the dimension of the model inputs. In addition, the proposed method can also estimate the first-order index with correlated model inputs. Considering that the first-order index is a desired metric to rank model inputs but current methods can only handle independent model inputs, the proposed method contributes to filling this gap. - Highlights: • An efficient method to estimate the first-order Sobol' index. • Estimate the index from input–output samples directly. • Computational cost is not proportional to the number of model inputs. • Handle both uncorrelated and correlated model inputs.
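
    A simple binning approximation of the first-order index from given input-output samples is sketched below; it estimates Var(E[Y|Xi])/Var(Y) by conditioning on quantile bins of Xi and is not the record's exact modularized algorithm. The toy model and sample sizes are arbitrary.

```python
import numpy as np

def first_order_sobol(x_i, y, n_bins=50):
    """Approximate S_i = Var(E[Y|X_i]) / Var(Y) from samples by binning X_i."""
    x_i, y = np.asarray(x_i), np.asarray(y)
    edges = np.quantile(x_i, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x_i, side="right") - 1, 0, n_bins - 1)
    bins = [b for b in range(n_bins) if np.any(idx == b)]
    means = np.array([y[idx == b].mean() for b in bins])
    weights = np.array([np.mean(idx == b) for b in bins])
    var_cond_mean = np.sum(weights * (means - np.sum(weights * means)) ** 2)
    return var_cond_mean / y.var()

# Toy model Y = X1 + 0.5*X2 with X1, X2 ~ U(0,1): analytic S1 = 0.8, S2 = 0.2.
rng = np.random.default_rng(5)
x = rng.uniform(size=(100_000, 2))
y = x[:, 0] + 0.5 * x[:, 1]
print(first_order_sobol(x[:, 0], y), first_order_sobol(x[:, 1], y))
```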

  10. Network Model-Assisted Inference from Respondent-Driven Sampling Data

    Science.gov (United States)

    Gile, Krista J.; Handcock, Mark S.

    2015-01-01

    Summary Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population. PMID:26640328

  11. Estimation of breeding values using selected pedigree records.

    Science.gov (United States)

    Morton, Richard; Howarth, Jordan M

    2005-06-01

    Fish bred in tanks or ponds cannot be easily tagged individually. The parentage of any individual may be determined by DNA fingerprinting, but is sufficiently expensive that large numbers cannot be so finger-printed. The measurement of the objective trait can be made on a much larger sample relatively cheaply. This article deals with experimental designs for selecting individuals to be finger-printed and for the estimation of the individual and family breeding values. The general setup provides estimates for both genetic effects regarded as fixed or random and for fixed effects due to known regressors. The family effects can be well estimated when even very small numbers are finger-printed, provided that they are the individuals with the most extreme phenotypes.

  12. Estimating uncertainty in multivariate responses to selection.

    Science.gov (United States)

    Stinchcombe, John R; Simonsen, Anna K; Blows, Mark W

    2014-04-01

    Predicting the responses to natural selection is one of the key goals of evolutionary biology. Two of the challenges in fulfilling this goal have been the realization that many estimates of natural selection might be highly biased by environmentally induced covariances between traits and fitness, and that many estimated responses to selection do not incorporate or report uncertainty in the estimates. Here we describe the application of a framework that blends the merits of the Robertson-Price Identity approach and the multivariate breeder's equation to address these challenges. The approach allows genetic covariance matrices, selection differentials, selection gradients, and responses to selection to be estimated without environmentally induced bias, direct and indirect selection and responses to selection to be distinguished, and if implemented in a Bayesian-MCMC framework, statistically robust estimates of uncertainty on all of these parameters to be made. We illustrate our approach with a worked example of previously published data. More generally, we suggest that applying both the Robertson-Price Identity and the multivariate breeder's equation will facilitate hypothesis testing about natural selection, genetic constraints, and evolutionary responses. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.
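
    The multivariate breeder's equation referred to above has a direct numerical form, sketched below with hypothetical values; in the Bayesian-MCMC framework of the record, the same computation would be repeated over posterior draws of G and the selection gradients to obtain uncertainty intervals on the response.

```python
import numpy as np

# Multivariate breeder's equation: delta_z = G @ beta.
G = np.array([[0.50, 0.20],
              [0.20, 0.30]])        # hypothetical additive genetic (co)variance matrix
beta = np.array([0.40, -0.10])      # hypothetical selection gradients

delta_z = G @ beta                  # predicted per-generation response of each trait
print(delta_z)
```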

  13. A probabilistic method for testing and estimating selection differences between populations.

    Science.gov (United States)

    He, Yungang; Wang, Minxian; Huang, Xin; Li, Ran; Xu, Hongyang; Xu, Shuhua; Jin, Li

    2015-12-01

    Human populations around the world encounter various environmental challenges and, consequently, develop genetic adaptations to different selection forces. Identifying the differences in natural selection between populations is critical for understanding the roles of specific genetic variants in evolutionary adaptation. Although numerous methods have been developed to detect genetic loci under recent directional selection, a probabilistic solution for testing and quantifying selection differences between populations is lacking. Here we report the development of a probabilistic method for testing and estimating selection differences between populations. By use of a probabilistic model of genetic drift and selection, we showed that logarithm odds ratios of allele frequencies provide estimates of the differences in selection coefficients between populations. The estimates approximate a normal distribution, and variance can be estimated using genome-wide variants. This allows us to quantify differences in selection coefficients and to determine the confidence intervals of the estimate. Our work also revealed the link between genetic association testing and hypothesis testing of selection differences. It therefore supplies a solution for hypothesis testing of selection differences. This method was applied to a genome-wide data analysis of Han and Tibetan populations. The results confirmed that both the EPAS1 and EGLN1 genes are under statistically different selection in Han and Tibetan populations. We further estimated differences in the selection coefficients for genetic variants involved in melanin formation and determined their confidence intervals between continental population groups. Application of the method to empirical data demonstrated the outstanding capability of this novel approach for testing and quantifying differences in natural selection. © 2015 He et al.; Published by Cold Spring Harbor Laboratory Press.
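
    The core quantity described above, the log odds ratio of allele frequencies between two populations, can be computed as in the sketch below with hypothetical counts; note that the record estimates the variance from genome-wide variants to absorb drift, whereas this sketch uses only the standard Wald variance for a single locus.

```python
import numpy as np
from scipy import stats

def selection_difference(p1, p2, n1, n2):
    """Log odds ratio of allele frequencies and a Wald-type z-test.
    n1, n2 are the numbers of sampled chromosomes in each population."""
    a1, a2 = p1 * n1, p2 * n2                                 # allele counts
    log_or = np.log(a1 / (n1 - a1)) - np.log(a2 / (n2 - a2))
    var = 1.0 / a1 + 1.0 / (n1 - a1) + 1.0 / a2 + 1.0 / (n2 - a2)
    z = log_or / np.sqrt(var)
    return log_or, var, 2.0 * stats.norm.sf(abs(z))

print(selection_difference(p1=0.85, p2=0.35, n1=200, n2=200))  # hypothetical frequencies
```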

  14. Selection of the Sample for Data-Driven $Z \\to \

    CERN Document Server

    Krauss, Martin

    2009-01-01

    The topic of this study was to improve the selection of the sample for data-driven Z → νν̄ background estimation, which is a major contribution in supersymmetric searches in a no-lepton search mode. The data is based on Z → ℓ+ℓ− samples created with ATLAS simulation software. This method works if two leptons are reconstructed, but using cuts that are typical for SUSY searches the reconstruction efficiency for electrons and muons is rather low. For this reason it was tried to enhance the data sample. Therefore events were considered where only one electron was reconstructed. In this case the invariant mass for the electron and each jet was computed to select the jet with the best match for the Z boson mass as the not reconstructed electron. This way the sample can be extended but significantly loses purity because of also reconstructed background events. To improve this method other variables have to be considered which were not available for this study. Applying a similar method to muons using ...

  15. Selection of the optimal Box-Cox transformation parameter for modelling and forecasting age-specific fertility

    OpenAIRE

    Shang, Han Lin

    2015-01-01

    The Box-Cox transformation can sometimes yield noticeable improvements in model simplicity, variance homogeneity and precision of estimation, such as in modelling and forecasting age-specific fertility. Despite its importance, there have been few studies focusing on the optimal selection of Box-Cox transformation parameters in demographic forecasting. A simple method is proposed for selecting the optimal Box-Cox transformation parameter, along with an algorithm based on an in-sample forecast ...
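
    Two ways of choosing the Box-Cox parameter are sketched below on simulated data: the usual maximum-likelihood choice from scipy, and a grid search minimising one-step-ahead in-sample forecast error of a simple expanding-mean forecaster applied on the transformed scale. The forecaster and grid are assumptions for illustration, not the record's algorithm.

```python
import numpy as np
from scipy import stats, special

rng = np.random.default_rng(6)
y = rng.lognormal(mean=0.0, sigma=0.6, size=200)   # positive, skewed toy series

# Maximum-likelihood Box-Cox parameter.
_, lam_mle = stats.boxcox(y)

def forecast_rmse(y, lam):
    z = stats.boxcox(y, lmbda=lam)
    # Expanding-mean forecast on the transformed scale, back-transformed.
    pred = special.inv_boxcox(np.cumsum(z)[:-1] / np.arange(1, len(z)), lam)
    return np.sqrt(np.mean((y[1:] - pred) ** 2))

grid = np.linspace(-1.0, 2.0, 61)
lam_forecast = grid[np.argmin([forecast_rmse(y, lam) for lam in grid])]
print(lam_mle, lam_forecast)
```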

  16. Verification Techniques for Parameter Selection and Bayesian Model Calibration Presented for an HIV Model

    Science.gov (United States)

    Wentworth, Mami Tonoe

    techniques for model calibration. For Bayesian model calibration, we employ adaptive Metropolis algorithms to construct densities for input parameters in the heat model and the HIV model. To quantify the uncertainty in the parameters, we employ two MCMC algorithms: Delayed Rejection Adaptive Metropolis (DRAM) [33] and Differential Evolution Adaptive Metropolis (DREAM) [66, 68]. The densities obtained using these methods are compared to those obtained through the direct numerical evaluation of the Bayes' formula. We also combine uncertainties in input parameters and measurement errors to construct predictive estimates for a model response. A significant emphasis is on the development and illustration of techniques to verify the accuracy of sampling-based Metropolis algorithms. We verify the accuracy of DRAM and DREAM by comparing chains, densities and correlations obtained using DRAM, DREAM and the direct evaluation of Bayes formula. We also perform similar analysis for credible and prediction intervals for responses. Once the parameters are estimated, we employ energy statistics test [63, 64] to compare the densities obtained by different methods for the HIV model. The energy statistics are used to test the equality of distributions. We also consider parameter selection and verification techniques for models having one or more parameters that are noninfluential in the sense that they minimally impact model outputs. We illustrate these techniques for a dynamic HIV model but note that the parameter selection and verification framework is applicable to a wide range of biological and physical models. To accommodate the nonlinear input to output relations, which are typical for such models, we focus on global sensitivity analysis techniques, including those based on partial correlations, Sobol indices based on second-order model representations, and Morris indices, as well as a parameter selection technique based on standard errors. A significant objective is to provide

  17. Dealing with selection bias in educational transition models

    DEFF Research Database (Denmark)

    Holm, Anders; Jæger, Mads Meier

    2011-01-01

    This paper proposes the bivariate probit selection model (BPSM) as an alternative to the traditional Mare model for analyzing educational transitions. The BPSM accounts for selection on unobserved variables by allowing for unobserved variables which affect the probability of making educational transitions to be correlated across transitions. We use simulated and real data to illustrate how the BPSM improves on the traditional Mare model in terms of correcting for selection bias and providing credible estimates of the effect of family background on educational success. We conclude that models which account for selection on unobserved variables and high-quality data are both required in order to estimate credible educational transition models.

  18. Estimating fluvial wood discharge from timelapse photography with varying sampling intervals

    Science.gov (United States)

    Anderson, N. K.

    2013-12-01

    There is recent focus on calculating wood budgets for streams and rivers to help inform management decisions, ecological studies and carbon/nutrient cycling models. Most work has measured in situ wood in temporary storage along stream banks or estimated wood inputs from banks. Little effort has been devoted to monitoring and quantifying wood in transport during high flows. This paper outlines a procedure for estimating total seasonal wood loads using non-continuous coarse interval sampling and examines differences in estimation between sampling at 1, 5, 10 and 15 minutes. Analysis is performed on wood transport for the Slave River in Northwest Territories, Canada. Relative to the 1 minute dataset, precision decreased by 23%, 46% and 60% for the 5, 10 and 15 minute datasets, respectively. Five and 10 minute sampling intervals provided unbiased equal variance estimates of 1 minute sampling, whereas 15 minute intervals were biased towards underestimation by 6%. Stratifying estimates by day and by discharge increased precision over non-stratification by 4% and 3%, respectively. Not including wood transported during ice break-up, the total minimum wood load estimated at this site is 3300 ± 800 m3 for the 2012 runoff season. The vast majority of the imprecision in total wood volumes came from variance in estimating average volume per log. (Figure caption: comparison of proportions and variance across sample intervals, using bootstrap sampling to achieve equal n; each trial was sampled with n = 100, repeated 10,000 times and averaged; dashed lines represent values from the one-minute dataset.)
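
    The effect of coarser sampling intervals can be mimicked as in the sketch below: a hypothetical 1-minute series of transported wood volumes is subsampled at 5, 10 and 15 minute spacings, scaled up to a total, and the spread across phase offsets shows the loss of precision. The series itself is simulated, not the Slave River data.

```python
import numpy as np

rng = np.random.default_rng(7)
minutes = 12 * 60                                              # a 12-hour high-flow window
volume_1min = rng.gamma(shape=0.3, scale=0.05, size=minutes)   # hypothetical m^3 per minute

true_total = volume_1min.sum()
for k in (5, 10, 15):
    # One observation every k minutes, scaled by k; loop over all phase offsets
    # to show the variability introduced by the coarser interval.
    estimates = [volume_1min[offset::k].sum() * k for offset in range(k)]
    print(k, round(true_total, 2), round(np.mean(estimates), 2), round(np.std(estimates), 2))
```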

  19. Comparison of Three Plot Selection Methods for Estimating Change in Temporally Variable, Spatially Clustered Populations.

    Energy Technology Data Exchange (ETDEWEB)

    Thompson, William L. [Bonneville Power Administration, Portland, OR (US). Environment, Fish and Wildlife

    2001-07-01

    Monitoring population numbers is important for assessing trends and meeting various legislative mandates. However, sampling across time introduces a temporal aspect to survey design in addition to the spatial one. For instance, a sample that is initially representative may lose this attribute if there is a shift in numbers and/or spatial distribution in the underlying population that is not reflected in later sampled plots. Plot selection methods that account for this temporal variability will produce the best trend estimates. Consequently, I used simulation to compare bias and relative precision of estimates of population change among stratified and unstratified sampling designs based on permanent, temporary, and partial replacement plots under varying levels of spatial clustering, density, and temporal shifting of populations. Permanent plots produced more precise estimates of change than temporary plots across all factors. Further, permanent plots performed better than partial replacement plots except for high density (5 and 10 individuals per plot) and 25% - 50% shifts in the population. Stratified designs always produced less precise estimates of population change for all three plot selection methods, and often produced biased change estimates and greatly inflated variance estimates under sampling with partial replacement. Hence, stratification that remains fixed across time should be avoided when monitoring populations that are likely to exhibit large changes in numbers and/or spatial distribution during the study period. Key words: bias; change estimation; monitoring; permanent plots; relative precision; sampling with partial replacement; temporary plots.

  20. Estimation of uranium in bioassay samples of occupational workers by laser fluorimetry

    International Nuclear Information System (INIS)

    Suja, A.; Prabhu, S.P.; Sawant, P.D.; Sarkar, P.K.; Tiwari, A.K.; Sharma, R.

    2010-01-01

    A newly established uranium processing facility has been commissioned at BARC, Trombay. Monitoring of occupational workers at regular intervals is essential to assess intake of uranium by the workers in this facility. The design and engineering safety features of the plant are such that there is a very low probability of uranium becoming airborne during normal operations. However, leakages from the system during routine maintenance of the plant may result in intake of uranium by workers. As per the new biokinetic model for uranium, 63% of uranium entering the blood stream gets directly excreted in urine. Therefore, bioassay monitoring (urinalysis) was recommended for these workers. A group of 21 workers was selected for bioassay monitoring to assess the existing urinary excretion levels of uranium before the commencement of actual work. For this purpose, a sample collection kit along with an instruction slip was provided to the workers. Bioassay samples received were wet ashed with conc. nitric acid and hydrogen peroxide to break down the metabolized complexes of uranium, and the uranium was co-precipitated with calcium phosphate. Separation of uranium from the matrix was done using an ion exchange technique, and final activity quantification in these samples was done using a laser fluorimeter (Quantalase, Model No. NFL/02). Calibration of the laser fluorimeter is done using a 10 ppb uranium standard (WHO, France Ref. No. 180000). Verification of the system performance is done by measuring the concentration of uranium in the standards (1 ppb to 100 ppb). The standard addition method was followed for estimation of uranium concentration in the samples. Uranyl ions present in the sample are excited by a pulsed nitrogen laser at 337.1 nm and, on de-excitation, emit fluorescence light (540 nm) whose intensity is measured by the PMT. To estimate the uranium in the bioassay samples, a known aliquot of the sample was mixed with 5% sodium pyrophosphate and fluorescence intensity was measured
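
    The standard addition step mentioned above follows a simple dilution-corrected formula; the sketch below applies it to hypothetical fluorescence readings and volumes, assuming a single spike of a known uranium standard.

```python
def standard_addition(i_sample, i_spiked, c_std, v_sample, v_std):
    """Concentration of the unknown from a single standard addition.
    i_sample: intensity of the sample alone (volume v_sample);
    i_spiked: intensity after adding v_std of standard at concentration c_std."""
    return i_sample * c_std * v_std / (i_spiked * (v_sample + v_std) - i_sample * v_sample)

# Hypothetical readings: 5 mL sample, spiked with 1 mL of a 10 ppb standard.
print(standard_addition(i_sample=120.0, i_spiked=260.0, c_std=10.0,
                        v_sample=5.0, v_std=1.0))   # estimated uranium, ppb
```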

  1. Regression model development and computational procedures to support estimation of real-time concentrations and loads of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-9

    Science.gov (United States)

    Lee, Michael T.; Asquith, William H.; Oden, Timothy D.

    2012-01-01

    In December 2005, the U.S. Geological Survey (USGS), in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (Escherichia coli and total coliform), atrazine, and suspended sediment at two USGS streamflow-gaging stations that represent watersheds contributing to Lake Houston (08068500 Spring Creek near Spring, Tex., and 08070200 East Fork San Jacinto River near New Caney, Tex.). Data from the discrete water-quality samples collected during 2005–9, in conjunction with continuously monitored real-time data that included streamflow and other physical water-quality properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), were used to develop regression models for the estimation of concentrations of water-quality constituents of substantial source watersheds to Lake Houston. The potential explanatory variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, and time (to account for seasonal variations inherent in some water-quality data). The response variables (the selected constituents) at each site were nitrite plus nitrate nitrogen, total phosphorus, total organic carbon, E. coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities to serve as potential surrogate variables to estimate concentrations of the selected constituents through statistical regression. Statistical regression also facilitates accompanying estimates of uncertainty in the form of prediction intervals. Each regression model potentially can be used to estimate concentrations of a given constituent in real time. Among other regression diagnostics, the diagnostics used as indicators of general model reliability and reported herein include the adjusted R-squared, the residual standard error, residual plots, and p-values. Adjusted R-squared values for the Spring Creek models ranged
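
    A minimal sketch of a surrogate regression of this kind is shown below with statsmodels; the file, column names and the 90% interval level are assumptions, and the retransformation bias correction that would normally accompany a log-linear model is omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical discrete-sample data: constituent concentration plus surrogate
# variables recorded by the continuous real-time monitors.
df = pd.read_csv("spring_creek_samples.csv")

model = smf.ols("np.log(nitrate) ~ np.log(discharge) + np.log(turbidity) + specific_conductance",
                data=df).fit()
print(model.rsquared_adj, model.pvalues)

# Real-time estimate with a 90% prediction interval for one monitored record.
new = pd.DataFrame({"discharge": [350.0], "turbidity": [42.0], "specific_conductance": [310.0]})
pred = model.get_prediction(new).summary_frame(alpha=0.10)
print(np.exp(pred[["mean", "obs_ci_lower", "obs_ci_upper"]]))   # back-transformed, bias not corrected
```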

  2. Optimal Tuner Selection for Kalman Filter-Based Aircraft Engine Performance Estimation

    Science.gov (United States)

    Simon, Donald L.; Garg, Sanjay

    2010-01-01

    A linear point design methodology for minimizing the error in on-line Kalman filter-based aircraft engine performance estimation applications is presented. This technique specifically addresses the underdetermined estimation problem, where there are more unknown parameters than available sensor measurements. A systematic approach is applied to produce a model tuning parameter vector of appropriate dimension to enable estimation by a Kalman filter, while minimizing the estimation error in the parameters of interest. Tuning parameter selection is performed using a multi-variable iterative search routine which seeks to minimize the theoretical mean-squared estimation error. This paper derives theoretical Kalman filter estimation error bias and variance values at steady-state operating conditions, and presents the tuner selection routine applied to minimize these values. Results from the application of the technique to an aircraft engine simulation are presented and compared to the conventional approach of tuner selection. Experimental simulation results are found to be in agreement with theoretical predictions. The new methodology is shown to yield a significant improvement in on-line engine performance estimation accuracy

  3. Design-based estimators for snowball sampling

    OpenAIRE

    Shafie, Termeh

    2010-01-01

    Snowball sampling, where existing study subjects recruit further subjects from among their acquaintances, is a popular approach when sampling from hidden populations. Since people with many in-links are more likely to be selected, there will be a selection bias in the samples obtained. In order to eliminate this bias, the sample data must be weighted. However, the exact selection probabilities are unknown for snowball samples and need to be approximated in an appropriate way. This paper proposes d...

  4. Sensor Selection for Aircraft Engine Performance Estimation and Gas Path Fault Diagnostics

    Science.gov (United States)

    Simon, Donald L.; Rinehart, Aidan W.

    2016-01-01

    This paper presents analytical techniques for aiding system designers in making aircraft engine health management sensor selection decisions. The presented techniques, which are based on linear estimation and probability theory, are tailored for gas turbine engine performance estimation and gas path fault diagnostics applications. They enable quantification of the performance estimation and diagnostic accuracy offered by different candidate sensor suites. For performance estimation, sensor selection metrics are presented for two types of estimators including a Kalman filter and a maximum a posteriori estimator. For each type of performance estimator, sensor selection is based on minimizing the theoretical sum of squared estimation errors in health parameters representing performance deterioration in the major rotating modules of the engine. For gas path fault diagnostics, the sensor selection metric is set up to maximize correct classification rate for a diagnostic strategy that performs fault classification by identifying the fault type that most closely matches the observed measurement signature in a weighted least squares sense. Results from the application of the sensor selection metrics to a linear engine model are presented and discussed. Given a baseline sensor suite and a candidate list of optional sensors, an exhaustive search is performed to determine the optimal sensor suites for performance estimation and fault diagnostics. For any given sensor suite, Monte Carlo simulation results are found to exhibit good agreement with theoretical predictions of estimation and diagnostic accuracies.
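
    The exhaustive-search idea can be illustrated with a toy linear model, as in the sketch below: for every candidate sensor subset of a fixed size, the trace of the posterior (maximum a posteriori) error covariance is computed and the subset with the smallest value is kept. All matrices and sizes are random placeholders, not an engine model.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(8)
n_health, n_sensors, suite_size = 4, 8, 5
H = rng.normal(size=(n_sensors, n_health))          # sensor sensitivities to health parameters
P0 = 0.1 * np.eye(n_health)                         # prior covariance of health deltas
R = np.diag(rng.uniform(0.01, 0.05, n_sensors))     # sensor noise variances

def estimation_error(idx):
    """Sum of squared estimation errors (trace of the posterior covariance)
    when only the sensors in idx are used."""
    Hs = H[list(idx)]
    Rs = R[np.ix_(list(idx), list(idx))]
    P = np.linalg.inv(np.linalg.inv(P0) + Hs.T @ np.linalg.inv(Rs) @ Hs)
    return np.trace(P)

best = min(combinations(range(n_sensors), suite_size), key=estimation_error)
print(best, estimation_error(best))
```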

  5. A Heckman selection model for the safety analysis of signalized intersections.

    Directory of Open Access Journals (Sweden)

    Xuecai Xu

    Full Text Available The objective of this paper is to provide a new method for estimating crash rate and severity simultaneously. This study explores a Heckman selection model of the crash rate and severity at different levels, and a two-step procedure is used to investigate the crash rate and severity levels. The first step uses a probit regression model to determine the sample selection process, and the second step develops a multiple regression model to simultaneously evaluate the crash rate and severity for slight injury and kill or serious injury (KSI), respectively. The model uses 555 observations from 262 signalized intersections in the Hong Kong metropolitan area, integrated with information on the traffic flow, geometric road design, road environment, traffic control and any crashes that occurred during two years. The results of the proposed two-step Heckman selection model illustrate the necessity of different crash rates for different crash severity levels. A comparison with existing approaches suggests that the Heckman selection model offers an efficient and convenient alternative method for evaluating the safety performance at signalized intersections.
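
    A generic two-step (Heckit-style) sketch with statsmodels is given below: a probit selection equation, the inverse Mills ratio computed from its linear predictor, and an outcome regression on the selected observations with the Mills ratio added as a regressor. The file, column names and variable choices are hypothetical and do not reproduce the record's specification.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

df = pd.read_csv("intersections.csv")    # hypothetical intersection-level data

# Step 1: probit selection equation (e.g., whether a KSI crash was observed).
X_sel = sm.add_constant(df[["traffic_flow", "num_lanes", "signal_cycle"]])
probit = sm.Probit(df["ksi_observed"], X_sel).fit()
xb = probit.fittedvalues                 # linear predictor X*beta
df["imr"] = norm.pdf(xb) / norm.cdf(xb)  # inverse Mills ratio

# Step 2: outcome regression on the selected sample, correcting for selection.
sel = df[df["ksi_observed"] == 1]
X_out = sm.add_constant(sel[["traffic_flow", "num_lanes", "imr"]])
print(sm.OLS(np.log(sel["crash_rate"]), X_out).fit().summary())
```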

  6. Predictive ability of genomic selection models for breeding value estimation on growth traits of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai

    2017-09-01

    Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding value (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population consisting of 205 individuals, which were genotyped for 6 359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs for body length and body weight predicted by the different models was 0.296 and 0.411, respectively. For each trait, the performances of the three models were very similar to each other with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near to zero bias for the predictions. Therefore, when GS was applied in a L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that improved estimation of the genomic prediction could be realized by increasing the size of the training population as well as the density of SNPs.
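
    To make the ridge-regression BLUP (RR-BLUP) idea concrete, the following is a minimal sketch on simulated genotypes rather than the authors' pipeline: marker effects are shrunk jointly by ridge regression, GEBVs are computed for a held-out validation set, and predictive ability and bias are summarized by the correlation with phenotypes and the regression slope, respectively. The marker count, variance ratio and QTL effects are all invented.

```python
import numpy as np

def rr_blup(Z, y, lam):
    """Ridge-regression BLUP of marker effects: u = (Z'Z + lam*I)^-1 Z'(y - ybar)."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y - y.mean()))

rng = np.random.default_rng(2)
n, p, n_qtl = 205, 1000, 50                       # n loosely mimics the study; otherwise arbitrary
Z = rng.binomial(2, 0.3, size=(n, p)).astype(float)
u_true = np.zeros(p)
u_true[rng.choice(p, n_qtl, replace=False)] = rng.normal(0, 0.3, n_qtl)
y = Z @ u_true + rng.normal(0, 1.0, n)

train = rng.random(n) < 0.8                       # simple hold-out validation split
lam = 100.0                                       # residual-to-marker variance ratio; normally estimated (e.g., by REML)
u_hat = rr_blup(Z[train], y[train], lam)
gebv = Z[~train] @ u_hat                          # genomic estimated breeding values

r = np.corrcoef(gebv, y[~train])[0, 1]            # predictive ability
slope = np.polyfit(gebv, y[~train], 1)[0]         # regression coefficient near 1 means little bias
print(f"predictive correlation: {r:.3f}, regression slope: {slope:.3f}")
```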

  7. Graph Sampling for Covariance Estimation

    KAUST Repository

    Chepuri, Sundeep Prabhakar

    2017-04-25

    In this paper the focus is on subsampling as well as reconstructing the second-order statistics of signals residing on nodes of arbitrary undirected graphs. Second-order stationary graph signals may be obtained by graph filtering zero-mean white noise and they admit a well-defined power spectrum whose shape is determined by the frequency response of the graph filter. Estimating the graph power spectrum forms an important component of stationary graph signal processing and related inference tasks such as Wiener prediction or inpainting on graphs. The central result of this paper is that by sampling a significantly smaller subset of vertices and using simple least squares, we can reconstruct the second-order statistics of the graph signal from the subsampled observations, and more importantly, without any spectral priors. To this end, both a nonparametric approach as well as parametric approaches including moving average and autoregressive models for the graph power spectrum are considered. The results specialize for undirected circulant graphs in that the graph nodes leading to the best compression rates are given by the so-called minimal sparse rulers. A near-optimal greedy algorithm is developed to design the subsampling scheme for the non-parametric and the moving average models, whereas a particular subsampling scheme that allows linear estimation for the autoregressive model is proposed. Numerical experiments on synthetic as well as real datasets related to climatology and processing handwritten digits are provided to demonstrate the developed theory.

  8. Model Selection in Continuous Test Norming With GAMLSS.

    Science.gov (United States)

    Voncken, Lieke; Albers, Casper J; Timmerman, Marieke E

    2017-06-01

    To compute norms from reference group test scores, continuous norming is preferred over traditional norming. A suitable continuous norming approach for continuous data is the use of the Box-Cox Power Exponential model, which is found in the generalized additive models for location, scale, and shape. Applying the Box-Cox Power Exponential model for test norming requires model selection, but it is unknown how well this can be done with an automatic selection procedure. In a simulation study, we compared the performance of two stepwise model selection procedures combined with four model-fit criteria (Akaike information criterion, Bayesian information criterion, generalized Akaike information criterion (3), cross-validation), varying data complexity, sampling design, and sample size in a fully crossed design. The new procedure combined with one of the generalized Akaike information criteria was the most efficient model selection procedure (i.e., required the smallest sample size). The advocated model selection procedure is illustrated with norming data of an intelligence test.

  9. Application of Bayesian methods to habitat selection modeling of the northern spotted owl in California: new statistical methods for wildlife research

    Science.gov (United States)

    Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk

    2005-01-01

    We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...

  10. Estimating Sample Size for Usability Testing

    Directory of Open Access Journals (Sweden)

    Alex Cazañas

    2017-02-01

    Full Text Available One strategy used to assure that an interface meets user requirements is to conduct usability testing. When conducting such testing, one of the unknowns is the sample size. Since extensive testing is costly, minimizing the number of participants can contribute greatly to successful resource management of a project. Even though a significant number of models have been proposed to estimate sample size in usability testing, there is still no consensus on the optimal size. Several studies claim that 3 to 5 users suffice to uncover 80% of problems in a software interface. However, many other studies challenge this assertion. This study analyzed data collected from the user testing of a web application to verify the rule of thumb, commonly known as the “magic number 5”. The outcomes of the analysis showed that the 5-user rule significantly underestimates the required sample size to achieve reasonable levels of problem detection.
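
    The "3 to 5 users uncover 80% of problems" claim is usually justified with the binomial discovery model P(detect) = 1 − (1 − p)^n, where p is the per-user probability of encountering a given problem. The snippet below is included only to make that arithmetic explicit (it is not code from the study): it inverts the formula to get the sample size needed for a target discovery proportion, and shows how quickly the required n grows for harder-to-find problems.

```python
import math

def users_needed(p, target=0.80):
    """Smallest n with 1 - (1 - p)**n >= target, for per-user detection probability p."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

# With the commonly assumed p = 0.31, five users indeed reach roughly 84% discovery,
# but for less visible problems (smaller p) the required sample size grows quickly.
for p in (0.31, 0.15, 0.05):
    n = users_needed(p)
    print(f"p={p:.2f}: need {n} users; 5 users detect {1 - (1 - p)**5:.0%}")
```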

  11. Selected Tether Applications Cost Model

    Science.gov (United States)

    Keeley, Michael G.

    1988-01-01

    Diverse cost-estimating techniques and data combined into single program. Selected Tether Applications Cost Model (STACOM 1.0) is interactive accounting software tool providing means for combining several independent cost-estimating programs into fully-integrated mathematical model capable of assessing costs, analyzing benefits, providing file-handling utilities, and putting out information in text and graphical forms to screen, printer, or plotter. Program based on Lotus 1-2-3, version 2.0. Developed to provide clear, concise traceability and visibility into methodology and rationale for estimating costs and benefits of operations of Space Station tether deployer system.

  12. 40 CFR 89.507 - Sample selection.

    Science.gov (United States)

    2010-07-01

    ... Auditing § 89.507 Sample selection. (a) Engines comprising a test sample will be selected at the location...). However, once the manufacturer ships any test engine, it relinquishes the prerogative to conduct retests...

  13. Use of Maximum Likelihood-Mixed Models to select stable reference genes: a case of heat stress response in sheep

    Directory of Open Access Journals (Sweden)

    Salces Judit

    2011-08-01

    Full Text Available Abstract Background Reference genes with stable expression are required to normalize expression differences of target genes in qPCR experiments. Several procedures and companion software packages have been proposed to find the most stable genes. Model-based procedures are attractive because they provide a solid statistical framework. NormFinder, a widely used software package, uses a model-based method. The pairwise comparison procedure implemented in geNorm is a simpler procedure but one of the most extensively used. In the present work a statistical approach based on Maximum Likelihood estimation under mixed models was tested and compared with the NormFinder and geNorm software. Sixteen candidate genes were tested in whole blood samples from control and heat-stressed sheep. Results A model including gene and treatment as fixed effects, and sample (animal), gene by treatment, gene by sample and treatment by sample interactions as random effects, with heteroskedastic residual variance across gene by treatment levels, was selected using goodness-of-fit and predictive ability criteria among a variety of models. The Mean Square Error obtained under the selected model was used as an indicator of gene expression stability. The genes ranked at the top and bottom by the three approaches were similar; however, notable differences were found for the best pair of genes selected by each method and for the remaining genes in the rankings. Differences among the expression values of normalized targets were also found for each statistical approach. Conclusions The optimal statistical properties of Maximum Likelihood estimation, combined with the flexibility of mixed models, allow for more accurate estimation of the expression stability of genes under many different situations. Accurate selection of reference genes has a direct impact on the normalized expression values of a given target gene. This may be critical when the aim of the study is to compare expression rate differences among samples under different environmental

  14. Macroeconomic Forecasts in Models with Bayesian Averaging of Classical Estimates

    Directory of Open Access Journals (Sweden)

    Piotr Białowolski

    2012-03-01

    Full Text Available The aim of this paper is to construct a forecasting model oriented towards predicting basic macroeconomic variables, namely: the GDP growth rate, the unemployment rate, and the consumer price inflation. In order to select the set of the best regressors, Bayesian Averaging of Classical Estimators (BACE) is employed. The models are atheoretical (i.e., they do not reflect causal relationships postulated by macroeconomic theory) and the role of regressors is played by business and consumer tendency survey-based indicators. Additionally, survey-based indicators are included with a lag that enables forecasting the variables of interest (GDP, unemployment, and inflation) for the four forthcoming quarters without the need to make any additional assumptions concerning the values of predictor variables in the forecast period. Bayesian Averaging of Classical Estimators is a method allowing for a full and controlled overview of all econometric models which can be obtained out of a particular set of regressors. In this paper the authors describe the method of generating a family of econometric models and the procedure for selection of a final forecasting model. Verification of the procedure is performed by means of out-of-sample forecasts of main economic variables for the quarters of 2011. The accuracy of the forecasts implies that there is still a need to search for new solutions in atheoretical modelling.

  15. Sampling of systematic errors to estimate likelihood weights in nuclear data uncertainty propagation

    International Nuclear Information System (INIS)

    Helgesson, P.; Sjöstrand, H.; Koning, A.J.; Rydén, J.; Rochman, D.; Alhassan, E.; Pomp, S.

    2016-01-01

    In methodologies for nuclear data (ND) uncertainty assessment and propagation based on random sampling, likelihood weights can be used to infer experimental information into the distributions for the ND. As the included number of correlated experimental points grows large, the computational time for the matrix inversion involved in obtaining the likelihood can become a practical problem. There are also other problems related to the conventional computation of the likelihood, e.g., the assumption that all experimental uncertainties are Gaussian. In this study, a way to estimate the likelihood which avoids matrix inversion is investigated; instead, the experimental correlations are included by sampling of systematic errors. It is shown that the model underlying the sampling methodology (using univariate normal distributions for random and systematic errors) implies a multivariate Gaussian for the experimental points (i.e., the conventional model). It is also shown that the likelihood estimates obtained through sampling of systematic errors approach the likelihood obtained with matrix inversion as the sample size for the systematic errors grows large. In studied practical cases, it is seen that the estimates for the likelihood weights converge impractically slowly with the sample size, compared to matrix inversion. The computational time is estimated to be greater than for matrix inversion in cases with more experimental points, too. Hence, the sampling of systematic errors has little potential to compete with matrix inversion in cases where the latter is applicable. Nevertheless, the underlying model and the likelihood estimates can be easier to intuitively interpret than the conventional model and the likelihood function involving the inverted covariance matrix. Therefore, this work can both have pedagogical value and be used to help motivating the conventional assumption of a multivariate Gaussian for experimental data. The sampling of systematic errors could also
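
    A minimal numerical illustration of the equivalence discussed above (my own sketch, with made-up numbers): the likelihood of correlated experimental points under the model "independent random errors plus a shared systematic error" can be computed either exactly, as a multivariate Gaussian whose covariance is the diagonal random part plus the systematic part, or approximately, by Monte Carlo averaging over sampled systematic errors. The Monte Carlo estimate converges to the exact value as the number of systematic-error samples grows, which is the slow convergence discussed in the abstract.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(3)

mu = np.array([1.0, 1.2, 0.9, 1.1])            # model prediction at 4 experimental points
x = np.array([1.05, 1.15, 1.00, 1.20])         # measured values (invented)
sig_rand = np.array([0.05, 0.04, 0.06, 0.05])  # independent random uncertainties
sig_sys = 0.08                                 # one fully correlated systematic uncertainty

# Exact likelihood: covariance = diag(random^2) + sys^2 * J, J the all-ones matrix.
cov = np.diag(sig_rand**2) + sig_sys**2 * np.ones((4, 4))
L_exact = multivariate_normal.pdf(x, mean=mu, cov=cov)

def L_sampled(n_samples):
    """Monte Carlo likelihood: sample the systematic shift, average the product
    of the remaining independent Gaussians (no matrix inversion needed)."""
    s = rng.normal(0.0, sig_sys, size=n_samples)
    per_point = norm.pdf(x[None, :], loc=mu[None, :] + s[:, None], scale=sig_rand[None, :])
    return per_point.prod(axis=1).mean()

for n in (10, 1000, 100000):
    print(n, L_sampled(n), "exact:", L_exact)
```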

  16. 40 CFR 90.507 - Sample selection.

    Science.gov (United States)

    2010-07-01

    ... Auditing § 90.507 Sample selection. (a) Engines comprising a test sample will be selected at the location... manufacturer ships any test engine, it relinquishes the prerogative to conduct retests as provided in § 90.508...

  17. Low-sampling-rate ultra-wideband channel estimation using a bounded-data-uncertainty approach

    KAUST Repository

    Ballal, Tarig

    2014-01-01

    This paper proposes a low-sampling-rate scheme for ultra-wideband channel estimation. In the proposed scheme, P pulses are transmitted to produce P observations. These observations are exploited to produce channel impulse response estimates at a desired sampling rate, while the ADC operates at a rate that is P times less. To avoid loss of fidelity, the interpulse interval, given in units of sampling periods of the desired rate, is restricted to be co-prime with P. This condition is affected when clock drift is present and the transmitted pulse locations change. To handle this situation and to achieve good performance without using prior information, we derive an improved estimator based on the bounded data uncertainty (BDU) model. This estimator is shown to be related to the Bayesian linear minimum mean squared error (LMMSE) estimator. The performance of the proposed sub-sampling scheme was tested in conjunction with the new estimator. It is shown that high reduction in sampling rate can be achieved. The proposed estimator outperforms the least squares estimator in most cases; while in the high SNR regime, it also outperforms the LMMSE estimator. © 2014 IEEE.

  18. Improving observational study estimates of treatment effects using joint modeling of selection effects and outcomes: the case of AAA repair.

    Science.gov (United States)

    O'Malley, A James; Cotterill, Philip; Schermerhorn, Marc L; Landon, Bruce E

    2011-12-01

    When 2 treatment approaches are available, there are likely to be unmeasured confounders that influence choice of procedure, which complicates estimation of the causal effect of treatment on outcomes using observational data. To estimate the effect of endovascular (endo) versus open surgical (open) repair, including possible modification by institutional volume, on survival after treatment for abdominal aortic aneurysm, accounting for observed and unobserved confounding variables. Observational study of data from the Medicare program using a joint model of treatment selection and survival given treatment to estimate the effects of type of surgery and institutional volume on survival. We studied 61,414 eligible repairs of intact abdominal aortic aneurysms during 2001 to 2004. The outcome, perioperative death, is defined as in-hospital death or death within 30 days of operation. The key predictors are use of endo, transformed endo and open volume, and endo-volume interactions. There is strong evidence of nonrandom selection of treatment with potential confounding variables including institutional volume and procedure date, variables not typically adjusted for in clinical trials. The best fitting model included heterogeneous transformations of endo volume for endo cases and open volume for open cases as predictors. Consistent with our hypothesis, accounting for unmeasured selection reduced the mortality benefit of endo. The effect of endo versus open surgery varies nonlinearly with endo and open volume. Accounting for institutional experience and unmeasured selection enables better decision-making by physicians making treatment referrals, investigators evaluating treatments, and policy makers.

  19. Performances Of Estimators Of Linear Models With Autocorrelated ...

    African Journals Online (AJOL)

    The performances of five estimators of linear models with autocorrelated error terms are compared when the independent variable is autoregressive. The results reveal that the properties of the estimators when the sample size is finite are quite similar to the properties of the estimators when the sample size is infinite although ...

  20. Assessing Exhaustiveness of Stochastic Sampling for Integrative Modeling of Macromolecular Structures.

    Science.gov (United States)

    Viswanath, Shruthi; Chemmama, Ilan E; Cimermancic, Peter; Sali, Andrej

    2017-12-05

    Modeling of macromolecular structures involves structural sampling guided by a scoring function, resulting in an ensemble of good-scoring models. By necessity, the sampling is often stochastic, and must be exhaustive at a precision sufficient for accurate modeling and assessment of model uncertainty. Therefore, the very first step in analyzing the ensemble is an estimation of the highest precision at which the sampling is exhaustive. Here, we present an objective and automated method for this task. As a proxy for sampling exhaustiveness, we evaluate whether two independently and stochastically generated sets of models are sufficiently similar. The protocol includes testing 1) convergence of the model score, 2) whether model scores for the two samples were drawn from the same parent distribution, 3) whether each structural cluster includes models from each sample proportionally to its size, and 4) whether there is sufficient structural similarity between the two model samples in each cluster. The evaluation also provides the sampling precision, defined as the smallest clustering threshold that satisfies the third, most stringent test. We validate the protocol with the aid of enumerated good-scoring models for five illustrative cases of binary protein complexes. Passing the proposed four tests is necessary, but not sufficient for thorough sampling. The protocol is general in nature and can be applied to the stochastic sampling of any set of models, not just structural models. In addition, the tests can be used to stop stochastic sampling as soon as exhaustiveness at desired precision is reached, thereby improving sampling efficiency; they may also help in selecting a model representation that is sufficiently detailed to be informative, yet also sufficiently coarse for sampling to be exhaustive.
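
    Test 2 above, checking whether the scores of two independent model samples were drawn from the same parent distribution, can be carried out with a standard two-sample test. The sketch below uses a Kolmogorov–Smirnov test on simulated score arrays; it is a simplified stand-in, not the published protocol.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
# Scores of good-scoring models from two independent stochastic sampling runs (simulated here).
scores_run1 = rng.normal(loc=-120.0, scale=5.0, size=400)
scores_run2 = rng.normal(loc=-119.5, scale=5.2, size=380)

# Two-sample KS test: a small statistic / large p-value is consistent with both runs
# sampling from the same parent score distribution (necessary, not sufficient,
# evidence of sampling exhaustiveness).
stat, pvalue = ks_2samp(scores_run1, scores_run2)
print(f"KS statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```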

  1. Validation of limited sampling models (LSM) for estimating AUC in therapeutic drug monitoring - is a separate validation group required?

    NARCIS (Netherlands)

    Proost, J. H.

    Objective: Limited sampling models (LSM) for estimating AUC in therapeutic drug monitoring are usually validated in a separate group of patients, according to published guidelines. The aim of this study is to evaluate the validation of LSM by comparing independent validation with cross-validation

  2. Sampling designs and methods for estimating fish-impingement losses at cooling-water intakes

    International Nuclear Information System (INIS)

    Murarka, I.P.; Bodeau, D.J.

    1977-01-01

    Several systems for estimating fish impingement at power plant cooling-water intakes are compared to determine the most statistically efficient sampling designs and methods. Compared to a simple random sampling scheme the stratified systematic random sampling scheme, the systematic random sampling scheme, and the stratified random sampling scheme yield higher efficiencies and better estimators for the parameters in two models of fish impingement as a time-series process. Mathematical results and illustrative examples of the applications of the sampling schemes to simulated and real data are given. Some sampling designs applicable to fish-impingement studies are presented in appendixes

  3. Radiation dose estimates due to air particulate emissions from selected phosphate industry operations

    International Nuclear Information System (INIS)

    Partridge, J.E.; Horton, T.R.; Sensintaffar, E.L.; Boysen, G.A.

    1978-06-01

    The EPA Office of Radiation Programs has conducted a series of studies to determine the radiological impact of the phosphate mining and milling industry. This report describes the efforts to estimate the radiation doses due to airborne emissions of particulates from selected phosphate milling operations in Florida. Two wet process phosphoric acid plants and one ore drying facility were selected for this study. The 1976 Annual Operations/Emissions Report, submitted by each facility to the Florida Department of Environmental Regulation, and a field survey trip by EPA personnel to each facility were used to develop data for dose calculations. The field survey trip included sampling for stack emissions and ambient air samples collected in the general vicinity of each plant. Population and individual radiation dose estimates are made based on these sources of data

  4. Correlation between the model accuracy and model-based SOC estimation

    International Nuclear Information System (INIS)

    Wang, Qianqian; Wang, Jiao; Zhao, Pengju; Kang, Jianqiang; Yan, Few; Du, Changqing

    2017-01-01

    State-of-charge (SOC) estimation is a core technology for battery management systems. Considerable progress has been achieved in the study of SOC estimation algorithms, especially algorithms based on the Kalman filter, to meet the increasing demand of model-based battery management systems. The Kalman filter weakens the influence of white noise and initial error during SOC estimation but cannot eliminate the existing error of the battery model itself. As such, the accuracy of SOC estimation is directly related to the accuracy of the battery model. Thus far, the quantitative relationship between model accuracy and model-based SOC estimation remains unknown. This study summarizes three equivalent circuit lithium-ion battery models, namely, the Thevenin, PNGV, and DP models. The model parameters are identified through a hybrid pulse power characterization test. The three models are evaluated, and SOC estimation conducted by the EKF-Ah method under three operating conditions is quantitatively studied. Regression and correlation analyses of the standard deviation and the normalized RMSE are carried out to compare the model error with the SOC estimation error. These parameters exhibit a strong linear relationship. Results indicate that the model accuracy affects the SOC estimation accuracy mainly in two ways: dispersion of the frequency distribution of the error and the overall level of the error. On the basis of the relationship between model error and SOC estimation error, our study provides a strategy for selecting a suitable cell model to meet the requirements of SOC precision using a Kalman filter.
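
    As a toy illustration of model-based SOC estimation with a Kalman-type filter (not the EKF-Ah implementation evaluated in the paper), the sketch below runs an extended Kalman filter on a cell with a single ohmic resistance and a hypothetical linear OCV curve; all parameter values are invented.

```python
import numpy as np

# Hypothetical cell parameters (illustrative only).
Q = 2.0 * 3600                          # capacity [As]
R0 = 0.05                               # ohmic resistance [ohm]
ocv = lambda s: 3.0 + 1.2 * s           # invented linear OCV(SOC) curve
docv = lambda s: 1.2                    # its derivative (measurement Jacobian)

def ekf_soc(current, voltage, dt, soc0=0.5, P0=0.1, q=1e-7, r=1e-3):
    """Extended Kalman filter with SOC as the only state.
    Process: soc_k+1 = soc_k - i*dt/Q (coulomb counting); measurement: V = OCV(soc) - i*R0."""
    soc, P, out = soc0, P0, []
    for i, v in zip(current, voltage):
        soc = soc - i * dt / Q               # predict
        P = P + q
        H = docv(soc)                        # linearized measurement model
        K = P * H / (H * P * H + r)          # Kalman gain
        soc = soc + K * (v - (ocv(soc) - i * R0))
        P = (1 - K * H) * P
        out.append(soc)
    return np.array(out)

# Simulate a constant-current discharge with noisy voltage readings.
rng = np.random.default_rng(5)
dt, n = 1.0, 3600
i_true = np.full(n, 1.0)                                 # 1 A discharge
soc_true = 0.9 - np.cumsum(i_true) * dt / Q
v_meas = ocv(soc_true) - i_true * R0 + rng.normal(0, 0.005, n)
soc_hat = ekf_soc(i_true, v_meas, dt, soc0=0.6)          # deliberately wrong initial SOC
print("final true SOC %.3f, estimated %.3f" % (soc_true[-1], soc_hat[-1]))
```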

  5. Heterogeneous Causal Effects and Sample Selection Bias

    DEFF Research Database (Denmark)

    Breen, Richard; Choi, Seongsoo; Holm, Anders

    2015-01-01

    The role of education in the process of socioeconomic attainment is a topic of long standing interest to sociologists and economists. Recently there has been growing interest not only in estimating the average causal effect of education on outcomes such as earnings, but also in estimating how causal effects might vary over individuals or groups. In this paper we point out one of the under-appreciated hazards of seeking to estimate heterogeneous causal effects: conventional selection bias (that is, selection on baseline differences) can easily be mistaken for heterogeneity of causal effects. This might lead us to find heterogeneous effects when the true effect is homogeneous, or to wrongly estimate not only the magnitude but also the sign of heterogeneous effects. We apply a test for the robustness of heterogeneous causal effects in the face of varying degrees and patterns of selection bias...

  6. Optimal Bandwidth Selection for Kernel Density Functionals Estimation

    Directory of Open Access Journals (Sweden)

    Su Chen

    2015-01-01

    Full Text Available The choice of bandwidth is crucial to kernel density estimation (KDE) and kernel-based regression. Various bandwidth selection methods for KDE and local least squares regression have been developed in the past decade. It has been known that scale and location parameters are proportional to density functionals ∫γ(x)f²(x)dx with appropriate choice of γ(x), and furthermore equality of scale and location tests can be transformed to comparisons of the density functionals among populations. ∫γ(x)f²(x)dx can be estimated nonparametrically via kernel density functionals estimation (KDFE). However, the optimal bandwidth selection for KDFE of ∫γ(x)f²(x)dx has not been examined. We propose a method to select the optimal bandwidth for the KDFE. The idea underlying this method is to search for the optimal bandwidth by minimizing the mean square error (MSE) of the KDFE. Two main practical bandwidth selection techniques for the KDFE of ∫γ(x)f²(x)dx are provided: normal scale bandwidth selection (namely, the “Rule of Thumb”) and direct plug-in bandwidth selection. Simulation studies show that our proposed bandwidth selection methods are superior to existing density estimation bandwidth selection methods in estimating density functionals.
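
    For the special case γ(x) = 1, the functional ∫f²(x)dx has the standard kernel estimator (1/(n²h)) Σᵢ Σⱼ K((xᵢ − xⱼ)/h). The snippet below is a generic illustration rather than the authors' procedure: it computes this estimator with a Gaussian kernel and a normal-reference density-estimation bandwidth and compares it with the true value for a normal sample, for which ∫f²(x)dx = 1/(2σ√π). The paper's point is precisely that the MSE-optimal bandwidth for the functional differs from such density-estimation rules.

```python
import numpy as np

def kernel_density_functional(x, h):
    """Estimate of the density functional int f(x)^2 dx (the case gamma(x) = 1):
    (1/(n^2 h)) * sum_i sum_j K((x_i - x_j)/h) with a Gaussian kernel K."""
    n = len(x)
    u = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.sum() / (n**2 * h)

rng = np.random.default_rng(6)
x = rng.normal(0.0, 2.0, size=500)

# Normal-reference ("rule of thumb") bandwidth for density estimation, used here
# only as a simple plug-in choice.
h_rot = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)
theta_hat = kernel_density_functional(x, h_rot)
theta_true = 1.0 / (2.0 * 2.0 * np.sqrt(np.pi))   # int f^2 dx for N(0, 2^2)
print(f"estimate {theta_hat:.4f} vs true {theta_true:.4f}")
```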

  7. Total arsenic in selected food samples from Argentina: Estimation of their contribution to inorganic arsenic dietary intake.

    Science.gov (United States)

    Sigrist, Mirna; Hilbe, Nandi; Brusa, Lucila; Campagnoli, Darío; Beldoménico, Horacio

    2016-11-01

    An optimized flow injection hydride generation atomic absorption spectroscopy (FI-HGAAS) method was used to determine total arsenic in selected food samples (beef, chicken, fish, milk, cheese, egg, rice, rice-based products, wheat flour, corn flour, oats, breakfast cereals, legumes and potatoes) and to estimate their contributions to inorganic arsenic dietary intake. The limit of detection (LOD) and limit of quantification (LOQ) values obtained were 6 μg kg⁻¹ and 18 μg kg⁻¹, respectively. The mean recovery range obtained for all food at a fortification level of 200 μg kg⁻¹ was 85-110%. Accuracy was evaluated using dogfish liver certified reference material (DOLT-3 NRC) for trace metals. The highest total arsenic concentrations (in μg kg⁻¹) were found in fish (152-439), rice (87-316) and rice-based products (52-201). The contribution to inorganic arsenic (i-As) intake was calculated from the mean i-As content of each food (calculated by applying conversion factors to total arsenic data) and the mean consumption per day. The primary contributors to inorganic arsenic intake were wheat flour, including its proportion in wheat flour-based products (breads, pasta and cookies), followed by rice; both foods account for close to 53% and 17% of the intake, respectively. The i-As dietary intake, estimated as 10.7 μg day⁻¹, was significantly lower than that from drinking water in vast regions of Argentina.

  8. Bayesian Estimation and Selection of Nonlinear Vector Error Correction Models: The Case of the Sugar-Ethanol-Oil Nexus in Brazil

    OpenAIRE

    Kelvin Balcombe; George Rapsomanikis

    2008-01-01

    Nonlinear adjustment toward long-run price equilibrium relationships in the sugar-ethanol-oil nexus in Brazil is examined. We develop generalized bivariate error correction models that allow for cointegration between sugar, ethanol, and oil prices, where dynamic adjustments are potentially nonlinear functions of the disequilibrium errors. A range of models are estimated using Bayesian Monte Carlo Markov Chain algorithms and compared using Bayesian model selection methods. The results suggest ...

  9. Judging statistical models of individual decision making under risk using in- and out-of-sample criteria.

    Science.gov (United States)

    Drichoutis, Andreas C; Lusk, Jayson L

    2014-01-01

    Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.

  10. Judging statistical models of individual decision making under risk using in- and out-of-sample criteria.

    Directory of Open Access Journals (Sweden)

    Andreas C Drichoutis

    Full Text Available Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.

  11. Vast Volatility Matrix Estimation using High Frequency Data for Portfolio Selection*

    Science.gov (United States)

    Fan, Jianqing; Li, Yingying; Yu, Ke

    2012-01-01

    Portfolio allocation with gross-exposure constraint is an effective method to increase the efficiency and stability of portfolios selection among a vast pool of assets, as demonstrated in Fan et al. (2011). The required high-dimensional volatility matrix can be estimated by using high frequency financial data. This enables us to better adapt to the local volatilities and local correlations among vast number of assets and to increase significantly the sample size for estimating the volatility matrix. This paper studies the volatility matrix estimation using high-dimensional high-frequency data from the perspective of portfolio selection. Specifically, we propose the use of “pairwise-refresh time” and “all-refresh time” methods based on the concept of “refresh time” proposed by Barndorff-Nielsen et al. (2008) for estimation of vast covariance matrix and compare their merits in the portfolio selection. We establish the concentration inequalities of the estimates, which guarantee desirable properties of the estimated volatility matrix in vast asset allocation with gross exposure constraints. Extensive numerical studies are made via carefully designed simulations. Comparing with the methods based on low frequency daily data, our methods can capture the most recent trend of the time varying volatility and correlation, hence provide more accurate guidance for the portfolio allocation in the next time period. The advantage of using high-frequency data is significant in our simulation and empirical studies, which consist of 50 simulated assets and 30 constituent stocks of Dow Jones Industrial Average index. PMID:23264708

  12. Vast Volatility Matrix Estimation using High Frequency Data for Portfolio Selection.

    Science.gov (United States)

    Fan, Jianqing; Li, Yingying; Yu, Ke

    2012-01-01

    Portfolio allocation with gross-exposure constraint is an effective method to increase the efficiency and stability of portfolios selection among a vast pool of assets, as demonstrated in Fan et al. (2011). The required high-dimensional volatility matrix can be estimated by using high frequency financial data. This enables us to better adapt to the local volatilities and local correlations among vast number of assets and to increase significantly the sample size for estimating the volatility matrix. This paper studies the volatility matrix estimation using high-dimensional high-frequency data from the perspective of portfolio selection. Specifically, we propose the use of "pairwise-refresh time" and "all-refresh time" methods based on the concept of "refresh time" proposed by Barndorff-Nielsen et al. (2008) for estimation of vast covariance matrix and compare their merits in the portfolio selection. We establish the concentration inequalities of the estimates, which guarantee desirable properties of the estimated volatility matrix in vast asset allocation with gross exposure constraints. Extensive numerical studies are made via carefully designed simulations. Comparing with the methods based on low frequency daily data, our methods can capture the most recent trend of the time varying volatility and correlation, hence provide more accurate guidance for the portfolio allocation in the next time period. The advantage of using high-frequency data is significant in our simulation and empirical studies, which consist of 50 simulated assets and 30 constituent stocks of Dow Jones Industrial Average index.

  13. HOT-DUST-POOR QUASARS IN MID-INFRARED AND OPTICALLY SELECTED SAMPLES

    International Nuclear Information System (INIS)

    Hao Heng; Elvis, Martin; Civano, Francesca; Lawrence, Andy

    2011-01-01

    We show that the hot-dust-poor (HDP) quasars, originally found in the X-ray-selected XMM-COSMOS type 1 active galactic nucleus (AGN) sample, are just as common in two samples selected at optical/infrared wavelengths: the Richards et al. Spitzer/SDSS sample (8.7% ± 2.2%) and the Palomar-Green-quasar-dominated sample of Elvis et al. (9.5% ± 5.0%). The properties of the HDP quasars in these two samples are consistent with the XMM-COSMOS sample, except that, at the 99% (∼ 2.5σ) significance, a larger proportion of the HDP quasars in the Spitzer/SDSS sample have weak host galaxy contributions, probably due to the selection criteria used. Either the host dust is destroyed (dynamically or by radiation) or is offset from the central black hole due to recoiling. Alternatively, the universality of HDP quasars in samples with different selection methods and the continuous distribution of dust covering factor in type 1 AGNs suggest that the range of spectral energy distributions could be related to the range of tilts in warped fueling disks, as in the model of Lawrence and Elvis, with HDP quasars having relatively small warps.

  14. A Hidden Markov Model Approach for Simultaneously Estimating Local Ancestry and Admixture Time Using Next Generation Sequence Data in Samples of Arbitrary Ploidy.

    Science.gov (United States)

    Corbett-Detig, Russell; Nielsen, Rasmus

    2017-01-01

    Admixture-the mixing of genomes from divergent populations-is increasingly appreciated as a central process in evolution. To characterize and quantify patterns of admixture across the genome, a number of methods have been developed for local ancestry inference. However, existing approaches have a number of shortcomings. First, all local ancestry inference methods require some prior assumption about the expected ancestry tract lengths. Second, existing methods generally require genotypes, which is not feasible to obtain for many next-generation sequencing projects. Third, many methods assume samples are diploid; however, a wide variety of sequencing applications will fail to meet this assumption. To address these issues, we introduce a novel hidden Markov model for estimating local ancestry that models the read pileup data, rather than genotypes, is generalized to arbitrary ploidy, and can estimate the time since admixture during local ancestry inference. We demonstrate that our method can simultaneously estimate the time since admixture and local ancestry with good accuracy, and that it performs well on samples of high ploidy, i.e., 100 or more chromosomes. As this method is very general, we expect it will be useful for local ancestry inference in a wider variety of populations than what previously has been possible. We then applied our method to pooled sequencing data derived from populations of Drosophila melanogaster on an ancestry cline on the east coast of North America. We find that local recombination rates are negatively correlated with the proportion of African ancestry, suggesting that selection against foreign ancestry is the least efficient in low recombination regions. Finally, we show that clinal outlier loci are enriched for genes associated with gene regulatory functions, consistent with a role of regulatory evolution in ecological adaptation of admixed D. melanogaster populations. Our results illustrate the potential of local ancestry
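
    To sketch the machinery involved, here is a drastically simplified, haploid stand-in for the published method: the forward algorithm of a two-ancestry HMM in which transition probabilities between adjacent sites depend on the recombination distance and the time since admixture, and emissions are binomial probabilities of the observed read counts given ancestry-specific allele frequencies. All rates, frequencies and counts are invented, and the transition model is a crude approximation rather than the one derived in the paper.

```python
import numpy as np
from scipy.stats import binom

def forward_loglik(ref_counts, depths, freqs, gen_dist, T, admix_prop):
    """Forward algorithm for a haploid two-ancestry HMM on read pileups.
    freqs: (n_sites, 2) ancestry-specific reference-allele frequencies;
    gen_dist: genetic distances (Morgans) between adjacent sites;
    T: generations since admixture; admix_prop: genome-wide proportion of ancestry 0."""
    pi = np.array([admix_prop, 1.0 - admix_prop])
    # Emission: binomial probability of the observed reference read count under each ancestry.
    emit = binom.pmf(ref_counts[:, None], depths[:, None], freqs)      # (n_sites, 2)
    alpha = pi * emit[0]
    c = alpha.sum(); alpha /= c; loglik = np.log(c)
    for s in range(1, len(ref_counts)):
        p_switch = 1.0 - np.exp(-gen_dist[s - 1] * T)   # chance of a crossover since admixture
        A = (1.0 - p_switch) * np.eye(2) + p_switch * pi[None, :]   # stay, or redraw ancestry from pi
        alpha = (alpha @ A) * emit[s]
        c = alpha.sum(); alpha /= c; loglik += np.log(c)            # rescale to avoid underflow
    return loglik

# Tiny invented example: 5 ancestry-informative sites with reference read counts and depths.
ref = np.array([8, 1, 7, 0, 9])
dep = np.array([10, 10, 9, 8, 10])
freqs = np.array([[0.9, 0.1], [0.2, 0.8], [0.85, 0.2], [0.1, 0.9], [0.95, 0.15]])
d = np.full(4, 1e-4)            # about 0.01 cM between adjacent sites
# Maximizing this log-likelihood over T would give an estimate of the admixture time.
print(forward_loglik(ref, dep, freqs, d, T=100, admix_prop=0.3))
```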

  15. Cost estimation model for advanced planetary programs, fourth edition

    Science.gov (United States)

    Spadoni, D. J.

    1983-01-01

    The development of the planetary program cost model is discussed. The Model was updated to incorporate cost data from the most recent US planetary flight projects and extensively revised to more accurately capture the information in the historical cost data base. This data base is comprised of the historical cost data for 13 unmanned lunar and planetary flight programs. The revision was made with a two fold objective: to increase the flexibility of the model in its ability to deal with the broad scope of scenarios under consideration for future missions, and to maintain and possibly improve upon the confidence in the model's capabilities with an expected accuracy of 20%. The Model development included a labor/cost proxy analysis, selection of the functional forms of the estimating relationships, and test statistics. An analysis of the Model is discussed and two sample applications of the cost model are presented.

  16. Simulation of range imaging-based estimation of respiratory lung motion. Influence of noise, signal dimensionality and sampling patterns.

    Science.gov (United States)

    Wilms, M; Werner, R; Blendowski, M; Ortmüller, J; Handels, H

    2014-01-01

    A major problem associated with the irradiation of thoracic and abdominal tumors is respiratory motion. In clinical practice, motion compensation approaches are frequently steered by low-dimensional breathing signals (e.g., spirometry) and patient-specific correspondence models, which are used to estimate the sought internal motion given a signal measurement. Recently, the use of multidimensional signals derived from range images of the moving skin surface has been proposed to better account for complex motion patterns. In this work, a simulation study is carried out to investigate the motion estimation accuracy of such multidimensional signals and the influence of noise, the signal dimensionality, and different sampling patterns (points, lines, regions). A diffeomorphic correspondence modeling framework is employed to relate multidimensional breathing signals derived from simulated range images to internal motion patterns represented by diffeomorphic non-linear transformations. Furthermore, an automatic approach for the selection of optimal signal combinations/patterns within this framework is presented. This simulation study focuses on lung motion estimation and is based on 28 4D CT data sets. The results show that the use of multidimensional signals instead of one-dimensional signals significantly improves the motion estimation accuracy, which is, however, highly affected by noise. Only small differences exist between different multidimensional sampling patterns (lines and regions). Automatically determined optimal combinations of points and lines do not lead to accuracy improvements compared to results obtained by using all points or lines. Our results show the potential of multidimensional breathing signals derived from range images for the model-based estimation of respiratory motion in radiation therapy.

  17. Efficient Bayesian parameter estimation with implicit sampling and surrogate modeling for a vadose zone hydrological problem

    Science.gov (United States)

    Liu, Y.; Pau, G. S. H.; Finsterle, S.

    2015-12-01

    Parameter inversion involves inferring the model parameter values based on sparse observations of some observables. To infer the posterior probability distributions of the parameters, Markov chain Monte Carlo (MCMC) methods are typically used. However, the large number of forward simulations needed and limited computational resources limit the complexity of the hydrological model we can use in these methods. In view of this, we studied the implicit sampling (IS) method, an efficient importance sampling technique that generates samples in the high-probability region of the posterior distribution and thus reduces the number of forward simulations that we need to run. For a pilot-point inversion of a heterogeneous permeability field based on a synthetic ponded infiltration experiment simulated with TOUGH2 (a subsurface modeling code), we showed that IS with linear map provides an accurate Bayesian description of the parameterized permeability field at the pilot points with just approximately 500 forward simulations. We further studied the use of surrogate models to improve the computational efficiency of parameter inversion. We implemented two reduced-order models (ROMs) for the TOUGH2 forward model. One is based on polynomial chaos expansion (PCE), of which the coefficients are obtained using the sparse Bayesian learning technique to mitigate the "curse of dimensionality" of the PCE terms. The other model is Gaussian process regression (GPR) for which different covariance, likelihood and inference models are considered. Preliminary results indicate that ROMs constructed based on the prior parameter space perform poorly. It is thus impractical to replace this hydrological model by a ROM directly in a MCMC method. However, the IS method can work with a ROM constructed for parameters in the close vicinity of the maximum a posteriori probability (MAP) estimate. We will discuss the accuracy and computational efficiency of using ROMs in the implicit sampling procedure

  18. Radioecological sensitivity in the Faroe Islands estimated from modeling long-term variation of radioactivity

    International Nuclear Information System (INIS)

    Joensen, H.P.

    2002-01-01

    The Faroese environment has received radioactive debris from the nuclear weapons tests in the 1950s and 1960s and from the Chernobyl accident of 26 April 1986. The paper presents results from modeling the relation between 137Cs and 90Sr in precipitation and the radioactivity from these nuclides in selected foodstuffs, using available data from the last four decades. The model relates the concentration of a radionuclide in a sample from a given year to the deposition rate of the radionuclide in the given year and in the year before, and to the accumulated deposition two years before. The effective ecological half-life of the radionuclides in the selected foodstuffs is estimated, and model-calculated sensitivities defined as time-integrated radionuclide concentration in an environmental sample from a unit ground deposition, e.g. (Bq/kg)·y per (kBq/m²), are presented. (au)

  19. PERIOD ESTIMATION FOR SPARSELY SAMPLED QUASI-PERIODIC LIGHT CURVES APPLIED TO MIRAS

    Energy Technology Data Exchange (ETDEWEB)

    He, Shiyuan; Huang, Jianhua Z.; Long, James [Department of Statistics, Texas A and M University, College Station, TX (United States); Yuan, Wenlong; Macri, Lucas M., E-mail: lmacri@tamu.edu [George P. and Cynthia W. Mitchell Institute for Fundamental Physics and Astronomy, Department of Physics and Astronomy, Texas A and M University, College Station, TX (United States)

    2016-12-01

    We develop a nonlinear semi-parametric Gaussian process model to estimate periods of Miras with sparsely sampled light curves. The model uses a sinusoidal basis for the periodic variation and a Gaussian process for the stochastic changes. We use maximum likelihood to estimate the period and the parameters of the Gaussian process, while integrating out the effects of other nuisance parameters in the model with respect to a suitable prior distribution obtained from earlier studies. Since the likelihood is highly multimodal for period, we implement a hybrid method that applies the quasi-Newton algorithm for Gaussian process parameters and search the period/frequency parameter space over a dense grid. A large-scale, high-fidelity simulation is conducted to mimic the sampling quality of Mira light curves obtained by the M33 Synoptic Stellar Survey. The simulated data set is publicly available and can serve as a testbed for future evaluation of different period estimation methods. The semi-parametric model outperforms an existing algorithm on this simulated test data set as measured by period recovery rate and quality of the resulting period–luminosity relations.
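
    The grid search over frequency combined with linear least squares for the sinusoidal part can be illustrated without the Gaussian process component. The sketch below is a generic least-squares periodogram, much simpler than the semi-parametric model described above: at each trial frequency it fits a constant plus sinusoid and picks the frequency with the smallest residual sum of squares. The cadence, amplitude and noise level are invented.

```python
import numpy as np

def ls_periodogram(t, y, freqs):
    """Least-squares periodogram: at each trial frequency fit y ~ a + b*cos + c*sin
    and return the residual sum of squares; the best period minimizes it."""
    rss = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        X = np.column_stack([np.ones_like(t),
                             np.cos(2 * np.pi * f * t),
                             np.sin(2 * np.pi * f * t)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss[k] = np.sum((y - X @ beta) ** 2)
    return rss

# Sparsely and irregularly sampled sinusoid with a true period of 310 days.
rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0, 1500, 60))                      # 60 epochs over roughly 4 years
y = 10 + 1.5 * np.sin(2 * np.pi * t / 310) + rng.normal(0, 0.3, 60)

freqs = np.linspace(1 / 1000, 1 / 100, 5000)               # dense frequency grid
rss = ls_periodogram(t, y, freqs)
print("recovered period (days):", round(1 / freqs[np.argmin(rss)], 1))
```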

  20. The MIDAS Touch: Mixed Data Sampling Regression Models

    OpenAIRE

    Ghysels, Eric; Santa-Clara, Pedro; Valkanov, Rossen

    2004-01-01

    We introduce Mixed Data Sampling (henceforth MIDAS) regression models. The regressions involve time series data sampled at different frequencies. Technically speaking, MIDAS models specify conditional expectations as a distributed lag of regressors recorded at some higher sampling frequencies. We examine the asymptotic properties of MIDAS regression estimation and compare it with traditional distributed lag models. MIDAS regressions have wide applicability in macroeconomics and finance.
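
    A common way to keep the distributed lag in a MIDAS regression parsimonious is to tie the lag weights to a low-dimensional function such as the exponential Almon polynomial. The sketch below is an illustration of that general idea, not the authors' code: it regresses a quarterly variable on exponentially-Almon-weighted monthly lags, estimating the intercept, slope and weight parameters jointly by nonlinear least squares on simulated data.

```python
import numpy as np
from scipy.optimize import least_squares

def exp_almon(theta1, theta2, n_lags):
    """Exponential Almon lag weights, normalized to sum to one."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k**2)
    return w / w.sum()

def midas_residuals(params, y_q, x_lags):
    """Residuals of y_q ~ const + beta * (weighted sum of monthly lags)."""
    const, beta, th1, th2 = params
    return y_q - (const + beta * x_lags @ exp_almon(th1, th2, x_lags.shape[1]))

# Simulate monthly data and a quarterly target driven by a weighted sum of monthly lags.
rng = np.random.default_rng(8)
n_q, n_lags = 120, 12
x_m = rng.normal(size=3 * n_q + n_lags + 1)
# Lag matrix: for quarter i, the n_lags most recent monthly values (most recent first).
x_lags = np.array([x_m[3 * i + n_lags : 3 * i : -1] for i in range(n_q)])
true_w = exp_almon(0.2, -0.05, n_lags)
y_q = 0.5 + 2.0 * x_lags @ true_w + rng.normal(0, 0.2, n_q)

# Joint nonlinear least squares for the intercept, slope and Almon parameters.
fit = least_squares(midas_residuals, x0=[0.0, 1.0, 0.0, 0.0],
                    bounds=([-10, -10, -2, -2], [10, 10, 2, 2]),
                    args=(y_q, x_lags))
print("estimated slope:", round(fit.x[1], 3), "Almon parameters:", np.round(fit.x[2:], 3))
```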

  1. Estimating Loan-to-value Distributions

    DEFF Research Database (Denmark)

    Korteweg, Arthur; Sørensen, Morten

    2016-01-01

    We estimate a model of house prices, combined loan-to-value ratios (CLTVs) and trade and foreclosure behavior. House prices are only observed for traded properties and trades are endogenous, creating sample-selection problems for existing approaches to estimating CLTVs. We use a Bayesian filtering...

  2. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    Science.gov (United States)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    Bayesian methods can be used to estimate the parameters of a multivariate multiple regression model. The Bayesian approach involves two distributions: the prior and the posterior, and the posterior distribution is influenced by the choice of prior. Jeffreys' prior is a non-informative prior distribution, used when information about the parameters is not available. The non-informative Jeffreys' prior is combined with the sample information, resulting in the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of the multivariate regression model using the Bayesian method with the non-informative Jeffreys' prior distribution. Based on the results and discussion, the estimates of β and Σ are obtained as the expected values of the corresponding marginal posterior distribution functions. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart, respectively. However, calculating these expected values involves integrals of functions whose values are difficult to determine analytically. Therefore, an approach is needed that generates random samples according to the posterior distribution characteristics of each parameter, using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
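
    A minimal Gibbs sampler for this setting (my own sketch of the standard conditional draws, not the authors' code): under the Jeffreys prior p(B, Σ) ∝ |Σ|^(−(q+1)/2), the full conditional of B given Σ is matrix-normal around the least-squares estimate with row covariance (X'X)⁻¹ and column covariance Σ, and the full conditional of Σ given B is inverse-Wishart with scale (Y − XB)'(Y − XB) and n degrees of freedom. All data below are simulated.

```python
import numpy as np
from scipy.stats import invwishart, matrix_normal

def gibbs_mv_regression(Y, X, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for Y = X B + E, rows of E ~ N_q(0, Sigma), Jeffreys prior."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y                      # least-squares estimate
    Sigma = np.eye(q)
    B_draws, S_draws = [], []
    for it in range(n_iter):
        # B | Sigma, Y ~ MatrixNormal(B_hat, (X'X)^-1, Sigma)
        B = matrix_normal.rvs(mean=B_hat, rowcov=XtX_inv, colcov=Sigma, random_state=rng)
        # Sigma | B, Y ~ InverseWishart(df = n, scale = (Y - XB)'(Y - XB))
        R = Y - X @ B
        Sigma = invwishart.rvs(df=n, scale=R.T @ R, random_state=rng)
        if it >= burn:
            B_draws.append(B); S_draws.append(Sigma)
    return np.mean(B_draws, axis=0), np.mean(S_draws, axis=0)

# Small simulated example: 2 responses, 3 predictors plus an intercept.
rng = np.random.default_rng(9)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
B_true = np.array([[1.0, -0.5], [2.0, 0.0], [0.0, 1.5], [-1.0, 0.5]])
E = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 0.5]], n)
Y = X @ B_true + E
B_post, Sigma_post = gibbs_mv_regression(Y, X)
print(np.round(B_post, 2))     # posterior mean of B, close to B_true
```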

  3. Spatially explicit population estimates for black bears based on cluster sampling

    Science.gov (United States)

    Humm, J.; McCown, J. Walter; Scheick, B.K.; Clark, Joseph D.

    2017-01-01

    We estimated abundance and density of the 5 major black bear (Ursus americanus) subpopulations (i.e., Eglin, Apalachicola, Osceola, Ocala-St. Johns, Big Cypress) in Florida, USA with spatially explicit capture-mark-recapture (SCR) by extracting DNA from hair samples collected at barbed-wire hair sampling sites. We employed a clustered sampling configuration with sampling sites arranged in 3 × 3 clusters spaced 2 km apart within each cluster and cluster centers spaced 16 km apart (center to center). We surveyed all 5 subpopulations encompassing 38,960 km2 during 2014 and 2015. Several landscape variables, most associated with forest cover, helped refine density estimates for the 5 subpopulations we sampled. Detection probabilities were affected by site-specific behavioral responses coupled with individual capture heterogeneity associated with sex. Model-averaged bear population estimates ranged from 120 (95% CI = 59–276) bears or a mean 0.025 bears/km2 (95% CI = 0.011–0.44) for the Eglin subpopulation to 1,198 bears (95% CI = 949–1,537) or 0.127 bears/km2 (95% CI = 0.101–0.163) for the Ocala-St. Johns subpopulation. The total population estimate for our 5 study areas was 3,916 bears (95% CI = 2,914–5,451). The clustered sampling method coupled with information on land cover was efficient and allowed us to estimate abundance across extensive areas that would not have been possible otherwise. Clustered sampling combined with spatially explicit capture-recapture methods has the potential to provide rigorous population estimates for a wide array of species that are extensive and heterogeneous in their distribution.

  4. Limited-sampling strategy models for estimating the pharmacokinetic parameters of 4-methylaminoantipyrine, an active metabolite of dipyrone

    Directory of Open Access Journals (Sweden)

    Suarez-Kurtz G.

    2001-01-01

    Full Text Available Bioanalytical data from a bioequivalence study were used to develop limited-sampling strategy (LSS) models for estimating the area under the plasma concentration versus time curve (AUC) and the peak plasma concentration (Cmax) of 4-methylaminoantipyrine (MAA), an active metabolite of dipyrone. Twelve healthy adult male volunteers received single 600 mg oral doses of dipyrone in two formulations at a 7-day interval in a randomized, crossover protocol. Plasma concentrations of MAA (N = 336), measured by HPLC, were used to develop LSS models. Linear regression analysis and a "jack-knife" validation procedure revealed that the AUC0-∞ and the Cmax of MAA can be accurately predicted (R²>0.95, bias 0.85 of the AUC0-∞ or Cmax for the other formulation. LSS models based on three sampling points (1.5, 4 and 24 h), but using different coefficients for AUC0-∞ and Cmax, predicted the individual values of both parameters for the enrolled volunteers (R²>0.88, bias = -0.65 and -0.37%, precision = 4.3 and 7.4%) as well as for plasma concentration data sets generated by simulation (R²>0.88, bias = -1.9 and 8.5%, precision = 5.2 and 8.7%). Bioequivalence assessment of the dipyrone formulations based on the 90% confidence interval of log-transformed AUC0-∞ and Cmax provided similar results when either the best-estimated or the LSS-derived metrics were used.

  5. Sample Selection for Training Cascade Detectors.

    Science.gov (United States)

    Vállez, Noelia; Deniz, Oscar; Bueno, Gloria

    2015-01-01

    Automatic detection systems usually require large and representative training datasets in order to obtain good detection and false positive rates. Training datasets are such that the positive set has few samples and/or the negative set should represent anything except the object of interest. In this respect, the negative set typically contains orders of magnitude more images than the positive set. However, imbalanced training databases lead to biased classifiers. In this paper, we focus our attention on a negative sample selection method to properly balance the training data for cascade detectors. The method is based on the selection of the most informative false positive samples generated in one stage to feed the next stage. The results show that the proposed cascade detector with sample selection obtains on average better partial AUC and smaller standard deviation than the other compared cascade detectors.
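
    The core of the selection scheme described above is to carry forward, for each new stage, the negatives that the current cascade finds hardest, i.e., the false positives it scores most confidently. The snippet below is a generic hard-negative-mining sketch, not the authors' implementation; `stage_score` stands in for whatever per-stage classifier score is available, and the pool and scorer are simulated.

```python
import numpy as np

def select_hard_negatives(stage_score, negative_pool, n_keep):
    """Keep the n_keep negatives the current stage accepts most confidently
    (its most informative false positives) to train the next stage."""
    scores = np.array([stage_score(x) for x in negative_pool])
    accepted = scores > 0.0                      # negatives the stage wrongly accepts
    candidates = np.flatnonzero(accepted)
    if len(candidates) == 0:
        return []
    # Highest-scoring false positives are the most informative for the next stage.
    order = candidates[np.argsort(scores[candidates])[::-1]]
    return [negative_pool[i] for i in order[:n_keep]]

# Toy usage: a 'stage' that scores random feature vectors with a linear function.
rng = np.random.default_rng(10)
pool = [rng.normal(size=16) for _ in range(5000)]
w = rng.normal(size=16)
hard_negs = select_hard_negatives(lambda x: float(w @ x) - 2.0, pool, n_keep=200)
print(len(hard_negs), "hard negatives selected for the next stage")
```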

  6. Near-native protein loop sampling using nonparametric density estimation accommodating sparcity.

    Science.gov (United States)

    Joo, Hyun; Chavan, Archana G; Day, Ryan; Lennox, Kristin P; Sukhanov, Paul; Dahl, David B; Vannucci, Marina; Tsai, Jerry

    2011-10-01

    Unlike the core structural elements of a protein like regular secondary structure, template based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and were enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD 7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct test of sampling to the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method in loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near native loop structure. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.

  7. Near-native protein loop sampling using nonparametric density estimation accommodating sparcity.

    Directory of Open Access Journals (Sweden)

    Hyun Joo

    2011-10-01

    Full Text Available Unlike the core structural elements of a protein like regular secondary structure, template based modeling (TBM has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM. Models are quickly generated based on samples from these distributions and were enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD 7.0 Å, this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct test of sampling to the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method in loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near native loop structure. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.

  8. Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity

    Science.gov (United States)

    Day, Ryan; Lennox, Kristin P.; Sukhanov, Paul; Dahl, David B.; Vannucci, Marina; Tsai, Jerry

    2011-01-01

    Unlike the core structural elements of a protein like regular secondary structure, template based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and were enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD 7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct test of sampling to the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method in loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near native loop structure. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/. PMID:22028638

  9. Automatic smoothing parameter selection in GAMLSS with an application to centile estimation.

    Science.gov (United States)

    Rigby, Robert A; Stasinopoulos, Dimitrios M

    2014-08-01

    A method for automatic selection of the smoothing parameters in a generalised additive model for location, scale and shape (GAMLSS) is introduced. The method uses a P-spline representation of the smoothing terms to express them as random effect terms with an internal (or local) maximum likelihood estimation on the predictor scale of each distribution parameter to estimate its smoothing parameters. This provides a fast method for estimating multiple smoothing parameters. The method is applied to centile estimation where all four parameters of a distribution for the response variable are modelled as smooth functions of a transformed explanatory variable x. This allows smooth modelling of the location, scale, skewness and kurtosis parameters of the response variable distribution as functions of x. © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
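
    As a rough illustration of automatic smoothing-parameter selection for a single P-spline term, the sketch below picks the penalty by minimising generalised cross-validation on synthetic data. This is a simplified stand-in, not the GAMLSS local maximum-likelihood algorithm described above; the basis size, penalty order and GCV criterion are illustrative choices.

```python
# Minimal P-spline smoother with automatic smoothing-parameter selection by GCV.
# Illustrative only: GAMLSS selects smoothing parameters by a local ML criterion.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 300))
y = np.sin(x) + 0.3 * rng.standard_normal(x.size)

degree, n_inner = 3, 20
pad = 1e-6 * (x.max() - x.min())
inner = np.linspace(x.min() - pad, x.max() + pad, n_inner)
knots = np.r_[[inner[0]] * degree, inner, [inner[-1]] * degree]
B = BSpline.design_matrix(x, knots, degree).toarray()   # n x K basis matrix
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)            # second-order difference penalty
P = D.T @ D

def gcv(lam):
    A = B.T @ B + lam * P
    beta = np.linalg.solve(A, B.T @ y)
    fitted = B @ beta
    edf = np.trace(np.linalg.solve(A, B.T @ B))          # effective degrees of freedom
    rss = np.sum((y - fitted) ** 2)
    return len(y) * rss / (len(y) - edf) ** 2

lams = 10.0 ** np.linspace(-4, 4, 41)
best = min(lams, key=gcv)
print(f"selected smoothing parameter: {best:.3g}, GCV: {gcv(best):.4f}")
```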

  10. A shared frailty model for case-cohort samples: parent and offspring relations in an adoption study

    DEFF Research Database (Denmark)

    Petersen, Liselotte; Sørensen, Thorkild I A; Andersen, Per Kragh

    2010-01-01

    The Danish adoption register contains data on the 12 301 Danish nonfamilial adoptions during 1924-1947. From that register a case-cohort sample was selected consisting of all case adoptees, that is those adoptees dying before age 70 years, and a random sample of 1683 adoptees. The survival data of their biological and adoptive parents were collected with the purpose of studying the association of survival between the adoptee and his/her biological or adoptive parents. Motivated by this study, we explored how to make inference in a shared frailty model for case-cohort data. Our approach was to use inverse probability weighting to account for the sampling in a conditional, shared frailty Poisson model and to use the robust variance estimator proposed by Moger et al. (Statist. Med. 2008; 27:1062-1074). To explore the performance of the estimation procedure, a simulation study was conducted. We studied situations ...

  11. Unemployment estimation: Spatial point referenced methods and models

    KAUST Repository

    Pereira, Soraia

    2017-06-26

    The Portuguese Labor Force Survey, from the 4th quarter of 2014 onwards, started geo-referencing the sampling units, namely the dwellings in which the surveys are carried out. This opens new possibilities in analysing and estimating unemployment and its spatial distribution across any region. The labor force survey chooses, according to pre-established sampling criteria, a certain number of dwellings across the nation and surveys the number of unemployed in these dwellings. Based on this survey, the National Statistical Institute of Portugal presently uses direct estimation methods to estimate the national unemployment figures. Recently, there has been increased interest in estimating these figures in smaller areas. Direct estimation methods, due to reduced sample sizes in small areas, tend to produce fairly large sampling variations; therefore, model-based methods, which tend to

  12. Estimation of Stochastic Volatility Models by Nonparametric Filtering

    DEFF Research Database (Denmark)

    Kanaya, Shin; Kristensen, Dennis

    2016-01-01

    ... estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases ... and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties ...

  13. Estimating mean change in population salt intake using spot urine samples.

    Science.gov (United States)

    Petersen, Kristina S; Wu, Jason H Y; Webster, Jacqui; Grimes, Carley; Woodward, Mark; Nowson, Caryl A; Neal, Bruce

    2017-10-01

    Spot urine samples are easier to collect than 24-h urine samples and have been used with estimating equations to derive the mean daily salt intake of a population. Whether equations using data from spot urine samples can also be used to estimate change in mean daily population salt intake over time is unknown. We compared estimates of change in mean daily population salt intake based upon 24-h urine collections with estimates derived using equations based on spot urine samples. Paired and unpaired 24-h urine samples and spot urine samples were collected from individuals in two Australian populations, in 2011 and 2014. Estimates of change in daily mean population salt intake between 2011 and 2014 were obtained directly from the 24-h urine samples and by applying established estimating equations (Kawasaki, Tanaka, Mage, Toft, INTERSALT) to the data from spot urine samples. Differences between 2011 and 2014 were calculated using mixed models. A total of 1000 participants provided a 24-h urine sample and a spot urine sample in 2011, and 1012 did so in 2014 (paired samples n = 870; unpaired samples n = 1142). The participants were community-dwelling individuals living in the State of Victoria or the town of Lithgow in the State of New South Wales, Australia, with a mean age of 55 years in 2011. The mean (95% confidence interval) difference in population salt intake between 2011 and 2014 determined from the 24-h urine samples was -0.48g/day (-0.74 to -0.21; P spot urine samples was -0.24 g/day (-0.42 to -0.06; P = 0.01) using the Tanaka equation, -0.42 g/day (-0.70 to -0.13; p = 0.004) using the Kawasaki equation, -0.51 g/day (-1.00 to -0.01; P = 0.046) using the Mage equation, -0.26 g/day (-0.42 to -0.10; P = 0.001) using the Toft equation, -0.20 g/day (-0.32 to -0.09; P = 0.001) using the INTERSALT equation and -0.27 g/day (-0.39 to -0.15; P  0.058). Separate analysis of the unpaired and paired data showed that detection of

  14. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Full Text Available Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. A determination of predictor-specific degree of smoothing increased the accuracy.

  15. Bayesian estimation of P(X > x) from a small sample of Gaussian data

    DEFF Research Database (Denmark)

    Ditlevsen, Ove Dalager

    2017-01-01

    The classical statistical uncertainty problem of estimation of upper tail probabilities on the basis of a small sample of observations of a Gaussian random variable is considered. Predictive posterior estimation is discussed, adopting the standard statistical model with diffuse priors of the two...
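
    For context, a minimal sketch of the predictive-posterior calculation under the standard diffuse prior p(μ, σ²) ∝ 1/σ², for which the posterior predictive of a new observation is a location-scale Student-t. This is one conventional diffuse-prior result and is not claimed to be the exact formulation used in the record above.

```python
# Predictive posterior P(X_new > x) from a small Gaussian sample under the
# standard diffuse prior p(mu, sigma^2) proportional to 1/sigma^2.
# The posterior predictive is Student-t with n-1 df, location xbar,
# scale s * sqrt(1 + 1/n).
import numpy as np
from scipy import stats

def predictive_exceedance(sample, x):
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    xbar, s = sample.mean(), sample.std(ddof=1)
    scale = s * np.sqrt(1.0 + 1.0 / n)
    return stats.t.sf((x - xbar) / scale, df=n - 1)

data = [2.1, 1.8, 2.4, 2.0, 2.6, 1.9]          # a small sample of observations
print(predictive_exceedance(data, x=3.0))       # P(X > 3.0) given the data
```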

  16. Sample Selection for Training Cascade Detectors.

    Directory of Open Access Journals (Sweden)

    Noelia Vállez

    Full Text Available Automatic detection systems usually require large and representative training datasets in order to obtain good detection and false positive rates. Training datasets are such that the positive set has few samples and/or the negative set should represent anything except the object of interest. In this respect, the negative set typically contains orders of magnitude more images than the positive set. However, imbalanced training databases lead to biased classifiers. In this paper, we focus our attention on a negative sample selection method to properly balance the training data for cascade detectors. The method is based on the selection of the most informative false positive samples generated in one stage to feed the next stage. The results show that the proposed cascade detector with sample selection obtains on average better partial AUC and smaller standard deviation than the other compared cascade detectors.
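
    The selection of the most informative false positives between stages is essentially hard-negative mining. The sketch below illustrates the idea with a generic scikit-learn classifier on synthetic features; the classifier, pool sizes and selection rule are placeholder choices, not the detector or thresholds used in the paper.

```python
# Hard-negative mining between cascade stages: train a stage, score a large
# pool of negatives, and keep only the highest-scoring false positives for
# the next stage. Synthetic data; the classifier choice is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=(300, 10))          # few positive samples
neg_pool = rng.normal(-0.5, 1.5, size=(30000, 10))  # much larger negative pool

def train_stage(pos, neg):
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    return LogisticRegression(max_iter=1000).fit(X, y)

# Stage 1: balanced random subset of negatives.
neg1 = neg_pool[rng.choice(len(neg_pool), size=len(pos), replace=False)]
stage1 = train_stage(pos, neg1)

# Select the most informative false positives (negatives scored as positive).
scores = stage1.predict_proba(neg_pool)[:, 1]
hard_idx = np.argsort(scores)[-len(pos):]            # top-scoring negatives
stage2 = train_stage(pos, neg_pool[hard_idx])
print("mean stage-1 score of mined negatives:", scores[hard_idx].mean().round(3))
```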

  17. Sample Size Calculations for Population Size Estimation Studies Using Multiplier Methods With Respondent-Driven Sampling Surveys.

    Science.gov (United States)

    Fearon, Elizabeth; Chabata, Sungai T; Thompson, Jennifer A; Cowan, Frances M; Hargreaves, James R

    2017-09-14

    While guidance exists for obtaining population size estimates using multiplier methods with respondent-driven sampling surveys, we lack specific guidance for making sample size decisions. To guide the design of multiplier method population size estimation studies using respondent-driven sampling surveys to reduce the random error around the estimate obtained. The population size estimate is obtained by dividing the number of individuals receiving a service or the number of unique objects distributed (M) by the proportion of individuals in a representative survey who report receipt of the service or object (P). We have developed an approach to sample size calculation, interpreting methods to estimate the variance around estimates obtained using multiplier methods in conjunction with research into design effects and respondent-driven sampling. We describe an application to estimate the number of female sex workers in Harare, Zimbabwe. There is high variance in estimates. Random error around the size estimate reflects uncertainty from M and P, particularly when the estimate of P in the respondent-driven sampling survey is low. As expected, sample size requirements are higher when the design effect of the survey is assumed to be greater. We suggest a method for investigating the effects of sample size on the precision of a population size estimate obtained using multipler methods and respondent-driven sampling. Uncertainty in the size estimate is high, particularly when P is small, so balancing against other potential sources of bias, we advise researchers to consider longer service attendance reference periods and to distribute more unique objects, which is likely to result in a higher estimate of P in the respondent-driven sampling survey. ©Elizabeth Fearon, Sungai T Chabata, Jennifer A Thompson, Frances M Cowan, James R Hargreaves. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 14.09.2017.
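
    A rough sketch of the arithmetic involved: the population size is estimated as N = M / P, and the respondent-driven sampling survey must estimate P precisely enough, with the required sample size inflated by the design effect. The normal-approximation sample-size formula and delta-method interval below are generic textbook expressions, not the authors' exact procedure.

```python
# Sample size and rough precision for a multiplier-method population size
# estimate N = M / P, where P is estimated from an RDS survey.
# Generic normal-approximation / delta-method sketch with illustrative inputs.
import math
from scipy.stats import norm

def rds_sample_size(p, half_width, design_effect=2.0, alpha=0.05):
    """Respondents needed to estimate proportion P within +/- half_width."""
    z = norm.ppf(1 - alpha / 2)
    n_srs = z ** 2 * p * (1 - p) / half_width ** 2
    return math.ceil(design_effect * n_srs)

def population_size_ci(M, p_hat, n, design_effect=2.0, z=1.96):
    """Delta-method CI for N = M / P given the RDS estimate of P."""
    se_p = math.sqrt(design_effect * p_hat * (1 - p_hat) / n)
    N_hat = M / p_hat
    se_N = M * se_p / p_hat ** 2
    return N_hat, (N_hat - z * se_N, N_hat + z * se_N)

print(rds_sample_size(p=0.3, half_width=0.05))
print(population_size_ci(M=2000, p_hat=0.3, n=600))
```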

  18. Model Selection in Historical Research Using Approximate Bayesian Computation

    Science.gov (United States)

    Rubio-Campillo, Xavier

    2016-01-01

    Formal Models and History Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation. Case Study This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester’s laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes Factors. Impact Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. PMID:26730953
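
    A toy sketch of rejection-ABC model selection: posterior model probabilities (and a Bayes factor) are approximated by the acceptance rates of simulations from each candidate model. The two candidate models, the prior and the summary statistics below are illustrative and unrelated to the Lanchester combat models analysed in the paper.

```python
# Toy rejection-ABC model selection: approximate posterior model probabilities
# (and a Bayes factor) by the acceptance rates of simulations from each model.
import numpy as np

rng = np.random.default_rng(42)
observed = rng.normal(0.0, 1.0, size=50)             # stand-in for real data
s_obs = np.array([observed.mean(), observed.std()])  # summary statistics

def simulate(model, size):
    scale = rng.uniform(0.1, 3.0)                    # shared prior on scale
    if model == "gaussian":
        return rng.normal(0.0, scale, size)
    return rng.laplace(0.0, scale / np.sqrt(2), size)  # variance-matched Laplace

def abc_accepts(model, n_sims=20000, eps=0.1):
    hits = 0
    for _ in range(n_sims):
        sim = simulate(model, observed.size)
        s_sim = np.array([sim.mean(), sim.std()])
        if np.linalg.norm(s_sim - s_obs) < eps:
            hits += 1
    return hits

a_g, a_l = abc_accepts("gaussian"), abc_accepts("laplace")
print("posterior model probabilities (equal priors):",
      a_g / (a_g + a_l), a_l / (a_g + a_l))
print("Bayes factor (gaussian vs laplace):", a_g / max(a_l, 1))
```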

  19. Parameter Estimation and Model Selection for Mixtures of Truncated Exponentials

    DEFF Research Database (Denmark)

    Langseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael

    2010-01-01

    Bayesian networks with mixtures of truncated exponentials (MTEs) support efficient inference algorithms and provide a flexible way of modeling hybrid domains (domains containing both discrete and continuous variables). On the other hand, estimating an MTE from data has turned out to be a difficult ...

  20. Estimation of the ancestral effective population sizes of African great apes under different selection regimes.

    Science.gov (United States)

    Schrago, Carlos G

    2014-08-01

    Reliable estimates of ancestral effective population sizes are necessary to unveil the population-level phenomena that shaped the phylogeny and molecular evolution of the African great apes. Although several methods have previously been applied to infer ancestral effective population sizes, an analysis of the influence of the selective regime on the estimates of ancestral demography has not been thoroughly conducted. In this study, three independent data sets under different selective regimes were composed to tackle this issue. The results showed that selection had a significant impact on the estimates of ancestral effective population sizes of the African great apes. The inference of the ancestral demography of African great apes was affected by the selection regime. The effects, however, were not homogeneous across the ancestral populations of great apes. The effective population size of the ancestor of humans and chimpanzees was more impacted by the selection regime when compared to the same parameter in the ancestor of humans, chimpanzees and gorillas. Because the selection regime influenced the estimates of ancestral effective population size, it is reasonable to assume that a portion of the discrepancy found in previous studies that inferred the ancestral effective population size may be attributable to the differential action of selection on the genes sampled.

  1. Indirect estimation of signal-dependent noise with nonadaptive heterogeneous samples.

    Science.gov (United States)

    Azzari, Lucio; Foi, Alessandro

    2014-08-01

    We consider the estimation of signal-dependent noise from a single image. Unlike conventional algorithms that build a scatterplot of local mean-variance pairs from either small or adaptively selected homogeneous data samples, our proposed approach relies on arbitrarily large patches of heterogeneous data extracted at random from the image. We demonstrate the feasibility of our approach through an extensive theoretical analysis based on mixture of Gaussian distributions. A prototype algorithm is also developed in order to validate the approach on simulated data as well as on real camera raw images.

  2. Estimating Stochastic Volatility Models using Prediction-based Estimating Functions

    DEFF Research Database (Denmark)

    Lunde, Asger; Brix, Anne Floor

    to the performance of the GMM estimator based on conditional moments of integrated volatility from Bollerslev and Zhou (2002). The case where the observed log-price process is contaminated by i.i.d. market microstructure (MMS) noise is also investigated. First, the impact of MMS noise on the parameter estimates from......In this paper prediction-based estimating functions (PBEFs), introduced in Sørensen (2000), are reviewed and PBEFs for the Heston (1993) stochastic volatility model are derived. The finite sample performance of the PBEF based estimator is investigated in a Monte Carlo study, and compared...... to correctly account for the noise are investigated. Our Monte Carlo study shows that the estimator based on PBEFs outperforms the GMM estimator, both in the setting with and without MMS noise. Finally, an empirical application investigates the possible challenges and general performance of applying the PBEF...

  3. A Heckman Selection- t Model

    KAUST Repository

    Marchenko, Yulia V.; Genton, Marc G.

    2012-01-01

    for sample selection bias based on the SLt model and compare it with the performances of several tests used with the SLN model. Our findings indicate that the latter tests can be misleading in the presence of heavy-tailed data. © 2012 American Statistical

  4. Bayesian Model Averaging of Artificial Intelligence Models for Hydraulic Conductivity Estimation

    Science.gov (United States)

    Nadiri, A.; Chitsazan, N.; Tsai, F. T.; Asghari Moghaddam, A.

    2012-12-01

    This research presents a Bayesian artificial intelligence model averaging (BAIMA) method that incorporates multiple artificial intelligence (AI) models to estimate hydraulic conductivity and evaluate estimation uncertainties. Uncertainty in the AI model outputs stems from error in model input as well as non-uniqueness in selecting different AI methods. Using one single AI model tends to bias the estimation and underestimate uncertainty. BAIMA employs Bayesian model averaging (BMA) technique to address the issue of using one single AI model for estimation. BAIMA estimates hydraulic conductivity by averaging the outputs of AI models according to their model weights. In this study, the model weights were determined using the Bayesian information criterion (BIC) that follows the parsimony principle. BAIMA calculates the within-model variances to account for uncertainty propagation from input data to AI model output. Between-model variances are evaluated to account for uncertainty due to model non-uniqueness. We employed Takagi-Sugeno fuzzy logic (TS-FL), artificial neural network (ANN) and neurofuzzy (NF) to estimate hydraulic conductivity for the Tasuj plain aquifer, Iran. BAIMA combined three AI models and produced better fitting than individual models. While NF was expected to be the best AI model owing to its utilization of both TS-FL and ANN models, the NF model is nearly discarded by the parsimony principle. The TS-FL model and the ANN model showed equal importance although their hydraulic conductivity estimates were quite different. This resulted in significant between-model variances that are normally ignored by using one AI model.
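
    The weighting scheme described above can be sketched generically: each model receives a weight proportional to exp(-ΔBIC/2), and the averaged prediction carries both within-model and between-model variance components. The regressors below are simple stand-ins for the AI models (TS-FL, ANN, NF), and the parameter counts used in the BIC penalty are rough illustrative values.

```python
# Generic BIC-weighted model averaging with within- and between-model
# variance components, using two simple regressors as stand-ins for the
# AI models discussed in the record above.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)

# (model, rough parameter count used in the BIC penalty)
models = {"linear": (LinearRegression(), 4),
          "knn": (KNeighborsRegressor(n_neighbors=5), 2)}

n = len(y)
preds, bic, sigma2 = {}, {}, {}
for name, (est, k) in models.items():
    p = est.fit(X, y).predict(X)
    rss = np.sum((y - p) ** 2)
    preds[name] = p
    bic[name] = n * np.log(rss / n) + k * np.log(n)   # Gaussian-error BIC
    sigma2[name] = rss / n                            # within-model variance

delta = {m: b - min(bic.values()) for m, b in bic.items()}
raw = {m: np.exp(-0.5 * d) for m, d in delta.items()}
total = sum(raw.values())
w = {m: r / total for m, r in raw.items()}            # BIC model weights

bma_mean = sum(w[m] * preds[m] for m in models)
within = sum(w[m] * sigma2[m] for m in models)
between = sum(w[m] * np.mean((preds[m] - bma_mean) ** 2) for m in models)
print({m: round(w[m], 3) for m in models},
      "within-model var:", round(within, 4),
      "between-model var:", round(between, 5))
```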

  5. Obscured AGN at z ~ 1 from the zCOSMOS-Bright Survey. I. Selection and optical properties of a [Ne v]-selected sample

    Science.gov (United States)

    Mignoli, M.; Vignali, C.; Gilli, R.; Comastri, A.; Zamorani, G.; Bolzonella, M.; Bongiorno, A.; Lamareille, F.; Nair, P.; Pozzetti, L.; Lilly, S. J.; Carollo, C. M.; Contini, T.; Kneib, J.-P.; Le Fèvre, O.; Mainieri, V.; Renzini, A.; Scodeggio, M.; Bardelli, S.; Caputi, K.; Cucciati, O.; de la Torre, S.; de Ravel, L.; Franzetti, P.; Garilli, B.; Iovino, A.; Kampczyk, P.; Knobel, C.; Kovač, K.; Le Borgne, J.-F.; Le Brun, V.; Maier, C.; Pellò, R.; Peng, Y.; Perez Montero, E.; Presotto, V.; Silverman, J. D.; Tanaka, M.; Tasca, L.; Tresse, L.; Vergani, D.; Zucca, E.; Bordoloi, R.; Cappi, A.; Cimatti, A.; Koekemoer, A. M.; McCracken, H. J.; Moresco, M.; Welikala, N.

    2013-08-01

    Aims: The application of multi-wavelength selection techniques is essential for obtaining a complete and unbiased census of active galactic nuclei (AGN). We present here a method for selecting z ~ 1 obscured AGN from optical spectroscopic surveys. Methods: A sample of 94 narrow-line AGN with 0.65 advantage of the large amount of data available in the COSMOS field, the properties of the [Ne v]-selected type 2 AGN were investigated, focusing on their host galaxies, X-ray emission, and optical line-flux ratios. Finally, a previously developed diagnostic, based on the X-ray-to-[Ne v] luminosity ratio, was exploited to search for the more heavily obscured AGN. Results: We found that [Ne v]-selected narrow-line AGN have Seyfert 2-like optical spectra, although their emission line ratios are diluted by a star-forming component. The ACS morphologies and stellar component in the optical spectra indicate a preference for our type 2 AGN to be hosted in early-type spirals with stellar masses greater than 10^9.5-10 M⊙, on average higher than those of the galaxy parent sample. The fraction of galaxies hosting [Ne v]-selected obscured AGN increases with the stellar mass, reaching a maximum of about 3% at ≈2 × 10^11 M⊙. A comparison with other selection techniques at z ~ 1, namely the line-ratio diagnostics and X-ray detections, shows that the detection of the [Ne v] λ3426 line is an effective method for selecting AGN in the optical band, in particular the most heavily obscured ones, but cannot provide a complete census of type 2 AGN by itself. Finally, the high fraction of [Ne v]-selected type 2 AGN not detected in medium-deep (≈100-200 ks) Chandra observations (67%) is suggestive of the inclusion of Compton-thick (i.e., with NH > 10^24 cm^-2) sources in our sample. The presence of a population of heavily obscured AGN is corroborated by the X-ray-to-[Ne v] ratio; we estimated, by means of an X-ray stacking technique and simulations, that the Compton-thick fraction in our

  6. Robust estimation and moment selection in dynamic fixed-effects panel data models

    NARCIS (Netherlands)

    Cizek, Pavel; Aquaro, Michele

    Considering linear dynamic panel data models with fixed effects, existing outlier–robust estimators based on the median ratio of two consecutive pairs of first-differenced data are extended to higher-order differencing. The estimation procedure is thus based on many pairwise differences and their

  7. On efficiency of some ratio estimators in double sampling design ...

    African Journals Online (AJOL)

    In this paper, three sampling ratio estimators in double sampling design were proposed with the intention of finding an alternative double sampling design estimator to the conventional ratio estimator in double sampling design discussed by Cochran (1997), Okafor (2002) , Raj (1972) and Raj and Chandhok (1999).
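
    For reference, a minimal sketch of the conventional ratio estimator in double (two-phase) sampling that such proposals are compared against: the cheap auxiliary variable x is measured on a large first-phase sample, y is measured on a second-phase subsample, and the mean of y is estimated as (ȳ/x̄)·x̄₁. The data below are synthetic.

```python
# Classical ratio estimator under double (two-phase) sampling:
# phase 1 measures the cheap auxiliary x on a large sample,
# phase 2 measures y (and x) on a subsample; Ybar_hat = (ybar / xbar) * xbar_1.
import numpy as np

rng = np.random.default_rng(7)
N1 = 2000                                       # first-phase sample size
x1 = rng.gamma(shape=4.0, scale=2.5, size=N1)
idx = rng.choice(N1, size=200, replace=False)   # second-phase subsample
x2 = x1[idx]
y2 = 3.0 * x2 + rng.normal(0, 2, size=x2.size)  # y observed only in phase 2

xbar1 = x1.mean()
ratio = y2.mean() / x2.mean()
ybar_ratio = ratio * xbar1                      # double-sampling ratio estimate
print("ratio estimate of the mean of y:", round(ybar_ratio, 3))
print("phase-2-only mean of y:", round(y2.mean(), 3))
```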

  8. Estimation of sample size and testing power (Part 4).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-01-01

    Sample size estimation is necessary for any experimental or survey research. An appropriate estimation of sample size based on known information and statistical knowledge is of great significance. This article introduces methods of sample size estimation of difference test for data with the design of one factor with two levels, including sample size estimation formulas and realization based on the formulas and the POWER procedure of SAS software for quantitative data and qualitative data with the design of one factor with two levels. In addition, this article presents examples for analysis, which will play a leading role for researchers to implement the repetition principle during the research design phase.
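
    A generic normal-approximation version of the sample-size formula for a difference test with one factor at two levels (two group means, common standard deviation) is sketched below; the α, power and effect-size values are illustrative, and the formula is a textbook approximation rather than the SAS POWER procedure referenced above.

```python
# Per-group sample size for detecting a mean difference delta between two
# groups with common standard deviation sigma (normal approximation):
# n = 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2.
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)

print(n_per_group(delta=5.0, sigma=10.0))   # about 63 per group
```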

  9. Generalized Likelihood Uncertainty Estimation (GLUE) Using Multi-Optimization Algorithm as Sampling Method

    Science.gov (United States)

    Wang, Z.

    2015-12-01

    For decades, distributed and lumped hydrological models have furthered our understanding of hydrological systems. The development of large-scale, high-precision hydrological simulation has refined spatial descriptions of hydrological behavior. This trend, however, is accompanied by growing model complexity and numbers of parameters, which brings new challenges for uncertainty quantification. Generalized Likelihood Uncertainty Estimation (GLUE), which couples Monte Carlo sampling with Bayesian estimation, has been widely used in uncertainty analysis for hydrological models. However, the stochastic sampling of prior parameters adopted by GLUE is inefficient, especially in high-dimensional parameter spaces. Heuristic optimization algorithms based on iterative evolution show better convergence speed and optimality-searching performance. In light of these features, this study adopted the genetic algorithm, differential evolution, and the shuffled complex evolution algorithm to search the parameter space and obtain parameter sets with large likelihoods. Based on this multi-algorithm sampling, hydrological model uncertainty analysis is conducted within the typical GLUE framework. To demonstrate the superiority of the new method, two hydrological models of different complexity are examined. The results show that the adaptive method tends to be efficient in sampling and effective in uncertainty analysis, providing an alternative path for uncertainty quantification.
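
    A compact sketch of the idea, assuming SciPy's differential evolution as the heuristic sampler: every parameter set evaluated by the optimiser is recorded, behavioral sets are retained by a likelihood threshold, and predictions are weighted by their likelihood measure. The toy model, the Nash-Sutcliffe likelihood and the 0.7 threshold are placeholders, not the hydrological models used in the study.

```python
# GLUE with a heuristic optimizer as the sampler: record every parameter set
# evaluated by differential evolution, then keep 'behavioral' sets and weight
# predictions by their likelihood measure. Toy model and threshold only.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(11)
t = np.arange(60.0)
true_q = 5.0 * np.exp(-t / 12.0) + 1.0
obs = true_q + 0.2 * rng.standard_normal(t.size)      # synthetic 'observations'

def model(theta):
    a, k, b = theta
    return a * np.exp(-t / k) + b

evaluated = []                                        # every sampled parameter set

def neg_nse(theta):                                   # 1 - Nash-Sutcliffe efficiency
    sim = model(theta)
    nse = 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
    evaluated.append((theta.copy(), nse, sim))
    return -nse

differential_evolution(neg_nse, bounds=[(0.1, 10), (1, 50), (0, 5)],
                       maxiter=60, popsize=20, seed=1, tol=1e-8)

behavioral = [(th, nse, sim) for th, nse, sim in evaluated if nse > 0.7]
weights = np.array([nse for _, nse, _ in behavioral])
weights /= weights.sum()
sims = np.array([sim for _, _, sim in behavioral])
glue_mean = weights @ sims                            # likelihood-weighted prediction
print(len(behavioral), "behavioral sets; mean abs error:",
      np.abs(glue_mean - obs).mean().round(3))
```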

  10. UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

    Directory of Open Access Journals (Sweden)

    A. Kianisarkaleh

    2015-12-01

    Full Text Available Feature extraction plays a key role in hyperspectral image classification. Using unlabeled samples, which are often available in nearly unlimited quantities, unsupervised and semisupervised feature extraction methods show better performance when only a limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods and proposes a new method for unlabeled sample selection using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification and sample selection. As a hyperspectral image passes through these parts, the selected unlabeled samples can be used in arbitrary feature extraction methods. The effectiveness of the proposed unlabeled sample selection in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that, by selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.

  11. Numerically stable algorithm for combining census and sample estimates with the multivariate composite estimator

    Science.gov (United States)

    R. L. Czaplewski

    2009-01-01

    The minimum variance multivariate composite estimator is a relatively simple sequential estimator for complex sampling designs (Czaplewski 2009). Such designs combine a probability sample of expensive field data with multiple censuses and/or samples of relatively inexpensive multi-sensor, multi-resolution remotely sensed data. Unfortunately, the multivariate composite...

  12. Counting Cats: Spatially Explicit Population Estimates of Cheetah (Acinonyx jubatus) Using Unstructured Sampling Data.

    Directory of Open Access Journals (Sweden)

    Femke Broekhuis

    Full Text Available Many ecological theories and species conservation programmes rely on accurate estimates of population density. Accurate density estimation, especially for species facing rapid declines, requires the application of rigorous field and analytical methods. However, obtaining accurate density estimates of carnivores can be challenging as carnivores naturally exist at relatively low densities and are often elusive and wide-ranging. In this study, we employ an unstructured spatial sampling field design along with a Bayesian sex-specific spatially explicit capture-recapture (SECR) analysis, to provide the first rigorous population density estimates of cheetahs (Acinonyx jubatus) in the Maasai Mara, Kenya. We estimate adult cheetah density to be between 1.28 ± 0.315 and 1.34 ± 0.337 individuals/100km2 across four candidate models specified in our analysis. Our spatially explicit approach revealed 'hotspots' of cheetah density, highlighting that cheetah are distributed heterogeneously across the landscape. The SECR models incorporated a movement range parameter which indicated that male cheetah moved four times as much as females, possibly because female movement was restricted by their reproductive status and/or the spatial distribution of prey. We show that SECR can be used for spatially unstructured data to successfully characterise the spatial distribution of a low density species and also estimate population density when sample size is small. Our sampling and modelling framework will help determine spatial and temporal variation in cheetah densities, providing a foundation for their conservation and management. Based on our results we encourage other researchers to adopt a similar approach in estimating densities of individually recognisable species.

  13. Counting Cats: Spatially Explicit Population Estimates of Cheetah (Acinonyx jubatus) Using Unstructured Sampling Data.

    Science.gov (United States)

    Broekhuis, Femke; Gopalaswamy, Arjun M

    2016-01-01

    Many ecological theories and species conservation programmes rely on accurate estimates of population density. Accurate density estimation, especially for species facing rapid declines, requires the application of rigorous field and analytical methods. However, obtaining accurate density estimates of carnivores can be challenging as carnivores naturally exist at relatively low densities and are often elusive and wide-ranging. In this study, we employ an unstructured spatial sampling field design along with a Bayesian sex-specific spatially explicit capture-recapture (SECR) analysis, to provide the first rigorous population density estimates of cheetahs (Acinonyx jubatus) in the Maasai Mara, Kenya. We estimate adult cheetah density to be between 1.28 ± 0.315 and 1.34 ± 0.337 individuals/100km2 across four candidate models specified in our analysis. Our spatially explicit approach revealed 'hotspots' of cheetah density, highlighting that cheetah are distributed heterogeneously across the landscape. The SECR models incorporated a movement range parameter which indicated that male cheetah moved four times as much as females, possibly because female movement was restricted by their reproductive status and/or the spatial distribution of prey. We show that SECR can be used for spatially unstructured data to successfully characterise the spatial distribution of a low density species and also estimate population density when sample size is small. Our sampling and modelling framework will help determine spatial and temporal variation in cheetah densities, providing a foundation for their conservation and management. Based on our results we encourage other researchers to adopt a similar approach in estimating densities of individually recognisable species.

  14. Numerical Model based Reliability Estimation of Selective Laser Melting Process

    DEFF Research Database (Denmark)

    Mohanty, Sankhya; Hattel, Jesper Henri

    2014-01-01

    Selective laser melting is developing into a standard manufacturing technology with applications in various sectors. However, the process is still far from being at par with conventional processes such as welding and casting, the primary reason of which is the unreliability of the process. While...... of the selective laser melting process. A validated 3D finite-volume alternating-direction-implicit numerical technique is used to model the selective laser melting process, and is calibrated against results from single track formation experiments. Correlation coefficients are determined for process input...... parameters such as laser power, speed, beam profile, etc. Subsequently, uncertainties in the processing parameters are utilized to predict a range for the various outputs, using a Monte Carlo method based uncertainty analysis methodology, and the reliability of the process is established....

  15. HERITABILITY AND BREEDING VALUE OF SHEEP FERTILITY ESTIMATED BY MEANS OF THE GIBBS SAMPLING METHOD USING THE LINEAR AND THRESHOLD MODELS

    Directory of Open Access Journals (Sweden)

    DARIUSZ Piwczynski

    2013-03-01

    Full Text Available The research was carried out on 4,030 Polish Merino ewes born in the years 1991-2001, kept in 15 flocks from the Pomorze and Kujawy region. Fertility of ewes in subsequent reproduction seasons was analysed with the use of multiple logistic regression. The research showed a statistically significant influence of flock, year of birth, age of dam, and the flock-year of birth interaction on the ewes' fertility. In order to estimate the genetic parameters, the Gibbs sampling method was applied, using univariate animal models, both linear and threshold. Estimates of heritability of fertility, depending on the model, equalled 0.067 to 0.104, whereas the estimates of repeatability equalled 0.076 and 0.139, respectively. The obtained genetic parameters were then used to estimate the breeding values of the animals for the controlled trait (Best Linear Unbiased Prediction method) using linear and threshold models. The obtained animal breeding value rankings for the same trait with the use of linear and threshold models were strongly correlated with each other (rs = 0.972). Negative genetic trends of fertility (0.01-0.08% per year) were found.

  16. Estimation of population mean under systematic sampling

    Science.gov (United States)

    Noor-ul-amin, Muhammad; Javaid, Amjad

    2017-11-01

    In this study we propose a generalized ratio estimator under non-response for systematic random sampling. We also generate a class of estimators through special cases of generalized estimator using different combinations of coefficients of correlation, kurtosis and variation. The mean square errors and mathematical conditions are also derived to prove the efficiency of proposed estimators. Numerical illustration is included using three populations to support the results.

  17. Network Structure and Biased Variance Estimation in Respondent Driven Sampling.

    Science.gov (United States)

    Verdery, Ashton M; Mouw, Ted; Bauldry, Shawn; Mucha, Peter J

    2015-01-01

    This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network.

  18. Graphical models for inference under outcome-dependent sampling

    DEFF Research Database (Denmark)

    Didelez, V; Kreiner, S; Keiding, N

    2010-01-01

    We consider situations where data have been collected such that the sampling depends on the outcome of interest and possibly further covariates, as for instance in case-control studies. Graphical models represent assumptions about the conditional independencies among the variables. By including a node for the sampling indicator, assumptions about sampling processes can be made explicit. We demonstrate how to read off such graphs whether consistent estimation of the association between exposure and outcome is possible. Moreover, we give sufficient graphical conditions for testing and estimating ...

  19. Comparison of blood flow models and acquisitions for quantitative myocardial perfusion estimation from dynamic CT

    International Nuclear Information System (INIS)

    Bindschadler, Michael; Alessio, Adam M; Modgil, Dimple; La Riviere, Patrick J; Branch, Kelley R

    2014-01-01

    Myocardial blood flow (MBF) can be estimated from dynamic contrast enhanced (DCE) cardiac CT acquisitions, leading to quantitative assessment of regional perfusion. The need for low radiation dose and the lack of consensus on MBF estimation methods motivates this study to refine the selection of acquisition protocols and models for CT-derived MBF. DCE cardiac CT acquisitions were simulated for a range of flow states (MBF = 0.5, 1, 2, 3 ml (min g) −1 , cardiac output = 3, 5, 8 L min −1 ). Patient kinetics were generated by a mathematical model of iodine exchange incorporating numerous physiological features including heterogenenous microvascular flow, permeability and capillary contrast gradients. CT acquisitions were simulated for multiple realizations of realistic x-ray flux levels. CT acquisitions that reduce radiation exposure were implemented by varying both temporal sampling (1, 2, and 3 s sampling intervals) and tube currents (140, 70, and 25 mAs). For all acquisitions, we compared three quantitative MBF estimation methods (two-compartment model, an axially-distributed model, and the adiabatic approximation to the tissue homogeneous model) and a qualitative slope-based method. In total, over 11 000 time attenuation curves were used to evaluate MBF estimation in multiple patient and imaging scenarios. After iodine-based beam hardening correction, the slope method consistently underestimated flow by on average 47.5% and the quantitative models provided estimates with less than 6.5% average bias and increasing variance with increasing dose reductions. The three quantitative models performed equally well, offering estimates with essentially identical root mean squared error (RMSE) for matched acquisitions. MBF estimates using the qualitative slope method were inferior in terms of bias and RMSE compared to the quantitative methods. MBF estimate error was equal at matched dose reductions for all quantitative methods and range of techniques evaluated. This

  20. Maximum likelihood estimation of finite mixture model for economic data

    Science.gov (United States)

    Phoong, Seuk-Yen; Ismail, Mohd Tahir

    2014-06-01

    A finite mixture model is a mixture model with a finite number of components. These models provide a natural representation of heterogeneity in terms of a finite number of latent classes, and are also known as latent class models or unsupervised learning models. Recently, maximum likelihood estimation of finite mixture models has drawn considerable attention from statisticians, mainly because maximum likelihood estimation is a powerful statistical method that provides consistent estimates as the sample size increases to infinity. In the present paper, maximum likelihood estimation is therefore used to fit a finite mixture model in order to explore the relationship between nonlinear economic data. A two-component normal mixture model is fitted by maximum likelihood estimation to investigate the relationship between stock market prices and rubber prices for the sampled countries. The results indicate a negative relationship between rubber prices and stock market prices for Malaysia, Thailand, the Philippines and Indonesia.
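
    A minimal sketch of fitting a two-component normal mixture by maximum likelihood (via the EM algorithm in scikit-learn); the synthetic bivariate data stand in for the stock-market and rubber price series analysed in the paper.

```python
# Two-component Gaussian mixture fitted by maximum likelihood (EM), as a
# stand-in for the stock-price / rubber-price application described above.
# Synthetic bivariate data; component count and data are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
calm = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.4], [-0.4, 1.0]], size=400)
volatile = rng.multivariate_normal([0.5, -0.5], [[4.0, -2.0], [-2.0, 4.0]], size=100)
X = np.vstack([calm, volatile])                 # e.g. paired price changes

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
print("mixing weights:", gmm.weights_.round(3))
print("component means:\n", gmm.means_.round(3))
print("log-likelihood per observation:", round(gmm.score(X), 3))
```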

  1. A General Model for Estimating Macroevolutionary Landscapes.

    Science.gov (United States)

    Boucher, Florian C; Démery, Vincent; Conti, Elena; Harmon, Luke J; Uyeda, Josef

    2018-03-01

    The evolution of quantitative characters over long timescales is often studied using stochastic diffusion models. The current toolbox available to students of macroevolution is however limited to two main models: Brownian motion and the Ornstein-Uhlenbeck process, plus some of their extensions. Here, we present a very general model for inferring the dynamics of quantitative characters evolving under both random diffusion and deterministic forces of any possible shape and strength, which can accommodate interesting evolutionary scenarios like directional trends, disruptive selection, or macroevolutionary landscapes with multiple peaks. This model is based on a general partial differential equation widely used in statistical mechanics: the Fokker-Planck equation, also known in population genetics as the Kolmogorov forward equation. We thus call the model FPK, for Fokker-Planck-Kolmogorov. We first explain how this model can be used to describe macroevolutionary landscapes over which quantitative traits evolve and, more importantly, we detail how it can be fitted to empirical data. Using simulations, we show that the model has good behavior both in terms of discrimination from alternative models and in terms of parameter inference. We provide R code to fit the model to empirical data using either maximum-likelihood or Bayesian estimation, and illustrate the use of this code with two empirical examples of body mass evolution in mammals. FPK should greatly expand the set of macroevolutionary scenarios that can be studied since it opens the way to estimating macroevolutionary landscapes of any conceivable shape. [Adaptation; bounds; diffusion; FPK model; macroevolution; maximum-likelihood estimation; MCMC methods; phylogenetic comparative data; selection.].
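
    The process the FPK model describes can be illustrated by simulating trait evolution as diffusion over a potential: dX = -V'(X) dt + σ dW. The sketch below uses an Euler-Maruyama step under a hypothetical two-peak landscape; it is a simulation of the process, not the maximum-likelihood or Bayesian fitting machinery provided by the authors' R code.

```python
# Trait evolution as diffusion over a macroevolutionary landscape:
# dX = -V'(X) dt + sigma dW, simulated with Euler-Maruyama under a
# hypothetical two-peak potential V. Illustrates the process the FPK model
# fits; it is not the estimation itself.
import numpy as np

rng = np.random.default_rng(2)

def dV(x):                       # gradient of a double-well potential
    return 4 * x ** 3 - 4 * x    # V(x) = x^4 - 2 x^2, peaks near x = -1 and x = +1

sigma, dt, n_steps = 0.8, 0.01, 200000
x = np.empty(n_steps)
x[0] = 0.0
noise = rng.standard_normal(n_steps - 1)
for i in range(1, n_steps):
    x[i] = x[i - 1] - dV(x[i - 1]) * dt + sigma * np.sqrt(dt) * noise[i - 1]

# The long-run (stationary) density concentrates near the two landscape peaks.
hist, edges = np.histogram(x[n_steps // 10:], bins=7, range=(-2, 2), density=True)
print(np.round(hist, 2))
```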

  2. Acrylamide exposure among Turkish toddlers from selected cereal-based baby food samples.

    Science.gov (United States)

    Cengiz, Mehmet Fatih; Gündüz, Cennet Pelin Boyacı

    2013-10-01

    In this study, acrylamide exposure from selected cereal-based baby food samples was investigated among toddlers aged 1-3 years in Turkey. The study contained three steps. The first step was collecting food consumption data and toddlers' physical properties, such as gender, age and body weight, using a questionnaire given to parents by a trained interviewer between January and March 2012. The second step was determining the acrylamide levels in food samples that were reported on by the parents in the questionnaire, using a gas chromatography-mass spectrometry (GC-MS) method. The last step was combining the determined acrylamide levels in selected food samples with individual food consumption and body weight data using a deterministic approach to estimate the acrylamide exposure levels. The mean acrylamide levels of baby biscuits, breads, baby bread-rusks, crackers, biscuits, breakfast cereals and powdered cereal-based baby foods were 153, 225, 121, 604, 495, 290 and 36 μg/kg, respectively. The minimum, mean and maximum acrylamide exposures were estimated to be 0.06, 1.43 and 6.41 μg/kg BW per day, respectively. The foods that contributed to acrylamide exposure were aligned from high to low as bread, crackers, biscuits, baby biscuits, powdered cereal-based baby foods, baby bread-rusks and breakfast cereals. Copyright © 2013 Elsevier Ltd. All rights reserved.
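
    The deterministic exposure calculation amounts to summing concentration × daily intake over foods and dividing by body weight. The sketch below reuses the mean concentrations quoted in the abstract, but the intake amounts and body weight are hypothetical illustrative values, not the survey data.

```python
# Deterministic dietary exposure: sum over foods of
# concentration (ug/kg) * daily intake (kg/day), divided by body weight (kg).
# Concentrations are the means quoted in the abstract; the intake amounts and
# body weight below are purely illustrative.
mean_acrylamide_ug_per_kg = {
    "baby biscuits": 153, "bread": 225, "baby bread-rusks": 121,
    "crackers": 604, "biscuits": 495, "breakfast cereals": 290,
    "powdered cereal-based baby foods": 36,
}
daily_intake_kg = {                      # hypothetical toddler consumption
    "bread": 0.050, "baby biscuits": 0.020, "breakfast cereals": 0.015,
    "powdered cereal-based baby foods": 0.030,
}
body_weight_kg = 12.0                    # hypothetical toddler body weight

exposure = sum(mean_acrylamide_ug_per_kg[f] * amount
               for f, amount in daily_intake_kg.items()) / body_weight_kg
print(f"estimated exposure: {exposure:.2f} ug/kg bw per day")
```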

  3. Addressing catch mechanisms in gillnets improves modeling of selectivity and estimates of mortality rates: a case study using survey data on an endangered stock of Arctic char

    DEFF Research Database (Denmark)

    Jonsson, T.; Setzer, M.; Pope, John George

    2013-01-01

    Estimation of fish stock size distributions from survey data requires knowledge about gear selectivity. However, selectivity models rest on assumptions that seldom are analyzed. Departures from these can lead to misinterpretations and biased management recommendations. Here, we use survey data...... and asymmetric, with poor model fits. Removing potentially nonmeshed fish had the greatest positive effect on model fit, resulting in much narrower and less asymmetric selection curves, while attempting to take nonisometric growth into account, by using girth rather than length, improved model fit...

  4. Estimating black bear density in New Mexico using noninvasive genetic sampling coupled with spatially explicit capture-recapture methods

    Science.gov (United States)

    Gould, Matthew J.; Cain, James W.; Roemer, Gary W.; Gould, William R.

    2016-01-01

    During the 2004–2005 to 2015–2016 hunting seasons, the New Mexico Department of Game and Fish (NMDGF) estimated black bear abundance (Ursus americanus) across the state by coupling density estimates with the distribution of primary habitat generated by Costello et al. (2001). These estimates have been used to set harvest limits. For example, a density of 17 bears/100 km2 for the Sangre de Cristo and Sacramento Mountains and 13.2 bears/100 km2 for the Sandia Mountains were used to set harvest levels. The advancement and widespread acceptance of non-invasive sampling and mark-recapture methods, prompted the NMDGF to collaborate with the New Mexico Cooperative Fish and Wildlife Research Unit and New Mexico State University to update their density estimates for black bear populations in select mountain ranges across the state.We established 5 study areas in 3 mountain ranges: the northern (NSC; sampled in 2012) and southern Sangre de Cristo Mountains (SSC; sampled in 2013), the Sandia Mountains (Sandias; sampled in 2014), and the northern (NSacs) and southern Sacramento Mountains (SSacs; both sampled in 2014). We collected hair samples from black bears using two concurrent non-invasive sampling methods, hair traps and bear rubs. We used a gender marker and a suite of microsatellite loci to determine the individual identification of hair samples that were suitable for genetic analysis. We used these data to generate mark-recapture encounter histories for each bear and estimated density in a spatially explicit capture-recapture framework (SECR). We constructed a suite of SECR candidate models using sex, elevation, land cover type, and time to model heterogeneity in detection probability and the spatial scale over which detection probability declines. We used Akaike’s Information Criterion corrected for small sample size (AICc) to rank and select the most supported model from which we estimated density.We set 554 hair traps, 117 bear rubs and collected 4,083 hair

  5. Acceptance sampling using judgmental and randomly selected samples

    Energy Technology Data Exchange (ETDEWEB)

    Sego, Landon H.; Shulman, Stanley A.; Anderson, Kevin K.; Wilson, John E.; Pulsipher, Brent A.; Sieber, W. Karl

    2010-09-01

    We present a Bayesian model for acceptance sampling where the population consists of two groups, each with different levels of risk of containing unacceptable items. Expert opinion, or judgment, may be required to distinguish between the high- and low-risk groups. Hence, high-risk items are likely to be identified (and sampled) using expert judgment, while the remaining low-risk items are sampled randomly. We focus on the situation where all observed samples must be acceptable. Consequently, the objective of the statistical inference is to quantify the probability that a large percentage of the unsampled items in the population are also acceptable. We demonstrate that traditional (frequentist) acceptance sampling and simpler Bayesian formulations of the problem are essentially special cases of the proposed model. We explore the properties of the model in detail, and discuss the conditions necessary to ensure that required sample sizes are a non-decreasing function of the population size. The method is applicable to a variety of acceptance sampling problems, and, in particular, to environmental sampling where the objective is to demonstrate the safety of reoccupying a remediated facility that has been contaminated with a lethal agent.
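
    The simpler single-group Bayesian formulation mentioned above as a special case can be sketched directly: with a Beta prior on the unacceptable fraction and all n sampled items found acceptable, the number of unacceptable items among the remaining N − n follows a beta-binomial posterior predictive. The prior parameters and acceptability target below are illustrative.

```python
# Single-group Bayesian acceptance sampling (a special case noted in the
# record): Beta(a, b) prior on the fraction of unacceptable items; after
# observing n sampled items that are all acceptable, the number of
# unacceptable items among the remaining N - n is beta-binomial.
from scipy.stats import betabinom

def prob_high_acceptability(N, n, a=1.0, b=1.0, target_fraction=0.99):
    """P(at least target_fraction of the N - n unsampled items are acceptable,
    given that all n sampled items were acceptable)."""
    remaining = N - n
    max_defects = int((1 - target_fraction) * remaining)
    return betabinom.cdf(max_defects, remaining, a, b + n)

for n in (20, 50, 100):
    print(n, round(prob_high_acceptability(N=1000, n=n), 3))
```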

  6. Statistical Model-Based Face Pose Estimation

    Institute of Scientific and Technical Information of China (English)

    GE Xinliang; YANG Jie; LI Feng; WANG Huahua

    2007-01-01

    A robust face pose estimation approach is proposed by using face shape statistical model approach and pose parameters are represented by trigonometric functions. The face shape statistical model is firstly built by analyzing the face shapes from different people under varying poses. The shape alignment is vital in the process of building the statistical model. Then, six trigonometric functions are employed to represent the face pose parameters. Lastly, the mapping function is constructed between face image and face pose by linearly relating different parameters. The proposed approach is able to estimate different face poses using a few face training samples. Experimental results are provided to demonstrate its efficiency and accuracy.

  7. Habitat suitability criteria via parametric distributions: estimation, model selection and uncertainty

    Science.gov (United States)

    Som, Nicholas A.; Goodman, Damon H.; Perry, Russell W.; Hardy, Thomas B.

    2016-01-01

    Previous methods for constructing univariate habitat suitability criteria (HSC) curves have ranged from professional judgement to kernel-smoothed density functions or combinations thereof. We present a new method of generating HSC curves that applies probability density functions as the mathematical representation of the curves. Compared with previous approaches, benefits of our method include (1) estimation of probability density function parameters directly from raw data, (2) quantitative methods for selecting among several candidate probability density functions, and (3) concise methods for expressing estimation uncertainty in the HSC curves. We demonstrate our method with a thorough example using data collected on the depth of water used by juvenile Chinook salmon (Oncorhynchus tschawytscha) in the Klamath River of northern California and southern Oregon. All R code needed to implement our example is provided in the appendix. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
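
    A minimal sketch of the approach, assuming synthetic depth data and an illustrative set of candidate families: each probability density function is fitted by maximum likelihood with scipy.stats and the candidates are compared by AIC. This mirrors the general recipe, not the specific distributions or uncertainty methods of the paper.

```python
# Fit several candidate probability density functions to habitat-use data by
# maximum likelihood and compare them with AIC. Synthetic 'depth' data and an
# illustrative set of candidate families (not necessarily those in the paper).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
depth = rng.gamma(shape=3.0, scale=0.25, size=250)      # stand-in depth data (m)

candidates = {"gamma": stats.gamma, "lognorm": stats.lognorm,
              "weibull_min": stats.weibull_min}

aic = {}
for name, dist in candidates.items():
    params = dist.fit(depth, floc=0)                     # fix location at 0
    loglik = np.sum(dist.logpdf(depth, *params))
    k = len(params) - 1                                  # loc was fixed, not estimated
    aic[name] = 2 * k - 2 * loglik

best = min(aic, key=aic.get)
print({n: round(a, 1) for n, a in aic.items()}, "-> selected:", best)
```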

  8. An online method for lithium-ion battery remaining useful life estimation using importance sampling and neural networks

    International Nuclear Information System (INIS)

    Wu, Ji; Zhang, Chenbin; Chen, Zonghai

    2016-01-01

    Highlights: • An online RUL estimation method for lithium-ion battery is proposed. • RUL is described by the difference among battery terminal voltage curves. • A feed forward neural network is employed for RUL estimation. • Importance sampling is utilized to select feed forward neural network inputs. - Abstract: An accurate battery remaining useful life (RUL) estimation can facilitate the design of a reliable battery system as well as the safety and reliability of actual operation. A reasonable definition and an effective prediction algorithm are indispensable for the achievement of an accurate RUL estimation result. In this paper, the analysis of battery terminal voltage curves under different cycle numbers during charge process is utilized for RUL definition. Moreover, the relationship between RUL and charge curve is simulated by feed forward neural network (FFNN) for its simplicity and effectiveness. Considering the nonlinearity of lithium-ion charge curve, importance sampling (IS) is employed for FFNN input selection. Based on these results, an online approach using FFNN and IS is presented to estimate lithium-ion battery RUL in this paper. Experiments and numerical comparisons are conducted to validate the proposed method. The results show that the FFNN with IS is an accurate estimation method for actual operation.

  9. The Properties of Model Selection when Retaining Theory Variables

    DEFF Research Database (Denmark)

    Hendry, David F.; Johansen, Søren

    Economic theories are often fitted directly to data to avoid possible model selection biases. We show that embedding a theory model that specifies the correct set of m relevant exogenous variables, x_t, within the larger set of m+k candidate variables, (x_t, w_t), then selection over the second set by their statistical significance can be undertaken without affecting the estimator distribution of the theory parameters. This strategy returns the theory-parameter estimates when the theory is correct, yet protects against the theory being under-specified because some w_t are relevant ...

  10. Using the Violence Risk Scale-Sexual Offense version in sexual violence risk assessments: Updated risk categories and recidivism estimates from a multisite sample of treated sexual offenders.

    Science.gov (United States)

    Olver, Mark E; Mundt, James C; Thornton, David; Beggs Christofferson, Sarah M; Kingston, Drew A; Sowden, Justina N; Nicholaichuk, Terry P; Gordon, Audrey; Wong, Stephen C P

    2018-04-30

    The present study sought to develop updated risk categories and recidivism estimates for the Violence Risk Scale-Sexual Offense version (VRS-SO; Wong, Olver, Nicholaichuk, & Gordon, 2003-2017), a sexual offender risk assessment and treatment planning tool. The overarching purpose was to increase the clarity and accuracy of communicating risk assessment information that includes a systematic incorporation of new information (i.e., change) to modify risk estimates. Four treated samples of sexual offenders with VRS-SO pretreatment, posttreatment, and Static-99R ratings were combined with a minimum follow-up period of 10-years postrelease (N = 913). Logistic regression was used to model 5- and 10-year sexual and violent (including sexual) recidivism estimates across 6 different regression models employing specific risk and change score information from the VRS-SO and/or Static-99R. A rationale is presented for clinical applications of select models and the necessity of controlling for baseline risk when utilizing change information across repeated assessments. Information concerning relative risk (percentiles) and absolute risk (recidivism estimates) is integrated with common risk assessment language guidelines to generate new risk categories for the VRS-SO. Guidelines for model selection and forensic clinical application of the risk estimates are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  11. Estimation of DSGE Models under Diffuse Priors and Data-Driven Identification Constraints

    DEFF Research Database (Denmark)

    Lanne, Markku; Luoto, Jani

    We propose a sequential Monte Carlo (SMC) method augmented with an importance sampling step for estimation of DSGE models. In addition to being theoretically well motivated, the new method facilitates the assessment of estimation accuracy. Furthermore, in order to alleviate the problem of multimodal posterior distributions caused by parameter redundancy, data-driven identification constraints are imposed. An empirical application illustrates the properties of the estimation method and shows how the problem of multimodal posterior distributions caused by parameter redundancy is eliminated by identification constraints. Out-of-sample forecast comparisons as well as Bayes factors lend support to the constrained model.

  12. Effects of sampling conditions on DNA-based estimates of American black bear abundance

    Science.gov (United States)

    Laufenberg, Jared S.; Van Manen, Frank T.; Clark, Joseph D.

    2013-01-01

    DNA-based capture-mark-recapture techniques are commonly used to estimate American black bear (Ursus americanus) population abundance (N). Although the technique is well established, many questions remain regarding study design. In particular, relationships among N, capture probability of heterogeneity mixtures A and B (pA and pB, respectively, or p, collectively), the proportion of each mixture (π), number of capture occasions (k), and probability of obtaining reliable estimates of N are not fully understood. We investigated these relationships using 1) an empirical dataset of DNA samples for which true N was unknown and 2) simulated datasets with known properties that represented a broader array of sampling conditions. For the empirical data analysis, we used the full closed population with heterogeneity data type in Program MARK to estimate N for a black bear population in Great Smoky Mountains National Park, Tennessee. We systematically reduced the number of those samples used in the analysis to evaluate the effect that changes in capture probabilities may have on parameter estimates. Model-averaged N for females and males were 161 (95% CI = 114–272) and 100 (95% CI = 74–167), respectively (pooled N = 261, 95% CI = 192–419), and the average weekly p was 0.09 for females and 0.12 for males. When we reduced the number of samples of the empirical data, support for heterogeneity models decreased. For the simulation analysis, we generated capture data with individual heterogeneity covering a range of sampling conditions commonly encountered in DNA-based capture-mark-recapture studies and examined the relationships between those conditions and accuracy (i.e., probability of obtaining an estimated N that is within 20% of true N), coverage (i.e., probability that 95% confidence interval includes true N), and precision (i.e., probability of obtaining a coefficient of variation ≤20%) of estimates using logistic regression. The capture probability
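
    The simulation component described above can be pictured with a few lines of code: generate capture histories under a two-point heterogeneity mixture (pA, pB, π) and compare a heterogeneity-naive abundance estimate with the truth. This is an illustrative toy, not the Program MARK analysis; all parameter values are invented.
```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 250, 8                      # true abundance, number of capture occasions
pA, pB, pi_A = 0.05, 0.20, 0.6     # mixture capture probabilities and weight

is_A = rng.random(N) < pi_A
p_ind = np.where(is_A, pA, pB)
histories = rng.random((N, k)) < p_ind[:, None]   # one row per animal

detected = histories.any(axis=1)
M_t1 = detected.sum()              # number of distinct animals ever captured
print("Distinct animals detected:", M_t1, "of", N)

# Heterogeneity-naive estimator, for reference only; a full analysis would
# fit the closed-population mixture model (e.g., in Program MARK).
p_bar = histories[detected].mean()
N_naive = M_t1 / (1 - (1 - p_bar) ** k)
print("Naive abundance estimate:", round(N_naive, 1))
```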

  13. The impact of fecal sample processing on prevalence estimates for antibiotic-resistant Escherichia coli.

    Science.gov (United States)

    Omulo, Sylvia; Lofgren, Eric T; Mugoh, Maina; Alando, Moshe; Obiya, Joshua; Kipyegon, Korir; Kikwai, Gilbert; Gumbi, Wilson; Kariuki, Samuel; Call, Douglas R

    2017-05-01

    Investigators often rely on studies of Escherichia coli to characterize the burden of antibiotic resistance in a clinical or community setting. To determine if prevalence estimates for antibiotic resistance are sensitive to sample handling and interpretive criteria, we collected presumptive E. coli isolates (24 or 95 per stool sample) from a community in an urban informal settlement in Kenya. Isolates were tested for susceptibility to nine antibiotics using agar breakpoint assays and results were analyzed using generalized linear mixed models. We observed no significant effect of sample handling on prevalence estimates (P > 0.1). Prevalence estimates did not differ for five distinct E. coli colony morphologies on MacConkey agar plates (P > 0.2). Successive re-plating of samples for up to five consecutive days had little to no impact on prevalence estimates. Finally, culturing E. coli under different conditions (with 5% CO2 or under micro-aerobic conditions) did not affect estimates of prevalence. For the conditions tested in these experiments, minor modifications in sample processing protocols are unlikely to bias estimates of the prevalence of antibiotic resistance for fecal E. coli. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. A scenario tree model for the Canadian Notifiable Avian Influenza Surveillance System and its application to estimation of probability of freedom and sample size determination.

    Science.gov (United States)

    Christensen, Jette; Stryhn, Henrik; Vallières, André; El Allaki, Farouk

    2011-05-01

    In 2008, Canada designed and implemented the Canadian Notifiable Avian Influenza Surveillance System (CanNAISS) with six surveillance activities in a phased-in approach. CanNAISS was a surveillance system because it had more than one surveillance activity or component in 2008: passive surveillance; pre-slaughter surveillance; and voluntary enhanced notifiable avian influenza surveillance. Our objectives were to give a short overview of two active surveillance components in CanNAISS, and to describe the CanNAISS scenario tree model and its application to estimating the probability of populations being free of NAI virus infection and to sample size determination. Our data from the pre-slaughter surveillance component included diagnostic test results from 6296 serum samples representing 601 commercial chicken and turkey farms collected from 25 August 2008 to 29 January 2009. In addition, we included data from a sub-population of farms with high biosecurity standards: 36,164 samples from 55 farms sampled repeatedly over the 24-month study period from January 2007 to December 2008. All submissions were negative for Notifiable Avian Influenza (NAI) virus infection. We developed the CanNAISS scenario tree model so that it estimates the surveillance component sensitivity and the probability of a population being free of NAI at a farm-level prevalence of 0.01 and a within-farm prevalence of 0.3. We propose that a general model, such as the CanNAISS scenario tree model, may have a broader application than more detailed models that require disease-specific input parameters, such as relative risk estimates. Crown Copyright © 2011. Published by Elsevier B.V. All rights reserved.
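
    The basic scenario-tree quantities referred to above can be written down in a few lines. The sketch below uses the standard formulas for herd sensitivity, component sensitivity, and the posterior probability of freedom after an all-negative round; it is not the CanNAISS model itself, and the test sensitivity, sample counts, and prior are assumed values for illustration.
```python
se_test = 0.95             # diagnostic test sensitivity (assumed)
p_within = 0.30            # within-farm design prevalence
p_farm = 0.01              # farm-level design prevalence
n_samples_per_farm = 11    # serum samples per farm (assumed)
n_farms_tested = 601       # farms in the surveillance component
prior_p_infected = 0.05    # prior probability the population is infected (assumed)

# Probability that an infected farm yields at least one positive sample.
herd_se = 1.0 - (1.0 - se_test * p_within) ** n_samples_per_farm

# Component sensitivity: probability the component detects infection when the
# population is infected at the design prevalence.
component_se = 1.0 - (1.0 - p_farm * herd_se) ** n_farms_tested

# Posterior probability of freedom given all-negative surveillance results.
p_free = (1.0 - prior_p_infected) / (1.0 - prior_p_infected * component_se)
print(f"Herd sensitivity: {herd_se:.3f}")
print(f"Component sensitivity: {component_se:.3f}")
print(f"Posterior probability of freedom: {p_free:.3f}")
```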

  15. Estimation variance bounds of importance sampling simulations in digital communication systems

    Science.gov (United States)

    Lu, D.; Yao, K.

    1991-01-01

    In practical applications of importance sampling (IS) simulation, two basic problems are encountered, that of determining the estimation variance and that of evaluating the proper IS parameters needed in the simulations. The authors derive new upper and lower bounds on the estimation variance which are applicable to IS techniques. The upper bound is simple to evaluate and may be minimized by the proper selection of the IS parameter. Thus, lower and upper bounds on the improvement ratio of various IS techniques relative to the direct Monte Carlo simulation are also available. These bounds are shown to be useful and computationally simple to obtain. Based on the proposed technique, one can readily find practical suboptimum IS parameters. Numerical results indicate that these bounding techniques are useful for IS simulations of linear and nonlinear communication systems with intersymbol interference in which bit error rate and IS estimation variances cannot be obtained readily using prior techniques.
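
    For orientation, the snippet below shows a generic importance-sampling estimate of a small tail probability (a proxy for a bit-error rate) using mean translation of the noise, together with the usual sample-variance estimate that the bounds above are designed to bracket. It is not the authors' derivation; the threshold and biasing density are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(2)
threshold = 5.0            # error event: standard normal noise exceeds 5
n = 100_000

# Biasing density: translate the noise mean to the threshold.
shift = threshold
z = rng.normal(loc=shift, scale=1.0, size=n)
indicator = z > threshold

# Likelihood ratio of the nominal N(0,1) density to the biased N(shift,1) density.
log_w = -0.5 * z**2 + 0.5 * (z - shift) ** 2
w = np.exp(log_w)

samples = indicator * w
p_hat = samples.mean()
var_hat = samples.var(ddof=1) / n
print(f"IS estimate: {p_hat:.3e}  (theoretical Q(5) is about 2.87e-07)")
print(f"Estimated standard error: {np.sqrt(var_hat):.2e}")
```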

  16. Out-of-pocket costs, primary care frequent attendance and sample selection: Estimates from a longitudinal cohort design.

    Science.gov (United States)

    Pymont, Carly; McNamee, Paul; Butterworth, Peter

    2018-03-20

    This paper examines the effect of out-of-pocket costs on subsequent frequent attendance in primary care using data from the Personality and Total Health (PATH) Through Life Project, a representative community cohort study from Canberra, Australia. The analysis sample comprised 1197 respondents with two or more GP consultations, and uses survey data linked to administrative health service use (Medicare) data, which provide the number of consultations and out-of-pocket costs. Respondents identified in the highest decile of GP use in a year were defined as Frequent Attenders (FAs). Logistic regression models that did not account for potential selection effects showed that out-of-pocket costs incurred during respondents' prior two consultations were significantly associated with subsequent FA status. Respondents who incurred higher costs ($15-$35, or >$35) were less likely to become FAs than those who incurred no or low out-of-pocket costs. Copyright © 2018. Published by Elsevier B.V.

  17. Statistical Methods and Sampling Design for Estimating Step Trends in Surface-Water Quality

    Science.gov (United States)

    Hirsch, Robert M.

    1988-01-01

    This paper addresses two components of the problem of estimating the magnitude of step trends in surface water quality. The first is finding a robust estimator appropriate to the data characteristics expected in water-quality time series. The J. L. Hodges-E. L. Lehmann class of estimators is found to be robust in comparison to other nonparametric and moment-based estimators. A seasonal Hodges-Lehmann estimator is developed and shown to have desirable properties. Second, the effectiveness of various sampling strategies is examined using Monte Carlo simulation coupled with application of this estimator. The simulation is based on a large set of total phosphorus data from the Potomac River. To assure that the simulated records have realistic properties, the data are modeled in a multiplicative fashion incorporating flow, hysteresis, seasonal, and noise components. The results demonstrate the importance of balancing the length of the two sampling periods and balancing the number of data values between the two periods.
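
    A minimal sketch of one reading of the seasonal Hodges-Lehmann step-trend estimator follows: within each season, form all pairwise differences between "after" and "before" observations, pool the differences across seasons, and take their median. The exact definition and the surrounding Monte Carlo machinery follow the paper; the data below are synthetic.
```python
import numpy as np

rng = np.random.default_rng(3)
months = np.tile(np.arange(12), 6)                  # six years of monthly samples
before = rng.lognormal(mean=0.0, sigma=0.5, size=months.size)
after = rng.lognormal(mean=0.2, sigma=0.5, size=months.size)   # step change

diffs = []
for m in range(12):
    b = before[months == m]
    a = after[months == m]
    diffs.append((a[:, None] - b[None, :]).ravel())  # all within-season pairs

step_estimate = np.median(np.concatenate(diffs))
print(f"Estimated step change: {step_estimate:.3f}")
```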

  18. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    Science.gov (United States)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of the binary probit regression model are commonly estimated by the Maximum Likelihood Estimation (MLE) method. However, MLE has limitations when the binary data contain separation. Separation is the condition where one or several independent variables exactly group the categories of the binary response. It causes the MLE estimators to be non-convergent, so that they cannot be used in modeling. One way to resolve separation is to use Firth's approach instead. This research has two aims: first, to identify the chance of separation occurring in the binary probit regression model under the MLE method and Firth's approach; second, to compare the performance of the binary probit regression estimators obtained by the MLE method and Firth's approach using the RMSE criterion. Both questions are investigated by simulation under different sample sizes. The results showed that the chance of separation occurring with the MLE method for small sample sizes is higher than with Firth's approach. For larger sample sizes, the probability decreases and is similar between the MLE method and Firth's approach. Meanwhile, Firth's estimators have smaller RMSE than the MLE estimators, especially for smaller sample sizes; for larger sample sizes, the RMSEs are not much different. This means that Firth's estimators outperform the MLE estimators.
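
    The record concerns the probit link; the sketch below illustrates Firth's bias-reduction idea for the logistic link instead, where the penalized score has a simple closed form, so it should be read as an adaptation rather than the paper's method. Newton steps use the modified score U*(b) = X'(y - p + h(0.5 - p)), with h the leverages of the weighted hat matrix; under separation these estimates stay finite while plain MLE diverges.
```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)
        XtWX = X.T @ (W[:, None] * X)
        XtWX_inv = np.linalg.inv(XtWX)
        Xw = X * np.sqrt(W)[:, None]
        h = np.einsum("ij,jk,ik->i", Xw, XtWX_inv, Xw)   # hat-matrix leverages
        score = X.T @ (y - p + h * (0.5 - p))            # Firth-modified score
        step = XtWX_inv @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Separated toy data: x perfectly predicts y, so ordinary MLE does not converge.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])
print("Firth estimates (intercept, slope):", firth_logistic(X, y))
```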

  19. Assessing the joint effect of population stratification and sample selection in studies of gene-gene (environment) interactions

    Directory of Open Access Journals (Sweden)

    Cheng KF

    2012-01-01

    Background: It is well known that the presence of population stratification (PS) may cause the usual test in case-control studies to produce spurious gene-disease associations. However, the joint impact of PS and sample selection (SS) is less well known. In this paper, we provide a systematic study of the joint effect of PS and SS under a more general risk model containing genetic and environmental factors. We provide simulation results to show the magnitude of the bias and its impact on the type I error rate of the usual chi-square test under a wide range of PS levels and selection bias. Results: The biases in the estimation of main and interaction effects are quantified and their bounds derived. The estimated bounds can be used to compute conservative p-values for the association test. If the conservative p-value is smaller than the significance level, we can safely claim that the association test is significant regardless of whether PS or selection bias is present. We also identify conditions for null bias. The bias depends on the allele frequencies, exposure rates, gene-environment odds ratios and disease risks across subpopulations, and on the sampling of the cases and controls. Conclusion: Our results show that the bias cannot be ignored even when the case and control data were matched on ethnicity. A real example is given to illustrate application of the conservative p-value. These results are useful for genetic association studies of main and interaction effects.

  20. Reliability of different sampling densities for estimating and mapping lichen diversity in biomonitoring studies

    International Nuclear Information System (INIS)

    Ferretti, M.; Brambilla, E.; Brunialti, G.; Fornasier, F.; Mazzali, C.; Giordani, P.; Nimis, P.L.

    2004-01-01

    Sampling requirements related to lichen biomonitoring include optimal sampling density for obtaining precise and unbiased estimates of population parameters and maps of known reliability. Two available datasets on a sub-national scale in Italy were used to determine a cost-effective sampling density to be adopted in medium-to-large-scale biomonitoring studies. As expected, the relative error in the mean Lichen Biodiversity (Italian acronym: BL) values and the error associated with the interpolation of BL values for (unmeasured) grid cells increased as the sampling density decreased. However, the increase in size of the error was not linear and even a considerable reduction (up to 50%) in the original sampling effort led to a far smaller increase in errors in the mean estimates (<6%) and in mapping (<18%) as compared with the original sampling densities. A reduction in the sampling effort can result in considerable savings of resources, which can then be used for a more detailed investigation of potentially problematic areas. It is, however, necessary to decide the acceptable level of precision at the design stage of the investigation, so as to select the proper sampling density. - An acceptable level of precision must be decided before determining a sampling design

  1. Poisson sampling - The adjusted and unadjusted estimator revisited

    Science.gov (United States)

    Michael S. Williams; Hans T. Schreuder; Gerardo H. Terrazas

    1998-01-01

    The prevailing assumption that, for Poisson sampling, the adjusted estimator Ŷ_a is always substantially more efficient than the unadjusted estimator Ŷ_u is shown to be incorrect. Some well-known theoretical results are applicable, since Ŷ_a is a ratio-of-means estimator and Ŷ_u a simple unbiased estimator...

  2. Learning from Past Classification Errors: Exploring Methods for Improving the Performance of a Deep Learning-based Building Extraction Model through Quantitative Analysis of Commission Errors for Optimal Sample Selection

    Science.gov (United States)

    Swan, B.; Laverdiere, M.; Yang, L.

    2017-12-01

    In the past five years, deep Convolutional Neural Networks (CNN) have been increasingly favored for computer vision applications due to their high accuracy and ability to generalize well in very complex problems; however, details of how they function and in turn how they may be optimized are still imperfectly understood. In particular, their complex and highly nonlinear network architecture, including many hidden layers and self-learned parameters, as well as their mathematical implications, present open questions about how to effectively select training data. Without knowledge of the exact ways the model processes and transforms its inputs, intuition alone may fail as a guide to selecting highly relevant training samples. Working in the context of improving a CNN-based building extraction model used for the LandScan USA gridded population dataset, we have approached this problem by developing a semi-supervised, highly scalable approach to select training samples from a dataset of identified commission errors. Due to the large scope of this project, tens of thousands of potential samples could be derived from identified commission errors. To efficiently trim those samples down to a manageable and effective set for creating additional training samples, we statistically summarized the spectral characteristics of areas with commission errors at the image-tile level and grouped these tiles using affinity propagation, as sketched below. Highly representative members of each commission error cluster were then used to select sites for training sample creation. The model will be incrementally re-trained with the new training data to allow for an assessment of how the addition of different types of samples affects the model performance, such as precision and recall rates. By using quantitative analysis and data clustering techniques to select highly relevant training samples, we hope to improve model performance in a manner that is resource efficient, both in terms of training process
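
    The tile-grouping step described above can be sketched with scikit-learn's affinity propagation: summarize each tile containing commission errors by simple spectral statistics, cluster the summaries, and take each cluster's exemplar as a candidate site for new training samples. The per-tile statistics below are synthetic placeholders, not LandScan data.
```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(11)
# Per-tile spectral summaries of commission-error areas (e.g., band means/stds).
tile_stats = np.vstack([
    rng.normal(loc=m, scale=0.05, size=(40, 8))
    for m in (0.2, 0.5, 0.8)             # three latent spectral groups
])

ap = AffinityPropagation(random_state=0).fit(tile_stats)
print("Number of clusters:", len(ap.cluster_centers_indices_))
print("Exemplar tile indices:", ap.cluster_centers_indices_[:10])
```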

  3. Remaining lifetime modeling using State-of-Health estimation

    Science.gov (United States)

    Beganovic, Nejra; Söffker, Dirk

    2017-08-01

    Technical systems and their components undergo gradual degradation over time. Continuous degradation is reflected in decreased system reliability and unavoidably leads to system failure. Therefore, continuous evaluation of State-of-Health (SoH) is necessary to provide at least the predefined lifetime of the system specified by the manufacturer or, even better, to extend that lifetime. However, a precondition for lifetime extension is accurate estimation of SoH as well as estimation and prediction of the Remaining Useful Lifetime (RUL). For this purpose, lifetime models describing the relation between system/component degradation and consumed lifetime have to be established. In this contribution, the modeling and selection of suitable lifetime models from a database, based on current SoH conditions, are discussed. The main contribution of this paper is the development of new modeling strategies capable of describing complex relations between measurable system variables, related system degradation, and RUL. Two approaches, with accompanying advantages and disadvantages, are introduced and compared. Both approaches are capable of modeling stochastic aging processes of a system by simultaneous adaptation of RUL models to the current SoH. The first approach requires a priori knowledge about aging processes in the system and accurate estimation of SoH. The estimation of SoH here is conditioned by tracking the actual accumulated damage in the system, so that particular model parameters are defined according to a priori known assumptions about the system's aging. Prediction accuracy in this case is highly dependent on accurate estimation of SoH but involves a high number of degrees of freedom. The second approach does not require a priori knowledge about the system's aging, as particular model parameters are defined according to a multi-objective optimization procedure. The prediction accuracy of this model does not depend strongly on the estimated SoH. This model

  4. ITOUGH2 sample problems

    International Nuclear Information System (INIS)

    Finsterle, S.

    1997-11-01

    This report contains a collection of ITOUGH2 sample problems. It complements the ITOUGH2 User's Guide [Finsterle, 1997a] and the ITOUGH2 Command Reference [Finsterle, 1997b]. ITOUGH2 is a program for parameter estimation, sensitivity analysis, and uncertainty propagation analysis. It is based on the TOUGH2 simulator for non-isothermal multiphase flow in fractured and porous media [Pruess, 1987, 1991a]. The report ITOUGH2 User's Guide [Finsterle, 1997a] describes the inverse modeling framework and provides the theoretical background. The report ITOUGH2 Command Reference [Finsterle, 1997b] contains the syntax of all ITOUGH2 commands. This report describes a variety of sample problems solved by ITOUGH2. Table 1.1 contains a short description of the seven sample problems discussed in this report. The TOUGH2 equation-of-state (EOS) module that needs to be linked to ITOUGH2 is also indicated. Each sample problem focuses on a few selected issues shown in Table 1.2. ITOUGH2 input features and the usage of program options are described. Furthermore, interpretations of selected inverse modeling results are given. Problem 1 is a multipart tutorial describing basic ITOUGH2 input files for the main ITOUGH2 application modes; no interpretation of results is given. Problem 2 focuses on non-uniqueness, residual analysis, and correlation structure. Problem 3 illustrates a variety of parameter and observation types, and describes parameter selection strategies. Problem 4 compares the performance of minimization algorithms and discusses model identification. Problem 5 explains how to set up a combined inversion of steady-state and transient data. Problem 6 provides a detailed residual and error analysis. Finally, Problem 7 illustrates how the estimation of model-related parameters may help compensate for errors in that model.

  5. Integration of sampling based battery state of health estimation method in electric vehicles

    International Nuclear Information System (INIS)

    Ozkurt, Celil; Camci, Fatih; Atamuradov, Vepa; Odorry, Christopher

    2016-01-01

    Highlights: • Presentation of a prototype system with full charge-discharge cycling capability. • Presentation of SoH estimation results for systems degraded in the lab. • Discussion of integration alternatives of the presented method in EVs. • Simulation model based on presented SoH estimation for a real EV battery system. • Optimization of the number of battery cells to be selected for the SoH test. - Abstract: Battery cost is one of the crucial parameters negatively affecting widespread deployment of Electric Vehicles (EVs). Accurate State of Health (SoH) estimation plays an important role in reducing the total ownership cost and improving the availability and safety of the battery, avoiding early disposal of batteries and decreasing unexpected failures. A circuit design for SoH estimation in a battery system based on selected battery cells, and its integration into EVs, are presented in this paper. A prototype microcontroller has been developed and used for accelerated aging tests of a battery system. The data collected in the lab tests have been used to simulate a real EV battery system. Results of the accelerated aging tests and the simulation are presented in the paper. The paper also discusses identification of the best number of battery cells to be selected for the SoH estimation test. In addition, different application options of the presented approach for EV batteries are discussed.

  6. Estimation of Nonlinear Dynamic Panel Data Models with Individual Effects

    Directory of Open Access Journals (Sweden)

    Yi Hu

    2014-01-01

    This paper suggests a generalized method of moments (GMM) based estimation for dynamic panel data models with individual-specific fixed effects and threshold effects simultaneously. We extend Hansen's (1999) original setup to models including endogenous regressors, specifically lagged dependent variables. To address the problem of endogeneity in these nonlinear dynamic panel data models, we prove that the orthogonality conditions proposed by Arellano and Bond (1991) are valid. The threshold and slope parameters are estimated by GMM, and the asymptotic distribution of the slope parameters is derived. The finite-sample performance of the estimator is investigated through Monte Carlo simulations, which show that the threshold and slope parameters can be estimated accurately and that the finite-sample distribution of the slope parameters is well approximated by the asymptotic distribution.

  7. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    Science.gov (United States)

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected

  8. VSRR - Quarterly provisional estimates for selected birth indicators

    Data.gov (United States)

    U.S. Department of Health & Human Services — Provisional estimates of selected reproductive indicators. Estimates are presented for: general fertility rates, age-specific birth rates, total and low risk...

  9. Diversified models for portfolio selection based on uncertain semivariance

    Science.gov (United States)

    Chen, Lin; Peng, Jin; Zhang, Bo; Rosyida, Isnaini

    2017-02-01

    Since financial markets are complex, future security returns are sometimes represented mainly by experts' estimations due to a lack of historical data. This paper proposes a semivariance method for diversified portfolio selection, in which the security returns are given by experts' estimations and depicted as uncertain variables. In the paper, three properties of the semivariance of uncertain variables are verified. Based on the concept of semivariance of uncertain variables, two types of mean-semivariance diversified models for uncertain portfolio selection are proposed. Since the models are complex, a hybrid intelligent algorithm based on the 99-method and a genetic algorithm is designed to solve them. In this hybrid intelligent algorithm, the 99-method is applied to compute the expected value and semivariance of uncertain variables, and the genetic algorithm is employed to seek the best allocation plan for portfolio selection. Finally, several numerical examples are presented to illustrate the modelling idea and the effectiveness of the algorithm.

  10. An Improved Estimation of Regional Fractional Woody/Herbaceous Cover Using Combined Satellite Data and High-Quality Training Samples

    Directory of Open Access Journals (Sweden)

    Xu Liu

    2017-01-01

    Mapping vegetation cover is critical for understanding and monitoring ecosystem functions in semi-arid biomes. As existing estimates tend to underestimate the woody cover in areas with dry deciduous shrubland and woodland, we present an approach to improve the regional estimation of woody and herbaceous fractional cover in the East Asia steppe. The approach uses Random Forest models that combine multiple remote sensing data: training samples derived from high-resolution imagery through a tailored spatial sampling scheme, and model inputs composed of specific metrics from the MODIS sensor together with ancillary variables including topographic, bioclimatic, and land surface information. We emphasize that effective spatial sampling, high-quality classification, and adequate geospatial information are important prerequisites for establishing appropriate model inputs and achieving high-quality training samples. This study suggests that the optimal models improve estimation accuracy (NMSE 0.47 for woody and 0.64 for herbaceous plants) and show consistent agreement with field observations. Compared with an existing woody cover product, the proposed woody cover estimation can delineate regions with subshrubs and shrubs, showing an improved capability to capture spatial detail in vegetation signals. The approach is applicable over sizable semi-arid areas such as temperate steppes, savannas, and prairies.
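
    Schematically, the modelling step amounts to a Random Forest regression of fractional cover on MODIS-derived metrics plus ancillary covariates, trained against cover fractions sampled from high-resolution imagery. The feature names and data below are synthetic placeholders rather than the study's actual inputs.
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
n = 2000
features = {
    "ndvi_dry_season": rng.uniform(0.1, 0.6, n),
    "ndvi_amplitude": rng.uniform(0.0, 0.5, n),
    "elevation_m": rng.uniform(200, 1800, n),
    "annual_precip_mm": rng.uniform(150, 450, n),
    "land_surface_temp_k": rng.uniform(275, 310, n),
}
X = np.column_stack(list(features.values()))
# Synthetic "true" woody fraction loosely tied to the covariates.
woody_frac = np.clip(
    0.8 * features["ndvi_dry_season"] + 0.0005 * features["annual_precip_mm"]
    + rng.normal(0, 0.05, n), 0, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, woody_frac, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X_tr, y_tr)
pred = rf.predict(X_te)
nmse = mean_squared_error(y_te, pred) / np.var(y_te)
print(f"Normalized MSE on held-out samples: {nmse:.2f}")
```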

  11. Detecting spatial structures in throughfall data: The effect of extent, sample size, sampling design, and variogram estimation method

    Science.gov (United States)

    Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander

    2016-09-01

    In the last decades, an increasing number of studies analyzed spatial patterns in throughfall by means of variograms. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and a layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation method on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with large outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling) and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments (non-robust and robust estimators) and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the number recommended by studies dealing with Gaussian data by up to 100 %. Given that most previous
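
    The quantity whose sampling requirements are examined above is the empirical variogram; a plain method-of-moments (Matheron) estimator for an irregular sample is sketched below. Robust and residual-maximum-likelihood variants are not shown, and the field is synthetic.
```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Return lag-bin centers and method-of-moments semivariance estimates."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)          # each pair counted once
    d, sq = d[iu], sq[iu]
    centers, gamma = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (d >= lo) & (d < hi)
        if mask.any():
            centers.append(0.5 * (lo + hi))
            gamma.append(sq[mask].mean())
    return np.array(centers), np.array(gamma)

rng = np.random.default_rng(5)
coords = rng.uniform(0, 50, size=(200, 2))          # 200 throughfall collectors
values = np.sin(coords[:, 0] / 10) + rng.normal(0, 0.3, 200)
for h, g in zip(*empirical_variogram(coords, values, np.linspace(0, 25, 11))):
    print(f"lag {h:5.1f} m   semivariance {g:.3f}")
```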

  12. Procedure manual for the estimation of average indoor radon-daughter concentrations using the radon grab-sampling method

    International Nuclear Information System (INIS)

    George, J.L.

    1986-04-01

    The US Department of Energy (DOE) Office of Remedial Action and Waste Technology established the Technical Measurements Center to provide standardization, calibration, comparability, verification of data, quality assurance, and cost-effectiveness for the measurement requirements of DOE remedial action programs. One of the remedial-action measurement needs is the estimation of average indoor radon-daughter concentration. One method for accomplishing such estimations in support of DOE remedial action programs is the radon grab-sampling method. This manual describes procedures for radon grab sampling, with the application specifically directed to the estimation of average indoor radon-daughter concentration (RDC) in highly ventilated structures. This particular application of the measurement method is for cases where RDC estimates derived from long-term integrated measurements under occupied conditions are below the standard and where the structure being evaluated is considered to be highly ventilated. The radon grab-sampling method requires that sampling be conducted under standard maximized conditions. Briefly, the procedure for radon grab sampling involves the following steps: selection of sampling and counting equipment; sample acquisition and processing, including data reduction; calibration of equipment, including provisions to correct for pressure effects when sampling at various elevations; and incorporation of quality-control and assurance measures. This manual describes each of the above steps in detail and presents an example of a step-by-step radon grab-sampling procedure using a scintillation cell

  13. Lifetime ultraviolet exposure estimates for selected population groups in south-east Queensland

    International Nuclear Information System (INIS)

    Parisi, A.V.; Meldrum, L.R.; Wong, J.C.F.; Fleming, R.A.; Aitken, J.

    1999-01-01

    The lifetime erythemal UV exposures received by selected population groups in south-east Queensland from birth up to an age of 55 years have been quantitatively estimated. A representative sample of teachers and other school workers received (64±22)×10⁵ J m⁻² to the neck compared with (4.1±1.4)×10⁵ J m⁻² to the upper leg. A sample of indoor workers (bank officers, solicitors and psychologists) received approximately 2% less and a sample of outdoor workers (carpenters, tilers, electricians and labourers) received approximately 10% more to the neck than the school workers. These differences in erythemal UV exposures may influence the risk of non-melanoma skin cancer. (author)

  14. Power Spectrum Estimation of Randomly Sampled Signals

    DEFF Research Database (Denmark)

    Velte, C. M.; Buchhave, P.; K. George, W.

    algorithms; sample-and-hold and the direct spectral estimator without residence time weighting. The computer-generated signal is a Poisson process with a sample rate proportional to velocity magnitude and consists of well-defined frequency content, which makes bias easy to spot. The idea

  15. Transfer function design based on user selected samples for intuitive multivariate volume exploration

    KAUST Repository

    Zhou, Liang

    2013-02-01

    Multivariate volumetric datasets are important to both science and medicine. We propose a transfer function (TF) design approach based on user selected samples in the spatial domain to make multivariate volumetric data visualization more accessible for domain users. Specifically, the user starts the visualization by probing features of interest on slices and the data values are instantly queried by user selection. The queried sample values are then used to automatically and robustly generate high dimensional transfer functions (HDTFs) via kernel density estimation (KDE). Alternatively, 2D Gaussian TFs can be automatically generated in the dimensionality reduced space using these samples. With the extracted features rendered in the volume rendering view, the user can further refine these features using segmentation brushes. Interactivity is achieved in our system and different views are tightly linked. Use cases show that our system has been successfully applied for simulation and complicated seismic data sets. © 2013 IEEE.
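
    The kernel-density step can be pictured as follows: fit a KDE to the multivariate values of the user-picked samples and use the normalized density as a transfer-function weight (for example, opacity) for every voxel. This is a simplification for illustration, not the published system, and the volume values are random.
```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(6)

# Pretend each voxel carries three variables (e.g., temperature, pressure, vorticity).
volume_values = rng.normal(size=(100_000, 3))

# Values queried at the user's probed locations (here, a tight cluster).
picked_samples = rng.normal(loc=[1.5, -0.5, 0.8], scale=0.15, size=(40, 3))

kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(picked_samples)
log_density = kde.score_samples(volume_values)
weights = np.exp(log_density - log_density.max())   # normalize to [0, 1]

print("Fraction of voxels with opacity weight > 0.5:", (weights > 0.5).mean())
```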

  16. Transfer function design based on user selected samples for intuitive multivariate volume exploration

    KAUST Repository

    Zhou, Liang; Hansen, Charles

    2013-01-01

    Multivariate volumetric datasets are important to both science and medicine. We propose a transfer function (TF) design approach based on user selected samples in the spatial domain to make multivariate volumetric data visualization more accessible for domain users. Specifically, the user starts the visualization by probing features of interest on slices and the data values are instantly queried by user selection. The queried sample values are then used to automatically and robustly generate high dimensional transfer functions (HDTFs) via kernel density estimation (KDE). Alternatively, 2D Gaussian TFs can be automatically generated in the dimensionality reduced space using these samples. With the extracted features rendered in the volume rendering view, the user can further refine these features using segmentation brushes. Interactivity is achieved in our system and different views are tightly linked. Use cases show that our system has been successfully applied for simulation and complicated seismic data sets. © 2013 IEEE.

  17. Efficient estimation for ergodic diffusions sampled at high frequency

    DEFF Research Database (Denmark)

    Sørensen, Michael

    A general theory of efficient estimation for ergodic diffusions sampled at high frequency is presented. High frequency sampling is now possible in many applications, in particular in finance. The theory is formulated in terms of approximate martingale estimating functions and covers a large class...

  18. The Impact of Selection, Gene Conversion, and Biased Sampling on the Assessment of Microbial Demography.

    Science.gov (United States)

    Lapierre, Marguerite; Blin, Camille; Lambert, Amaury; Achaz, Guillaume; Rocha, Eduardo P C

    2016-07-01

    Recent studies have linked demographic changes and epidemiological patterns in bacterial populations using coalescent-based approaches. We identified 26 studies using skyline plots and found that 21 inferred overall population expansion. This surprising result led us to analyze the impact of natural selection, recombination (gene conversion), and sampling biases on demographic inference using skyline plots and site frequency spectra (SFS). Forward simulations based on biologically relevant parameters from Escherichia coli populations showed that theoretical arguments on the detrimental impact of recombination and especially natural selection on the reconstructed genealogies cannot be ignored in practice. In fact, both processes systematically lead to spurious interpretations of population expansion in skyline plots (and in SFS for selection). Weak purifying selection, and especially positive selection, had important effects on skyline plots, showing patterns akin to those of population expansions. State-of-the-art techniques to remove recombination further amplified these biases. We simulated three common sampling biases in microbiological research: uniform, clustered, and mixed sampling. Alone, or together with recombination and selection, they further mislead demographic inferences, producing almost any possible skyline shape or SFS. Interestingly, sampling sub-populations also affected skyline plots and SFS, because the coalescent rates of populations and their sub-populations had different distributions. This study suggests that extreme caution is needed to infer demographic changes solely based on reconstructed genealogies. We suggest that the development of novel sampling strategies and the joint analysis of diverse population genetic methods are strictly necessary to estimate demographic changes in populations where selection, recombination, and biased sampling are present.

  19. Estimation of sample size and testing power (part 5).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-02-01

    Estimation of sample size and testing power is an important component of research design. This article introduces methods for estimating sample size and testing power for difference tests on quantitative and qualitative data under the single-group, paired, and crossover designs. Specifically, it presents the corresponding formulas, their realization via these formulas and the POWER procedure of SAS software, and elaborates on them with examples, which will benefit researchers in implementing the repetition principle.
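
    The article works through formulas and SAS PROC POWER; an analogous calculation in Python is sketched below for a paired or one-sample t-test using statsmodels. The effect size, alpha, and power targets are illustrative.
```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()

# Sample size needed to detect a standardized paired difference of 0.5
# with 80% power at alpha = 0.05 (two-sided test).
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                alternative="two-sided")
print(f"Required number of pairs: {n_needed:.1f}")

# Power achieved with 30 pairs for the same effect size.
power = analysis.solve_power(effect_size=0.5, nobs=30, alpha=0.05,
                             alternative="two-sided")
print(f"Power with 30 pairs: {power:.2f}")
```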

  20. Sensitivity of landscape resistance estimates based on point selection functions to scale and behavioral state: Pumas as a case study

    Science.gov (United States)

    Katherine A. Zeller; Kevin McGarigal; Paul Beier; Samuel A. Cushman; T. Winston Vickers; Walter M. Boyce

    2014-01-01

    Estimating landscape resistance to animal movement is the foundation for connectivity modeling, and resource selection functions based on point data are commonly used to empirically estimate resistance. In this study, we used GPS data points acquired at 5-min intervals from radiocollared pumas in southern California to model context-dependent point selection...

  1. The effects of dominance, regular inbreeding and sampling design on Q(ST), an estimator of population differentiation for quantitative traits.

    Science.gov (United States)

    Goudet, Jérôme; Büchi, Lucie

    2006-02-01

    To test whether quantitative traits are under directional or homogenizing selection, it is common practice to compare population differentiation estimates at molecular markers (F_ST) and quantitative traits (Q_ST). If the trait is neutral and its determinism is additive, then theory predicts that Q_ST = F_ST, while Q_ST > F_ST is predicted under directional selection for different local optima, and Q_ST < F_ST under homogenizing selection. We compare different sampling designs and find that it is always best to sample many populations (>20) with few families (five) rather than few populations with many families. Provided that estimates of Q_ST are derived from individuals originating from many populations, we conclude that the pattern Q_ST > F_ST, and hence the inference of directional selection for different local optima, is robust to the effect of nonadditive gene actions.
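
    The comparison above rests on the usual definition Q_ST = V_between / (V_between + 2 V_within,additive). A sketch of that calculation from a balanced common-garden design, using one-way ANOVA variance components, is given below; for simplicity the within-population phenotypic variance stands in for the additive variance, which a real analysis would estimate from the family structure.
```python
import numpy as np

rng = np.random.default_rng(7)
n_pops, n_per_pop = 25, 20
pop_effects = rng.normal(0, 1.0, n_pops)                    # between-population
data = pop_effects[:, None] + rng.normal(0, 2.0, (n_pops, n_per_pop))

grand_mean = data.mean()
pop_means = data.mean(axis=1)
ms_between = n_per_pop * ((pop_means - grand_mean) ** 2).sum() / (n_pops - 1)
ms_within = ((data - pop_means[:, None]) ** 2).sum() / (n_pops * (n_per_pop - 1))

v_between = max((ms_between - ms_within) / n_per_pop, 0.0)  # variance component
v_within = ms_within                                        # stand-in for V_A
q_st = v_between / (v_between + 2.0 * v_within)
print(f"Estimated Q_ST: {q_st:.3f}")
```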

  2. Using local multiplicity to improve effect estimation from a hypothesis-generating pharmacogenetics study.

    Science.gov (United States)

    Zou, W; Ouyang, H

    2016-02-01

    We propose a multiple estimation adjustment (MEA) method to correct effect overestimation due to selection bias from a hypothesis-generating study (HGS) in pharmacogenetics. MEA uses a hierarchical Bayesian approach to model individual effect estimates from maximal likelihood estimation (MLE) in a region jointly and shrinks them toward the regional effect. Unlike many methods that model a fixed selection scheme, MEA capitalizes on local multiplicity independent of selection. We compared mean square errors (MSEs) in simulated HGSs from naive MLE, MEA and a conditional likelihood adjustment (CLA) method that model threshold selection bias. We observed that MEA effectively reduced MSE from MLE on null effects with or without selection, and had a clear advantage over CLA on extreme MLE estimates from null effects under lenient threshold selection in small samples, which are common among 'top' associations from a pharmacogenetics HGS.
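
    The shrinkage idea behind MEA can be illustrated with a simple empirical-Bayes step that pulls noisy per-variant effect estimates in a region toward the regional mean, with the shrinkage factor set by the estimated between-variant variance. This is an illustration of the principle only, not the authors' hierarchical model.
```python
import numpy as np

rng = np.random.default_rng(8)
true_effects = np.zeros(30)             # a region of null effects
se = 0.25                               # common standard error of each MLE
mle = true_effects + rng.normal(0, se, true_effects.size)

regional_mean = mle.mean()
# Method-of-moments estimate of the between-variant variance tau^2.
tau2 = max(mle.var(ddof=1) - se**2, 0.0)
shrink = tau2 / (tau2 + se**2)          # 0 = full shrinkage, 1 = none
adjusted = regional_mean + shrink * (mle - regional_mean)

print("Largest |MLE|      :", round(np.abs(mle).max(), 3))
print("Largest |adjusted| :", round(np.abs(adjusted).max(), 3))
```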

  3. Linear models for airborne-laser-scanning-based operational forest inventory with small field sample size and highly correlated LiDAR data

    Science.gov (United States)

    Junttila, Virpi; Kauranne, Tuomo; Finley, Andrew O.; Bradford, John B.

    2015-01-01

    Modern operational forest inventory often uses remotely sensed data that cover the whole inventory area to produce spatially explicit estimates of forest properties through statistical models. The data obtained by airborne light detection and ranging (LiDAR) correlate well with many forest inventory variables, such as the tree height, the timber volume, and the biomass. To construct an accurate model over thousands of hectares, LiDAR data must be supplemented with several hundred field sample measurements of forest inventory variables. This can be costly and time consuming. Different LiDAR-data-based and spatial-data-based sampling designs can reduce the number of field sample plots needed. However, problems arising from the features of the LiDAR data, such as a large number of predictors compared with the sample size (overfitting) or a strong correlation among predictors (multicollinearity), may decrease the accuracy and precision of the estimates and predictions. To overcome these problems, a Bayesian linear model with the singular value decomposition of predictors, combined with regularization, is proposed. The model performance in predicting different forest inventory variables is verified in ten inventory areas from two continents, where the number of field sample plots is reduced using different sampling designs. The results show that, with an appropriate field plot selection strategy and the proposed linear model, the total relative error of the predicted forest inventory variables is only 5%–15% larger using 50 field sample plots than the error of a linear model estimated with several hundred field sample plots when we sum up the error due to both the model noise variance and the model’s lack of fit.

  4. Improving the Network Scale-Up Estimator: Incorporating Means of Sums, Recursive Back Estimation, and Sampling Weights.

    Directory of Open Access Journals (Sweden)

    Patrick Habecker

    Researchers interested in studying populations that are difficult to reach through traditional survey methods can now draw on a range of methods to access these populations. Yet many of these methods are more expensive and difficult to implement than studies using conventional sampling frames and trusted sampling methods. The network scale-up method (NSUM) provides a middle ground for researchers who wish to estimate the size of a hidden population but lack the resources to conduct a more specialized hidden population study. Through this method it is possible to generate population estimates for a wide variety of groups that are perhaps unwilling to self-identify as such (for example, users of illegal drugs or other stigmatized populations) via traditional survey tools such as telephone or mail surveys, by asking a representative sample to estimate the number of people they know who are members of such a "hidden" subpopulation. The original estimator is formulated to minimize the weight a single scaling variable can exert upon the estimates. We argue that this introduces hidden and difficult-to-predict biases, and instead propose a series of methodological advances on the traditional scale-up estimation procedure, including a new estimator. Additionally, we formalize the incorporation of sample weights into the network scale-up estimation process, and propose a recursive process of back estimation "trimming" to identify and remove poorly performing predictors from the estimation process. To demonstrate these suggestions we use data from a network scale-up mail survey conducted in Nebraska during 2014. We find that using the new estimator and recursive trimming process provides more accurate estimates, especially when used in conjunction with sampling weights.
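
    For orientation, the classical scale-up estimator with survey weights is sketched below; the improved estimator and the recursive trimming proposed in the record are not reproduced. All population sizes, reports, and weights are synthetic.
```python
import numpy as np

rng = np.random.default_rng(9)
N_total = 1_900_000                       # general population size (assumed)
known_sizes = np.array([12_000, 45_000, 80_000])    # known subpopulations

n_resp = 500
degrees = rng.gamma(shape=2.0, scale=150.0, size=n_resp)   # true network sizes
# Reported acquaintances in each known group and in the hidden group.
known_reports = rng.poisson(degrees[:, None] * (known_sizes / N_total))
hidden_prevalence = 0.004
hidden_reports = rng.poisson(degrees * hidden_prevalence)
weights = rng.uniform(0.5, 2.0, n_resp)   # survey weights

# Step 1: estimate each respondent's personal network size.
d_hat = N_total * known_reports.sum(axis=1) / known_sizes.sum()

# Step 2: weighted ratio estimator of the hidden population size.
hidden_size = N_total * np.sum(weights * hidden_reports) / np.sum(weights * d_hat)
print(f"Estimated hidden population: {hidden_size:,.0f} "
      f"(true value about {hidden_prevalence * N_total:,.0f})")
```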

  5. Inventory implications of using sampling variances in estimation of growth model coefficients

    Science.gov (United States)

    Albert R. Stage; William R. Wykoff

    2000-01-01

    Variables based on stand densities or stocking have sampling errors that depend on the relation of tree size to plot size and on the spatial structure of the population. Ignoring the sampling errors of such variables, which include most measures of competition used in both distance-dependent and distance-independent growth models, can bias the predictions obtained from...

  6. Sample selection based on kernel-subclustering for the signal reconstruction of multifunctional sensors

    International Nuclear Information System (INIS)

    Wang, Xin; Wei, Guo; Sun, Jinwei

    2013-01-01

    Signal reconstruction methods based on inverse modeling for multifunctional sensors have been widely studied in recent years. To improve accuracy, the reconstruction methods have become more and more complicated because of the increase in the number of model parameters and sample points. However, another factor that affects the reconstruction accuracy, the position of the sample points, has not been studied. A reasonable selection of the sample points could improve the signal reconstruction quality in at least two ways: improved accuracy with the same number of sample points, or the same accuracy obtained with a smaller number of sample points. Both ways are valuable for improving accuracy and decreasing the workload, especially for large batches of multifunctional sensors. In this paper, we propose a sample selection method based on kernel-subclustering to distill groupings of the sample data and produce a representation of the data set for inverse modeling. The method calculates the distance between two data points based on the kernel-induced distance instead of the conventional distance. The kernel function generalizes the distance metric by mapping data that are non-separable in the original space into homogeneous groups in a high-dimensional space. In the simulation, the method obtained the best results compared with three other methods. (paper)
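
    The kernel-induced distance referred to above is, for a kernel K, d(x, y) = sqrt(K(x,x) - 2K(x,y) + K(y,y)), i.e. the Euclidean distance in the feature space induced by K. A short sketch with an RBF kernel follows; the subclustering step itself is omitted.
```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_distance(x, y, kernel=rbf_kernel):
    return np.sqrt(max(kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y), 0.0))

a = np.array([1.0, 2.0])
b = np.array([1.5, 1.0])
print("Euclidean distance      :", round(float(np.linalg.norm(a - b)), 3))
print("Kernel-induced distance :", round(kernel_distance(a, b), 3))
```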

  7. Wind scatterometry with improved ambiguity selection and rain modeling

    Science.gov (United States)

    Draper, David Willis

    Although generally accurate, the quality of SeaWinds on QuikSCAT scatterometer ocean vector winds is compromised by certain natural phenomena and retrieval algorithm limitations. This dissertation addresses three main contributors to scatterometer estimate error: poor ambiguity selection, estimate uncertainty at low wind speeds, and rain corruption. A quality assurance (QA) analysis performed on SeaWinds data suggests that about 5% of SeaWinds data contain ambiguity selection errors and that scatterometer estimation error is correlated with low wind speeds and rain events. Ambiguity selection errors are partly due to the "nudging" step (initialization from outside data). A sophisticated new non-nudging ambiguity selection approach produces generally more consistent wind than the nudging method in moderate wind conditions. The non-nudging method selects 93% of the same ambiguities as the nudged data, validating both techniques, and indicating that ambiguity selection can be accomplished without nudging. Variability at low wind speeds is analyzed using tower-mounted scatterometer data. According to theory, below a threshold wind speed, the wind fails to generate the surface roughness necessary for wind measurement. A simple analysis suggests the existence of the threshold in much of the tower-mounted scatterometer data. However, the backscatter does not "go to zero" beneath the threshold in an uncontrolled environment as theory suggests, but rather has a mean drop and higher variability below the threshold. Rain is the largest weather-related contributor to scatterometer error, affecting approximately 4% to 10% of SeaWinds data. A simple model formed via comparison of co-located TRMM PR and SeaWinds measurements characterizes the average effect of rain on SeaWinds backscatter. The model is generally accurate to within 3 dB over the tropics. The rain/wind backscatter model is used to simultaneously retrieve wind and rain from SeaWinds measurements. The simultaneous

  8. Estimation of creatinine in Urine sample by Jaffe's method

    International Nuclear Information System (INIS)

    Wankhede, Sonal; Arunkumar, Suja; Sawant, Pramilla D.; Rao, B.B.

    2012-01-01

    In-vitro bioassay monitoring is based on the determination of activity concentrations in biological samples excreted from the body and is most suitable for alpha and beta emitters. A truly representative bioassay sample is one having all the voids collected during a 24-h period; however, as this is technically difficult, overnight urine samples collected by the workers are analyzed. These overnight urine samples are collected over 10-16 h; however, in the absence of any specific information, a 12-h duration is assumed and the observed results are corrected accordingly to obtain the daily excretion rate. To reduce the uncertainty due to the unknown duration of sample collection, the IAEA has recommended two methods, viz., measurement of specific gravity and of the creatinine excretion rate in the urine sample. Creatinine is the final metabolic product of creatine phosphate in the body and is excreted at a steady rate by people with normally functioning kidneys. It is, therefore, often used as a normalization factor for estimating the duration of sample collection. The present study reports the chemical procedure standardized and its application to the estimation of creatinine in urine samples collected from occupational workers. The chemical procedure for estimation of creatinine in bioassay samples was standardized and applied successfully to bioassay samples collected from the workers. The creatinine excretion rate observed for these workers is lower than reported in the literature. Further work is in progress to generate a databank of creatinine excretion rates for most of the workers and to study the variability in the creatinine coefficient for the same individual based on the analysis of samples collected over different durations.
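
    The normalization arithmetic implied above scales the activity measured in a partial (overnight) sample to a 24-hour equivalent using the ratio of an assumed daily creatinine excretion to the creatinine measured in the sample. The reference value below is a typical adult figure used for illustration, not a value from the study.
```python
measured_activity_mBq = 12.0        # activity in the overnight urine sample
creatinine_in_sample_g = 0.65       # creatinine measured in the same sample
reference_daily_creatinine_g = 1.7  # assumed 24-h creatinine excretion

daily_excretion_mBq = measured_activity_mBq * (
    reference_daily_creatinine_g / creatinine_in_sample_g)
print(f"Estimated 24-h excretion: {daily_excretion_mBq:.1f} mBq/day")
```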

  9. Estimation of functional failure probability of passive systems based on adaptive importance sampling method

    International Nuclear Information System (INIS)

    Wang Baosheng; Wang Dongqing; Zhang Jianmin; Jiang Jing

    2012-01-01

    In order to estimate the functional failure probability of passive systems, an innovative adaptive importance sampling methodology is presented. In the proposed methodology, information about the variables is extracted by pre-sampling points in the failure region. An importance sampling density is then constructed from the sample distribution in the failure region. Taking the AP1000 passive residual heat removal system as an example, the uncertainties related to the model of a passive system and the numerical values of its input parameters are considered in this paper. The probability of functional failure is then estimated with a combination of the response surface method and the adaptive importance sampling method. The numerical results demonstrate the high computational efficiency and excellent accuracy of the methodology compared with traditional probability analysis methods. (authors)
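
    A generic two-stage version of the idea can be sketched as follows: pre-sample to locate failure points, fit a Gaussian importance density to them, and estimate the failure probability with weighted indicators. The passive-system model and response surface are replaced here by a toy limit-state function, so this is only a schematic of the approach.
```python
import numpy as np
from scipy import stats

def failed(x):
    """Toy limit state: failure when the sum of the two inputs exceeds 5."""
    return x.sum(axis=1) > 5.0

nominal = stats.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))

# Stage 1: crude pre-sampling to find points in the failure region.
pre = nominal.rvs(size=200_000, random_state=1)
fail_pts = pre[failed(pre)]

# Stage 2: importance density fitted to the pre-sampled failure points.
is_dist = stats.multivariate_normal(mean=fail_pts.mean(axis=0),
                                    cov=np.cov(fail_pts.T) + 0.1 * np.eye(2))
x = is_dist.rvs(size=20_000, random_state=2)
w = np.exp(nominal.logpdf(x) - is_dist.logpdf(x))
p_fail = np.mean(failed(x) * w)
print(f"Estimated failure probability: {p_fail:.2e}")
print(f"Exact reference value: {stats.norm.sf(5.0 / np.sqrt(2)):.2e}")
```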

  10. An unbiased estimator of the variance of simple random sampling using mixed random-systematic sampling

    OpenAIRE

    Padilla, Alberto

    2009-01-01

    Systematic sampling is a commonly used technique due to its simplicity and ease of implementation. The drawback of this simplicity is that it is not possible to estimate the design variance without bias. There are several ways to circumvent this problem. One method is to suppose that the variable of interest has a random order in the population, so the sample variance of simple random sampling without replacement is used. By means of a mixed random-systematic sample, an unbiased estimator o...

  11. CONSISTENCY UNDER SAMPLING OF EXPONENTIAL RANDOM GRAPH MODELS.

    Science.gov (United States)

    Shalizi, Cosma Rohilla; Rinaldo, Alessandro

    2013-04-01

    The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

  12. Comparison of Four Estimators under sampling without Replacement

    African Journals Online (AJOL)

    The results were obtained using a program written in Microsoft Visual C++ programming language. It was observed that the two-stage sampling under unequal probabilities without replacement is always better than the other three estimators considered. Keywords: Unequal probability sampling, two-stage sampling, ...

  13. Maximum profile likelihood estimation of differential equation parameters through model based smoothing state estimates.

    Science.gov (United States)

    Campbell, D A; Chkrebtii, O

    2013-12-01

    Statistical inference for biochemical models often faces a variety of characteristic challenges. In this paper we examine state and parameter estimation for the JAK-STAT intracellular signalling mechanism, which exemplifies the implementation intricacies common in many biochemical inference problems. We introduce an extension to the Generalized Smoothing approach for estimating delay differential equation models, addressing selection of complexity parameters, choice of the basis system, and appropriate optimization strategies. Motivated by the JAK-STAT system, we further extend the generalized smoothing approach to consider a nonlinear observation process with additional unknown parameters, and highlight how the approach handles unobserved states and unevenly spaced observations. The methodology developed is generally applicable to problems of estimation for differential equation models with delays, unobserved states, nonlinear observation processes, and partially observed histories. Crown Copyright © 2013. Published by Elsevier Inc. All rights reserved.

  14. 40 CFR 205.171-3 - Test motorcycle sample selection.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 24 2010-07-01 2010-07-01 false Test motorcycle sample selection. 205... ABATEMENT PROGRAMS TRANSPORTATION EQUIPMENT NOISE EMISSION CONTROLS Motorcycle Exhaust Systems § 205.171-3 Test motorcycle sample selection. A test motorcycle to be used for selective enforcement audit testing...

  15. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range.

    Science.gov (United States)

    Wan, Xiang; Wang, Wenqian; Liu, Jiming; Tong, Tiejun

    2014-12-19

    In systematic reviews and meta-analysis, researchers often pool the results of the sample mean and standard deviation from a set of similar clinical trials. A number of the trials, however, reported the study using the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al.'s method (BMC Med Res Methodol 5:13, 2005) has some serious limitations and is always less satisfactory in practice. Inspired by this, we propose a new estimation method by incorporating the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under several other interesting settings where the interquartile range is also available for the trials. We demonstrate the performance of the proposed methods through simulation studies for the three frequently encountered scenarios, respectively. For the first two scenarios, our method greatly improves existing methods and provides a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data. For the third scenario, our method still performs very well for both normal data and skewed data. Furthermore, we compare the estimators of the sample mean and standard deviation under all three scenarios and present some suggestions on which scenario is preferred in real-world applications. In this paper, we discuss different approximation methods in the estimation of the sample mean and standard deviation and propose some new estimation methods to improve the existing literature. We conclude our work with a summary table (an Excel spread sheet including all formulas) that serves as a comprehensive guidance for performing meta-analysis in different
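    For reference, the two settings most often encountered in practice (minimum/median/maximum with sample size, and quartiles with median and sample size) can be written down compactly. The sketch below follows the sample-size-dependent formulas commonly attributed to this line of work; treat the exact constants as assumptions to verify against the paper before use.

```python
import numpy as np
from scipy.stats import norm

def mean_sd_from_min_med_max(a, m, b, n):
    """Approximate sample mean and SD from minimum a, median m, maximum b and size n."""
    mean = (a + 2.0 * m + b) / 4.0
    # The expected range of n standard normal observations enters the denominator.
    sd = (b - a) / (2.0 * norm.ppf((n - 0.375) / (n + 0.25)))
    return mean, sd

def mean_sd_from_quartiles(q1, m, q3, n):
    """Approximate sample mean and SD from first quartile, median, third quartile and size n."""
    mean = (q1 + m + q3) / 3.0
    sd = (q3 - q1) / (2.0 * norm.ppf((0.75 * n - 0.125) / (n + 0.25)))
    return mean, sd

# Quick check against simulated normal data.
rng = np.random.default_rng(1)
x = rng.normal(loc=50.0, scale=10.0, size=201)
est = mean_sd_from_min_med_max(x.min(), np.median(x), x.max(), x.size)
print("true mean/SD:", round(x.mean(), 2), round(x.std(ddof=1), 2))
print("estimated   :", np.round(est, 2))
```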

  16. The use of importance sampling in a trial assessment to obtain converged estimates of radiological risk

    International Nuclear Information System (INIS)

    Johnson, K.; Lucas, R.

    1986-12-01

    In developing a methodology for assessing potential sites for the disposal of radioactive wastes, the Department of the Environment has conducted a series of trial assessment exercises. In order to produce converged estimates of radiological risk using the SYVAC A/C simulation system an efficient sampling procedure is required. Previous work has demonstrated that importance sampling can substantially increase sampling efficiency. This study used importance sampling to produce converged estimates of risk for the first DoE trial assessment. Four major nuclide chains were analysed. In each case importance sampling produced converged risk estimates with between 10 and 170 times fewer runs of the SYVAC A/C model. This increase in sampling efficiency can reduce the total elapsed time required to obtain a converged estimate of risk from one nuclide chain by a factor of 20. The results of this study suggests that the use of importance sampling could reduce the elapsed time required to perform a risk assessment of a potential site by a factor of ten. (author)

  17. Selection bias in population-based cancer case-control studies due to incomplete sampling frame coverage.

    Science.gov (United States)

    Walsh, Matthew C; Trentham-Dietz, Amy; Gangnon, Ronald E; Nieto, F Javier; Newcomb, Polly A; Palta, Mari

    2012-06-01

    Increasing numbers of individuals are choosing to opt out of population-based sampling frames due to privacy concerns. This is especially a problem in the selection of controls for case-control studies, as the cases often arise from relatively complete population-based registries, whereas control selection requires a sampling frame. If opt out is also related to risk factors, bias can arise. We linked breast cancer cases who reported having a valid driver's license from the 2004-2008 Wisconsin women's health study (N = 2,988) with a master list of licensed drivers from the Wisconsin Department of Transportation (WDOT). This master list excludes Wisconsin drivers that requested their information not be sold by the state. Multivariate-adjusted selection probability ratios (SPR) were calculated to estimate potential bias when using this driver's license sampling frame to select controls. A total of 962 cases (32%) had opted out of the WDOT sampling frame. Cases age <40 (SPR = 0.90), income either unreported (SPR = 0.89) or greater than $50,000 (SPR = 0.94), lower parity (SPR = 0.96 per one-child decrease), and hormone use (SPR = 0.93) were significantly less likely to be covered by the WDOT sampling frame (α = 0.05 level). Our results indicate the potential for selection bias due to differential opt out between various demographic and behavioral subgroups of controls. As selection bias may differ by exposure and study base, the assessment of potential bias needs to be ongoing. SPRs can be used to predict the direction of bias when cases and controls stem from different sampling frames in population-based case-control studies.
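    One plausible way to compute selection probability ratios of this kind is to model frame coverage among cases as a function of their characteristics and compare predicted coverage probabilities across subgroups. The sketch below does this with a logistic regression on simulated data; the variables, effect sizes and the exact SPR construction are illustrative assumptions, not the Wisconsin study's method or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 3000

# Hypothetical case characteristics.
age_lt40 = rng.binomial(1, 0.15, n)
high_income = rng.binomial(1, 0.4, n)
hormone_use = rng.binomial(1, 0.3, n)

# Simulate frame coverage with more opt-out (lower coverage) in some subgroups.
logit = 1.0 - 0.4 * age_lt40 - 0.2 * high_income - 0.25 * hormone_use
covered = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = np.column_stack([age_lt40, high_income, hormone_use])
model = LogisticRegression().fit(X, covered)

def coverage_prob(age, income, hormones):
    # Predicted probability that a case with these characteristics is on the frame.
    return model.predict_proba([[age, income, hormones]])[0, 1]

# Selection probability ratio: predicted coverage in a subgroup relative to the
# reference group (here: age >= 40, lower income, no hormone use).
ref = coverage_prob(0, 0, 0)
print("SPR age<40      :", round(coverage_prob(1, 0, 0) / ref, 3))
print("SPR high income :", round(coverage_prob(0, 1, 0) / ref, 3))
print("SPR hormone use :", round(coverage_prob(0, 0, 1) / ref, 3))
```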

  18. Estimating skin blood saturation by selecting a subset of hyperspectral imaging data

    Science.gov (United States)

    Ewerlöf, Maria; Salerud, E. Göran; Strömberg, Tomas; Larsson, Marcus

    2015-03-01

    Skin blood haemoglobin saturation (s_b) can be estimated with hyperspectral imaging using the wavelength (λ) range of 450-700 nm where haemoglobin absorption displays distinct spectral characteristics. Depending on the image size and photon transport algorithm, computations may be demanding. Therefore, this work aims to evaluate subsets with a reduced number of wavelengths for s_b estimation. White Monte Carlo simulations are performed using a two-layered tissue model with discrete values for epidermal thickness (t_epi) and the reduced scattering coefficient (μ's), mimicking an imaging setup. A detected intensity look-up table is calculated for a range of model parameter values relevant to human skin, adding absorption effects in the post-processing. Skin model parameters, including absorbers, are: μ's(λ), t_epi, haemoglobin saturation (s_b), tissue fraction blood (f_b) and tissue fraction melanin (f_mel). The skin model paired with the look-up table allows spectra to be calculated swiftly. Three inverse models with varying numbers of free parameters are evaluated: A(s_b, f_b), B(s_b, f_b, f_mel) and C(all parameters free). Fourteen wavelength candidates are selected by analysing the maximal spectral sensitivity to s_b and minimizing the sensitivity to f_b. All possible combinations of these candidates with three, four and 14 wavelengths, as well as the full spectral range, are evaluated for estimating s_b for 1000 randomly generated evaluation spectra. The results show that the simplified models A and B estimated s_b accurately using four wavelengths (mean error 2.2% for model B). If the number of wavelengths increased, the model complexity needed to be increased to avoid poor estimations.

  19. Efficient estimation of semiparametric copula models for bivariate survival data

    KAUST Repository

    Cheng, Guang

    2014-01-01

    A semiparametric copula model for bivariate survival data is characterized by a parametric copula model of dependence and nonparametric models of two marginal survival functions. Efficient estimation for the semiparametric copula model has been recently studied for the complete data case. When the survival data are censored, semiparametric efficient estimation has only been considered for some specific copula models such as the Gaussian copulas. In this paper, we obtain the semiparametric efficiency bound and efficient estimation for general semiparametric copula models for possibly censored data. We construct an approximate maximum likelihood estimator by approximating the log baseline hazard functions with spline functions. We show that our estimates of the copula dependence parameter and the survival functions are asymptotically normal and efficient. Simple consistent covariance estimators are also provided. Numerical results are used to illustrate the finite sample performance of the proposed estimators. © 2013 Elsevier Inc.

  20. Estimation of uranium in bioassay samples of occupational workers by laser fluorimetry

    International Nuclear Information System (INIS)

    Suja, A.; Prabhu, S.P.; Sawant, P.D.; Sarkar, P.K.; Tiwari, A.K.; Sharma, R.

    2012-01-01

    A newly established uranium processing facility has been commissioned at BARC, Trombay. Monitoring of occupational workers is essential to assess intake of uranium in this facility. A group of 21 workers was selected for bioassay monitoring to assess the existing urinary excretion levels of uranium before the commencement of actual work. Bioassay samples collected from these workers were analyzed by an ion-exchange technique followed by laser fluorimetry. The standard addition method was followed for estimation of the uranium concentration in the samples. The minimum detectable activity by this technique is about 0.2 ng. The uranium concentrations observed in these samples range from 19 to 132 ng/L. A few of these samples were also analyzed by the fission track analysis technique and the results were found to be comparable to those obtained by laser fluorimetry. The urinary excretion rate observed for the individual can be regarded as a 'personal baseline' and will be treated as the existing level of uranium in urine for these workers at the facility. (author)

  1. Multivariate modeling of complications with data driven variable selection: Guarding against overfitting and effects of data set size

    International Nuclear Information System (INIS)

    Schaaf, Arjen van der; Xu Chengjian; Luijk, Peter van; Veld, Aart A. van’t; Langendijk, Johannes A.; Schilstra, Cornelis

    2012-01-01

    Purpose: Multivariate modeling of complications after radiotherapy is frequently used in conjunction with data driven variable selection. This study quantifies the risk of overfitting in a data driven modeling method using bootstrapping for data with typical clinical characteristics, and estimates the minimum amount of data needed to obtain models with relatively high predictive power. Materials and methods: To facilitate repeated modeling and cross-validation with independent datasets for the assessment of true predictive power, a method was developed to generate simulated data with statistical properties similar to real clinical data sets. Characteristics of three clinical data sets from radiotherapy treatment of head and neck cancer patients were used to simulate data with set sizes between 50 and 1000 patients. A logistic regression method using bootstrapping and forward variable selection was used for complication modeling, resulting for each simulated data set in a selected number of variables and an estimated predictive power. The true optimal number of variables and true predictive power were calculated using cross-validation with very large independent data sets. Results: For all simulated data set sizes the number of variables selected by the bootstrapping method was on average close to the true optimal number of variables, but showed considerable spread. Bootstrapping is more accurate in selecting the optimal number of variables than the AIC and BIC alternatives, but this did not translate into a significant difference of the true predictive power. The true predictive power asymptotically converged toward a maximum predictive power for large data sets, and the estimated predictive power converged toward the true predictive power. More than half of the potential predictive power is gained after approximately 200 samples. Our simulations demonstrated severe overfitting (a predictive power lower than that of predicting 50% probability) in a number of small

  2. Efficient Monte Carlo Estimation of the Expected Value of Sample Information Using Moment Matching.

    Science.gov (United States)

    Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca

    2018-02-01

    The Expected Value of Sample Information (EVSI) is used to calculate the economic value of a new research strategy. Although this value would be important to both researchers and funders, there are very few practical applications of the EVSI. This is due to computational difficulties associated with calculating the EVSI in practical health economic models using nested simulations. We present an approximation method for the EVSI that is framed in a Bayesian setting and is based on estimating the distribution of the posterior mean of the incremental net benefit across all possible future samples, known as the distribution of the preposterior mean. Specifically, this distribution is estimated using moment matching coupled with simulations that are available for probabilistic sensitivity analysis, which is typically mandatory in health economic evaluations. This novel approximation method is applied to a health economic model that has previously been used to assess the performance of other EVSI estimators and accurately estimates the EVSI. The computational time for this method is competitive with other methods. We have developed a new calculation method for the EVSI which is computationally efficient and accurate. This novel method relies on some additional simulation so can be expensive in models with a large computational cost.

  3. Estimation of stature from sternum - Exploring the quadratic models.

    Science.gov (United States)

    Saraf, Ashish; Kanchan, Tanuj; Krishan, Kewal; Ateriya, Navneet; Setia, Puneet

    2018-04-14

    Identification of the dead is significant in examination of unknown, decomposed and mutilated human remains. Establishing the biological profile is the central issue in such a scenario, and stature estimation remains one of the important criteria in this regard. The present study was undertaken to estimate stature from different parts of the sternum. A sample of 100 sterna was obtained from individuals during the medicolegal autopsies. Length of the deceased and various measurements of the sternum were measured. Student's t-test was performed to find the sex differences in stature and sternal measurements included in the study. Correlation between stature and sternal measurements were analysed using Karl Pearson's correlation, and linear and quadratic regression models were derived. All the measurements were found to be significantly larger in males than females. Stature correlated best with the combined length of sternum, among males (R = 0.894), females (R = 0.859), and for the total sample (R = 0.891). The study showed that the models derived for stature estimation from combined length of sternum are likely to give the most accurate estimates of stature in forensic case work when compared to manubrium and mesosternum. Accuracy of stature estimation further increased with quadratic models derived for the mesosternum among males and combined length of sternum among males and females when compared to linear regression models. Future studies in different geographical locations and a larger sample size are proposed to confirm the study observations. Copyright © 2018 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
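    As a worked illustration of the linear-versus-quadratic comparison, the sketch below fits both model forms to simulated measurements and reports the correlation and residual error. The numbers are synthetic and only stand in for the sternal measurements and statures analysed above.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: combined sternal length (mm) and stature (cm) with mild curvature.
length = rng.uniform(130, 190, 100)
stature = 95 + 0.55 * length - 0.0008 * (length - 160) ** 2 + rng.normal(0, 3.5, 100)

# Linear model: stature = b0 + b1 * length
lin = np.polyfit(length, stature, deg=1)
# Quadratic model: stature = b0 + b1 * length + b2 * length^2
quad = np.polyfit(length, stature, deg=2)

def rmse(coeffs):
    pred = np.polyval(coeffs, length)
    return np.sqrt(np.mean((stature - pred) ** 2))

r = np.corrcoef(length, stature)[0, 1]
print(f"correlation R        : {r:.3f}")
print(f"linear model RMSE    : {rmse(lin):.2f} cm")
print(f"quadratic model RMSE : {rmse(quad):.2f} cm")
```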

  4. ASYMMETRIC PRICE TRANSMISSION MODELING: THE IMPORTANCE OF MODEL COMPLEXITY AND THE PERFORMANCE OF THE SELECTION CRITERIA

    Directory of Open Access Journals (Sweden)

    Henry de-Graft Acquah

    2013-01-01

    Information Criteria provides an attractive basis for selecting the best model from a set of competing asymmetric price transmission models or theories. However, little is understood about the sensitivity of the model selection methods to model complexity. This study therefore fits competing asymmetric price transmission models that differ in complexity to simulated data and evaluates the ability of the model selection methods to recover the true model. The results of Monte Carlo experimentation suggest that in general BIC, CAIC and DIC were superior to AIC when the true data generating process was the standard error correction model, whereas AIC was more successful when the true model was the complex error correction model. It is also shown that the model selection methods performed better in large samples for a complex asymmetric data generating process than with a standard asymmetric data generating process. Except for complex models, AIC's performance did not make substantial gains in recovery rates as sample size increased. The research findings demonstrate the influence of model complexity in asymmetric price transmission model comparison and selection.
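    The recovery-rate comparison reported here rests on computing an information criterion for each candidate model and selecting the minimum. A minimal sketch of that step is shown below, using ordinary least squares on simulated data rather than the asymmetric price transmission models themselves; the Gaussian AIC/BIC formulas are written up to an additive constant.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200

# Simulated data: the "true" process uses x1 and x2 only.
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=n)

def information_criteria(X, y):
    """Gaussian AIC/BIC (up to an additive constant) for an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1]
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

candidates = {
    "simple (x1)": np.column_stack([x1]),
    "true (x1, x2)": np.column_stack([x1, x2]),
    "complex (x1, x2, x3)": np.column_stack([x1, x2, x3]),
}

for name, X in candidates.items():
    aic, bic = information_criteria(X, y)
    print(f"{name:22s} AIC={aic:8.2f}  BIC={bic:8.2f}")
```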

  5. Censored rainfall modelling for estimation of fine-scale extremes

    Science.gov (United States)

    Cross, David; Onof, Christian; Winter, Hugo; Bernardara, Pietro

    2018-01-01

    Reliable estimation of rainfall extremes is essential for drainage system design, flood mitigation, and risk quantification. However, traditional techniques lack physical realism and extrapolation can be highly uncertain. In this study, we improve the physical basis for short-duration extreme rainfall estimation by simulating the heavy portion of the rainfall record mechanistically using the Bartlett-Lewis rectangular pulse (BLRP) model. Mechanistic rainfall models have had a tendency to underestimate rainfall extremes at fine temporal scales. Despite this, the simple process representation of rectangular pulse models is appealing in the context of extreme rainfall estimation because it emulates the known phenomenology of rainfall generation. A censored approach to Bartlett-Lewis model calibration is proposed and performed for single-site rainfall from two gauges in the UK and Germany. Extreme rainfall estimation is performed for each gauge at the 5, 15, and 60 min resolutions, and considerations for censor selection are discussed.

  6. A model for the sustainable selection of building envelope assemblies

    Energy Technology Data Exchange (ETDEWEB)

    Huedo, Patricia, E-mail: huedo@uji.es [Universitat Jaume I (Spain); Mulet, Elena, E-mail: emulet@uji.es [Universitat Jaume I (Spain); López-Mesa, Belinda, E-mail: belinda@unizar.es [Universidad de Zaragoza (Spain)

    2016-02-15

    The aim of this article is to define an evaluation model for the environmental impacts of building envelopes to support planners in the early phases of materials selection. The model is intended to estimate environmental impacts for different combinations of building envelope assemblies based on scientifically recognised sustainability indicators. These indicators will increase the amount of information that existing catalogues show to support planners in the selection of building assemblies. To define the model, first the environmental indicators were selected based on the specific aims of the intended sustainability assessment. Then, a simplified LCA methodology was developed to estimate the impacts applicable to three types of dwellings considering different envelope assemblies, building orientations and climate zones. This methodology takes into account the manufacturing, installation, maintenance and use phases of the building. Finally, the model was validated and a matrix in Excel was created as implementation of the model. - Highlights: • Method to assess the envelope impacts based on a simplified LCA • To be used at an earlier phase than the existing methods in a simple way. • It assigns a score by means of known sustainability indicators. • It estimates data about the embodied and operating environmental impacts. • It compares the investment costs with the costs of the consumed energy.

  7. A model for the sustainable selection of building envelope assemblies

    International Nuclear Information System (INIS)

    Huedo, Patricia; Mulet, Elena; López-Mesa, Belinda

    2016-01-01

    The aim of this article is to define an evaluation model for the environmental impacts of building envelopes to support planners in the early phases of materials selection. The model is intended to estimate environmental impacts for different combinations of building envelope assemblies based on scientifically recognised sustainability indicators. These indicators will increase the amount of information that existing catalogues show to support planners in the selection of building assemblies. To define the model, first the environmental indicators were selected based on the specific aims of the intended sustainability assessment. Then, a simplified LCA methodology was developed to estimate the impacts applicable to three types of dwellings considering different envelope assemblies, building orientations and climate zones. This methodology takes into account the manufacturing, installation, maintenance and use phases of the building. Finally, the model was validated and a matrix in Excel was created as implementation of the model. - Highlights: • Method to assess the envelope impacts based on a simplified LCA • To be used at an earlier phase than the existing methods in a simple way. • It assigns a score by means of known sustainability indicators. • It estimates data about the embodied and operating environmental impacts. • It compares the investment costs with the costs of the consumed energy.

  8. Magnetically separable polymer (Mag-MIP) for selective analysis of biotin in food samples.

    Science.gov (United States)

    Uzuriaga-Sánchez, Rosario Josefina; Khan, Sabir; Wong, Ademar; Picasso, Gino; Pividori, Maria Isabel; Sotomayor, Maria Del Pilar Taboada

    2016-01-01

    This work presents an efficient method for the preparation of magnetic nanoparticles modified with molecularly imprinted polymers (Mag-MIP) through core-shell method for the determination of biotin in milk food samples. The functional monomer acrylic acid was selected from molecular modeling, EGDMA was used as cross-linking monomer and AIBN as radical initiator. The Mag-MIP and Mag-NIP were characterized by FTIR, magnetic hysteresis, XRD, SEM and N2-sorption measurements. The capacity of Mag-MIP for biotin adsorption, its kinetics and selectivity were studied in detail. The adsorption data was well described by Freundlich isotherm model with adsorption equilibrium constant (KF) of 1.46 mL g(-1). The selectivity experiments revealed that prepared Mag-MIP had higher selectivity toward biotin compared to other molecules with different chemical structure. The material was successfully applied for the determination of biotin in diverse milk samples using HPLC for quantification of the analyte, obtaining the mean value of 87.4% recovery. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Sampling, Probability Models and Statistical Reasoning -RE ...

    Indian Academy of Sciences (India)

    random sampling allows data to be modelled with the help of probability ... g based on different trials to get an estimate of the experimental error. ... research interests lie in the .... if e is indeed the true value of the proportion of defectives in the.

  10. NASA Software Cost Estimation Model: An Analogy Based Estimation Model

    Science.gov (United States)

    Hihn, Jairus; Juster, Leora; Menzies, Tim; Mathew, George; Johnson, James

    2015-01-01

    The cost estimation of software development activities is increasingly critical for large scale integrated projects such as those at DOD and NASA especially as the software systems become larger and more complex. As an example, MSL (Mars Science Laboratory), developed at the Jet Propulsion Laboratory, launched with over 2 million lines of code, making it the largest robotic spacecraft ever flown (based on the size of the software). Software development activities are also notorious for their cost growth, with NASA flight software averaging over 50% cost growth. All across the agency, estimators and analysts are increasingly being tasked to develop reliable cost estimates in support of program planning and execution. While there has been extensive work on improving parametric methods there is very little focus on the use of models based on analogy and clustering algorithms. In this paper we summarize our findings on effort/cost model estimation and model development based on ten years of software effort estimation research using data mining and machine learning methods to develop estimation models based on analogy and clustering. The NASA Software Cost Model performance is evaluated by comparing it to COCOMO II, linear regression, and K-nearest neighbor prediction model performance on the same data set.
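    An analogy-based estimate of the kind described amounts to finding the most similar completed projects and averaging their effort. The sketch below does this with a k-nearest-neighbour regressor on made-up project features; the feature set, data and distance metric are illustrative assumptions, not the NASA model or its data set.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
n = 120

# Hypothetical historical projects: size (KSLOC), team experience (years), reuse fraction.
ksloc = rng.uniform(5, 500, n)
experience = rng.uniform(1, 15, n)
reuse = rng.uniform(0, 0.6, n)
# Synthetic effort (person-months) with noise, loosely COCOMO-like in shape.
effort = 2.5 * ksloc ** 1.05 * (1 - 0.5 * reuse) / np.sqrt(experience) * rng.lognormal(0, 0.2, n)

X = np.column_stack([ksloc, experience, reuse])

# Analogy-based estimator: standardize features, then average the k nearest projects.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X, effort)

new_project = np.array([[120.0, 6.0, 0.2]])  # KSLOC, experience, reuse
print(f"estimated effort: {model.predict(new_project)[0]:.0f} person-months")
```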

  11. Selecting Sensitive Parameter Subsets in Dynamical Models With Application to Biomechanical System Identification.

    Science.gov (United States)

    Ramadan, Ahmed; Boss, Connor; Choi, Jongeun; Peter Reeves, N; Cholewicki, Jacek; Popovich, John M; Radcliffe, Clark J

    2018-07-01

    Estimating many parameters of biomechanical systems with limited data may achieve good fit but may also increase 95% confidence intervals in parameter estimates. This results in poor identifiability in the estimation problem. Therefore, we propose a novel method to select sensitive biomechanical model parameters that should be estimated, while fixing the remaining parameters to values obtained from preliminary estimation. Our method relies on identifying the parameters to which the measurement output is most sensitive. The proposed method is based on the Fisher information matrix (FIM). It was compared against the nonlinear least absolute shrinkage and selection operator (LASSO) method to guide modelers on the pros and cons of our FIM method. We present an application identifying a biomechanical parametric model of a head position-tracking task for ten human subjects. Using measured data, our method (1) reduced model complexity by only requiring five out of twelve parameters to be estimated, (2) significantly reduced parameter 95% confidence intervals by up to 89% of the original confidence interval, (3) maintained goodness of fit measured by variance accounted for (VAF) at 82%, (4) reduced computation time, where our FIM method was 164 times faster than the LASSO method, and (5) selected similar sensitive parameters to the LASSO method, where three out of five selected sensitive parameters were shared by FIM and LASSO methods.
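    A bare-bones version of the Fisher-information idea is to compute output sensitivities with respect to each parameter and keep the parameters the output is most sensitive to. In the sketch below the toy model, the noise level and the selection rule (ranking by the diagonal of the FIM) are simplifying assumptions, not the head-tracking model or the exact criterion used in the paper.

```python
import numpy as np

# Toy parametric model: output time series y(t; theta).
def model(theta, t):
    k1, k2, k3, k4 = theta
    return k1 * np.exp(-k2 * t) + k3 * np.sin(k4 * t)

t = np.linspace(0.0, 10.0, 200)
theta0 = np.array([2.0, 0.8, 0.5, 3.0])   # preliminary parameter estimates
sigma = 0.05                               # assumed measurement noise SD

# Sensitivity matrix S[i, j] = d y(t_i) / d theta_j via central differences.
S = np.zeros((t.size, theta0.size))
for j in range(theta0.size):
    h = 1e-6 * max(abs(theta0[j]), 1.0)
    tp, tm = theta0.copy(), theta0.copy()
    tp[j] += h
    tm[j] -= h
    S[:, j] = (model(tp, t) - model(tm, t)) / (2.0 * h)

# Fisher information matrix for additive Gaussian noise.
fim = S.T @ S / sigma ** 2

# Rank parameters by their information content (diagonal of the FIM);
# parameters carrying little information are candidates for fixing at preliminary values.
order = np.argsort(np.diag(fim))[::-1]
for rank, j in enumerate(order, start=1):
    print(f"rank {rank}: theta[{j}]  FIM diagonal = {fim[j, j]:.3e}")
```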

  12. Ultra-small time-delay estimation via a weak measurement technique with post-selection

    International Nuclear Information System (INIS)

    Fang, Chen; Huang, Jing-Zheng; Yu, Yang; Li, Qinzheng; Zeng, Guihua

    2016-01-01

    Weak measurement is a novel technique for parameter estimation with higher precision. In this paper we develop a general theory for the parameter estimation based on a weak measurement technique with arbitrary post-selection. The weak-value amplification model and the joint weak measurement model are two special cases in our theory. Applying the developed theory, time-delay estimation is investigated in both theory and experiments. The experimental results show that when the time delay is ultra-small, the joint weak measurement scheme outperforms the weak-value amplification scheme, and is robust against not only misalignment errors but also the wavelength dependence of the optical components. These results are consistent with theoretical predictions that have not been previously verified by any experiment. (paper)

  13. The NuSTAR  Extragalactic Surveys: X-Ray Spectroscopic Analysis of the Bright Hard-band Selected Sample

    Science.gov (United States)

    Zappacosta, L.; Comastri, A.; Civano, F.; Puccetti, S.; Fiore, F.; Aird, J.; Del Moro, A.; Lansbury, G. B.; Lanzuisi, G.; Goulding, A.; Mullaney, J. R.; Stern, D.; Ajello, M.; Alexander, D. M.; Ballantyne, D. R.; Bauer, F. E.; Brandt, W. N.; Chen, C.-T. J.; Farrah, D.; Harrison, F. A.; Gandhi, P.; Lanz, L.; Masini, A.; Marchesi, S.; Ricci, C.; Treister, E.

    2018-02-01

    We discuss the spectral analysis of a sample of 63 active galactic nuclei (AGN) detected above a limiting flux of S(8–24 keV) = 7 × 10^-14 erg s^-1 cm^-2 in the multi-tiered NuSTAR extragalactic survey program. The sources span a redshift range z = 0–2.1 (median z = 0.58). The spectral analysis is performed over the broad 0.5–24 keV energy range, combining NuSTAR with Chandra and/or XMM-Newton data and employing empirical and physically motivated models. This constitutes the largest sample of AGN selected at >10 keV to be homogeneously spectrally analyzed at these flux levels. We study the distribution of spectral parameters such as photon index, column density (N_H), reflection parameter (R), and 10–40 keV luminosity (L_X). Heavily obscured (log[N_H/cm^-2] ≥ 23) and Compton-thick (CT; log[N_H/cm^-2] ≥ 24) AGN constitute ∼25% (15–17 sources) and ∼2–3% (1–2 sources) of the sample, respectively. The observed N_H distribution agrees fairly well with predictions of cosmic X-ray background population-synthesis models (CXBPSM). We estimate the intrinsic fraction of AGN as a function of N_H, accounting for the bias against obscured AGN in a flux-selected sample. The fraction of CT AGN relative to log[N_H/cm^-2] = 20–24 AGN is poorly constrained, formally in the range 2–56% (90% upper limit of 66%). We derived a fraction (f_abs) of obscured AGN (log[N_H/cm^-2] = 22–24) as a function of L_X in agreement with CXBPSM and previous z values.

  14. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    Science.gov (United States)

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  15. Effects of systematic sampling on satellite estimates of deforestation rates

    International Nuclear Information System (INIS)

    Steininger, M K; Godoy, F; Harper, G

    2009-01-01

    Options for satellite monitoring of deforestation rates over large areas include the use of sampling. Sampling may reduce the cost of monitoring but is also a source of error in estimates of areas and rates. A common sampling approach is systematic sampling, in which sample units of a constant size are distributed in some regular manner, such as a grid. The proposed approach for the 2010 Forest Resources Assessment (FRA) of the UN Food and Agriculture Organization (FAO) is a systematic sample of 10 km wide squares at every 1 deg. intersection of latitude and longitude. We assessed the outcome of this and other systematic samples for estimating deforestation at national, sub-national and continental levels. The study is based on digital data on deforestation patterns for the five Amazonian countries outside Brazil plus the Brazilian Amazon. We tested these schemes by varying sample-unit size and frequency. We calculated two estimates of sampling error. First we calculated the standard errors, based on the size, variance and covariance of the samples, and from this calculated the 95% confidence intervals (CI). Second, we calculated the actual errors, based on the difference between the sample-based estimates and the estimates from the full-coverage maps. At the continental level, the 1 deg., 10 km scheme had a CI of 21% and an actual error of 8%. At the national level, this scheme had CIs of 126% for Ecuador and up to 67% for other countries. At this level, increasing sampling density to every 0.25 deg. produced a CI of 32% for Ecuador and CIs of up to 25% for other countries, with only Brazil having a CI of less than 10%. Actual errors were within the limits of the CIs in all but two of the 56 cases. Actual errors were half or less of the CIs in all but eight of these cases. These results indicate that the FRA 2010 should have CIs of smaller than or close to 10% at the continental level. However, systematic sampling at the national level yields large CIs unless the
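    The gap between the confidence interval implied by a systematic sample and the actual error against the full-coverage map can be explored with a small simulation. The synthetic deforestation field below is only a rough stand-in for the mapped data used in the study.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic full-coverage map: deforested fraction per 10 km cell on a 360 x 180 grid,
# with coarse spatial structure to mimic clustered deforestation.
coarse = rng.gamma(shape=0.3, scale=0.1, size=(36, 18))
full_map = np.clip(np.kron(coarse, np.ones((10, 10))) + rng.normal(0, 0.01, (360, 180)), 0, 1)
true_total = full_map.mean()

def systematic_estimate(step, offset):
    """Systematic sample: one cell every `step` cells in each direction."""
    sample = full_map[offset::step, offset::step].ravel()
    est = sample.mean()
    # SRS-style standard error, as commonly applied to systematic samples.
    se = sample.std(ddof=1) / np.sqrt(sample.size)
    return est, se

for step in (10, 25):          # denser vs. sparser grids of sample units
    est, se = systematic_estimate(step, offset=step // 2)
    ci_half = 1.96 * se
    actual_err = abs(est - true_total)
    print(f"step {step:2d}: estimate={est:.4f}  95% CI half-width={ci_half:.4f}  "
          f"actual error={actual_err:.4f}  (truth {true_total:.4f})")
```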

  16. Fatigue level estimation of monetary bills based on frequency band acoustic signals with feature selection by supervised SOM

    Science.gov (United States)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued monetary bills adversely affect the daily operation of automated teller machines (ATMs). In order to make the classification of fatigued bills more efficient, the development of an automatic fatigued monetary bill classification method is desirable. We propose a new method by which to estimate the fatigue level of monetary bills from the feature-selected frequency band acoustic energy pattern of banking machines. By using a supervised self-organizing map (SOM), we effectively estimate the fatigue level using only the feature-selected frequency band acoustic energy pattern. Furthermore, the feature-selected frequency band acoustic energy pattern improves the estimation accuracy of the fatigue level of monetary bills by adding frequency domain information to the acoustic energy pattern. The experimental results with real monetary bill samples reveal the effectiveness of the proposed method.

  17. Online State Space Model Parameter Estimation in Synchronous Machines

    Directory of Open Access Journals (Sweden)

    Z. Gallehdari

    2014-06-01

    The suggested approach is evaluated for a sample synchronous machine model. Estimated parameters are tested for different inputs at different operating conditions. The effect of noise is also considered in this study. Simulation results show that the proposed approach provides good accuracy for parameter estimation.

  18. Channel Selection and Feature Projection for Cognitive Load Estimation Using Ambulatory EEG

    Directory of Open Access Journals (Sweden)

    Tian Lan

    2007-01-01

    We present an ambulatory cognitive state classification system to assess the subject's mental load based on EEG measurements. The ambulatory cognitive state estimator is utilized in the context of a real-time augmented cognition (AugCog) system that aims to enhance the cognitive performance of a human user through computer-mediated assistance based on assessments of cognitive states using physiological signals including, but not limited to, EEG. This paper focuses particularly on the offline channel selection and feature projection phases of the design and aims to present mutual-information-based techniques that use a simple sample estimator for this quantity. Analyses conducted on data collected from 3 subjects performing 2 tasks (n-back/Larson) at 2 difficulty levels (low/high) demonstrate that the proposed mutual-information-based dimensionality reduction scheme can achieve up to 94% cognitive load estimation accuracy.
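    The offline selection step described here, ranking candidate channels or features by their estimated mutual information with the class label and keeping the top few, can be sketched with a standard estimator. The synthetic EEG-like features and labels below are placeholders, not the AugCog data, and the paper's simple sample estimator is replaced by scikit-learn's nearest-neighbour-based estimator.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_trials, n_channels = 400, 32

# Synthetic band-power features: only channels 0-4 carry cognitive-load information.
X = rng.normal(size=(n_trials, n_channels))
y = rng.binomial(1, 0.5, n_trials)          # low vs. high load
X[:, :5] += 0.8 * y[:, None]                # informative channels

# Rank channels by estimated mutual information with the load label.
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:5]
print("top channels by MI:", top)

# Classify cognitive load using only the selected channels.
acc = cross_val_score(LogisticRegression(), X[:, top], y, cv=5).mean()
print(f"cross-validated accuracy with 5 selected channels: {acc:.2f}")
```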

  19. Optimizing Sampling Efficiency for Biomass Estimation Across NEON Domains

    Science.gov (United States)

    Abercrombie, H. H.; Meier, C. L.; Spencer, J. J.

    2013-12-01

    Over the course of 30 years, the National Ecological Observatory Network (NEON) will measure plant biomass and productivity across the U.S. to enable an understanding of terrestrial carbon cycle responses to ecosystem change drivers. Over the next several years, prior to operational sampling at a site, NEON will complete construction and characterization phases during which a limited amount of sampling will be done at each site to inform sampling designs, and guide standardization of data collection across all sites. Sampling biomass in 60+ sites distributed among 20 different eco-climatic domains poses major logistical and budgetary challenges. Traditional biomass sampling methods such as clip harvesting and direct measurements of Leaf Area Index (LAI) involve collecting and processing plant samples, and are time and labor intensive. Possible alternatives include using indirect sampling methods for estimating LAI such as digital hemispherical photography (DHP) or using a LI-COR 2200 Plant Canopy Analyzer. These LAI estimations can then be used as a proxy for biomass. The biomass estimates calculated can then inform the clip harvest sampling design during NEON operations, optimizing both sample size and number so that standardized uncertainty limits can be achieved with a minimum amount of sampling effort. In 2011, LAI and clip harvest data were collected from co-located sampling points at the Central Plains Experimental Range located in northern Colorado, a short grass steppe ecosystem that is the NEON Domain 10 core site. LAI was measured with a LI-COR 2200 Plant Canopy Analyzer. The layout of the sampling design included four, 300 meter transects, with clip harvests plots spaced every 50m, and LAI sub-transects spaced every 10m. LAI was measured at four points along 6m sub-transects running perpendicular to the 300m transect. Clip harvest plots were co-located 4m from corresponding LAI transects, and had dimensions of 0.1m by 2m. We conducted regression analyses

  20. The Impact of Learning Curve Model Selection and Criteria for Cost Estimation Accuracy in the DoD

    Science.gov (United States)

    2016-04-30

    different in the future due to machines” • Heightened scrutiny of cost estimates • Budget Control Act of 2011 seeks to reduce the federal deficit … Thirteenth Annual Acquisition Research Symposium, Thursday Sessions, Volume II: The Impact of Learning Curve Model Selection and Criteria for Cost… Assistant Division Director, Institute for Defense Analyses; Bruce Harmon, Research Staff Member, Institute for Defense Analyses.

  1. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA.

    Science.gov (United States)

    Kelly, Brendan J; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D; Collman, Ronald G; Bushman, Frederic D; Li, Hongzhe

    2015-08-01

    The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence-absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω2). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
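    The power logic in points (i)-(iv) can be condensed into a short sketch: simulate many studies with a chosen effect, run a PERMANOVA-style permutation test on each pairwise distance matrix, and report the rejection rate. The sketch below simulates Euclidean community data directly rather than using the paper's distance-matrix simulation method, so it illustrates the power loop only, not the proposed R package.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(8)

def pseudo_f(d, groups):
    """PERMANOVA pseudo-F statistic from a square distance matrix and group labels."""
    n = len(groups)
    d2 = d ** 2
    ss_total = d2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    labels = np.unique(groups)
    for g in labels:
        idx = np.where(groups == g)[0]
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permanova_p(d, groups, n_perm=199):
    obs = pseudo_f(d, groups)
    perms = [pseudo_f(d, rng.permutation(groups)) for _ in range(n_perm)]
    return (1 + sum(f >= obs for f in perms)) / (1 + n_perm)

def estimate_power(n_per_group=15, effect=1.0, n_sim=100, n_features=20):
    groups = np.repeat([0, 1], n_per_group)
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(size=(2 * n_per_group, n_features))
        x[groups == 1, :5] += effect         # group effect on the first five features
        d = squareform(pdist(x))             # Euclidean pairwise distances
        if permanova_p(d, groups) < 0.05:
            rejections += 1
    return rejections / n_sim

print("estimated power:", estimate_power())
```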

  2. Risk Attitudes, Sample Selection and Attrition in a Longitudinal Field Experiment

    DEFF Research Database (Denmark)

    Harrison, Glenn W.; Lau, Morten Igel

    with respect to risk attitudes. Our design builds in explicit randomization on the incentives for participation. We show that there are significant sample selection effects on inferences about the extent of risk aversion, but that the effects of subsequent sample attrition are minimal. Ignoring sample selection leads to inferences that subjects in the population are more risk averse than they actually are. Correcting for sample selection and attrition affects utility curvature, but does not affect inferences about probability weighting. Properly accounting for sample selection and attrition effects leads to findings of temporal stability in overall risk aversion. However, that stability is around different levels of risk aversion than one might naively infer without the controls for sample selection and attrition we are able to implement. This evidence of “randomization bias” from sample selection

  3. Application of random effects to the study of resource selection by animals.

    Science.gov (United States)

    Gillies, Cameron S; Hebblewhite, Mark; Nielsen, Scott E; Krawchuk, Meg A; Aldridge, Cameron L; Frair, Jacqueline L; Saher, D Joanne; Stevens, Cameron E; Jerde, Christopher L

    2006-07-01

    1. Resource selection estimated by logistic regression is used increasingly in studies to identify critical resources for animal populations and to predict species occurrence. 2. Most frequently, individual animals are monitored and pooled to estimate population-level effects without regard to group or individual-level variation. Pooling assumes that both observations and their errors are independent, and resource selection is constant given individual variation in resource availability. 3. Although researchers have identified ways to minimize autocorrelation, variation between individuals caused by differences in selection or available resources, including functional responses in resource selection, have not been well addressed. 4. Here we review random-effects models and their application to resource selection modelling to overcome these common limitations. We present a simple case study of an analysis of resource selection by grizzly bears in the foothills of the Canadian Rocky Mountains with and without random effects. 5. Both categorical and continuous variables in the grizzly bear model differed in interpretation, both in statistical significance and coefficient sign, depending on how a random effect was included. We used a simulation approach to clarify the application of random effects under three common situations for telemetry studies: (a) discrepancies in sample sizes among individuals; (b) differences among individuals in selection where availability is constant; and (c) differences in availability with and without a functional response in resource selection. 6. We found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection. Our empirical example and simulations demonstrate how including random effects in resource selection models can aid interpretation and address difficult assumptions

  4. Variance of discharge estimates sampled using acoustic Doppler current profilers from moving boats

    Science.gov (United States)

    Garcia, Carlos M.; Tarrab, Leticia; Oberg, Kevin; Szupiany, Ricardo; Cantero, Mariano I.

    2012-01-01

    This paper presents a model for quantifying the random errors (i.e., variance) of acoustic Doppler current profiler (ADCP) discharge measurements from moving boats for different sampling times. The model focuses on the random processes in the sampled flow field and has been developed using statistical methods currently available for uncertainty analysis of velocity time series. Analysis of field data collected using ADCP from moving boats from three natural rivers of varying sizes and flow conditions shows that, even though the estimate of the integral time scale of the actual turbulent flow field is larger than the sampling interval, the integral time scale of the sampled flow field is on the order of the sampling interval. Thus, an equation for computing the variance error in discharge measurements associated with different sampling times, assuming uncorrelated flow fields is appropriate. The approach is used to help define optimal sampling strategies by choosing the exposure time required for ADCPs to accurately measure flow discharge.
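    The key quantity in such a model is the integral time scale of the sampled series, from which the variance of a time-averaged estimate follows. The sketch below uses simplifying assumptions: a synthetic first-order autoregressive series stands in for ADCP velocity data, and the autocorrelation is integrated only up to its first zero crossing.

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic "sampled flow field": AR(1) series with a known correlation time.
dt = 1.0                     # sampling interval, s
n = 5000
phi = np.exp(-dt / 5.0)      # true integral time scale of about 5 s
u = np.empty(n)
u[0] = 0.0
for i in range(1, n):
    u[i] = phi * u[i - 1] + rng.normal(0.0, 1.0)
u = 1.2 + 0.3 * u / u.std()  # rescale to a mean velocity with fluctuations

def integral_time_scale(x, dt):
    """Integrate the sample autocorrelation up to its first zero crossing."""
    xc = x - x.mean()
    acf = np.correlate(xc, xc, mode="full")[len(xc) - 1:]
    acf /= acf[0]
    first_zero = np.argmax(acf <= 0.0)
    # Trapezoid-style sum of the correlation coefficients times the lag spacing.
    return dt * (0.5 * acf[0] + acf[1:first_zero].sum())

T = integral_time_scale(u, dt)
var_u = u.var(ddof=1)

# Variance of the time-averaged estimate over an exposure time T_avg (T_avg >> T):
# var(mean) ~ 2 * T * var(u) / T_avg.
for t_avg in (60.0, 300.0, 720.0):
    var_mean = 2.0 * T * var_u / t_avg
    print(f"T_avg={t_avg:5.0f} s: std of the mean ~ {np.sqrt(var_mean):.4f} m/s")

print(f"estimated integral time scale: {T:.1f} s")
```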

  5. Performances of estimators of linear auto-correlated error model ...

    African Journals Online (AJOL)

    The performances of five estimators of linear models with autocorrelated disturbance terms are compared when the independent variable is exponential. The results reveal that for both small and large samples, the Ordinary Least Squares (OLS) compares favourably with the Generalized least Squares (GLS) estimators in ...

  6. Model selection with multiple regression on distance matrices leads to incorrect inferences.

    Directory of Open Access Journals (Sweden)

    Ryan P Franckowiak

    In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.

  7. Estimating structural equation models with non-normal variables by using transformations

    NARCIS (Netherlands)

    Montfort, van K.; Mooijaart, A.; Meijerink, F.

    2009-01-01

    We discuss structural equation models for non-normal variables. In this situation the maximum likelihood and the generalized least-squares estimates of the model parameters can give incorrect estimates of the standard errors and the associated goodness-of-fit chi-squared statistics. If the sample

  8. Functional Mixed Effects Model for Small Area Estimation.

    Science.gov (United States)

    Maiti, Tapabrata; Sinha, Samiran; Zhong, Ping-Shou

    2016-09-01

    Functional data analysis has become an important area of research due to its ability of handling high dimensional and complex data structures. However, the development is limited in the context of linear mixed effect models, and in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area-level data, and fit a varying coefficient linear mixed effect model where the varying coefficients are semi-parametrically modeled via B-splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors, and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.

  9. Contributions to sampling statistics

    CERN Document Server

    Conti, Pier; Ranalli, Maria

    2014-01-01

    This book contains a selection of the papers presented at the ITACOSM 2013 Conference, held in Milan in June 2013. ITACOSM is the bi-annual meeting of the Survey Sampling Group S2G of the Italian Statistical Society, intended as an international  forum of scientific discussion on the developments of theory and application of survey sampling methodologies and applications in human and natural sciences. The book gathers research papers carefully selected from both invited and contributed sessions of the conference. The whole book appears to be a relevant contribution to various key aspects of sampling methodology and techniques; it deals with some hot topics in sampling theory, such as calibration, quantile-regression and multiple frame surveys, and with innovative methodologies in important topics of both sampling theory and applications. Contributions cut across current sampling methodologies such as interval estimation for complex samples, randomized responses, bootstrap, weighting, modeling, imputati...

  10. Rank-based model selection for multiple ions quantum tomography

    International Nuclear Information System (INIS)

    Guţă, Mădălin; Kypraios, Theodore; Dryden, Ian

    2012-01-01

    The statistical analysis of measurement data has become a key component of many quantum engineering experiments. As standard full state tomography becomes unfeasible for large dimensional quantum systems, one needs to exploit prior information and the ‘sparsity’ properties of the experimental state in order to reduce the dimensionality of the estimation problem. In this paper we propose model selection as a general principle for finding the simplest, or most parsimonious explanation of the data, by fitting different models and choosing the estimator with the best trade-off between likelihood fit and model complexity. We apply two well established model selection methods—the Akaike information criterion (AIC) and the Bayesian information criterion (BIC)—to models consisting of states of fixed rank and datasets such as are currently produced in multiple ions experiments. We test the performance of AIC and BIC on randomly chosen low rank states of four ions, and study the dependence of the selected rank with the number of measurement repetitions for one ion states. We then apply the methods to real data from a four ions experiment aimed at creating a Smolin state of rank 4. By applying the two methods together with the Pearson χ² test we conclude that the data can be suitably described with a model whose rank is between 7 and 9. Additionally we find that the mean square error of the maximum likelihood estimator for pure states is close to that of the optimal over all possible measurements. (paper)

  11. Dynamic selection of ship responses for estimation of on-site directional wave spectra

    DEFF Research Database (Denmark)

    Andersen, Ingrid Marie Vincent; Storhaug, Gaute

    2012-01-01

    -estimate of the wave spectrum is suggested. The selection method needs to be robust, for which reason a parameterised uni-directional, two-parameter wave spectrum is treated. The parameters included are the zero up-crossing period, the significant wave height and the main wave direction relative to the ship’s heading...... with the best overall agreement are selected for the actual estimation of the directional wave spectrum. The transfer functions for the ship responses can be determined using different computational methods such as strip theory, 3D panel codes, closed-form expressions or model tests. The uncertainty associated......Knowledge of the wave environment in which a ship is operating is crucial for most on-board decision support systems. Previous research has shown that the directional wave spectrum can be estimated by the use of measured global ship responses and a set of transfer functions determined

  12. Estimation after classification using lot quality assurance sampling: corrections for curtailed sampling with application to evaluating polio vaccination campaigns.

    Science.gov (United States)

    Olives, Casey; Valadez, Joseph J; Pagano, Marcello

    2014-03-01

    To assess the bias incurred when curtailment of Lot Quality Assurance Sampling (LQAS) is ignored, to present unbiased estimators, to consider the impact of cluster sampling by simulation and to apply our method to published polio immunization data from Nigeria. We present estimators of coverage when using two kinds of curtailed LQAS strategies: semicurtailed and curtailed. We study the proposed estimators with independent and clustered data using three field-tested LQAS designs for assessing polio vaccination coverage, with samples of size 60 and decision rules of 9, 21 and 33, and compare them to biased maximum likelihood estimators. Lastly, we present estimates of polio vaccination coverage from previously published data in 20 local government authorities (LGAs) from five Nigerian states. Simulations illustrate substantial bias if one ignores the curtailed sampling design. Proposed estimators show no bias. Clustering does not affect the bias of these estimators. Across simulations, standard errors show signs of inflation as clustering increases. Neither sampling strategy nor LQAS design influences estimates of polio vaccination coverage in 20 Nigerian LGAs. When coverage is low, semicurtailed LQAS strategies considerably reduce the sample size required to make a decision. Curtailed LQAS designs further reduce the sample size when coverage is high. Results presented dispel the misconception that curtailed LQAS data are unsuitable for estimation. These findings augment the utility of LQAS as a tool for monitoring vaccination efforts by demonstrating that unbiased estimation using curtailed designs is not only possible but these designs also reduce the sample size. © 2014 John Wiley & Sons Ltd.

  13. Bootstrap-after-bootstrap model averaging for reducing model uncertainty in model selection for air pollution mortality studies.

    Science.gov (United States)

    Roberts, Steven; Martin, Michael A

    2010-01-01

    Concerns have been raised about findings of associations between particulate matter (PM) air pollution and mortality that have been based on a single "best" model arising from a model selection procedure, because such a strategy may ignore model uncertainty inherently involved in searching through a set of candidate models to find the best model. Model averaging has been proposed as a method of allowing for model uncertainty in this context. To propose an extension (double BOOT) to a previously described bootstrap model-averaging procedure (BOOT) for use in time series studies of the association between PM and mortality. We compared double BOOT and BOOT with Bayesian model averaging (BMA) and a standard method of model selection [standard Akaike's information criterion (AIC)]. Actual time series data from the United States are used to conduct a simulation study to compare and contrast the performance of double BOOT, BOOT, BMA, and standard AIC. Double BOOT produced estimates of the effect of PM on mortality that had smaller root mean squared error than those produced by BOOT, BMA, and standard AIC. This performance boost resulted from estimates produced by double BOOT having smaller variance than those produced by BOOT and BMA. Double BOOT is a viable alternative to BOOT and BMA for producing estimates of the mortality effect of PM.
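
    As a rough sketch of the generic bootstrap-model-averaging idea (a simplification in the spirit of BOOT, not the double-BOOT procedure itself), on each bootstrap resample the AIC-best model from a small candidate set is selected and the pollution-effect coefficient is averaged across resamples. The data, candidate confounder adjustments, and effect size below are simulated placeholders.

```python
# Sketch of bootstrap model averaging: on each resample, pick the AIC-best
# candidate model and average the PM coefficient across resamples.
# Data are simulated; this is not the double-BOOT procedure of the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 500
pm = rng.gamma(2.0, 10.0, n)                           # simulated PM series
temp = rng.normal(20, 5, n)                            # simulated confounder
y = 50 + 0.03 * pm + 0.2 * temp + rng.normal(0, 5, n)  # simulated mortality proxy

def fit_ols(X, y):
    """Ordinary least squares fit plus a simple AIC value."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    nn, k = X.shape
    aic = nn * np.log(np.mean(resid ** 2)) + 2 * k
    return beta, aic

# Candidate design matrices differing only in confounder adjustment;
# the PM coefficient is column 1 in every candidate.
candidates = [
    np.column_stack([np.ones(n), pm]),
    np.column_stack([np.ones(n), pm, temp]),
    np.column_stack([np.ones(n), pm, temp, temp ** 2]),
]

boot_betas = []
for _ in range(200):
    idx = rng.integers(0, n, n)                        # bootstrap resample
    fits = [fit_ols(X[idx], y[idx]) for X in candidates]
    best = min(fits, key=lambda f: f[1])               # AIC-best model on this resample
    boot_betas.append(best[0][1])                      # PM coefficient

print("model-averaged PM effect:", np.mean(boot_betas),
      "+/-", np.std(boot_betas, ddof=1))
```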

  14. GENERALISED MODEL BASED CONFIDENCE INTERVALS IN TWO STAGE CLUSTER SAMPLING

    Directory of Open Access Journals (Sweden)

    Christopher Ouma Onyango

    2010-09-01

    Chambers and Dorfman (2002) constructed bootstrap confidence intervals in model-based estimation for finite population totals, assuming that auxiliary values are available throughout a target population and that the auxiliary values are independent. They also assumed that the cluster sizes are known throughout the target population. We now extend to two-stage sampling in which the cluster sizes are known only for the sampled clusters, and we therefore predict the unobserved part of the population total. Jan and Elinor (2008) have done similar work, but unlike them, we use a general model in which the auxiliary values are not necessarily independent. We demonstrate that the asymptotic properties of our proposed estimator and its coverage rates are better than those constructed under the model-assisted local polynomial regression model.

  15. A review of single-sample-based models and other approaches for radiocarbon dating of dissolved inorganic carbon in groundwater

    Science.gov (United States)

    Han, L. F; Plummer, Niel

    2016-01-01

    Numerous methods have been proposed to estimate the pre-nuclear-detonation 14C content of dissolved inorganic carbon (DIC) recharged to groundwater that has been corrected/adjusted for geochemical processes in the absence of radioactive decay (14C0) - a quantity that is essential for estimation of the radiocarbon age of DIC in groundwater. The models/approaches most commonly used are grouped as follows: (1) single-sample-based models, (2) a statistical approach based on the observed (curved) relationship between 14C and δ13C data for the aquifer, and (3) the geochemical mass-balance approach that constructs adjustment models accounting for all the geochemical reactions known to occur along a groundwater flow path. This review discusses first the geochemical processes behind each of the single-sample-based models, followed by discussions of the statistical approach and the geochemical mass-balance approach. Finally, the applications, advantages and limitations of the three groups of models/approaches are discussed. The single-sample-based models constitute the prevailing use of 14C data in hydrogeology and hydrological studies. This is in part because the models are applied to an individual water sample to estimate the 14C age, therefore the measurement data are easily available. These models have been shown to provide realistic radiocarbon ages in many studies. However, they usually are limited to simple carbonate aquifers, and selection of model may have significant effects on 14C0, often resulting in a wide range of estimates of 14C ages. Of the single-sample-based models, four are recommended for the estimation of 14C0 of DIC in groundwater: Pearson's model (Ingerson and Pearson, 1964; Pearson and White, 1967), Han & Plummer's model (Han and Plummer, 2013), the IAEA model (Gonfiantini, 1972; Salem et al., 1980), and Oeschger's model (Geyh, 2000). These four models include all processes considered in single-sample-based models, and can be used in different ranges of

  16. An optimized Line Sampling method for the estimation of the failure probability of nuclear passive systems

    International Nuclear Information System (INIS)

    Zio, E.; Pedroni, N.

    2010-01-01

    The quantitative reliability assessment of a thermal-hydraulic (T-H) passive safety system of a nuclear power plant can be obtained by (i) Monte Carlo (MC) sampling the uncertainties of the system model and parameters, (ii) computing, for each sample, the system response by a mechanistic T-H code and (iii) comparing the system response with pre-established safety thresholds, which define the success or failure of the safety function. The computational effort involved can be prohibitive because of the large number of (typically long) T-H code simulations that must be performed (one for each sample) for the statistical estimation of the probability of success or failure. In this work, Line Sampling (LS) is adopted for efficient MC sampling. In the LS method, an 'important direction' pointing towards the failure domain of interest is determined and a number of conditional one-dimensional problems are solved along such direction; this allows for a significant reduction of the variance of the failure probability estimator, with respect, for example, to standard random sampling. Two issues are still open with respect to LS: first, the method relies on the determination of the 'important direction', which requires additional runs of the T-H code; second, although the method has been shown to improve the computational efficiency by reducing the variance of the failure probability estimator, no evidence has been given yet that accurate and precise failure probability estimates can be obtained with a number of samples reduced to below a few hundred, which may be required in case of long-running models. The work presented in this paper addresses the first issue by (i) quantitatively comparing the efficiency of the methods proposed in the literature to determine the LS important direction; (ii) employing artificial neural network (ANN) regression models as fast-running surrogates of the original, long-running T-H code to reduce the computational cost associated to the
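
    A minimal sketch of the Line Sampling estimator for a toy linear limit state with standard normal inputs (not the T-H passive-system model, and with the important direction assumed known rather than estimated from code runs): each sample is projected onto the hyperplane orthogonal to the important direction, a one-dimensional root search locates the failure boundary along the line, and the conditional failure probabilities are averaged.

```python
# Line Sampling sketch for a toy linear limit state g(x) = beta - alpha.x
# with i.i.d. standard normal inputs; failure is g(x) < 0.
# The important direction and limit state are illustrative, not the T-H model.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(1)
d, beta = 5, 3.0
alpha = np.ones(d) / np.sqrt(d)                  # assumed unit important direction

def g(x):
    return beta - alpha @ x                       # toy performance function

n_lines = 50
partial_p = []
for _ in range(n_lines):
    x = rng.standard_normal(d)
    x_perp = x - (alpha @ x) * alpha              # project onto hyperplane orthogonal to alpha
    # distance along alpha at which the limit state is crossed
    c = brentq(lambda t: g(x_perp + t * alpha), -10.0, 10.0)
    partial_p.append(norm.sf(c))                  # conditional failure probability on this line

p_hat = np.mean(partial_p)
print("LS estimate:", p_hat, " exact value for this linear case:", norm.sf(beta))
```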

  17. Statistical model selection with “Big Data”

    Directory of Open Access Journals (Sweden)

    Jurgen A. Doornik

    2015-12-01

    Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.

  18. Estimating the Competitive Storage Model with Trending Commodity Prices

    OpenAIRE

    Gouel, Christophe; Legrand, Nicolas

    2017-01-01

    We present a method to estimate jointly the parameters of a standard commodity storage model and the parameters characterizing the trend in commodity prices. This procedure allows the influence of a possible trend to be removed without restricting the model specification, and allows model and trend selection based on statistical criteria. The trend is modeled deterministically using linear or cubic spline functions of time. The results show that storage models with trend are always preferred ...

  19. An alternative procedure for estimating the population mean in simple random sampling

    Directory of Open Access Journals (Sweden)

    Housila P. Singh

    2012-03-01

    This paper deals with the problem of estimating the finite population mean using auxiliary information in simple random sampling. Firstly, we suggest a correction to the mean squared error of the estimator proposed by Gupta and Shabbir [On improvement in estimating the population mean in simple random sampling. Jour. Appl. Statist. 35(5) (2008), pp. 559-566]. We then propose a ratio-type estimator and study its properties in simple random sampling. Numerically, we show that the proposed class of estimators is more efficient than different known estimators, including the Gupta and Shabbir (2008) estimator.
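
    For orientation, the classical ratio estimator that this class of estimators generalizes can be written in a few lines; the data below are simulated, and the paper's proposed estimator is not reproduced here.

```python
# Classical ratio estimator of a finite population mean under simple random
# sampling, using a known auxiliary population mean X_bar. Data are simulated.
import numpy as np

rng = np.random.default_rng(2)
N, n = 10_000, 200
x_pop = rng.gamma(3.0, 2.0, N)                       # auxiliary variable, known for the population
y_pop = 5.0 + 1.8 * x_pop + rng.normal(0, 2, N)      # study variable, observed only in the sample
X_bar = x_pop.mean()

idx = rng.choice(N, n, replace=False)                # simple random sample without replacement
y_bar, x_bar = y_pop[idx].mean(), x_pop[idx].mean()

mean_srs = y_bar                                     # usual sample mean
mean_ratio = y_bar * X_bar / x_bar                   # ratio estimator, exploits the y-x correlation
print("true mean:", y_pop.mean(), "SRS:", mean_srs, "ratio:", mean_ratio)
```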

  20. Disturbance estimation of nuclear power plant by using reduced-order model

    International Nuclear Information System (INIS)

    Tashima, Shin-ichi; Wakabayashi, Jiro

    1983-01-01

    An estimation method is proposed for the multiplex disturbances that occur in a nuclear power plant. The method is composed of two parts: (i) the identification of a simplified multi-input, multi-output model to describe the related system response, and (ii) the design of a Kalman filter to estimate the multiplex disturbance. Concerning the simplified model, several observed signals are first selected as output variables which can well represent the system response caused by the disturbances. A reduced-order model is utilized for designing the disturbance estimator. This is based on the following two considerations. The first is that the disturbance is assumed to be of a quasistatic nature. The other is based on the intuition that there exist a few dominant modes between the disturbances and the selected observed signals, and that most of the remaining non-dominant modes may not affect the accuracy of the disturbance estimator. The reduced-order model is further transformed into a single-output model using a linear combination of the output signals, so that the standard structural identification procedure is avoided. The parameters of the transformed model are calculated by the generalized least squares method. As for the multiplex disturbance estimator, the Kalman filtering method is applied by balancing the following three requirements: (a) quick response to disturbances, (b) reduction of estimation error in the presence of observation noise, and (c) elimination of cross-interference between the disturbances to the plant and the estimates from the Kalman filter. The effectiveness of the proposed method is verified through computer experiments using a BWR plant simulator. (author)
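
    A minimal sketch of the general idea of estimating a quasi-static disturbance with a Kalman filter on an augmented state, using a generic first-order toy plant rather than the BWR reduced-order model of the paper: the disturbance is appended to the state as a random-walk component and estimated jointly with the plant state. All matrices and noise levels are illustrative assumptions.

```python
# Disturbance estimation with a Kalman filter on an augmented state.
# Toy first-order plant x_{k+1} = a*x_k + d_k + w_k with a quasi-static
# (random-walk) disturbance d_k; measurement y_k = x_k + v_k. Not the BWR model.
import numpy as np

rng = np.random.default_rng(3)
a, steps = 0.9, 300
F = np.array([[a, 1.0], [0.0, 1.0]])        # augmented dynamics: [plant state, disturbance]
H = np.array([[1.0, 0.0]])                  # only the plant state is measured
Q = np.diag([1e-3, 1e-4])                   # process noise (small drift for the disturbance)
R = np.array([[1e-2]])                      # measurement noise

# Simulate the true system with a step disturbance at k = 100.
x_true, ys = 0.0, []
for k in range(steps):
    d_true = 0.5 if k >= 100 else 0.0
    x_true = a * x_true + d_true + rng.normal(0, np.sqrt(1e-3))
    ys.append(x_true + rng.normal(0, np.sqrt(1e-2)))

# Kalman filter on the augmented state.
z, P, d_hat = np.zeros(2), np.eye(2), []
for y in ys:
    z = F @ z                                # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                      # update
    K = P @ H.T @ np.linalg.inv(S)
    z = z + K @ (np.array([y]) - H @ z)
    P = (np.eye(2) - K @ H) @ P
    d_hat.append(z[1])

print("estimated disturbance near the end:", np.mean(d_hat[-50:]))  # should approach 0.5
```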

  1. Robust Estimation and Moment Selection in Dynamic Fixed-effects Panel Data Models

    NARCIS (Netherlands)

    Cizek, P.; Aquaro, M.

    2015-01-01

    This paper extends an existing outlier-robust estimator of linear dynamic panel data models with fixed effects, which is based on the median ratio of two consecutive pairs of first-differenced data. To improve its precision and robust properties, a general procedure based on many pairwise

  2. Forecasting Urban Air Quality via a Back-Propagation Neural Network and a Selection Sample Rule

    Directory of Open Access Journals (Sweden)

    Yonghong Liu

    2015-07-01

    In this paper, based on a sample selection rule and a Back Propagation (BP) neural network, a new model for forecasting daily SO2, NO2, and PM10 concentrations at seven sites in Guangzhou was developed using data from January 2006 to April 2012. A meteorological similarity principle was applied in the development of the sample selection rule. The key meteorological factors influencing daily SO2, NO2, and PM10 concentrations, as well as the weight and threshold matrices, were determined. A basic model was then developed based on the improved BP neural network. To improve the basic model, identification of factor variation consistency was added to the rule, and seven sets of sensitivity experiments at one of the seven sites were conducted to obtain the selected model. A comparison with the basic model from May 2011 to April 2012 at one site showed that the selected model for PM10 displayed better forecasting performance, with Mean Absolute Percentage Error (MAPE) values decreasing by 4% and R2 values increasing from 0.53 to 0.68. Evaluations conducted at the six other sites revealed a similar performance. On the whole, the analysis showed that the models presented here could provide local authorities with reliable and precise predictions and alarms about air quality if used at an operational scale.
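
    A rough sketch of the sample-selection idea combined with a feed-forward network, using scikit-learn as a stand-in for the paper's BP implementation: historical days whose meteorology is most similar to the forecast day are selected as the training set. The feature set, similarity rule, and network size are illustrative assumptions, not the paper's configuration.

```python
# Sketch: select the historical days whose meteorology is most similar to the
# forecast day, then train a small feed-forward network on that subset.
# scikit-learn stands in for the paper's BP network; features are illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
n_hist = 2000
met = rng.normal(size=(n_hist, 4))                    # e.g. wind speed, temperature, humidity, pressure
pm10 = 60 + 15 * met[:, 0] - 10 * met[:, 1] + rng.normal(0, 8, n_hist)

scaler = StandardScaler().fit(met)
met_s = scaler.transform(met)

def forecast(met_today, k=200):
    """Select the k most similar historical days, then fit and predict."""
    query = scaler.transform(met_today.reshape(1, -1))
    nn = NearestNeighbors(n_neighbors=k).fit(met_s)
    _, idx = nn.kneighbors(query)
    sel = idx[0]
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
    net.fit(met_s[sel], pm10[sel])
    return net.predict(query)[0]

print("forecast PM10:", forecast(np.array([1.0, -0.5, 0.2, 0.0])))
```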

  3. Regression models to estimate real-time concentrations of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-07

    Science.gov (United States)

    Oden, Timothy D.; Asquith, William H.; Milburn, Matthew S.

    2009-01-01

    In December 2005, the U.S. Geological Survey in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (total coliform and Escherichia coli), atrazine, and suspended sediment at two U.S. Geological Survey streamflow-gaging stations upstream from Lake Houston near Houston (08068500 Spring Creek near Spring, Texas, and 08070200 East Fork San Jacinto River near New Caney, Texas). The data from the discrete water-quality samples collected during 2005-07, in conjunction with monitored real-time data already being collected - physical properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), streamflow, and rainfall - were used to develop regression models for predicting water-quality constituent concentrations for inflows to Lake Houston. Rainfall data were obtained from a rain gage monitored by Harris County Homeland Security and Emergency Management and colocated with the Spring Creek station. The leaps and bounds algorithm was used to find the best subsets of possible regression models (minimum residual sum of squares for a given number of variables). The potential explanatory or predictive variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, rainfall, and time (to account for seasonal variations inherent in some water-quality data). The response variables at each site were nitrite plus nitrate nitrogen, total phosphorus, organic carbon, Escherichia coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities as a means to estimate concentrations of the various constituents under investigation, with accompanying estimates of measurement uncertainty. Each regression equation can be used to estimate concentrations of a given constituent in real time. In conjunction with estimated concentrations, constituent loads were estimated by multiplying the

  4. Parameter Estimation in Stochastic Grey-Box Models

    DEFF Research Database (Denmark)

    Kristensen, Niels Rode; Madsen, Henrik; Jørgensen, Sten Bay

    2004-01-01

    An efficient and flexible parameter estimation scheme for grey-box models in the sense of discretely, partially observed Ito stochastic differential equations with measurement noise is presented along with a corresponding software implementation. The estimation scheme is based on the extended Kalman filter and features maximum likelihood as well as maximum a posteriori estimation on multiple independent data sets, including irregularly sampled data sets and data sets with occasional outliers and missing observations. The software implementation is compared to an existing software tool and proves to have better performance both in terms of quality of estimates for nonlinear systems with significant diffusion and in terms of reproducibility. In particular, the new tool provides more accurate and more consistent estimates of the parameters of the diffusion term.

  5. Using step and path selection functions for estimating resistance to movement: Pumas as a case study

    Science.gov (United States)

    Katherine A. Zeller; Kevin McGarigal; Samuel A. Cushman; Paul Beier; T. Winston Vickers; Walter M. Boyce

    2015-01-01

    GPS telemetry collars and their ability to acquire accurate and consistently frequent locations have increased the use of step selection functions (SSFs) and path selection functions (PathSFs) for studying animal movement and estimating resistance. However, previously published SSFs and PathSFs often do not accommodate multiple scales or multiscale modeling....

  6. Small sample GEE estimation of regression parameters for longitudinal data.

    Science.gov (United States)

    Paul, Sudhir; Zhang, Xuemao

    2014-09-28

    Longitudinal (clustered) response data arise in many biostatistical applications which, in general, cannot be assumed to be independent. Generalized estimating equation (GEE) is a widely used method to estimate marginal regression parameters for correlated responses. The advantage of the GEE is that the estimates of the regression parameters are asymptotically unbiased even if the correlation structure is misspecified, although their small sample properties are not known. In this paper, two bias adjusted GEE estimators of the regression parameters in longitudinal data are obtained when the number of subjects is small. One is based on a bias correction, and the other is based on a bias reduction. Simulations show that the performances of both the bias-corrected methods are similar in terms of bias, efficiency, coverage probability, average coverage length, impact of misspecification of correlation structure, and impact of cluster size on bias correction. Both these methods show superior properties over the GEE estimates for small samples. Further, analysis of data involving a small number of subjects also shows improvement in bias, MSE, standard error, and length of the confidence interval of the estimates by the two bias adjusted methods over the GEE estimates. For small to moderate sample sizes (N ≤ 50), either of the bias-corrected methods GEEBc and GEEBr can be used. However, the method GEEBc should be preferred over GEEBr, as the former is computationally easier. For large sample sizes, the GEE method can be used. Copyright © 2014 John Wiley & Sons, Ltd.
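
    For orientation, a standard GEE fit for clustered data can be set up as below with statsmodels; the small-sample bias corrections (GEEBc/GEEBr) studied in the paper are not implemented here, and the data are simulated.

```python
# Standard GEE fit for clustered (longitudinal) data using statsmodels.
# The small-sample bias corrections discussed in the abstract are not
# implemented here; data are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_subj, n_rep = 30, 5                                  # small number of subjects, as in the paper's setting
subj = np.repeat(np.arange(n_subj), n_rep)
x = rng.normal(size=n_subj * n_rep)
u = np.repeat(rng.normal(0, 0.5, n_subj), n_rep)       # subject-level effect induces within-cluster correlation
y = 1.0 + 0.8 * x + u + rng.normal(0, 1, n_subj * n_rep)
df = pd.DataFrame({"y": y, "x": x, "subject": subj})

model = smf.gee("y ~ x", groups="subject", data=df,
                family=sm.families.Gaussian(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```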

  7. Estimating sample size for landscape-scale mark-recapture studies of North American migratory tree bats

    Science.gov (United States)

    Ellison, Laura E.; Lukacs, Paul M.

    2014-01-01

    Concern for migratory tree-roosting bats in North America has grown because of possible population declines from wind energy development. This concern has driven interest in estimating population-level changes. Mark-recapture methodology is one possible analytical framework for assessing bat population changes, but sample size requirements to produce reliable estimates have not been estimated. To illustrate the sample sizes necessary for a mark-recapture-based monitoring program we conducted power analyses using a statistical model that allows reencounters of live and dead marked individuals. We ran 1,000 simulations for each of five broad sample size categories in a Burnham joint model, and then compared the proportion of simulations in which 95% confidence intervals overlapped between and among years for a 4-year study. Additionally, we conducted sensitivity analyses of sample size to various capture probabilities and recovery probabilities. More than 50,000 individuals per year would need to be captured and released to accurately determine 10% and 15% declines in annual survival. To detect more dramatic declines of 33% or 50% in survival over four years, sample sizes of 25,000 or 10,000 per year, respectively, would be sufficient. Sensitivity analyses reveal that increasing recovery of dead marked individuals may be more valuable than increasing capture probability of marked individuals. Because of the extraordinary effort that would be required, we advise caution before such a mark-recapture effort is initiated, given the difficulty of attaining reliable estimates. We make recommendations for what techniques show the most promise for mark-recapture studies of bats because some techniques violate the assumptions of mark-recapture methodology when used to mark bats.

  8. Bayesian Simultaneous Estimation for Means in k Sample Problems

    OpenAIRE

    Imai, Ryo; Kubokawa, Tatsuya; Ghosh, Malay

    2017-01-01

    This paper is concerned with the simultaneous estimation of k population means when one suspects that the k means are nearly equal. As an alternative to the preliminary test estimator based on the test statistics for testing hypothesis of equal means, we derive Bayesian and minimax estimators which shrink individual sample means toward a pooled mean estimator given under the hypothesis. Interestingly, it is shown that both the preliminary test estimator and the Bayesian minimax shrinkage esti...

  9. Turbidity-controlled sampling for suspended sediment load estimation

    Science.gov (United States)

    Jack Lewis

    2003-01-01

    Automated data collection is essential to effectively measure suspended sediment loads in storm events, particularly in small basins. Continuous turbidity measurements can be used, along with discharge, in an automated system that makes real-time sampling decisions to facilitate sediment load estimation. The Turbidity Threshold Sampling method distributes...

  10. Feasibility and reliability of digital imaging for estimating food selection and consumption from students' packed lunches.

    Science.gov (United States)

    Taylor, Jennifer C; Sutter, Carolyn; Ontai, Lenna L; Nishina, Adrienne; Zidenberg-Cherr, Sheri

    2018-01-01

    Although increasing attention is placed on the quality of foods in children's packed lunches, few studies have examined the capacity of observational methods to reliably determine both what is selected and consumed from these lunches. The objective of this project was to assess the feasibility and inter-rater reliability of digital imaging for determining selection and consumption from students' packed lunches, by adapting approaches previously applied to school lunches. Study 1 assessed feasibility and reliability of data collection among a sample of packed lunches (n = 155), while Study 2 further examined reliability in a larger sample of packed (n = 386) as well as school (n = 583) lunches. Based on the results from Study 1, it was feasible to collect and code most items in packed lunch images; missing data were most commonly attributed to packaging that limited visibility of contents. Across both studies, there was satisfactory reliability for determining food types selected, quantities selected, and quantities consumed in the eight food categories examined (weighted kappa coefficients 0.68-0.97 for packed lunches, 0.74-0.97 for school lunches), with lowest reliability for estimating condiments and meats/meat alternatives in packed lunches. In extending methods predominantly applied to school lunches, these findings demonstrate the capacity of digital imaging for the objective estimation of selection and consumption from both school and packed lunches. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Estimating the price elasticity of beer: meta-analysis of data with heterogeneity, dependence, and publication bias.

    Science.gov (United States)

    Nelson, Jon P

    2014-01-01

    Precise estimates of price elasticities are important for alcohol tax policy. Using meta-analysis, this paper corrects average beer elasticities for heterogeneity, dependence, and publication selection bias. A sample of 191 estimates is obtained from 114 primary studies. Simple and weighted means are reported. Dependence is addressed by restricting the number of estimates per study, author-restricted samples, and author-specific variables. Publication bias is addressed using a funnel graph, trim-and-fill, and Egger's intercept model. Heterogeneity and selection bias are examined jointly in meta-regressions containing moderator variables for econometric methodology, primary data, and precision of estimates. Results for fixed- and random-effects regressions are reported. Country-specific effects and sample time periods are unimportant, but several methodology variables help explain the dispersion of estimates. In models that correct for selection bias and heterogeneity, the average beer price elasticity is about -0.20, which is about 50% less elastic than values commonly used in alcohol tax policy simulations. Copyright © 2013 Elsevier B.V. All rights reserved.
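
    As an illustration of one of the publication-bias tools mentioned (Egger's intercept test), the sketch below regresses the standardized effect sizes on their precisions; the elasticity estimates and standard errors are simulated placeholders, not the paper's 191 estimates, and the full meta-regression with moderator variables is not reproduced.

```python
# Egger's intercept test for funnel-plot asymmetry: regress the standardized
# effect (estimate / SE) on precision (1 / SE). A nonzero intercept suggests
# small-study / publication bias. Data below are simulated, not the paper's.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
k = 191
se = rng.uniform(0.05, 0.4, k)                       # standard errors of primary estimates
effect = -0.20 + rng.normal(0, se)                   # simulated beer price elasticities
effect -= 0.5 * se                                   # inject small-study bias for illustration

z = effect / se                                      # standardized effects
precision = 1.0 / se
X = sm.add_constant(precision)
fit = sm.OLS(z, X).fit()
print("Egger intercept:", fit.params[0], "p-value:", fit.pvalues[0])
print("precision-effect (slope) estimate of elasticity:", fit.params[1])
```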

  12. Applied stochastic modelling

    CERN Document Server

    Morgan, Byron JT; Tanner, Martin Abba; Carlin, Bradley P

    2008-01-01

    Introduction and Examples: Introduction; Examples of data sets. Basic Model Fitting: Introduction; Maximum-likelihood estimation for a geometric model; Maximum-likelihood for the beta-geometric model; Modelling polyspermy; Which model?; What is a model for?; Mechanistic models. Function Optimisation: Introduction; MATLAB: graphs and finite differences; Deterministic search methods; Stochastic search methods; Accuracy and a hybrid approach. Basic Likelihood Tools: Introduction; Estimating standard errors and correlations; Looking at surfaces: profile log-likelihoods; Confidence regions from profiles; Hypothesis testing in model selection; Score and Wald tests; Classical goodness of fit; Model selection bias. General Principles: Introduction; Parameterisation; Parameter redundancy; Boundary estimates; Regression and influence; The EM algorithm; Alternative methods of model fitting; Non-regular problems. Simulation Techniques: Introduction; Simulating random variables; Integral estimation; Verification; Monte Carlo inference; Estimating sampling distributi...

  13. Relative contributions of sampling effort, measuring, and weighing to precision of larval sea lamprey biomass estimates

    Science.gov (United States)

    Slade, Jeffrey W.; Adams, Jean V.; Cuddy, Douglas W.; Neave, Fraser B.; Sullivan, W. Paul; Young, Robert J.; Fodale, Michael F.; Jones, Michael L.

    2003-01-01

    We developed two weight-length models from 231 populations of larval sea lampreys (Petromyzon marinus) collected from tributaries of the Great Lakes: Lake Ontario (21), Lake Erie (6), Lake Huron (67), Lake Michigan (76), and Lake Superior (61). Both models were mixed models, which used population as a random effect and additional environmental factors as fixed effects. We resampled weights and lengths 1,000 times from data collected in each of 14 other populations not used to develop the models, obtaining a weight and length distribution from each resampling. To test model performance, we applied the two weight-length models to the resampled length distributions and calculated the predicted mean weights. We also calculated the observed mean weight for each resampling and for each of the original 14 data sets. When the average of predicted means was compared to means from the original data in each stream, inclusion of environmental factors did not consistently improve the performance of the weight-length model. We estimated the variance associated with measures of abundance and mean weight for each of the 14 selected populations and determined that a conservative estimate of the proportional contribution to variance associated with estimating abundance accounted for 32% to 95% of the variance (mean = 66%). Variability in the biomass estimate appears more affected by variability in estimating abundance than in converting length to weight. Hence, efforts to improve the precision of biomass estimates would be aided most by reducing the variability associated with estimating abundance.
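
    The weight-length mixed model described above can be sketched along these lines with statsmodels, treating population as a random intercept on a log-log regression. The simulated data frame, column names, and the omission of the environmental fixed effects are all simplifying assumptions.

```python
# Sketch of a weight-length mixed model: log(weight) ~ log(length) with a
# random intercept per population. Data and column names are hypothetical,
# and the environmental fixed effects of the paper are omitted.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
rows = []
for pop in range(20):                                   # 20 hypothetical larval populations
    b0 = -11.5 + rng.normal(0, 0.3)                     # population-specific intercept
    length = rng.uniform(40, 160, 60)                   # larval length, mm
    weight = np.exp(b0 + 3.0 * np.log(length) + rng.normal(0, 0.15, 60))
    rows.append(pd.DataFrame({"population": pop,
                              "log_len": np.log(length),
                              "log_wt": np.log(weight)}))
df = pd.concat(rows, ignore_index=True)

model = sm.MixedLM.from_formula("log_wt ~ log_len", groups="population", data=df)
result = model.fit()
print(result.summary())                                  # fixed slope should be near 3.0
```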

  14. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent variables and a dependent variable. In logistic regression the dependent variable is categorical and the model is expressed in terms of odds; when the categories of the dependent variable are ordered, the ordinal logistic regression model applies. The GWOLR model is an ordinal logistic regression model whose parameters are influenced by the geographical location of the observation site. Parameter estimation is needed in the model to determine population values on the basis of a sample. The purpose of this research is parameter estimation of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units used are 144 villages in Semarang City. The results of the research give a local GWOLR model for each village and the probabilities of the categories of the number of dengue fever patients.

  15. The Use of Indicators in Modified Historical Model to Estimate the Intrinsic Value of a Stock

    Directory of Open Access Journals (Sweden)

    Gottwald Radim

    2012-06-01

    The article mentions several methods of fundamental analysis used to value stocks. It focuses primarily on the historical model. This model enables undervalued, correctly valued and overvalued stocks to be identified. The model is further modified in the article, using selected accounting indicators. The modified model versions are applied to selected stocks in the SPAD segment of the Prague Stock Exchange within the 2005-2010 period. An empirical analysis compares the accuracy of the accounting indicator value estimates with the accuracy of the stock intrinsic value estimates, separately for each stock and accounting indicator. These comparisons are also made with respect to the length of the applied time period. Given the fierce competition among stock issuers in the financial markets, the model enables potential investors, who must select from an extensive offer of stocks, to make better-informed investment decisions.

  16. A method for the estimation of the significance of cross-correlations in unevenly sampled red-noise time series

    Science.gov (United States)

    Max-Moerbeck, W.; Richards, J. L.; Hovatta, T.; Pavlidou, V.; Pearson, T. J.; Readhead, A. C. S.

    2014-11-01

    We present a practical implementation of a Monte Carlo method to estimate the significance of cross-correlations in unevenly sampled time series of data, whose statistical properties are modelled with a simple power-law power spectral density. This implementation builds on published methods; we introduce a number of improvements in the normalization of the cross-correlation function estimate and a bootstrap method for estimating the significance of the cross-correlations. A closely related matter is the estimation of a model for the light curves, which is critical for the significance estimates. We present a graphical and quantitative demonstration that uses simulations to show how common it is to get high cross-correlations for unrelated light curves with steep power spectral densities. This demonstration highlights the dangers of interpreting them as signs of a physical connection. We show that by using interpolation and the Hanning sampling window function we are able to reduce the effects of red-noise leakage and to recover steep simple power-law power spectral densities. We also introduce the use of a Neyman construction for the estimation of the errors in the power-law index of the power spectral density. This method provides a consistent way to estimate the significance of cross-correlations in unevenly sampled time series of data.
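
    A simplified, evenly-sampled version of the Monte Carlo significance idea can be sketched as follows (omitting the interpolation, Hanning window, and PSD-fitting refinements of the full method): many pairs of unrelated power-law-noise light curves are simulated, and the distribution of their peak cross-correlations supplies the significance levels. The series length, PSD index, and observed correlation are placeholders.

```python
# Simplified Monte Carlo significance of cross-correlations between two
# unrelated red-noise light curves (evenly sampled; interpolation and the
# Hanning window of the full method are omitted, and lags use circular shifts).
import numpy as np

rng = np.random.default_rng(8)
n, beta = 256, 2.0                                     # length and power-law PSD index

def powerlaw_noise(n, beta):
    """Timmer & Koenig style simulation of noise with PSD ~ f^(-beta)."""
    freqs = np.fft.rfftfreq(n)[1:]                     # drop the zero frequency
    amp = freqs ** (-beta / 2.0)
    re = rng.standard_normal(freqs.size) * amp
    im = rng.standard_normal(freqs.size) * amp
    spec = np.concatenate(([0.0], re + 1j * im))
    x = np.fft.irfft(spec, n)
    return (x - x.mean()) / x.std()

def peak_ccf(a, b, max_lag=20):
    """Peak correlation over a range of (circular) lags, for simplicity."""
    return max(np.corrcoef(np.roll(a, lag), b)[0, 1]
               for lag in range(-max_lag, max_lag + 1))

# Null distribution of peak cross-correlations for unrelated light curves.
null = np.array([peak_ccf(powerlaw_noise(n, beta), powerlaw_noise(n, beta))
                 for _ in range(500)])
observed = 0.6                                         # hypothetical observed peak correlation
p_value = np.mean(null >= observed)
print("95% significance threshold:", np.quantile(null, 0.95), " p-value:", p_value)
```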

  17. Estimating waste disposal quantities from raw waste samples

    International Nuclear Information System (INIS)

    Negin, C.A.; Urland, C.S.; Hitz, C.G.; GPU Nuclear Corp., Middletown, PA)

    1985-01-01

    Estimating the disposal quantity of waste resulting from stabilization of radioactive sludge is complex because of the many factors relating to sample analysis results, radioactive decay, allowable disposal concentrations, and options for disposal containers. To facilitate this estimation, a microcomputer spreadsheet template was created. The spreadsheet has saved considerable engineering hours. 1 fig., 3 tabs

  18. Skewed factor models using selection mechanisms

    KAUST Repository

    Kim, Hyoung-Moon

    2015-12-21

    Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, it may happen in real data problems that the first two moments cannot explain the factors. Based on this motivation, here we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models, depending on a selection mechanism on the factors. The ECME algorithms are adopted to estimate related parameters for statistical inference. Monte Carlo simulations validate our new models and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset.

  19. Skewed factor models using selection mechanisms

    KAUST Repository

    Kim, Hyoung-Moon; Maadooliat, Mehdi; Arellano-Valle, Reinaldo B.; Genton, Marc G.

    2015-01-01

    Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, it may happen in real data problems that the first two moments cannot explain the factors. Based on this motivation, here we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models, depending on a selection mechanism on the factors. The ECME algorithms are adopted to estimate related parameters for statistical inference. Monte Carlo simulations validate our new models and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset.

  20. Improvements of the Vis-NIRS Model in the Prediction of Soil Organic Matter Content Using Spectral Pretreatments, Sample Selection, and Wavelength Optimization

    Science.gov (United States)

    Lin, Z. D.; Wang, Y. B.; Wang, R. J.; Wang, L. S.; Lu, C. P.; Zhang, Z. Y.; Song, L. T.; Liu, Y.

    2017-07-01

    A total of 130 topsoil samples collected from Guoyang County, Anhui Province, China, were used to establish a Vis-NIR model for the prediction of organic matter content (OMC) in lime concretion black soils. Different spectral pretreatments were applied to minimize the irrelevant and useless information in the spectra and to increase the correlation of the spectra with the measured values. Subsequently, the Kennard-Stone (KS) method and sample set partitioning based on joint x-y distances (SPXY) were used to select the training set. The successive projection algorithm (SPA) and a genetic algorithm (GA) were then applied for wavelength optimization. Finally, the principal component regression (PCR) model was constructed, in which the optimal number of principal components was determined using the leave-one-out cross-validation technique. The results show that the combination of the Savitzky-Golay (SG) filter for smoothing and multiplicative scatter correction (MSC) can eliminate the effects of noise and baseline drift; the SPXY method is preferable to KS for sample selection; and both SPA and GA can significantly reduce the number of wavelength variables and increase the accuracy, especially GA, which greatly improved the prediction accuracy of soil OMC, with Rcc, RMSEP, and RPD reaching 0.9316, 0.2142, and 2.3195, respectively.
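
    The preprocessing-plus-PCR part of this pipeline can be sketched roughly as follows; the spectra are simulated, the SG window/order and number of components are illustrative settings, and the KS/SPXY sample selection and SPA/GA wavelength optimization steps are omitted.

```python
# Sketch of a Vis-NIR pipeline: Savitzky-Golay smoothing, multiplicative
# scatter correction (MSC), then principal component regression (PCR).
# Spectra are simulated; sample selection and wavelength optimization omitted.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n_samples, n_wl = 130, 400
omc = rng.uniform(1.0, 4.0, n_samples)                       # organic matter content, %
band = np.exp(-((np.arange(n_wl) - 200) / 80.0) ** 2)        # synthetic absorption band
spectra = (omc[:, None] * band                               # signal proportional to OMC
           + rng.normal(0, 0.05, (n_samples, n_wl))          # noise
           + rng.uniform(0.0, 0.5, (n_samples, 1)))          # additive baseline shift

smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)

def msc(X):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = X.mean(axis=0)
    out = np.empty_like(X)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)           # row ~ intercept + slope * ref
        out[i] = (row - intercept) / slope
    return out

X = msc(smoothed)
scores = PCA(n_components=6).fit_transform(X)                # PCR: regress on leading PCs
model = LinearRegression().fit(scores, omc)
print("calibration R^2:", model.score(scores, omc))
```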

  1. 40 CFR 205.160-2 - Test sample selection and preparation.

    Science.gov (United States)

    2010-07-01

    Test sample selection and preparation. (a) Vehicles comprising the sample which are required to be tested... maintained in any manner unless such preparation, tests, modifications, adjustments or maintenance are part...

  2. Selective parathyroid venous sampling in primary hyperparathyroidism: A systematic review and meta-analysis.

    Science.gov (United States)

    Ibraheem, Kareem; Toraih, Eman A; Haddad, Antoine B; Farag, Mahmoud; Randolph, Gregory W; Kandil, Emad

    2018-05-14

    Minimally invasive parathyroidectomy requires accurate preoperative localization techniques. There is considerable controversy about the effectiveness of selective parathyroid venous sampling (sPVS) in patients with primary hyperparathyroidism (PHPT). The aim of this meta-analysis is to examine the diagnostic accuracy of sPVS as a preoperative localization modality in PHPT. Studies evaluating the diagnostic accuracy of sPVS for PHPT were electronically searched in the PubMed, EMBASE, Web of Science, and Cochrane Controlled Trials Register databases. Two independent authors reviewed the studies, and the revised Quality Assessment of Diagnostic Accuracy Studies tool was used for the quality assessment. Study heterogeneity and pooled estimates were calculated. Two hundred and two unique studies were identified. Of those, 12 studies were included in the meta-analysis. Pooled sensitivity, specificity, and positive likelihood ratio (PLR) of sPVS were 74%, 41%, and 1.55, respectively. The area under the receiver operating characteristic curve was 0.684, indicating an average discriminatory ability of sPVS. On comparison between sPVS and noninvasive imaging modalities, sensitivity, PLR, and positive posttest probability were significantly higher for sPVS than for noninvasive imaging modalities. Interestingly, super-selective venous sampling had the highest sensitivity, accuracy, and positive posttest probability compared to other parathyroid venous sampling techniques. This is the first meta-analysis to examine the accuracy of sPVS in PHPT. sPVS had higher pooled sensitivity when compared to noninvasive modalities in revision parathyroid surgery. However, the invasiveness of this technique does not favor its routine use for preoperative localization. Super-selective venous sampling was the most accurate among all parathyroid venous sampling techniques. Laryngoscope, 2018. © 2018 The American Laryngological, Rhinological and Otological Society, Inc.

  3. Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors

    Directory of Open Access Journals (Sweden)

    Xibin Zhang

    2016-04-01

    This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for the bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and to kernel density estimation of gross domestic product (GDP) growth rates among the Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.

  4. Bias-Corrected Estimation of Noncentrality Parameters of Covariance Structure Models

    Science.gov (United States)

    Raykov, Tenko

    2005-01-01

    A bias-corrected estimator of noncentrality parameters of covariance structure models is discussed. The approach represents an application of the bootstrap methodology for purposes of bias correction, and utilizes the relation between the average of resampled conventional noncentrality parameter estimates and their sample counterpart. The…

  5. A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asympotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates

    Directory of Open Access Journals (Sweden)

    Githure John I

    2009-09-01

    Background: Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecologically sampled Anopheles aquatic habitat covariates. A test for diagnostic checking of error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitat clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extend a normal regression analysis previously considered in the literature. Methods: Field and remote-sampled data were collected during July 2006 to December 2007 in the Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations and distributions, and to generate global autocorrelation statistics from the ecologically sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e., negative binomial regression). The eigenfunction

  6. The importance of spatial models for estimating the strength of density dependence

    DEFF Research Database (Denmark)

    Thorson, James T.; Skaug, Hans J.; Kristensen, Kasper

    2014-01-01

    the California Coast. In this case, the nonspatial model estimates implausible oscillatory dynamics on an annual time scale, while the spatial model estimates strong autocorrelation and is supported by model selection tools. We conclude by discussing the importance of improved data archiving techniques, so that spatial models can be used to re-examine classic questions regarding the presence and strength of density dependence in wild populations.

  7. Sampling strategies for estimating brook trout effective population size

    Science.gov (United States)

    Andrew R. Whiteley; Jason A. Coombs; Mark Hudy; Zachary Robinson; Keith H. Nislow; Benjamin H. Letcher

    2012-01-01

    The influence of sampling strategy on estimates of effective population size (Ne) from single-sample genetic methods has not been rigorously examined, though these methods are increasingly used. For headwater salmonids, spatially close kin association among age-0 individuals suggests that sampling strategy (number of individuals and location from...

  8. ARA and ARI imperfect repair models: Estimation, goodness-of-fit and reliability prediction

    International Nuclear Information System (INIS)

    Toledo, Maria Luíza Guerra de; Freitas, Marta A.; Colosimo, Enrico A.; Gilardoni, Gustavo L.

    2015-01-01

    An appropriate maintenance policy is essential to reduce expenses and risks related to equipment failures. A fundamental aspect to be considered when specifying such policies is the ability to predict the reliability of the systems under study, based on a well-fitted model. In this paper, the classes of models Arithmetic Reduction of Age and Arithmetic Reduction of Intensity are explored. Likelihood functions for such models are derived, and a graphical method is proposed for model selection. A real data set involving failures in trucks used by a Brazilian mining company is analyzed considering models with different memories. The parameters, namely the shape and scale of the Power Law Process and the repair efficiency, were estimated for the best-fitted model. Estimation of the model parameters allowed us to derive reliability estimators to predict the behavior of the failure process. These results are valuable information for the mining company and can be used to support decision making regarding the preventive maintenance policy. - Highlights: • Likelihood functions for imperfect repair models are derived. • A goodness-of-fit technique is proposed as a tool for model selection. • Failures in trucks owned by a Brazilian mining company are modeled. • Estimation allowed deriving reliability predictors to forecast the future failure process of the trucks

  9. Estimation and asymptotic theory for transition probabilities in Markov Renewal Multi–state models

    NARCIS (Netherlands)

    Spitoni, C.; Verduijn, M.; Putter, H.

    2012-01-01

    In this paper we discuss estimation of transition probabilities for semi–Markov multi–state models. Non–parametric and semi–parametric estimators of the transition probabilities for a large class of models (forward going models) are proposed. Large sample theory is derived using the functional

  10. Estimation of Handgrip Force from SEMG Based on Wavelet Scale Selection.

    Science.gov (United States)

    Wang, Kai; Zhang, Xianmin; Ota, Jun; Huang, Yanjiang

    2018-02-24

    This paper proposes a nonlinear correlation-based wavelet scale selection technique to select the effective wavelet scales for the estimation of handgrip force from surface electromyograms (SEMG). The SEMG signal corresponding to gripping force was collected from extensor and flexor forearm muscles during the force-varying analysis task. We performed a computational sensitivity analysis on the initial nonlinear SEMG-handgrip force model. To explore the nonlinear correlation between ten wavelet scales and handgrip force, a large-scale iteration based on the Monte Carlo simulation was conducted. To choose a suitable combination of scales, we proposed a rule to combine wavelet scales based on the sensitivity of each scale and selected the appropriate combination of wavelet scales based on sequence combination analysis (SCA). The results of SCA indicated that the scale combination VI is suitable for estimating force from the extensors and the combination V is suitable for the flexors. The proposed method was compared to two former methods through prolonged static and force-varying contraction tasks. The experimental results showed that the root mean square errors derived by the proposed method for both static and force-varying contraction tasks were less than 20%. The accuracy and robustness of the handgrip force derived by the proposed method is better than that obtained by the former methods.

  11. The effect of mis-specification on mean and selection between the Weibull and lognormal models

    Science.gov (United States)

    Jia, Xiang; Nadarajah, Saralees; Guo, Bo

    2018-02-01

    The lognormal and Weibull models are commonly used to analyse data. Although selection procedures have been extensively studied, it is possible that the lognormal model could be selected when the true model is Weibull, or vice versa. As the mean is important in applications, we focus on the effect of mis-specification on the mean. The effect on the lognormal mean is first considered if the lognormal sample is wrongly fitted by a Weibull model. The maximum likelihood estimate (MLE) and quasi-MLE (QMLE) of the lognormal mean are obtained based on the lognormal and Weibull models. Then, the impact is evaluated by computing the ratio of biases and the ratio of mean squared errors (MSEs) between the MLE and QMLE. For completeness, the theoretical results are demonstrated by simulation studies. Next, the effect of the reverse mis-specification on the Weibull mean is discussed. It is found that the ratio of biases and the ratio of MSEs are independent of the location and scale parameters of the lognormal and Weibull models. The influence could be ignored if some special conditions hold. Finally, a model selection method is proposed by comparing the ratios concerning biases and MSEs. We also present published data to illustrate the study in this paper.

  12. Honest Importance Sampling with Multiple Markov Chains.

    Science.gov (United States)

    Tan, Aixin; Doss, Hani; Hobert, James P

    2015-01-01

    Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable
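
    The basic (i.i.d.) importance sampling estimator underlying the paper can be written in a few lines; the target and proposal densities below are simple normals chosen for illustration, and the multiple-chain, regeneration-based machinery of the paper is not reproduced.

```python
# Basic importance sampling: estimate E_pi[h(X)] using draws from pi_1.
# Both the standard and self-normalized estimators are shown, plus the simple
# CLT-based standard error valid in the iid case. Densities are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
n = 50_000
x = rng.normal(loc=0.0, scale=2.0, size=n)          # draws from pi_1 = N(0, 2^2)

pi = norm(loc=1.0, scale=1.0).pdf                   # target density pi = N(1, 1)
pi1 = norm(loc=0.0, scale=2.0).pdf                  # proposal density pi_1
h = lambda t: t ** 2                                # estimate E_pi[X^2] = 1 + 1^2 = 2

w = pi(x) / pi1(x)                                  # importance weights
est = np.mean(w * h(x))                             # standard IS estimator
est_sn = np.sum(w * h(x)) / np.sum(w)               # self-normalized variant
se = np.std(w * h(x), ddof=1) / np.sqrt(n)          # CLT-based standard error (iid case)
print("IS:", est, "self-normalized:", est_sn, "SE:", se, "truth: 2.0")
```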

  13. Creel survey sampling designs for estimating effort in short-duration Chinook salmon fisheries

    Science.gov (United States)

    McCormick, Joshua L.; Quist, Michael C.; Schill, Daniel J.

    2013-01-01

    Chinook Salmon Oncorhynchus tshawytscha sport fisheries in the Columbia River basin are commonly monitored using roving creel survey designs and require precise, unbiased catch estimates. The objective of this study was to examine the relative bias and precision of total catch estimates using various sampling designs to estimate angling effort under the assumption that mean catch rate was known. We obtained information on angling populations based on direct visual observations of portions of Chinook Salmon fisheries in three Idaho river systems over a 23-d period. Based on the angling population, Monte Carlo simulations were used to evaluate the properties of effort and catch estimates for each sampling design. All sampling designs evaluated were relatively unbiased. Systematic random sampling (SYS) resulted in the most precise estimates. The SYS and simple random sampling designs had mean square error (MSE) estimates that were generally half of those observed with cluster sampling designs. The SYS design was more efficient (i.e., higher accuracy per unit cost) than a two-cluster design. Increasing the number of clusters available for sampling within a day decreased the MSE of estimates of daily angling effort, but the MSE of total catch estimates was variable depending on the fishery. The results of our simulations provide guidelines on the relative influence of sample sizes and sampling designs on parameters of interest in short-duration Chinook Salmon fisheries.

  14. Re-evaluating neonatal-age models for ungulates: does model choice affect survival estimates?

    Directory of Open Access Journals (Sweden)

    Troy W Grovenburg

    New-hoof growth is regarded as the most reliable metric for predicting age of newborn ungulates, but variation in estimated age among hoof-growth equations that have been developed may affect estimates of survival in staggered-entry models. We used known-age newborns to evaluate variation in age estimates among existing hoof-growth equations and to determine the consequences of that variation on survival estimates. During 2001-2009, we captured and radiocollared 174 newborn (≤24-hrs old) ungulates: 76 white-tailed deer (Odocoileus virginianus) in Minnesota and South Dakota, 61 mule deer (O. hemionus) in California, and 37 pronghorn (Antilocapra americana) in South Dakota. Estimated age of known-age newborns differed among hoof-growth models and varied by >15 days for white-tailed deer, >20 days for mule deer, and >10 days for pronghorn. Accuracy (i.e., the proportion of neonates assigned to the correct age) in aging newborns using published equations ranged from 0.0% to 39.4% in white-tailed deer, 0.0% to 3.3% in mule deer, and was 0.0% for pronghorns. Results of survival modeling indicated that variability in estimates of age-at-capture affected short-term estimates of survival (i.e., 30 days) for white-tailed deer and mule deer, and survival estimates over a longer time frame (i.e., 120 days) for mule deer. Conversely, survival estimates for pronghorn were not affected by estimates of age. Our analyses indicate that modeling survival in daily intervals is too fine a temporal scale when age-at-capture is unknown given the potential inaccuracies among equations used to estimate age of neonates. Instead, weekly survival intervals are more appropriate because most models accurately predicted ages within 1 week of the known age. Variation among results of neonatal-age models on short- and long-term estimates of survival for known-age young emphasizes the importance of selecting an appropriate hoof-growth equation and appropriately defining intervals (i

  15. Identifiability measures to select the parameters to be estimated in a solid-state fermentation distributed parameter model.

    Science.gov (United States)

    da Silveira, Christian L; Mazutti, Marcio A; Salau, Nina P G

    2016-07-08

    Process modeling can lead to advantages such as helping in process control, reducing process costs and improving product quality. This work proposes a solid-state fermentation distributed parameter model composed of seven differential equations with seventeen parameters to represent the process. Parameter estimation with a parameter identifiability analysis (PIA) is also performed to build an accurate model with optimum parameters. Statistical tests were made to verify the model accuracy with the estimated parameters under different assumptions. The results have shown that the model assuming substrate inhibition better represents the process. It was also shown that eight of the seventeen original model parameters were non-identifiable and that better results were obtained when these parameters were removed from the estimation procedure. Therefore, PIA can be useful to the estimation procedure, since it may reduce the number of parameters that need to be estimated. Further, PIA improved the model results, showing it to be an important procedure to follow. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:905-917, 2016. © 2016 American Institute of Chemical Engineers.

  16. Alternative Approaches to Technical Efficiency Estimation in the Stochastic Frontier Model

    OpenAIRE

    Acquah, H. de-Graft; Onumah, E. E.

    2014-01-01

    Estimating the stochastic frontier model and calculating technical efficiency of decision making units are of great importance in applied production economic works. This paper estimates technical efficiency from the stochastic frontier model using Jondrow, and Battese and Coelli approaches. In order to compare alternative methods, simulated data with sample sizes of 60 and 200 are generated from stochastic frontier model commonly applied to agricultural firms. Simulated data is employed to co...

  17. Benefits of Dominance over Additive Models for the Estimation of Average Effects in the Presence of Dominance

    Directory of Open Access Journals (Sweden)

    Pascal Duenk

    2017-10-01

    Full Text Available In quantitative genetics, the average effect at a single locus can be estimated by an additive (A) model, or an additive plus dominance (AD) model. In the presence of dominance, the AD-model is expected to be more accurate, because the A-model falsely assumes that residuals are independent and identically distributed. Our objective was to investigate the accuracy of an estimated average effect (α̂) in the presence of dominance, using either a single locus A-model or AD-model. Estimation was based on a finite sample from a large population in Hardy-Weinberg equilibrium (HWE), and the root mean squared error of α̂ was calculated for several broad-sense heritabilities, sample sizes, and sizes of the dominance effect. Results show that with the A-model, both sampling deviations of genotype frequencies from HWE frequencies and sampling deviations of allele frequencies contributed to the error. With the AD-model, only sampling deviations of allele frequencies contributed to the error, provided that all three genotype classes were sampled. In the presence of dominance, the root mean squared error of α̂ with the AD-model was always smaller than with the A-model, even when the heritability was less than one. Remarkably, in the absence of dominance, there was no disadvantage of fitting dominance. In conclusion, the AD-model yields more accurate estimates of average effects from a finite sample, because it is more robust against sampling deviations from HWE frequencies than the A-model. Genetic models that include dominance, therefore, yield higher accuracies of estimated average effects than purely additive models when dominance is present.
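
    As a rough illustration of the A-model versus AD-model comparison, the following Python sketch simulates genotypes from Hardy-Weinberg proportions, fits both regressions, and combines the AD-model coefficients into an average effect using the sample allele frequency. The true effect sizes, sample size and parameterization are assumptions made for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, a, d, n = 0.3, 1.0, 0.6, 200            # allele frequency, additive and dominance effects, sample size

# Simulate allele counts (0/1/2) under HWE and phenotypes with residual noise.
x = rng.binomial(2, p, size=n)
g = a * x + d * (x == 1)                    # genotypic value (shifted parameterization)
y = g + rng.normal(scale=1.0, size=n)

# A-model: regress phenotype on allele count only.
X_a = np.column_stack([np.ones(n), x])
alpha_A = np.linalg.lstsq(X_a, y, rcond=None)[0][1]

# AD-model: add a heterozygote (dominance) indicator, then combine the fitted
# coefficients into an average effect using the sample allele frequency.
X_ad = np.column_stack([np.ones(n), x, (x == 1).astype(float)])
_, a_hat, d_hat = np.linalg.lstsq(X_ad, y, rcond=None)[0]
p_hat = x.mean() / 2
alpha_AD = a_hat + d_hat * (1 - 2 * p_hat)

# Compare both estimates with the population average effect a + d(q - p).
print(alpha_A, alpha_AD, a + d * (1 - 2 * p))
```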

  18. Estimating population salt intake in India using spot urine samples.

    Science.gov (United States)

    Petersen, Kristina S; Johnson, Claire; Mohan, Sailesh; Rogers, Kris; Shivashankar, Roopa; Thout, Sudhir Raj; Gupta, Priti; He, Feng J; MacGregor, Graham A; Webster, Jacqui; Santos, Joseph Alvin; Krishnan, Anand; Maulik, Pallab K; Reddy, K Srinath; Gupta, Ruby; Prabhakaran, Dorairaj; Neal, Bruce

    2017-11-01

    To compare estimates of mean population salt intake in North and South India derived from spot urine samples versus 24-h urine collections. In a cross-sectional survey, participants were sampled from slum, urban and rural communities in North and in South India. Participants provided 24-h urine collections, and random morning spot urine samples. Salt intake was estimated from the spot urine samples using a series of established estimating equations. Salt intake data from the 24-h urine collections and spot urine equations were weighted to provide estimates of salt intake for Delhi and Haryana, and Andhra Pradesh. A total of 957 individuals provided a complete 24-h urine collection and a spot urine sample. Weighted mean salt intake based on the 24-h urine collection was 8.59 g/day (95% confidence interval 7.73-9.45) and 9.46 g/day (8.95-9.96) in Delhi and Haryana, and Andhra Pradesh, respectively. Corresponding estimates based on the Tanaka equation [9.04 (8.63-9.45) and 9.79 g/day (9.62-9.96) for Delhi and Haryana, and Andhra Pradesh, respectively], the Mage equation [8.80 (7.67-9.94) and 10.19 g/day (95% CI 9.59-10.79)], the INTERSALT equation [7.99 (7.61-8.37) and 8.64 g/day (8.04-9.23)] and the INTERSALT equation with potassium [8.13 (7.74-8.52) and 8.81 g/day (8.16-9.46)] were all within 1 g/day of the estimate based upon 24-h collections. For the Toft equation, estimates were 1-2 g/day higher [9.94 (9.24-10.64) and 10.69 g/day (9.44-11.93)] and for the Kawasaki equation they were 3-4 g/day higher [12.14 (11.30-12.97) and 13.64 g/day (13.15-14.12)]. In urban and rural areas in North and South India, most spot urine-based equations provided reasonable estimates of mean population salt intake. Equations that did not provide good estimates may have failed because specimen collection was not aligned with the original method.

  19. Estimation of genetic variability and selection response for clutch length in dwarf brown-egg layers carrying or not the naked neck gene

    Directory of Open Access Journals (Sweden)

    Tixier-Boichard Michèle

    2003-03-01

    Full Text Available Abstract In order to investigate the possibility of using the dwarf gene for egg production, two dwarf brown-egg laying lines were selected for 16 generations on average clutch length; one line (L1) was normally feathered and the other (L2) was homozygous for the naked neck gene NA. A control line from the same base population, dwarf and segregating for the NA gene, was maintained during the selection experiment under random mating. The average clutch length was normalized using a Box-Cox transformation. Genetic variability and selection response were estimated either with the mixed model methodology, or with the classical methods of calculating genetic gain as the deviation from the control line and the realized heritability as the ratio of the selection response to the cumulative selection differentials. Heritability of average clutch length was estimated to be 0.42 ± 0.02 with a multiple-trait animal model, whereas the estimates of the realized heritability were lower, being 0.28 and 0.22 in lines L1 and L2, respectively. REML estimates of heritability were found to decline with generations of selection, suggesting a departure from the infinitesimal model, either because a limited number of genes was involved or because their frequencies were changed. The yearly genetic gains in average clutch length, after normalization, were estimated to be 0.37 ± 0.02 and 0.33 ± 0.04 with the classical methods, and 0.46 ± 0.02 and 0.43 ± 0.01 with the animal model methodology, for lines L1 and L2 respectively, which represented about 30% of the genetic standard deviation on the transformed scale. Selection response appeared to be faster in line L2, homozygous for the NA gene, but the final cumulated selection response for clutch length was not different between the L1 and L2 lines at generation 16.

  20. Comparison of sampling designs for estimating deforestation from landsat TM and MODIS imagery: a case study in Mato Grosso, Brazil.

    Science.gov (United States)

    Zhu, Shanyou; Zhang, Hailong; Liu, Ronggao; Cao, Yun; Zhang, Guixin

    2014-01-01

    Sampling designs are commonly used to estimate deforestation over large areas, but comparisons between different sampling strategies are required. Using PRODES deforestation data as a reference, deforestation in the state of Mato Grosso in Brazil from 2005 to 2006 is evaluated using Landsat imagery and a nearly synchronous MODIS dataset. The MODIS-derived deforestation is used to assist in sampling and extrapolation. Three sampling designs are compared according to the estimated deforestation of the entire study area based on simple extrapolation and linear regression models. The results show that stratified sampling for strata construction and sample allocation using the MODIS-derived deforestation hotspots provided more precise estimations than simple random and systematic sampling. Moreover, the relationship between the MODIS-derived and TM-derived deforestation provides a precise estimate of the total deforestation area as well as the distribution of deforestation in each block.

  1. Examining speed versus selection in connectivity models using elk migration as an example

    Science.gov (United States)

    Brennan, Angela; Hanks, Ephraim M.; Merkle, Jerod A.; Cole, Eric K.; Dewey, Sarah R.; Courtemanch, Alyson B.; Cross, Paul C.

    2018-01-01

    Context: Landscape resistance is vital to connectivity modeling and frequently derived from resource selection functions (RSFs). RSFs estimate relative probability of use and tend to focus on understanding habitat preferences during slow, routine animal movements (e.g., foraging). Dispersal and migration, however, can produce rarer, faster movements, in which case models of movement speed rather than resource selection may be more realistic for identifying habitats that facilitate connectivity. Objective: To compare two connectivity modeling approaches applied to resistance estimated from models of movement rate and resource selection. Methods: Using movement data from migrating elk, we evaluated continuous-time Markov chain (CTMC) and movement-based RSF models (i.e., step selection functions [SSFs]). We applied circuit theory and shortest random path (SRP) algorithms to CTMC, SSF and null (i.e., flat) resistance surfaces to predict corridors between elk seasonal ranges. We evaluated prediction accuracy by comparing model predictions to empirical elk movements. Results: All connectivity models predicted elk movements well, but models applied to CTMC resistance were more accurate than models applied to SSF and null resistance. Circuit theory models were more accurate on average than SRP models. Conclusions: CTMC can be more realistic than SSFs for estimating resistance for fast movements, though SSFs may demonstrate some predictive ability when animals also move slowly through corridors (e.g., stopover use during migration). High null model accuracy suggests seasonal range data may also be critical for predicting direct migration routes. For animals that migrate or disperse across large landscapes, we recommend incorporating CTMC into the connectivity modeling toolkit.

  2. Engineer’s estimate reliability and statistical characteristics of bids

    Directory of Open Access Journals (Sweden)

    Fariborz M. Tehrani

    2016-12-01

    Full Text Available The objective of this report is to provide a methodology for examining bids and evaluating the performance of engineer’s estimates in capturing the true cost of projects. This study reviews the cost development for transportation projects in addition to two sources of uncertainties in a cost estimate, including modeling errors and inherent variability. Sample projects are highway maintenance projects with a similar scope of the work, size, and schedule. Statistical analysis of engineering estimates and bids examines the adaptability of statistical models for sample projects. Further, the variation of engineering cost estimates from inception to implementation has been presented and discussed for selected projects. Moreover, the applicability of extreme values theory is assessed for available data. The results indicate that the performance of engineer’s estimate is best evaluated based on trimmed average of bids, excluding discordant bids.
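
    The recommendation to judge an engineer's estimate against a trimmed average of bids can be illustrated in a few lines of Python; the bid values below are purely illustrative, and SciPy's trim_mean is used as a stand-in for whatever trimming rule a practitioner adopts to exclude discordant bids.

```python
from scipy import stats

# Hypothetical bids (in $1000s) for one maintenance project; values are illustrative only.
bids = [412, 428, 435, 441, 455, 610]       # the last bid is a likely discordant outlier
engineers_estimate = 430

trimmed = stats.trim_mean(bids, proportiontocut=0.2)   # drop the lowest and highest 20% before averaging
plain = sum(bids) / len(bids)

print(f"plain mean: {plain:.1f}, trimmed mean: {trimmed:.1f}, "
      f"estimate minus trimmed mean: {engineers_estimate - trimmed:+.1f}")
```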

  3. Estimating leaf photosynthetic pigments information by stepwise multiple linear regression analysis and a leaf optical model

    Science.gov (United States)

    Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei

    2014-10-01

    Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and it is difficult to capture their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e., the PROSPECT model) was employed to simulate leaf reflectance at wavelengths from 400 to 780 nm at 1 nm intervals, and these values were treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were taken as targets to build the regression models. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures; 70% of these samples were used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log(1/r)) were each employed to build regression models. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R2) larger than 0.8 when six wavebands were selected to build the SMLR model. The largest values of R2 for Cab, Car and Cab/Car were 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematical transformations of reflectance showed little influence on regression accuracy. We conclude that it is feasible to estimate chlorophyll, carotenoids and their ratio from a statistical model built on leaf reflectance data.
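
    A minimal sketch of the SMLR band-selection step is given below. It uses greedy forward selection with cross-validated R2 on synthetic reflectance standing in for the PROSPECT-simulated spectra; the band count, data dimensions and the forward_select helper are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_bands=6):
    """Greedy forward selection of wavebands maximizing cross-validated R^2."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_bands:
        scores = []
        for j in remaining:
            cols = selected + [j]
            r2 = cross_val_score(LinearRegression(), X[:, cols], y, cv=5, scoring="r2").mean()
            scores.append((r2, j))
        best_r2, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy stand-in for simulated reflectance (400-780 nm at 1 nm) and a pigment target.
rng = np.random.default_rng(0)
X = rng.random((300, 381))                  # 381 wavebands
y = 2.5 * X[:, 50] - 1.2 * X[:, 200] + rng.normal(scale=0.05, size=300)

print("selected band indices:", forward_select(X, y))
```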

  4. Networked Estimation for Event-Based Sampling Systems with Packet Dropouts

    Directory of Open Access Journals (Sweden)

    Young Soo Suh

    2009-04-01

    Full Text Available This paper is concerned with a networked estimation problem in which sensor data are transmitted over the network. In the event-based sampling scheme known as level-crossing or send-on-delta (SOD), sensor data are transmitted to the estimator node if the difference between the current sensor value and the last transmitted one is greater than a given threshold. Event-based sampling has been shown to be more efficient than time-triggered sampling in some situations, especially in terms of network bandwidth usage. However, it cannot detect packet dropouts because data transmission and reception do not use the periodic time-stamp mechanism found in time-triggered sampling systems. Motivated by this issue, we propose a modified event-based sampling scheme, called modified SOD, in which sensor data are sent when either the change in sensor output exceeds a given threshold or the time elapsed since the last transmission exceeds a given interval. Through simulation results, we show that the proposed modified SOD sampling significantly improves estimation performance when packet dropouts happen.
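
    The modified SOD transmit rule described above is simple to state in code. The sketch below is a minimal Python illustration, with the threshold, maximum interval and sample values chosen arbitrarily; it is not the authors' implementation.

```python
def make_modified_sod(delta, max_interval):
    """Return a predicate deciding whether to transmit a sample under modified SOD:
    send when the change since the last transmitted value exceeds `delta`, or when
    more than `max_interval` time units have elapsed since the last transmission."""
    state = {"last_value": None, "last_time": None}

    def should_send(value, t):
        if state["last_value"] is None:
            send = True
        else:
            send = (abs(value - state["last_value"]) > delta
                    or t - state["last_time"] > max_interval)
        if send:
            state["last_value"], state["last_time"] = value, t
        return send

    return should_send

# Example: the estimator node can then flag a dropout if no packet arrives within max_interval.
sod = make_modified_sod(delta=0.5, max_interval=2.0)
samples = [(0.0, 1.0), (0.4, 1.1), (0.9, 1.8), (3.2, 1.9), (3.5, 2.7)]   # (time, value), illustrative
print([(t, v) for t, v in samples if sod(v, t)])
```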

  5. How does observation uncertainty influence which stream water samples are most informative for model calibration?

    Science.gov (United States)

    Wang, Ling; van Meerveld, Ilja; Seibert, Jan

    2016-04-01

    Streamflow isotope samples taken during rainfall-runoff events are very useful for multi-criteria model calibration because they can help decrease parameter uncertainty and improve internal model consistency. However, the number of samples that can be collected and analysed is often restricted by practical and financial constraints. It is, therefore, important to choose an appropriate sampling strategy and to obtain samples that have the highest information content for model calibration. We used the Birkenes hydrochemical model and synthetic rainfall, streamflow and isotope data to explore which samples are most informative for model calibration. Starting with error-free observations, we investigated how many samples are needed to obtain a certain model fit. Based on different parameter sets, representing different catchments, and different rainfall events, we also determined which sampling times provide the most informative data for model calibration. Our results show that simulation performance for models calibrated with the isotopic data from two intelligently selected samples was comparable to simulations based on isotopic data for all 100 time steps. The models calibrated with the intelligently selected samples also performed better than the model calibrations with two benchmark sampling strategies (random selection and selection based on hydrologic information). Surprisingly, samples on the rising limb and at the peak were less informative than expected and, generally, samples taken at the end of the event were most informative. The timing of the most informative samples depends on the proportion of different flow components (baseflow, slow response flow, fast response flow and overflow). For events dominated by baseflow and slow response flow, samples taken at the end of the event after the fast response flow has ended were most informative; when the fast response flow was dominant, samples taken near the peak were most informative. However when overflow

  6. Modern survey sampling

    CERN Document Server

    Chaudhuri, Arijit

    2014-01-01

    Contents include: Exposure to Sampling; Concepts of Population, Sample, and Sampling; Initial Ramifications; Sampling Design, Sampling Scheme; Random Numbers and Their Uses in Simple Random Sampling (SRS); Drawing Simple Random Samples with and without Replacement; Estimation of Mean, Total, Ratio of Totals/Means: Variance and Variance Estimation; Determination of Sample Sizes; Appendix to Chapter 2 (More on Equal Probability Sampling; Horvitz-Thompson Estimator; Sufficiency; Likelihood; Non-Existence Theorem); More Intricacies (Unequal Probability Sampling Strategies; PPS Sampling); Exploring Improved Ways (Stratified Sampling; Cluster Sampling; Multi-Stage Sampling; Multi-Phase Sampling: Ratio and Regression Estimation; Controlled Sampling); Modeling (Super-Population Modeling; Prediction Approach; Model-Assisted Approach; Bayesian Methods; Spatial Smoothing); Sampling on Successive Occasions: Panel Rotation; Non-Response and Not-at-Homes; Weighting Adj...

  7. The quasar luminosity function from a variability-selected sample

    Science.gov (United States)

    Hawkins, M. R. S.; Veron, P.

    1993-01-01

    A sample of quasars is selected from a 10-yr sequence of 30 UK Schmidt plates. Luminosity functions are derived in several redshift intervals, which in each case show a featureless power-law rise towards low luminosities. There is no sign of the 'break' found in the recent UVX sample of Boyle et al. It is suggested that reasons for the disagreement are connected with biases in the selection of the UVX sample. The question of the nature of quasar evolution appears to be still unresolved.

  8. Comparison of distance sampling estimates to a known population ...

    African Journals Online (AJOL)

    Line-transect sampling was used to obtain abundance estimates of an Ant-eating Chat Myrmecocichla formicivora population to compare these with the true size of the population. The population size was determined by a long-term banding study, and abundance estimates were obtained by surveying line transects.

  9. A method to combine non-probability sample data with probability sample data in estimating spatial means of environmental variables

    NARCIS (Netherlands)

    Brus, D.J.; Gruijter, de J.J.

    2003-01-01

    In estimating spatial means of environmental variables of a region from data collected by convenience or purposive sampling, validity of the results can be ensured by collecting additional data through probability sampling. The precision of the pi estimator that uses the probability sample can be

  10. A sampling-based Bayesian model for gas saturation estimation using seismic AVA and marine CSEM data

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Jinsong; Hoversten, Michael; Vasco, Don; Rubin, Yoram; Hou, Zhangshuan

    2006-04-04

    We develop a sampling-based Bayesian model to jointly invert seismic amplitude versus angles (AVA) and marine controlled-source electromagnetic (CSEM) data for layered reservoir models. The porosity and fluid saturation in each layer of the reservoir, the seismic P- and S-wave velocity and density in the layers below and above the reservoir, and the electrical conductivity of the overburden are considered as random variables. Pre-stack seismic AVA data in a selected time window and real and quadrature components of the recorded electrical field are considered as data. We use Markov chain Monte Carlo (MCMC) sampling methods to obtain a large number of samples from the joint posterior distribution function. Using those samples, we obtain not only estimates of each unknown variable, but also its uncertainty information. The developed method is applied to both synthetic and field data to explore the combined use of seismic AVA and EM data for gas saturation estimation. Results show that the developed method is effective for joint inversion, and the incorporation of CSEM data reduces uncertainty in fluid saturation estimation, when compared to results from inversion of AVA data only.
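
    The core MCMC step can be illustrated with a generic random-walk Metropolis sampler. The sketch below assumes a toy two-parameter forward model and Gaussian errors in place of the actual AVA/CSEM physics; all names and values are placeholders, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_posterior(theta, data, forward, sigma):
    """Gaussian likelihood around a forward model plus a flat prior on [0, 1]."""
    if np.any(theta < 0) or np.any(theta > 1):
        return -np.inf
    resid = data - forward(theta)
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(data, forward, sigma, theta0, n_iter=20000, step=0.05):
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta, data, forward, sigma)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.normal(scale=step, size=theta.size)
        logp_prop = log_posterior(prop, data, forward, sigma)
        if np.log(rng.random()) < logp_prop - logp:     # Metropolis accept/reject
            theta, logp = prop, logp_prop
        chain.append(theta.copy())
    return np.array(chain)

# Toy forward model standing in for the AVA/CSEM responses (illustrative only).
forward = lambda th: np.array([th[0] + 0.5 * th[1], th[0] * th[1]])
data = forward(np.array([0.3, 0.6])) + rng.normal(scale=0.01, size=2)
chain = metropolis(data, forward, sigma=0.01, theta0=[0.5, 0.5])
print(chain[5000:].mean(axis=0), chain[5000:].std(axis=0))   # posterior mean and uncertainty
```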

  11. A NEW METHOD FOR NON DESTRUCTIVE ESTIMATION OF Jc IN YBaCuO CERAMIC SAMPLES

    Directory of Open Access Journals (Sweden)

    Giancarlo Cordeiro Costa

    2014-12-01

    Full Text Available This work presents a new method for the estimation of Jc as a bulk characteristic of YBCO blocks. The experimental magnetic interaction force between a SmCo permanent magnet and a YBCO block was compared to finite element method (FEM) simulation results, allowing us to search for the best-fitting value of the critical current of the superconducting sample. As the FEM simulations were based on the Bean model, the critical current density was taken as an unknown parameter. This is a non-destructive estimation method, since there is no need to break off even a small piece of the sample for analysis.

  12. Model Complexity and Out-of-Sample Performance: Evidence from S&P 500 Index Returns

    NARCIS (Netherlands)

    Kaeck, Andreas; Rodrigues, Paulo; Seeger, Norman J.

    We apply a range of out-of-sample specification tests to more than forty competing stochastic volatility models to address how model complexity affects out-of-sample performance. Using daily S&P 500 index returns, model confidence set estimations provide strong evidence that the most important model

  13. E-Model MOS Estimate Precision Improvement and Modelling of Jitter Effects

    Directory of Open Access Journals (Sweden)

    Adrian Kovac

    2012-01-01

    Full Text Available This paper deals with the ITU-T E-model, which is used for non-intrusive MOS VoIP call quality estimation on IP networks. The pros of the E-model are its computational simplicity and usability on real-time traffic. The cons, as shown in our previous work, are the inability of the E-model to reflect the effects of network jitter present in real traffic flows and of jitter-buffer behavior on end-user devices. These effects are visible mostly in traffic over WAN, internet and radio networks and cause the E-model MOS call quality estimate to be noticeably too optimistic. In this paper, we propose a modification to the E-model using the previously proposed Pplef (effective packet loss) metric, based on a jitter and jitter-buffer model built on a Pareto/D/1/K system. We subsequently optimize the newly added parameters reflecting jitter effects in the E-model by using the PESQ intrusive measurement method as a reference for selected audio codecs. Function fitting and parameter optimization are performed under varying delay, packet loss, jitter and different jitter-buffer sizes for both correlated and uncorrelated long-tailed network traffic.

  14. Modelling maximum likelihood estimation of availability

    International Nuclear Information System (INIS)

    Waller, R.A.; Tietjen, G.L.; Rock, G.W.

    1975-01-01

    Suppose the performance of a nuclear powered electrical generating power plant is continuously monitored to record the sequence of failures and repairs during sustained operation. The purpose of this study is to assess one method of estimating the performance of the power plant when the measure of performance is availability. That is, we determine the probability that the plant is operational at time t. To study the availability of a power plant, we first assume statistical models for the variables X and Y, which denote the time-to-failure and the time-to-repair variables, respectively. Once those statistical models are specified, the availability, A(t), can be expressed as a function of some or all of their parameters. Usually those parameters are unknown in practice and so A(t) is unknown. This paper discusses the maximum likelihood estimator of A(t) when the time-to-failure model for X is an exponential density with parameter lambda, and the time-to-repair model for Y is an exponential density with parameter theta. Under the assumption of exponential models for X and Y, it follows that the instantaneous availability at time t is A(t) = lambda/(lambda+theta) + theta/(lambda+theta) exp[-((1/lambda)+(1/theta))t] for t > 0. Also, the steady-state availability is A(infinity) = lambda/(lambda+theta). We use the observations from n failure-repair cycles of the power plant, say X1, X2, ..., Xn, Y1, Y2, ..., Yn, to present the maximum likelihood estimators of A(t) and A(infinity). The exact sampling distributions of those estimators and some statistical properties are discussed before a simulation model is used to determine 95% simulation intervals for A(t). The methodology is applied to two examples which approximate the operating history of two nuclear power plants. (author)
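
    Because the maximum likelihood estimate of an exponential mean is the sample mean, the MLE of A(t) follows directly from the formula above. The short Python sketch below plugs the sample means of the failure and repair times into A(t) and A(infinity); the cycle data are invented for illustration.

```python
import numpy as np

def availability_mle(x, y, t):
    """MLE of instantaneous availability A(t) under exponential time-to-failure (mean lambda)
    and time-to-repair (mean theta) models, using the formula quoted in the abstract."""
    lam = np.mean(x)            # MLE of an exponential mean is the sample mean
    theta = np.mean(y)
    rate = 1.0 / lam + 1.0 / theta
    a_t = lam / (lam + theta) + theta / (lam + theta) * np.exp(-rate * t)
    a_inf = lam / (lam + theta)
    return a_t, a_inf

# Illustrative failure/repair durations from n = 5 cycles (hours).
x = np.array([120.0, 95.0, 210.0, 160.0, 130.0])   # times to failure
y = np.array([4.0, 6.5, 3.0, 5.0, 4.5])            # times to repair
print(availability_mle(x, y, t=24.0))
```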

  15. Mathematical estimation of the level of microbial contamination on spacecraft surfaces by volumetric air sampling

    Science.gov (United States)

    Oxborrow, G. S.; Roark, A. L.; Fields, N. D.; Puleo, J. R.

    1974-01-01

    Microbiological sampling methods presently used for enumeration of microorganisms on spacecraft surfaces require contact with easily damaged components. Estimation of viable particles on surfaces using air sampling methods in conjunction with a mathematical model would be desirable. Parameters necessary for the mathematical model are the effect of angled surfaces on viable particle collection and the number of viable cells per viable particle. Deposition of viable particles on angled surfaces closely followed a cosine function, and the number of viable cells per viable particle was consistent with a Poisson distribution. Other parameters considered by the mathematical model included deposition rate and fractional removal per unit time. A close nonlinear correlation between volumetric air sampling and airborne fallout on surfaces was established with all fallout data points falling within the 95% confidence limits as determined by the mathematical model.

  16. Comparison of Sampling Designs for Estimating Deforestation from Landsat TM and MODIS Imagery: A Case Study in Mato Grosso, Brazil

    Directory of Open Access Journals (Sweden)

    Shanyou Zhu

    2014-01-01

    Full Text Available Sampling designs are commonly used to estimate deforestation over large areas, but comparisons between different sampling strategies are required. Using PRODES deforestation data as a reference, deforestation in the state of Mato Grosso in Brazil from 2005 to 2006 is evaluated using Landsat imagery and a nearly synchronous MODIS dataset. The MODIS-derived deforestation is used to assist in sampling and extrapolation. Three sampling designs are compared according to the estimated deforestation of the entire study area based on simple extrapolation and linear regression models. The results show that stratified sampling for strata construction and sample allocation using the MODIS-derived deforestation hotspots provided more precise estimations than simple random and systematic sampling. Moreover, the relationship between the MODIS-derived and TM-derived deforestation provides a precise estimate of the total deforestation area as well as the distribution of deforestation in each block.

  17. On incomplete sampling under birth-death models and connections to the sampling-based coalescent.

    Science.gov (United States)

    Stadler, Tanja

    2009-11-07

    The constant rate birth-death process is used as a stochastic model for many biological systems, for example phylogenies or disease transmission. As the biological data are usually not fully available, it is crucial to understand the effect of incomplete sampling. In this paper, we analyze the constant rate birth-death process with incomplete sampling. We derive the density of the bifurcation events for trees on n leaves which evolved under this birth-death-sampling process. This density is used for calculating prior distributions in Bayesian inference programs and for efficiently simulating trees. We show that the birth-death-sampling process can be interpreted as a birth-death process with reduced rates and complete sampling. This shows that joint inference of birth rate, death rate and sampling probability is not possible. The birth-death-sampling process is compared to the sampling-based population genetics model, the coalescent. It is shown that despite many similarities between these two models, the distribution of bifurcation times remains different even in the case of very large population sizes. We illustrate these findings on a hepatitis C virus dataset from Egypt. We show that the transmission time estimates are significantly different; the widely used Gamma statistic even changes its sign from negative to positive when switching from the coalescent to the birth-death process.

  18. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    Science.gov (United States)

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests' features: variable importance and outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about half of the variables, we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.
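
    A minimal sketch of the variable-importance step is shown below: a Random Forest is trained on synthetic stand-ins for the time-series composites, variables are ranked by importance, and a reduced model is refit on roughly the top half. Data shapes, labels and the out-of-bag comparison are assumptions made for illustration, not the study's actual workflow.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# Toy stand-in for time-series MODIS features (rows = pixels, columns = composite dates/bands).
X = rng.random((1000, 36))
y = (X[:, 5] + X[:, 17] > 1.0).astype(int)          # synthetic land-cover label

rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X, y)

# Rank variables by importance and keep roughly the top half for a reduced model.
order = np.argsort(rf.feature_importances_)[::-1]
subset = order[: X.shape[1] // 2]
rf_subset = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
rf_subset.fit(X[:, subset], y)

print("full-variable OOB accuracy:", rf.oob_score_)
print("half-variable OOB accuracy:", rf_subset.oob_score_)
```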

  19. Explicit estimating equations for semiparametric generalized linear latent variable models

    KAUST Repository

    Ma, Yanyuan

    2010-07-05

    We study generalized linear latent variable models without requiring a distributional assumption of the latent variables. Using a geometric approach, we derive consistent semiparametric estimators. We demonstrate that these models have a property which is similar to that of a sufficient complete statistic, which enables us to simplify the estimating procedure and explicitly to formulate the semiparametric estimating equations. We further show that the explicit estimators have the usual root n consistency and asymptotic normality. We explain the computational implementation of our method and illustrate the numerical performance of the estimators in finite sample situations via extensive simulation studies. The advantage of our estimators over the existing likelihood approach is also shown via numerical comparison. We employ the method to analyse a real data example from economics. © 2010 Royal Statistical Society.

  20. Moment Conditions Selection Based on Adaptive Penalized Empirical Likelihood

    Directory of Open Access Journals (Sweden)

    Yunquan Song

    2014-01-01

    Full Text Available Empirical likelihood is a very popular method and has been widely used in the fields of artificial intelligence (AI) and data mining, as tablets, mobile applications and social media dominate the technology landscape. This paper proposes an empirical likelihood shrinkage method to efficiently estimate unknown parameters and select correct moment conditions simultaneously, when the model is defined by moment restrictions of which some are possibly misspecified. We show that our method enjoys oracle-like properties; that is, it consistently selects the correct moment conditions and, at the same time, its estimator is as efficient as the empirical likelihood estimator obtained using all correct moment conditions. Moreover, unlike the GMM, our proposed method allows us to construct confidence regions for the parameters included in the model without estimating the covariances of the estimators. For empirical implementation, we provide data-driven procedures for selecting the tuning parameter of the penalty function. The simulation results show that the method works remarkably well in terms of correct moment selection and the finite sample properties of the estimators. A real-life example is also carried out to illustrate the new methodology.

  1. Optimal sampling designs for estimation of Plasmodium falciparum clearance rates in patients treated with artemisinin derivatives

    Science.gov (United States)

    2013-01-01

    Background: The emergence of Plasmodium falciparum resistance to artemisinins in Southeast Asia threatens the control of malaria worldwide. The pharmacodynamic hallmark of artemisinin derivatives is rapid parasite clearance (a short parasite half-life); therefore, the in vivo phenotype of slow clearance defines the reduced susceptibility to the drug. Measurement of parasite counts every six hours during the first three days after treatment has been recommended to measure the parasite clearance half-life, but it remains unclear whether simpler sampling intervals and frequencies might also be sufficient to reliably estimate this parameter. Methods: A total of 2,746 parasite density-time profiles were selected from 13 clinical trials in Thailand, Cambodia, Mali, Vietnam, and Kenya. In these studies, parasite densities were measured every six hours until negative after treatment with an artemisinin derivative (alone or in combination with a partner drug). The WWARN Parasite Clearance Estimator (PCE) tool was used to estimate “reference” half-lives from these six-hourly measurements. The effect of four alternative sampling schedules on half-life estimation was investigated, and compared to the reference half-life (time zero, 6, 12, 24 (A1); zero, 6, 18, 24 (A2); zero, 12, 18, 24 (A3) or zero, 12, 24 (A4) hours and then every 12 hours). Statistical bootstrap methods were used to estimate the sampling distribution of half-lives for parasite populations with different geometric mean half-lives. A simulation study was performed to investigate a suite of 16 potential alternative schedules and half-life estimates generated by each of the schedules were compared to the “true” half-life. The candidate schedules in the simulation study included (among others) six-hourly sampling, schedule A1, schedule A4, and a convenience sampling schedule at six, seven, 24, 25, 48 and 49 hours. Results: The median (range) parasite half-life for all clinical studies combined was 3.1 (0

  2. The use of Thompson sampling to increase estimation precision

    NARCIS (Netherlands)

    Kaptein, M.C.

    2015-01-01

    In this article, we consider a sequential sampling scheme for efficient estimation of the difference between the means of two independent treatments when the population variances are unequal across groups. The sampling scheme proposed is based on a solution to bandit problems called Thompson sampling.
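
    A Thompson-style allocation for this estimation problem can be sketched as follows: at each step a plausible variance is drawn for each group and the next observation goes to the group contributing the most sampled uncertainty to the difference estimate. This is an illustrative rule under assumed Normal data, not necessarily the scheme proposed in the article.

```python
import numpy as np

rng = np.random.default_rng(11)

def sequential_allocation(mu, sd, n_total=600, n_init=10):
    """Thompson-style sequential allocation for estimating mu[0] - mu[1]:
    draw a plausible variance for each group from a rough (scaled chi-square)
    posterior and give the next observation to the group whose sampled
    per-observation variance contribution to the difference estimate is largest."""
    obs = [list(rng.normal(mu[k], sd[k], n_init)) for k in range(2)]
    for _ in range(n_total - 2 * n_init):
        contrib = []
        for o in obs:
            n = len(o)
            s2 = np.var(o, ddof=1)
            sampled_var = s2 * (n - 1) / rng.chisquare(n - 1)   # posterior-style draw of sigma^2
            contrib.append(sampled_var / n)
        k = int(np.argmax(contrib))                             # group adding the most uncertainty
        obs[k].append(rng.normal(mu[k], sd[k]))
    diff = np.mean(obs[0]) - np.mean(obs[1])
    se = np.sqrt(sum(np.var(o, ddof=1) / len(o) for o in obs))
    return diff, se, [len(o) for o in obs]

# The higher-variance group should end up with more observations.
print(sequential_allocation(mu=(0.0, 0.3), sd=(1.0, 3.0)))
```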

  3. Sensitivity of postplanning target and OAR coverage estimates to dosimetric margin distribution sampling parameters.

    Science.gov (United States)

    Xu, Huijun; Gordon, J James; Siebers, Jeffrey V

    2011-02-01

    A dosimetric margin (DM) is the margin in a specified direction between a structure and a specified isodose surface, corresponding to a prescription or tolerance dose. The dosimetric margin distribution (DMD) is the distribution of DMs over all directions. Given a geometric uncertainty model, representing inter- or intrafraction setup uncertainties or internal organ motion, the DMD can be used to calculate coverage Q, which is the probability that a realized target or organ-at-risk (OAR) dose metric D exceeds the corresponding prescription or tolerance dose. Postplanning coverage evaluation quantifies the percentage of uncertainties for which target and OAR structures meet their intended dose constraints. The goal of the present work is to evaluate coverage probabilities for 28 prostate treatment plans to determine DMD sampling parameters that ensure adequate accuracy for postplanning coverage estimates. Normally distributed interfraction setup uncertainties were applied to 28 plans for localized prostate cancer, with a prescribed dose of 79.2 Gy and 10 mm clinical target volume to planning target volume (CTV-to-PTV) margins. Using angular or isotropic sampling techniques, dosimetric margins were determined for the CTV, bladder and rectum, assuming shift invariance of the dose distribution. For angular sampling, DMDs were sampled at fixed angular intervals omega (e.g., omega = 1 degree, 2 degrees, 5 degrees, 10 degrees, 20 degrees). Isotropic samples were uniformly distributed on the unit sphere resulting in variable angular increments, but were calculated for the same number of sampling directions as angular DMDs, and accordingly characterized by the effective angular increment omega eff. In each direction, the DM was calculated by moving the structure in radial steps of size delta (=0.1, 0.2, 0.5, 1 mm) until the specified isodose was crossed. Coverage estimation accuracy deltaQ was quantified as a function of the sampling parameters omega or omega eff and delta. The
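
    The isotropic sampling of directions and the radial search for the isodose crossing can be sketched briefly in Python. The dose function below is a placeholder spherically symmetric field, and the step size, threshold and helper names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def isotropic_directions(n):
    """Uniformly distributed unit vectors: normalize standard normal draws."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def dosimetric_margins(dose, point, threshold, directions, step=0.1, max_dist=30.0):
    """Step radially outward from `point` along each direction until the dose
    drops below `threshold`; the distance travelled is that direction's DM (mm)."""
    margins = []
    for d in directions:
        r = 0.0
        while r < max_dist and dose(point + r * d) >= threshold:
            r += step
        margins.append(r)
    return np.array(margins)

# Placeholder spherically symmetric dose distribution (illustrative only).
dose = lambda p: 80.0 * np.exp(-np.linalg.norm(p) / 50.0)
dm = dosimetric_margins(dose, point=np.zeros(3), threshold=79.2,
                        directions=isotropic_directions(500))
print(dm.mean(), dm.min(), dm.max())
```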

  4. MODELS TO ESTIMATE BRAZILIAN INDIRECT TENSILE STRENGTH OF LIMESTONE IN SATURATED STATE

    Directory of Open Access Journals (Sweden)

    Zlatko Briševac

    2016-06-01

    Full Text Available There are a number of methods for estimating physical and mechanical characteristics. The most widely used is regression, but recently more sophisticated methods such as neural networks have frequently been applied as well. This paper presents simple and multiple regression models and neural networks of the Radial Basis Function and Multilayer Perceptron types, which can be used to estimate the Brazilian indirect tensile strength in saturated conditions. The paper covers the collection of data for analysis and modelling, and gives an overview of the assessment of the estimation efficacy of each model. After the assessment, the model providing the best estimate was selected, along with the model likely to have the most widespread application in engineering practice.

  5. Estimating and Forecasting Generalized Fractional Long Memory Stochastic Volatility Models

    Directory of Open Access Journals (Sweden)

    Shelton Peiris

    2017-12-01

    Full Text Available This paper considers a flexible class of time series models generated by Gegenbauer polynomials incorporating long memory in the stochastic volatility (SV) components in order to develop the General Long Memory SV (GLMSV) model. We examine the corresponding statistical properties of this model, discuss the spectral likelihood estimation and investigate the finite sample properties via Monte Carlo experiments. We provide empirical evidence by applying the GLMSV model to three exchange rate return series and conjecture that the results of out-of-sample forecasts adequately confirm the use of the GLMSV model in certain financial applications.

  6. A Bayesian analysis of sensible heat flux estimation: Quantifying uncertainty in meteorological forcing to improve model prediction

    KAUST Repository

    Ershadi, Ali

    2013-05-01

    The influence of uncertainty in land surface temperature, air temperature, and wind speed on the estimation of sensible heat flux is analyzed using a Bayesian inference technique applied to the Surface Energy Balance System (SEBS) model. The Bayesian approach allows for an explicit quantification of the uncertainties in input variables: a source of error generally ignored in surface heat flux estimation. An application using field measurements from the Soil Moisture Experiment 2002 is presented. The spatial variability of selected input meteorological variables in a multitower site is used to formulate the prior estimates for the sampling uncertainties, and the likelihood function is formulated assuming Gaussian errors in the SEBS model. Land surface temperature, air temperature, and wind speed were estimated by sampling their posterior distribution using a Markov chain Monte Carlo algorithm. Results verify that Bayesian-inferred air temperature and wind speed were generally consistent with those observed at the towers, suggesting that local observations of these variables were spatially representative. Uncertainties in the land surface temperature appear to have the strongest effect on the estimated sensible heat flux, with Bayesian-inferred values differing by up to ±5°C from the observed data. These differences suggest that the footprint of the in situ measured land surface temperature is not representative of the larger-scale variability. As such, these measurements should be used with caution in the calculation of surface heat fluxes and highlight the importance of capturing the spatial variability in the land surface temperature: particularly, for remote sensing retrieval algorithms that use this variable for flux estimation.

  7. An estimator of the survival function based on the semi-Markov model under dependent censorship.

    Science.gov (United States)

    Lee, Seung-Yeoun; Tsai, Wei-Yann

    2005-06-01

    Lee and Wolfe (Biometrics vol. 54 pp. 1176-1178, 1998) proposed the two-stage sampling design for testing the assumption of independent censoring, which involves further follow-up of a subset of lost-to-follow-up censored subjects. They also proposed an adjusted estimator for the survivor function for a proportional hazards model under the dependent censoring model. In this paper, a new estimator for the survivor function is proposed for the semi-Markov model under the dependent censorship on the basis of the two-stage sampling data. The consistency and the asymptotic distribution of the proposed estimator are derived. The estimation procedure is illustrated with an example of lung cancer clinical trial and simulation results are reported of the mean squared errors of estimators under a proportional hazards and two different nonproportional hazards models.

  8. Generalized Spatial Two Stage Least Squares Estimation of Spatial Autoregressive Models with Autoregressive Disturbances in the Presence of Endogenous Regressors and Many Instruments

    Directory of Open Access Journals (Sweden)

    Fei Jin

    2013-05-01

    Full Text Available This paper studies the generalized spatial two stage least squares (GS2SLS) estimation of spatial autoregressive models with autoregressive disturbances when there are endogenous regressors with many valid instruments. Using many instruments may improve the efficiency of estimators asymptotically, but the bias might be large in finite samples, making the inference inaccurate. We consider the case that the number of instruments K increases with, but at a rate slower than, the sample size, and derive the approximate mean square errors (MSE) that account for the trade-offs between the bias and variance, for both the GS2SLS estimator and a bias-corrected GS2SLS estimator. A criterion function for the optimal K selection can be based on the approximate MSEs. Monte Carlo experiments are provided to show the performance of our procedure of choosing K.

  9. Random Walks on Directed Networks: Inference and Respondent-Driven Sampling

    Directory of Open Access Journals (Sweden)

    Malmros Jens

    2016-06-01

    Full Text Available Respondent-driven sampling (RDS) is often used to estimate population properties (e.g., sexual risk behavior) in hard-to-reach populations. In RDS, already sampled individuals recruit population members to the sample from their social contacts in an efficient snowball-like sampling procedure. By assuming a Markov model for the recruitment of individuals, asymptotically unbiased estimates of population characteristics can be obtained. Current RDS estimation methodology assumes that the social network is undirected, that is, all edges are reciprocal. However, empirical social networks in general also include a substantial number of nonreciprocal edges. In this article, we develop an estimation method for RDS in populations connected by social networks that include reciprocal and nonreciprocal edges. We derive estimators of the selection probabilities of individuals as a function of the number of outgoing edges of sampled individuals. The proposed estimators are evaluated on artificial and empirical networks and are shown to generally perform better than existing estimators. This is the case in particular when the fraction of directed edges in the network is large.
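
    As a simplified stand-in for the estimator described above, the sketch below applies the familiar degree-weighted RDS idea: each respondent is weighted inversely to a proxy for their selection probability, here taken proportional to the number of outgoing edges they report. The data and the rds_estimate helper are illustrative only and do not reproduce the authors' directed-network estimator.

```python
import numpy as np

def rds_estimate(values, out_degrees):
    """Estimate a population mean from an RDS sample by weighting each respondent
    inversely to a proxy for their selection probability (here, reported out-degree)."""
    values = np.asarray(values, dtype=float)
    w = 1.0 / np.asarray(out_degrees, dtype=float)
    return np.sum(w * values) / np.sum(w)

# Illustrative sample: a binary risk-behavior indicator and reported out-degrees.
behavior = [1, 0, 1, 1, 0, 0, 1, 0]
degrees = [12, 3, 25, 8, 4, 6, 30, 5]
print("naive mean:", np.mean(behavior),
      " degree-weighted RDS estimate:", round(rds_estimate(behavior, degrees), 3))
```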

  10. OPTIMAL WAVELENGTH SELECTION ON HYPERSPECTRAL DATA WITH FUSED LASSO FOR BIOMASS ESTIMATION OF TROPICAL RAIN FOREST

    Directory of Open Access Journals (Sweden)

    T. Takayama

    2016-06-01

    Full Text Available Above-ground biomass prediction of tropical rain forest using remote sensing data is of paramount importance to continuous large-area forest monitoring. Hyperspectral data can provide rich spectral information for biomass prediction; however, the prediction accuracy is affected by a small-sample-size problem, which commonly appears as overfitting when using high-dimensional data in which the number of training samples is smaller than the dimensionality of the samples, owing to the time, cost, and human resources required for field surveys. In addition, acquired hyperspectral data usually have a low signal-to-noise ratio due to narrow bandwidths, and local or global peak shifts due to instrumental instability or small differences in practical measurement conditions. In this work, we propose a methodology based on fused lasso regression that selects optimal bands for the biomass prediction model by encouraging sparsity and grouping: the sparsity addresses the small-sample-size problem through dimensionality reduction, and the grouping addresses the noise and peak-shift problems. The prediction model provided higher accuracy, with a root-mean-square error (RMSE) of 66.16 t/ha in cross-validation, than the other methods: multiple linear analysis, partial least squares regression, and lasso regression. Furthermore, fusion of spectral and spatial information derived from a texture index increased the prediction accuracy, with an RMSE of 62.62 t/ha. This analysis demonstrates the efficiency of fused lasso and image texture in biomass estimation of tropical forests.
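
    For reference, the band selection described above rests on the standard fused lasso criterion, written here in its usual form; the tuning parameters lambda_1 (controlling sparsity) and lambda_2 (controlling fusion across adjacent wavelengths) are generic symbols, not values from the paper.

```latex
\hat{\beta} = \arg\min_{\beta}\;
\frac{1}{2}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2
+ \lambda_1 \sum_{j=1}^{p} |\beta_j|
+ \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|
```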

  11. A free-knot spline modeling framework for piecewise linear logistic regression in complex samples with body mass index and mortality as an example

    Directory of Open Access Journals (Sweden)

    Scott W. Keith

    2014-09-01

    Full Text Available This paper details the design, evaluation, and implementation of a framework for detecting and modeling nonlinearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and nonparametric bootstrapping. Unlike other nonlinear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI; kg/m2) and the complex multi-stage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic nonlinear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and nonlinearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz's Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC), in terms of selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions.
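
    Ignoring the survey weights and the free-knot search, the piecewise linear logistic regression itself can be sketched with a fixed-knot hinge basis, as below; the knots, the synthetic BMI-mortality relationship and the helper name are assumptions made for illustration only, not the framework described in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_spline_basis(x, knots):
    """Piecewise-linear (degree-1) spline basis: x plus one hinge term per interior knot."""
    cols = [x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

# Synthetic BMI-like predictor and a binary outcome with a V-shaped (nonlinear) log-odds.
rng = np.random.default_rng(5)
bmi = rng.uniform(16, 45, size=5000)
logit = 0.12 * np.abs(bmi - 25) - 1.5                 # risk rises away from BMI ~ 25 (assumption)
y = rng.random(5000) < 1 / (1 + np.exp(-logit))

X = linear_spline_basis(bmi, knots=[22, 25, 30, 35])  # fixed illustrative knots
model = LogisticRegression(max_iter=1000).fit(X, y)

# Slopes on each linear segment are cumulative sums of the fitted coefficients.
coefs = model.coef_.ravel()
print("segment slopes (log-odds per BMI unit):", np.cumsum(coefs))
```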

  12. Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.

    Science.gov (United States)

    Algina, James; Olejnik, Stephen

    2000-01-01

    Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)

  13. Simulation of selected genealogies.

    Science.gov (United States)

    Slade, P F

    2000-02-01

    Algorithms for generating genealogies with selection conditional on the sample configuration of n genes in one-locus, two-allele haploid and diploid models are presented. Enhanced integro-recursions using the ancestral selection graph, introduced by S. M. Krone and C. Neuhauser (1997, Theor. Popul. Biol. 51, 210-237), which is the non-neutral analogue of the coalescent, enable accessible simulation of the embedded genealogy. A Monte Carlo simulation scheme based on that of R. C. Griffiths and S. Tavaré (1996, Math. Comput. Modelling 23, 141-158), is adopted to consider the estimation of ancestral times under selection. Simulations show that selection alters the expected depth of the conditional ancestral trees, depending on a mutation-selection balance. As a consequence, branch lengths are shown to be an ineffective criterion for detecting the presence of selection. Several examples are given which quantify the effects of selection on the conditional expected time to the most recent common ancestor. Copyright 2000 Academic Press.

  14. Cost-effective sampling of 137Cs-derived net soil redistribution: part 1 – estimating the spatial mean across scales of variation

    International Nuclear Information System (INIS)

    Li, Y.; Chappell, A.; Nyamdavaa, B.; Yu, H.; Davaasuren, D.; Zoljargal, K.

    2015-01-01

    The 137Cs technique for estimating net time-integrated soil redistribution is valuable for understanding the factors controlling soil redistribution by all processes. The literature on this technique is dominated by studies of individual fields and describes its typically time-consuming nature. We contend that the community making these studies has inappropriately assumed that many 137Cs measurements are required and hence that estimates of net soil redistribution can only be made at the field scale. Here, we support future studies of 137Cs-derived net soil redistribution in applying their often limited resources across scales of variation (field, catchment, region, etc.) without compromising the quality of the estimates at any scale. We describe a hybrid, design-based and model-based, stratified random sampling design with composites to estimate the sampling variance, and a cost model for fieldwork and laboratory measurements. Geostatistical mapping of net (1954–2012) soil redistribution as a case study on the Chinese Loess Plateau is compared with estimates for several other sampling designs popular in the literature. We demonstrate the cost-effectiveness of the hybrid design for spatial estimation of net soil redistribution. To demonstrate the limitations of current sampling approaches in cutting across scales of variation, we extrapolate our estimate of net soil redistribution across the region, and show that, for the same resources, estimates from many fields could have been provided and would elucidate the cause of differences within and between regional estimates. We recommend that future studies evaluate the sampling design carefully to consider the opportunity to investigate 137Cs-derived net soil redistribution across scales of variation. - Highlights: • The 137Cs technique estimates net time-integrated soil redistribution by all processes. • It is time-consuming and dominated by studies of individual fields. • We use limited resources to estimate soil

  15. Density meter algorithm and system for estimating sampling/mixing uncertainty

    International Nuclear Information System (INIS)

    Shine, E.P.

    1986-01-01

    The Laboratories Department at the Savannah River Plant (SRP) has installed a six-place density meter with an automatic sampling device. This paper describes the statistical software developed to analyze the density of uranyl nitrate solutions using this automated system. The purpose of this software is twofold: to estimate the sampling/mixing and measurement uncertainties in the process and to provide a measurement control program for the density meter. Non-uniformities in density are analyzed both analytically and graphically. The mean density and its limit of error are estimated. Quality control standards are analyzed concurrently with process samples and used to control the density meter measurement error. The analyses are corrected for concentration due to evaporation of samples waiting to be analyzed. The results of this program have been successful in identifying sampling/mixing problems and controlling the quality of analyses

  17. Selection bias in dynamically measured supermassive black hole samples: scaling relations and correlations between residuals in semi-analytic galaxy formation models

    Science.gov (United States)

    Barausse, Enrico; Shankar, Francesco; Bernardi, Mariangela; Dubois, Yohan; Sheth, Ravi K.

    2017-07-01

    Recent work has confirmed that the scaling relations between the masses of supermassive black holes and host-galaxy properties such as stellar masses and velocity dispersions may be biased high. Much of this may be caused by the requirement that the black hole sphere of influence must be resolved for the black hole mass to be reliably estimated. We revisit this issue with a comprehensive galaxy evolution semi-analytic model. Once tuned to reproduce the (mean) correlation of black hole mass with velocity dispersion, the model cannot account for the correlation with stellar mass. This is independent of the model's parameters, thus suggesting an internal inconsistency in the data. The predicted distributions, especially at the low-mass end, are also much broader than observed. However, if selection effects are included, the model's predictions tend to align with the observations. We also demonstrate that the correlations between the residuals of the scaling relations are more effective than the relations themselves at constraining models for the feedback of active galactic nuclei (AGNs). In fact, we find that our model, while in apparent broad agreement with the scaling relations when accounting for selection biases, yields very weak correlations between their residuals at fixed stellar mass, in stark contrast with observations. This problem persists when changing the AGN feedback strength, and is also present in the hydrodynamic cosmological simulation Horizon-AGN, which includes state-of-the-art treatments of AGN feedback. This suggests that current AGN feedback models are too weak or simply not capturing the effect of the black hole on the stellar velocity dispersion.

  18. Global parameter estimation for thermodynamic models of transcriptional regulation.

    Science.gov (United States)

    Suleimenov, Yerzhan; Ay, Ahmet; Samee, Md Abul Hassan; Dresch, Jacqueline M; Sinha, Saurabh; Arnosti, David N

    2013-07-15

    Deciphering the mechanisms involved in gene regulation holds the key to understanding the control of central biological processes, including human disease, population variation, and the evolution of morphological innovations. New experimental techniques including whole genome sequencing and transcriptome analysis have enabled comprehensive modeling approaches to study gene regulation. In many cases, it is useful to be able to assign biological significance to the inferred model parameters, but such interpretation should take into account features that affect these parameters, including model construction and sensitivity, the type of fitness calculation, and the effectiveness of parameter estimation. This last point is often neglected, as estimation methods are often selected for historical reasons or for computational ease. Here, we compare the performance of two parameter estimation techniques broadly representative of local and global approaches, namely, a quasi-Newton/Nelder-Mead simplex (QN/NMS) method and a covariance matrix adaptation-evolutionary strategy (CMA-ES) method. The estimation methods were applied to a set of thermodynamic models of gene transcription fitted to regulatory elements active in the Drosophila embryo. Measuring overall fit, the global CMA-ES method performed significantly better than the local QN/NMS method on high quality data sets, but this difference was negligible on lower quality data sets with increased noise or on data sets simplified by stringent thresholding. Our results suggest that the choice of parameter estimation technique for evaluation of gene expression models depends on the quality of the data, the nature of the models and the aims of the modeling effort. Copyright © 2013 Elsevier Inc. All rights reserved.
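
To make the local-versus-global comparison concrete, the sketch below fits a toy two-sigmoid "expression" model by least squares with scipy's Nelder-Mead (local) and differential evolution (global). The model, data and bounds are hypothetical, and differential evolution stands in for CMA-ES, which would require the third-party cma package.

```python
import numpy as np
from scipy.optimize import minimize, differential_evolution

rng = np.random.default_rng(2)

# Toy "expression model": a sum of two sigmoidal responses with unknown
# weights and thresholds, a stand-in for a thermodynamic transcription model.
def model(x, p):
    w1, t1, w2, t2 = p
    return w1 / (1 + np.exp(-(x - t1))) + w2 / (1 + np.exp(-(x - t2)))

x = np.linspace(-5, 5, 60)
true_p = np.array([1.0, -2.0, 2.0, 3.0])
y = model(x, true_p) + rng.normal(0, 0.05, x.size)   # noisy synthetic data

def sse(p):
    return np.sum((y - model(x, p)) ** 2)

bounds = [(0, 5), (-5, 5), (0, 5), (-5, 5)]

local = minimize(sse, x0=np.zeros(4), method="Nelder-Mead")   # local search
global_ = differential_evolution(sse, bounds, seed=0)         # global search

print("local  fit SSE:", round(local.fun, 4), "params:", np.round(local.x, 2))
print("global fit SSE:", round(global_.fun, 4), "params:", np.round(global_.x, 2))
```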

  19. A Bayesian Approach to Model Selection in Hierarchical Mixtures-of-Experts Architectures.

    Science.gov (United States)

    Tanner, Martin A.; Peng, Fengchun; Jacobs, Robert A.

    1997-03-01

    There does not exist a statistical model that shows good performance on all tasks. Consequently, the model selection problem is unavoidable; investigators must decide which model is best at summarizing the data for each task of interest. This article presents an approach to the model selection problem in hierarchical mixtures-of-experts architectures. These architectures combine aspects of generalized linear models with those of finite mixture models in order to perform tasks via a recursive "divide-and-conquer" strategy. Markov chain Monte Carlo methodology is used to estimate the distribution of the architectures' parameters. One part of our approach to model selection attempts to estimate the worth of each component of an architecture so that relatively unused components can be pruned from the architecture's structure. A second part of this approach uses a Bayesian hypothesis testing procedure in order to differentiate inputs that carry useful information from nuisance inputs. Simulation results suggest that the approach presented here adheres to the dictum of Occam's razor; simple architectures that are adequate for summarizing the data are favored over more complex structures. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.

  20. Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

    Directory of Open Access Journals (Sweden)

    Biqing Cai

    2015-04-01

    This paper discusses nonparametric kernel regression with the regressor being a \(d\)-dimensional \(\beta\)-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \(\sqrt{n(T)h^{d}}\), where \(n(T)\) is the number of regenerations for a \(\beta\)-null recurrent process, and that the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimator is quite reasonable when the leave-one-out cross-validation method is used for bandwidth selection. We apply the proposed method to study the relationship of the Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity in the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than that of the linear model.
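
A minimal sketch of two ingredients named above, a Nadaraya-Watson kernel estimator of the mean function and leave-one-out cross-validation for the bandwidth, on hypothetical data; it does not reproduce the null-recurrent asymptotics, only the estimator and the bandwidth selector.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical nonlinear relationship standing in for the interest-rate data.
x = rng.uniform(0, 10, 300)
y = np.sin(x) + 0.1 * x + rng.normal(0, 0.3, x.size)

def nw_fit(x_train, y_train, x_eval, h):
    """Nadaraya-Watson kernel regression estimate with a Gaussian kernel."""
    d = (x_eval[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return (w @ y_train) / (w.sum(axis=1) + 1e-300)

def loo_cv_score(h):
    """Leave-one-out cross-validation error for bandwidth h."""
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    np.fill_diagonal(w, 0.0)                     # drop the held-out point
    pred = (w @ y) / (w.sum(axis=1) + 1e-300)
    return np.mean((y - pred) ** 2)

grid = np.linspace(0.05, 2.0, 40)
h_best = grid[np.argmin([loo_cv_score(h) for h in grid])]
print(f"LOO-CV bandwidth: {h_best:.2f}")
print("fitted mean at x = 2.0:", nw_fit(x, y, np.array([2.0]), h_best))
```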

  1. Estimates of selection parameters in protein mutants of spring barley

    International Nuclear Information System (INIS)

    Gaul, H.; Walther, H.; Seibold, K.H.; Brunner, H.; Mikaelsen, K.

    1976-01-01

    Detailed studies have been made with induced protein mutants regarding a possible genetic advance in selection, including the estimation of the genetic variation and heritability coefficients. Estimates were obtained for protein content and protein yield. The variation of mutant lines in different environments was found to be many times as large as the variation of the line means. The detection of improved protein mutants therefore seems possible only in trials with more than one environment. The heritability of protein content and protein yield was estimated in different sets of environments and was found to be low. However, higher values were found with an increasing number of environments. At least four environments seem to be necessary to obtain reliable heritability estimates. The genetic component of the variation between lines was significant for protein content in all environmental combinations. For protein yield, only some environmental combinations showed significant differences. The expected genetic advance with one selection step was small for both protein traits. Genetically significant differences between protein micromutants give, however, a first indication that selection among protein mutants with small differences also seems possible. (author)

  2. A Unimodal Model for Double Observer Distance Sampling Surveys.

    Directory of Open Access Journals (Sweden)

    Earl F Becker

    Distance sampling is a widely used method to estimate animal population size. Most distance sampling models utilize a monotonically decreasing detection function such as a half-normal. Recent advances in distance sampling modeling allow for the incorporation of covariates into the distance model, and the elimination of the assumption of perfect detection at some fixed distance (usually the transect line) with the use of double-observer models. The assumption of full observer independence in the double-observer model is problematic, but can be addressed by using the point independence assumption, which assumes there is one distance, the apex of the detection function, where the two observers are assumed independent. Aerially collected distance sampling data can have a unimodal shape and have been successfully modeled with a gamma detection function. Covariates in gamma detection models cause the apex of detection to shift depending upon covariate levels, making this model incompatible with the point independence assumption when using double-observer data. This paper reports a unimodal detection model based on a two-piece normal distribution that allows covariates, has only one apex, and is consistent with the point independence assumption when double-observer data are utilized. An aerial line-transect survey of black bears in Alaska illustrates how this method can be applied.
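
A rough sketch of the idea, assuming the usual definition of a two-piece (split) normal shape for the detection function: unit detection at a single apex and separate Gaussian spreads on either side. The truncation distance, parameter values and the simple maximum-likelihood fit below are hypothetical; the published model additionally handles covariates and the double-observer point-independence structure.

```python
import numpy as np
from scipy.optimize import minimize

def two_piece_normal(x, apex, s1, s2):
    """Unimodal detection function: equals 1 at the apex, with separate
    Gaussian spreads on the near (s1) and far (s2) side of the apex."""
    left = np.exp(-0.5 * ((x - apex) / s1) ** 2)
    right = np.exp(-0.5 * ((x - apex) / s2) ** 2)
    return np.where(x < apex, left, right)

def neg_log_lik(params, dists, w):
    apex, log_s1, log_s2 = params
    s1, s2 = np.exp(log_s1), np.exp(log_s2)
    grid = np.linspace(0.0, w, 4000)
    norm = np.mean(two_piece_normal(grid, apex, s1, s2)) * w   # numeric integral
    dens = two_piece_normal(dists, apex, s1, s2) / norm
    return -np.sum(np.log(dens + 1e-300))

# Simulate perpendicular distances with a unimodal detection pattern
# (hypothetical truncation distance and parameters, in metres).
rng = np.random.default_rng(1)
w = 1000.0
x = rng.uniform(0.0, w, 20000)                                 # animals uniform in x
p_detect = two_piece_normal(x, apex=150.0, s1=80.0, s2=300.0)
dists = x[rng.uniform(size=x.size) < p_detect]                 # observed distances

fit = minimize(neg_log_lik, x0=[100.0, np.log(100.0), np.log(200.0)],
               args=(dists, w), method="Nelder-Mead")
apex_hat, s1_hat, s2_hat = fit.x[0], np.exp(fit.x[1]), np.exp(fit.x[2])
print(f"apex ~ {apex_hat:.0f} m, sigma_left ~ {s1_hat:.0f} m, sigma_right ~ {s2_hat:.0f} m")
```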

  3. Targeting estimation of CCC-GARCH models with infinite fourth moments

    DEFF Research Database (Denmark)

    Pedersen, Rasmus Søndergaard

    In this paper we consider the large-sample properties of the variance targeting estimator for the multivariate extended constant conditional correlation GARCH model when the distribution of the data generating process has infinite fourth moments. Using non-standard limit theory we derive new results for the estimator, stating that its limiting distribution is multivariate stable. The rate of consistency of the estimator is slower than √T (as obtained by the quasi-maximum likelihood estimator) and depends on the tails of the data generating process.

  4. DeepQA: Improving the estimation of single protein model quality with deep belief networks

    OpenAIRE

    Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin

    2016-01-01

    Background: Protein quality assessment (QA), useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. Results: We introduce a novel single-model quality assessment method, DeepQA, based on deep belief networks ...

  5. Small Sample Properties of Bayesian Multivariate Autoregressive Time Series Models

    Science.gov (United States)

    Price, Larry R.

    2012-01-01

    The aim of this study was to compare the small sample (N = 1, 3, 5, 10, 15) performance of a Bayesian multivariate vector autoregressive (BVAR-SEM) time series model relative to frequentist power and parameter estimation bias. A multivariate autoregressive model was developed based on correlated autoregressive time series vectors of varying…

  6. Estimating the variation, autocorrelation, and environmental sensitivity of phenotypic selection

    NARCIS (Netherlands)

    Chevin, Luis-Miguel; Visser, Marcel E.; Tufto, Jarle

    2015-01-01

    Despite considerable interest in temporal and spatial variation of phenotypic selection, very few methods allow quantifying this variation while correctly accounting for the error variance of each individual estimate. Furthermore, the available methods do not estimate the autocorrelation of

  8. Adaptive Annealed Importance Sampling for Multimodal Posterior Exploration and Model Selection with Application to Extrasolar Planet Detection

    Science.gov (United States)

    Liu, Bin

    2014-07-01

    We describe an algorithm that can adaptively provide mixture summaries of multimodal posterior distributions. The parameter space of the involved posteriors ranges in size from a few dimensions to dozens of dimensions. This work was motivated by an astrophysical problem called extrasolar planet (exoplanet) detection, wherein the computation of stochastic integrals that are required for Bayesian model comparison is challenging. The difficulty comes from the highly nonlinear models that lead to multimodal posterior distributions. We resort to importance sampling (IS) to estimate the integrals, and thus translate the problem to be how to find a parametric approximation of the posterior. To capture the multimodal structure in the posterior, we initialize a mixture proposal distribution and then tailor its parameters elaborately to make it resemble the posterior to the greatest extent possible. We use the effective sample size (ESS) calculated based on the IS draws to measure the degree of approximation. The bigger the ESS is, the better the proposal resembles the posterior. A difficulty within this tailoring operation lies in the adjustment of the number of mixing components in the mixture proposal. Brute force methods just preset it as a large constant, which leads to an increase in the required computational resources. We provide an iterative delete/merge/add process, which works in tandem with an expectation-maximization step to tailor such a number online. The efficiency of our proposed method is tested via both simulation studies and real exoplanet data analysis.
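
The core bookkeeping of the scheme, importance weights against a mixture proposal and the effective sample size used to judge the fit, can be sketched in a few lines. The bimodal target, the two-component Gaussian proposal and its parameters below are hypothetical stand-ins; the adaptive delete/merge/add and expectation-maximization steps of the paper are not implemented.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(5)

# Bimodal "posterior" used as the target (stand-in for an exoplanet posterior).
def log_target(x):
    return np.logaddexp(np.log(0.4) + stats.norm.logpdf(x, -3.0, 0.7),
                        np.log(0.6) + stats.norm.logpdf(x, 2.0, 1.0))

# Two-component Gaussian mixture proposal; these are the quantities the
# adaptive scheme would tailor online to resemble the posterior.
w = np.array([0.5, 0.5]); mu = np.array([-3.0, 2.0]); sd = np.array([1.0, 1.5])

def sample_proposal(n):
    comp = rng.choice(2, size=n, p=w)
    return rng.normal(mu[comp], sd[comp])

def log_proposal(x):
    comp = stats.norm.logpdf(x[:, None], mu[None, :], sd[None, :]) + np.log(w)[None, :]
    return logsumexp(comp, axis=1)

draws = sample_proposal(5000)
log_w = log_target(draws) - log_proposal(draws)   # log importance weights
wgt = np.exp(log_w - log_w.max())                 # stabilised weights
ess = wgt.sum() ** 2 / (wgt ** 2).sum()           # effective sample size
print(f"ESS = {ess:.0f} of {draws.size} draws ({100 * ess / draws.size:.1f}%)")
```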

  9. THE INFLUENCE OF CONVERSION MODEL CHOICE FOR EROSION RATE ESTIMATION AND THE SENSITIVITY OF THE RESULTS TO CHANGES IN THE MODEL PARAMETER

    Directory of Open Access Journals (Sweden)

    Nita Suhartini

    2010-06-01

    A study of soil erosion rates was carried out on a gentle, long slope of cultivated area in Ciawi - Bogor, using the 137Cs technique. The objective of the present study was to evaluate the applicability of the 137Cs technique in obtaining spatially distributed information on soil redistribution in a small catchment. This paper reports the results of the choice of conversion model for erosion rate estimates and the sensitivity of the results to changes in the model parameters. For this purpose, a small site was selected, namely landuse I (LU-I). The top of a slope was chosen as the reference site. The erosion/deposition rate at individual sampling points was estimated using three conversion models, namely the Proportional Model (PM), Mass Balance Model 1 (MBM1) and Mass Balance Model 2 (MBM2). A comparison of the conversion models showed that the lowest values were obtained with the PM. MBM1 gave values close to those of MBM2, but MBM2 gave the more reliable values. In this study, a sensitivity analysis suggests that the conversion models are sensitive to changes in parameters that depend on the site conditions, but insensitive to changes in parameters related to the onset of 137Cs fallout input. Keywords: soil erosion, environmental radioisotope, cesium

  10. Optimization of multi-environment trials for genomic selection based on crop models.

    Science.gov (United States)

    Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J

    2017-08-01

    We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS), which refers to the use of genome-wide information for predicting breeding values of selection candidates, need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling thanks to crop growth models (CGM) incorporating genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing the MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined to this aim, and was evaluated on simulated and real data, with the example of wheat phenology. METs defined with OptiMET allowed the genetic parameters to be estimated with lower error, leading to higher QTL detection power and higher prediction accuracies. METs defined with OptiMET were on average more efficient than random METs composed of twice as many environments, in terms of quality of the parameter estimates. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit METs and the phenotyping tools that are currently developed.

  11. Finite Sample Comparison of Parametric, Semiparametric, and Wavelet Estimators of Fractional Integration

    DEFF Research Database (Denmark)

    Nielsen, Morten Ø.; Frederiksen, Per Houmann

    2005-01-01

    In this paper we compare through Monte Carlo simulations the finite sample properties of estimators of the fractional differencing parameter, d. This involves frequency domain, time domain, and wavelet based approaches, and we consider both parametric and semiparametric estimation methods. The estimators are briefly introduced and compared, and the criteria adopted for measuring finite sample performance are bias and root mean squared error. Most importantly, the simulations reveal that (1) the frequency domain maximum likelihood procedure is superior to the time domain parametric methods, (2) … and (4) without sufficient trimming of scales the wavelet-based estimators are heavily biased.
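
As one concrete point of reference for the frequency-domain semiparametric class, the sketch below simulates fractionally integrated noise and estimates d with the classical Geweke-Porter-Hudak (GPH) log-periodogram regression. The simulation design (series length, bandwidth exponent) is hypothetical, and the paper's full Monte Carlo covers many more estimators.

```python
import numpy as np

rng = np.random.default_rng(6)

def frac_noise(n, d, burn=500):
    """Fractionally integrated noise via truncated MA(inf) coefficients
    psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    k = np.arange(1, n + burn)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eps = rng.standard_normal(n + burn)
    x = np.convolve(eps, psi)[:n + burn]
    return x[burn:]

def gph_estimate(x, alpha=0.5):
    """Log-periodogram (GPH) estimate of d using m = n**alpha frequencies."""
    n = x.size
    m = int(n ** alpha)
    j = np.arange(1, m + 1)
    lam = 2 * np.pi * j / n
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = np.abs(dft) ** 2 / (2 * np.pi * n)
    regressor = -np.log(4 * np.sin(lam / 2) ** 2)      # slope of log I on this is d
    slope = np.polyfit(regressor, np.log(periodogram), 1)[0]
    return slope

x = frac_noise(2000, d=0.3)
print(f"GPH estimate of d: {gph_estimate(x):.3f}")
```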

  12. Reactivity-worth estimates of the OSMOSE samples in the MINERVE reactor R1-UO2 configuration.

    Energy Technology Data Exchange (ETDEWEB)

    Klann, R. T.; Perret, G.; Nuclear Engineering Division

    2007-10-03

    An initial series of calculations of the reactivity-worth of the OSMOSE samples in the MINERVE reactor with the R1-UO2 core configuration were completed. The reactor model was generated using the REBUS code developed at Argonne National Laboratory. The calculations are based on the specifications for fabrication, so they are considered preliminary until sampling and analysis have been completed on the fabricated samples. The estimates indicate a range of reactivity effect from -22 pcm to +25 pcm compared to the natural U sample.

  13. Model uncertainty and multimodel inference in reliability estimation within a longitudinal framework.

    Science.gov (United States)

    Alonso, Ariel; Laenen, Annouschka

    2013-05-01

    Laenen, Alonso, and Molenberghs (2007) and Laenen, Alonso, Molenberghs, and Vangeneugden (2009) proposed a method to assess the reliability of rating scales in a longitudinal context. The methodology is based on hierarchical linear models, and reliability coefficients are derived from the corresponding covariance matrices. However, finding a good parsimonious model to describe complex longitudinal data is a challenging task. Frequently, several models fit the data equally well, raising the problem of model selection uncertainty. When model uncertainty is high one may resort to model averaging, where inferences are based not on one but on an entire set of models. We explored the use of different model building strategies, including model averaging, in reliability estimation. We found that the approach introduced by Laenen et al. (2007, 2009) combined with some of these strategies may yield meaningful results in the presence of high model selection uncertainty and when all models are misspecified, in so far as some of them manage to capture the most salient features of the data. Nonetheless, when all models omit prominent regularities in the data, misleading results may be obtained. The main ideas are further illustrated on a case study in which the reliability of the Hamilton Anxiety Rating Scale is estimated. Importantly, the ambit of model selection uncertainty and model averaging transcends the specific setting studied in the paper and may be of interest in other areas of psychometrics. © 2012 The British Psychological Society.
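
The model-averaging idea can be illustrated with the simplest information-theoretic weighting scheme: fit several candidate models, convert AIC differences into Akaike weights and average predictions over the set. The polynomial candidates and data below are hypothetical and much simpler than the hierarchical covariance models used in the paper; only the weighting arithmetic is illustrated.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical longitudinal-style data: a quadratic trend plus noise.
t = np.linspace(0, 4, 80)
y = 1.0 + 0.8 * t - 0.15 * t ** 2 + rng.normal(0, 0.4, t.size)

def fit_poly(order):
    """Least-squares polynomial fit; returns residual sum of squares and
    the number of estimated parameters (coefficients + error variance)."""
    coef = np.polyfit(t, y, order)
    rss = np.sum((y - np.polyval(coef, t)) ** 2)
    return rss, order + 2

n = t.size
models = {f"poly{p}": fit_poly(p) for p in (1, 2, 3, 4)}

# AIC for Gaussian errors: n*log(RSS/n) + 2k, then Akaike weights.
aic = {m: n * np.log(rss / n) + 2 * k for m, (rss, k) in models.items()}
delta = {m: a - min(aic.values()) for m, a in aic.items()}
wsum = sum(np.exp(-0.5 * d) for d in delta.values())
weights = {m: np.exp(-0.5 * d) / wsum for m, d in delta.items()}

for m, w in weights.items():
    print(f"{m}: AIC={aic[m]:7.2f}  weight={w:.3f}")

# A model-averaged prediction at a new time point, weighting each candidate.
t_new = 2.5
pred = sum(weights[f"poly{p}"] * np.polyval(np.polyfit(t, y, p), t_new)
           for p in (1, 2, 3, 4))
print(f"model-averaged prediction at t={t_new}: {pred:.2f}")
```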

  14. Non-destructive linear model for leaf area estimation in Vernonia ferruginea Less

    Directory of Open Access Journals (Sweden)

    MC. Souza

    Leaf area estimation is an important biometrical trait for evaluating leaf development and plant growth in field and pot experiments. We developed a non-destructive model to estimate the leaf area (LA) of Vernonia ferruginea using the length (L) and width (W) leaf dimensions. Different combinations of linear equations were obtained from L, L², W, W², LW and L²W². The linear regressions using the product of the L and W dimensions were more efficient at estimating the LA of V. ferruginea than models based on a single dimension (L, W, L² or W²). Therefore, the linear regression "LA = 0.463 + 0.676WL" provided the most accurate estimate of V. ferruginea leaf area. Validation of the selected model showed that the correlation between the real measured leaf area and the estimated leaf area was very high.
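
A quick sketch of how such an allometric calibration is fitted and checked, using ordinary least squares on the product W·L; the leaf measurements below are simulated for illustration, not the V. ferruginea data.

```python
import numpy as np

# Hypothetical leaf measurements: length L and width W in cm, and the
# "true" leaf area in cm^2 measured destructively for calibration.
rng = np.random.default_rng(8)
L = rng.uniform(4, 15, 50)
W = rng.uniform(2, 8, 50)
LA_true = 0.46 + 0.68 * L * W + rng.normal(0, 1.0, 50)

# Fit LA = a + b * (W * L) by ordinary least squares, as in the record.
X = np.column_stack([np.ones_like(L), W * L])
(a, b), *_ = np.linalg.lstsq(X, LA_true, rcond=None)
LA_hat = a + b * W * L

r = np.corrcoef(LA_true, LA_hat)[0, 1]
print(f"LA = {a:.3f} + {b:.3f}*W*L   (r = {r:.3f})")
```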

  15. Correcting the bias of empirical frequency parameter estimators in codon models.

    Directory of Open Access Journals (Sweden)

    Sergei Kosakovsky Pond

    2010-07-01

    Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a "corrected" empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the standard approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the uncorrected empirical estimators.

  16. The Selection of ARIMA Models with or without Regressors

    DEFF Research Database (Denmark)

    Johansen, Søren; Riani, Marco; Atkinson, Anthony C.

    We develop a $C_{p}$ statistic for the selection of regression models with stationary and nonstationary ARIMA error terms. We derive the asymptotic theory of the maximum likelihood estimators and show they are consistent and asymptotically Gaussian. We also prove that the distribution of the sum ...

  17. Model parameters estimation and sensitivity by genetic algorithms

    International Nuclear Information System (INIS)

    Marseguerra, Marzio; Zio, Enrico; Podofillini, Luca

    2003-01-01

    In this paper we illustrate the possibility of extracting qualitative information on the importance of the parameters of a model in the course of a Genetic Algorithm (GA) optimization procedure for the estimation of such parameters. The GA search for the optimal solution is performed according to procedures that resemble those of natural selection and genetics: an initial population of alternative solutions evolves within the search space through the four fundamental operations of parent selection, crossover, replacement, and mutation. During the search, the algorithm examines a large number of solution points which possibly carry relevant information on the underlying model characteristics. One possible use of this information is to create and update an archive with the set of best solutions found at each generation and then to analyze the evolution of the statistics of the archive along the successive generations. From this analysis one can retrieve information regarding the speed of convergence and stabilization of the different control (decision) variables of the optimization problem. In this work we analyze the evolution strategy followed by a GA in its search for the optimal solution, with the aim of extracting information on the importance of the control (decision) variables of the optimization with respect to the sensitivity of the objective function. The study refers to a GA search for optimal estimates of the effective parameters in a lumped nuclear reactor model from the literature. The supporting observation is that, as most optimization procedures do, the GA search evolves towards convergence in such a way as to stabilize first the most important parameters of the model and later those which influence the model outputs little. In this sense, besides estimating the parameter values efficiently, the optimization approach also allows us to provide a qualitative ranking of their importance in contributing to the model output.

  18. HICOSMO - cosmology with a complete sample of galaxy clusters - I. Data analysis, sample selection and luminosity-mass scaling relation

    Science.gov (United States)

    Schellenberger, G.; Reiprich, T. H.

    2017-08-01

    The X-ray regime, where the most massive visible component of galaxy clusters, the intracluster medium, is visible, offers directly measured quantities, like the luminosity, and derived quantities, like the total mass, to characterize these objects. The aim of this project is to analyse a complete sample of galaxy clusters in detail and constrain cosmological parameters, like the matter density, Ωm, or the amplitude of initial density fluctuations, σ8. The purely X-ray flux-limited sample (HIFLUGCS) consists of the 64 X-ray brightest galaxy clusters, which are excellent targets to study the systematic effects that can bias results. We analysed in total 196 Chandra observations of the 64 HIFLUGCS clusters, with a total exposure time of 7.7 Ms. Here, we present our data analysis procedure (including an automated substructure detection and an energy band optimization for surface brightness profile analysis) that gives individually determined, robust total mass estimates. These masses are tested against dynamical and Planck Sunyaev-Zeldovich (SZ) derived masses of the same clusters, where good overall agreement is found with the dynamical masses. The Planck SZ masses seem to show a mass-dependent bias relative to our hydrostatic masses; possible biases in this mass-mass comparison are discussed, including the Planck selection function. Furthermore, we show the results for the (0.1-2.4) keV luminosity versus mass scaling relation. The overall slope of the sample (1.34) is in agreement with expectations and values from the literature. Splitting the sample into galaxy groups and clusters reveals, even after a selection bias correction, that galaxy groups exhibit a significantly steeper slope (1.88) compared to clusters (1.06).

  19. Consistent Estimation of Partition Markov Models

    Directory of Open Access Journals (Sweden)

    Jesús E. García

    2017-04-01

    The Partition Markov Model characterizes the process by a partition L of the state space, where the elements in each part of L share the same transition probability to an arbitrary element of the alphabet. This model aims to answer the following questions: what is the minimal number of parameters needed to specify a Markov chain, and how can these parameters be estimated? In order to answer these questions, we build a consistent strategy for model selection which consists of: given a size-n realization of the process, finding a model within the Partition Markov class with a minimal number of parts to represent the law of the process. From this strategy, we derive a measure that establishes a metric on the state space. In addition, we show that if the law of the process is Markovian, then, eventually, as n goes to infinity, L will be retrieved. We show an application to modeling internet navigation patterns.

  20. Geostatistical estimation of forest biomass in interior Alaska combining Landsat-derived tree cover, sampled airborne lidar and field observations

    Science.gov (United States)

    Babcock, Chad; Finley, Andrew O.; Andersen, Hans-Erik; Pattison, Robert; Cook, Bruce D.; Morton, Douglas C.; Alonzo, Michael; Nelson, Ross; Gregoire, Timothy; Ene, Liviu; Gobakken, Terje; Næsset, Erik

    2018-06-01

    The goal of this research was to develop and examine the performance of a geostatistical coregionalization modeling approach for combining field inventory measurements, strip samples of airborne lidar and Landsat-based remote sensing data products to predict aboveground biomass (AGB) in interior Alaska's Tanana Valley. The proposed modeling strategy facilitates pixel-level mapping of AGB density predictions across the entire spatial domain. Additionally, the coregionalization framework allows for statistically sound estimation of total AGB for arbitrary areal units within the study area---a key advance to support diverse management objectives in interior Alaska. This research focuses on appropriate characterization of prediction uncertainty in the form of posterior predictive coverage intervals and standard deviations. Using the framework detailed here, it is possible to quantify estimation uncertainty for any spatial extent, ranging from pixel-level predictions of AGB density to estimates of AGB stocks for the full domain. The lidar-informed coregionalization models consistently outperformed their counterpart lidar-free models in terms of point-level predictive performance and total AGB precision. Additionally, the inclusion of Landsat-derived forest cover as a covariate further improved estimation precision in regions with lower lidar sampling intensity. Our findings also demonstrate that model-based approaches that do not explicitly account for residual spatial dependence can grossly underestimate uncertainty, resulting in falsely precise estimates of AGB. On the other hand, in a geostatistical setting, residual spatial structure can be modeled within a Bayesian hierarchical framework to obtain statistically defensible assessments of uncertainty for AGB estimates.

  1. Nested sampling algorithm for subsurface flow model selection, uncertainty quantification, and nonlinear calibration

    KAUST Repository

    Elsheikh, A. H.; Wheeler, M. F.; Hoteit, Ibrahim

    2013-01-01

    Calibration of subsurface flow models is an essential step for managing groundwater aquifers, designing contaminant remediation plans, and maximizing recovery from hydrocarbon reservoirs. We investigate an efficient sampling algorithm known as nested sampling ...

  2. Statistical inference for the additive hazards model under outcome-dependent sampling.

    Science.gov (United States)

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against the simple random sampling design, and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the cancer risk associated with radon exposure.

  3. Assessment of sampling strategies for estimation of site mean concentrations of stormwater pollutants.

    Science.gov (United States)

    McCarthy, David T; Zhang, Kefeng; Westerlund, Camilla; Viklander, Maria; Bertrand-Krajewski, Jean-Luc; Fletcher, Tim D; Deletic, Ana

    2018-02-01

    The estimation of stormwater pollutant concentrations is a primary requirement of integrated urban water management. In order to determine effective sampling strategies for estimating pollutant concentrations, data from extensive field measurements at seven different catchments were used. At all sites, 1-min resolution continuous flow measurements, as well as flow-weighted samples, were taken and analysed for total suspended solids (TSS), total nitrogen (TN) and Escherichia coli (E. coli). For each of these parameters, the data were used to calculate the Event Mean Concentrations (EMCs) for each event. The measured Site Mean Concentrations (SMCs) were taken as the volume-weighted average of these EMCs for each parameter, at each site. 17 different sampling strategies, including random and fixed strategies, were tested to estimate SMCs, which were compared with the measured SMCs. The ratios of estimated/measured SMCs were further analysed to determine the most effective sampling strategies. Results indicate that the random sampling strategies were the most promising method in reproducing SMCs for TSS and TN, while some fixed sampling strategies were better for estimating the SMC of E. coli. The differences in taking one, two or three random samples were small (up to 20% for TSS, and 10% for TN and E. coli), indicating that there is little benefit in investing in collection of more than one sample per event if attempting to estimate the SMC through monitoring of multiple events. It was estimated that an average of 27 events across the studied catchments is needed for characterising SMCs of TSS with a 90% confidence interval (CI) width of 1.0, followed by E. coli (average 12 events) and TN (average 11 events). The coefficient of variation of pollutant concentrations was linearly and significantly correlated to the 90% confidence interval ratio of the estimated/measured SMCs (R² = 0.49), which can therefore be used to gauge the sampling frequency needed to accurately estimate SMCs of pollutants.
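
The two summary statistics at the heart of the comparison are easy to state in code: an EMC per event from the flow-weighted samples, and the SMC as the volume-weighted average of EMCs across events. The event volumes and concentrations below are hypothetical.

```python
import numpy as np

# Hypothetical flow-weighted monitoring data: for each storm event, the
# event runoff volume (m^3) and the flow-weighted sample concentrations (mg/L).
event_volume = np.array([120.0, 450.0, 80.0, 300.0])
event_conc = [np.array([210.0, 180.0, 150.0]),          # event 1 TSS samples
              np.array([95.0, 120.0]),                  # event 2
              np.array([300.0, 260.0, 240.0, 220.0]),   # event 3
              np.array([130.0])]                        # event 4

# Event Mean Concentration: mean of the flow-weighted samples of the event
# (flow-weighted samplers already weight aliquots by flow).
emc = np.array([c.mean() for c in event_conc])

# Site Mean Concentration: volume-weighted average of the EMCs.
smc = np.sum(emc * event_volume) / np.sum(event_volume)
print(f"EMCs: {np.round(emc, 1)} mg/L -> SMC = {smc:.1f} mg/L")
```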

  4. Optimum sample size to estimate mean parasite abundance in fish parasite surveys

    Directory of Open Access Journals (Sweden)

    Shvydka S.

    2018-03-01

    To reach ethically and scientifically valid mean abundance values in parasitological and epidemiological studies, this paper considers analytic and simulation approaches for sample size determination. The sample size estimation was carried out by applying a mathematical formula with a predetermined precision level and the parameter of the negative binomial distribution estimated from the empirical data. A simulation approach to optimum sample size determination, aimed at estimating the true value of the mean abundance and its confidence interval (CI), was based on the Bag of Little Bootstraps (BLB). The abundances of two species of monogenean parasites, Ligophorus cephali and L. mediterraneus, from Mugil cephalus across the Azov-Black Sea localities were subjected to the analysis. The dispersion pattern of both helminth species could be characterized as a highly aggregated distribution, with the variance being substantially larger than the mean abundance. The holistic approach applied here offers a wide range of appropriate methods for searching for the optimum sample size and for understanding the expected precision level of the mean. Given the superior performance of the BLB relative to formulae with its few assumptions, the bootstrap procedure is the preferred method. Two important assessments were performed in the present study: (i) based on CI width, a reasonable precision level for the mean abundance in parasitological surveys of Ligophorus spp. could be chosen between 0.8 and 0.5, corresponding to CI widths of 1.6 and 1 times the mean; and (ii) a sample size of 80 or more host individuals allows accurate and precise estimation of the mean abundance. Meanwhile, for host sample sizes in the range between 25 and 40 individuals, the median estimates showed minimal bias but the sampling distribution was skewed towards low values; a sample size of 10 host individuals yielded unreliable estimates.
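
A stripped-down version of the simulation side of this exercise: draw aggregated (negative binomial) counts per host and compute a percentile bootstrap confidence interval for the mean abundance, whose width relative to the mean is the precision criterion discussed above. This is an ordinary bootstrap on hypothetical counts, not the Bag of Little Bootstraps or the Ligophorus data.

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical parasite counts per host: highly aggregated, so a negative
# binomial with a small dispersion parameter is a reasonable stand-in.
k, target_mean = 0.5, 6.0
counts = rng.negative_binomial(n=k, p=k / (k + target_mean), size=80)

def bootstrap_ci(x, n_boot=5000, level=0.95):
    """Percentile bootstrap confidence interval for the mean abundance."""
    means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.quantile(means, [(1 - level) / 2, 1 - (1 - level) / 2])
    return x.mean(), lo, hi

mean, lo, hi = bootstrap_ci(counts)
print(f"mean abundance {mean:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), "
      f"CI width / mean = {(hi - lo) / mean:.2f}")
```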

  5. The redshift distribution of cosmological samples: a forward modeling approach

    Energy Technology Data Exchange (ETDEWEB)

    Herbel, Jörg; Kacprzak, Tomasz; Amara, Adam; Refregier, Alexandre; Bruderer, Claudio; Nicola, Andrina, E-mail: joerg.herbel@phys.ethz.ch, E-mail: tomasz.kacprzak@phys.ethz.ch, E-mail: adam.amara@phys.ethz.ch, E-mail: alexandre.refregier@phys.ethz.ch, E-mail: claudio.bruderer@phys.ethz.ch, E-mail: andrina.nicola@phys.ethz.ch [Institute for Astronomy, Department of Physics, ETH Zürich, Wolfgang-Pauli-Strasse 27, 8093 Zürich (Switzerland)

    2017-08-01

    Determining the redshift distribution n(z) of galaxy samples is essential for several cosmological probes including weak lensing. For imaging surveys, this is usually done using photometric redshifts estimated on an object-by-object basis. We present a new approach for directly measuring the global n(z) of cosmological galaxy samples, including uncertainties, using forward modeling. Our method relies on image simulations produced using \textsc{UFig} (Ultra Fast Image Generator) and on ABC (Approximate Bayesian Computation) within the MCCL (Monte-Carlo Control Loops) framework. The galaxy population is modeled using parametric forms for the luminosity functions, spectral energy distributions, sizes and radial profiles of both blue and red galaxies. We apply exactly the same analysis to the real data and to the simulated images, which also include instrumental and observational effects. By adjusting the parameters of the simulations, we derive a set of acceptable models that are statistically consistent with the data. We then apply the same cuts to the simulations that were used to construct the target galaxy sample in the real data. The redshifts of the galaxies in the resulting simulated samples yield a set of n(z) distributions for the acceptable models. We demonstrate the method by determining n(z) for a cosmic shear-like galaxy sample from the 4-band Subaru Suprime-Cam data in the COSMOS field. We also complement this imaging data with a spectroscopic calibration sample from the VVDS survey. We compare our resulting posterior n(z) distributions to the one derived from photometric redshifts estimated using 36 photometric bands in COSMOS and find good agreement. This offers good prospects for applying our approach to current and future large imaging surveys.

  8. Traditional and robust vector selection methods for use with similarity based models

    International Nuclear Information System (INIS)

    Hines, J. W.; Garvey, D. R.

    2006-01-01

    Vector selection, or instance selection as it is often called in the data mining literature, performs a critical task in the development of nonparametric, similarity based models. Nonparametric, similarity based modeling (SBM) is a form of 'lazy learning' which constructs a local model 'on the fly' by comparing a query vector to historical, training vectors. For large training sets the creation of local models may become cumbersome, since each training vector must be compared to the query vector. To alleviate this computational burden, varying forms of training vector sampling may be employed with the goal of selecting a subset of the training data such that the samples are representative of the underlying process. This paper describes one such SBM, namely auto-associative kernel regression (AAKR), and presents five traditional vector selection methods and one robust vector selection method that may be used to select prototype vectors from a larger data set in model training. The five traditional vector selection methods considered are min-max, vector ordering, combination min-max and vector ordering, fuzzy c-means clustering, and Adeli-Hung clustering. Each method is described in detail and compared using artificially generated data and data collected from the steam system of an operating nuclear power plant. (authors)
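
Of the traditional methods listed, min-max selection is the simplest to sketch: keep every training vector that holds the minimum or maximum observed value of at least one signal, so the retained prototypes bound the training space. The sensor data below are simulated, and the definition follows the common usage of "min-max" selection; it is an assumption about the exact variant meant in the record.

```python
import numpy as np

def min_max_select(train):
    """Min-max vector selection: keep every training vector that contains
    the minimum or maximum observed value of at least one variable."""
    idx = set()
    for j in range(train.shape[1]):
        idx.add(int(np.argmin(train[:, j])))
        idx.add(int(np.argmax(train[:, j])))
    return np.sort(np.fromiter(idx, dtype=int))

# Hypothetical plant-signal history: 5000 observations of 4 correlated sensors.
rng = np.random.default_rng(11)
base = rng.normal(size=(5000, 1))
train = base + 0.1 * rng.normal(size=(5000, 4))

keep = min_max_select(train)
print(f"retained {keep.size} of {train.shape[0]} training vectors:", keep)
```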

  9. Gray bootstrap method for estimating frequency-varying random vibration signals with small samples

    Directory of Open Access Journals (Sweden)

    Wang Yanqing

    2014-04-01

    During environment testing, the estimation of random vibration signals (RVS) is an important technique for airborne platform safety and reliability. However, the available methods, including the extreme value envelope method (EVEM), the statistical tolerances method (STM) and the improved statistical tolerance method (ISTM), require large samples and a typical probability distribution. Moreover, the frequency-varying characteristic of RVS is usually not taken into account. The gray bootstrap method (GBM) is proposed to solve the problem of estimating frequency-varying RVS with small samples. Firstly, the estimated indexes are obtained, including the estimated interval, the estimated uncertainty, the estimated value, the estimated error and the estimated reliability. In addition, GBM is applied to estimation for a single test flight of a certain aircraft. Finally, in order to evaluate the estimation performance, GBM is compared with the bootstrap method (BM) and the gray method (GM) in test analysis. The results show that GBM is superior for estimating dynamic signals with small samples, and the estimated reliability is proved to be 100% at the given confidence level.

  10. Effects of LiDAR point density, sampling size and height threshold on estimation accuracy of crop biophysical parameters.

    Science.gov (United States)

    Luo, Shezhou; Chen, Jing M; Wang, Cheng; Xi, Xiaohuan; Zeng, Hongcheng; Peng, Dailiang; Li, Dong

    2016-05-30

    Vegetation leaf area index (LAI), height, and aboveground biomass are key biophysical parameters. Corn is an important and globally distributed crop, and reliable estimations of these parameters are essential for corn yield forecasting, health monitoring and ecosystem modeling. Light Detection and Ranging (LiDAR) is considered an effective technology for estimating vegetation biophysical parameters. However, the estimation accuracies of these parameters are affected by multiple factors. In this study, we first estimated corn LAI, height and biomass (R2 = 0.80, 0.874 and 0.838, respectively) using the original LiDAR data (7.32 points/m2), and the results showed that LiDAR data could accurately estimate these biophysical parameters. Second, comprehensive research was conducted on the effects of LiDAR point density, sampling size and height threshold on the estimation accuracy of LAI, height and biomass. Our findings indicated that LiDAR point density had an important effect on the estimation accuracy for vegetation biophysical parameters, however, high point density did not always produce highly accurate estimates, and reduced point density could deliver reasonable estimation results. Furthermore, the results showed that sampling size and height threshold were additional key factors that affect the estimation accuracy of biophysical parameters. Therefore, the optimal sampling size and the height threshold should be determined to improve the estimation accuracy of biophysical parameters. Our results also implied that a higher LiDAR point density, larger sampling size and height threshold were required to obtain accurate corn LAI estimation when compared with height and biomass estimations. In general, our results provide valuable guidance for LiDAR data acquisition and estimation of vegetation biophysical parameters using LiDAR data.

  11. Generalized likelihood uncertainty estimation (GLUE) using adaptive Markov chain Monte Carlo sampling

    DEFF Research Database (Denmark)

    Blasone, Roberta-Serena; Vrugt, Jasper A.; Madsen, Henrik

    2008-01-01

    In the last few decades hydrologists have made tremendous progress in using dynamic simulation models for the analysis and understanding of hydrologic systems. However, predictions with these models are often deterministic and as such they focus on the most probable forecast, without an explicit … of applications. However, the MC based sampling strategy of the prior parameter space typically utilized in GLUE is not particularly efficient in finding behavioral simulations. This becomes especially problematic for high-dimensional parameter estimation problems, and in the case of complex simulation models … We propose an alternative strategy to determine the value of the cutoff threshold based on the appropriate coverage of the resulting uncertainty bounds. We demonstrate the superiority of this revised GLUE method with three different conceptual watershed models of increasing complexity, using both synthetic …

  12. Influence of Sampling Effort on the Estimated Richness of Road-Killed Vertebrate Wildlife

    Science.gov (United States)

    Bager, Alex; da Rosa, Clarissa A.

    2011-05-01

    Road-killed mammals, birds, and reptiles were collected weekly from highways in southern Brazil in 2002 and 2005. The objective was to assess variation in estimates of road-kill impacts on species richness produced by different sampling efforts, and to provide information to aid in the experimental design of future sampling. Richness observed in weekly samples was compared with sampling for different periods. In each period, the list of road-killed species was evaluated based on estimates of the community structure derived from weekly samplings, and by the presence of the ten species most subject to road mortality, and also of threatened species. Weekly samples were sufficient only for reptiles and mammals, considered separately. Richness estimated from the biweekly samples was equal to that found in the weekly samples, and gave satisfactory results for sampling the most abundant and threatened species. The ten most affected species showed constant road-mortality rates, independent of sampling interval, and also maintained their dominance structure. Birds required greater sampling effort. When the composition of road-killed species varies seasonally, it is necessary to take biweekly samples for a minimum of one year. Weekly or more-frequent sampling for periods longer than two years is necessary to provide a reliable estimate of total species richness.

  13. Soft X-Ray Observations of a Complete Sample of X-Ray--selected BL Lacertae Objects

    Science.gov (United States)

    Perlman, Eric S.; Stocke, John T.; Wang, Q. Daniel; Morris, Simon L.

    1996-01-01

    We present the results of ROSAT PSPC observations of the X-ray selected BL Lacertae objects (XBLs) in the complete Einstein Extended Medium Sensitivity Survey (EMSS) sample. None of the objects is resolved in their respective PSPC images, but all are easily detected. All BL Lac objects in this sample are well-fitted by single power laws. Their X-ray spectra exhibit a variety of spectral slopes, with best-fit energy power-law spectral indices between α = 0.5-2.3. The PSPC spectra of this sample are slightly steeper than those typical of flat radio-spectrum quasars. Because almost all of the individual PSPC spectral indices are equal to or slightly steeper than the overall optical to X-ray spectral indices for these same objects, we infer that BL Lac soft X-ray continua are dominated by steep-spectrum synchrotron radiation from a broad X-ray jet, rather than flat-spectrum inverse Compton radiation linked to the narrower radio/millimeter jet. The softness of the X-ray spectra of these XBLs revives the possibility proposed by Guilbert, Fabian, & McCray (1983) that BL Lac objects are lineless because the circumnuclear gas cannot be heated sufficiently to permit two stable gas phases, the cooler of which would comprise the broad emission-line clouds. Because unified schemes predict that hard self-Compton radiation is beamed only into a small solid angle in BL Lac objects, the steep-spectrum synchrotron tail controls the temperature of the circumnuclear gas at r ≤ 10^18 cm and prevents broad-line cloud formation. We use these new ROSAT data to recalculate the X-ray luminosity function and cosmological evolution of the complete EMSS sample by determining accurate K-corrections for the sample and estimating the effects of variability and the possibility of incompleteness in the sample. Our analysis confirms that XBLs are evolving "negatively," opposite in sense to quasars, with Ve/Va = 0.331±0.060. The statistically significant difference between the values for X…

  14. Models of microbiome evolution incorporating host and microbial selection.

    Science.gov (United States)

    Zeng, Qinglong; Wu, Steven; Sukumaran, Jeet; Rodrigo, Allen

    2017-09-25

    Numerous empirical studies suggest that hosts and microbes exert reciprocal selective effects on their ecological partners. Nonetheless, we still lack an explicit framework to model the dynamics of both hosts and microbes under selection. In a previous study, we developed an agent-based forward-time computational framework to simulate the neutral evolution of host-associated microbial communities in a constant-sized, unstructured population of hosts. These neutral models allowed offspring to sample microbes randomly from parents and/or from the environment. Additionally, the environmental pool of available microbes was constituted by fixed and persistent microbial OTUs and by contributions from host individuals in the preceding generation. In this paper, we extend our neutral models to allow selection to operate on both hosts and microbes. We do this by constructing a phenome for each microbial OTU consisting of a sample of traits that influence host and microbial fitnesses independently. Microbial traits can influence the fitness of hosts ("host selection") and the fitness of microbes ("trait-mediated microbial selection"). Additionally, the fitness effects of traits on microbes can be modified by their hosts ("host-mediated microbial selection"). We simulate the effects of these three types of selection, individually or in combination, on microbiome diversities and the fitnesses of hosts and microbes over several thousand generations of hosts. We show that microbiome diversity is strongly influenced by selection acting on microbes. Selection acting on hosts only influences microbiome diversity when there is near-complete direct or indirect parental contribution to the microbiomes of offspring. Unsurprisingly, microbial fitness increases under microbial selection. Interestingly, when host selection operates, host fitness only increases under two conditions: (1) when there is a strong parental contribution to microbial communities or (2) in the absence of a strong

  15. Estimating the Term Structure With a Semiparametric Bayesian Hierarchical Model: An Application to Corporate Bonds

    Science.gov (United States)

    Cruz-Marcelo, Alejandro; Ensor, Katherine B.; Rosner, Gary L.

    2011-01-01

    The term structure of interest rates is used to price defaultable bonds and credit derivatives, as well as to infer the quality of bonds for risk management purposes. We introduce a model that jointly estimates term structures by means of a Bayesian hierarchical model with a prior probability model based on Dirichlet process mixtures. The modeling methodology borrows strength across term structures for purposes of estimation. The main advantage of our framework is its ability to produce reliable estimators at the company level even when there are only a few bonds per company. After describing the proposed model, we discuss an empirical application in which the term structure of 197 individual companies is estimated. The sample of 197 consists of 143 companies with only one or two bonds. In-sample and out-of-sample tests are used to quantify the improvement in accuracy that results from approximating the term structure of corporate bonds with estimators by company rather than by credit rating, the latter being a popular choice in the financial literature. A complete description of a Markov chain Monte Carlo (MCMC) scheme for the proposed model is available as Supplementary Material. PMID:21765566

  16. A novel heterogeneous training sample selection method on space-time adaptive processing

    Science.gov (United States)

    Wang, Qiang; Zhang, Yongshun; Guo, Yiduo

    2018-04-01

    The performance of ground target detection with space-time adaptive processing (STAP) degrades when the clutter power becomes non-homogeneous because training samples are contaminated by target-like signals. To solve this problem, a novel non-homogeneous training sample selection method based on sample similarity is proposed, which converts the training sample selection into a convex optimization problem. Firstly, the existing deficiencies of sample selection using the generalized inner product (GIP) are analyzed. Secondly, the similarities of different training samples are obtained by calculating the mean-Hausdorff distance so as to reject the contaminated training samples. Thirdly, the cell under test (CUT) and the residual training samples are projected into the orthogonal subspace of the target in the CUT, and mean-Hausdorff distances between the projected CUT and training samples are calculated. Fourthly, the distances are sorted by value and the training samples with the largest values are preferentially selected, realizing the reduced-dimension selection. Finally, simulation results with Mountain-Top data verify the effectiveness of the proposed method.
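    As a generic illustration of the similarity-ranking step only (not the paper's full GIP/orthogonal-projection pipeline), the sketch below computes a mean (modified) Hausdorff distance between training samples that are represented, purely as an assumption here, as sets of 2-D points, and rejects the samples least similar to the rest.

```python
import numpy as np

def mean_hausdorff(A, B):
    """Modified (mean) Hausdorff distance between point sets A (m x d) and B (n x d)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())

# hypothetical training snapshots, each turned into a set of 2-D points
# (e.g. real and imaginary parts of its space-time elements)
rng = np.random.default_rng(0)
clean = [rng.standard_normal((64, 2)) for _ in range(18)]
contaminated = [rng.standard_normal((64, 2)) + np.array([3.0, 0.0])
                for _ in range(2)]                      # target-like outliers
samples = clean + contaminated

n = len(samples)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = mean_hausdorff(samples[i], samples[j])

# rank samples by average dissimilarity to the rest; the least similar ones
# are rejected before estimating the clutter covariance
avg = D.sum(axis=1) / (n - 1)
keep = np.argsort(avg)[: n - 2]
print(np.sort(keep))     # the two contaminated samples (indices 18, 19) are dropped
```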

  17. Estimated Mortality of Selected Migratory Bird Species from Mowing and Other Mechanical Operations in Canadian Agriculture

    Directory of Open Access Journals (Sweden)

    Joerg Tews

    2013-12-01

    Full Text Available Mechanical operations such as mowing, tilling, seeding, and harvesting are well-known sources of direct avian mortality in agricultural fields. However, there are currently no mortality rate estimates available for any species group or larger jurisdiction. Even reviews of sources of mortality in birds have failed to address mechanical disturbance in farm fields. To overcome this information gap we provide estimates of total mortality rates by mechanical operations for five selected species across Canada. In our step-by-step modeling approach we (i) quantified the amount of various types of agricultural land in each Bird Conservation Region (BCR) in Canada, (ii) estimated population densities by region and agricultural habitat type for each selected species, (iii) estimated the average timing of mechanical agricultural activities, egg laying, and fledging, and (iv) used these values and additional demographic parameters to derive estimates of total mortality by species within each BCR. Based on our calculations the total annual estimated incidental take of young ranged from ~138,000 for Horned Lark (Eremophila alpestris) to as much as ~941,000 for Savannah Sparrow (Passerculus sandwichensis). Net losses to the fall flight of birds, i.e., those birds that would have fledged successfully in the absence of mechanical disturbance, were, for example, ~321,000 for Bobolink (Dolichonyx oryzivorus) and ~483,000 for Savannah Sparrow. Although our estimates are subject to an unknown degree of uncertainty, this assessment is a very important first step because it provides a broad estimate of incidental take for a set of species that may be particularly vulnerable to mechanical operations and a starting point for future refinements of model parameters if and when they become available.

  18. Estimating rare events in biochemical systems using conditional sampling

    Science.gov (United States)

    Sundar, V. S.

    2017-01-01

    The paper focuses on the development of variance reduction strategies to estimate rare events in biochemical systems. Obtaining such rare-event probabilities using brute-force Monte Carlo simulations in conjunction with the stochastic simulation algorithm (Gillespie's method) is computationally prohibitive. To circumvent this, importance sampling tools such as the weighted stochastic simulation algorithm and the doubly weighted stochastic simulation algorithm have been proposed. However, these strategies require an additional step of determining the important region to sample from, which is not straightforward for most problems. In this paper, we apply the subset simulation method, developed as a variance reduction tool in the context of structural engineering, to the problem of rare event estimation in biochemical systems. The main idea is that the rare event probability is expressed as a product of more frequent conditional probabilities. These conditional probabilities are estimated with high accuracy using Monte Carlo simulations, specifically the Markov chain Monte Carlo method with the modified Metropolis-Hastings algorithm. Generating sample realizations of the state vector using the stochastic simulation algorithm is viewed as mapping the discrete-state continuous-time random process to the standard normal random variable vector. This viewpoint opens up the possibility of applying more sophisticated and efficient sampling schemes developed elsewhere to problems in stochastic chemical kinetics. The results obtained using the subset simulation method are compared with existing variance reduction strategies for a few benchmark problems, and a satisfactory improvement in computational time is demonstrated.
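    The sketch below illustrates the subset simulation idea on a toy problem (the limit-state function, the level probability p0, and the component-wise proposal width are all assumptions, not taken from the paper): the rare-event probability is built up as a product of conditional level probabilities, with each conditional level sampled by a modified Metropolis chain.

```python
import numpy as np

def g(x):
    # toy limit-state function: the rare event is {g(X) > b} for X ~ N(0, I)
    return x.sum(axis=-1)

def subset_simulation(b, dim=10, n_per_level=1000, p0=0.1, max_levels=20, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_per_level, dim))      # level 0: plain Monte Carlo
    y = g(x)
    prob = 1.0
    for _ in range(max_levels):
        thresh = np.quantile(y, 1.0 - p0)            # adaptive intermediate threshold
        if thresh >= b:                              # final level reached
            return prob * np.mean(y > b)
        prob *= p0
        seeds = x[y > thresh]                        # chains start from samples above the threshold
        n_steps = n_per_level // len(seeds)
        xs, ys = [], []
        for s in seeds:
            cur, cur_y = s.copy(), g(s)
            for _ in range(n_steps):
                cand = cur + rng.uniform(-1.0, 1.0, size=dim)    # component-wise proposal
                ratio = np.exp(0.5 * (cur ** 2 - cand ** 2))     # standard normal density ratio
                cand = np.where(rng.random(dim) < np.minimum(1.0, ratio), cand, cur)
                cand_y = g(cand)
                if cand_y > thresh:                  # keep the move only if it stays in the level
                    cur, cur_y = cand, cand_y
                xs.append(cur.copy())
                ys.append(cur_y)
        x, y = np.array(xs), np.array(ys)
    return prob * np.mean(y > b)

print(subset_simulation(b=15.0))   # exact answer: 1 - Phi(15 / sqrt(10)), roughly 1e-06
```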

  19. Measurement error correction in the least absolute shrinkage and selection operator model when validation data are available.

    Science.gov (United States)

    Vasquez, Monica M; Hu, Chengcheng; Roe, Denise J; Halonen, Marilyn; Guerra, Stefano

    2017-01-01

    Measurement of serum biomarkers by multiplex assays may be more variable as compared to single biomarker assays. Measurement error in these data may bias parameter estimates in regression analysis, which could mask true associations of serum biomarkers with an outcome. The Least Absolute Shrinkage and Selection Operator (LASSO) can be used for variable selection in these high-dimensional data. Furthermore, when the distribution of measurement error is assumed to be known or estimated with replication data, a simple measurement error correction method can be applied to the LASSO method. However, in practice the distribution of the measurement error is unknown and is expensive to estimate through replication, both in monetary cost and in the need for a greater amount of sample material, which is often limited in quantity. We adapt an existing bias correction approach by estimating the measurement error using validation data in which a subset of serum biomarkers is re-measured on a random subset of the study sample. We evaluate this method using simulated data and data from the Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD). We show that the bias in parameter estimation is reduced and variable selection is improved.
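    The following sketch shows one simple way to use validation re-measurements, via regression calibration followed by an ordinary cross-validated LASSO; it is a simplified stand-in for the bias-correction approach adapted in the paper, and the variable names, simulated data, and method-of-moments reliability estimate are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def reliability_from_validation(w1_val, w2_val):
    """Per-biomarker reliability ratio lambda_j = var(true) / var(observed),
    estimated from paired re-measurements on the validation subset."""
    err_var = 0.5 * np.var(w1_val - w2_val, axis=0, ddof=1)   # measurement error variance
    tot_var = np.var(np.vstack([w1_val, w2_val]), axis=0, ddof=1)
    return np.clip(1.0 - err_var / tot_var, 0.0, 1.0)

def calibrated_lasso(W, y, lam):
    """Shrink each noisy biomarker toward its mean by its reliability ratio
    (regression calibration), then fit a cross-validated LASSO."""
    W_cal = W.mean(axis=0) + lam * (W - W.mean(axis=0))
    return LassoCV(cv=5).fit(W_cal, y)

# hypothetical data: W is the error-prone biomarker matrix for the full study;
# W1_val / W2_val are repeat measurements on a random validation subset
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))                        # unobserved true biomarkers
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(300)
W = X + 0.5 * rng.standard_normal(X.shape)                # observed with error
idx = rng.choice(300, size=60, replace=False)
W1_val = W[idx]
W2_val = X[idx] + 0.5 * rng.standard_normal((60, 20))     # re-measurement of the subset

model = calibrated_lasso(W, y, reliability_from_validation(W1_val, W2_val))
print(np.flatnonzero(model.coef_))                        # indices of selected biomarkers
```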

  20. Estimation for small domains in double sampling for stratification ...

    African Journals Online (AJOL)

    In this article, we investigate the effect of randomness of the size of a small domain on the precision of an estimator of mean for the domain under double sampling for stratification. The result shows that for a small domain that cuts across various strata with unknown weights, the sampling variance depends on the within ...

  1. Sampling strategies for efficient estimation of tree foliage biomass

    Science.gov (United States)

    Hailemariam Temesgen; Vicente Monleon; Aaron Weiskittel; Duncan Wilson

    2011-01-01

    Conifer crowns can be highly variable both within and between trees, particularly with respect to foliage biomass and leaf area. A variety of sampling schemes have been used to estimate biomass and leaf area at the individual tree and stand scales. Rarely has the effectiveness of these sampling schemes been compared across stands or even across species. In addition,...

  2. A method for estimating radioactive cesium concentrations in cattle blood using urine samples.

    Science.gov (United States)

    Sato, Itaru; Yamagishi, Ryoma; Sasaki, Jun; Satoh, Hiroshi; Miura, Kiyoshi; Kikuchi, Kaoru; Otani, Kumiko; Okada, Keiji

    2017-12-01

    In the region contaminated by the Fukushima nuclear accident, radioactive contamination of live cattle should be checked before slaughter. In this study, we establish a precise method for estimating radioactive cesium concentrations in cattle blood using urine samples. Blood and urine samples were collected from a total of 71 cattle on two farms in the 'difficult-to-return zone'. Urine ¹³⁷Cs, specific gravity, electrical conductivity, pH, sodium, potassium, calcium, and creatinine were measured and various estimation methods for blood ¹³⁷Cs were tested. The average error rate of the estimation was 54.2% without correction. Correcting for urine creatinine, specific gravity, electrical conductivity, or potassium improved the precision of the estimation. Correcting for specific gravity using the following formula gave the most precise estimate (average error rate = 16.9%): [blood ¹³⁷Cs] = [urinary ¹³⁷Cs]/([specific gravity] - 1)/329. Urine samples are faster to measure than blood samples because urine can be obtained in larger quantities and has a higher ¹³⁷Cs concentration than blood. These advantages of urine and the estimation precision demonstrated in our study indicate that estimation of blood ¹³⁷Cs using urine samples is a practical means of monitoring radioactive contamination in live cattle. © 2017 Japanese Society of Animal Science.
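    As a direct transcription of the specific-gravity correction quoted above, a helper like the one below could be used (the function name and the assumption that both activities are expressed in the same units, e.g. Bq/kg, are ours, and the example values are hypothetical):

```python
def blood_cs137_from_urine(urinary_cs137, specific_gravity):
    """Estimate blood 137Cs activity from a urine sample using the
    specific-gravity correction reported above:
    [blood 137Cs] = [urinary 137Cs] / (specific gravity - 1) / 329
    Both activities are assumed to be in the same units (e.g. Bq/kg)."""
    return urinary_cs137 / (specific_gravity - 1.0) / 329.0

# example: urine at 500 Bq/kg with specific gravity 1.030
print(blood_cs137_from_urine(500.0, 1.030))   # ~50.7 Bq/kg
```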

  3. A model of directional selection applied to the evolution of drug resistance in HIV-1.

    Science.gov (United States)

    Seoighe, Cathal; Ketwaroo, Farahnaz; Pillay, Visva; Scheffler, Konrad; Wood, Natasha; Duffet, Rodger; Zvelebil, Marketa; Martinson, Neil; McIntyre, James; Morris, Lynn; Hide, Winston

    2007-04-01

    Understanding how pathogens acquire resistance to drugs is important for the design of treatment strategies, particularly for rapidly evolving viruses such as HIV-1. Drug treatment can exert strong selective pressures and sites within targeted genes that confer resistance frequently evolve far more rapidly than the neutral rate. Rapid evolution at sites that confer resistance to drugs can be used to help elucidate the mechanisms of evolution of drug resistance and to discover or corroborate novel resistance mutations. We have implemented standard maximum likelihood methods that are used to detect diversifying selection and adapted them for use with serially sampled reverse transcriptase (RT) coding sequences isolated from a group of 300 HIV-1 subtype C-infected women before and after single-dose nevirapine (sdNVP) to prevent mother-to-child transmission. We have also extended the standard models of codon evolution for application to the detection of directional selection. Through simulation, we show that the directional selection model can provide a substantial improvement in sensitivity over models of diversifying selection. Five of the sites within the RT gene that are known to harbor mutations that confer resistance to nevirapine (NVP) strongly supported the directional selection model. There was no evidence that other mutations that are known to confer NVP resistance were selected in this cohort. The directional selection model, applied to serially sampled sequences, also had more power than the diversifying selection model to detect selection resulting from factors other than drug resistance. Because inference of selection from serial samples is unlikely to be adversely affected by recombination, the methods we describe may have general applicability to the analysis of positive selection affecting recombining coding sequences when serially sampled data are available.

  4. The importance of estimating selection bias on prevalence estimates shortly after a disaster.

    NARCIS (Netherlands)

    Grievink, Linda; Velden, Peter G van der; Yzermans, C Joris; Roorda, Jan; Stellato, Rebecca K

    2006-01-01

    PURPOSE: The aim was to study selective participation and its effect on prevalence estimates in a health survey of affected residents 3 weeks after a man-made disaster in The Netherlands (May 13, 2000). METHODS: All affected adult residents were invited to participate. Survey (questionnaire) data

  5. The importance of estimating selection bias on prevalence estimates, shortly after a disaster.

    NARCIS (Netherlands)

    Grievink, L.; Velden, P.G. van der; Yzermans, C.J.; Roorda, J.; Stellato, R.K.

    2006-01-01

    PURPOSE: The aim was to study selective participation and its effect on prevalence estimates in a health survey of affected residents 3 weeks after a man-made disaster in The Netherlands (May 13, 2000). METHODS: All affected adult residents were invited to participate. Survey (questionnaire) data

  6. Estimating Model Probabilities using Thermodynamic Markov Chain Monte Carlo Methods

    Science.gov (United States)

    Ye, M.; Liu, P.; Beerli, P.; Lu, D.; Hill, M. C.

    2014-12-01

    Markov chain Monte Carlo (MCMC) methods are widely used to evaluate model probability for quantifying model uncertainty. In a general procedure, MCMC simulations are first conducted for each individual model, and MCMC parameter samples are then used to approximate the marginal likelihood of the model by calculating the geometric mean of the joint likelihood of the model and its parameters. It has been found that the geometric mean method suffers from the numerical problem of a low convergence rate. A simple test case shows that even millions of MCMC samples are insufficient to yield an accurate estimate of the marginal likelihood. To resolve this problem, a thermodynamic method is used to conduct multiple MCMC runs with different values of a heating coefficient between zero and one. When the heating coefficient is zero, the MCMC run is equivalent to a random walk MC in the prior parameter space; when the heating coefficient is one, the MCMC run is the conventional one. For a simple case with an analytical form of the marginal likelihood, the thermodynamic method yields a more accurate estimate than the geometric mean method. This is also demonstrated for a case of groundwater modeling with consideration of four alternative models postulated based on different conceptualizations of a confining layer. This groundwater example shows that model probabilities estimated using the thermodynamic method are more reasonable than those obtained using the geometric method. The thermodynamic method is general, and can be used for a wide range of environmental problems for model uncertainty quantification.
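    A minimal sketch of the thermodynamic (power-posterior) approach is given below for a toy conjugate model where the marginal likelihood is known in closed form, so the estimate can be checked; the model, the grid of heating coefficients, and the Metropolis settings are assumptions, not the groundwater application of the abstract.

```python
import numpy as np

# Toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 1); the exact log
# marginal likelihood is available, so the thermodynamic estimate can be checked.
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=20)
n, ybar = len(y), y.mean()

def log_like(theta):
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * n * np.log(2 * np.pi)

def log_prior(theta):
    return -0.5 * theta ** 2 - 0.5 * np.log(2 * np.pi)

def mean_loglike_at(beta, n_iter=20000, step=0.5):
    """Random-walk Metropolis targeting prior * likelihood**beta; returns the
    posterior mean of the log-likelihood at this heating coefficient."""
    theta, cur = 0.0, log_prior(0.0) + beta * log_like(0.0)
    samples = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal()
        cand = log_prior(prop) + beta * log_like(prop)
        if np.log(rng.random()) < cand - cur:
            theta, cur = prop, cand
        samples.append(log_like(theta))
    return np.mean(samples[n_iter // 2:])            # discard burn-in

betas = np.linspace(0.0, 1.0, 11)                    # heating coefficients from prior to posterior
m = np.array([mean_loglike_at(b) for b in betas])
log_ml_ti = np.sum(np.diff(betas) * (m[1:] + m[:-1]) / 2.0)   # trapezoid rule over beta

log_ml_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1)
                - 0.5 * np.sum((y - ybar) ** 2) - 0.5 * n * ybar ** 2 / (n + 1))
print(log_ml_ti, log_ml_exact)   # should agree to within a few tenths (MC noise + discretization)
```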

  7. Evaluation of pump pulsation in respirable size-selective sampling: part II. Changes in sampling efficiency.

    Science.gov (United States)

    Lee, Eun Gyung; Lee, Taekhee; Kim, Seung Won; Lee, Larry; Flemmer, Michael M; Harper, Martin

    2014-01-01

    This second, and concluding, part of this study evaluated changes in sampling efficiency of respirable size-selective samplers due to air pulsations generated by the selected personal sampling pumps characterized in Part I (Lee E, Lee L, Möhlmann C et al. Evaluation of pump pulsation in respirable size-selective sampling: Part I. Pulsation measurements. Ann Occup Hyg 2013). Nine particle sizes of monodisperse ammonium fluorescein (from 1 to 9 μm mass median aerodynamic diameter) were generated individually by a vibrating orifice aerosol generator from dilute solutions of fluorescein in aqueous ammonia and then injected into an environmental chamber. To collect these particles, 10-mm nylon cyclones, also known as Dorr-Oliver (DO) cyclones, were used with five medium volumetric flow rate pumps. Those were the Apex IS, HFS513, GilAir5, Elite5, and Basic5 pumps, which were found in Part I to generate pulsations of 5% (the lowest), 25%, 30%, 56%, and 70% (the highest), respectively. GK2.69 cyclones were used with the Legacy [pump pulsation (PP) = 15%] and Elite12 (PP = 41%) pumps for collection at high flows. The DO cyclone was also used to evaluate changes in sampling efficiency due to pulse shape. The HFS513 pump, which generates a more complex pulse shape, was compared to a single sine wave fluctuation generated by a piston. The luminescent intensity of the fluorescein extracted from each sample was measured with a luminescence spectrometer. Sampling efficiencies were obtained by dividing the intensity of the fluorescein extracted from the filter placed in a cyclone with the intensity obtained from the filter used with a sharp-edged reference sampler. Then, sampling efficiency curves were generated using a sigmoid function with three parameters and each sampling efficiency curve was compared to that of the reference cyclone by constructing bias maps. In general, no change in sampling efficiency (bias under ±10%) was observed until pulsations exceeded 25% for the
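    For reference, a three-parameter sigmoid of the kind mentioned above can be fitted with a routine least-squares call; the functional form, parameter names, and the illustrative efficiency values below are assumptions, not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(d, a, b, c):
    # a: upper asymptote, b: diameter at half of a (cut-point), c: steepness
    return a / (1.0 + np.exp((d - b) / c))

# hypothetical sampling efficiencies for nine monodisperse sizes (1-9 um MMAD)
diam = np.arange(1.0, 10.0)
eff = np.array([0.97, 0.95, 0.88, 0.70, 0.45, 0.22, 0.10, 0.05, 0.02])

params, _ = curve_fit(sigmoid, diam, eff, p0=[1.0, 4.0, 1.0])
# a bias map entry would compare this fitted curve against the reference
# sampler's curve at the same diameters (reference data omitted here)
print(np.round(params, 3))
```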

  8. Modeling and Solving the Liner Shipping Service Selection Problem

    DEFF Research Database (Denmark)

    Karsten, Christian Vad; Balakrishnan, Anant

    We address a tactical planning problem, the Liner Shipping Service Selection Problem (LSSSP), facing container shipping companies. Given estimated demand between various ports, the LSSSP entails selecting the best subset of non-simple cyclic sailing routes from a given pool of candidate routes...... to accurately model transshipment costs and incorporate routing policies such as maximum transit time, maritime cabotage rules, and operational alliances. Our hop-indexed arc flow model is smaller and easier to solve than path flow models. We outline a preprocessing procedure that exploits both the routing...... requirements and the hop limits to reduce problem size, and describe techniques to accelerate the solution procedure. We present computational results for realistic problem instances from the benchmark suite LINER-LIB....

  9. 6. Label-free selective plane illumination microscopy of tissue samples

    Directory of Open Access Journals (Sweden)

    Muteb Alharbi

    2017-10-01

    Conclusion: Overall, this method meets current needs for 3D imaging of tissue samples in a label-free manner. Label-free Selective Plane Microscopy directly provides excellent information about the structure of the tissue samples. This work has highlighted the superiority of Label-free Selective Plane Microscopy over current approaches to label-free 3D imaging of tissue.

  10. Three-dimensional reconstruction of highly complex microscopic samples using scanning electron microscopy and optical flow estimation.

    Directory of Open Access Journals (Sweden)

    Ahmadreza Baghaie

    Full Text Available The Scanning Electron Microscope (SEM), as one of the major research and industrial instruments for imaging of micro-scale samples and surfaces, has gained extensive attention since its emergence. However, the acquired micrographs still remain two-dimensional (2D). In the current work a novel and highly accurate approach is proposed to recover the hidden third dimension by use of multi-view image acquisition of the microscopic samples combined with pre/post-processing steps including sparse feature-based stereo rectification, nonlocal-based optical flow estimation for dense matching and finally depth estimation. Employing the proposed approach, three-dimensional (3D) reconstructions of highly complex microscopic samples were achieved to facilitate the interpretation of topology and geometry of surface/shape attributes of the samples. As a byproduct of the proposed approach, high-definition 3D printed models of the samples can be generated as a tangible means of physical understanding. Extensive comparisons with the state-of-the-art reveal the strength and superiority of the proposed method in uncovering the details of the highly complex microscopic samples.

  11. Sampling in ecology and evolution - bridging the gap between theory and practice

    Science.gov (United States)

    Albert, C.H.; Yoccoz, N.G.; Edwards, T.C.; Graham, C.H.; Zimmermann, N.E.; Thuiller, W.

    2010-01-01

    Sampling is a key issue for answering most ecological and evolutionary questions. The importance of developing a rigorous sampling design tailored to specific questions has already been discussed in the ecological and sampling literature and has provided useful tools and recommendations to sample and analyse ecological data. However, sampling issues are often difficult to overcome in ecological studies due to apparent inconsistencies between theory and practice, often leading to the implementation of simplified sampling designs that suffer from unknown biases. Moreover, we believe that classical sampling principles which are based on estimation of means and variances are insufficient to fully address many ecological questions that rely on estimating relationships between a response and a set of predictor variables over time and space. Our objective is thus to highlight the importance of selecting an appropriate sampling space and an appropriate sampling design. We also emphasize the importance of using prior knowledge of the study system to estimate models or complex parameters and thus better understand ecological patterns and processes generating these patterns. Using a semi-virtual simulation study as an illustration we reveal how the selection of the space (e.g. geographic, climatic), in which the sampling is designed, influences the patterns that can be ultimately detected. We also demonstrate the inefficiency of common sampling designs to reveal response curves between ecological variables and climatic gradients. Further, we show that response-surface methodology, which has rarely been used in ecology, is much more efficient than more traditional methods. Finally, we discuss the use of prior knowledge, simulation studies and model-based designs in defining appropriate sampling designs. We conclude by a call for development of methods to unbiasedly estimate nonlinear ecologically relevant parameters, in order to make inferences while fulfilling requirements of

  12. Analytical solutions to sampling effects in drop size distribution measurements during stationary rainfall: Estimation of bulk rainfall variables

    NARCIS (Netherlands)

    Uijlenhoet, R.; Porrà, J.M.; Sempere Torres, D.; Creutin, J.D.

    2006-01-01

    A stochastic model of the microstructure of rainfall is used to derive explicit expressions for the magnitude of the sampling fluctuations in rainfall properties estimated from raindrop size measurements in stationary rainfall. The model is a marked point process, in which the points represent the

  13. Environmental DNA (eDNA) sampling improves occurrence and detection estimates of invasive burmese pythons.

    Directory of Open Access Journals (Sweden)

    Margaret E Hunter

    Full Text Available Environmental DNA (eDNA) methods are used to detect DNA that is shed into the aquatic environment by cryptic or low density species. Applied in eDNA studies, occupancy models can be used to estimate occurrence and detection probabilities and thereby account for imperfect detection. However, occupancy terminology has been applied inconsistently in eDNA studies, and many have calculated occurrence probabilities while not considering the effects of imperfect detection. Low detection of invasive giant constrictors using visual surveys and traps has hampered the estimation of occupancy and detection estimates needed for population management in southern Florida, USA. Giant constrictor snakes pose a threat to native species and the ecological restoration of the Florida Everglades. To assist with detection, we developed species-specific eDNA assays using quantitative PCR (qPCR) for the Burmese python (Python molurus bivittatus), Northern African python (P. sebae), boa constrictor (Boa constrictor), and the green (Eunectes murinus) and yellow anaconda (E. notaeus). Burmese pythons, Northern African pythons, and boa constrictors are established and reproducing, while the green and yellow anaconda have the potential to become established. We validated the python and boa constrictor assays using laboratory trials and tested all species in 21 field locations distributed in eight southern Florida regions. Burmese python eDNA was detected in 37 of 63 field sampling events; however, the other species were not detected. Although eDNA was heterogeneously distributed in the environment, occupancy models were able to provide the first estimates of detection probabilities, which were greater than 91%. Burmese python eDNA was detected along the leading northern edge of the known population boundary. The development of informative detection tools and eDNA occupancy models can improve conservation efforts in southern Florida and support more extensive studies of invasive

  14. Environmental DNA (eDNA) sampling improves occurrence and detection estimates of invasive burmese pythons.

    Science.gov (United States)

    Hunter, Margaret E; Oyler-McCance, Sara J; Dorazio, Robert M; Fike, Jennifer A; Smith, Brian J; Hunter, Charles T; Reed, Robert N; Hart, Kristen M

    2015-01-01

    Environmental DNA (eDNA) methods are used to detect DNA that is shed into the aquatic environment by cryptic or low density species. Applied in eDNA studies, occupancy models can be used to estimate occurrence and detection probabilities and thereby account for imperfect detection. However, occupancy terminology has been applied inconsistently in eDNA studies, and many have calculated occurrence probabilities while not considering the effects of imperfect detection. Low detection of invasive giant constrictors using visual surveys and traps has hampered the estimation of occupancy and detection estimates needed for population management in southern Florida, USA. Giant constrictor snakes pose a threat to native species and the ecological restoration of the Florida Everglades. To assist with detection, we developed species-specific eDNA assays using quantitative PCR (qPCR) for the Burmese python (Python molurus bivittatus), Northern African python (P. sebae), boa constrictor (Boa constrictor), and the green (Eunectes murinus) and yellow anaconda (E. notaeus). Burmese pythons, Northern African pythons, and boa constrictors are established and reproducing, while the green and yellow anaconda have the potential to become established. We validated the python and boa constrictor assays using laboratory trials and tested all species in 21 field locations distributed in eight southern Florida regions. Burmese python eDNA was detected in 37 of 63 field sampling events; however, the other species were not detected. Although eDNA was heterogeneously distributed in the environment, occupancy models were able to provide the first estimates of detection probabilities, which were greater than 91%. Burmese python eDNA was detected along the leading northern edge of the known population boundary. The development of informative detection tools and eDNA occupancy models can improve conservation efforts in southern Florida and support more extensive studies of invasive constrictors

  15. Development of a sampling strategy and sample size calculation to estimate the distribution of mammographic breast density in Korean women.

    Science.gov (United States)

    Jun, Jae Kwan; Kim, Mi Jin; Choi, Kui Son; Suh, Mina; Jung, Kyu-Won

    2012-01-01

    Mammographic breast density is a known risk factor for breast cancer. To conduct a survey to estimate the distribution of mammographic breast density in Korean women, appropriate sampling strategies for representative and efficient sampling design were evaluated through simulation. Using the target population from the National Cancer Screening Programme (NCSP) for breast cancer in 2009, we verified the distribution estimate by repeating the simulation 1,000 times using stratified random sampling to investigate the distribution of breast density of 1,340,362 women. According to the simulation results, using a sampling design stratifying the nation into three groups (metropolitan, urban, and rural), with a total sample size of 4,000, we estimated the distribution of breast density in Korean women at a level of 0.01% tolerance. Based on the results of our study, a nationwide survey for estimating the distribution of mammographic breast density among Korean women can be conducted efficiently.
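    The simulation logic can be sketched as below; the stratum sizes, the four-category density proportions, and the proportional allocation are placeholders (assumptions), not the NCSP figures, and the point is only to show how repeating a stratified draw 1,000 times lets one check the tolerance of the estimated distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical sampling frame: stratum size and density-category proportions
frame = {
    "metropolitan": (700_000, [0.08, 0.35, 0.42, 0.15]),
    "urban":        (450_000, [0.10, 0.38, 0.40, 0.12]),
    "rural":        (190_362, [0.12, 0.40, 0.38, 0.10]),
}
N = sum(size for size, _ in frame.values())
n_sample = 4000

def one_survey():
    est = np.zeros(4)
    for size, probs in frame.values():
        n_h = int(round(n_sample * size / N))     # proportional allocation
        # multinomial draw used as a stand-in for simple random sampling within the stratum
        counts = rng.multinomial(n_h, probs)
        est += (size / N) * counts / n_h          # stratified estimator of the distribution
    return est

reps = np.array([one_survey() for _ in range(1000)])
truth = sum((size / N) * np.asarray(p) for size, p in frame.values())
print(np.abs(reps - truth).max())                 # worst-case absolute estimation error
```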

  16. Volume estimation of Pterogyne nitens in pure plantation in the southwest of Bahia

    Directory of Open Access Journals (Sweden)

    Magno Pacheco Fraga

    2014-09-01

    Full Text Available The knowledge of wood volume is essential to determine the logging productive potential of a forest plantation. However, as this variable is not easily measurable, it must be obtained by estimation. This study aims to select volumetric models and analyze the efficiency of three different methods of estimating wood volume (form factor, form quotient and adjusted volumetric equation) for the species Pterogyne nitens Tul. in pure plantation in Vitoria da Conquista, State of Bahia, Brazil. The sampled trees were felled and rigorously cubed, and eight volumetric models were adjusted. The best models were selected based on the weighted score of the statistical parameters and on the residue distribution. The Stoate model presented the best performance for estimating the bole volume and the total wood volume of Pterogyne nitens with bark. Nevertheless, for the bole volume, the Koperzky and Gehrhardt model presented estimates similar to Stoate's, and is also indicated for this species. Among the methods used to estimate volume, the use of adjusted volumetric models is recommended.

  17. A predictor-corrector algorithm to estimate the fractional flow in oil-water models

    International Nuclear Information System (INIS)

    Savioli, Gabriela B; Berdaguer, Elena M Fernandez

    2008-01-01

    We introduce a predictor-corrector algorithm to estimate parameters in a nonlinear hyperbolic problem. It can be used to estimate the oil-fractional flow function from the Buckley-Leverett equation. The forward model is non-linear: the sought- for parameter is a function of the solution of the equation. Traditionally, the estimation of functions requires the selection of a fitting parametric model. The algorithm that we develop does not require a predetermined parameter model. Therefore, the estimation problem is carried out over a set of parameters which are functions. The algorithm is based on the linearization of the parameter-to-output mapping. This technique is new in the field of nonlinear estimation. It has the advantage of laying aside parametric models. The algorithm is iterative and is of predictor-corrector type. We present theoretical results on the inverse problem. We use synthetic data to test the new algorithm.

  18. Replication Variance Estimation under Two-phase Sampling in the Presence of Non-response

    Directory of Open Access Journals (Sweden)

    Muqaddas Javed

    2014-09-01

    Full Text Available Kim and Yu (2011) discussed a replication variance estimator for two-phase stratified sampling. In this paper, estimators of the mean are proposed for two-phase stratified sampling under different situations of non-response at the first and second phases. The expressions for the variances of these estimators have been derived. Furthermore, replication-based jackknife variance estimators of these variances have also been derived. A simulation study has been conducted to investigate the performance of the suggested estimators.

  19. truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Maria Karlsson

    2014-05-01

    Full Text Available Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.

  20. Evaluating the performance of simple estimators for probit models with two dummy endogenous regressors

    DEFF Research Database (Denmark)

    Holm, Anders; Nielsen, Jacob Arendt

    2013-01-01

    This study considers the small sample performance of approximate but simple two-stage estimators for probit models with two endogenous binary covariates. Monte Carlo simulations show that all the considered estimators, including the simulated maximum-likelihood (SML) estimation, of the trivariate ...

  1. Randomization of grab-sampling strategies for estimating the annual exposure of U miners to Rn daughters.

    Science.gov (United States)

    Borak, T B

    1986-04-01

    Periodic grab sampling in combination with time-of-occupancy surveys has been the accepted procedure for estimating the annual exposure of underground U miners to Rn daughters. Temporal variations in the concentration of potential alpha energy in the mine generate uncertainties in this process. A system to randomize the selection of locations for measurement is described which can reduce uncertainties and eliminate systematic biases in the data. In general, a sample frequency of 50 measurements per year is sufficient to satisfy the criterion that the annual exposure be determined in working level months to within +/- 50% of the true value with a 95% level of confidence. Suggestions for implementing this randomization scheme are presented.

  2. Cost-Sensitive Estimation of ARMA Models for Financial Asset Return Data

    Directory of Open Access Journals (Sweden)

    Minyoung Kim

    2015-01-01

    Full Text Available The autoregressive moving average (ARMA) model is a simple but powerful model in financial engineering to represent time-series with long-range statistical dependency. However, the traditional maximum likelihood (ML) estimator aims to minimize a loss function that is inherently symmetric due to Gaussianity. The consequence is that when the data of interest are asset returns, and the main goal is to maximize profit by accurate forecasting, the ML objective may be less appropriate, potentially leading to a suboptimal solution. Rather, it is more reasonable to adopt an asymmetric loss where the model's prediction, as long as it is in the same direction as the true return, is penalized less than the prediction in the opposite direction. We propose a quite sensible asymmetric cost-sensitive loss function and incorporate it into the ARMA model estimation. On the online portfolio selection problem with real stock return data, we demonstrate that the investment strategy based on predictions by the proposed estimator can be significantly more profitable than the traditional ML estimator.
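    A sketch of the idea, with an ARMA(1,1) recursion and one plausible asymmetric loss (the penalty factor, optimizer, and simulated returns are assumptions, not the paper's specification), is given below.

```python
import numpy as np
from scipy.optimize import minimize

def arma11_predictions(params, r):
    """One-step-ahead ARMA(1,1) predictions built recursively from past
    returns and past prediction errors."""
    c, phi, theta = params
    eps = np.zeros_like(r)
    pred = np.zeros_like(r)
    for t in range(1, len(r)):
        pred[t] = c + phi * r[t - 1] + theta * eps[t - 1]
        eps[t] = r[t] - pred[t]
    return pred

def asymmetric_loss(params, r, k=3.0):
    # squared error, but direction misses are penalized k times more heavily
    pred = arma11_predictions(params, r)
    err = (r - pred)[1:]
    wrong_direction = np.sign(pred[1:]) != np.sign(r[1:])
    return np.mean(np.where(wrong_direction, k, 1.0) * err ** 2)

rng = np.random.default_rng(0)
r = 0.0005 + 0.01 * rng.standard_normal(500)       # hypothetical daily returns

fit = minimize(asymmetric_loss, x0=[0.0, 0.1, 0.1], args=(r,), method="Nelder-Mead")
print(fit.x)                                       # [c, phi, theta] under the asymmetric loss
```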

  3. Blind CP-OFDM and ZP-OFDM Parameter Estimation in Frequency Selective Channels

    Directory of Open Access Journals (Sweden)

    Vincent Le Nir

    2009-01-01

    Full Text Available A cognitive radio system needs accurate knowledge of the radio spectrum it operates in. Blind modulation recognition techniques have been proposed to discriminate between single-carrier and multicarrier modulations and to estimate their parameters. Some powerful techniques use autocorrelation- and cyclic autocorrelation-based features of the transmitted signal and apply to OFDM signals using a Cyclic Prefix time guard interval (CP-OFDM). In this paper, we propose a blind parameter estimation technique based on a power autocorrelation feature applying to OFDM signals using a Zero Padding time guard interval (ZP-OFDM), which in particular excludes the use of the autocorrelation- and cyclic autocorrelation-based techniques. The proposed technique leads to an efficient estimation of the symbol duration and zero padding duration in frequency selective channels, and is insensitive to receiver phase and frequency offsets. Simulation results are given for WiMAX and WiMedia signals using realistic Stanford University Interim (SUI) and Ultra-Wideband (UWB) IEEE 802.15.4a channel models, respectively.
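    To illustrate the power-autocorrelation feature on a synthetic signal (the block sizes, noise level, and record length below are assumptions, and this flat-channel toy omits the frequency-selective channels of the paper), one can exploit the fact that the zero-padding pattern makes the instantaneous power periodic with the total symbol duration.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy ZP-OFDM-like baseband: N useful samples followed by Nzp zero-padded samples
N, Nzp, n_sym = 64, 16, 1000
sym_len = N + Nzp
blocks = [np.concatenate([(rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2),
                          np.zeros(Nzp)]) for _ in range(n_sym)]
x = np.concatenate(blocks)
x += 0.1 * (rng.standard_normal(x.size) + 1j * rng.standard_normal(x.size))   # receiver noise

# the zero padding makes the mean instantaneous power periodic with period sym_len,
# so the autocorrelation of |x|^2 (mean removed) peaks at that lag
p = np.abs(x) ** 2
p -= p.mean()
lags = np.arange(4, 3 * sym_len)      # skip the first few lags, which are trivially self-similar
acorr = np.array([np.mean(p[:-k] * p[k:]) for k in lags])
print(lags[np.argmax(acorr)])         # expected close to sym_len = 80 (or a multiple of it)
```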

  4. Estimating Diversifying Selection and Functional Constraint in the Presence of Recombination

    OpenAIRE

    Wilson, Daniel J.; McVean, Gilean

    2006-01-01

    Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high le...

  5. Inverse sampled Bernoulli (ISB) procedure for estimating a population proportion, with nuclear material applications

    International Nuclear Information System (INIS)

    Wright, T.

    1982-01-01

    A new sampling procedure is introduced for estimating a population proportion. The procedure combines the ideas of inverse binomial sampling and Bernoulli sampling. An unbiased estimator is given with its variance. The procedure can be viewed as a generalization of inverse binomial sampling

  6. HIV Model Parameter Estimates from Interruption Trial Data including Drug Efficacy and Reservoir Dynamics

    Science.gov (United States)

    Luo, Rutao; Piovoso, Michael J.; Martinez-Picado, Javier; Zurakowski, Ryan

    2012-01-01

    Mathematical models based on ordinary differential equations (ODE) have had significant impact on understanding HIV disease dynamics and optimizing patient treatment. A model that characterizes the essential disease dynamics can be used for prediction only if the model parameters are identifiable from clinical data. Most previous parameter identification studies for HIV have used sparsely sampled data from the decay phase following the introduction of therapy. In this paper, model parameters are identified from frequently sampled viral-load data taken from ten patients enrolled in the previously published AutoVac HAART interruption study, providing between 69 and 114 viral load measurements from 3–5 phases of viral decay and rebound for each patient. This dataset is considerably larger than those used in previously published parameter estimation studies. Furthermore, the measurements come from two separate experimental conditions, which allows for the direct estimation of drug efficacy and reservoir contribution rates, two parameters that cannot be identified from decay-phase data alone. A Markov-Chain Monte-Carlo method is used to estimate the model parameter values, with initial estimates obtained using nonlinear least-squares methods. The posterior distributions of the parameter estimates are reported and compared for all patients. PMID:22815727

  7. Location-based Mobile Relay Selection and Impact of Inaccurate Path Loss Model Parameters

    DEFF Research Database (Denmark)

    Nielsen, Jimmy Jessen; Madsen, Tatiana Kozlova; Schwefel, Hans-Peter

    2010-01-01

    In this paper we propose a relay selection scheme which uses collected location information together with a path loss model for relay selection, and analyze the performance impact of mobility and different error causes on this scheme. Performance is evaluated in terms of bit error rate...... by simulations. The SNR measurement based relay selection scheme proposed previously is unsuitable for use with fast moving users in e.g. vehicular scenarios due to a large signaling overhead. The proposed location based scheme is shown to work well with fast moving users due to a lower signaling overhead...... in these situations. As the location-based scheme relies on a path loss model to estimate link qualities and select relays, the sensitivity with respect to inaccurate estimates of the unknown path loss model parameters is investigated. The parameter ranges that result in useful performance were found...

  8. PockDrug: A Model for Predicting Pocket Druggability That Overcomes Pocket Estimation Uncertainties.

    Science.gov (United States)

    Borrel, Alexandre; Regad, Leslie; Xhaard, Henri; Petitjean, Michel; Camproux, Anne-Claude

    2015-04-27

    Predicting protein druggability is a key interest in the target identification phase of drug discovery. Here, we assess the pocket estimation methods' influence on druggability predictions by comparing statistical models constructed from pockets estimated using different pocket estimation methods: a proximity of either 4 or 5.5 Å to a cocrystallized ligand or DoGSite and fpocket estimation methods. We developed PockDrug, a robust pocket druggability model that copes with uncertainties in pocket boundaries. It is based on a linear discriminant analysis from a pool of 52 descriptors combined with a selection of the most stable and efficient models using different pocket estimation methods. PockDrug retains the best combinations of three pocket properties which impact druggability: geometry, hydrophobicity, and aromaticity. It results in an average accuracy of 87.9% ± 4.7% using a test set and exhibits higher accuracy (∼5-10%) than previous studies that used an identical apo set. In conclusion, this study confirms the influence of pocket estimation on pocket druggability prediction and proposes PockDrug as a new model that overcomes pocket estimation variability.

  9. A two-step approach to estimating selectivity and fishing power of research gill nets used in Greenland waters

    DEFF Research Database (Denmark)

    Hovgård, Holger

    1996-01-01

    Catches of Atlantic cod (Gadus morhua) from Greenland gill-net surveys were analyzed by a two-step approach. In the initial step the form of the selection curve was identified as binormal, which was caused by fish being gilled or caught by the maxillae. Both capture processes could be described by normal distributions and could be related to mesh size in accordance with the principle of geometrical similarity. In the second step the selection parameters were estimated by a nonlinear least squares fit. The model also estimated the relative efficiency of the two capture processes and the fishing...

  10. Bayesian Nonparametric Model for Estimating Multistate Travel Time Distribution

    Directory of Open Access Journals (Sweden)

    Emmanuel Kidando

    2017-01-01

    Full Text Available Multistate models, that is, models with more than two distributions, are preferred over single-state probability models in modeling the distribution of travel time. Literature review indicated that the finite multistate modeling of travel time using lognormal distribution is superior to other probability functions. In this study, we extend the finite multistate lognormal model of estimating the travel time distribution to unbounded lognormal distribution. In particular, a nonparametric Dirichlet Process Mixture Model (DPMM) with stick-breaking process representation was used. The strength of the DPMM is that it can choose the number of components dynamically as part of the algorithm during parameter estimation. To reduce computational complexity, the modeling process was limited to a maximum of six components. Then, the Markov Chain Monte Carlo (MCMC) sampling technique was employed to estimate the parameters’ posterior distribution. Speed data from nine links of a freeway corridor, aggregated on a 5-minute basis, were used to calculate the corridor travel time. The results demonstrated that this model offers significant flexibility in modeling to account for complex mixture distributions of the travel time without specifying the number of components. The DPMM modeling further revealed that freeway travel time is characterized by multistate or single-state models depending on the inclusion of onset and offset of congestion periods.
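    A compact way to experiment with the same idea is a truncated stick-breaking Dirichlet process mixture on log travel times; the sketch below uses scikit-learn's variational BayesianGaussianMixture as a stand-in for the MCMC sampler of the paper, and the simulated two-regime travel times are assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# hypothetical 5-minute corridor travel times (minutes): free-flow and congested regimes
rng = np.random.default_rng(0)
tt = np.concatenate([rng.lognormal(np.log(8.0), 0.08, 800),
                     rng.lognormal(np.log(14.0), 0.20, 200)])

# Dirichlet-process mixture truncated at six components (as in the study);
# fitting on log(travel time) makes each Gaussian component a lognormal state
dpmm = BayesianGaussianMixture(
    n_components=6,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(np.log(tt).reshape(-1, 1))

print(np.round(dpmm.weights_, 3))   # weights of superfluous components shrink toward zero
```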

  11. Heterogeneous autoregressive model with structural break using nearest neighbor truncation volatility estimators for DAX.

    Science.gov (United States)

    Chin, Wen Cheong; Lee, Min Cherng; Yap, Grace Lee Ching

    2016-01-01

    High frequency financial data modelling has become one of the important research areas in the field of financial econometrics. However, possible structural breaks in volatile financial time series often trigger inconsistency issues in volatility estimation. In this study, we propose a structural break heavy-tailed heterogeneous autoregressive (HAR) volatility econometric model enhanced with jump-robust estimators. The breakpoints in the volatility are captured by dummy variables after detection by the Bai-Perron sequential multiple-breakpoint procedure. In order to further deal with possible abrupt jumps in the volatility, the jump-robust volatility estimators are composed using the nearest neighbor truncation approach, namely the minimum and median realized volatility. With the structural break improvements in both the model and the volatility estimators, the empirical findings show that the modified HAR model provides the best in-sample and out-of-sample forecast performance compared with the standard HAR models. Accurate volatility forecasts have a direct influence on applications in risk management and investment portfolio analysis.
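    The core HAR regression can be sketched as below (plain realized volatility and an OLS fit are used for illustration; the jump-robust minimum/median realized-volatility estimators and the Bai-Perron break dummies of the paper would simply replace the series and add columns to the design matrix).

```python
import numpy as np

def har_design(rv):
    """HAR regressors: yesterday's realized volatility plus its 5-day and
    22-day averages, aligned to predict the next day's value."""
    X, y = [], []
    for t in range(21, len(rv) - 1):
        X.append([1.0, rv[t], rv[t - 4:t + 1].mean(), rv[t - 21:t + 1].mean()])
        y.append(rv[t + 1])
    return np.array(X), np.array(y)

# hypothetical daily realized-volatility series
rng = np.random.default_rng(0)
rv = 0.01 + 0.002 * np.abs(rng.standard_normal(600)) + 0.005 * rng.random(600)

X, y = har_design(rv)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates of the HAR coefficients
print(np.round(beta, 4))                       # [constant, daily, weekly, monthly]
```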

  12. Patch-based visual tracking with online representative sample selection

    Science.gov (United States)

    Ou, Weihua; Yuan, Di; Li, Donghao; Liu, Bin; Xia, Daoxun; Zeng, Wu

    2017-05-01

    Occlusion is one of the most challenging problems in visual object tracking. Recently, many discriminative methods have been proposed to deal with this problem. For the discriminative methods, it is difficult to select representative samples for updating the target template. In general, the holistic bounding boxes that contain the tracked results are selected as the positive samples. However, when the objects are occluded, this simple strategy easily introduces noise into the training data set and the target template and then causes the tracker to drift away from the target. To address this problem, we propose a robust patch-based visual tracker with online representative sample selection. Different from previous works, we divide the object and the candidates into several patches uniformly and propose a score function to calculate the score of each patch independently. Then, the average score is adopted to determine the optimal candidate. Finally, we utilize the non-negative least squares method to find the representative samples, which are used to update the target template. The experimental results on the object tracking benchmark 2013 and on 13 challenging sequences show that the proposed method is robust to occlusion and achieves promising results.
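    The representative-sample step can be illustrated with a generic non-negative least squares reconstruction (the feature dimensions and candidate set below are made up, and this is not the authors' full patch-scoring pipeline): candidates that receive the largest non-negative weights when reconstructing the current tracking result are kept for the template update.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
D = rng.random((128, 30))                       # columns: feature vectors of 30 candidate samples
t = D[:, :3] @ np.array([0.5, 0.3, 0.2]) \
    + 0.01 * rng.random(128)                    # feature vector of the current tracking result

weights, _ = nnls(D, t)                         # non-negative least squares reconstruction
representative = np.argsort(weights)[::-1][:5]  # candidates with the largest weights
print(representative)
```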

  13. Effect of dissolved organic matter on pre-equilibrium passive sampling: A predictive QSAR modeling study.

    Science.gov (United States)

    Lin, Wei; Jiang, Ruifen; Shen, Yong; Xiong, Yaxin; Hu, Sizi; Xu, Jianqiao; Ouyang, Gangfeng

    2018-04-13

    Pre-equilibrium passive sampling is a simple and promising technique for studying sampling kinetics, which is crucial to determine the distribution, transfer and fate of hydrophobic organic compounds (HOCs) in environmental water and organisms. Environmental water samples contain complex matrices that complicate the traditional calibration process for obtaining the accurate rate constants. This study proposed a QSAR model to predict the sampling rate constants of HOCs (polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs) and pesticides) in aqueous systems containing complex matrices. A homemade flow-through system was established to simulate an actual aqueous environment containing dissolved organic matter (DOM), i.e. humic acid (HA) and (2-Hydroxypropyl)-β-cyclodextrin (β-HPCD), and to obtain the experimental rate constants. Then, a quantitative structure-activity relationship (QSAR) model using Genetic Algorithm-Multiple Linear Regression (GA-MLR) was found to correlate the experimental rate constants to the system state including physicochemical parameters of the HOCs and DOM which were calculated and selected as descriptors by Density Functional Theory (DFT) and Chem 3D. The experimental results showed that the rate constants significantly increased as the concentration of DOM increased, and the enhancement factors of 70-fold and 34-fold were observed for the HOCs in HA and β-HPCD, respectively. The established QSAR model was validated as credible (R²adj = 0.862) and predictable (Q² = 0.835) in estimating the rate constants of HOCs for complex aqueous sampling, and a probable mechanism was developed by comparison to the reported theoretical study. The present study established a QSAR model of passive sampling rate constants and calibrated the effect of DOM on the sampling kinetics. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. A test of alternative estimators for volume at time 1 from remeasured point samples

    Science.gov (United States)

    Francis A. Roesch; Edwin J. Green; Charles T. Scott

    1993-01-01

    Two estimators for volume at time 1 for use with permanent horizontal point samples are evaluated. One estimator, used traditionally, uses only the trees sampled at time 1, while the second estimator, originally presented by Roesch and coauthors (F.A. Roesch, Jr., E.J. Green, and C.T. Scott. 1989. For. Sci. 35(2):281-293). takes advantage of additional sample...

  15. Perceptual quality estimation of H.264/AVC videos using reduced-reference and no-reference models

    Science.gov (United States)

    Shahid, Muhammad; Pandremmenou, Katerina; Kondi, Lisimachos P.; Rossholm, Andreas; Lövström, Benny

    2016-09-01

    Reduced-reference (RR) and no-reference (NR) models for video quality estimation, using features that account for the impact of coding artifacts, spatio-temporal complexity, and packet losses, are proposed. The purpose of this study is to analyze a number of potentially quality-relevant features in order to select the most suitable set of features for building the desired models. The proposed sets of features have not been used in the literature and some of the features are used for the first time in this study. The features are employed by the least absolute shrinkage and selection operator (LASSO), which selects only the most influential of them toward perceptual quality. For comparison, we apply feature selection in the complete feature sets and ridge regression on the reduced sets. The models are validated using a database of H.264/AVC encoded videos that were subjectively assessed for quality in an ITU-T compliant laboratory. We infer that just two features selected by RR LASSO and two bitstream-based features selected by NR LASSO are able to estimate perceptual quality with high accuracy, higher than that of ridge, which uses more features. The comparisons with competing works and two full-reference metrics also verify the superiority of our models.

  16. Towards a pro-health food-selection model for gatekeepers in ...

    African Journals Online (AJOL)

    The purpose of this study was to develop a pro-health food selection model for gatekeepers of Bulawayo high-density suburbs in Zimbabwe. Gatekeepers in five suburbs constituted the study population from which a sample of 250 subjects was randomly selected. Of the total respondents (N= 182), 167 had their own ...

  17. Turbidity-controlled suspended sediment sampling for runoff-event load estimation

    Science.gov (United States)

    Jack Lewis

    1996-01-01

    Abstract - For estimating suspended sediment concentration (SSC) in rivers, turbidity is generally a much better predictor than water discharge. Although it is now possible to collect continuous turbidity data even at remote sites, sediment sampling and load estimation are still conventionally based on discharge. With frequent calibration the relation of turbidity to...

  18. Performance of sampling methods to estimate log characteristics for wildlife.

    Science.gov (United States)

    Lisa J. Bate; Torolf R. Torgersen; Michael J. Wisdom; Edward O. Garton

    2004-01-01

    Accurate estimation of the characteristics of log resources, or coarse woody debris (CWD), is critical to effective management of wildlife and other forest resources. Despite the importance of logs as wildlife habitat, methods for sampling logs have traditionally focused on silvicultural and fire applications. These applications have emphasized estimates of log volume...

  19. Variations among animals when estimating the undegradable fraction of fiber in forage samples

    Directory of Open Access Journals (Sweden)

    Cláudia Batista Sampaio

    2014-10-01

    Full Text Available The objective of this study was to assess the variability among animals regarding the critical time to estimate the undegradable fraction of fiber (ct) using an in situ incubation procedure. Five rumen-fistulated Nellore steers were used to estimate the degradation profile of fiber. Animals were fed a standard diet with an 80:20 forage:concentrate ratio. Sugarcane, signal grass hay, corn silage and fresh elephant grass samples were assessed. Samples were put in F57 Ankom® bags and were incubated in the rumens of the animals for 0, 6, 12, 18, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240 and 312 hours. The degradation profiles were interpreted using a mixed non-linear model in which a random effect was associated with the degradation rate. For sugarcane, signal grass hay and corn silage, there were no significant variations among animals regarding the fractional degradation rate of neutral and acid detergent fiber; consequently, the ct required to estimate the undegradable fiber fraction did not vary among animals for those forages. However, a significant variability among animals was found for the fresh elephant grass. The results seem to suggest that the variability among animals regarding the degradation rate of fibrous components can be significant.

  20. A hierarchical model for estimating density in camera-trap studies

    Science.gov (United States)

    Royle, J. Andrew; Nichols, James D.; Karanth, K.Ullas; Gopalaswamy, Arjun M.

    2009-01-01

    Estimating animal density using capture–recapture data from arrays of detection devices such as camera traps has been problematic due to the movement of individuals and heterogeneity in capture probability among them induced by differential exposure to trapping. We develop a spatial capture–recapture model for estimating density from camera-trapping data which contains explicit models for the spatial point process governing the distribution of individuals and their exposure to and detection by traps. We adopt a Bayesian approach to analysis of the hierarchical model using the technique of data augmentation. The model is applied to photographic capture–recapture data on tigers Panthera tigris in Nagarahole reserve, India. Using this model, we estimate the density of tigers to be 14·3 animals per 100 km² during 2004. Synthesis and applications. Our modelling framework largely overcomes several weaknesses in conventional approaches to the estimation of animal density from trap arrays. It effectively deals with key problems such as individual heterogeneity in capture probabilities, movement of traps, presence of potential ‘holes’ in the array and ad hoc estimation of sample area. The formulation, thus, greatly enhances flexibility in the conduct of field surveys as well as in the analysis of data, from studies that may involve physical, photographic or DNA-based ‘captures’ of individual animals.