WorldWideScience

Sample records for regression estimated relative

  1. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

    Science.gov (United States)

    Yelland, Lisa N; Salter, Amy B; Ryan, Philip

    2011-10-15

    Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
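    A hedged illustration of the approach above (not the authors' code): a log-link Poisson GEE with an exchangeable working correlation fitted to simulated clustered binary data in statsmodels. The robust (sandwich) standard errors reported by GEE are what make the Poisson working model valid for a binary outcome; the data, effect size, and correlation structure are all assumptions of this example.

```python
# Modified Poisson regression for clustered data: log-link Poisson GEE with
# robust (sandwich) variance; exp(coefficient) estimates the relative risk.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_clusters, cluster_size = 200, 5
groups = np.repeat(np.arange(n_clusters), cluster_size)
treat = np.repeat(rng.integers(0, 2, n_clusters), cluster_size)  # cluster-level exposure
cluster_effect = np.repeat(rng.normal(0, 0.3, n_clusters), cluster_size)
p = np.clip(0.2 * np.exp(np.log(1.5) * treat + cluster_effect), 0, 1)  # true RR = 1.5
y = rng.binomial(1, p)

X = sm.add_constant(treat)
model = sm.GEE(y, X, groups=groups, family=sm.families.Poisson(),
               cov_struct=sm.cov_struct.Exchangeable())
res = model.fit()  # GEE reports robust standard errors by default
print("estimated relative risk:", np.exp(res.params[1]))
```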

  2. Regression estimators for generic health-related quality of life and quality-adjusted life years.

    Science.gov (United States)

    Basu, Anirban; Manca, Andrea

    2012-01-01

    To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, that account for features typical of such data: a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single-equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests, such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One- and 2-part Beta regression models provide flexible approaches to regressing outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcome distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
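    The sketch below illustrates the single-equation Beta regression idea on simulated HRQoL-like data. It is not the authors' estimator: boundary observations are handled with a simple squeeze transformation rather than the paper's 2-part model, and it assumes statsmodels >= 0.13 for the BetaModel class.

```python
# One-part Beta regression on simulated HRQoL-like data (illustrative only).
import numpy as np
import statsmodels.api as sm
from statsmodels.othermod.betareg import BetaModel

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
mu = 1 / (1 + np.exp(-(1.0 + 0.8 * x)))   # logit link for the mean
phi = 10.0                                # precision parameter
y = rng.beta(mu * phi, (1 - mu) * phi)
y = (y * (n - 1) + 0.5) / n               # squeeze off the 0/1 boundaries,
                                          # a simplification vs. the 2-part model
X = sm.add_constant(x)
res = BetaModel(y, X).fit()
print(res.params)  # mean-model coefficients plus the precision parameter
```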

  3. A logistic regression estimating function for spatial Gibbs point processes

    DEFF Research Database (Denmark)

    Baddeley, Adrian; Coeurjolly, Jean-François; Rubak, Ege

    We propose a computationally efficient logistic regression estimating function for spatial Gibbs point processes. The sample points for the logistic regression consist of the observed point pattern together with a random pattern of dummy points. The estimating function is closely related to the p...

  4. Independent contrasts and PGLS regression estimators are equivalent.

    Science.gov (United States)

    Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

    2012-05-01

    We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression, and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
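    A small numeric illustration of the equivalence, with an invented 4-taxon Brownian covariance matrix V and hypothetical trait values: the GLS slope equals the OLS slope computed on data whitened by the Cholesky factor of V, mirroring the PIC construction.

```python
# GLS under a phylogenetic covariance matrix V equals OLS on whitened data.
import numpy as np

V = np.array([[1.0, 0.5, 0.2, 0.2],   # assumed shared-branch-length covariance
              [0.5, 1.0, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.6],
              [0.2, 0.2, 0.6, 1.0]])
x = np.array([0.1, 0.4, 1.2, 1.0])    # hypothetical trait values
y = np.array([0.3, 0.5, 1.6, 1.4])

Vinv = np.linalg.inv(V)
X = np.column_stack([np.ones(4), x])
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

L = np.linalg.cholesky(V)             # whiten: solve L z = data
Xw = np.linalg.solve(L, X)
yw = np.linalg.solve(L, y)
beta_whitened = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
print(beta_gls, beta_whitened)        # identical coefficients
```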

  5. Estimation of genetic parameters related to eggshell strength using random regression models.

    Science.gov (United States)

    Guo, J; Ma, M; Qu, L; Shen, M; Dou, T; Wang, K

    2015-01-01

    This study examined the changes in eggshell strength and the genetic parameters related to this trait throughout a hen's laying life using random regression. The data were collected from a crossbred population between 2011 and 2014, where the eggshell strength was determined repeatedly for 2260 hens. Using random regression models (RRMs), several Legendre polynomials were employed to estimate the fixed, direct genetic and permanent environment effects. The residual effects were treated as independently distributed with heterogeneous variance for each test week. The direct genetic variance was included with second-order Legendre polynomials and the permanent environment with third-order Legendre polynomials. The heritability of eggshell strength ranged from 0.26 to 0.43, the repeatability ranged between 0.47 and 0.69, and the estimated genetic correlations between test weeks were high (> 0.67). The first eigenvalue of the genetic covariance matrix accounted for about 97% of the sum of all the eigenvalues. The flexibility and statistical power of RRMs suggest that this model could be an effective method to improve eggshell quality and to reduce losses due to cracked eggs in a breeding plan.
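    A minimal sketch of the design matrices used in such random regression models: test week is rescaled to [-1, 1] and expanded in Legendre polynomial bases of the orders quoted above. The week range is hypothetical, and fitting the full mixed model would require dedicated software, so only the bases are shown.

```python
# Legendre polynomial bases for a random regression model over test weeks.
import numpy as np
from numpy.polynomial.legendre import legvander

weeks = np.arange(20, 73)                       # hypothetical laying weeks
t = 2 * (weeks - weeks.min()) / (weeks.max() - weeks.min()) - 1  # rescale to [-1, 1]
Z_genetic = legvander(t, 2)        # second-order basis for the direct genetic effect
Z_pe = legvander(t, 3)             # third-order basis for the permanent environment
print(Z_genetic.shape, Z_pe.shape)  # (53, 3) and (53, 4): degree + 1 columns
```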

  6. Relation of whole blood carboxyhemoglobin concentration to ambient carbon monoxide exposure estimated using regression.

    Science.gov (United States)

    Rudra, Carole B; Williams, Michelle A; Sheppard, Lianne; Koenig, Jane Q; Schiff, Melissa A; Frederick, Ihunnaya O; Dills, Russell

    2010-04-15

    Exposure to carbon monoxide (CO) and other ambient air pollutants is associated with adverse pregnancy outcomes. While there are several methods of estimating CO exposure, few have been evaluated against exposure biomarkers. The authors examined the relation between estimated CO exposure and blood carboxyhemoglobin concentration in 708 pregnant western Washington State women (1996-2004). Carboxyhemoglobin was measured in whole blood drawn around 13 weeks' gestation. CO exposure during the month of blood draw was estimated using a regression model containing predictor terms for year, month, street and population densities, and distance to the nearest major road. Year and month were the strongest predictors. Carboxyhemoglobin level was correlated with estimated CO exposure (rho = 0.22, 95% confidence interval (CI): 0.15, 0.29). After adjustment for covariates, each 10% increase in estimated exposure was associated with a 1.12% increase in median carboxyhemoglobin level (95% CI: 0.54, 1.69). This association remained after exclusion of 286 women who reported smoking or being exposed to secondhand smoke (rho = 0.24). In this subgroup, the median carboxyhemoglobin concentration increased 1.29% (95% CI: 0.67, 1.91) for each 10% increase in CO exposure. Monthly estimated CO exposure was moderately correlated with an exposure biomarker. These results support the validity of this regression model for estimating ambient CO exposures in this population and geographic setting.

  7. Ridge regression estimator: combining unbiased and ordinary ridge regression methods of estimation

    Directory of Open Access Journals (Sweden)

    Sharad Damodar Gore

    2009-10-01

    Statistical literature has several methods for coping with multicollinearity. This paper introduces a new shrinkage estimator, called modified unbiased ridge (MUR). This estimator is obtained from unbiased ridge regression (URR) in the same way that ordinary ridge regression (ORR) is obtained from ordinary least squares (OLS). Properties of MUR are derived. Results on its matrix mean squared error (MMSE) are obtained. MUR is compared with ORR and URR in terms of MMSE. These results are illustrated with an example based on data generated by Hoerl and Kennard (1975).
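    For reference, the shared building block of ORR, URR, and MUR is ridge-type shrinkage of the least squares solution. The sketch below shows only the ORR estimator in closed form on simulated collinear data; the shrinkage constant k is an arbitrary choice for illustration.

```python
# Ordinary ridge regression (ORR) in closed form on simulated collinear data.
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
x1 = rng.normal(size=n)
X = np.column_stack([x1,
                     x1 + rng.normal(scale=0.01, size=n),  # near-collinear column
                     rng.normal(size=n)])
beta_true = np.array([1.0, 1.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

ridge_k = 0.1                                   # illustrative shrinkage constant
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)    # unstable under collinearity
beta_orr = np.linalg.solve(X.T @ X + ridge_k * np.eye(k), X.T @ y)
print(beta_ols, beta_orr)
```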

  8. Estimating the exceedance probability of rain rate by logistic regression

    Science.gov (United States)

    Chiu, Long S.; Kedem, Benjamin

    1990-01-01

    Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
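    A generic sketch of the exceedance-probability idea on simulated stand-in data (not GATE observations): an indicator of rain rate above a fixed threshold is regressed on a covariate via logistic regression.

```python
# Logistic regression for the probability that pixel rain rate exceeds a threshold.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
area_avg = rng.gamma(2.0, 1.0, n)               # simulated area-averaged rain rate
rain = rng.gamma(2.0, area_avg / 2.0)           # simulated pixel rain rate
exceeds = (rain > 2.0).astype(float)            # fixed threshold, e.g. 2 mm/h

X = sm.add_constant(np.log(area_avg))
res = sm.Logit(exceeds, X).fit(disp=0)
print(res.params)   # conditional exceedance-probability model
```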

  9. Estimating the prevalence of 26 health-related indicators at neighbourhood level in the Netherlands using structured additive regression.

    Science.gov (United States)

    van de Kassteele, Jan; Zwakhals, Laurens; Breugelmans, Oscar; Ameling, Caroline; van den Brink, Carolien

    2017-07-01

    Local policy makers increasingly need information on health-related indicators at smaller geographic levels like districts or neighbourhoods. Although more large data sources have become available, direct estimates of the prevalence of a health-related indicator cannot be produced for neighbourhoods for which only small samples or no samples are available. Small area estimation provides a solution, but unit-level models for binary-valued outcomes that can handle both non-linear effects of the predictors and spatially correlated random effects in a unified framework are rarely encountered. We used data on 26 binary-valued health-related indicators collected on 387,195 persons in the Netherlands. We associated the health-related indicators at the individual level with a set of 12 predictors obtained from national registry data. We formulated a structured additive regression model for small area estimation. The model captured potential non-linear relations between the predictors and the outcome through additive terms in a functional form using penalized splines and included a term that accounted for spatially correlated heterogeneity between neighbourhoods. The registry data were used to predict individual outcomes which in turn are aggregated into higher geographical levels, i.e. neighbourhoods. We validated our method by comparing the estimated prevalences with observed prevalences at the individual level and by comparing the estimated prevalences with direct estimates obtained by weighting methods at municipality level. We estimated the prevalence of the 26 health-related indicators for 415 municipalities, 2599 districts and 11,432 neighbourhoods in the Netherlands. We illustrate our method on overweight data and show that there are distinct geographic patterns in the overweight prevalence. Calibration plots show that the estimated prevalences agree very well with observed prevalences at the individual level. The estimated prevalences also agree reasonably well with the direct estimates at municipality level.

  10. A flexible fuzzy regression algorithm for forecasting oil consumption estimation

    International Nuclear Information System (INIS)

    Azadeh, A.; Khakestani, M.; Saberi, M.

    2009-01-01

    Oil consumption plays a vital role in the socio-economic development of most countries. This study presents a flexible fuzzy regression algorithm for forecasting oil consumption based on standard economic indicators. The standard indicators are annual population, cost of crude oil import, gross domestic production (GDP) and annual oil production in the last period. The proposed algorithm uses analysis of variance (ANOVA) to select either fuzzy regression or conventional regression for future demand estimation. The significance of the proposed algorithm is threefold. First, it is flexible and identifies the best model based on the results of ANOVA and minimum mean absolute percentage error (MAPE), whereas previous studies consider the best-fitted fuzzy regression model based on MAPE or other relative error results. Second, the proposed model may identify conventional regression as the best model for future oil consumption forecasting because of its dynamic structure, whereas previous studies assume that fuzzy regression always provides the best solutions and estimation. Third, it utilizes the most standard independent variables for the regression models. To show the applicability and superiority of the proposed flexible fuzzy regression algorithm, the data for oil consumption in Canada, United States, Japan and Australia from 1990 to 2005 are used. The results show that the flexible algorithm provides accurate solutions for the oil consumption estimation problem. The algorithm may be used by policy makers to accurately foresee the behavior of oil consumption in various regions.

  11. Parameter Estimation for Improving Association Indicators in Binary Logistic Regression

    Directory of Open Access Journals (Sweden)

    Mahdi Bashiri

    2012-02-01

    The aim of this paper is the estimation of binary logistic regression parameters by maximizing the log-likelihood function, with improved association indicators. The parameter estimation steps are explained, measures of association are introduced, and their calculation is analyzed. Moreover, a new related indicator based on membership degree level is proposed. Association measures reflect the number of success responses occurring against failures in a certain number of independent Bernoulli experiments. In parameter estimation, the values of the existing indicators are not sensitive to the parameter values, whereas the proposed indicators are sensitive to the estimated parameters during the iterative procedure. Therefore, the innovation of this study is a new association indicator for binary logistic regression with more sensitivity to the estimated parameters when maximizing the log-likelihood in an iterative procedure.

  12. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    Science.gov (United States)

    Gorgees, Hazim Mansoor; Mahdi, Fatimah Assim

    2018-05-01

    This article is concerned with comparing the performance of different types of ordinary ridge regression estimators that have been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result we reached is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE) than the other stated methods.

  13. Principal component regression for crop yield estimation

    CERN Document Server

    Suryanarayana, T M V

    2016-01-01

    This book highlights the estimation of crop yield in Central Gujarat, especially with regard to the development of Multiple Regression Models and Principal Component Regression (PCR) models using climatological parameters as independent variables and crop yield as the dependent variable. It subsequently compares the multiple linear regression (MLR) and PCR results, and discusses the significance of PCR for crop yield estimation. In this context, the book also covers Principal Component Analysis (PCA), a statistical procedure used to reduce a number of correlated variables into a smaller number of uncorrelated variables called principal components (PC). This book will be helpful to students and researchers starting their work on climate and agriculture, mainly focusing on estimation models. The flow of chapters takes readers along a smooth path, in understanding climate and weather and the impact of climate change, and gradually proceeds towards downscaling techniques and then finally towards development of ...

  14. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2011-01-01

    In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator

  15. Outlier Detection in Regression Using an Iterated One-Step Approximation to the Huber-Skip Estimator

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    2013-01-01

    In regression we can delete outliers based upon a preliminary estimator and reestimate the parameters by least squares based upon the retained observations. We study the properties of an iteratively defined sequence of estimators based on this idea. We relate the sequence to the Huber-skip estimator and show that the normalized estimation errors are tight and are close to a linear function of the kernel, thus providing a stochastic expansion of the estimators, which is the same as for the Huber-skip. This implies that the iterated estimator is a close approximation of the Huber-skip.

  16. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent and dependent variables. In logistic regression the dependent variable is categorical and the model is used to calculate odds. When the levels of the dependent variable are ordered, the model is an ordinal logistic regression. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine values for a population based on a sample. The purpose of this research is the parameter estimation of a GWOLR model using R software. The parameter estimation uses data on the number of dengue fever patients in Semarang City; the observation units are 144 villages in Semarang City. The results of the research give a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.

  17. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2009-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  18. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2010-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  19. Regression Equations for Birth Weight Estimation using ...

    African Journals Online (AJOL)

    In this study, Birth Weight has been estimated from anthropometric measurements of the hand and foot. Linear regression equations were formed from each of the measured variables. These simple equations can be used to estimate the Birth Weight of newborn babies, in order to identify those with low birth weight who can be referred to ...

  20. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha

    2014-12-08

    Improving the prediction performance of multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  1. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha; Huang, Jianhua Z.

    2014-01-01

    Improving the prediction performance of multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  2. Regression and Sparse Regression Methods for Viscosity Estimation of Acid Milk From its SLS Features

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann

    2012-01-01

    Statistical solutions find widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods on a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. ... with sparse LAR, lasso and Elastic Net (EN) sparse regression methods. Due to the inconsistent measurement conditions, Locally Weighted Scatterplot Smoothing (Loess) has been employed to alleviate the undesired variation in the estimated viscosity. The experimental results of applying the different methods show ...

  3. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    Science.gov (United States)

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and can provide a reference for monitoring tree growth and for yield estimation. Red Fuji apple trees in the full fruit-bearing stage were the research objects. The canopy spectral reflectance and LAI values of ninety apple trees were measured with an ASD FieldSpec 3 spectrometer and an LAI-2200 in thirty orchards in the Qixia research area of Shandong Province over two consecutive years. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and the vegetation indices. Models for predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517, together with the two previous main vegetation indices NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration-set determination coefficient C-R2 of 0.920 and the validation-set determination coefficient V-R2 of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration-set root mean square error C-RMSE of 0.249 and the validation-set root mean square error V-RMSE of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative percent deviations of the calibration set (C-RPD) and validation set (V-RPD) reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The trend-line slopes of the measured versus predicted values for the calibration and validation sets, C-S and V-S, are close to 1. The estimation results of the RF regression model are better than those of the SVM model. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.

  4. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating ...

  5. Robust median estimator in logistic regression

    Czech Academy of Sciences Publication Activity Database

    Hobza, T.; Pardo, L.; Vajda, Igor

    2008-01-01

    Roč. 138, č. 12 (2008), s. 3822-3840 ISSN 0378-3758 R&D Projects: GA MŠk 1M0572 Grant - others:Instituto Nacional de Estadistica (ES) MPO FI - IM3/136; GA MŠk(CZ) MTM 2006-06872 Institutional research plan: CEZ:AV0Z10750506 Keywords : Logistic regression * Median * Robustness * Consistency and asymptotic normality * Morgenthaler * Bianco and Yohai * Croux and Hasellbroeck Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.679, year: 2008 http://library.utia.cas.cz/separaty/2008/SI/vajda-robust%20median%20estimator%20in%20logistic%20regression.pdf

  6. Estimating nonlinear selection gradients using quadratic regression coefficients: double or nothing?

    Science.gov (United States)

    Stinchcombe, John R; Agrawal, Aneil F; Hohenlohe, Paul A; Arnold, Stevan J; Blows, Mark W

    2008-09-01

    The use of regression analysis has been instrumental in allowing evolutionary biologists to estimate the strength and mode of natural selection. Although directional and correlational selection gradients are equal to their corresponding regression coefficients, quadratic regression coefficients must be doubled to estimate stabilizing/disruptive selection gradients. Based on a sample of 33 papers published in Evolution between 2002 and 2007, at least 78% of papers have not doubled quadratic regression coefficients, leading to an appreciable underestimate of the strength of stabilizing and disruptive selection. Proper treatment of quadratic regression coefficients is necessary for estimation of fitness surfaces and contour plots, canonical analysis of the gamma matrix, and modeling the evolution of populations on an adaptive landscape.
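    A minimal simulated illustration of the paper's central point: the stabilizing/disruptive selection gradient is twice the quadratic regression coefficient, so reporting the raw coefficient understates selection by half.

```python
# Quadratic selection gradients: gamma = 2 * (quadratic regression coefficient).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
z = rng.normal(size=500)                                   # standardized trait
w = 1.0 - 0.15 * z**2 + rng.normal(scale=0.05, size=500)   # fitness surface
w = w / w.mean()                                           # relative fitness

X = sm.add_constant(np.column_stack([z, z**2]))
res = sm.OLS(w, X).fit()
beta = res.params[1]          # directional gradient: the linear coefficient
gamma = 2 * res.params[2]     # stabilizing gradient: DOUBLE the coefficient
print(beta, gamma)            # gamma recovers ~ -0.30, not -0.15
```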

  7. Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood

    Science.gov (United States)

    Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim

    2017-04-01

    Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. In order to estimate the corresponding regression coefficients, CRPS minimization has been performed in many meteorological post-processing studies over the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules used as an optimization score should be able to locate a similar and unknown optimum. Discrepancies might result from a wrong distributional assumption about the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real-world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
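    The sketch below contrasts the two estimation methods on simulated data, using the closed-form CRPS of the Gaussian distribution; the linear mean model and log-linked scale parameter are assumptions of this example, not of the study.

```python
# Maximum likelihood vs. minimum CRPS for a Gaussian non-homogeneous regression.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 2000
ens_mean = rng.normal(0, 3, n)                   # simulated ensemble mean
obs = 0.5 + 0.9 * ens_mean + rng.normal(0, 1.2, n)

def params(theta):
    mu = theta[0] + theta[1] * ens_mean
    sigma = np.exp(theta[2])                     # log link keeps sigma > 0
    return mu, sigma

def neg_loglik(theta):
    mu, sigma = params(theta)
    return -norm.logpdf(obs, mu, sigma).sum()

def mean_crps(theta):
    # Closed-form CRPS of a normal forecast (Gneiting & Raftery).
    mu, sigma = params(theta)
    z = (obs - mu) / sigma
    crps = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
    return crps.mean()

start = np.zeros(3)
print(minimize(neg_loglik, start).x)   # both recover ~ (0.5, 0.9, log 1.2)
print(minimize(mean_crps, start).x)
```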

  8. [Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

    Science.gov (United States)

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence relative to caregivers' recognition of risk signs of diarrhea in their infants by using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of an infant's risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. Meanwhile, we compared the differences in the point estimation and interval estimation of the PR, and the convergence of three models (model 1: not adjusting for covariates; model 2: adjusting for duration of caregivers' education; model 3: adjusting for distance between village and township and child month-age, based on model 2), between the Bayesian log-binomial regression model and the conventional log-binomial regression model. The results showed that all three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, and their PRs were 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95% CI: 1.051-1.200). In addition, the point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed slightly from those of the conventional log-binomial regression model, but they had good consistency in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with less misconvergence and has more advantages in application compared with the conventional log-binomial regression model.
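    The paper's models are Bayesian and fitted in OpenBUGS; as a hedged frequentist point of reference only, the sketch below fits a log-binomial GLM to simulated data, where the exponentiated coefficient is the prevalence ratio. The capitalized Log link class assumes a recent statsmodels version.

```python
# Frequentist log-binomial GLM: exp(coefficient) is a prevalence ratio (PR).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
recognised = rng.integers(0, 2, n)               # caregiver recognises risk signs
p = 0.5 * 1.13 ** recognised                     # true PR = 1.13
care_seeking = rng.binomial(1, p)

X = sm.add_constant(recognised)
fam = sm.families.Binomial(link=sm.families.links.Log())
res = sm.GLM(care_seeking, X, family=fam).fit()
print("PR:", np.exp(res.params[1]))
```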

  9. Kinetic microplate bioassays for relative potency of antibiotics improved by Partial Least Squares (PLS) regression.

    Science.gov (United States)

    Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello

    2016-05-01

    Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite the advantages of turbidimetric bioassays compared to other methods, they have limitations concerning the linearity and range of the dose-response curve determination. Here, we propose to use partial least squares (PLS) regression to solve these limitations and to improve the prediction of the relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramycin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growth was measured as absorbance up to 180 and 300 min for the apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbance or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression, but there were significant deviations from linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting, and it improved the linear range of the turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from official agar diffusion methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages over conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
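    A hedged sketch of the PLS idea on simulated growth curves (not the paper's data): the whole kinetic absorbance profile, rather than a single dose-response summary, is used to calibrate log relative potency.

```python
# PLS regression of log potency on full kinetic absorbance profiles.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
n_samples, n_times = 60, 30
log_potency = rng.uniform(-0.2, 0.2, n_samples)
t = np.linspace(0, 1, n_times)
# Simulated growth curves shifted by potency, plus measurement noise.
Xcurves = 1 / (1 + np.exp(-(t[None, :] - 0.5 - log_potency[:, None]) * 10))
Xcurves += rng.normal(scale=0.02, size=Xcurves.shape)

pls = PLSRegression(n_components=3)
pls.fit(Xcurves, log_potency)
print(pls.score(Xcurves, log_potency))   # R^2 of the potency calibration
```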

  10. The efficiency of modified jackknife and ridge type regression estimators: a comparison

    Directory of Open Access Journals (Sweden)

    Sharad Damodar Gore

    2008-09-01

    A common problem in multiple regression models is multicollinearity, which produces undesirable effects on the least squares estimator. To circumvent this problem, two well-known estimation procedures are often suggested in the literature: Generalized Ridge Regression (GRR) estimation, suggested by Hoerl and Kennard, and Jackknifed Ridge Regression (JRR) estimation, suggested by Singh et al. GRR estimation leads to a reduction in the sampling variance, whereas JRR leads to a reduction in the bias. In this paper, we propose a new estimator, the Modified Jackknife Ridge Regression (MJR) estimator. It is based on a criterion that combines the ideas underlying both the GRR and JRR estimators. We have investigated standard properties of this new estimator. From a simulation study, we find that the new estimator often outperforms the LASSO, and it is superior to both the GRR and JRR estimators, using the mean squared error criterion. The conditions under which the MJR estimator is better than the other two competing estimators have been investigated.

  11. Dynamic travel time estimation using regression trees.

    Science.gov (United States)

    2008-10-01

    This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...

  12. The effect of high leverage points on the logistic ridge regression estimator having multicollinearity

    Science.gov (United States)

    Ariffin, Syaiba Balqish; Midi, Habshah

    2014-06-01

    This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points is then investigated on the performance of the logistic ridge regression estimator through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.

  13. A SAS-macro for estimation of the cumulative incidence using Poisson regression

    DEFF Research Database (Denmark)

    Waltoft, Berit Lindum

    2009-01-01

    ... the hazard rates, and the hazard rates are often estimated by the Cox regression. This procedure may not be suitable for large studies due to limited computer resources. Instead one uses Poisson regression, which approximates the Cox regression. Rosthøj et al. presented a SAS-macro for the estimation of the cumulative incidences based on the Cox regression. I present the functional form of the probabilities and variances when using piecewise constant hazard rates and a SAS-macro for the estimation using Poisson regression. The use of the macro is demonstrated through examples and compared to the macro presented ...

  14. Tightness of M-estimators for multiple linear regression in time series

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...

  15. On the relation between S-Estimators and M-Estimators of multivariate location and covariance

    NARCIS (Netherlands)

    Lopuhaa, H.P.

    1987-01-01

    We discuss the relation between S-estimators and M-estimators of multivariate location and covariance. As in the case of the estimation of a multiple regression parameter, S-estimators are shown to satisfy first-order conditions of M-estimators. We show that the influence function IF(x; S, F) of ...

  16. On the estimation and testing of predictive panel regressions

    NARCIS (Netherlands)

    Karabiyik, H.; Westerlund, Joakim; Narayan, Paresh

    2016-01-01

    Hjalmarsson (2010) considers an OLS-based estimator of predictive panel regressions that is argued to be mixed normal under very general conditions. In a recent paper, Westerlund et al. (2016) show that while consistent, the estimator is generally not mixed normal, which invalidates standard normal inference.

  17. Mass estimation of loose parts in nuclear power plant based on multiple regression

    International Nuclear Information System (INIS)

    He, Yuanfeng; Cao, Yanlong; Yang, Jiangxin; Gan, Chunbiao

    2012-01-01

    Based on the application of the Hilbert–Huang transform to non-stationary signals and the relation between the mass of loose parts in a nuclear power plant and the corresponding frequency content, a new method for loose part mass estimation based on the marginal Hilbert–Huang spectrum (MHS) and multiple regression is proposed in this paper. The frequency spectrum of a loose part in a nuclear power plant can be expressed by the MHS. A multiple regression model constructed from the MHS features of the impact signals is used to predict the unknown masses of loose parts. A simulated experiment verified that the method is feasible and that the errors of the results are acceptable. (paper)

  18. Challenges Associated with Estimating Utility in Wet Age-Related Macular Degeneration: A Novel Regression Analysis to Capture the Bilateral Nature of the Disease.

    Science.gov (United States)

    Hodgson, Robert; Reason, Timothy; Trueman, David; Wickstead, Rose; Kusel, Jeanette; Jasilek, Adam; Claxton, Lindsay; Taylor, Matthew; Pulikottil-Jacob, Ruth

    2017-10-01

    The estimation of utility values for the economic evaluation of therapies for wet age-related macular degeneration (AMD) is a particular challenge. Previous economic models in wet AMD have been criticized for failing to capture the bilateral nature of wet AMD by modelling visual acuity (VA) and utility values associated with the better-seeing eye only. Here we present a de novo regression analysis using generalized estimating equations (GEE) applied to a previous dataset of time trade-off (TTO)-derived utility values from a sample of the UK population that wore contact lenses to simulate visual deterioration in wet AMD. This analysis allows utility values to be estimated as a function of VA in both the better-seeing eye (BSE) and worse-seeing eye (WSE). VAs in both the BSE and WSE were found to be statistically significant. The regression analysis provides a possible source of utility values to allow future economic models to capture the quality-of-life impact of changes in VA in both eyes. Novartis Pharmaceuticals UK Limited.

  19. Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

    Science.gov (United States)

    Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

    2012-05-01

    Estimation of stature is an important parameter in the identification of human remains in forensic examinations. The present study aims to compare the reliability and accuracy of stature estimation, and to demonstrate the variability between estimated stature and actual stature, using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements, hand length, hand breadth, foot length and foot breadth, taken on the left side of each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for the estimation of stature from hand and foot dimensions. The derived multiplication factors and regression formulae were applied to the hand and foot measurements in the study sample. The stature estimated from the multiplication factors and from the regression analysis was compared with the actual stature to find the error in the estimated stature. The results indicate that the range of error in stature estimation by the regression analysis method is less than that of the multiplication factor method, thus confirming that regression analysis is better than multiplication factor analysis for stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  20. Estimation of Ordinary Differential Equation Parameters Using Constrained Local Polynomial Regression.

    Science.gov (United States)

    Ding, A Adam; Wu, Hulin

    2014-10-01

    We propose a new method that uses a constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models, with the goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters of the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy, at a small price in computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method.

  1. Small sample GEE estimation of regression parameters for longitudinal data.

    Science.gov (United States)

    Paul, Sudhir; Zhang, Xuemao

    2014-09-28

    Longitudinal (clustered) response data arise in many biostatistical applications and, in general, cannot be assumed to be independent. The generalized estimating equation (GEE) approach is a widely used method to estimate marginal regression parameters for correlated responses. The advantage of the GEE is that the estimates of the regression parameters are asymptotically unbiased even if the correlation structure is misspecified, although their small sample properties are not known. In this paper, two bias-adjusted GEE estimators of the regression parameters in longitudinal data are obtained for the case when the number of subjects is small. One is based on a bias correction, and the other on a bias reduction. Simulations show that the performance of both bias-adjusted methods is similar in terms of bias, efficiency, coverage probability, average coverage length, impact of misspecification of the correlation structure, and impact of cluster size on bias correction. Both methods show superior properties over the GEE estimates for small samples. Further, analysis of data involving a small number of subjects also shows improvement in bias, MSE, standard error, and length of the confidence interval of the estimates by the two bias-adjusted methods over the GEE estimates. For small to moderate sample sizes (N ≤ 50), either of the bias-corrected methods, GEEBc and GEEBr, can be used. However, GEEBc should be preferred over GEEBr, as the former is computationally easier. For large sample sizes, the GEE method can be used. Copyright © 2014 John Wiley & Sons, Ltd.

  2. Early cost estimating for road construction projects using multiple regression techniques

    Directory of Open Access Journals (Sweden)

    Ibrahim Mahamid

    2011-12-01

    The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As the cost estimates are required at the early stages of a project, consideration was given to the fact that the input data for the required regression model should be easily extracted from sketches or the scope definition of the project. Eleven regression models are developed to estimate the total cost of road construction projects in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r2 for the developed models ranges from 0.92 to 0.98, which indicates that the values predicted by the models fit the real-life data well. The mean absolute percentage error (MAPE) of the developed regression models ranges from 13% to 31%; these results compare favorably with past research, which has shown that estimate accuracy in the early stages of a project is between ±25% and ±50%.
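    A minimal sketch of the second model family described above, with invented data standing in for the West Bank dataset: total cost regressed on road length and width, evaluated by MAPE.

```python
# Early cost model: OLS of total cost on road length and width, scored by MAPE.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 131
length_km = rng.uniform(0.5, 10, n)
width_m = rng.uniform(4, 12, n)
cost = 50_000 * length_km * width_m / 8 * rng.lognormal(0, 0.2, n)  # invented

X = np.column_stack([length_km, width_m])
model = LinearRegression().fit(X, cost)
pred = model.predict(X)
mape = np.mean(np.abs((cost - pred) / cost)) * 100
print(f"MAPE: {mape:.1f}%")
```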

  3. Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

    Science.gov (United States)

    Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

    2016-03-01

    From direct observations and facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural networks has been proposed in the past decade. In these models, linear models generally lack precision because they ignore the intrinsic nonlinearities of complex psychophysiological processes, while nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify the model, we introduce a new computational modeling method, named higher-order multivariable polynomial regression, to estimate human affective states. The study employs standardized pictures from the International Affective Picture System to induce thirty subjects' affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain correlation coefficients of 0.98 and 0.96 for the estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidence that valence and arousal have their origins in the brain's motivational circuits. Thus, the proposed method can serve as a novel, efficient way of estimating human affective states.
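    A generic sketch of higher-order multivariable polynomial regression on simulated features standing in for the skin-conductance patterns; the degree and coefficients are arbitrary choices of this example.

```python
# Higher-order multivariable polynomial regression via an expanded design matrix.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(9)
n = 300
X = rng.normal(size=(n, 2))                      # two simulated physiological features
valence = 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + 0.4 * X[:, 0] * X[:, 1]
valence += rng.normal(scale=0.1, size=n)

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X, valence)
print(model.score(X, valence))    # captures nonlinear terms a linear fit would miss
```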

  4. Reducing Monte Carlo error in the Bayesian estimation of risk ratios using log-binomial regression models.

    Science.gov (United States)

    Salmerón, Diego; Cano, Juan A; Chirlaque, María D

    2015-08-30

    In cohort studies, binary outcomes are very often analyzed by logistic regression. However, it is well known that when the goal is to estimate a risk ratio, logistic regression is inappropriate if the outcome is common. In these cases, a log-binomial regression model is preferable. On the other hand, the estimation of the regression coefficients of the log-binomial model is difficult owing to the constraints that must be imposed on these coefficients. Bayesian methods allow a straightforward approach for log-binomial regression models and produce smaller mean squared errors in the estimation of risk ratios than the frequentist methods, and the posterior inferences can be obtained using the software WinBUGS. However, Markov chain Monte Carlo methods implemented in WinBUGS can lead to large Monte Carlo errors in the approximations to the posterior inferences because they produce correlated simulations, and the accuracy of the approximations is inversely related to this correlation. To reduce correlation and to improve accuracy, we propose a reparameterization based on a Poisson model and a sampling algorithm coded in R. Copyright © 2015 John Wiley & Sons, Ltd.

  5. Estimating monotonic rates from biological data using local linear regression.

    Science.gov (United States)

    Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R

    2017-03-01

    Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
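    The sketch below reimplements the core idea behind LoLinR (it is not the package itself): a local linear fit with tricube weights whose slope estimates the rate in a chosen window. The oxygen-style trace is simulated.

```python
# Local linear regression: the weighted-least-squares slope is the local rate.
import numpy as np

rng = np.random.default_rng(10)
t = np.linspace(0, 60, 121)                      # minutes
o2 = 8.0 - 0.05 * t + 0.3 * np.exp(-t / 5) + rng.normal(0, 0.02, t.size)

def local_linear_slope(t, y, center, bandwidth):
    u = (t - center) / bandwidth
    w = np.where(np.abs(u) < 1, (1 - np.abs(u) ** 3) ** 3, 0.0)  # tricube kernel
    X = np.column_stack([np.ones_like(t), t - center])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[1]                               # slope = local monotonic rate

print(local_linear_slope(t, o2, center=40.0, bandwidth=15.0))  # ~ -0.05
```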

  6. Robust best linear estimation for regression analysis using surrogate and instrumental variables.

    Science.gov (United States)

    Wang, C Y

    2012-04-01

    We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.

  7. truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Maria Karlsson

    2014-05-01

    Problems with truncated data occur in many areas, complicating estimation and inference. For linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left-truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.

  8. A Comparison of Regression Techniques for Estimation of Above-Ground Winter Wheat Biomass Using Near-Surface Spectroscopy

    Directory of Open Access Journals (Sweden)

    Jibo Yue

    2018-01-01

    Above-ground biomass (AGB) provides a vital link between solar energy consumption and yield, so its correct estimation is crucial to accurately monitor crop growth and predict yield. In this work, we estimate AGB by using 54 vegetation indexes (e.g., Normalized Difference Vegetation Index, Soil-Adjusted Vegetation Index) and eight statistical regression techniques: artificial neural network (ANN), multivariable linear regression (MLR), decision-tree regression (DT), boosted binary regression tree (BBRT), partial least squares regression (PLSR), random forest regression (RF), support vector machine regression (SVM), and principal component regression (PCR), which are used to analyze hyperspectral data acquired by using a field spectrophotometer. The vegetation indexes (VIs) determined from the spectra were first used to train the regression techniques for modeling and validation to select the best VI input, and then summed with white Gaussian noise to study how remote sensing errors affect the regression techniques. Next, the VIs were divided into groups of different sizes by using various sampling methods for modeling and validation to test the stability of the techniques. Finally, the AGB was estimated by using a leave-one-out cross validation with these techniques. The results of the study demonstrate that, of the eight techniques investigated, PLSR and MLR perform best in terms of stability and are most suitable when high-accuracy and stable estimates are required from relatively few samples. In addition, RF is extremely robust against noise and is best suited to deal with repeated observations involving remote-sensing data (i.e., data affected by atmosphere, clouds, observation times, and/or sensor noise). Finally, the leave-one-out cross-validation method indicates that PLSR provides the highest accuracy (R2 = 0.89, RMSE = 1.20 t/ha, MAE = 0.90 t/ha, NRMSE = 0.07, CV(RMSE) = 0.18); thus, PLSR is best suited for work requiring high estimation accuracy.
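    A sketch of the validation strategy on simulated stand-in features: leave-one-out cross-validation comparing two of the techniques named above (PLSR and RF) by RMSE.

```python
# Leave-one-out cross-validation to compare regression techniques by RMSE.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(11)
n = 80
VIs = rng.normal(size=(n, 10))                   # simulated vegetation indexes
agb = 5 + VIs[:, :3].sum(axis=1) + rng.normal(0, 0.8, n)  # simulated AGB, t/ha

loo = LeaveOneOut()
for name, est in [("PLSR", PLSRegression(n_components=3)),
                  ("RF", RandomForestRegressor(n_estimators=200, random_state=0))]:
    pred = cross_val_predict(est, VIs, agb, cv=loo).ravel()
    rmse = np.sqrt(np.mean((agb - pred) ** 2))
    print(name, "LOOCV RMSE:", round(rmse, 2))
```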

  9. Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions.

    Science.gov (United States)

    Rativa, Diego; Fernandes, Bruno J T; Roque, Alexandre

    2018-01-01

    Height and weight are measurements used to track nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or are unable to communicate, and a combination of these factors may prevent accurate measurement; in those cases, height and weight can be estimated approximately by anthropometric means. Different groups have proposed linear or non-linear equations whose coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process regression, and artificial neural networks. The predicted values are significantly more accurate than those obtained with conventional linear regressions. In all cases, the predictions are insensitive to ethnicity, and to gender if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care.

  10. Comparison of regression models for estimation of isometric wrist joint torques using surface electromyography

    Directory of Open Access Journals (Sweden)

    Menon Carlo

    2011-09-01

    Background: Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues in torque estimation are degradation of model accuracy with the passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers in identifying the most appropriate model for a specific biomedical application. Methods: Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, together with wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours after completion of the first data gathering session, for the purpose of evaluating the effects of the passage of time and electrode displacement on model accuracy. Acquired SEMG signals were filtered, rectified, normalized and then fed to the models for training. Results: Mean adjusted coefficient of determination (Ra2) values decreased by 20%-35% for the different models after one hour, while altering arm posture decreased mean Ra2 values by 64%-74%. Conclusions: Model estimation accuracy drops significantly with the passage of time, electrode displacement, and alteration of limb posture; model retraining is therefore crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, the ordinary least squares linear regression model (OLS) combined high isometric torque estimation accuracy with very short training times.

  11. Support vector regression methodology for estimating global solar radiation in Algeria

    Science.gov (United States)

    Guermoui, Mawloud; Rabehi, Abdelaziz; Gairaa, Kacem; Benkaciali, Said

    2018-01-01

    Accurate estimation of Daily Global Solar Radiation (DGSR) is a major goal for solar energy applications. In this paper we show the possibility of developing a simple model based on Support Vector Regression (SVM-R) to estimate DGSR on a horizontal surface in Algeria using only the sunshine ratio as input. The SVM model was developed and tested using a data set recorded over three years (2005-2007), collected at the Applied Research Unit for Renewable Energies (URAER) in Ghardaïa. The data collected in 2005-2006 were used to train the model, while the 2007 data were used to test its performance. The measured and estimated values of DGSR were compared statistically during the testing phase using the Root Mean Square Error (RMSE), relative Root Mean Square Error (rRMSE), and correlation coefficient (r2), which amount to 1.59 MJ/m2, 8.46 and 97.4%, respectively. The results show that SVM-R is highly qualified for DGSR estimation using only the sunshine ratio.

  12. The Collinearity Free and Bias Reduced Regression Estimation Project: The Theory of Normalization Ridge Regression. Report No. 2.

    Science.gov (United States)

    Bulcock, J. W.; And Others

    Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…

  13. Simultaneous Estimation of Regression Functions for Marine Corps Technical Training Specialties.

    Science.gov (United States)

    Dunbar, Stephen B.; And Others

    This paper considers the application of Bayesian techniques for simultaneous estimation to the specification of regression weights for selection tests used in various technical training courses in the Marine Corps. Results of a method for m-group regression developed by Molenaar and Lewis (1979) suggest that common weights for training courses…

  14. Estimation of monthly solar exposure on horizontal surface by Angstrom-type regression equation

    International Nuclear Information System (INIS)

    Ravanshid, S.H.

    1981-01-01

    To obtain solar flux intensity, solar radiation measuring instruments are best. In the absence of instrumental data, other meteorological measurements related to solar energy can be used, and empirical relationships make it possible to estimate solar flux intensity. One of these empirical relationships for estimating monthly averages of total solar radiation on a horizontal surface is the modified Angstrom-type regression equation, which has been employed in this report to estimate the solar flux intensity on a horizontal surface for Tehran. By comparing the results of this equation with four years of values measured by Tehran's meteorological weather station, the values of the meteorological constants (a, b) in the equation were obtained for Tehran. (author)
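
    The Angstrom-type equation is linear in its two constants, so (a, b) can be recovered by a straight least-squares fit of the monthly clearness index on the sunshine ratio. A small sketch (the monthly values below are invented for illustration, not Tehran data):

    ```python
    import numpy as np

    # Hypothetical monthly values: s = n/N (sunshine ratio), k = H/H0 (clearness index)
    s = np.array([0.55, 0.60, 0.62, 0.68, 0.74, 0.81, 0.85, 0.83, 0.78, 0.70, 0.61, 0.56])
    k = np.array([0.48, 0.51, 0.52, 0.56, 0.60, 0.65, 0.67, 0.66, 0.62, 0.57, 0.52, 0.49])

    # Least-squares fit of the Angstrom relation k = a + b * s
    b, a = np.polyfit(s, k, 1)
    print(f"a={a:.3f}, b={b:.3f}")

    # Estimated monthly solar exposure is then H = H0 * (a + b * n/N)
    ```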

  15. Robust estimation for homoscedastic regression in the secondary analysis of case-control data

    KAUST Repository

    Wei, Jiawei; Carroll, Raymond J.; Mü ller, Ursula U.; Keilegom, Ingrid Van; Chatterjee, Nilanjan

    2012-01-01

    Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.

  16. Robust estimation for homoscedastic regression in the secondary analysis of case-control data

    KAUST Repository

    Wei, Jiawei

    2012-12-04

    Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.

  17. Adjusting for overdispersion in piecewise exponential regression models to estimate excess mortality rate in population-based research.

    Science.gov (United States)

    Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard

    2016-10-01

    In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including quasi-likelihood, robust standard error estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed evidence of significant inherent overdispersion. Flexible piecewise regression modelling, with either quasi-likelihood or robust standard errors, was the best approach, as it deals with both overdispersion due to model misspecification and true or inherent overdispersion.
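
    A minimal sketch of the test-and-correct workflow using statsmodels (synthetic overdispersed counts; the person-time offset of an excess-mortality model is omitted for brevity):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 500
    x = rng.normal(size=n)
    mu = np.exp(0.5 + 0.8 * x)
    counts = rng.negative_binomial(2, 2 / (2 + mu))     # overdispersed counts with mean mu
    X = sm.add_constant(x)

    pois = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
    phi = pois.pearson_chi2 / pois.df_resid             # dispersion statistic; >> 1 flags overdispersion
    print(f"dispersion = {phi:.2f}")

    # Quasi-likelihood correction: rescale the covariance by the Pearson dispersion
    quasi = sm.GLM(counts, X, family=sm.families.Poisson()).fit(scale="X2")
    # Or robust (sandwich) standard errors
    robust = sm.GLM(counts, X, family=sm.families.Poisson()).fit(cov_type="HC0")
    print(quasi.bse, robust.bse)
    ```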

  18. Inverse estimation of multiple muscle activations based on linear logistic regression.

    Science.gov (United States)

    Sekiya, Masashi; Tsuji, Toshiaki

    2017-07-01

    This study deals with a technique for estimating muscle activity from movement data using a statistical model. A linear regression (LR) model and artificial neural networks (ANN) are well-known statistical models for such use. Although an ANN has high estimation capability, in clinical applications the limited amount of available data often leads to performance deterioration. The LR model, on the other hand, is limited in its generalization performance. We therefore propose a muscle activity estimation method that improves generalization performance through the use of a linear logistic regression model. The proposed method was compared with the LR model and an ANN in a verification experiment with 7 participants. As a result, the proposed method showed better generalization performance than the conventional methods in various tasks.

  19. Optimized support vector regression for drilling rate of penetration estimation

    Science.gov (United States)

    Bodaghi, Asadollah; Ansari, Hamid Reza; Gholami, Mahsa

    2015-12-01

    In the petroleum industry, drilling optimization involves the selection of operating conditions for achieving the desired depth with the minimum expenditure while fulfilling requirements of personal safety, environmental protection, adequate information on penetrated formations and productivity. Since drilling optimization is highly dependent on the rate of penetration (ROP), estimation of this parameter is of great importance during well planning. In this research, a novel approach called `optimized support vector regression' is employed to relate the input variables to ROP. The algorithms used for optimizing the support vector regression are the genetic algorithm (GA) and the cuckoo search algorithm (CS). Optimization improved support vector regression performance by selecting proper values for its parameters. To evaluate the ability of the optimization algorithms to enhance SVR performance, their results were compared to the hybrid of pattern search and grid search (HPG), which is conventionally employed for optimizing SVR. The results demonstrated that the CS algorithm improved the prediction accuracy of SVR more than both GA and HPG. Moreover, a predictive model based on a back propagation neural network (BPNN), the traditional approach for estimating ROP, was selected for comparison with CSSVR. The comparative results revealed the superiority of CSSVR. This study concluded that CSSVR is a viable option for precise estimation of ROP.
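
    The search over SVR hyperparameters is the core of the approach. A compact sketch using plain grid search (the HPG-style baseline the paper compares against; GA or cuckoo search would replace the search loop, and the drilling variables here are synthetic stand-ins):

    ```python
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 6))   # stand-ins for drilling variables (WOB, RPM, depth, ...)
    rop = X @ np.array([1.0, -0.5, 0.8, 0.3, 0.0, 0.2]) + np.sin(X[:, 0]) \
          + rng.normal(scale=0.3, size=300)

    pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    grid = {"svr__C": [1, 10, 100], "svr__gamma": [0.01, 0.1, 1.0], "svr__epsilon": [0.01, 0.1]}
    search = GridSearchCV(pipe, grid, cv=5, scoring="neg_root_mean_squared_error").fit(X, rop)
    print(search.best_params_, -search.best_score_)   # tuned parameters and CV RMSE
    ```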

  20. Two biased estimation techniques in linear regression: Application to aircraft

    Science.gov (United States)

    Klein, Vladislav

    1988-01-01

    Several ways to detect and assess collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit the damaging effect of collinearity are presented. These two techniques, principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. Eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
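
    Principal components regression is straightforward to sketch: project the regressors onto their leading principal components and regress on those, discarding the near-singular directions that inflate variance. A minimal example on synthetic collinear data (not the flight-test data):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(5)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)     # nearly collinear regressor
    x3 = rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])
    y = x1 + 0.5 * x3 + rng.normal(scale=0.2, size=n)

    # Keep only the two leading components; dropping the near-null direction
    # trades a little bias for a large reduction in variance
    pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression()).fit(X, y)
    print(pcr.score(X, y))
    ```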

  1. Generalized allometric regression to estimate biomass of Populus in short-rotation coppice

    Energy Technology Data Exchange (ETDEWEB)

    Ben Brahim, Mohammed; Gavaland, Andre; Cabanettes, Alain [INRA Centre de Toulouse, Castanet-Tolosane Cedex (France). Unite Agroforesterie et Foret Paysanne

    2000-07-01

    Data from four different stands were combined to establish a single generalized allometric equation for estimating the above-ground biomass of individual Populus trees grown in short-rotation coppice. The generalized model used diameter at breast height, the mean diameter and the mean height of each site as predictor variables, and was then compared with the stand-specific regressions using an F-test. Results showed that this single regression estimates tree biomass well at each stand and does not introduce bias with increasing diameter.
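
    Allometric equations of this kind are usually fitted on the log scale, where the power law becomes linear. A toy sketch of the single-stand form (invented tree data; the generalized model adds site-level mean diameter and height as extra predictors):

    ```python
    import numpy as np

    # Hypothetical tree data: DBH (cm) and above-ground dry biomass (kg)
    dbh = np.array([2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 10.4])
    biomass = np.array([0.8, 2.3, 3.5, 6.4, 12.1, 15.9, 26.0, 36.2])

    # The allometric model B = a * DBH^b is linear on the log scale:
    # ln B = ln a + b * ln DBH
    b, ln_a = np.polyfit(np.log(dbh), np.log(biomass), 1)
    print(f"a={np.exp(ln_a):.3f}, b={b:.3f}")

    # Predict biomass for a new tree (in practice a correction factor is
    # often applied for the back-transformation bias)
    print(np.exp(ln_a) * 8.0 ** b)
    ```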

  2. Comparison of some biased estimation methods (including ordinary subset regression) in the linear model

    Science.gov (United States)

    Sidik, S. M.

    1975-01-01

    Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.

  3. Brillouin Scattering Spectrum Analysis Based on Auto-Regressive Spectral Estimation

    Science.gov (United States)

    Huang, Mengyun; Li, Wei; Liu, Zhangyun; Cheng, Linghao; Guan, Bai-Ou

    2018-06-01

    Auto-regressive (AR) spectral estimation technology is proposed to analyze the Brillouin scattering spectrum in Brillouin optical time-domain reflectometry. It is shown that the AR-based method can reliably estimate the Brillouin frequency shift with an accuracy much better than fast Fourier transform (FFT) based methods, provided the data length is not too short. It enables about a threefold improvement over FFT at a moderate spatial resolution.

  4. Brillouin Scattering Spectrum Analysis Based on Auto-Regressive Spectral Estimation

    Science.gov (United States)

    Huang, Mengyun; Li, Wei; Liu, Zhangyun; Cheng, Linghao; Guan, Bai-Ou

    2018-03-01

    Auto-regressive (AR) spectral estimation technology is proposed to analyze the Brillouin scattering spectrum in Brillouin optical time-domain reflectometry. It is shown that the AR-based method can reliably estimate the Brillouin frequency shift with an accuracy much better than fast Fourier transform (FFT) based methods, provided the data length is not too short. It enables about a threefold improvement over FFT at a moderate spatial resolution.
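
    The advantage of AR spectral estimation on short records is that the estimated peak is not confined to the FFT frequency grid. A sketch of Yule-Walker AR estimation applied to a noisy tone (a crude stand-in for a Brillouin gain trace, not the authors' processing chain):

    ```python
    import numpy as np
    from scipy.linalg import solve_toeplitz

    def ar_spectrum(x, p, nfreq=2048):
        # Yule-Walker estimate of AR(p) coefficients from sample autocovariances
        x = x - x.mean()
        r = np.array([x[: x.size - k] @ x[k:] for k in range(p + 1)]) / x.size
        a = solve_toeplitz(r[:p], r[1 : p + 1])       # AR coefficients
        sigma2 = r[0] - a @ r[1 : p + 1]              # innovation variance
        f = np.linspace(0.0, 0.5, nfreq)
        E = np.exp(-2j * np.pi * np.outer(f, np.arange(1, p + 1)))
        return f, sigma2 / np.abs(1.0 - E @ a) ** 2   # AR power spectrum

    rng = np.random.default_rng(6)
    n = 256                                  # short record: FFT bin spacing is only 1/256
    t = np.arange(n)
    x = np.cos(2 * np.pi * 0.123 * t) + rng.normal(scale=1.0, size=n)

    f, s = ar_spectrum(x, p=8)
    fft_peak = np.fft.rfftfreq(n)[np.argmax(np.abs(np.fft.rfft(x - x.mean())))]
    print(f"AR peak: {f[np.argmax(s)]:.4f}  FFT peak: {fft_peak:.4f}")  # true value 0.123
    ```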

  5. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    Science.gov (United States)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which are reported to be less sensitive to the presence of outliers, and the ridge regression technique, employed to tackle the multicollinearity problem. To mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach was examined under the simultaneous presence of multicollinearity and multiple outliers in multiple linear regression. This study compared several well-known robust estimators (M, MM, RIDGE) and the robust ridge regression estimators Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM) in such a situation. Results showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), RMM and RIDGE are more or less similar in their superiority over the other estimators, regardless of the number of observations, level of collinearity and percentage of outliers used. However, when outliers occurred in only a single direction (the y-direction), the WRMM estimator was the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
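
    The robust ridge idea combines two familiar pieces: ridge shrinkage for the collinearity and M-estimation weights for the outliers. A compact IRLS sketch (a generic Huber-weighted ridge, not the exact WRM/WRMM/RMM estimators of the paper):

    ```python
    import numpy as np

    def huber_ridge(X, y, k=1.345, lam=1.0, iters=50):
        """Huber-weighted ridge M-estimator via iteratively reweighted least squares."""
        n, p = X.shape
        beta = np.zeros(p)
        for _ in range(iters):
            r = y - X @ beta
            s = np.median(np.abs(r)) / 0.6745 + 1e-12         # robust scale (MAD)
            u = r / s
            w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))  # Huber weights
            WX = X * w[:, None]
            beta = np.linalg.solve(WX.T @ X + lam * np.eye(p), WX.T @ y)
        return beta

    rng = np.random.default_rng(7)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)      # collinear pair of regressors
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(scale=0.1, size=n)
    y[:10] += 15                                  # gross outliers in the y-direction

    print(huber_ridge(X, y, lam=0.5))             # stable, roughly equal weights near (1, 1)
    print(np.linalg.lstsq(X, y, rcond=None)[0])   # OLS: unstable under outliers + collinearity
    ```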

  6. A Gaussian IV estimator of cointegrating relations

    DEFF Research Database (Denmark)

    Bårdsen, Gunnar; Haldrup, Niels

    2006-01-01

    In static single equation cointegration regression models the OLS estimator will have a non-standard distribution unless regressors are strictly exogenous. In the literature a number of estimators have been suggested to deal with this problem, especially by the use of semi-nonparametric estimators. ... These instruments are almost ideal and simulations show that the IV estimator using such instruments alleviates the endogeneity problem extremely well in both finite and large samples.

  7. Estimation of Stature from Foot Dimensions and Stature among South Indian Medical Students Using Regression Models

    Directory of Open Access Journals (Sweden)

    Rajesh D. R

    2015-01-01

    Background: At times, fragments of soft tissue are found disposed of in the open or in ditches at a crime scene and are brought to forensic experts for the purpose of identification; such cases pose a real challenge. Objectives: This study aimed at developing a methodology to help in personal identification by studying the relation between foot dimensions and stature among south Indian subjects using regression models. Material and Methods: Stature and foot length of 100 subjects (age range 18-22 years) were measured. Linear regression equations for stature estimation were calculated. Results: The correlation coefficients between stature and foot length were found to be positive and statistically significant. Height = 98.159 + 3.746 × FLRT (r = 0.821) and Height = 91.242 + 3.284 × FLRT (r = 0.837) are the regression formulas from foot length for males and females, respectively. Conclusion: The regression equations derived in the study can be used reliably for estimation of stature in a diverse population group and would thus be of immense value in the field of personal identification, especially from mutilated bodies or fragmentary remains.

  8. Estimating Loess Plateau Average Annual Precipitation with Multiple Linear Regression Kriging and Geographically Weighted Regression Kriging

    Directory of Open Access Journals (Sweden)

    Qiutong Jin

    2016-06-01

    Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) methods were employed using precipitation data from the period 1980-2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had high accuracy and captured detailed spatial distribution; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.

  9. Estimation of evapotranspiration across the conterminous United States using a regression with climate and land-cover data

    Science.gov (United States)

    Sanford, Ward E.; Selnick, David L.

    2013-01-01

    Evapotranspiration (ET) is an important quantity for water resource managers to know because it often represents the largest sink for precipitation (P) arriving at the land surface. In order to estimate actual ET across the conterminous United States (U.S.) in this study, a water-balance method was combined with a climate and land-cover regression equation. Precipitation and streamflow records were compiled for 838 watersheds for 1971-2000 across the U.S. to obtain long-term estimates of actual ET. A regression equation was developed that related the ratio ET/P to climate and land-cover variables within those watersheds. Precipitation and temperatures were used from the PRISM climate dataset, and land-cover data were used from the USGS National Land Cover Dataset. Results indicate that ET can be predicted relatively well at a watershed or county scale with readily available climate variables alone, and that land-cover data can also improve those predictions. Using the climate and land-cover data at an 800-m scale and then averaging to the county scale, maps were produced showing estimates of ET and ET/P for the entire conterminous U.S. Using the regression equation, such maps could also be made for more detailed state coverages, or for other areas of the world where climate and land-cover data are plentiful.

  10. Online and Batch Supervised Background Estimation via L1 Regression

    KAUST Repository

    Dutta, Aritra

    2017-11-23

    We propose a surprisingly simple model for supervised video background estimation. Our model is based on $\ell_1$ regression. As existing methods for $\ell_1$ regression do not scale to high-resolution videos, we propose several simple and scalable methods for solving the problem, including iteratively reweighted least squares, a homotopy method, and stochastic gradient descent. We show through extensive experiments that our model and methods match or outperform the state-of-the-art online and batch methods in virtually all quantitative and qualitative measures.

  11. Online and Batch Supervised Background Estimation via L1 Regression

    KAUST Repository

    Dutta, Aritra; Richtarik, Peter

    2017-01-01

    We propose a surprisingly simple model for supervised video background estimation. Our model is based on $\ell_1$ regression. As existing methods for $\ell_1$ regression do not scale to high-resolution videos, we propose several simple and scalable methods for solving the problem, including iteratively reweighted least squares, a homotopy method, and stochastic gradient descent. We show through extensive experiments that our model and methods match or outperform the state-of-the-art online and batch methods in virtually all quantitative and qualitative measures.
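
    Of the solvers listed, iteratively reweighted least squares is the simplest to sketch: an $\ell_1$ fit is obtained by repeatedly solving weighted $\ell_2$ problems with weights 1/|residual|. A toy version on synthetic data (sparse outliers standing in for foreground pixels; not the authors' video pipeline):

    ```python
    import numpy as np

    def lad_regression(X, y, iters=100, eps=1e-8):
        """Least absolute deviations (L1) fit via iteratively reweighted least squares."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        for _ in range(iters):
            w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)   # reweight by 1/|r|
            WX = X * w[:, None]
            beta = np.linalg.solve(WX.T @ X, WX.T @ y)
        return beta

    rng = np.random.default_rng(8)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.1, size=n)
    y[rng.choice(n, 50, replace=False)] += 10.0   # sparse, foreground-like outliers

    print(lad_regression(X, y))                   # near (2, 1): outliers barely move the L1 fit
    print(np.linalg.lstsq(X, y, rcond=None)[0])   # L2 fit dragged upward by the outliers
    ```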

  12. On the robust nonparametric regression estimation for a functional regressor

    OpenAIRE

    Azzedine , Nadjia; Laksaci , Ali; Ould-Saïd , Elias

    2009-01-01


  13. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    Directory of Open Access Journals (Sweden)

    Yoonseok Shin

    2015-01-01

    Among the data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong generalization performance. However, the boosting approach has yet to be used for regression problems within the construction domain, including cost estimation, although it has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied here to cost estimation at the early stage of a construction project, to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, it was compared with a neural network (NN) model, which has been proven to perform well in cost estimation. The BRT model showed results similar to those of the NN model on 234 actual cost data sets from a building construction project. In addition, the BRT model can provide additional information, such as an importance plot and the tree structure, which can help estimators understand the decision-making process. Consequently, the boosting approach has potential applicability for preliminary cost estimation of building construction projects.
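
    Gradient-boosted regression trees, together with the importance plot the abstract mentions, are available in standard libraries. A minimal sketch (synthetic project features in place of the paper's 234 cost records):

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(9)
    n = 234                                      # mirroring the size of the paper's data set
    X = rng.normal(size=(n, 5))                  # stand-ins: floor area, storeys, duration, ...
    cost = 100 + 30 * X[:, 0] + 10 * X[:, 1] ** 2 + rng.normal(scale=5, size=n)

    brt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
    brt.fit(X, cost)

    # The "importance plot" comes from this attribute: relative influence of each input
    print(brt.feature_importances_)
    ```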

  14. Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.

    Science.gov (United States)

    Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J

    2009-11-01

    Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.
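
    The mechanics of a regression-based norm are compact: fit the test score on demographic predictors in healthy adults, then express a new examinee's score as a standardized residual. A schematic sketch (all coefficients and values invented for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(10)
    n = 1000
    age = rng.uniform(20, 80, n)
    educ = rng.uniform(8, 20, n)
    iq = rng.normal(100, 15, n)                  # estimated premorbid IQ (e.g., word reading)
    score = 60 - 0.3 * age + 0.8 * educ + 0.2 * iq + rng.normal(scale=5, size=n)

    # Fit the normative regression on the healthy sample
    X = np.column_stack([np.ones(n), age, educ, iq])
    beta, res, *_ = np.linalg.lstsq(X, score, rcond=None)
    sd_resid = np.sqrt(res[0] / (n - X.shape[1]))

    # New examinee: observed 70, aged 65, 12 years of education, estimated IQ 110
    expected = np.array([1, 65, 12, 110]) @ beta
    z = (70 - expected) / sd_resid               # the regression-based normative z-score
    print(f"expected={expected:.1f}, z={z:.2f}")
    ```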

  15. Estimation of Panel Data Regression Models with Two-Sided Censoring or Truncation

    DEFF Research Database (Denmark)

    Alan, Sule; Honore, Bo E.; Hu, Luojia

    2014-01-01

    This paper constructs estimators for panel data regression models with individual specific heterogeneity and two-sided censoring and truncation. Following Powell (1986) the estimation strategy is based on moment conditions constructed from re-censored or re-truncated residuals. While these moment…

  16. Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

    Directory of Open Access Journals (Sweden)

    Biqing Cai

    2015-04-01

    This paper discusses nonparametric kernel regression with the regressor being a d-dimensional β-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate √(n(T)h^d), where n(T) is the number of regenerations for a β-null recurrent process, and that the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimator is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of the Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity in the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than that of the linear model.

  17. Performance of a New Restricted Biased Estimator in Logistic Regression

    Directory of Open Access Journals (Sweden)

    Yasin ASAR

    2017-12-01

    It is known that the variance of the maximum likelihood estimator (MLE) inflates when the explanatory variables are correlated. This situation is called the multicollinearity problem, and as a result the estimates of the model may not be trustworthy. This paper therefore introduces a new restricted estimator (RLTE) that may be applied to deal with multicollinearity when the parameters lie in some linear subspace in logistic regression. The mean squared errors (MSE) and matrix mean squared errors (MMSE) of the estimators considered in this paper are given. A Monte Carlo experiment is designed to evaluate the performance of the proposed estimator, the restricted MLE (RMLE), the MLE and the Liu-type estimator (LTE), with MSE as the criterion of performance. Moreover, a real data example is presented. According to the results, the proposed estimator performs better than the MLE, RMLE and LTE.

  18. On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression

    KAUST Repository

    Dai, Wenlin; Tong, Tiejun; Zhu, Lixing

    2017-01-01

    Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that combines the linear regression method with the higher-order difference estimators systematically. The unified framework has greatly enriched the existing literature on variance estimation that includes most existing estimators as special cases. More importantly, the unified framework has also provided a smart way to solve the challenging difference sequence selection problem that remains a long-standing controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend to use the ordinary difference sequence in the unified framework, no matter if the sample size is small or if the signal-to-noise ratio is large. Finally, to cater for the demands of the application, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.

  19. On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression

    KAUST Repository

    Dai, Wenlin

    2017-09-01

    Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that combines the linear regression method with the higher-order difference estimators systematically. The unified framework has greatly enriched the existing literature on variance estimation that includes most existing estimators as special cases. More importantly, the unified framework has also provided a smart way to solve the challenging difference sequence selection problem that remains a long-standing controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend to use the ordinary difference sequence in the unified framework, no matter if the sample size is small or if the signal-to-noise ratio is large. Finally, to cater for the demands of the application, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.
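
    The simplest member of the difference-based family shows why no mean-function estimate is needed: for adjacent design points the smooth trend nearly cancels in a first difference, leaving (almost) pure noise. A sketch of the ordinary first-order difference estimator (not the package's unified estimator):

    ```python
    import numpy as np

    rng = np.random.default_rng(11)
    n = 500
    x = np.sort(rng.uniform(size=n))
    y = np.sin(4 * np.pi * x) + rng.normal(scale=0.3, size=n)   # true sigma = 0.3

    # E[(y[i+1] - y[i])^2] ~= 2 * sigma^2 when the trend varies slowly between
    # neighboring design points, so the mean function never has to be fitted.
    sigma2_hat = np.sum(np.diff(y) ** 2) / (2 * (n - 1))
    print(np.sqrt(sigma2_hat))   # close to 0.3
    ```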

  20. A different approach to estimate nonlinear regression model using numerical methods

    Science.gov (United States)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper concerns computational methods, namely the Gauss-Newton method and gradient algorithm methods (the Newton-Raphson method, the steepest descent or steepest ascent algorithm, the method of scoring, and the method of quadratic hill-climbing), based on numerical analysis, for estimating the parameters of a nonlinear regression model in a rather different way. Principles of matrix calculus are used to discuss the gradient-algorithm methods. Yonathan Bard [1] discussed a comparison of gradient methods for the solution of nonlinear parameter estimation problems; this article instead takes an analytical approach to the gradient algorithm methods. The paper describes an iterative technique, the Gauss-Newton method, which differs from the iterative technique proposed by Gordon K. Smyth [2]. Hans Georg Bock et al. [10] proposed numerical methods for parameter estimation in DAEs (differential algebraic equations). Isabel Reis Dos Santos et al. [11] introduced a weighted least squares procedure for estimating the unknown parameters of a nonlinear regression metamodel. For large-scale nonsmooth convex minimization, the Hager and Zhang (HZ) conjugate gradient method and the modified HZ (MHZ) method were presented by Gonglin Yuan et al. [12].
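
    Gauss-Newton itself fits in a dozen lines: linearize the residual at the current iterate and solve the resulting linear least squares problem for the step. A self-contained sketch on an exponential-decay model (synthetic data; not the paper's formulation):

    ```python
    import numpy as np

    def gauss_newton(f, jac, beta0, y, iters=50, tol=1e-10):
        """Minimize ||y - f(beta)||^2 by repeatedly linearizing f at the current iterate."""
        beta = np.asarray(beta0, dtype=float)
        for _ in range(iters):
            r = y - f(beta)                                  # current residual
            J = jac(beta)                                    # Jacobian of f at beta
            step = np.linalg.lstsq(J, r, rcond=None)[0]      # solve the linearized problem
            beta = beta + step
            if np.linalg.norm(step) < tol:
                break
        return beta

    # Example: y = b0 * exp(-b1 * t) with noisy observations
    t = np.linspace(0, 4, 40)
    rng = np.random.default_rng(12)
    y = 2.5 * np.exp(-1.3 * t) + rng.normal(scale=0.02, size=t.size)

    f = lambda b: b[0] * np.exp(-b[1] * t)
    jac = lambda b: np.column_stack([np.exp(-b[1] * t), -b[0] * t * np.exp(-b[1] * t)])
    print(gauss_newton(f, jac, [1.0, 1.0], y))   # close to (2.5, 1.3)
    ```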

  1. Regression tools for CO2 inversions: application of a shrinkage estimator to process attribution

    International Nuclear Information System (INIS)

    Shaby, Benjamin A.; Field, Christopher B.

    2006-01-01

    In this study we perform an atmospheric inversion based on a shrinkage estimator. This method is used to estimate surface fluxes of CO2, first partitioned according to constituent geographic regions, and then according to the constituent processes that are responsible for the total flux. Our approach differs from previous approaches in two important ways. The first is that the technique of linear Bayesian inversion is recast as a regression problem. Seen as such, standard regression tools are employed to analyse and reduce errors in the resultant estimates. A shrinkage estimator, which combines standard ridge regression with the linear 'Bayesian inversion' model, is introduced. This method introduces additional bias into the model with the aim of reducing variance such that errors are decreased overall. Compared with standard linear Bayesian inversion, the ridge technique seems to reduce both flux estimation errors and prediction errors. The second divergence from previous studies is that instead of dividing the world into geographically distinct regions and estimating the CO2 flux in each region, the flux space is divided conceptually into processes that contribute to the total global flux. Formulating the problem in this manner adds to the interpretability of the resultant estimates and attempts to shed light on the problem of attributing sources and sinks to their underlying mechanisms

  2. Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods

    Directory of Open Access Journals (Sweden)

    Dick Apronti

    2016-12-01

    Traffic volume is an important parameter in most transportation planning applications. Low volume roads make up about 69% of road miles in the United States, and estimating traffic on them is a cost-effective alternative to taking traffic counts, because traditional counts are expensive and impractical for low priority roads. The purpose of this paper is to present the development of two alternative, cost-effective means of estimating traffic volumes for low volume roads in Wyoming and to make recommendations for their implementation. The study methodology involved reviewing existing studies, identifying data sources, and carrying out model development; the utility of the models developed was then verified by comparing actual traffic volumes to those predicted by the models. The study resulted in two regression models that are inexpensive and easy to implement. The first was a linear regression model that utilized pavement type, access to highways, predominant land use types, and population to estimate traffic volume; in verification, an R2 value of 0.64 and a root mean square error of 73.4% were obtained. The second was a logistic regression model that classified the level of traffic on roads into five thresholds or levels; it was verified by estimating traffic volume thresholds and determining the percentage of roads accurately classified as belonging to the given thresholds, which ranged from 79% to 88% across the five thresholds. In conclusion, the verification indicated both model types to be useful for accurate and cost-effective estimation of traffic volumes for low volume Wyoming roads. The models were recommended for use in traffic volume estimation for low volume roads in pavement management and environmental impact assessment studies.

  3. A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

    Science.gov (United States)

    Huang, C.; Townshend, J.R.G.

    2003-01-01

    A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm improves on SLR in that it can approximate nonlinear relationships, and on BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest cover was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than, or at least as well as, BRT at all 10 equal forest proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.

  4. A Quantile Regression Approach to Estimating the Distribution of Anesthetic Procedure Time during Induction.

    Directory of Open Access Journals (Sweden)

    Hsin-Lun Wu

    Although procedure time analyses are important for operating room management, it is not easy to extract useful information from clinical procedure time data. A novel approach is proposed to analyze procedure time during anesthetic induction. A two-step regression analysis was performed to explore influential factors of anesthetic induction time (AIT). Linear regression with stepwise model selection was used to select significant correlates of AIT, and quantile regression was then employed to illustrate the dynamic relationships between AIT and the selected variables at distinct quantiles. A total of 1,060 patients were analyzed. First- and second-year residents (R1-R2) required longer AIT than third- and fourth-year residents and attending anesthesiologists (p = 0.006). Factors prolonging AIT included American Society of Anesthesiologists physical status ≥ III; arterial, central venous and epidural catheterization; and use of bronchoscopy. Presence of the surgeon before induction decreased AIT (p < 0.001). Type of surgery also had a significant influence on AIT. Quantile regression satisfactorily estimated the extra time needed to complete induction for each influential factor at distinct quantiles. Our analysis of AIT demonstrates the benefit of quantile regression in providing a more comprehensive view of the relationships between procedure time and related factors. This novel two-step regression approach has potential applications to procedure time analysis in operating room management.
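
    Quantile regression of this kind is directly available in statsmodels; the point of the second step is that a covariate's effect can differ across the AIT distribution. A schematic sketch (simulated induction times; the variable names and effect sizes are invented):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(13)
    n = 1000
    resident_junior = rng.integers(0, 2, n)      # 1 if an R1-R2 performs the induction
    asa3 = rng.integers(0, 2, n)                 # 1 if ASA physical status >= III
    # Right-skewed, heteroscedastic induction times (minutes)
    ait = 8 + 2 * resident_junior + 1.5 * asa3 + rng.gamma(2.0, 1.0 + resident_junior, n)

    X = sm.add_constant(np.column_stack([resident_junior, asa3]))
    for q in (0.25, 0.5, 0.9):
        fit = sm.QuantReg(ait, X).fit(q=q)
        print(q, fit.params.round(2))   # the junior-resident effect grows in the upper tail
    ```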

  5. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    Science.gov (United States)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of a binary probit regression model are commonly estimated using the Maximum Likelihood Estimation (MLE) method. However, the MLE method breaks down if the binary data contain separation. Separation is the condition where one or several independent variables exactly separate the categories of the binary response. It causes the MLE estimators to fail to converge, so that they cannot be used in modeling. One way to resolve separation is to use Firth's approach instead. This research has two aims: first, to compare the chance of separation occurring in binary probit regression between the MLE method and Firth's approach; second, to compare the performance of the binary probit regression estimators obtained by the MLE method and Firth's approach using the RMSE criterion. Both are examined by simulation, under different sample sizes. The results showed that the chance of separation occurring with the MLE method for small sample sizes is higher than with Firth's approach; for larger sample sizes, the probability decreases and is nearly identical between the MLE method and Firth's approach. Meanwhile, Firth's estimators have smaller RMSE than the MLEs, especially for smaller sample sizes, while for larger sample sizes the RMSEs are not much different. This means that Firth's estimators outperform the MLE estimators.

  6. Locoregional control of non-small cell lung cancer in relation to automated early assessment of tumor regression on cone beam computed tomography

    DEFF Research Database (Denmark)

    Brink, Carsten; Bernchou, Uffe; Bertelsen, Anders

    2014-01-01

    was estimated on the basis of the first one third and two thirds of the scans. The concordance between estimated and actual relative volume at the end of radiation therapy was quantified by Pearson's correlation coefficient. On the basis of the estimated relative volume, the patients were stratified into 2...... for other clinical characteristics. RESULTS: Automatic measurement of the tumor regression from standard CBCT images was feasible. Pearson's correlation coefficient between manual and automatic measurement was 0.86 in a sample of 9 patients. Most patients experienced tumor volume regression, and this could...

  7. Asymptotic normality of kernel estimator of $\\psi$-regression function for functional ergodic data

    OpenAIRE

    Laksaci ALI; Benziadi Fatima; Gheriballak Abdelkader

    2016-01-01

    In this paper we consider the problem of estimating the $\psi$-regression function when the covariates take values in an infinite dimensional space. Our main aim is to establish, under a stationary ergodic process assumption, the asymptotic normality of this estimator.

  8. Estimation of Electrically-Evoked Knee Torque from Mechanomyography Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Morufu Olusola Ibitoye

    2016-07-01

    The difficulty of real-time muscle force or joint torque estimation during neuromuscular electrical stimulation (NMES) in physical therapy and exercise science has motivated recent research interest in torque estimation from other muscle characteristics. This study investigated the accuracy of a computational intelligence technique for estimating NMES-evoked knee extension torque based on the mechanomyographic (MMG) signals of contracting muscles, recorded from eight healthy males. The knee torque was modelled via Support Vector Regression (SVR) due to its good generalization ability in related fields. Inputs to the proposed model were MMG amplitude characteristics, the level of electrical stimulation or contraction intensity, and knee angle. A Gaussian kernel function, with its optimal parameters identified by the best performance measure, was applied as the SVR kernel to build an effective knee torque estimation model. To train and test the model, the data were partitioned into training (70%) and testing (30%) subsets. The SVR estimation accuracy, based on the coefficient of determination (R2) between the actual and estimated torque values, was up to 94% and 89% during the training and testing cases, with root mean square errors (RMSE) of 9.48 and 12.95, respectively. The knee torque estimates obtained using SVR modelling agreed well with the experimental data from an isokinetic dynamometer. These findings support the realization of a closed-loop NMES system for functional tasks using MMG as the feedback signal source and an SVR algorithm for joint torque estimation.

  9. Estimating integrated variance in the presence of microstructure noise using linear regression

    Science.gov (United States)

    Holý, Vladimír

    2017-07-01

    Using financial high-frequency data to estimate the integrated variance of asset prices is beneficial, but as the number of observations increases, so-called microstructure noise arises. This noise can significantly bias the realized variance estimator. We propose a method for estimating the integrated variance that is robust to microstructure noise, as well as a test for the presence of the noise. Our method utilizes a linear regression in which realized variances estimated from different data subsamples act as the dependent variable while the number of observations acts as the explanatory variable. We compare the proposed estimator with other methods on simulated data for several microstructure noise structures.
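
    The regression exploits the fact that, under i.i.d. noise with variance ω², the expected realized variance computed from m observations is roughly IV + 2mω², so the intercept of RV on m estimates the integrated variance. A simplified simulation sketch of this idea (parameter values arbitrary; not the paper's exact estimator):

    ```python
    import numpy as np

    rng = np.random.default_rng(14)
    N = 23400                                   # one trading day of 1-second prices
    sigma, omega = 0.02, 0.0005                 # daily volatility, noise std
    x = np.cumsum(rng.normal(scale=sigma / np.sqrt(N), size=N))   # efficient log-price
    p = x + rng.normal(scale=omega, size=N)     # observed price with microstructure noise

    def rv(prices, step):
        s = prices[::step]                      # subsample every `step`-th observation
        return np.sum(np.diff(s) ** 2)

    steps = np.array([1, 2, 3, 5, 10, 20, 30, 60])
    m = N // steps                              # number of observations per subsample
    rvs = np.array([rv(p, k) for k in steps])

    # E[RV_m] ~= IV + 2*m*omega^2: the intercept of RV on m estimates IV
    slope, intercept = np.polyfit(m, rvs, 1)
    print(f"IV true {sigma**2:.6f}, estimated {intercept:.6f}, noise var {slope / 2:.2e}")
    ```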

  10. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to cope successfully with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled separately for each marker or pair of flanking markers (i.e., the predictors). The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. Determining a predictor-specific degree of smoothing increased the accuracy.

  11. Ordinal Regression Based Subpixel Shift Estimation for Video Super-Resolution

    Directory of Open Access Journals (Sweden)

    Petrovic Nemanja

    2007-01-01

    We present a supervised learning-based approach for subpixel motion estimation which is then used to perform video super-resolution. The novelty of this work is the formulation of the problem of subpixel motion estimation in a ranking framework. The ranking formulation is a variant of the classification and regression formulations, in which the ordering present in the class labels, namely the shift between patches, is explicitly taken into account. Finally, we demonstrate the applicability of our approach to super-resolving synthetically generated images with global subpixel shifts and enhancing real video frames by accounting for both local integer and subpixel shifts.

  12. Magnitude conversion to unified moment magnitude using orthogonal regression relation

    Science.gov (United States)

    Das, Ranjit; Wason, H. R.; Sharma, M. L.

    2012-05-01

    Homogenization of an earthquake catalog, a prerequisite for seismic hazard assessment, requires region-based magnitude conversion relationships. Linear Standard Regression (SR) relations fail when both magnitudes have measurement errors; to accomplish homogenization, techniques like Orthogonal Standard Regression (OSR) are used instead. In this paper a technique is proposed for using such OSR for preparation of a homogenized earthquake catalog in moment magnitude Mw. For derivation of the orthogonal regression relation between mb and Mw, a data set consisting of 171 events with observed body wave magnitude (mb,obs) and moment magnitude (Mw,obs) values has been taken from the ISC and GCMT databases for Northeast India and the adjoining region for the period 1978-2006. Firstly, the OSR relation Mw = 1.3(±0.004) mb,proxy - 1.4(±0.130) has been developed using mb,obs and Mw,obs values corresponding to 150 events from this data set, where mb,proxy are the body wave magnitude values of the points on the OSR line given by the orthogonality criterion for the observed (mb,obs, Mw,obs) points. A linear relation is then developed between these 150 mb,obs values and the corresponding mb,proxy values given by the OSR line: mb,proxy = 0.878(±0.03) mb,obs + 0.653(±0.15). The accuracy of the above procedure has been checked with the rest of the data, i.e., the values for the remaining 21 events. The improvement in the correlation coefficient between mb,obs and Mw estimated using the proposed procedure, compared to the correlation coefficient between mb,obs and Mw,obs, shows the advantage of the OSR relationship for homogenization. The OSR procedure developed in this study can be used to homogenize any catalog containing various magnitudes (e.g., ML, mb, MS) with measurement errors, by their conversion to unified moment magnitude Mw. The proposed procedure also remains valid when the magnitudes have measurement errors of different orders, i.e., when the error variance ratio differs from unity.
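
    Orthogonal regression differs from ordinary least squares in minimizing perpendicular rather than vertical distances, which is what makes it appropriate when both magnitudes are noisy. A minimal total least squares sketch (synthetic magnitudes generated from the paper's published relation; equal error variances assumed):

    ```python
    import numpy as np

    def orthogonal_regression(x, y):
        """Total least squares line fit: minimizes perpendicular distances,
        assuming an error variance ratio of 1 between the two variables."""
        xm, ym = x.mean(), y.mean()
        U = np.column_stack([x - xm, y - ym])
        # The line direction is the leading right singular vector of the centered data
        _, _, Vt = np.linalg.svd(U, full_matrices=False)
        dx, dy = Vt[0]
        slope = dy / dx
        return slope, ym - slope * xm

    rng = np.random.default_rng(15)
    mw_true = rng.uniform(4.0, 7.0, 200)
    mb = (mw_true + 1.4) / 1.3 + rng.normal(scale=0.15, size=200)   # both magnitudes noisy
    mw = mw_true + rng.normal(scale=0.15, size=200)

    print(orthogonal_regression(mb, mw))   # close to (1.3, -1.4)
    ```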

  13. The importance of the chosen technique to estimate diffuse solar radiation by means of regression

    Energy Technology Data Exchange (ETDEWEB)

    Arslan, Talha; Altyn Yavuz, Arzu [Department of Statistics. Science and Literature Faculty. Eskisehir Osmangazi University (Turkey)], email: mtarslan@ogu.edu.tr, email: aaltin@ogu.edu.tr; Acikkalp, Emin [Department of Mechanical and Manufacturing Engineering. Engineering Faculty. Bilecik University (Turkey)], email: acikkalp@gmail.com

    2011-07-01

    The Ordinary Least Squares (OLS) method is one of the most frequently used for estimation of diffuse solar radiation. The data set must satisfy certain assumptions for the OLS method to work, the most important being that the error terms of the regression equation follow the normal distribution. Utilizing an alternative robust estimator to obtain parameter estimates is highly effective in solving problems where normality fails due to the presence of outliers or other factors. The purpose of this study is to investigate the importance of the chosen technique for the estimation of diffuse radiation. The study describes alternative robust methods frequently used in applications and compares them with the OLS method. Comparing the OLS analysis of the data set with that of the M-regression (Huber, Andrews and Tukey) techniques, it was found that robust regression techniques are preferable to OLS because of their smoother explanation values.

  14. Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs

    Directory of Open Access Journals (Sweden)

    Faqir Muhammad

    2007-01-01

    In this study, a comparison has been made of different sampling designs, using the HIES data of North West Frontier Province (NWFP) for 2001-02 and 1998-99 collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators has also been considered using the bootstrap and the jackknife. A two-stage stratified random sample design is adopted by HIES: in the first stage, enumeration blocks and villages are treated as the first-stage Primary Sampling Units (PSU), and the sample PSUs are selected with probability proportional to size; Secondary Sampling Units (SSU), i.e., households, are then selected by systematic sampling with a random start. HIES used a single study variable. We have compared the HIES technique with several other designs: stratified simple random sampling, stratified systematic sampling, stratified ranked set sampling, and stratified two-phase sampling. Ratio and regression methods were applied with two study variables: income (y) and household size (x). The jackknife and bootstrap were used for variance estimation. Simple random sampling with sample sizes of 462 to 561 gave moderate variances by both the jackknife and the bootstrap. Applying systematic sampling, we obtained a moderate variance with a sample size of 467. With the jackknife under systematic sampling, the variance of the regression estimator was greater than that of the ratio estimator for sample sizes of 467 to 631; at a sample size of 952, the variance of the ratio estimator becomes greater than that of the regression estimator. The most efficient design turned out to be ranked set sampling: with the jackknife and bootstrap, it gives minimum variance even with the smallest sample size (467). Two-phase sampling gave poor performance. The multi-stage sampling applied by HIES gave large variances, especially when used with a single study variable.

  15. A robust background regression based score estimation algorithm for hyperspectral anomaly detection

    Science.gov (United States)

    Zhao, Rui; Du, Bo; Zhang, Liangpei; Zhang, Lefei

    2016-12-01

    Anomaly detection has become a hot topic in the hyperspectral image analysis and processing fields in recent years. The most important issue for hyperspectral anomaly detection is background estimation and suppression. Unreasonable or non-robust background estimation usually leads to unsatisfactory anomaly detection results. Furthermore, the inherent nonlinearity of hyperspectral images may cover up the intrinsic data structure in the anomaly detection. In order to implement robust background estimation, as well as to explore the intrinsic data structure of the hyperspectral image, we propose a robust background regression based score estimation algorithm (RBRSE) for hyperspectral anomaly detection. The Robust Background Regression (RBR) is actually a label assignment procedure which segments the hyperspectral data into a robust background dataset and a potential anomaly dataset with an intersection boundary. In the RBR, a kernel expansion technique, which explores the nonlinear structure of the hyperspectral data in a reproducing kernel Hilbert space, is utilized to formulate the data as a density feature representation. A minimum squared loss relationship is constructed between the data density feature and the corresponding assigned labels of the hyperspectral data, to form the foundation of the regression. Furthermore, a manifold regularization term, which explores the manifold smoothness of the hyperspectral data, and a maximization term of the robust background average density, which suppresses the bias caused by the potential anomalies, are jointly appended in the RBR procedure. After this, a paired-dataset based k-nn score estimation method is applied to the robust background and potential anomaly datasets to produce the detection output. The experimental results show that RBRSE achieves superior ROC curves, AUC values, and background-anomaly separation compared with some other state-of-the-art anomaly detection methods, and is easy to implement.
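
    The final k-nn scoring step can be illustrated in isolation; this sketch assumes the background/anomaly segmentation has already been done (simulated spectra stand in for hyperspectral pixels, and the kernel-regression machinery of RBR is not reproduced):

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        rng = np.random.default_rng(25)
        # Hypothetical pixel spectra: dense background cluster plus a few anomalies
        bg = rng.normal(0, 1, (500, 20))
        anom = rng.normal(4, 1, (10, 20))
        pixels = np.vstack([bg, anom])

        # Paired-dataset k-nn scoring in the spirit of the final RBRSE step:
        # score each pixel by its distance to the k-th nearest member of the
        # (assumed already segmented) robust background set
        k = 10
        nn = NearestNeighbors(n_neighbors=k).fit(bg)
        dist, _ = nn.kneighbors(pixels)
        score = dist[:, -1]                  # large score => likely anomaly
        print(score[:5].mean(), score[-5:].mean())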

  16. Soil moisture estimation using multi linear regression with terraSAR-X data

    Directory of Open Access Journals (Sweden)

    G. García

    2016-06-01

    Full Text Available The first five centimeters of soil form an interface where the main heat flux exchanges between the land surface and the atmosphere occur. Besides ground measurements, remote sensing has proven to be an excellent tool for monitoring spatially and temporally distributed data of the most relevant Earth surface parameters, including soil parameters. Indeed, active microwave sensors (Synthetic Aperture Radar - SAR) offer the opportunity to monitor soil moisture (HS) at global, regional and local scales. Several inversion algorithms that derive geophysical information such as HS from SAR data have been developed. Many of them use electromagnetic models to simulate the backscattering coefficient and are based on statistical techniques, such as neural networks, inversion methods and regression models. Recent studies have shown that simple multiple regression techniques yield satisfactory results. The geophysical variables involved in these methodologies are descriptive of the soil structure, microwave characteristics and land use. Therefore, in this paper we aim to develop a multiple linear regression model to estimate HS on flat agricultural regions using TerraSAR-X satellite data and data from a ground weather station. The results show that backscatter, precipitation and relative humidity are the explanatory variables of HS; the model obtained presented an RMSE of 5.4 and an R2 of about 0.6.
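
    A multiple linear regression of this shape can be sketched as follows; the backscatter, precipitation and relative-humidity values are simulated stand-ins, not TerraSAR-X or weather-station data:

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error, r2_score

        rng = np.random.default_rng(2)
        # Hypothetical stand-ins for the explanatory variables found significant:
        # backscatter (dB), precipitation (mm) and relative humidity (%)
        n = 120
        sigma0 = rng.normal(-10, 2, n)
        precip = rng.gamma(2.0, 3.0, n)
        rh = rng.uniform(30, 95, n)
        hs = 12 + 1.5 * sigma0 + 0.4 * precip + 0.15 * rh + rng.normal(0, 4, n)

        X = np.column_stack([sigma0, precip, rh])
        model = LinearRegression().fit(X, hs)
        pred = model.predict(X)
        print("RMSE:", mean_squared_error(hs, pred) ** 0.5)
        print("R2:  ", r2_score(hs, pred))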

  17. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials.

    Science.gov (United States)

    Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D

    2015-05-01

    Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist of stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at the single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at the single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical responses.
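
    The general idea of relating single-trial time-frequency magnitude to an experimental factor through multiple linear regression can be sketched as follows; this is a simplified across-trial regression, not the TF-MLRd estimator itself, and all data are simulated:

        import numpy as np

        rng = np.random.default_rng(22)
        # Hypothetical single-trial time-frequency magnitudes (trials x time x freq),
        # with an ERS whose size scales with perceived intensity
        trials, nt, nf = 100, 50, 30
        intensity = rng.uniform(0, 1, trials)
        template = np.zeros((nt, nf))
        template[20:30, 8:12] = 1.0
        tf = ((0.5 + intensity)[:, None, None] * template
              + rng.normal(0, 0.3, (trials, nt, nf)))

        # Pixel-wise multiple linear regression across trials (constant + intensity);
        # the intensity coefficient maps where single-trial magnitude tracks the factor
        X = np.column_stack([np.ones(trials), intensity])
        coef, *_ = np.linalg.lstsq(X, tf.reshape(trials, -1), rcond=None)
        beta_intensity = coef[1].reshape(nt, nf)
        print(beta_intensity.max())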

  18. An evaluation of regression methods to estimate nutritional condition of canvasbacks and other water birds

    Science.gov (United States)

    Sparling, D.W.; Barzen, J.A.; Lovvorn, J.R.; Serie, J.R.

    1992-01-01

    Regression equations that use mensural data to estimate body condition have been developed for several water birds. These equations often have been based on data that represent different sexes, age classes, or seasons, without being adequately tested for intergroup differences. We used proximate carcass analysis of 538 adult and juvenile canvasbacks (Aythya valisineria ) collected during fall migration, winter, and spring migrations in 1975-76 and 1982-85 to test regression methods for estimating body condition.

  19. Parameter estimation and statistical test of geographically weighted bivariate Poisson inverse Gaussian regression models

    Science.gov (United States)

    Amalia, Junita; Purhadi; Otok, Bambang Widjanarko

    2017-11-01

    The Poisson distribution is a discrete distribution for count data with a single parameter that defines both the mean and the variance. Poisson regression assumes that the mean and variance are equal (equidispersion). Nonetheless, some count data violate this assumption because the variance exceeds the mean (over-dispersion). Ignoring over-dispersion leads to underestimated standard errors and, consequently, to incorrect decisions in statistical tests. Paired count data are correlated and follow a bivariate Poisson distribution; if there is over-dispersion, simple bivariate Poisson regression is not sufficient for modeling paired count data. The Bivariate Poisson Inverse Gaussian Regression (BPIGR) model is a mixed Poisson regression for modeling paired count data with over-dispersion. The BPIGR model produces a global model for all locations. On the other hand, each location has different geographic, social, cultural and economic conditions, so that Geographically Weighted Regression (GWR) is needed. The weighting function of each location in GWR generates a different local model. The Geographically Weighted Bivariate Poisson Inverse Gaussian Regression (GWBPIGR) model is used to handle over-dispersion and to generate local models. Parameter estimation of the GWBPIGR model is obtained by the Maximum Likelihood Estimation (MLE) method, while hypothesis testing of the GWBPIGR model is carried out by the Maximum Likelihood Ratio Test (MLRT) method.
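
    BPIGR/GWBPIGR are not available in standard libraries, but the over-dispersion problem that motivates them is easy to demonstrate; the sketch below simulates gamma-mixed Poisson counts and contrasts a plain Poisson GLM with a negative binomial GLM as a stand-in mixed-Poisson model:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        # Hypothetical over-dispersed counts: Poisson rates mixed over a latent gamma
        n = 300
        x = rng.uniform(0, 1, n)
        lam = np.exp(0.5 + 1.2 * x) * rng.gamma(2.0, 0.5, n)   # mixing: variance > mean
        y = rng.poisson(lam)
        print("mean:", y.mean(), "variance:", y.var())

        X = sm.add_constant(x)
        poi = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
        # Standard errors from the plain Poisson fit are too small under over-dispersion
        print(poi.bse, nb.bse)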

  20. Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

    Science.gov (United States)

    Murtagh, Fionn

    2017-06-01

    This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data are to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that take account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or `photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.
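
    Nearest-neighbour regression, which the paper's cluster-wise approach generalizes, can be sketched for the photo-z problem as follows; the colours and redshifts are simulated, not SDSS data:

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(4)
        # Hypothetical photometry: colours loosely tracking a 'spectroscopic' redshift
        n = 2000
        z_spec = rng.uniform(0.0, 1.5, n)
        colours = np.column_stack([z_spec + rng.normal(0, 0.1, n),
                                   0.5 * z_spec + rng.normal(0, 0.1, n),
                                   rng.normal(0, 0.1, n)])

        X_tr, X_te, z_tr, z_te = train_test_split(colours, z_spec, random_state=0)
        knn = KNeighborsRegressor(n_neighbors=10).fit(X_tr, z_tr)
        z_photo = knn.predict(X_te)
        print("rms photo-z error:", np.sqrt(np.mean((z_photo - z_te) ** 2)))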

  1. A subagging regression method for estimating the qualitative and quantitative state of groundwater

    Science.gov (United States)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young

    2017-08-01

    A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data pertaining to trend estimation and its associated uncertainty is proposed. The SBR method is validated on synthetic data against other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of the other methods, and the uncertainties are reasonably estimated, whereas the other methods have no uncertainty analysis option. For further validation, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR, regardless of Gaussian or non-Gaussian skewed data. However, GPR is expected to have a limitation in applications to data severely corrupted by outliers, owing to its non-robustness. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool for anomaly detection or outlier identification in groundwater state data such as the groundwater level and contaminant concentration.
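
    A minimal subagging sketch, assuming regression trees as base learners and a percentile band as the uncertainty summary (the groundwater-level series is simulated):

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(5)
        # Hypothetical groundwater-level series with trend, seasonality and outliers
        t = np.linspace(0, 10, 200)
        level = 0.3 * t + np.sin(2 * np.pi * t) + rng.normal(0, 0.3, 200)
        level[rng.choice(200, 8, replace=False)] += 4.0     # corrupt with outliers

        # Subagging: fit base learners on random subsamples (without replacement)
        # and aggregate their predictions
        preds = []
        for _ in range(200):
            idx = rng.choice(200, size=100, replace=False)
            tree = DecisionTreeRegressor(max_depth=4).fit(t[idx, None], level[idx])
            preds.append(tree.predict(t[:, None]))
        preds = np.array(preds)

        trend = preds.mean(axis=0)                          # aggregated trend estimate
        lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)  # simple uncertainty band
        print(trend[:3], lo[:3], hi[:3])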

  2. Monopole and dipole estimation for multi-frequency sky maps by linear regression

    Science.gov (United States)

    Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.

    2017-01-01

    We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
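
    A stripped-down T-T plot regression on a single patch might look like this; real analyses regress pixel pairs patch by patch and fit dipole templates as well, which this simulated sketch omits:

        import numpy as np

        rng = np.random.default_rng(6)
        # Hypothetical pair of frequency maps sharing one foreground template;
        # map2 carries an unknown monopole (constant) offset to be recovered
        npix = 3072
        foreground = rng.lognormal(0, 1, npix)
        map1 = foreground + rng.normal(0, 0.05, npix)
        map2 = 0.7 * foreground + 2.5 + rng.normal(0, 0.05, npix)  # offset = 2.5

        # T-T plot: regress map2 on map1 pixel-by-pixel within a patch;
        # the intercept estimates the relative offset, the slope the spectral scaling
        slope, intercept = np.polyfit(map1, map2, 1)
        print("recovered offset:", intercept, "slope:", slope)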

  3. Estimation of Covariance Matrix on Bi-Response Longitudinal Data Analysis with Penalized Spline Regression

    Science.gov (United States)

    Islamiyati, A.; Fatmawati; Chamidah, N.

    2018-03-01

    In bi-response longitudinal data, correlation arises both between measurements on the same subject and between the two responses; this induces auto-correlation of the errors, which can be handled by using a covariance matrix. In this article, we estimate the covariance matrix based on the penalized spline regression model. The penalized spline involves knot points and smoothing parameters simultaneously in controlling the smoothness of the curve. Based on our simulation study, the estimated regression model of the weighted penalized spline with the covariance matrix gives a smaller error value compared to the error of the model without the covariance matrix.
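
    A penalized spline fit of the kind described can be sketched with a truncated-line basis and a ridge-type penalty (one response, simulated data; the paper's bi-response covariance weighting is not reproduced):

        import numpy as np

        rng = np.random.default_rng(7)
        # Hypothetical longitudinal response over time
        t = np.sort(rng.uniform(0, 1, 150))
        y = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, 150)

        # Truncated-line spline basis with interior knots
        knots = np.linspace(0.1, 0.9, 9)
        B = np.column_stack([np.ones_like(t), t] +
                            [np.maximum(t - k, 0.0) for k in knots])

        # Ridge-type penalty on the truncated-line coefficients only
        lam = 0.01
        D = np.diag([0.0, 0.0] + [1.0] * len(knots))
        beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)  # penalized LS fit
        fit = B @ beta
        print(beta[:2])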

  4. Support Vector Regression-Based Adaptive Divided Difference Filter for Nonlinear State Estimation Problems

    Directory of Open Access Journals (Sweden)

    Hongjian Wang

    2014-01-01

    Full Text Available We present a support vector regression-based adaptive divided difference filter (SVRADDF) algorithm to improve the state estimation accuracy of nonlinear systems, which are typically affected by large initial estimation errors and imprecise prior knowledge of the process and measurement noises. The derivative-free SVRADDF algorithm is significantly simpler to compute than other methods and is implemented using only functional evaluations. The SVRADDF algorithm involves the use of the theoretical and actual covariance of the innovation sequence. Support vector regression (SVR) is employed to generate the adaptive factor that tunes the noise covariance at each sampling instant when the measurement update step executes, which improves the algorithm's robustness. The performance of the proposed algorithm is evaluated by estimating states for (i) an underwater nonmaneuvering target bearing-only tracking system and (ii) maneuvering target bearing-only tracking in an air-traffic control system. The simulation results show that the proposed SVRADDF algorithm exhibits better performance when compared with a traditional DDF algorithm.

  5. Semi-parametric estimation of random effects in a logistic regression model using conditional inference

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm

    2016-01-01

    This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied...

  6. Adding a Parameter Increases the Variance of an Estimated Regression Function

    Science.gov (United States)

    Withers, Christopher S.; Nadarajah, Saralees

    2011-01-01

    The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
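
    The claim is easy to check by simulation; the sketch below compares the Monte Carlo variance of the fitted value at a fixed point before and after adding an irrelevant predictor:

        import numpy as np

        rng = np.random.default_rng(8)
        # Monte Carlo check: the variance of the fitted value at a point grows
        # when an (irrelevant) extra predictor is added to the model
        n, reps, x0 = 30, 5000, np.array([1.0, 0.5])      # evaluation point (const, x1)
        fits1, fits2 = [], []
        for _ in range(reps):
            x1, x2 = rng.normal(size=n), rng.normal(size=n)
            y = 1.0 + 2.0 * x1 + rng.normal(size=n)       # x2 is pure noise
            X1 = np.column_stack([np.ones(n), x1])
            X2 = np.column_stack([np.ones(n), x1, x2])
            b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
            b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
            fits1.append(x0 @ b1)
            fits2.append(np.append(x0, 0.7) @ b2)         # same point, x2 fixed at 0.7
        print(np.var(fits1), np.var(fits2))               # second variance is larger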

  7. On the degrees of freedom of reduced-rank estimators in multivariate regression.

    Science.gov (United States)

    Mukherjee, A; Chen, K; Wang, N; Zhu, J

    We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example.

  8. Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery

    International Nuclear Information System (INIS)

    Hu, Chao; Jain, Gaurav; Zhang, Puqiang; Schmidt, Craig; Gomadam, Parthasarathy; Gorka, Tom

    2014-01-01

    Highlights: • We develop a data-driven method for battery capacity estimation. • Five charge-related features that are indicative of the capacity are defined. • The kNN regression model captures the dependency of the capacity on the features. • Results with 10 years’ continuous cycling data verify the effectiveness of the method. - Abstract: The reliability of lithium-ion (Li-ion) rechargeable batteries used in implantable medical devices has been recognized as of high importance by a broad range of stakeholders, including medical device manufacturers, regulatory agencies, physicians, and patients. To ensure Li-ion batteries in these devices operate reliably, it is important to be able to assess the battery health condition by estimating the battery capacity over the lifetime. This paper presents a data-driven method for estimating the capacity of a Li-ion battery based on the charge voltage and current curves. The contributions of this paper are three-fold: (i) the definition of five characteristic features of the charge curves that are indicative of the capacity, (ii) the development of a non-linear kernel regression model, based on the k-nearest neighbor (kNN) regression, that captures the complex dependency of the capacity on the five features, and (iii) the adaptation of particle swarm optimization (PSO) to finding the optimal combination of feature weights for creating a kNN regression model that minimizes the cross validation (CV) error in the capacity estimation. Verification with 10 years’ continuous cycling data suggests that the proposed method is able to accurately estimate the capacity of the Li-ion battery throughout the whole lifetime.
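
    The weighted-distance kNN regression at the core of the method can be sketched as follows; the charge-curve features are simulated and the feature weights are fixed placeholders for the PSO-optimized values:

        import numpy as np

        rng = np.random.default_rng(9)
        # Hypothetical charge-curve features (5 per cycle) and measured capacities
        n_hist, n_feat = 400, 5
        F = rng.uniform(0, 1, (n_hist, n_feat))
        cap = 1.5 - 0.6 * F[:, 0] - 0.3 * F[:, 3] + rng.normal(0, 0.02, n_hist)

        # Feature weights would come from PSO in the paper; fixed here for illustration
        w = np.array([0.4, 0.05, 0.05, 0.4, 0.1])

        def knn_capacity(f_new, k=10):
            # Weighted Euclidean distance in feature space
            d = np.sqrt((((F - f_new) ** 2) * w).sum(axis=1))
            nearest = np.argsort(d)[:k]
            return cap[nearest].mean()

        print(knn_capacity(rng.uniform(0, 1, n_feat)))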

  9. Applied Prevalence Ratio estimation with different Regression models: An example from a cross-national study on substance use research.

    Science.gov (United States)

    Espelt, Albert; Marí-Dell'Olmo, Marc; Penelo, Eva; Bosque-Prous, Marina

    2016-06-14

    To examine the differences between the Prevalence Ratio (PR) and the Odds Ratio (OR) in a cross-sectional study and to provide tools to calculate the PR using two statistical packages widely used in substance use research (STATA and R). We used cross-sectional data from 41,263 participants of 16 European countries participating in the Survey on Health, Ageing and Retirement in Europe (SHARE). The dependent variable, hazardous drinking, was calculated using the Alcohol Use Disorders Identification Test - Consumption (AUDIT-C). The main independent variable was gender. Other variables used were: age, educational level and country of residence. The PR of hazardous drinking in men relative to women was estimated using the Mantel-Haenszel method, log-binomial regression models and Poisson regression models with robust variance. These estimates were compared to the OR calculated using logistic regression models. The prevalence of hazardous drinkers varied among countries. Generally, men had a higher prevalence of hazardous drinking than women [PR=1.43 (1.38-1.47)]. The estimated PR was identical regardless of the method and the statistical package used. However, the OR overestimated the PR, depending on the prevalence of hazardous drinking in the country. In cross-sectional studies, where comparisons between countries with differences in the prevalence of the disease or condition are made, it is advisable to use the PR instead of the OR.
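
    In Python (as a counterpart to the STATA and R tools discussed in the paper), the Poisson-regression-with-robust-variance approach might look like this, on simulated data with a true PR of about 1.4:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(10)
        # Hypothetical cross-sectional data: binary outcome, binary exposure + age
        n = 4000
        male = rng.integers(0, 2, n)
        age = rng.uniform(50, 90, n)
        p = 0.15 * np.where(male == 1, 1.4, 1.0)     # true PR for men = 1.4
        y = rng.binomial(1, p)

        X = sm.add_constant(np.column_stack([male, (age - age.mean()) / 10]))
        # Poisson regression with robust (sandwich) variance for a binary outcome
        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
        print("PR (men vs women):", np.exp(fit.params[1]))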

  10. Comparison of regression coefficient and GIS-based methodologies for regional estimates of forest soil carbon stocks

    International Nuclear Information System (INIS)

    Elliott Campbell, J.; Moen, Jeremie C.; Ney, Richard A.; Schnoor, Jerald L.

    2008-01-01

    Estimates of forest soil organic carbon (SOC) have applications in carbon science, soil quality studies, carbon sequestration technologies, and carbon trading. Forest SOC has been modeled using a regression coefficient methodology that applies mean SOC densities (mass/area) to broad forest regions. A higher resolution model is based on an approach that employs a geographic information system (GIS) with soil databases and satellite-derived landcover images. Despite this advancement, the regression approach remains the basis of current state and federal level greenhouse gas inventories. Both approaches are analyzed in detail for Wisconsin forest soils from 1983 to 2001, applying rigorous error-fixing algorithms to soil databases. Resulting SOC stock estimates are 20% larger when determined using the GIS method rather than the regression approach. Average annual rates of increase in SOC stocks are 3.6 and 1.0 million metric tons of carbon per year for the GIS and regression approaches respectively. - Large differences in estimates of soil organic carbon stocks and annual changes in stocks for Wisconsin forestlands indicate a need for validation from forthcoming forest surveys

  11. Development of flood regressions and climate change scenarios to explore estimates of future peak flows

    Science.gov (United States)

    Burns, Douglas A.; Smith, Martyn J.; Freehafer, Douglas A.

    2015-12-31

    A new Web-based application, titled “Application of Flood Regressions and Climate Change Scenarios To Explore Estimates of Future Peak Flows”, has been developed by the U.S. Geological Survey, in cooperation with the New York State Department of Transportation; it allows a user to apply a set of regression equations to estimate the magnitude of future floods for any stream or river in New York State (exclusive of Long Island) and the Lake Champlain Basin in Vermont. The regression equations that are the basis of the current application were developed in previous investigations by the U.S. Geological Survey (USGS) and are described at the USGS StreamStats Web sites for New York (http://water.usgs.gov/osw/streamstats/new_york.html) and Vermont (http://water.usgs.gov/osw/streamstats/Vermont.html). These regression equations include several fixed landscape metrics that quantify aspects of watershed geomorphology, basin size, and land cover, as well as a climate variable—either annual precipitation or annual runoff.

  12. Estimating water equivalent snow depth from related meteorological variables

    International Nuclear Information System (INIS)

    Steyaert, L.T.; LeDuc, S.K.; Strommen, N.D.; Nicodemus, M.L.; Guttman, N.B.

    1980-05-01

    Engineering design must take into consideration natural loads and stresses caused by meteorological elements, such as wind, snow, precipitation and temperature. The purpose of this study was to determine a relationship of water equivalent snow depth measurements to meteorological variables. Several predictor models were evaluated for use in estimating water equivalent values. These models include linear regression, principal component regression, and non-linear regression models. Linear, non-linear and Scandinavian models are used to generate annual water equivalent estimates for approximately 1100 cooperative data stations where predictor variables are available but which have no water equivalent measurements. These estimates are used to develop probability estimates of snow load for each station. Map analyses for 3 probability levels are presented.

  13. Comparison of Classical and Robust Estimates of Threshold Auto-regression Parameters

    Directory of Open Access Journals (Sweden)

    V. B. Goryainov

    2017-01-01

    Full Text Available The object of study is the first-order threshold auto-regression model with a single threshold located at zero. The model describes a stochastic temporal series with discrete time by means of a piecewise linear equation consisting of two classical linear first-order autoregressive equations. One of these equations is used to calculate the running value of the temporal series; the control variable that determines the choice between the two equations is the sign of the previous value of the same series. The first-order threshold autoregressive model with a single threshold depends on two real parameters that coincide with the coefficients of the piecewise linear threshold equation. These parameters are assumed to be unknown. The paper studies the least squares estimate, the least modules (least absolute deviations) estimate, and the M-estimates of these parameters. The aim of the paper is a comparative study of the accuracy of these estimates for the main probability distributions of the updating (innovation) process of the threshold autoregressive equation: the normal, contaminated normal, logistic and double-exponential distributions, Student's distributions with different numbers of degrees of freedom, and the Cauchy distribution. As a measure of the accuracy of each estimate, its variance was chosen, measuring the scattering of the estimate around the estimated parameter; of two estimates, the one with the smaller variance was considered the better. The variance was estimated by computer simulation. The least modules estimate was computed by an iteratively reweighted least-squares method, and the M-estimates by the deformable polyhedron (Nelder-Mead) method; for the least squares estimate, an explicit analytic expression was used. It turned out that the least squares estimate is best only under a normal distribution of the updating process. For the logistic distribution and the Student's distribution with the
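
    A minimal version of the setup can be simulated directly; the least squares estimates have the closed-form per-regime expressions used below (the parameter values are arbitrary):

        import numpy as np

        rng = np.random.default_rng(11)
        # Simulate a first-order threshold autoregression with threshold at zero:
        #   y[t] = a * y[t-1] + e[t]  if y[t-1] < 0,  b * y[t-1] + e[t] otherwise
        a_true, b_true, T = 0.6, -0.4, 2000
        y = np.zeros(T)
        for t in range(1, T):
            coef = a_true if y[t - 1] < 0 else b_true
            y[t] = coef * y[t - 1] + rng.normal()

        # Least squares: the regimes separate on the sign of the lagged value,
        # so each coefficient has an explicit closed-form (per-regime) estimate
        lag, cur = y[:-1], y[1:]
        neg = lag < 0
        a_hat = (lag[neg] * cur[neg]).sum() / (lag[neg] ** 2).sum()
        b_hat = (lag[~neg] * cur[~neg]).sum() / (lag[~neg] ** 2).sum()
        print(a_hat, b_hat)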

  14. Estimation of Fine Particulate Matter in Taipei Using Landuse Regression and Bayesian Maximum Entropy Methods

    Directory of Open Access Journals (Sweden)

    Yi-Ming Kuo

    2011-06-01

    Full Text Available Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005–2007.

  15. Estimation of fine particulate matter in Taipei using landuse regression and bayesian maximum entropy methods.

    Science.gov (United States)

    Yu, Hwa-Lung; Wang, Chih-Hsih; Liu, Ming-Che; Kuo, Yi-Ming

    2011-06-01

    Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005-2007.

  16. Estimating Gestational Age With Sonography: Regression-Derived Formula Versus the Fetal Biometric Average.

    Science.gov (United States)

    Cawyer, Chase R; Anderson, Sarah B; Szychowski, Jeff M; Neely, Cherry; Owen, John

    2018-03-01

    To compare the accuracy of a new regression-derived formula developed from the National Fetal Growth Studies data to the common alternative method that uses the average of the gestational ages (GAs) calculated for each fetal biometric measurement (biparietal diameter, head circumference, abdominal circumference, and femur length). This retrospective cross-sectional study identified nonanomalous singleton pregnancies that had a crown-rump length plus at least 1 additional sonographic examination with complete fetal biometric measurements. With the use of the crown-rump length to establish the referent estimated date of delivery, each method's error (National Institute of Child Health and Human Development regression versus Hadlock average [Radiology 1984; 152:497-501]) at every examination was computed. Error, defined as the difference between the crown-rump length-derived GA and each method's predicted GA (weeks), was compared in 3 GA intervals: 1 (14 weeks-20 weeks 6 days), 2 (21 weeks-28 weeks 6 days), and 3 (≥29 weeks). In addition, the proportion of each method's examinations that had errors outside prespecified (±) day ranges was computed by using odds ratios. A total of 16,904 sonograms were identified. The overall and prespecified GA range subset mean errors were significantly smaller for the regression than for the average (P < .01), and the regression had significantly lower odds of observing examinations outside the specified range of error in GA intervals 2 (odds ratio, 1.15; 95% confidence interval, 1.01-1.31) and 3 (odds ratio, 1.24; 95% confidence interval, 1.17-1.32) than the average method. In a contemporary unselected population of women dated by a crown-rump length-derived GA, the National Institute of Child Health and Human Development regression formula produced fewer estimates outside a prespecified margin of error than the commonly used Hadlock average; the differences were most pronounced for GA estimates at 29 weeks and later.

  17. Estimated Perennial Streams of Idaho and Related Geospatial Datasets

    Science.gov (United States)

    Rea, Alan; Skinner, Kenneth D.

    2009-01-01

    The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of the perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine the perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine whether a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft3/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft3/s plus or minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors such as a large spring or a losing reach that may greatly affect flows at any given point. Site-specific flow data, assuming a sufficient period of

  18. Oil and gas pipeline construction cost analysis and developing regression models for cost estimation

    Science.gov (United States)

    Thaduri, Ravi Kiran

    In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, labor, right-of-way (ROW) and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameters, locations, pipeline volumes and years of completion. In pipeline construction, labor costs dominate the total costs, with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The compressor stations are analyzed based on capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor stations, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor stations for various capacities and locations.
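
    A multiplicative power-law form is one plausible shape for such multiple non-linear regression models; the sketch below fits one by non-linear least squares on simulated cost data (the functional form and coefficients are assumptions, not the thesis's fitted model):

        import numpy as np
        from scipy.optimize import curve_fit

        rng = np.random.default_rng(24)
        # Hypothetical pipeline data: diameter (in), length (mi) and material cost ($M)
        n = 180
        dia = rng.uniform(8, 42, n)
        length = rng.uniform(1, 100, n)
        cost = 0.02 * dia**1.2 * length**0.95 * rng.lognormal(0, 0.15, n)

        # Multiple non-linear regression with a multiplicative power-law form
        def model(X, a, b, c):
            d, L = X
            return a * d**b * L**c

        (a, b, c), _ = curve_fit(model, (dia, length), cost, p0=(0.01, 1.0, 1.0))
        print(a, b, c)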

  19. Replicating Experimental Impact Estimates Using a Regression Discontinuity Approach. NCEE 2012-4025

    Science.gov (United States)

    Gleason, Philip M.; Resch, Alexandra M.; Berk, Jillian A.

    2012-01-01

    This NCEE Technical Methods Paper compares the estimated impacts of an educational intervention using experimental and regression discontinuity (RD) study designs. The analysis used data from two large-scale randomized controlled trials--the Education Technology Evaluation and the Teach for America Study--to provide evidence on the performance of…

  20. Graphical evaluation of the ridge-type robust regression estimators in mixture experiments.

    Science.gov (United States)

    Erkoc, Ali; Emiroglu, Esra; Akay, Kadri Ulas

    2014-01-01

    In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, the effects of the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in cases where both multicollinearity and outliers occur during the analysis of mixture experiments. Also, for the selection of the biasing parameter, we use fraction of design space plots to evaluate the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set.
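
    One readily available ridge-type robust estimator is scikit-learn's HuberRegressor, which combines a Huber loss with an L2 (ridge) penalty via its alpha parameter; the sketch below applies it to simulated collinear data with outliers (an illustration of the estimator class, not the paper's specific estimators):

        import numpy as np
        from sklearn.linear_model import HuberRegressor, LinearRegression

        rng = np.random.default_rng(12)
        # Hypothetical mixture-style data: collinear components plus outliers
        n = 80
        x1 = rng.uniform(0, 1, n)
        x2 = x1 + rng.normal(0, 0.01, n)        # near-collinear with x1
        X = np.column_stack([x1, x2])
        y = 3 * x1 + 3 * x2 + rng.normal(0, 0.1, n)
        y[:4] += 5                              # outliers

        # Huber loss for robustness, alpha > 0 for the ridge-type shrinkage
        ridge_robust = HuberRegressor(alpha=1.0).fit(X, y)
        ols = LinearRegression().fit(X, y)
        print(ridge_robust.coef_, ols.coef_)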

  1. A novel Gaussian process regression model for state-of-health estimation of lithium-ion battery using charging curve

    Science.gov (United States)

    Yang, Duo; Zhang, Xu; Pan, Rui; Wang, Yujie; Chen, Zonghai

    2018-04-01

    The state-of-health (SOH) estimation is always a crucial issue for lithium-ion batteries. In order to provide an accurate and reliable SOH estimation, a novel Gaussian process regression (GPR) model based on the charging curve is proposed in this paper. Different from other research, in which SOH is commonly estimated from the cycle number, in this work four specific parameters extracted from charging curves are used as inputs of the GPR model instead of cycle numbers. These parameters can reflect the battery aging phenomenon from different angles. The grey relational analysis method is applied to analyze the relational grade between the selected features and SOH. In addition, some adjustments are made in the proposed GPR model: the covariance function design and the similarity measurement of input variables are modified so as to improve the SOH estimation accuracy and adapt to the case of multidimensional input. Several aging data sets from the NASA data repository are used to demonstrate the estimation performance of the proposed method. Results show that the proposed method has high SOH estimation accuracy. Besides, a battery with a dynamic discharging profile is used to verify the robustness and reliability of this method.
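
    A basic GPR sketch along these lines, using an anisotropic RBF kernel so each input feature gets its own length-scale (the charging-curve features are simulated; the paper's modified covariance design is not reproduced):

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(13)
        # Hypothetical charging-curve features (4 per cycle) and measured SOH
        n = 150
        F = rng.uniform(0, 1, (n, 4))
        soh = 1.0 - 0.25 * F[:, 0] - 0.15 * F[:, 2] + rng.normal(0, 0.01, n)

        # Anisotropic RBF (one length-scale per feature, suited to
        # multidimensional input) plus a white-noise term
        kernel = RBF(length_scale=[1.0] * 4) + WhiteKernel(noise_level=1e-4)
        gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(F, soh)
        mean, std = gpr.predict(F[:5], return_std=True)  # SOH estimate + uncertainty
        print(mean, std)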

  2. Direct and simultaneous estimation of cardiac four chamber volumes by multioutput sparse regression.

    Science.gov (United States)

    Zhen, Xiantong; Zhang, Heye; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2017-02-01

    Cardiac four-chamber volume estimation plays a fundamental and crucial role in clinical quantitative analysis of whole heart functions. It is a challenging task due to the huge complexity of the four chambers, including great appearance variations, huge shape deformation and interference between chambers. Direct estimation has recently emerged as an effective and convenient tool for cardiac ventricular volume estimation. However, existing direct estimation methods were specifically developed for one single ventricle, i.e., the left ventricle (LV), or bi-ventricles; they cannot be directly used for four-chamber volume estimation due to the great combinatorial variability and highly complex anatomical interdependency of the four chambers. In this paper, we propose a new, general framework for direct and simultaneous four-chamber volume estimation. We have addressed two key issues, i.e., cardiac image representation and simultaneous four-chamber volume estimation, which enables accurate and efficient four-chamber volume estimation. We generate compact and discriminative image representations by supervised descriptor learning (SDL), which can remove irrelevant information and extract discriminative features. We propose direct and simultaneous four-chamber volume estimation by multioutput sparse latent regression (MSLR), which enables jointly modeling nonlinear input-output relationships and capturing four-chamber interdependence. The proposed method is highly generalized and independent of imaging modalities, providing a general regression framework that can be extensively used for clinical data prediction to achieve automated diagnosis. Experiments on both MR and CT images show that our method achieves high performance, with a correlation coefficient of up to 0.921 with ground truth obtained manually by human experts, which is clinically significant and enables more accurate, convenient and comprehensive assessment of cardiac functions.

  3. Regression to fuzziness method for estimation of remaining useful life in power plant components

    Science.gov (United States)

    Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.

    2014-10-01

    Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value ranges. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then used synergistically with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electricity-generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data are not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions, which was shown to perform equally to or worse than the presented methodology. Furthermore, the methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate fuzzy sets for parameter representation.
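
    A toy version of the idea, assuming a ramp-shaped fuzzy membership as the expert model and a weighted linear regression extrapolated to an assumed failure level (all values hypothetical):

        import numpy as np

        rng = np.random.default_rng(23)
        # Hypothetical degradation parameter (e.g., turbine vibration) over time
        t = np.arange(0, 60, dtype=float)
        x = 0.05 * t + rng.normal(0, 0.05, t.size) + 1.0
        failure_level = 5.0                      # expert-defined failure point

        # Fuzzy membership over the parameter range modelling expert judgement:
        # weight high-degradation readings more when extrapolating
        def membership(v, lo=1.0, hi=5.0):       # simple increasing ramp on [lo, hi]
            return np.clip((v - lo) / (hi - lo), 0.05, 1.0)

        w = membership(x)
        A = np.column_stack([np.ones_like(t), t])
        W = np.diag(w)
        b0, b1 = np.linalg.solve(A.T @ W @ A, A.T @ W @ x)  # weighted linear fit
        t_fail = (failure_level - b0) / b1
        print("estimated RUL at t=60:", t_fail - 60.0)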

  4. Continuous water-quality monitoring and regression analysis to estimate constituent concentrations and loads in the Red River of the North at Fargo and Grand Forks, North Dakota, 2003-12

    Science.gov (United States)

    Galloway, Joel M.

    2014-01-01

    The Red River of the North (hereafter referred to as “Red River”) Basin is an important hydrologic region where water is a valuable resource for the region’s economy. Continuous water-quality monitors have been operated by the U.S. Geological Survey, in cooperation with the North Dakota Department of Health, Minnesota Pollution Control Agency, City of Fargo, City of Moorhead, City of Grand Forks, and City of East Grand Forks at the Red River at Fargo, North Dakota, from 2003 through 2012 and at Grand Forks, N.Dak., from 2007 through 2012. The purpose of the monitoring was to provide a better understanding of the water-quality dynamics of the Red River and provide a way to track changes in water quality. Regression equations were developed that can be used to estimate concentrations and loads for dissolved solids, sulfate, chloride, nitrate plus nitrite, total phosphorus, and suspended sediment using explanatory variables such as streamflow, specific conductance, and turbidity. Specific conductance was determined to be a significant explanatory variable for estimating dissolved solids concentrations at the Red River at Fargo and Grand Forks. The regression equations provided good relations between dissolved solid concentrations and specific conductance for the Red River at Fargo and at Grand Forks, with adjusted coefficients of determination of 0.99 and 0.98, respectively. Specific conductance, log-transformed streamflow, and a seasonal component were statistically significant explanatory variables for estimating sulfate in the Red River at Fargo and Grand Forks. Regression equations provided good relations between sulfate concentrations and the explanatory variables, with adjusted coefficients of determination of 0.94 and 0.89, respectively. For the Red River at Fargo and Grand Forks, specific conductance, streamflow, and a seasonal component were statistically significant explanatory variables for estimating chloride. For the Red River at Grand Forks, a time
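
    Regression equations of this general shape, with specific conductance, log-transformed streamflow, and an annual harmonic as the seasonal component, can be sketched as follows on simulated records (the coefficients are not those of the report):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(14)
        # Hypothetical daily records: streamflow, specific conductance and sulfate
        n = 1500
        tdec = np.arange(n) / 365.25                       # decimal time (years)
        q = rng.lognormal(3, 1, n)                         # streamflow
        sc = rng.normal(800, 100, n)                       # specific conductance
        sulfate = (150 + 0.05 * sc - 20 * np.log(q)
                   + 15 * np.sin(2 * np.pi * tdec)
                   + 10 * np.cos(2 * np.pi * tdec) + rng.normal(0, 8, n))

        # Explanatory variables in the spirit of the report: specific conductance,
        # log-transformed streamflow and a seasonal (annual harmonic) component
        X = sm.add_constant(np.column_stack([sc, np.log(q),
                                             np.sin(2 * np.pi * tdec),
                                             np.cos(2 * np.pi * tdec)]))
        fit = sm.OLS(sulfate, X).fit()
        print(fit.rsquared_adj)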

  5. Estimation of Geographically Weighted Regression Case Study on Wet Land Paddy Productivities in Tulungagung Regency

    Directory of Open Access Journals (Sweden)

    Danang Ariyanto

    2017-11-01

    Full Text Available Regression is a method that connects an independent variable and a dependent variable, with parameter estimates as the output. A principal problem with this method is its application to spatial data. The Geographically Weighted Regression (GWR) method is used to solve this problem. GWR is a regression technique that extends the traditional regression framework by allowing the estimation of local rather than global parameters. In other words, GWR runs a regression for each location, instead of a sole regression for the entire study area. The purpose of this research is to analyze the factors influencing wet land paddy productivity in Tulungagung Regency. The method used in this research is GWR with a cross-validation bandwidth and weights given by an adaptive Gaussian kernel function. This research uses 4 variables presumed to affect wet land paddy productivity: the rate of rainfall (X1), the average cost of fertilizer per hectare (X2), the average cost of pesticides per hectare (X3), and the allocation of subsidized NPK fertilizer of the food crops sub-sector (X4). Based on the results, X1, X2, X3 and X4 each have a different effect in each district. Thus, to improve wet land paddy productivity in Tulungagung Regency, a specific policy based on the GWR model is required in each district.
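
    The core of GWR, a weighted least-squares fit at each location with kernel weights that decay with distance, can be sketched as follows; a fixed Gaussian bandwidth stands in for the cross-validated adaptive bandwidth used in the paper, and the district data are simulated:

        import numpy as np

        rng = np.random.default_rng(15)
        # Hypothetical district data: coordinates, one covariate, one response
        m = 40
        coords = rng.uniform(0, 100, (m, 2))
        rain = rng.uniform(50, 200, m)
        beta = 0.02 + 0.0005 * coords[:, 0]       # spatially varying true coefficient
        yield_ = 4 + beta * rain + rng.normal(0, 0.2, m)

        X = np.column_stack([np.ones(m), rain])
        bw = 30.0                                 # fixed bandwidth (CV-chosen in paper)
        local_params = []
        for i in range(m):
            d = np.linalg.norm(coords - coords[i], axis=1)
            w = np.exp(-0.5 * (d / bw) ** 2)      # Gaussian kernel weights
            W = np.diag(w)
            b = np.linalg.solve(X.T @ W @ X, X.T @ W @ yield_)  # local weighted LS
            local_params.append(b)
        print(np.array(local_params)[:5])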

  6. Estimation of adjusted rate differences using additive negative binomial regression.

    Science.gov (United States)

    Donoghoe, Mark W; Marschner, Ian C

    2016-08-15

    Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either (ECME) algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
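
    The naive identity-link fit that the paper improves upon can be attempted directly in statsmodels (`Identity` is the link class name in recent statsmodels versions); note that statsmodels flags identity as a non-canonical link for Poisson, and the fit can fail when fitted rates approach zero, which is precisely the constrained-parameter-space problem the ECME variant addresses (data simulated):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(16)
        # Hypothetical event counts with an additive (rate-difference) structure
        n = 500
        x = rng.uniform(0, 1, n)
        y = rng.poisson(1.0 + 2.0 * x)     # rate = 1 + 2x (additive, not log-linear)

        X = sm.add_constant(x)
        # Identity-link Poisson GLM (additive Poisson regression). IRLS can fail
        # when fitted rates approach zero: the constrained-parameter-space
        # problem that motivates the paper's ECME variant.
        fam = sm.families.Poisson(link=sm.families.links.Identity())
        fit = sm.GLM(y, X, family=fam).fit()
        print(fit.params)                  # estimated rate difference per unit of x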

  7. Estimation of a Reactor Core Power Peaking Factor Using Support Vector Regression and Uncertainty Analysis

    International Nuclear Information System (INIS)

    Bae, In Ho; Na, Man Gyun; Lee, Yoon Joon; Park, Goon Cherl

    2009-01-01

    The monitoring of the detailed 3-dimensional (3D) reactor core power distribution is a prerequisite in the operation of nuclear power reactors to ensure that various safety limits imposed on the LPD and DNBR are not violated during nuclear power reactor operation. The LPD and DNBR should be calculated in order to perform the two major functions of the core protection calculator system (CPCS) and the core operation limit supervisory system (COLSS). The LPD at the hottest part of a hot fuel rod, which is related to the power peaking factor (PPF, Fq), is more important than the LPD at any other position in the reactor core. The LPD needs to be estimated accurately to prevent nuclear fuel rods from melting. In this study, support vector regression (SVR) and uncertainty analysis have been applied to the estimation of the reactor core power peaking factor.
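
    A generic SVR sketch for this kind of mapping, with standardized inputs and simulated stand-ins for the core-state variables (the actual CPCS/COLSS inputs are not reproduced):

        import numpy as np
        from sklearn.svm import SVR
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(17)
        # Hypothetical core-state inputs (e.g., detector signals and operating
        # variables) mapped to the power peaking factor Fq
        n = 600
        X = rng.uniform(0, 1, (n, 6))
        fq = 1.8 + 0.5 * X[:, 0] - 0.3 * X[:, 1] * X[:, 2] + rng.normal(0, 0.02, n)

        svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
        svr.fit(X[:500], fq[:500])
        resid = fq[500:] - svr.predict(X[500:])
        print("RMS error:", np.sqrt(np.mean(resid ** 2)))  # basis for uncertainty analysis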

  8. The number of subjects per variable required in linear regression analyses.

    Science.gov (United States)

    Austin, Peter C; Steyerberg, Ewout W

    2015-06-01

    To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R2 of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV was necessary to minimize bias in estimating the model R2, although adjusted R2 estimates behaved well. The bias in estimating the model R2 statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
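
    The SPV experiment is straightforward to reproduce in miniature; the sketch below estimates the relative bias of one coefficient at two subjects per variable:

        import numpy as np

        rng = np.random.default_rng(18)
        # Monte Carlo: relative bias of a regression coefficient at about two
        # subjects per variable (SPV), in the spirit of the study's simulations
        p, spv, reps = 10, 2, 2000
        n = p * spv
        beta = np.ones(p)
        est = []
        for _ in range(reps):
            X = rng.normal(size=(n, p))
            y = X @ beta + rng.normal(size=n)
            bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
            est.append(bhat[0])
        rel_bias = (np.mean(est) - beta[0]) / beta[0]
        print("relative bias of beta_1:", rel_bias)   # small even at SPV = 2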

  9. A Simulation Investigation of Principal Component Regression.

    Science.gov (United States)

    Allen, David E.

    Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…
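
    A standard principal component regression pipeline on simulated collinear predictors might look like this:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(19)
        # Hypothetical collinear predictors: several noisy copies of a common factor
        n = 200
        f = rng.normal(size=n)
        X = np.column_stack([f + rng.normal(0, 0.05, n) for _ in range(6)])
        y = 2 * f + rng.normal(0, 0.5, n)

        # Principal component regression: regress on the leading components
        # instead of the (nearly collinear) original predictors
        pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)
        print(pcr.score(X, y))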

  10. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques.

    Science.gov (United States)

    Jones, Kelly W; Lewis, David J

    2015-01-01

    Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented--from protected areas to payments for ecosystem services (PES)--to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing 'matching' to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods--an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators--due to the presence of unobservable bias--that lead to differences in conclusions about effectiveness. The Ecuador case illustrates that

  11. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques.

    Directory of Open Access Journals (Sweden)

    Kelly W Jones

    Full Text Available Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented--from protected areas to payments for ecosystem services (PES)--to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing 'matching' to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods--an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators--due to the presence of unobservable bias--that lead to differences in conclusions about effectiveness. The Ecuador case

  12. Large biases in regression-based constituent flux estimates: causes and diagnostic tools

    Science.gov (United States)

    Hirsch, Robert M.

    2014-01-01

    It has been documented in the literature that, in some cases, widely used regression-based models can produce severely biased estimates of long-term mean river fluxes of various constituents. These models, estimated using sample values of concentration, discharge, and date, are used to compute estimated fluxes for a multiyear period at a daily time step. This study compares results of the LOADEST seven-parameter model, LOADEST five-parameter model, and the Weighted Regressions on Time, Discharge, and Season (WRTDS) model using subsampling of six very large datasets to better understand this bias problem. This analysis considers sample datasets for dissolved nitrate and total phosphorus. The results show that LOADEST-7 and LOADEST-5, although they often produce very nearly unbiased results, can produce highly biased results. This study identifies three conditions that can give rise to these severe biases: (1) lack of fit of the log of concentration vs. log discharge relationship, (2) substantial differences in the shape of this relationship across seasons, and (3) severely heteroscedastic residuals. The WRTDS model is more resistant to the bias problem than the LOADEST models but is not immune to them. Understanding the causes of the bias problem is crucial to selecting an appropriate method for flux computations. Diagnostic tools for identifying the potential for bias problems are introduced, and strategies for resolving bias problems are described.
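
    The best-known source of such bias, retransformation of log-scale predictions, can be demonstrated in a few lines; with log-normal errors the naive back-transform underestimates the conditional mean by the factor exp(sigma^2/2) (all data simulated):

        import numpy as np

        rng = np.random.default_rng(20)
        # A log-log concentration-discharge model fit to sampled data, then used
        # to estimate mean concentration: naively exponentiating log-scale
        # predictions is biased low
        n, sigma = 200, 0.8
        logq = rng.normal(0, 1, n)
        logc = 0.2 + 0.5 * logq + rng.normal(0, sigma, n)

        b, a = np.polyfit(logq, logc, 1)                 # slope, intercept
        pred_naive = np.exp(a + b * logq)                # biased low
        pred_corr = pred_naive * np.exp(sigma**2 / 2)    # parametric bias correction
        true_mean = np.exp(0.2 + 0.5 * logq + sigma**2 / 2)
        print(pred_naive.mean(), pred_corr.mean(), true_mean.mean())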

  13. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Science.gov (United States)

    Meaney, Christopher; Moineddin, Rahim

    2014-01-24

    In biomedical research, response variables are often encountered which have bounded support on the open unit interval (0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25), linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
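
    One replicate of this kind of two-sample experiment can be sketched as follows; the setup is assumed (not the authors' code), simulating beta-distributed samples with unequal dispersion and contrasting the OLS mean difference with a fractional-logit fit, here implemented as a binomial GLM on the (0,1) response with robust standard errors in statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Two-sample design: group means mu0, mu1 with different dispersions,
# using the mean/precision parameterization a = mu*phi, b = (1-mu)*phi.
mu0, mu1, phi0, phi1, n = 0.4, 0.5, 5.0, 20.0, 200
y0 = rng.beta(mu0 * phi0, (1 - mu0) * phi0, n)
y1 = rng.beta(mu1 * phi1, (1 - mu1) * phi1, n)

y = np.concatenate([y0, y1])
X = sm.add_constant(np.repeat([0.0, 1.0], n))   # group indicator

# Linear regression estimate of the mean difference
ols = sm.OLS(y, X).fit()

# Fractional logit: Binomial GLM on the fractional response,
# with heteroskedasticity-robust (sandwich) standard errors
flogit = sm.GLM(y, X, family=sm.families.Binomial()).fit(cov_type="HC1")
p0, p1 = flogit.predict(np.array([[1.0, 0.0], [1.0, 1.0]]))

print(f"true difference      : {mu1 - mu0:.3f}")
print(f"OLS estimate         : {ols.params[1]:.3f}")
print(f"fractional logit est.: {p1 - p0:.3f}")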

  14. Group-Contribution based Property Estimation and Uncertainty analysis for Flammability-related Properties

    DEFF Research Database (Denmark)

    Frutiger, Jerome; Marcarie, Camille; Abildskov, Jens

    2016-01-01

    regression and outlier treatment have been applied to achieve high accuracy. Furthermore, linear error propagation based on the covariance matrix of the estimated parameters was performed. Therefore, every estimated property value of the flammability-related properties is reported together with its corresponding 95%-confidence interval of the prediction. Compared to existing models the developed ones have a higher accuracy, are simple to apply and provide uncertainty information on the calculated prediction. The average relative error and correlation coefficient are 11.5% and 0.99 for LFL, 15.9% and 0.91 for UFL, 2...
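
    The linear error propagation step described amounts to a single quadratic form; a generic sketch with made-up numbers (not the paper's fitted group-contribution model): for a group-count vector n and estimated contributions c with covariance Cov(c), the prediction variance is n' Cov(c) n.

```python
import numpy as np

# Hypothetical group-contribution model: property = sum(n_i * c_i),
# where n_i are group counts and c_i are fitted contribution parameters.
n_groups = np.array([2.0, 1.0, 3.0])            # group occurrence vector
c_hat = np.array([10.5, -3.2, 1.7])             # estimated contributions
cov_c = np.array([[0.04, 0.01, 0.00],           # covariance of the estimates
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.01]])

prop = n_groups @ c_hat                          # predicted property value
var = n_groups @ cov_c @ n_groups                # propagated variance
ci = 1.96 * np.sqrt(var)                         # 95% confidence half-width

print(f"estimate: {prop:.2f} +/- {ci:.2f} (95% CI)")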

  15. Dual Regression

    OpenAIRE

    Spady, Richard; Stouli, Sami

    2012-01-01

    We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...

  16. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

  17. Minimax Regression Quantiles

    DEFF Research Database (Denmark)

    Bache, Stefan Holst

    A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....

  18. A gentle introduction to quantile regression for ecologists

    Science.gov (United States)

    Cade, B.S.; Noon, B.R.

    2003-01-01

    Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, all the factors that affect ecological processes are not measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
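
    As a concrete starting point, conditional quantiles of a heterogeneous response can be estimated with off-the-shelf linear quantile regression; a minimal statsmodels sketch on simulated data in which the response ceiling (the limiting factor) grows with x while the lower quantiles barely respond:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Heterogeneous ecological-style data: the upper quantiles respond
# strongly to x, the mean and lower quantiles much less so.
n = 500
x = rng.uniform(0, 10, n)
y = rng.uniform(0, 1, n) * (2.0 + 1.5 * x)      # ceiling grows with x

X = sm.add_constant(x)
for q in (0.10, 0.50, 0.90):
    fit = sm.QuantReg(y, X).fit(q=q)
    print(f"tau={q:.2f}  intercept={fit.params[0]:6.2f}  slope={fit.params[1]:6.2f}")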

  19. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    Science.gov (United States)

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem-free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…
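
    For reference, the ridge estimator differs from OLS only by the k*I term added to X'X; a minimal numpy sketch on synthetic collinear data (the intercept is penalized here only for brevity):

```python
import numpy as np

rng = np.random.default_rng(7)

# Two nearly collinear predictors
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)        # acute multicollinearity
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^(-1) X'y; k=0 recovers OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

print("OLS  :", ridge(X, y, 0.0))               # unstable under collinearity
print("ridge:", ridge(X, y, 1.0))               # shrunken, lower variance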

  20. A classical regression framework for mediation analysis: fitting one model to estimate mediation effects.

    Science.gov (United States)

    Saunders, Christina T; Blume, Jeffrey D

    2017-10-26

    Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.
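
    For contrast with the single-model framework, the conventional product-of-coefficients calculation and its delta-method (Sobel) variance, which the EMC approach reproduces analytically, look like this; a sketch on simulated data, not the BRAIN-ICU analysis:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulated exposure -> mediator -> outcome chain
n = 1000
x = rng.normal(size=n)                           # exposure
m = 0.5 * x + rng.normal(size=n)                 # mediator
y = 0.3 * x + 0.7 * m + rng.normal(size=n)       # outcome

# Mediator model: m ~ x  (coefficient a)
fit_a = sm.OLS(m, sm.add_constant(x)).fit()
a, var_a = fit_a.params[1], fit_a.bse[1] ** 2

# Outcome model: y ~ x + m  (coefficient b on the mediator)
fit_b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()
b, var_b = fit_b.params[2], fit_b.bse[2] ** 2

# Indirect (mediated) effect and its delta-method variance:
# Var(a*b) ~= b^2 Var(a) + a^2 Var(b)
indirect = a * b
se = np.sqrt(b**2 * var_a + a**2 * var_b)
print(f"indirect effect = {indirect:.3f} +/- {1.96 * se:.3f} (95% CI)")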

  1. Logistic quantile regression provides improved estimates for bounded avian counts: A case study of California Spotted Owl fledgling production

    Science.gov (United States)

    Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.

    2017-01-01

    Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of
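
    The three-step procedure described (jitter the counts to a continuous variable, logit-transform to the bounded interval, fit linear quantile regression, then average over repeated jitters and back-transform) can be sketched as follows; simulated bounded counts, not the Spotted Owl data, with hypothetical parameter values:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Bounded counts (e.g., 0-3 fledglings) related to a covariate
n, lower, upper = 400, 0.0, 4.0                  # jittered support (0, 4)
x = rng.normal(size=n)
latent = 1.5 + 0.6 * x + rng.logistic(size=n)
y = np.clip(np.round(latent), 0, 3)              # discrete bounded counts

X = sm.add_constant(x)
tau, n_jitter = 0.5, 50
preds = np.zeros((n_jitter, n))
for j in range(n_jitter):
    z = y + rng.uniform(0, 1, n)                 # 1) jitter to continuous
    h = np.log((z - lower) / (upper - z))        # 2) logit-transform to R
    fit = sm.QuantReg(h, X).fit(q=tau)           # 3) linear quantile regression
    eta = fit.predict(X)
    # back-transform; quantiles are equivariant to monotone transformations
    preds[j] = (upper * np.exp(eta) + lower) / (1 + np.exp(eta))

q_hat = preds.mean(axis=0)                       # average over jitters
print("median estimated conditional quantile:", np.median(q_hat))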

  2. Surgical Care Required for Populations Affected by Climate-related Natural Disasters: A Global Estimation.

    Science.gov (United States)

    Lee, Eugenia E; Stewart, Barclay; Zha, Yuanting A; Groen, Thomas A; Burkle, Frederick M; Kushner, Adam L

    2016-08-10

    Climate extremes will increase the frequency and severity of natural disasters worldwide. Climate-related natural disasters were anticipated to affect 375 million people in 2015, more than 50% greater than the yearly average in the previous decade. To inform surgical assistance preparedness, we estimated the number of surgical procedures needed. The numbers of people affected by climate-related disasters from 2004 to 2014 were obtained from the Centre for Research of the Epidemiology of Disasters database. Using 5,000 procedures per 100,000 persons as the minimum, baseline estimates were calculated. A linear regression of the number of surgical procedures performed annually against the estimated number of surgical procedures required for climate-related natural disasters was performed. Approximately 140 million people were affected by climate-related natural disasters annually, requiring 7.0 million surgical procedures. The greatest need for surgical care was in the People's Republic of China, India, and the Philippines. Linear regression demonstrated a poor relationship between national surgical capacity and estimated need for surgical care resulting from natural disaster, but countries with the least surgical capacity will have the greatest need for surgical care for persons affected by climate-related natural disasters. As climate extremes increase the frequency and severity of natural disasters, millions will need surgical care beyond baseline needs. Countries with insufficient surgical capacity will have the most need for surgical care for persons affected by climate-related natural disasters. Estimates of surgical need are particularly important for countries least equipped to meet surgical care demands, given critical human and physical resource deficiencies.
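
    The headline figure is straightforward arithmetic on the stated minimum rate:

\[
140{,}000{,}000 \;\text{people} \times \frac{5{,}000 \;\text{procedures}}{100{,}000 \;\text{people}} = 7{,}000{,}000 \;\text{procedures per year}
\]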

  3. Logistic regression for dichotomized counts.

    Science.gov (United States)

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.

  4. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques

    Science.gov (United States)

    Jones, Kelly W.; Lewis, David J.

    2015-01-01

    Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented—from protected areas to payments for ecosystem services (PES)—to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing ‘matching’ to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods—an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators—due to the presence of unobservable bias—that lead to differences in conclusions about effectiveness. The Ecuador case
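
    The key contrast between the two estimator families can be reproduced on simulated data: when a time-invariant unobservable drives both program placement and the outcome, the cross-sectional comparison is biased while first-differencing (the two-period fixed effects estimator) removes it. A minimal sketch under assumed parameters, not the Ecuador or Russia analysis:

```python
import numpy as np

rng = np.random.default_rng(11)

# Two-period panel with a time-invariant unobservable u that drives
# both program placement and forest cover.
n = 2000
u = rng.normal(size=n)                           # unobserved site quality
treated = (u + rng.normal(size=n) > 0).astype(float)
effect = 2.0                                     # true program effect

y0 = 10 + 3 * u + rng.normal(size=n)             # pre-program forest cover
y1 = y0 + effect * treated + rng.normal(size=n)  # post-program

# (1) Cross-sectional difference in means, post-period only: biased,
#     because treated sites differ systematically in u.
cross = y1[treated == 1].mean() - y1[treated == 0].mean()

# (2) Fixed effects via first-differencing: u cancels out.
dy = y1 - y0
fe = dy[treated == 1].mean() - dy[treated == 0].mean()

print(f"cross-sectional estimate: {cross:.2f} (true effect {effect})")
print(f"fixed-effects estimate  : {fe:.2f}")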

  5. In search of a corrected prescription drug elasticity estimate: a meta-regression approach.

    Science.gov (United States)

    Gemmill, Marin C; Costa-Font, Joan; McGuire, Alistair

    2007-06-01

    An understanding of the relationship between cost sharing and drug consumption depends on consistent and unbiased price elasticity estimates. However, there is wide heterogeneity among studies, which constrains the applicability of elasticity estimates for empirical purposes and policy simulation. This paper attempts to provide a corrected measure of the drug price elasticity by employing meta-regression analysis (MRA). The results indicate that the elasticity estimates are significantly different from zero, and the corrected elasticity is -0.209 when the results are made robust to heteroskedasticity and clustering of observations. Elasticity values are higher when the study was published in an economic journal, when the study employed a greater number of observations, and when the study used aggregate data. Elasticity estimates are lower when the institutional setting was a tax-based health insurance system.

  6. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    Full Text Available Abstract Background Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  7. The Roots of Inequality: Estimating Inequality of Opportunity from Regression Trees

    DEFF Research Database (Denmark)

    Brunori, Paolo; Hufe, Paul; Mahler, Daniel Gerszon

    2017-01-01

    the risk of arbitrary and ad-hoc model selection. Second, they provide a standardized way of trading off upward and downward biases in inequality of opportunity estimations. Finally, regression trees can be graphically represented; their structure is immediate to read and easy to understand. This will make the measurement of inequality of opportunity more easily comprehensible to a large audience. These advantages are illustrated by an empirical application based on the 2011 wave of the European Union Statistics on Income and Living Conditions.

  8. Estimating Frequency by Interpolation Using Least Squares Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Changwei Ma

    2015-01-01

    Full Text Available The discrete Fourier transform (DFT)-based maximum likelihood (ML) algorithm is an important part of single-sinusoid frequency estimation. As the signal-to-noise ratio (SNR) increases above a threshold value, its error will lie very close to the Cramer-Rao lower bound (CRLB), which depends on the number of DFT points. However, its mean square error (MSE) performance is directly proportional to its calculation cost. As a modified version of support vector regression (SVR), least squares SVR (LS-SVR) not only retains excellent generalization and fitting capability but also exhibits lower computational complexity. In this paper, therefore, LS-SVR is employed to interpolate on the Fourier coefficients of received signals and attain high frequency estimation accuracy. Our results show that the proposed algorithm can make a good compromise between calculation cost and MSE performance under the assumption that the sample size, number of DFT points, and resampling points are already known.
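
    For orientation, LS-SVR replaces the usual SVR quadratic program with a single linear system; a minimal numpy sketch of the standard LS-SVM regression equations (Suykens-style formulation, toy sinusoid data, hypothetical hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

# Noisy 1-D training data
x = np.linspace(0, 2 * np.pi, 40)[:, None]
y = np.sin(x[:, 0]) + rng.normal(scale=0.1, size=40)

# LS-SVR reduces to one linear system:
#   [0   1'        ] [b]       [0]
#   [1   K + I/gam ] [alpha] = [y]
gamma = 10.0
K = rbf_kernel(x, x)
n = len(y)
A = np.zeros((n + 1, n + 1))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
A[1:, 1:] = K + np.eye(n) / gamma
sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
b, alpha = sol[0], sol[1:]

# Interpolate at new points: f(x*) = k(x*, x) alpha + b
x_new = np.array([[1.0], [2.5]])
print(rbf_kernel(x_new, x) @ alpha + b)          # close to sin(1.0), sin(2.5)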

  9. Background stratified Poisson regression analysis of cohort data.

    Science.gov (United States)

    Richardson, David B; Langholz, Bryan

    2012-03-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

  10. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    Science.gov (United States)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    The Bayesian method can be used to estimate the parameters of the multivariate multiple regression model. It involves two distributions, the prior and the posterior; the posterior distribution is influenced by the choice of prior distribution. Jeffreys’ prior is a kind of non-informative prior distribution, used when information about the parameters is not available. The non-informative Jeffreys’ prior is combined with the sample information to yield the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of the multivariate regression model using the Bayesian method with the non-informative Jeffreys’ prior distribution. The estimates of β and Σ are obtained as the expected values of their marginal posterior distributions, which are multivariate normal and inverse Wishart, respectively. However, calculating these expected values involves integrals that are difficult to evaluate analytically. Therefore, random samples are generated according to the posterior distribution of each parameter using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
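
    A minimal sketch of such a Gibbs sampler is given below, assuming the standard conditional posteriors under the Jeffreys prior (matrix normal for B given Sigma, inverse Wishart for Sigma given B); degrees-of-freedom conventions vary across texts, so treat the details as an assumption rather than the authors' exact scheme:

```python
import numpy as np
from scipy.stats import invwishart, matrix_normal

rng = np.random.default_rng(13)

# Simulated multivariate multiple regression: Y (n x m) = X (n x p) B + E
n, p, m = 200, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B_true = np.array([[1.0, -0.5], [2.0, 0.3], [0.0, 1.2]])
Sigma_true = np.array([[1.0, 0.3], [0.3, 0.5]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(m), Sigma_true, n)

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y                        # OLS estimate / posterior mean

# Gibbs sampling under the Jeffreys prior p(B, Sigma) ~ |Sigma|^-(m+1)/2
draws_B = []
Sigma = np.cov(Y - X @ B_hat, rowvar=False)      # initial value
for it in range(2000):
    # B | Sigma, Y : matrix normal centered at the OLS estimate
    B = matrix_normal.rvs(mean=B_hat, rowcov=XtX_inv, colcov=Sigma,
                          random_state=rng)
    # Sigma | B, Y : inverse Wishart on the residual scatter matrix
    resid = Y - X @ B
    Sigma = invwishart.rvs(df=n, scale=resid.T @ resid, random_state=rng)
    if it >= 500:                                # discard burn-in
        draws_B.append(B)

print("posterior mean of B:\n", np.mean(draws_B, axis=0))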

  11. Estimation of the laser cutting operating cost by support vector regression methodology

    Science.gov (United States)

    Jović, Srđan; Radović, Aleksandar; Šarkoćević, Živče; Petković, Dalibor; Alizamir, Meysam

    2016-09-01

    Laser cutting is a popular manufacturing process utilized to cut various types of materials economically. The operating cost is affected by laser power, cutting speed, assist gas pressure, nozzle diameter and focus point position as well as the workpiece material. In this article, the process factors investigated were: laser power, cutting speed, air pressure and focal point position. The aim of this work is to relate the operating cost to the process parameters mentioned above. CO2 laser cutting of stainless steel of medical grade AISI316L has been investigated. The main goal was to analyze the operating cost through the laser power, cutting speed, air pressure, focal point position and material thickness. Since estimating the laser operating cost is a complex, non-linear task, soft computing optimization algorithms can be used. An intelligent soft computing scheme, support vector regression (SVR), was implemented. The performance of the proposed estimator was confirmed with the simulation results. The SVR results are then compared with artificial neural networks and genetic programming. According to the results, a greater improvement in estimation accuracy can be achieved through the SVR compared to other soft computing methodologies. The new optimization methods benefit from the soft computing capabilities of global optimization and multiobjective optimization rather than choosing a starting point by trial and error and combining multiple criteria into a single criterion.

  12. The limiting behavior of the estimated parameters in a misspecified random field regression model

    DEFF Research Database (Denmark)

    Dahl, Christian Møller; Qin, Yu

    This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection of n...

  13. Skeletal height estimation from regression analysis of sternal lengths in a Northwest Indian population of Chandigarh region: a postmortem study.

    Science.gov (United States)

    Singh, Jagmahender; Pathak, R K; Chavali, Krishnadutt H

    2011-03-20

    Skeletal height estimation from regression analysis of eight sternal lengths in subjects from the Chandigarh zone of Northwest India is the topic of this study. Analysis of eight sternal lengths (length of manubrium, length of mesosternum, combined length of manubrium and mesosternum, total sternal length and the first four intercostal lengths of the mesosternum) measured from 252 male and 91 female sternums obtained at postmortems revealed that mean cadaver stature and sternal lengths were greater in North Indians and males than in South Indians and females. Except for the intercostal lengths, all the sternal lengths were positively correlated with the stature of the deceased in both sexes. Curvilinear regression analysis of sternal lengths was found more useful than linear regression for stature estimation. Using multivariate regression analysis, the combined length of manubrium and mesosternum in both sexes, and the length of manubrium along with the 2nd and 3rd intercostal lengths of the mesosternum in males, were selected as the best estimators of stature. Nonetheless, the stature of males can be predicted with an SEE of 6.66 (R^2 = 0.16, r = 0.318) from the combination MBL+BL_3+LM+BL_2, and in females, from MBL only, it can be estimated with an SEE of 6.65 (R^2 = 0.10, r = 0.318), whereas from the multiple regression analysis of pooled data, stature can be estimated with an SEE of 6.97 (R^2 = 0.387, r = 0.575) from the combination MBL+LM+BL_2+TSL+BL_3. The R^2 and F-ratio were found to be statistically significant for almost all the variables in both sexes, except the 4th intercostal length in males and the 2nd to 4th intercostal lengths in females. The 'major' sternal lengths were more useful than the 'minor' ones for stature estimation. The universal regression analysis used by Kanchan et al. [39], when applied to sternal lengths, gave satisfactory estimates of stature for males only, but female stature was comparatively better estimated from simple linear regressions. But they are not proposed for the

  14. Estimation of error components in a multi-error linear regression model, with an application to track fitting

    International Nuclear Information System (INIS)

    Fruehwirth, R.

    1993-01-01

    We present an estimation procedure of the error components in a linear regression model with multiple independent stochastic error contributions. After solving the general problem we apply the results to the estimation of the actual trajectory in track fitting with multiple scattering. (orig.)

  15. Use of instantaneous streamflow measurements to improve regression estimates of index flow for the summer month of lowest streamflow in Michigan

    Science.gov (United States)

    Holtschlag, David J.

    2011-01-01

    In Michigan, index flow Q50 is a streamflow characteristic defined as the minimum of median flows for July, August, and September. The state of Michigan uses index flow estimates to help regulate large (greater than 100,000 gallons per day) water withdrawals to prevent adverse effects on characteristic fish populations. At sites where long-term streamgages are located, index flows are computed directly from continuous streamflow records as GageQ50. In an earlier study, a multiple-regression equation was developed to estimate index flows IndxQ50 at ungaged sites. The index equation explains about 94 percent of the variability of index flows at 147 (index) streamgages by use of six explanatory variables describing soil type, aquifer transmissivity, land cover, and precipitation characteristics. This report extends the results of the previous study, by use of Monte Carlo simulations, to evaluate alternative flow estimators, DiscQ50, IntgQ50, SiteQ50, and AugmQ50. The Monte Carlo simulations treated each of the available index streamgages, in turn, as a miscellaneous site where streamflow conditions are described by one or more instantaneous measurements of flow. In the simulations, instantaneous flows were approximated by daily mean flows at the corresponding site. All estimators use information that can be obtained from instantaneous flow measurements and contemporaneous daily mean flow data from nearby long-term streamgages. The efficacy of these estimators was evaluated over a set of measurement intensities in which the number of simulated instantaneous flow measurements ranged from 1 to 100 at a site. The discrete measurement estimator DiscQ50 is based on a simple linear regression developed between information on daily mean flows at five or more streamgages near the miscellaneous site and their corresponding GageQ50 index flows. The regression relation then was used to compute a DiscQ50 estimate at the miscellaneous site by use of the simulated instantaneous flow

  16. A comparison of the performances of an artificial neural network and a regression model for GFR estimation.

    Science.gov (United States)

    Liu, Xun; Li, Ning-shan; Lv, Lin-sheng; Huang, Jian-hua; Tang, Hua; Chen, Jin-xia; Ma, Hui-juan; Wu, Xiao-ming; Lou, Tan-qi

    2013-12-01

    Accurate estimation of glomerular filtration rate (GFR) is important in clinical practice. Current models derived from regression are limited by the imprecision of GFR estimates. We hypothesized that an artificial neural network (ANN) might improve the precision of GFR estimates. A study of diagnostic test accuracy. 1,230 patients with chronic kidney disease were enrolled, including the development cohort (n=581), internal validation cohort (n=278), and external validation cohort (n=371). GFR was estimated (eGFR) using a new ANN model and a new regression model, both using age, sex, and standardized serum creatinine level and derived in the development and internal validation cohorts, and using the CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) 2009 creatinine equation. The reference standard was measured GFR (mGFR), determined using a diethylenetriaminepentaacetic acid renal dynamic imaging method. Serum creatinine was measured with an enzymatic method traceable to isotope-dilution mass spectrometry. In the external validation cohort, mean mGFR was 49±27 (SD) mL/min/1.73 m2, and biases (median difference between mGFR and eGFR) for the CKD-EPI, new regression, and new ANN models were 0.4, 1.5, and -0.5 mL/min/1.73 m2, respectively; accuracies (P30, the percentage of estimates within 30% of mGFR) were 50.9%, 77.4%, and 78.7%, respectively. Limitations include a possible source of systematic bias in comparisons of the new models to CKD-EPI, and the fact that both the derivation and validation cohorts consisted of patients referred to the same institution. An ANN model using 3 variables did not perform better than a new regression model. Whether ANN can improve GFR estimation using more variables requires further investigation. Copyright © 2013 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.

  17. Estimation of lung tumor position from multiple anatomical features on 4D-CT using multiple regression analysis.

    Science.gov (United States)

    Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro

    2017-09-01

    To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approaches, and to evaluate the impact of the approach on internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and by MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from SRA and MRA approaches were compared with the ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions. Also, the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with the conventional ITV, average percentage decreases of ITV were 31.9% and 38.3% using SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided a smaller ITV than the conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
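
    The SRA-versus-MRA comparison can be mimicked on synthetic surrogate traces; the sketch below (hypothetical data and coefficients, not the 4D-CT measurements) fits both estimators by least squares and compares their RMSEs:

```python
import numpy as np

rng = np.random.default_rng(17)

# Synthetic respiratory samples: tumor displacement driven mostly by
# lung volume, plus smaller contributions from other surrogates.
n = 120
lung_vol = rng.normal(size=n)
diaphragm = 0.8 * lung_vol + rng.normal(scale=0.3, size=n)
chest_wall = 0.3 * lung_vol + rng.normal(scale=0.5, size=n)
tumor = (5.0 * lung_vol + 1.0 * diaphragm + 0.5 * chest_wall
         + rng.normal(scale=0.5, size=n))        # displacement in mm

def rmse(features):
    """Least-squares fit of tumor position on the given features."""
    X = np.column_stack([np.ones(n)] + features)
    coef, *_ = np.linalg.lstsq(X, tumor, rcond=None)
    return np.sqrt(np.mean((tumor - X @ coef) ** 2))

print(f"SRA (lung volume only): RMSE = {rmse([lung_vol]):.2f} mm")
print(f"MRA (all features)    : RMSE = "
      f"{rmse([lung_vol, diaphragm, chest_wall]):.2f} mm")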

  18. Regional regression equations for the estimation of selected monthly low-flow duration and frequency statistics at ungaged sites on streams in New Jersey

    Science.gov (United States)

    Watson, Kara M.; McHugh, Amy R.

    2014-01-01

    Regional regression equations were developed for estimating monthly flow-duration and monthly low-flow frequency statistics for ungaged streams in Coastal Plain and non-coastal regions of New Jersey for baseline and current land- and water-use conditions. The equations were developed to estimate 87 different streamflow statistics, which include the monthly 99-, 90-, 85-, 75-, 50-, and 25-percentile flow-durations of the minimum 1-day daily flow; the August–September 99-, 90-, and 75-percentile minimum 1-day daily flow; and the monthly 7-day, 10-year (M7D10Y) low-flow frequency. These 87 streamflow statistics were computed for 41 continuous-record streamflow-gaging stations (streamgages) with 20 or more years of record and 167 low-flow partial-record stations in New Jersey with 10 or more streamflow measurements. The regression analyses used to develop equations to estimate selected streamflow statistics were performed by testing the relation between flow-duration statistics and low-flow frequency statistics for 32 basin characteristics (physical characteristics, land use, surficial geology, and climate) at the 41 streamgages and 167 low-flow partial-record stations. The regression analyses determined drainage area, soil permeability, average April precipitation, average June precipitation, and percent storage (water bodies and wetlands) were the significant explanatory variables for estimating the selected flow-duration and low-flow frequency statistics. Streamflow estimates were computed for two land- and water-use conditions in New Jersey—land- and water-use during the baseline period of record (defined as the years a streamgage had little to no change in development and water use) and current land- and water-use conditions (1989–2008)—for each selected station using data collected through water year 2008. The baseline period of record is representative of a period when the basin was unaffected by change in development. The current period is

  19. Estimation of pyrethroid pesticide intake using regression modeling of food groups based on composite dietary samples

    Data.gov (United States)

    U.S. Environmental Protection Agency — Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression...

  20. A regressive methodology for estimating missing data in rainfall daily time series

    Science.gov (United States)

    Barca, E.; Passarella, G.

    2009-04-01

    The "presence" of gaps in environmental data time series represents a very common but extremely critical problem, since it can produce biased results (Rubin, 1976). Missing data plague almost all surveys. The problem is how to deal with missing data once it has been deemed impossible to recover the actual missing values. Apart from the amount of missing data, another issue which plays an important role in the choice of any recovery approach is the evaluation of the "missingness" mechanism. When missingness is conditioned on some other variable observed in the data set (Schafer, 1997), the mechanism is called MAR (Missing At Random). Otherwise, when the missingness mechanism depends on the actual value of the missing data, it is called NMAR (Not Missing At Random). The latter is the most difficult condition to model. In the last decade, interest arose in the estimation of missing data by using regression (single imputation). More recently, multiple imputation has also become available, which returns a distribution of estimated values (Scheffer, 2002). In this paper an automatic methodology for estimating missing data is presented. In practice, given a gauging station affected by missing data (the target station), the methodology checks the randomness of the missing data and classifies the "similarity" between the target station and the other gauging stations spread over the study area. Among the different methods available for defining the degree of similarity, whose effectiveness strongly depends on the data distribution, the Spearman correlation coefficient was chosen. Once the similarity matrix is defined, a suitable nonparametric, univariate, regressive method is applied in order to estimate missing data at the target station: the Theil method (Theil, 1950). Even though the methodology proved to be rather reliable, an improvement of the missing data estimation can be achieved by generalization. A first possible improvement consists in extending the univariate technique to
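
    The Theil method referred to is the median-of-pairwise-slopes estimator; a minimal sketch of gap filling from a similar donor station (synthetic rainfall-like data, hypothetical station roles):

```python
import numpy as np

def theil_slope(x, y):
    """Theil estimator: median of slopes over all point pairs.
    Nonparametric and robust to outliers, hence suited to gap filling."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)
              if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept

rng = np.random.default_rng(19)
donor = rng.gamma(2.0, 5.0, 365)                 # most similar gauge (mm/day)
target = 0.9 * donor + rng.normal(0, 2, 365)     # target station
target[100:110] = np.nan                         # a gap to fill

ok = ~np.isnan(target)
m, b = theil_slope(donor[ok], target[ok])
filled = m * donor[~ok] + b                      # estimates for missing days
print(filled.round(1))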

  1. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    Science.gov (United States)

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  2. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    Science.gov (United States)

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  3. Trend Estimation and Regression Analysis in Climatological Time Series: An Application of Structural Time Series Models and the Kalman Filter.

    Science.gov (United States)

    Visser, H.; Molenaar, J.

    1995-05-01

    The detection of trends in climatological data has become central to the discussion on climate change due to the enhanced greenhouse effect. To prove detection, a method is needed (i) to make inferences on significant rises or declines in trends, (ii) to take into account natural variability in climate series, and (iii) to compare output from GCMs with the trends in observed climate data. To meet these requirements, flexible mathematical tools are needed. A structural time series model is proposed with which a stochastic trend, a deterministic trend, and regression coefficients can be estimated simultaneously. The stochastic trend component is described using the class of ARIMA models. The regression component is assumed to be linear; however, the regression coefficients corresponding with the explanatory variables may be allowed to be time dependent in order to validate this assumption. The mathematical technique used to estimate this trend-regression model is the Kalman filter. The main features of the filter are discussed. Examples of trend estimation are given using annual mean temperatures at a single station in the Netherlands (1706-1990) and annual mean temperatures at Northern Hemisphere land stations (1851-1990). The inclusion of explanatory variables is shown by regressing the latter temperature series on four variables: Southern Oscillation index (SOI), volcanic dust index (VDI), sunspot numbers (SSN), and a simulated temperature signal, induced by increasing greenhouse gases (GHG). In all analyses, the influence of SSN on global temperatures is found to be negligible. The correlations between temperatures and SOI and VDI appear to be negative. For SOI, this correlation is significant, but for VDI it is not, probably because of a lack of volcanic eruptions during the sample period. The relation between temperatures and GHG is positive, which is in agreement with the hypothesis of a warming climate because of increasing levels of greenhouse gases. The prediction performance of
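
    A structural time series model of this kind (stochastic trend plus a linear regression component, estimated jointly via the Kalman filter) can be sketched with statsmodels' unobserved-components machinery; synthetic data and a single stand-in regressor, not the Netherlands or Northern Hemisphere series:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(23)

# Synthetic "temperature" series: a slowly wandering stochastic trend
# plus a regression component on one explanatory index (a stand-in
# for SOI, VDI, SSN, or a GHG signal).
n = 140
trend = np.cumsum(rng.normal(0.01, 0.05, n))
soi = rng.normal(size=n)
y = trend - 0.1 * soi + rng.normal(scale=0.1, size=n)

# Local linear trend + regression, estimated via the Kalman filter
model = sm.tsa.UnobservedComponents(y, level="local linear trend", exog=soi)
res = model.fit(disp=False)
print(res.summary().tables[1])                   # variances and regression beta
print("smoothed trend (last 3):", res.level.smoothed[-3:])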

  4. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  5. Background stratified Poisson regression analysis of cohort data

    International Nuclear Information System (INIS)

    Richardson, David B.; Langholz, Bryan

    2012-01-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models. (orig.)

  6. Regression methodology in groundwater composition estimation with composition predictions for Romuvaara borehole KR10

    Energy Technology Data Exchange (ETDEWEB)

    Luukkonen, A.; Korkealaakso, J.; Pitkaenen, P. [VTT Communities and Infrastructure, Espoo (Finland)

    1997-11-01

    Teollisuuden Voima Oy selected five investigation areas for preliminary site studies (1987-1992). The more detailed site investigation project, launched at the beginning of 1993 and presently supervised by Posiva Oy, is concentrated on three investigation areas. Romuvaara at Kuhmo is one of the present target areas, and the geochemical, structural and hydrological data used in this study are extracted from there. The aim of the study is to develop suitable methods for groundwater composition estimation based on a group of known hydrogeological variables. The input variables used are related to the host type of groundwater, hydrological conditions around the host location, mixing potentials between different types of groundwater, and minerals equilibrated with the groundwater. The output variables are electrical conductivity, Ca, Mg, Mn, Na, K, Fe, Cl, S, HS, SO4, alkalinity, 3H, 14C, 13C, Al, Sr, F, Br and I concentrations, and pH of the groundwater. The methodology is to associate the known hydrogeological conditions (i.e. input variables) with the known water compositions (output variables), and to evaluate mathematical relations between these groups. Output estimations are done with two separate procedures: partial least squares regression on the principal components of the input variables, and training of neural networks with input-output pairs. Coefficients of the linear equations and the trained networks are alternative methods for actual predictions. The quality of output predictions is monitored with confidence limit estimations, evaluated from input variable covariances and output variances, and with charge balance calculations. Groundwater compositions in Romuvaara borehole KR10 are predicted at 10 metre intervals with both prediction methods. 46 refs.

  7. Nonlinear Forecasting With Many Predictors Using Kernel Ridge Regression

    DEFF Research Database (Denmark)

    Exterkate, Peter; Groenen, Patrick J.F.; Heij, Christiaan

    This paper puts forward kernel ridge regression as an approach for forecasting with many predictors that are related nonlinearly to the target variable. In kernel ridge regression, the observed predictor variables are mapped nonlinearly into a high-dimensional space, where estimation of the predi...
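
    For orientation, kernel ridge regression in its dual form is a single linear solve; a minimal numpy sketch with an RBF kernel (synthetic data, hypothetical hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(29)

# Many predictors, nonlinearly related to the target
n, p = 200, 30
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

def rbf(A, B, sigma=5.0):
    """Gaussian RBF kernel: the implicit nonlinear feature map."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

# Kernel ridge: alpha = (K + lambda I)^(-1) y,  f(x) = k(x, X) alpha
lam = 1.0
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(n), y)

X_new = rng.normal(size=(5, p))
y_hat = rbf(X_new, X) @ alpha
print(y_hat.round(2))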

  8. Using a Regression Method for Estimating Performance in a Rapid Serial Visual Presentation Target-Detection Task

    Science.gov (United States)

    2017-12-01

    [Fig. 2: Simulation method; the process for one iteration of the simulation, repeated 250 times per combination of HR and FAR.]
    Simulations show that this regression method results in an unbiased and accurate estimate of target detection performance. The regression

  9. Simple estimation procedures for regression analysis of interval-censored failure time data under the proportional hazards model.

    Science.gov (United States)

    Sun, Jianguo; Feng, Yanqin; Zhao, Hui

    2015-01-01

    Interval-censored failure time data occur in many fields including epidemiological and medical studies as well as financial and sociological studies, and many authors have investigated their analysis (Sun, The statistical analysis of interval-censored failure time data, 2006; Zhang, Stat Modeling 9:321-343, 2009). In particular, a number of procedures have been developed for regression analysis of interval-censored data arising from the proportional hazards model (Finkelstein, Biometrics 42:845-854, 1986; Huang, Ann Stat 24:540-568, 1996; Pan, Biometrics 56:199-203, 2000). For most of these procedures, however, one drawback is that they involve estimation of both regression parameters and baseline cumulative hazard function. In this paper, we propose two simple estimation approaches that do not need estimation of the baseline cumulative hazard function. The asymptotic properties of the resulting estimates are given, and an extensive simulation study is conducted and indicates that they work well for practical situations.

  10. Evaluation of Regression and Neuro_Fuzzy Models in Estimating Saturated Hydraulic Conductivity

    Directory of Open Access Journals (Sweden)

    J. Behmanesh

    2015-06-01

    Full Text Available Study of soil hydraulic properties such as saturated and unsaturated hydraulic conductivity is required in environmental investigations. Despite numerous studies, measuring saturated hydraulic conductivity by direct methods is still costly, time consuming and demands expertise. Therefore, estimating saturated hydraulic conductivity by rapid and low-cost methods with acceptable accuracy, such as pedo-transfer functions, has been developed. The purpose of this research was to compare and evaluate 11 pedo-transfer functions and an Adaptive Neuro-Fuzzy Inference System (ANFIS) for estimating the saturated hydraulic conductivity of soil. To this end, saturated hydraulic conductivity and physical properties were measured at 40 points in Urmia. The excavated soil was used in the lab to determine its easily accessible parameters. The results showed that, among the existing models, the Aimrun et al model gave the best estimate of soil saturated hydraulic conductivity. For the mentioned model, the Root Mean Square Error and Mean Absolute Error were 0.174 and 0.028 m/day, respectively. The results of the present research emphasise the importance of effective porosity as an important, easily accessible parameter for the accuracy of pedo-transfer functions. Sand and silt percent, bulk density and soil particle density were selected for use in 561 ANFIS models. In the training phase of the best ANFIS model, the R^2 and RMSE were 1 and 1.2×10-7, respectively. These values in the test phase were 0.98 and 0.0006, respectively. Comparison of the regression and ANFIS models showed that the ANFIS model gave better results than the regression functions. The Neuro-Fuzzy Inference System was also capable of estimating with high accuracy in various soil textures.

  11. ESTIMATION OF INTRINSIC AND EXTRINSIC ENVIRONMENT FACTORS OF AGE-RELATED TOOTH COLOUR CHANGES

    Czech Academy of Sciences Publication Activity Database

    Hyšpler, P.; Jezbera, D.; Fürst, T.; Mikšík, Ivan; Waclawek, M.

    2010-01-01

    Vol. 17, No. 4 (2010), pp. 515-525 ISSN 1898-6196 Institutional research plan: CEZ:AV0Z50110509 Keywords: age-related colour changes of teeth * intrinsic and extrinsic factors * 3D mathematical regression models * estimation of real age Subject RIV: ED - Physiology Impact factor: 0.294, year: 2010

  12. Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error

    KAUST Repository

    Carroll, Raymond J.

    2011-03-01

    In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
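
    In the error-free setting that serves as the paper's point of departure, the tilting idea can be imitated with off-the-shelf tools: fit an unconstrained smoother, project it onto the monotone cone, and use the distance moved as a test statistic. The sketch below does this with isotonic regression on synthetic data; it is a simplified analogue only, with no measurement-error correction or bootstrap calibration:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.log1p(5 * x) + rng.normal(0, 0.2, 200)   # monotone truth + noise

    # unconstrained estimate: a crude moving-average smoother
    kernel = np.ones(15) / 15
    y_smooth = np.convolve(y, kernel, mode="same")

    # "tilt" the unconstrained estimate onto the monotone cone
    y_mono = IsotonicRegression(increasing=True).fit_transform(x, y_smooth)

    # distance through which the estimate was tilted -> test statistic
    print("tilting distance (mean squared):", np.mean((y_mono - y_smooth) ** 2))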

  13. Flexible regression models for estimating postmortem interval (PMI) in forensic medicine.

    Science.gov (United States)

    Muñoz Barús, José Ignacio; Febrero-Bande, Manuel; Cadarso-Suárez, Carmen

    2008-10-30

    Correct determination of time of death is an important goal in forensic medicine. Numerous methods have been described for estimating postmortem interval (PMI), but most are imprecise, poorly reproducible and/or have not been validated with real data. In recent years, however, some progress in PMI estimation has been made, notably through the use of new biochemical methods for quantifying relevant indicator compounds in the vitreous humour. The best, but unverified, results have been obtained with [K+] and hypoxanthine [Hx], using simple linear regression (LR) models. The main aim of this paper is to offer more flexible alternatives to LR, such as generalized additive models (GAMs) and support vector machines (SVMs) in order to obtain improved PMI estimates. The present study, based on detailed analysis of [K+] and [Hx] in more than 200 vitreous humour samples from subjects with known PMI, compared classical LR methodology with GAM and SVM methodologies. Both proved better than LR for estimation of PMI. SVM showed somewhat greater precision than GAM, but GAM offers a readily interpretable graphical output, facilitating understanding of findings by legal professionals; there are thus arguments for using both types of models. R code for these methods is available from the authors, permitting accurate prediction of PMI from vitreous humour [K+], [Hx] and [U], with confidence intervals and graphical output provided. Copyright 2008 John Wiley & Sons, Ltd.
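
    As a schematic of the comparison described above, the sketch below fits a linear regression and a support vector regression to synthetic potassium-PMI data and compares their test errors. The shape of the synthetic relation and all numbers are invented for illustration; the original work used real vitreous humour samples and also evaluated GAMs, which are omitted here:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVR

    rng = np.random.default_rng(2)
    pmi = rng.uniform(2, 72, 200)                    # hours since death
    k_conc = 5 + 0.17 * pmi - 0.0008 * pmi**2 + rng.normal(0, 0.6, 200)

    X_train, X_test, y_train, y_test = train_test_split(
        k_conc.reshape(-1, 1), pmi, random_state=0)
    lr = LinearRegression().fit(X_train, y_train)
    svr = SVR(kernel="rbf", C=100, epsilon=1.0).fit(X_train, y_train)
    for name, model in [("LR", lr), ("SVR", svr)]:
        print(name, "MAE =",
              round(mean_absolute_error(y_test, model.predict(X_test)), 1), "h")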

  14. Estimation of Stature from Footprint Anthropometry Using Regression Analysis: A Study on the Bidayuh Population of East Malaysia

    Directory of Open Access Journals (Sweden)

    T. Nataraja Moorthy

    2015-05-01

    Full Text Available The human foot has been studied for a variety of reasons, i.e., for forensic as well as non-forensic purposes, by anatomists, forensic scientists, anthropologists, physicians, podiatrists, and numerous other groups. An aspect of human identification that has received scant attention from forensic anthropologists is the study of human feet and the footprints made by the feet. The present study, conducted during 2013-2014, aimed to derive population-specific regression equations to estimate stature from the footprint anthropometry of indigenous adult Bidayuhs in East Malaysia. The study sample consisted of 480 bilateral footprints collected using a footprint kit from 240 Bidayuhs (120 males and 120 females) who consented to take part in the study. Their ages ranged from 18 to 70 years. Stature was measured using a portable body meter device (SECA model 206). The data were analyzed using PASW Statistics version 20. Strong correlations were observed between stature and the various footprint measurements. The R values showed a positive and statistically significant (p < 0.001) relationship between the two parameters. The correlation coefficients in the pooled sample (0.861-0.882) were higher than those for males (0.762-0.795) and females (0.722-0.765) individually. This study provides regression equations to estimate stature from footprints in the Bidayuh population. The results showed that regression equations without sex indicators performed significantly better than models with sex indicators. The equations derived for the pooled sample can therefore be used to estimate stature even when the sex of the footprint donor is unknown, as at real crime scenes.
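
    Population-specific equations of the kind derived above take the simple form stature = a + b × footprint length. A sketch of how such an equation is fitted and applied, with entirely synthetic data standing in for the Bidayuh sample:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    n = 240
    footprint_len = rng.normal(24.0, 1.6, n)                      # cm
    stature = 65.0 + 4.0 * footprint_len + rng.normal(0, 3.5, n)  # cm, hypothetical

    model = LinearRegression().fit(footprint_len.reshape(-1, 1), stature)
    r = np.corrcoef(footprint_len, stature)[0, 1]
    print(f"stature = {model.intercept_:.1f} + {model.coef_[0]:.2f} x length  (R = {r:.3f})")
    print("estimated stature for a 25.1 cm footprint:", model.predict([[25.1]]))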

  15. Linear regression and the normality assumption.

    Science.gov (United States)

    Schmidt, Amand F; Finan, Chris

    2017-12-16

    Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings such transformations are often unnecessary and, worse, may bias model estimates. Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage, i.e., the proportion of simulations in which the 95% confidence interval included the true slope coefficient. Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to unbiasedly estimate standard errors, and hence confidence intervals and P-values. However, in large sample sizes (e.g., where the number of observations per variable is >10), violations of this normality assumption often do not noticeably impact results. In contrast, the assumptions of a correctly specified parametric model, absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large sample size settings. Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary; it does not guarantee valid results and, worse, may bias estimates through the practice of outcome transformations. Copyright © 2017 Elsevier Inc. All rights reserved.
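
    A minimal version of the coverage simulation: generate a linear model with strongly skewed (non-normal) errors and check how often the 95% confidence interval for the slope covers the truth. The sample size, slope, and error distribution below are arbitrary choices for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    true_slope, n, n_sims = 2.0, 500, 1000
    covered = 0
    for _ in range(n_sims):
        x = rng.uniform(0, 1, n)
        errors = rng.exponential(1.0, n) - 1.0        # skewed, mean zero
        y = 1.0 + true_slope * x + errors
        fit = sm.OLS(y, sm.add_constant(x)).fit()
        lo, hi = fit.conf_int()[1]                    # CI for the slope
        covered += lo <= true_slope <= hi
    print("95% CI coverage:", covered / n_sims)       # close to 0.95 despite skew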

  16. Image Jacobian Matrix Estimation Based on Online Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Shangqin Mao

    2012-10-01

    Full Text Available Research into robotic visual servoing is an important area in the field of robotics. It has proven difficult to achieve successful results for machine vision and robotics in unstructured environments without using any a priori camera or kinematic models. In uncalibrated visual servoing, image Jacobian matrix estimation methods can be divided into two groups: online methods and offline methods. The offline method is not appropriate for most natural environments. The online method is robust but rough; moreover, if the image feature configuration changes, the approximation procedure must be restarted. A novel approach based on an online support vector regression (OL-SVR) algorithm is proposed which overcomes these drawbacks while combining the virtues just mentioned.

  17. Statistical analysis of sediment toxicity by additive monotone regression splines

    NARCIS (Netherlands)

    Boer, de W.J.; Besten, den P.J.; Braak, ter C.J.F.

    2002-01-01

    Modeling nonlinearity and thresholds in dose-effect relations is a major challenge, particularly in noisy data sets. Here we show the utility of nonlinear regression with additive monotone regression splines. These splines lead almost automatically to the estimation of thresholds. We applied this…

  18. Power system state estimation using an iteratively reweighted least squares method for sequential L1-regression

    Energy Technology Data Exchange (ETDEWEB)

    Jabr, R.A. [Electrical, Computer and Communication Engineering Department, Notre Dame University, P.O. Box 72, Zouk Mikhael, Zouk Mosbeh (Lebanon)

    2006-02-15

    This paper presents an implementation of the least absolute value (LAV) power system state estimator based on obtaining a sequence of solutions to the L1-regression problem using an iteratively reweighted least squares (IRLS_L1) method. The proposed implementation avoids reformulating the regression problem into standard linear programming (LP) form and consequently does not require the use of common methods of LP, such as those based on the simplex method or interior-point methods. It is shown that the IRLS_L1 method is equivalent to solving a sequence of linear weighted least squares (LS) problems. Thus, its implementation presents little additional effort since the sparse LS solver is common to existing LS state estimators. Studies on the termination criteria of the IRLS_L1 method have been carried out to determine a procedure for which the proposed estimator is more computationally efficient than a previously proposed non-linear iteratively reweighted least squares (IRLS) estimator. Indeed, it is revealed that the proposed method is a generalization of the previously reported IRLS estimator, but is based on more rigorous theory. (author)
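
    The core of the IRLS_L1 idea is that the L1 (least absolute value) objective can be approached through a sequence of weighted least squares solves with weights w_i = 1/max(|r_i|, eps) computed from the previous residuals. A bare-bones sketch of that iteration; the paper's sparse solver, measurement model, and termination criteria are not reproduced here:

    import numpy as np

    def irls_l1(A, b, n_iter=50, eps=1e-6):
        """Approximate argmin_x ||Ax - b||_1 by iteratively reweighted LS."""
        x = np.linalg.lstsq(A, b, rcond=None)[0]      # ordinary LS start
        for _ in range(n_iter):
            r = b - A @ x
            sw = np.sqrt(1.0 / np.maximum(np.abs(r), eps))  # L1 reweighting
            x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
        return x

    rng = np.random.default_rng(5)
    A = rng.normal(size=(100, 3))
    b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 100)
    b[::10] += 5.0                                    # gross errors / bad data
    print("LS :", np.linalg.lstsq(A, b, rcond=None)[0])
    print("LAV:", irls_l1(A, b))                      # robust to the outliers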

  19. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, this paper first demonstrates the broad usage of polynomial functions and deduces their parameters with the ordinary least squares estimate. A significance test method for the polynomial regression function is then derived, exploiting the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and a significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram, in accordance with the authors' real work. (authors)
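
    Because a polynomial in t is linear in its coefficients, it can be fitted by ordinary least squares and its overall significance checked with the usual regression F-test, exactly as for a multivariable linear model. A generic sketch on synthetic decay-heat-style data (the isotope data themselves are not public):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    t = np.linspace(0, 10, 40)                        # time, arbitrary units
    y = 5.0 - 0.8 * t + 0.04 * t**2 + rng.normal(0, 0.1, 40)

    degree = 2
    X = np.vander(t, degree + 1)                      # columns t^2, t, 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta

    # F-test of H0: all non-constant coefficients are zero
    n, p = len(y), degree
    ss_reg = np.sum((y_hat - y.mean()) ** 2)
    ss_err = np.sum((y - y_hat) ** 2)
    F = (ss_reg / p) / (ss_err / (n - p - 1))
    print("F =", round(F, 1), " p =", stats.f.sf(F, p, n - p - 1))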

  20. Estimating severity of sideways fall using a generic multi linear regression model based on kinematic input variables.

    Science.gov (United States)

    van der Zijden, A M; Groen, B E; Tanck, E; Nienhuis, B; Verdonschot, N; Weerdesteyn, V

    2017-03-21

    Many research groups have studied fall impact mechanics to understand how fall severity can be reduced to prevent hip fractures. Yet, direct impact force measurements with force plates are restricted to a very limited repertoire of experimental falls. The purpose of this study was to develop a generic model for estimating hip impact forces (i.e. fall severity) in in vivo sideways falls without the use of force plates. Twelve experienced judokas performed sideways Martial Arts (MA) and Block ('natural') falls on a force plate, both with and without a mat on top. Data were analyzed to determine the hip impact force and to derive 11 selected (subject-specific and kinematic) variables. Falls from kneeling height were used to perform a stepwise regression procedure to assess the effects of these input variables and build the model. The final model includes four input variables, involving one subject-specific measure and three kinematic variables: maximum upper body deceleration, body mass, shoulder angle at the instant of 'maximum impact' and maximum hip deceleration. The results showed that estimated and measured hip impact forces were linearly related (explained variances ranging from 46 to 63%). Hip impact forces of MA falls onto the mat from a standing position (3650±916N) estimated by the final model were comparable with measured values (3698±689N), even though these data were not used for training the model. In conclusion, a generic linear regression model was developed that enables the assessment of fall severity through kinematic measures of sideways falls, without using force plates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    Science.gov (United States)

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected…

  2. Estimation of the daily global solar radiation based on the Gaussian process regression methodology in the Saharan climate

    Science.gov (United States)

    Guermoui, Mawloud; Gairaa, Kacem; Rabehi, Abdelaziz; Djafer, Djelloul; Benkaciali, Said

    2018-06-01

    Accurate estimation of solar radiation is a major concern in renewable energy applications. Over the past few years, many machine learning paradigms have been proposed to improve estimation performance, mostly based on artificial neural networks, fuzzy logic, support vector machines and adaptive neuro-fuzzy inference systems. The aim of this work is the prediction of the daily global solar radiation received on a horizontal surface through the Gaussian process regression (GPR) methodology. A case study of the Ghardaïa region (Algeria) was used to validate the methodology. Several input combinations were tested; it was found that a GPR model based on sunshine duration, minimum air temperature and relative humidity gives the best results in terms of mean absolute bias error (MBE), root mean square error (RMSE), relative root mean square error (rRMSE), and correlation coefficient (r). The obtained values of these indicators are 0.67 MJ/m2, 1.15 MJ/m2, 5.2%, and 98.42%, respectively.
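
    A compact illustration of GPR for this kind of task with scikit-learn, using synthetic stand-ins for the meteorological inputs (sunshine duration, minimum temperature, relative humidity) and for daily global radiation; the kernel and all values are illustrative, not those tuned for Ghardaïa:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(7)
    n = 300
    sunshine = rng.uniform(4, 12, n)       # h/day
    t_min = rng.uniform(2, 28, n)          # deg C
    rel_hum = rng.uniform(10, 70, n)       # %
    radiation = (1.8 * sunshine + 0.15 * t_min - 0.05 * rel_hum
                 + rng.normal(0, 1.0, n))  # MJ/m2, synthetic

    X = np.column_stack([sunshine, t_min, rel_hum])
    kernel = RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, radiation)
    mean, std = gpr.predict(X[:3], return_std=True)   # predictions with uncertainty
    print(mean, std)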

  3. Modeling of energy consumption and related GHG (greenhouse gas) intensity and emissions in Europe using general regression neural networks

    International Nuclear Information System (INIS)

    Antanasijević, Davor; Pocajt, Viktor; Ristić, Mirjana; Perić-Grujić, Aleksandra

    2015-01-01

    This paper presents a new approach for the estimation of energy-related GHG (greenhouse gas) emissions at the national level that combines the simplicity of the concept of GHG intensity and the generalization capabilities of ANNs (artificial neural networks). The main objectives of this work include the determination of the accuracy of a GRNN (general regression neural network) model applied for the prediction of EC (energy consumption) and GHG intensity of energy consumption, utilizing general country statistics as inputs, as well as analysis of the accuracy of energy-related GHG emissions obtained by multiplying the two aforementioned outputs. The models were developed using historical data from the period 2004–2012, for a set of 26 European countries (EU Members). The obtained results demonstrate that the GRNN GHG intensity model provides a more accurate prediction, with a MAPE (mean absolute percentage error) of 4.5%, than the tested MLR (multiple linear regression) and second-order and third-order non-linear MPR (multiple polynomial regression) models. Also, the GRNN EC model has high accuracy (MAPE = 3.6%), and therefore both GRNN models and the proposed approach can be considered suitable for the calculation of GHG emissions. The energy-related predicted GHG emissions were very similar to the actual GHG emissions of EU Members (MAPE = 6.4%). - Highlights: • ANN modeling of GHG intensity of energy consumption is presented. • ANN modeling of energy consumption at the national level is presented. • GHG intensity concept was used for the estimation of energy-related GHG emissions. • The ANN models provide better results in comparison with conventional models. • Forecast of GHG emissions for 26 countries was made successfully with MAPE of 6.4%
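
    A GRNN is essentially a Nadaraya-Watson kernel-weighted average of the training targets with a single smoothing parameter sigma, which is why it needs no iterative training. A self-contained sketch of that mechanism, not the authors' trained model or their country-statistics inputs:

    import numpy as np

    def grnn_predict(X_train, y_train, X_query, sigma=0.5):
        """General regression neural network = Gaussian kernel-weighted average."""
        d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * sigma**2))        # pattern-layer activations
        return (w @ y_train) / w.sum(axis=1)    # summation / division layers

    rng = np.random.default_rng(8)
    X = rng.uniform(-2, 2, (200, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)
    X_query = rng.uniform(-2, 2, (5, 2))
    print(grnn_predict(X, y, X_query, sigma=0.3))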

  4. Estimating carbon and showing impacts of drought using satellite data in regression-tree models

    Science.gov (United States)

    Boyte, Stephen; Wylie, Bruce K.; Howard, Danny; Dahal, Devendra; Gilmanov, Tagir G.

    2018-01-01

    Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large geographic spaces, allowing a better understanding of broad-scale ecosystem processes. The current study presents annual gross primary production (GPP) and annual ecosystem respiration (RE) for 2000–2013 in several short-statured vegetation types using carbon flux data from towers that are located strategically across the conterminous United States (CONUS). We calculate carbon fluxes (annual net ecosystem production [NEP]) for each year in our study period, which includes 2012, when drought and higher-than-normal temperatures influenced vegetation productivity in large parts of the study area. We present and analyse carbon flux dynamics in the CONUS to better understand how drought affects GPP, RE, and NEP. Model accuracy metrics show strong correlation coefficients (r ≥ 94%) between training and estimated data for both GPP and RE. Overall, average annual GPP, RE, and NEP are relatively constant throughout the study period except during 2012, when almost 60% less carbon is sequestered than normal. These results allow us to conclude that this modelling method effectively estimates carbon dynamics through time and allows the exploration of impacts of meteorological anomalies and vegetation types on carbon dynamics.

  5. Regression and kriging analysis for grid power factor estimation

    Directory of Open Access Journals (Sweden)

    Rajesh Guntaka

    2014-12-01

    Full Text Available The measurement of power factor (PF) in electrical utility grids is a mainstay of load balancing and a critical element of transmission and distribution efficiency. The measurement of PF dates back to the earliest periods of electrical power distribution to public grids. In the wide-area distribution grid, measurement of current waveforms is trivial and may be accomplished at any point in the grid using a current tap transformer. However, voltage measurement requires a reference to ground and so is more problematic; measurements are normally constrained to points with ready and easy access to a ground source. We present two mathematical analysis methods, based on kriging and on linear least squares estimation (LLSE, i.e., regression), to derive PF at nodes with unknown voltages that lie within a perimeter of sample nodes with ground reference across a selected power grid. Our results indicate an average error of 1.884%, which is within acceptable tolerances for PF measurements used in load balancing tasks.

  6. Unbalanced Regressions and the Predictive Equation

    DEFF Research Database (Denmark)

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoretical predictive equation by suggesting a data generating process where returns are generated as linear functions of a lagged latent I(0) risk process. The observed predictor is a function of this latent I(0) process, but it is corrupted by a fractionally integrated noise. Such a process may arise due to aggregation or unexpected level shifts. In this setup, the practitioner estimates a misspecified, unbalanced, and endogenous predictive regression. We show that the OLS estimate of this regression is inconsistent, but standard inference is possible. To obtain a consistent slope estimate, we then suggest…

  7. Marginal longitudinal semiparametric regression via penalized splines

    KAUST Repository

    Al Kadiri, M.

    2010-08-01

    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been made, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.

  8. Marginal longitudinal semiparametric regression via penalized splines

    KAUST Repository

    Al Kadiri, M.; Carroll, R.J.; Wand, M.P.

    2010-01-01

    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been made, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.

  9. Estimating overall exposure effects for the clustered and censored outcome using random effect Tobit regression models.

    Science.gov (United States)

    Wang, Wei; Griswold, Michael E

    2016-11-30

    The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. The marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at the population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects, and then use the calculated difference to assess the overall exposure effect. Maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration over the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.

  10. Estimating the input function non-invasively for FDG-PET quantification with multiple linear regression analysis: simulation and verification with in vivo data

    International Nuclear Information System (INIS)

    Fang, Yu-Hua; Kao, Tsair; Liu, Ren-Shyan; Wu, Liang-Chih

    2004-01-01

    A novel statistical method, namely Regression-Estimated Input Function (REIF), is proposed in this study for the purpose of non-invasive estimation of the input function for fluorine-18 2-fluoro-2-deoxy-d-glucose positron emission tomography (FDG-PET) quantitative analysis. We collected 44 patients who had undergone a blood sampling procedure during their FDG-PET scans. First, we generated tissue time-activity curves of the grey matter and the whole brain with a segmentation technique for every subject. Summations of different intervals of these two curves were used as a feature vector, which also included the net injection dose. Multiple linear regression analysis was then applied to find the correlation between the input function and the feature vector. After a simulation study with in vivo data, the data of 29 patients were applied to calculate the regression coefficients, which were then used to estimate the input functions of the other 15 subjects. Comparing the estimated input functions with the corresponding real input functions, the averaged error percentages of the area under the curve and the cerebral metabolic rate of glucose (CMRGlc) were 12.13±8.85 and 16.60±9.61, respectively. Regression analysis of the CMRGlc values derived from the real and estimated input functions revealed a high correlation (r=0.91). No significant difference was found between the real CMRGlc and that derived from our regression-estimated input function (Student's t test, P>0.05). The proposed REIF method demonstrated good abilities for input function and CMRGlc estimation, and represents a reliable replacement for the blood sampling procedures in FDG-PET quantification. (orig.)

  11. Using a Regression Discontinuity Design to Estimate the Impact of Placement Decisions in Developmental Math

    Science.gov (United States)

    Melguizo, Tatiana; Bos, Johannes M.; Ngo, Federick; Mills, Nicholas; Prather, George

    2016-01-01

    This study evaluates the effectiveness of math placement policies for entering community college students on these students' academic success in math. We estimate the impact of placement decisions by using a discrete-time survival model within a regression discontinuity framework. The primary conclusion that emerges is that initial placement in a…

  12. Longitudinal beta regression models for analyzing health-related quality of life scores over time

    Directory of Open Access Journals (Sweden)

    Hunger Matthias

    2012-09-01

    Full Text Available Background: Health-related quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies; however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data, using two empirical examples with distributional features typically encountered in practice. Methods: We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results: At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from the marginal beta model and the linear mixed model were nearly identical, but differences could be observed with respect to standard errors. Conclusions: Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models; however, if the focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.

  13. Satellite rainfall retrieval by logistic regression

    Science.gov (United States)

    Chiu, Long S.

    1986-01-01

    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome of the logistic model is the probability that the rain rate of a satellite pixel is above a certain threshold. By varying the threshold, a rain-rate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement deduced from a microwave temperature-rain-rate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rain-field models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain-field simulation model which preserves the fractional rain area and the lognormality of rain rates as found in GATE is developed. A stochastic regression model of branching and immigration, whose solutions are lognormally distributed in some asymptotic limits, has also been developed.
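
    The device described above, turning a family of threshold-exceedance probabilities into a rain-rate distribution, can be sketched with ordinary logistic regression: fit one model per threshold, with the fractional rain area as a covariate. Everything below (covariates, thresholds, data) is synthetic illustration, not the GATE data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(9)
    n = 1000
    frac_rain_area = rng.uniform(0, 1, n)             # covariate
    radiance = rng.normal(0, 1, n)                    # covariate
    rainrate = np.exp(1.5 * frac_rain_area + 0.3 * radiance
                      + rng.normal(0, 0.5, n))        # lognormal-style rain rates

    X = np.column_stack([frac_rain_area, radiance])
    for thr in (1, 2, 5, 10):
        model = LogisticRegression().fit(X, (rainrate > thr).astype(int))
        p = model.predict_proba([[0.5, 0.0]])[0, 1]
        print(f"P(rain rate > {thr} | area = 0.5) = {p:.2f}")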

  14. Applied regression analysis a research tool

    CERN Document Server

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  15. Digital Hydrologic Networks Supporting Applications Related to Spatially Referenced Regression Modeling

    Science.gov (United States)

    Brakebill, J.W.; Wolock, D.M.; Terziotti, S.E.

    2011-01-01

    Digital hydrologic networks depicting surface-water pathways and their associated drainage catchments provide a key component to hydrologic analysis and modeling. Collectively, they form common spatial units that can be used to frame the descriptions of aquatic and watershed processes. In addition, they provide the ability to simulate and route the movement of water and associated constituents throughout the landscape. Digital hydrologic networks have evolved from derivatives of mapping products to detailed, interconnected, spatially referenced networks of water pathways, drainage areas, and stream and watershed characteristics. These properties are important because they enhance the ability to spatially evaluate factors that affect the sources and transport of water-quality constituents at various scales. SPAtially Referenced Regressions On Watershed attributes (SPARROW), a process-based/statistical model, relies on a digital hydrologic network in order to establish relations between quantities of monitored contaminant flux, contaminant sources, and the associated physical characteristics affecting contaminant transport. Digital hydrologic networks modified from the River Reach File (RF1) and National Hydrography Dataset (NHD) geospatial datasets provided frameworks for SPARROW in six regions of the conterminous United States. In addition, characteristics of the modified RF1 were used to update estimates of mean-annual streamflow. This produced more current flow estimates for use in SPARROW modeling. © 2011 American Water Resources Association. This article is a U.S. Government work and is in the public domain in the USA.

  16. A fuzzy regression with support vector machine approach to the estimation of horizontal global solar radiation

    International Nuclear Information System (INIS)

    Baser, Furkan; Demirhan, Haydar

    2017-01-01

    Accurate estimation of the amount of horizontal global solar radiation for a particular field is an important input for decision processes in solar radiation investments. In this article, we focus on the estimation of yearly mean daily horizontal global solar radiation by using an approach that utilizes fuzzy regression functions with support vector machines (FRF-SVM). This approach is not seriously affected by outlier observations and does not suffer from the over-fitting problem. To demonstrate the utility of the FRF-SVM approach in the estimation of horizontal global solar radiation, we conduct an empirical study over a dataset collected in Turkey and apply the FRF-SVM approach with several kernel functions. Then, we compare the estimation accuracy of the FRF-SVM approach to an adaptive neuro-fuzzy system and a coplot-supported genetic programming approach. We observe that the FRF-SVM approach with a Gaussian kernel function is affected by neither outliers nor the over-fitting problem, and gives the most accurate estimates of horizontal global solar radiation among the applied approaches. Consequently, the use of hybrid fuzzy functions and support vector machine approaches is found beneficial in long-term forecasting of horizontal global solar radiation over a region with complex climatic and terrestrial characteristics. - Highlights: • A fuzzy regression functions with support vector machines approach is proposed. • The approach is robust against outlier observations and over-fitting problem. • Estimation accuracy of the model is superior to several existent alternatives. • A new solar radiation estimation model is proposed for the region of Turkey. • The model is useful under complex terrestrial and climatic conditions.

  17. Stellar atmospheric parameter estimation using Gaussian process regression

    Science.gov (United States)

    Bu, Yude; Pan, Jingchang

    2015-02-01

    As is well known, it is necessary to derive stellar parameters from massive amounts of spectral data automatically and efficiently. However, in traditional automatic methods such as artificial neural networks (ANNs) and kernel regression (KR), it is often difficult to optimize the algorithm structure and determine the optimal algorithm parameters. Gaussian process regression (GPR) is a recently developed method that has been proven to be capable of overcoming these difficulties. Here we apply GPR to derive stellar atmospheric parameters from spectra. Through evaluating the performance of GPR on Sloan Digital Sky Survey (SDSS) spectra, Medium resolution Isaac Newton Telescope Library of Empirical Spectra (MILES) spectra, ELODIE spectra and the spectra of member stars of galactic globular clusters, we conclude that GPR can derive stellar parameters accurately and precisely, especially when we use data preprocessed with principal component analysis (PCA). We then compare the performance of GPR with that of several widely used regression methods (ANNs, support-vector regression and KR) and find that with GPR it is easier to optimize structures and parameters and more efficient and accurate to extract atmospheric parameters.

  18. Methods for estimating disease transmission rates: Evaluating the precision of Poisson regression and two novel methods

    DEFF Research Database (Denmark)

    Kirkeby, Carsten Thure; Hisham Beshara Halasa, Tariq; Gussmann, Maya Katrin

    2017-01-01

    …the transmission rate. We use data from the two simulation models and vary the sampling intervals and the size of the population sampled. We devise two new methods to determine the transmission rate and compare these to the frequently used Poisson regression method in both epidemic and endemic situations. For most tested scenarios these new methods perform similarly to or better than Poisson regression, especially in the case of long sampling intervals. We conclude that transmission rate estimates are easily biased, which is important to take into account when using these rates in simulation models.

  19. Improved regression models for ventilation estimation based on chest and abdomen movements

    International Nuclear Information System (INIS)

    Liu, Shaopeng; Gao, Robert; He, Qingbo; Staudenmayer, John; Freedson, Patty

    2012-01-01

    Non-invasive estimation of minute ventilation is important for quantifying the intensity of physical activity of individuals. In this paper, several improved regression models are presented, based on the measurement of chest and abdomen movements from sensor belts worn by subjects (n = 50) engaged in 14 types of physical activity. Five linear models involving a combination of 11 features were developed, and the effects of different model training approaches and window sizes for computing the features were investigated. The performance of the models was evaluated using experimental data collected during the physical activity protocol. The predicted minute ventilation was compared to the criterion ventilation measured using a bidirectional digital volume transducer housed in a respiratory gas exchange system. The results indicate that the inclusion of breathing frequency and the use of percentile points instead of interdecile ranges over a 60 s window size reduced error by about 43%, when applied to the classical two-degrees-of-freedom model. The mean percentage error of the minute ventilation estimated for all the activities was below 7.5%, verifying reasonably good performance of the models and the applicability of the wearable sensing system for minute ventilation estimation during physical activity. (paper)

  20. Unbalanced Regressions and the Predictive Equation

    DEFF Research Database (Denmark)

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoretical predictive equation…

  1. Maximum Correntropy Unscented Kalman Filter for Spacecraft Relative State Estimation

    Directory of Open Access Journals (Sweden)

    Xi Liu

    2016-09-01

    Full Text Available A new algorithm called the maximum correntropy unscented Kalman filter (MCUKF) is proposed and applied to relative state estimation in space communication networks. As is well known, the unscented Kalman filter (UKF) provides an efficient tool for solving nonlinear state estimation problems. However, the UKF performs well only under Gaussian noise; its performance may deteriorate substantially in the presence of non-Gaussian noise, especially when the measurements are disturbed by heavy-tailed impulsive noise. By making use of the maximum correntropy criterion (MCC), the proposed algorithm enhances the robustness of the UKF against impulsive noise. In the MCUKF, the unscented transformation (UT) is applied to obtain a predicted state estimate and covariance matrix, and a nonlinear regression method with the MCC cost is then used to reformulate the measurement information. Finally, the UT is applied to the measurement equation to obtain the filter state and covariance matrix. Illustrative examples demonstrate the superior performance of the new algorithm.

  2. COVAR: Computer Program for Multifactor Relative Risks and Tests of Hypotheses Using a Variance-Covariance Matrix from Linear and Log-Linear Regression

    Directory of Open Access Journals (Sweden)

    Leif E. Peterson

    1997-11-01

    Full Text Available A computer program for multifactor relative risks, confidence limits, and tests of hypotheses using regression coefficients and a variance-covariance matrix obtained from a previous additive or multiplicative regression analysis is described in detail. Data used by the program can be stored and input from an external disk-file or entered via the keyboard. The output contains a list of the input data, point estimates of single or joint effects, confidence intervals and tests of hypotheses based on a minimum modified chi-square statistic. Availability of the program is also discussed.

  3. Engineering estimates versus impact evaluation of energy efficiency projects: Regression discontinuity evidence from a case study

    International Nuclear Information System (INIS)

    Lang, Corey; Siler, Matthew

    2013-01-01

    Energy efficiency upgrades have been gaining widespread attention across global channels as a cost-effective approach to addressing energy challenges. The cost-effectiveness of these projects is generally predicted using engineering estimates pre-implementation, often with little ex post analysis of project success. In this paper, for a suite of energy efficiency projects, we directly compare ex ante engineering estimates of energy savings to ex post econometric estimates that use 15-min interval, building-level energy consumption data. In contrast to most prior literature, our econometric results confirm the engineering estimates, even suggesting the engineering estimates were too modest. Further, we find heterogeneous efficiency impacts by time of day, suggesting select efficiency projects can be useful in reducing peak load. - Highlights: • Regression discontinuity used to estimate energy savings from efficiency projects. • Ex post econometric estimates validate ex ante engineering estimates of energy savings. • Select efficiency projects shown to reduce peak load

  4. Output-Only Modal Parameter Recursive Estimation of Time-Varying Structures via a Kernel Ridge Regression FS-TARMA Approach

    Directory of Open Access Journals (Sweden)

    Zhi-Sai Ma

    2017-01-01

    Full Text Available Modal parameter estimation plays an important role in vibration-based damage detection and is worth more attention and investigation, as changes in modal parameters are usually used as damage indicators. This paper focuses on the problem of output-only modal parameter recursive estimation of time-varying structures based upon parameterized representations of the time-dependent autoregressive moving average (TARMA) model. A kernel ridge regression functional series TARMA (FS-TARMA) recursive identification scheme is proposed and subsequently employed for the modal parameter estimation of a numerical three-degree-of-freedom time-varying structural system and a laboratory time-varying structure consisting of a simply supported beam and a moving mass sliding on it. The proposed method is comparatively assessed against an existing recursive pseudolinear regression FS-TARMA approach via Monte Carlo experiments and shown to be capable of accurately tracking the time-varying dynamics in a recursive manner.
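
    Kernel ridge regression itself, the building block named in the title, is compact enough to show directly: ridge regression in a feature space defined implicitly by a kernel. The sketch below is generic KRR on a chirp-like signal, not the paper's recursive FS-TARMA identification scheme:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(10)
    t = np.linspace(0, 1, 200).reshape(-1, 1)
    # a time-varying signal: chirp whose frequency increases with time, plus noise
    y = np.sin(2 * np.pi * (5 + 10 * t[:, 0]) * t[:, 0]) + rng.normal(0, 0.1, 200)

    krr = KernelRidge(kernel="rbf", alpha=1e-2, gamma=50.0).fit(t, y)
    y_hat = krr.predict(t)
    print("fit RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))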

  5. An improved geographically weighted regression model for PM2.5 concentration estimation in large areas

    Science.gov (United States)

    Zhai, Liang; Li, Shuang; Zou, Bin; Sang, Huiyong; Fang, Xin; Xu, Shan

    2018-05-01

    Considering the spatially non-stationary contributions of environmental variables to PM2.5 variations, the geographically weighted regression (GWR) modeling method has been widely used to estimate PM2.5 concentrations. However, most GWR models in reported studies so far were established on predictors screened through a pretreatment correlation analysis, a process that may omit factors that really drive PM2.5 variations. This study therefore developed a best subsets regression (BSR) enhanced principal component analysis-GWR (PCA-GWR) modeling approach to estimate PM2.5 concentration while fully considering all the potential variables' contributions simultaneously. A performance comparison between PCA-GWR and regular GWR was conducted in the Beijing-Tianjin-Hebei (BTH) region over a one-year period. Results indicated that PCA-GWR outperforms regular GWR, with clearly higher model-fitting and cross-validation adjusted R2 and lower RMSE. Meanwhile, the distribution map of PM2.5 concentration from PCA-GWR modeling also depicts more spatial variation detail than the one from regular GWR modeling. It can be concluded that BSR-enhanced PCA-GWR modeling is a reliable way to estimate air pollution concentrations effectively by involving all the potential predictor variables' contributions to PM2.5 variations.

  6. Regression with Sparse Approximations of Data

    DEFF Research Database (Denmark)

    Noorzad, Pardis; Sturm, Bob L.

    2012-01-01

    We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of k-nearest neighbors regression (k-NNR) and, more generally, of local polynomial kernel regression. Unlike k-NNR, however, SPARROW can adapt the number of regressors to use based…

  7. Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models

    Directory of Open Access Journals (Sweden)

    Chuanglin Fang

    2015-11-01

    Full Text Available Urban air pollution is one of the most visible environmental problems to have accompanied China’s rapid urbanization. Based on emission inventory data from 2014, gathered from 289 cities, we used Global and Local Moran’s I to measure the spatial autocorrelation of Air Quality Index (AQI) values at the city level, and employed Ordinary Least Squares (OLS), Spatial Lag Model (SAR), and Geographically Weighted Regression (GWR) to quantitatively estimate the comprehensive impact and spatial variations of China’s urbanization process on air quality. The results show that significant spatial dependence and heterogeneity existed in AQI values. Regression models revealed urbanization has played an important negative role in determining air quality in Chinese cities. Population, urbanization rate, automobile density, and the proportion of secondary industry were all found to have a significant influence on air quality. Per capita Gross Domestic Product (GDP) and the scale of urban land use, however, failed the significance test at the 10% level. The GWR model performed better than the global models, and the GWR results show that the relationship between urbanization and air quality was not constant in space. Further, the local parameter estimates suggest significant spatial variation in the impacts of various urbanization factors on air quality.

  8. Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

    Science.gov (United States)

    Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2017-09-01

    Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The resulting discriminative yet compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms state-of-the-art algorithms. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

  9. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    Full Text Available This study formulates regression for vector data, enabling statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformation, and precursory earthquake signals. The observed vector variable of an event (the dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of the independent (explanatory) vector variables also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
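
    The key trick, representing planar vectors as complex numbers so that a regression coefficient is itself a vector operation (a rotation plus a scaling), can be shown in a few lines via complex least squares. A toy sketch, not the paper's full formulation or its operational statistics:

    import numpy as np

    rng = np.random.default_rng(11)
    n = 300
    # independent vector variable (e.g., a wind vector), encoded as complex numbers
    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    beta_true = 0.8 * np.exp(1j * np.pi / 6)   # rotate 30 degrees, scale by 0.8
    noise = 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n))
    v = beta_true * u + noise                  # dependent vector variable

    # complex least squares: beta = (X^H X)^{-1} X^H y
    X = u.reshape(-1, 1)
    beta_hat = np.linalg.solve(X.conj().T @ X, X.conj().T @ v)[0]
    print("scale:", abs(beta_hat), "rotation (deg):", np.degrees(np.angle(beta_hat)))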

  10. Estimating one's own and one's relatives' multiple intelligence: a study from Argentina.

    Science.gov (United States)

    Furnham, Adrian; Chamorro-Premuzic, Tomas

    2005-05-01

    Participants from Argentina (N = 217) estimated their own, their partner's, their parents' and their grandparents' overall and multiple intelligences. The Argentinean data showed that men gave higher overall estimates than women (M = 110.4 vs. 105.1) as well as higher estimates on mathematical and spatial intelligence. Participants thought themselves slightly less bright than their fathers (2 IQ points) but brighter than their mothers (6 points), their grandfathers (8 points), but especially their grandmothers (11 points). Regressions showed that participants thought verbal and mathematical IQ to be the best predictors of overall IQ. Results were broadly in agreement with other studies in the area. A comparison was also made with British data using the same questionnaire. British participants tended to give significantly higher self-estimates than for relatives, though the pattern was generally similar. Results are discussed in terms of the studies in the field.

  11. A simulation study on Bayesian Ridge regression models for several collinearity levels

    Science.gov (United States)

    Efendi, Achmad; Effrihan

    2017-12-01

    When analyzing data with a multiple regression model, predictor variables are usually omitted from the model if collinearity is present. Sometimes, however, there are reasons, for instance medical or economic ones, why all the predictors are important and should be included in the model. Ridge regression is commonly used to cope with collinearity; in this modeling approach, weights for the predictor variables are used in estimating the parameters. The estimation process can follow the likelihood concept, but nowadays a Bayesian version is an alternative. The Bayesian approach has not matched the likelihood one in popularity owing to some difficulties, computation among them; nevertheless, with the recent improvement of computational methodology, this caveat should no longer be a problem. This paper discusses a simulation study for evaluating the characteristics of Bayesian ridge regression parameter estimates, with several simulation settings based on a variety of collinearity levels and sample sizes. The results show that the Bayesian method performs better for relatively small sample sizes, while in the other settings it performs similarly to the likelihood method.
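
    A quick way to experiment with the same question using scikit-learn: simulate strongly collinear predictors and compare Bayesian ridge estimates with OLS across sample sizes. The design below is a simplified stand-in for the paper's simulation settings:

    import numpy as np
    from sklearn.linear_model import BayesianRidge, LinearRegression

    rng = np.random.default_rng(12)
    beta = np.array([1.0, 2.0, -1.5])
    for n in (20, 200):
        x1 = rng.normal(size=n)
        x2 = x1 + rng.normal(0, 0.05, n)   # nearly collinear with x1
        x3 = rng.normal(size=n)
        X = np.column_stack([x1, x2, x3])
        y = X @ beta + rng.normal(0, 1.0, n)
        for model in (LinearRegression(), BayesianRidge()):
            err = np.linalg.norm(model.fit(X, y).coef_ - beta)
            print(f"n={n:3d}  {type(model).__name__:16s} coefficient error = {err:.2f}")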

  12. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based…
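
    Conditional quantiles can be estimated directly with statsmodels. The sketch below fits the median and two outer quantiles of a simulated wage-style relationship with heteroscedastic noise, so the quantile slopes differ in a way mean regression alone would not reveal; data and coefficients are invented:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(13)
    n = 1000
    exper = rng.uniform(0, 30, n)
    # heteroscedastic noise: spread grows with experience
    wage = 10 + 0.4 * exper + rng.normal(0, 0.5 + 0.15 * exper, n)
    df = pd.DataFrame({"wage": wage, "exper": exper})

    for q in (0.1, 0.5, 0.9):
        fit = smf.quantreg("wage ~ exper", df).fit(q=q)
        print(f"q = {q}: slope = {fit.params['exper']:.3f}")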

  13. Kernel PLS Estimation of Single-trial Event-related Potentials

    Science.gov (United States)

    Rosipal, Roman; Trejo, Leonard J.

    2004-01-01

    Nonlinear kernel partial least squares (KPLS) regression is a novel smoothing approach to nonparametric regression curve fitting. We have developed a KPLS approach to the estimation of single-trial event-related potentials (ERPs). For improved accuracy of estimation, we also developed a local KPLS method for situations in which there exists prior knowledge about the approximate latency of individual ERP components. To assess the utility of the KPLS approach, we compared non-local KPLS and local KPLS smoothing with other nonparametric signal processing and smoothing methods. In particular, we examined wavelet denoising, smoothing splines, and localized smoothing splines. We applied these methods to the estimation of simulated mixtures of human ERPs and ongoing electroencephalogram (EEG) activity using a dipole simulator (BESA). In this scenario we considered ongoing EEG to represent spatially and temporally correlated noise added to the ERPs. This simulation provided a reasonable but simplified model of real-world ERP measurements. For estimation of the simulated single-trial ERPs, local KPLS provided a level of accuracy that was comparable with or better than the other methods. We also applied the local KPLS method to the estimation of human ERPs recorded in an experiment on cognitive fatigue. For these data, the local KPLS method provided a clear improvement in visualization of single-trial ERPs as well as their averages. The local KPLS method may serve as a new alternative to the estimation of single-trial ERPs and improvement of ERP averages.
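
    The following is a compact sketch of KPLS curve fitting, added for illustration: for a single response the NIPALS inner iteration reduces to one step, so each component is a normalized image of the residual signal under a Gaussian kernel of the time axis. The simulated ERP-like signal, kernel width and component count are assumptions, not values from the study.

        import numpy as np

        def kpls_smooth(x, y, n_components=4, gamma=200.0):
            """Smooth a 1-D signal y(x) by kernel PLS regression on the time axis."""
            x = np.asarray(x, float).reshape(-1, 1)
            y0 = np.asarray(y, float).reshape(-1, 1)
            n = len(x)
            K = np.exp(-gamma * (x - x.T) ** 2)      # Gaussian (RBF) kernel
            J = np.eye(n) - np.ones((n, n)) / n
            K = J @ K @ J                            # double centering
            Y = y0 - y0.mean()
            scores = []
            for _ in range(n_components):
                t = K @ Y                            # score direction (single response)
                t /= np.linalg.norm(t)
                scores.append(t)
                P = np.eye(n) - t @ t.T              # deflate kernel and response
                K, Y = P @ K @ P, P @ Y
            T = np.hstack(scores)                    # orthonormal score matrix
            return (T @ (T.T @ (y0 - y0.mean())) + y0.mean()).ravel()

        t_axis = np.linspace(0.0, 1.0, 200)
        erp = np.exp(-((t_axis - 0.4) / 0.05) ** 2)  # a single ERP-like component
        noisy = erp + np.random.default_rng(0).normal(0.0, 0.3, 200)
        smooth = kpls_smooth(t_axis, noisy)          # denoised single-trial estimate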

  14. Bias and efficiency loss in regression estimates due to duplicated observations: a Monte Carlo simulation

    Directory of Open Access Journals (Sweden)

    Francesco Sarracino

    2017-04-01

    Full Text Available Recent studies documented that survey data contain duplicate records. We assess how duplicate records affect regression estimates, and we evaluate the effectiveness of solutions to deal with duplicate records. Results show that the chances of obtaining unbiased estimates when data contain 40 doublets (about 5% of the sample) range between 3.5% and 11.5%, depending on the distribution of duplicates. If 7 quintuplets are present in the data (2% of the sample), then the probability of obtaining biased estimates ranges between 11% and 20%. Weighting the duplicate records by the inverse of their multiplicity, or dropping superfluous duplicates, outperforms the other solutions in all considered scenarios. Our results illustrate the risk of using data in the presence of duplicate records and call for further research on strategies to analyze affected data.
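
    To make the best-performing remedy concrete, the sketch below (entirely simulated; the 40 doublets mirror the scenario above) contaminates a sample with duplicated records and compares a naive OLS fit against weighted least squares with inverse-multiplicity weights.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(2)
        n = 800
        X = sm.add_constant(rng.normal(size=(n, 1)))
        y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

        dup = rng.choice(n, 40, replace=False)          # 40 records duplicated once each
        idx = np.concatenate([np.arange(n), dup])
        mult = np.bincount(idx, minlength=n)[idx]       # multiplicity of each retained row

        naive = sm.OLS(y[idx], X[idx]).fit()
        weighted = sm.WLS(y[idx], X[idx], weights=1.0 / mult).fit()
        print(naive.params)      # slightly distorted by the doublets
        print(weighted.params)   # duplicates down-weighted by 1/multiplicity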

  15. Beyond the mean estimate: a quantile regression analysis of inequalities in educational outcomes using INVALSI survey data

    Directory of Open Access Journals (Sweden)

    Antonella Costanzo

    2017-09-01

    Full Text Available Abstract The number of studies addressing issues of inequality in educational outcomes using cognitive achievement tests and variables from large-scale assessment data has increased. Here the value of using a quantile regression approach is compared with a classical regression analysis approach to study the relationships between educational outcomes and likely predictor variables. Italian primary school data from INVALSI large-scale assessments were analyzed using both quantile and standard regression approaches. Mathematics and reading scores were regressed on students' characteristics and geographical variables selected for their theoretical and policy relevance. The results demonstrated that, in Italy, the role of gender and immigrant status varied across the entire conditional distribution of students’ performance. Analogous results emerged pertaining to the difference in students’ performance across Italian geographic areas. These findings suggest that quantile regression analysis is a useful tool to explore the determinants and mechanisms of inequality in educational outcomes. A proper interpretation of quantile estimates may enable teachers to identify effective learning activities and help policymakers to develop tailored programs that increase equity in education.

  16. Estimating leaf photosynthetic pigments information by stepwise multiple linear regression analysis and a leaf optical model

    Science.gov (United States)

    Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei

    2014-10-01

    Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and has difficulty capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. A leaf radiative transfer model (the PROSPECT model) was employed to simulate leaf reflectance at wavelengths from 400 to 780 nm at 1 nm intervals, and these values were treated as data from remote sensing observations. The simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were each taken as the target for building a regression model. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures; 70% of these samples were used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log(1/r)) were each employed to build regression models. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R2) larger than 0.8 when 6 wavebands were selected to build the SMLR model. The largest values of R2 for Cab, Car and Cab/Car were 0.8845, 0.876 and 0.8765, respectively. The mathematical transformations of reflectance showed little influence on regression accuracy. We concluded that it is feasible to estimate chlorophyll, carotenoids and their ratio with a statistical model based on leaf reflectance data.
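
    As a hedged illustration of the SMLR step (not the authors' code; the simulated reflectance matrix and the two informative wavebands are invented), greedy forward selection by adjusted R-squared can pick a small set of wavebands:

        import numpy as np
        import statsmodels.api as sm

        def forward_stepwise(X, y, max_terms=6):
            """Greedy forward selection of columns by adjusted R-squared."""
            selected, remaining, best = [], list(range(X.shape[1])), -np.inf
            while remaining and len(selected) < max_terms:
                r2a, j = max(
                    (sm.OLS(y, sm.add_constant(X[:, selected + [k]])).fit().rsquared_adj, k)
                    for k in remaining)
                if r2a <= best:                  # stop when no candidate improves the fit
                    break
                best, selected = r2a, selected + [j]
                remaining.remove(j)
            return selected, best

        rng = np.random.default_rng(3)
        R = rng.uniform(0.0, 0.6, size=(200, 381))       # reflectance, 400-780 nm at 1 nm
        cab = 5 + 40 * R[:, 150] - 25 * R[:, 300] + rng.normal(0.0, 1.0, 200)
        bands, r2a = forward_stepwise(R, cab)
        print("selected wavelengths (nm):", [400 + b for b in bands], "adj R2:", round(r2a, 3))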

  17. Estimating the Influence of Accident Related Factors on Motorcycle Fatal Accidents using Logistic Regression (Case Study: Denpasar-Bali)

    Directory of Open Access Journals (Sweden)

    Wedagama D.M.P.

    2010-01-01

    Full Text Available In Denpasar, the capital of Bali Province, motorcycle accidents contribute about 80% of total road accidents, and 32% of those motorcycle accidents are fatal. This study investigates the influence of accident-related factors on motorcycle fatal accidents in the city of Denpasar during the period 2006-2008 using a logistic regression model. The study found that the odds of a fatal outcome in collisions with pedestrians and in right-angle accidents were respectively about 0.44 and 0.40 times those of collisions with other vehicles and of accidents due to other factors. In contrast, the odds that a motorcycle accident would be fatal in a collision with a heavy or light vehicle were 1.67 times those of a collision with another motorcycle. Collisions with pedestrians, right-angle accidents, and collisions with heavy and light vehicles accounted for 31%, 29%, and 63% of motorcycle fatal accidents, respectively.
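
    For readers unfamiliar with how such odds ratios arise, the sketch below (entirely simulated; the coefficients are invented so that the pedestrian odds ratio lands near the value reported above) fits a logistic regression and exponentiates its coefficients:

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(4)
        n = 1000
        df = pd.DataFrame({
            "pedestrian": rng.integers(0, 2, n),           # collision with a pedestrian
            "heavy_light_vehicle": rng.integers(0, 2, n),  # collision with a heavy/light vehicle
        })
        lin = -1.0 - 0.8 * df.pedestrian + 0.5 * df.heavy_light_vehicle
        df["fatal"] = (rng.random(n) < 1 / (1 + np.exp(-lin))).astype(int)

        fit = smf.logit("fatal ~ pedestrian + heavy_light_vehicle", df).fit(disp=0)
        print(np.exp(fit.params))   # odds ratios; pedestrian lands near exp(-0.8) = 0.45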

  18. Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models

    OpenAIRE

    Fang, Chuanglin; Liu, Haimeng; Li, Guangdong; Sun, Dongqi; Miao, Zhuang

    2015-01-01

    Urban air pollution is one of the most visible environmental problems to have accompanied China’s rapid urbanization. Based on emission inventory data from 2014, gathered from 289 cities, we used Global and Local Moran’s I to measure the spatial autocorrelation of Air Quality Index (AQI) values at the city level, and employed Ordinary Least Squares (OLS), Spatial Lag Model (SAR), and Geographically Weighted Regression (GWR) to quantitatively estimate the comprehensive impact and spatial variati...

  19. Polylinear regression analysis in radiochemistry

    International Nuclear Information System (INIS)

    Kopyrin, A.A.; Terent'eva, T.N.; Khramov, N.N.

    1995-01-01

    A number of radiochemical problems have been formulated in the framework of polylinear regression analysis, which permits the use of conventional mathematical methods for their solution. The authors have considered features of the use of polylinear regression analysis for estimating the contributions of various sources to atmospheric pollution, for studying irradiated nuclear fuel, for estimating concentrations from spectral data, for measuring neutron fields of a nuclear reactor, for estimating crystal lattice parameters from X-ray diffraction patterns, for interpreting data of X-ray fluorescence analysis, for estimating complex formation constants, and for analyzing results of radiometric measurements. The problem of estimating the target parameters can be ill-posed for certain properties of the system under study. The authors showed the possibility of regularization by adding a fictitious set of data "obtained" from an orthogonal design. To estimate only a part of the parameters under consideration, the authors used incomplete rank models. In this case, it is necessary to take into account the possibility of confounding estimates. An algorithm for evaluating the degree of confounding is presented, which is realized using standard regression analysis software.

  20. Nonparametric Estimation of Regression Parameters in Measurement Error Models

    Czech Academy of Sciences Publication Activity Database

    Ehsanes Saleh, A.K.M.D.; Picek, J.; Kalina, Jan

    2009-01-01

    Roč. 67, č. 2 (2009), s. 177-200 ISSN 0026-1424 Grant - others: GA AV ČR(CZ) IAA101120801; GA MŠk(CZ) LC06024 Institutional research plan: CEZ:AV0Z10300504 Keywords: asymptotic relative efficiency (ARE) * asymptotic theory * emaculate model * ME model * R-estimation * Reliability ratio (RR) Subject RIV: BB - Applied Statistics, Operational Research

  1. Mathematical models for estimating earthquake casualties and damage cost through regression analysis using matrices

    International Nuclear Information System (INIS)

    Urrutia, J D; Bautista, L A; Baccay, E B

    2014-01-01

    The aim of this study was to develop mathematical models for estimating earthquake casualties such as deaths, number of injured persons, affected families, and total cost of damage. Regression models were built to quantify the direct damages from earthquakes to human beings and properties given the magnitude, intensity, depth of focus, location of epicentre and time duration. The researchers formulated the models through regression analysis using matrices and used α = 0.01. The study considered thirty destructive earthquakes that hit the Philippines in the inclusive years 1968 to 2012. Relevant data about these earthquakes were obtained from the Philippine Institute of Volcanology and Seismology. Data on damages and casualties were gathered from the records of the National Disaster Risk Reduction and Management Council. This study will be of great value in emergency planning, and in initiating and updating programs for earthquake hazard reduction in the Philippines, which is an earthquake-prone country.
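
    A brief sketch of the matrix formulation behind such models (invented predictors; not the authors' earthquake data): the least squares coefficients solve the normal equations X'Xβ = X'y.

        import numpy as np

        rng = np.random.default_rng(5)
        n = 30                                        # thirty events, as in the study design
        magnitude = rng.uniform(4.0, 8.0, n)          # hypothetical quake descriptors
        depth = rng.uniform(1.0, 60.0, n)
        X = np.column_stack([np.ones(n), magnitude, depth])
        deaths = 120 * (magnitude - 4) - 1.5 * depth + rng.normal(0.0, 30.0, n)

        beta = np.linalg.solve(X.T @ X, X.T @ deaths)    # solve, rather than invert
        resid = deaths - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        cov_beta = sigma2 * np.linalg.inv(X.T @ X)       # basis for t-tests at alpha = 0.01
        print("estimates:", beta.round(2))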

  2. Regression models to estimate real-time concentrations of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-07

    Science.gov (United States)

    Oden, Timothy D.; Asquith, William H.; Milburn, Matthew S.

    2009-01-01

    In December 2005, the U.S. Geological Survey in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (total coliform and Escherichia coli), atrazine, and suspended sediment at two U.S. Geological Survey streamflow-gaging stations upstream from Lake Houston near Houston (08068500 Spring Creek near Spring, Texas, and 08070200 East Fork San Jacinto River near New Caney, Texas). The data from the discrete water-quality samples collected during 2005-07, in conjunction with monitored real-time data already being collected - physical properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), streamflow, and rainfall - were used to develop regression models for predicting water-quality constituent concentrations for inflows to Lake Houston. Rainfall data were obtained from a rain gage monitored by Harris County Homeland Security and Emergency Management and colocated with the Spring Creek station. The leaps and bounds algorithm was used to find the best subsets of possible regression models (minimum residual sum of squares for a given number of variables). The potential explanatory or predictive variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, rainfall, and time (to account for seasonal variations inherent in some water-quality data). The response variables at each site were nitrite plus nitrate nitrogen, total phosphorus, organic carbon, Escherichia coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities as a means to estimate concentrations of the various constituents under investigation, with accompanying estimates of measurement uncertainty. Each regression equation can be used to estimate concentrations of a given constituent in real time. In conjunction with estimated concentrations, constituent loads were estimated by multiplying the

  3. Genetic parameters for body condition score, body weight, milk yield and fertility estimated using random regression models

    NARCIS (Netherlands)

    Berry, D.P.; Buckley, F.; Dillon, P.; Evans, R.D.; Rath, M.; Veerkamp, R.F.

    2003-01-01

    Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields

  4. Regression to Causality : Regression-style presentation influences causal attribution

    DEFF Research Database (Denmark)

    Bordacconi, Mats Joe; Larsen, Martin Vinæs

    2014-01-01

    We demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation.

  5. An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis

    Directory of Open Access Journals (Sweden)

    Wen-Tsao Pan

    2016-01-01

    Full Text Available Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction and ranked the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affect the satisfaction of a customer at a specific quantile of the satisfaction level. The results of the quantile regression analysis provide a bank manager with information to formulate policies that further promote satisfaction of the customers at different quantiles of the satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiments showed that, among the seven quantile regression models, the median regression model has the best performance in terms of the RMSE, RTIC, and CE performance measures.

  6. Logistic regression to estimate the welfare of broiler breeders in relation to environmental and behavioral variables

    Directory of Open Access Journals (Sweden)

    Danilo F Pereira

    2011-02-01

    Full Text Available The increasing demand of consumer markets for the welfare of birds in poultry houses has motivated many scientific studies to monitor and classify welfare according to the production environment. Given the complexity of the interaction between the birds and the environment of the aviary, correct interpretation of their behavior becomes an important way to estimate the welfare of these birds. This study obtained multiple logistic regression models capable of estimating the welfare of broiler breeders from the environment of the aviary and the behaviors expressed by the birds. In the experiment, several behaviors expressed by breeders housed in a climatic chamber were observed under controlled temperatures and three different ammonia concentrations in the air, monitored daily. From the analysis of the data, two logistic regression models were obtained: the first uses the ammonia concentration measured by a unit, and the second uses a binary classification of the ammonia concentration assigned by a person through olfactory perception. The analysis showed that both models classified the broiler breeders' welfare successfully.

  7. Effect of clinical response to active drugs and placebo on antipsychotics and mood stabilizers relative efficacy for bipolar depression and mania: A meta-regression analysis.

    Science.gov (United States)

    Bartoli, Francesco; Clerici, Massimo; Di Brita, Carmen; Riboldi, Ilaria; Crocamo, Cristina; Carrà, Giuseppe

    2018-04-01

    Randomised placebo-controlled trials investigating treatments for bipolar disorder have been hampered by wide variations in active drug and placebo clinical response rates. It is important to estimate whether the active drug or the placebo response has a greater influence in determining the relative efficacy of drugs for psychosis (antipsychotics) and relapse prevention (mood stabilisers) in bipolar depression and mania. We identified 53 randomised, placebo-controlled trials assessing antipsychotic or mood stabiliser monotherapy ('active drugs') for bipolar depression or mania. We carried out random-effects meta-regressions, estimating the influence of active drug and placebo response rates on treatment relative efficacy. Meta-regressions showed that treatment relative efficacy for bipolar mania was influenced by the magnitude of clinical response to active drugs (p = 0.002), but not to placebo (p = 0.60). On the other hand, treatment relative efficacy for bipolar depression was influenced by response to placebo (p = 0.047), but not to active drugs (p = 0.98). Despite several limitations, our unexpected findings showed that the relative efficacy of antipsychotics and mood stabilisers for bipolar depression seems unrelated to active drug response rates, depending only on clinical response to placebo. Future research should explore strategies to reduce placebo-related issues in randomised, placebo-controlled trials for bipolar depression.

  8. Regression: A Bibliography.

    Science.gov (United States)

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  9. Estimating Penetration Resistance in Agricultural Soils of Ardabil Plain Using Artificial Neural Network and Regression Methods

    Directory of Open Access Journals (Sweden)

    Gholam Reza Sheykhzadeh

    2017-02-01

    Full Text Available Introduction: Penetration resistance is one of the criteria for evaluating soil compaction. It correlates with several soil properties such as vehicle trafficability, resistance to root penetration, seedling emergence, and soil compaction by farm machinery. Direct measurement of penetration resistance is time consuming and difficult because of high temporal and spatial variability. Therefore, many different regression and artificial neural network pedotransfer functions have been proposed to estimate penetration resistance from readily available soil variables such as particle size distribution, bulk density (Db) and gravimetric water content (θm). The lands of Ardabil Province are one of the main potato production regions of Iran, so determining the soil penetration resistance in these regions helps with the management of potato production. The objective of this research was to derive pedotransfer functions, using regression and artificial neural networks, to predict penetration resistance from some soil variables in the agricultural soils of the Ardabil plain, and to compare the performance of artificial neural networks with regression models. Materials and methods: Disturbed and undisturbed soil samples (n = 105) were systematically taken from the 0-10 cm soil depth at roughly 3000 m spacing in the agricultural lands of the Ardabil plain (lat 38°15' to 38°40' N, long 48°16' to 48°61' E). The contents of sand, silt and clay (hydrometer method), CaCO3 (titration method), bulk density (cylinder method), particle density (Dp; pycnometer method), organic carbon (wet oxidation method), total porosity (calculated from Db and Dp), and saturated (θs) and field (θf) soil water contents (gravimetric method) were measured in the laboratory. The mean geometric diameter (dg) and standard deviation (σg) of soil particles were computed from the percentages of sand, silt and clay. Penetration resistance was measured in situ using a cone penetrometer (analog model) at 10

  10. An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models

    International Nuclear Information System (INIS)

    Harlim, John; Mahdi, Adam; Majda, Andrew J.

    2014-01-01

    A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model

  11. Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

    Science.gov (United States)

    Ndiaye, Eugene; Fercoq, Olivier; Gramfort, Alexandre; Leclère, Vincent; Salmon, Joseph

    2017-10-01

    In high-dimensional settings, sparse structures are crucial for efficiency, in terms of memory, computation, and performance. It is customary to consider the ℓ1 penalty to enforce sparsity in such scenarios. Sparsity-enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter that trades data fitting against sparsity. For the Lasso theory to hold, this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature: Scaled Lasso, Square-root Lasso, and Concomitant Lasso estimation, for instance, and could be of interest for uncertainty quantification. In this work, after illustrating numerical difficulties with the Concomitant Lasso formulation, we propose a modification we coined the Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver with a computational cost no more expensive than that of the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules to achieve speed efficiency by eliminating irrelevant features early.
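
    A much-simplified sketch of the concomitant idea (plain alternating minimization rather than the authors' coordinate-descent solver with safe screening rules; the smoothing appears only as a floor sigma0 on the noise estimate, and the data and lambda below are invented):

        import numpy as np
        from sklearn.linear_model import Lasso

        def smoothed_concomitant_lasso(X, y, lam, sigma0=1e-2, n_iter=20):
            """Jointly estimate (beta, sigma) by alternating a lasso step
            with a noise-level update; sigma0 keeps sigma away from zero."""
            n = len(y)
            sigma = max(np.std(y), sigma0)
            for _ in range(n_iter):
                beta = Lasso(alpha=lam * sigma, fit_intercept=False).fit(X, y).coef_
                sigma = max(np.linalg.norm(y - X @ beta) / np.sqrt(n), sigma0)
            return beta, sigma

        rng = np.random.default_rng(6)
        X = rng.normal(size=(100, 200))               # n < p, high-dimensional design
        beta_true = np.zeros(200)
        beta_true[:5] = 2.0
        y = X @ beta_true + rng.normal(scale=0.7, size=100)
        beta, sigma = smoothed_concomitant_lasso(X, y, lam=0.15)
        print("nonzero coefficients:", int(np.sum(beta != 0)), " sigma_hat:", round(sigma, 3))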

  12. Sparse Regression by Projection and Sparse Discriminant Analysis

    KAUST Repository

    Qi, Xin

    2015-04-03

    © 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  13. Nonparametric Mixture of Regression Models.

    Science.gov (United States)

    Huang, Mian; Li, Runze; Wang, Shaoli

    2013-07-01

    Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.
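
    For reference, the sketch below implements the textbook parametric EM algorithm for a two-component mixture of linear regressions on simulated data; the paper's nonparametric version differs in letting the proportions, coefficients and variances vary smoothly with the covariate via kernel regression.

        import numpy as np

        def em_mix_linreg(x, y, K=2, n_iter=200, seed=0):
            """EM for a K-component mixture of simple linear regressions."""
            n = len(x)
            X = np.column_stack([np.ones(n), x])
            rng = np.random.default_rng(seed)
            pi = np.full(K, 1.0 / K)
            beta = rng.normal(size=(K, 2))
            sig2 = np.full(K, y.var())
            for _ in range(n_iter):
                # E-step: responsibilities of each component for each point
                dens = np.stack([
                    pi[k] * np.exp(-(y - X @ beta[k]) ** 2 / (2 * sig2[k]))
                    / np.sqrt(2 * np.pi * sig2[k]) for k in range(K)])
                r = dens / dens.sum(axis=0)
                # M-step: weighted least squares and variance update per component
                for k in range(K):
                    XtW = X.T * r[k]
                    beta[k] = np.linalg.solve(XtW @ X, XtW @ y)
                    sig2[k] = (r[k] * (y - X @ beta[k]) ** 2).sum() / r[k].sum()
                pi = r.sum(axis=1) / n
            return pi, beta, sig2

        rng = np.random.default_rng(7)
        x = rng.uniform(0, 10, 400)
        comp = rng.random(400) < 0.5
        y = np.where(comp, 1 + 2 * x, 8 - x) + rng.normal(0, 0.8, 400)
        print(em_mix_linreg(x, y)[1])   # two recovered (intercept, slope) pairs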

  14. Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

    Science.gov (United States)

    Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

    2006-11-01

    We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
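
    A rough sketch of the nested-models logic (simulated item responses; statsmodels' OrderedModel stands in for the STATA routines, and the built-in DIF is invented):

        import numpy as np
        import pandas as pd
        from statsmodels.miscmodels.ordinal_model import OrderedModel

        rng = np.random.default_rng(8)
        n = 2000
        ability = rng.normal(size=n)                 # IRT ability estimate
        group = rng.integers(0, 2, n)                # e.g., language of administration
        latent = 1.2 * ability + 0.5 * group + rng.logistic(size=n)   # uniform DIF built in
        item = pd.cut(latent, [-np.inf, -1, 1, np.inf], labels=[0, 1, 2]).astype(int)

        m1 = OrderedModel(item, ability[:, None], distr="logit").fit(method="bfgs", disp=0)
        X2 = np.column_stack([ability, group, ability * group])
        m2 = OrderedModel(item, X2, distr="logit").fit(method="bfgs", disp=0)
        print("ability coefficient, ability-only model:", m1.params[0])
        print("ability, group, interaction:", m2.params[:3])
        # a marked shift in the ability coefficient suggests uniform DIF;
        # a significant interaction term suggests nonuniform DIF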

  15. Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.

    Science.gov (United States)

    Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H

    2016-01-01

    Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events-per-coefficient settings, bias and coverage still deviated from nominal.
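
    For concreteness, a sketch of the stabilized inverse probability weighting method on simulated data (one confounder; not the study's simulation design). The stabilized weights divide the marginal treatment probability by the fitted propensity score.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(9)
        n = 3000
        conf = rng.normal(size=n)                                  # confounder
        treat = (rng.random(n) < 1 / (1 + np.exp(-conf))).astype(int)
        p_out = 1 / (1 + np.exp(-(-1 + 0.5 * treat + conf)))
        outcome = (rng.random(n) < p_out).astype(int)

        ps = sm.Logit(treat, sm.add_constant(conf)).fit(disp=0).predict()
        sw = np.where(treat == 1, treat.mean() / ps, (1 - treat.mean()) / (1 - ps))

        glm = sm.GLM(outcome, sm.add_constant(treat),
                     family=sm.families.Binomial(), freq_weights=sw).fit()
        print("weighted log-odds ratio for treatment:", glm.params[1])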

  16. Integrating address geocoding, land use regression, and spatiotemporal geostatistical estimation for groundwater tetrachloroethylene.

    Science.gov (United States)

    Messier, Kyle P; Akita, Yasuyuki; Serre, Marc L

    2012-03-06

    Geographic information system (GIS) based techniques are cost-effective and efficient methods used by state agencies and epidemiology researchers for estimating concentration and exposure. However, budget limitations have made statewide assessments of contamination difficult, especially in groundwater media. Many studies have implemented address geocoding, land use regression, and geostatistics independently, but this is the first to examine the benefits of integrating these GIS techniques to address the need for statewide exposure assessments. A novel framework for concentration exposure is introduced that integrates address geocoding, land use regression (LUR), below-detect data modeling, and Bayesian Maximum Entropy (BME). A LUR model was developed for tetrachloroethylene that accounts for point sources and flow direction. We then integrate the LUR model into the BME method as a mean trend while also modeling below-detect data as a truncated Gaussian probability distribution function. We increase the available PCE data 4.7-fold over previously available databases through multistage geocoding. The LUR model shows a significant influence of dry cleaners at short ranges. The integration of the LUR model as the mean trend in BME results in a 7.5% decrease in cross-validation mean square error compared to BME with a constant mean trend.

  17. Regression analysis and transfer function in estimating the parameters of central pulse waves from brachial pulse wave.

    Science.gov (United States)

    Chai Rui; Li Si-Man; Xu Li-Sheng; Yao Yang; Hao Li-Ling

    2017-07-01

    This study analyzed parameters of the central pulse wave measured invasively and non-invasively, namely ascending branch slope (A_slope), dicrotic notch height (Hn), diastolic area (Ad), systolic area (As), diastolic blood pressure (DBP), systolic blood pressure (SBP), pulse pressure (PP), subendocardial viability ratio (SEVR), waveform parameter (k), stroke volume (SV), cardiac output (CO) and peripheral resistance (RS). The parameters extracted from the invasively measured central pulse wave were compared with the parameters estimated from brachial pulse waves by a regression model and by a transfer function model, and the accuracies of the parameters estimated by the two models were also compared. Our findings showed that, apart from the k value, the above parameters of the invasively measured central and brachial pulse waves were positively correlated. Both the regression model parameters, including A_slope, DBP and SEVR, and the transfer function model parameters agreed well with the invasively measured parameters, with a similar degree of consistency. The regression equations for the three parameters were expressed as Y' = a + bx. SBP, PP, SV and CO of the central pulse wave could be calculated through the regression model, but with lower accuracy than with the transfer function model.

  18. LINEAR REGRESSION MODEL ESTIMATION FOR RIGHT-CENSORED DATA

    Directory of Open Access Journals (Sweden)

    Ersin Yılmaz

    2016-05-01

    Full Text Available In this study, we first define right-censored data: briefly, right censoring means that values above a certain limit are recorded only as exceeding that limit, which may be related to the scale of the measuring device. We then use a response variable obtained from right-censored data together with the explanatory variables, and the linear regression model is estimated. Because of the censoring, Kaplan-Meier weights are used in the estimation of the model; with these weights the regression estimator is consistent and unbiased. There is also a semiparametric regression method for censored data, which likewise gives useful results. This study may be particularly useful for health research, since censored data generally arise in medical settings.
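
    A hedged numerical sketch of the idea (simulated data; the weights are the standard Kaplan-Meier/Stute weights for right-censored responses and assume no ties): the observed responses are reweighted and the line is fit by weighted least squares.

        import numpy as np

        def km_weights(t, delta):
            """Kaplan-Meier (Stute) weights; delta = 1 for exact, 0 for censored."""
            n = len(t)
            order = np.argsort(t)
            d = delta[order].astype(float)
            w = np.zeros(n)
            surv = 1.0
            for i in range(n):
                w[i] = d[i] / (n - i) * surv
                surv *= ((n - i - 1) / (n - i)) ** d[i]
            out = np.zeros(n)
            out[order] = w
            return out

        rng = np.random.default_rng(10)
        n = 300
        x = rng.normal(size=n)
        y_full = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
        c = rng.uniform(0, 6, n)                    # censoring limit, e.g. a device's scale
        y = np.minimum(y_full, c)
        delta = (y_full <= c).astype(int)

        w = km_weights(y, delta)
        X = np.column_stack([np.ones(n), x])
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weighted least squares
        print(beta)                                 # roughly recovers (1, 2) despite censoring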

  19. Regression: The Apple Does Not Fall Far From the Tree.

    Science.gov (United States)

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.

  20. Functional data analysis of generalized regression quantiles

    KAUST Repository

    Guo, Mengmeng; Zhou, Lan; Huang, Jianhua Z.; Härdle, Wolfgang Karl

    2013-01-01

    Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.

  2. Estimating the temporal distribution of exposure-related cancers

    International Nuclear Information System (INIS)

    Carter, R.L.; Sposto, R.; Preston, D.L.

    1993-09-01

    The temporal distribution of exposure-related cancers is relevant to the study of carcinogenic mechanisms. Statistical methods for extracting pertinent information from time-to-tumor data, however, are not well developed. Separation of incidence from 'latency' and the contamination of background cases are two problems. In this paper, we present methods for estimating both the conditional distribution given exposure-related cancers observed during the study period and the unconditional distribution. The methods adjust for confounding influences of background cases and the relationship between time to tumor and incidence. Two alternative methods are proposed. The first is based on a structured, theoretically derived model and produces direct inferences concerning the distribution of interest but often requires more-specialized software. The second relies on conventional modeling of incidence and is implemented through readily available, easily used computer software. Inferences concerning the effects of radiation dose and other covariates, however, are not always obtainable directly. We present three examples to illustrate the use of these two methods and suggest criteria for choosing between them. The first approach was used, with a log-logistic specification of the distribution of interest, to analyze times to bone sarcoma among a group of German patients injected with ²²⁴Ra. Similarly, a log-logistic specification was used in the analysis of time to chronic myelogenous leukemias among male atomic-bomb survivors. We used the alternative approach, involving conventional modeling, to estimate the conditional distribution of exposure-related acute myelogenous leukemias among male atomic-bomb survivors, given occurrence between 1 October 1950 and 31 December 1985. All analyses were performed using Poisson regression methods for analyzing grouped survival data. (J.P.N.)
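
    The grouped Poisson regression machinery mentioned in the last sentence can be sketched as follows (invented dose groups, person-years and rates; not the cohort data): events are modeled as Poisson counts with a log person-years offset.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(11)
        dose = np.repeat([0.0, 0.5, 1.0, 2.0], 5)      # hypothetical dose groups (Gy)
        years = rng.uniform(1000, 5000, dose.size)     # person-years at risk per cell
        events = rng.poisson(1e-3 * (1 + 1.5 * dose) * years)

        fit = sm.GLM(events, sm.add_constant(dose),
                     family=sm.families.Poisson(),
                     offset=np.log(years)).fit()
        print(np.exp(fit.params))   # baseline rate and rate ratio per unit dose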

  3. Forecasting exchange rates: a robust regression approach

    OpenAIRE

    Preminger, Arie; Franck, Raphael

    2005-01-01

    The least squares estimation method, as well as other ordinary estimation methods for regression models, can be severely affected by a small number of outliers, thus providing poor out-of-sample forecasts. This paper suggests a robust regression approach, based on the S-estimation method, to construct forecasting models that are less sensitive to data contamination by outliers. A robust linear autoregressive (RAR) and a robust neural network (RNN) model are estimated to study the predictabil...
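
    Since S-estimators are not available in the mainstream Python statistics libraries, the sketch below substitutes robust M-estimation (statsmodels RLM with Tukey's biweight, a plainly different robust estimator) purely to illustrate how a robust fit resists the outliers that distort least squares:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(12)
        n = 200
        x = rng.normal(size=n)
        y = 0.3 + 0.8 * x + rng.normal(scale=0.2, size=n)
        y[:10] += 5.0                                   # a few large outliers

        X = sm.add_constant(x)
        ols = sm.OLS(y, X).fit()
        rob = sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit()
        print("OLS:   ", ols.params)                    # pulled toward the outliers
        print("robust:", rob.params)                    # close to (0.3, 0.8)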

  4. Moderation analysis using a two-level regression model.

    Science.gov (United States)

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
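
    A minimal sketch of the MMR baseline discussed above (simulated data with heteroscedastic errors, the setting in which the two-level model is argued to do better): the product term x:m carries the moderation effect.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(13)
        n = 500
        df = pd.DataFrame({"x": rng.normal(size=n), "m": rng.normal(size=n)})
        scale = np.abs(df.m.to_numpy()) + 0.5           # error spread depends on the moderator
        df["y"] = 1 + 0.5 * df.x + 0.3 * df.m + 0.4 * df.x * df.m + rng.normal(scale=scale)

        fit = smf.ols("y ~ x * m", df).fit()            # least squares MMR fit
        print(fit.params["x:m"])                        # estimate of the moderation effect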

  5. Regression calibration with more surrogates than mismeasured variables

    KAUST Repository

    Kipnis, Victor; Midthune, Douglas; Freedman, Laurence S.; Carroll, Raymond J.

    2012-06-29

    In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach consisting of substituting an estimated conditional expectation of the true covariate given observed data in the logistic regression. The other is a novel two-stage approach when the logistic regression is fitted to multiple surrogates, and then a linear combination of estimated slopes is formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors asserted superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.

  7. Quantile Regression With Measurement Error

    KAUST Repository

    Wei, Ying

    2009-08-27

    Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.

  8. Time-varying effect moderation using the structural nested mean model: estimation using inverse-weighted regression with residuals

    Science.gov (United States)

    Almirall, Daniel; Griffin, Beth Ann; McCaffrey, Daniel F.; Ramchand, Rajeev; Yuen, Robert A.; Murphy, Susan A.

    2014-01-01

    This article considers the problem of examining time-varying causal effect moderation using observational, longitudinal data in which treatment, candidate moderators, and possible confounders are time varying. The structural nested mean model (SNMM) is used to specify the moderated time-varying causal effects of interest in a conditional mean model for a continuous response given time-varying treatments and moderators. We present an easy-to-use estimator of the SNMM that combines an existing regression-with-residuals (RR) approach with an inverse-probability-of-treatment weighting (IPTW) strategy. The RR approach has been shown to identify the moderated time-varying causal effects if the time-varying moderators are also the sole time-varying confounders. The proposed IPTW+RR approach provides estimators of the moderated time-varying causal effects in the SNMM in the presence of an additional, auxiliary set of known and measured time-varying confounders. We use a small simulation experiment to compare IPTW+RR versus the traditional regression approach and to compare small and large sample properties of asymptotic versus bootstrap estimators of the standard errors for the IPTW+RR approach. This article clarifies the distinction between time-varying moderators and time-varying confounders. We illustrate the methodology in a case study to assess if time-varying substance use moderates treatment effects on future substance use. PMID:23873437

  9. An integrated fuzzy regression algorithm for energy consumption estimation with non-stationary data: A case study of Iran

    Energy Technology Data Exchange (ETDEWEB)

    Azadeh, A; Seraj, O [Department of Industrial Engineering and Research Institute of Energy Management and Planning, Center of Excellence for Intelligent-Based Experimental Mechanics, College of Engineering, University of Tehran, P.O. Box 11365-4563 (Iran); Saberi, M [Department of Industrial Engineering, University of Tafresh (Iran); Institute for Digital Ecosystems and Business Intelligence, Curtin University of Technology, Perth (Australia)

    2010-06-15

    This study presents an integrated fuzzy regression and time series framework to estimate and predict electricity demand for seasonal and monthly changes in electricity consumption, especially in developing countries such as China and Iran with non-stationary data. It is difficult to model the uncertain behavior of energy consumption with conventional fuzzy regression (FR) or time series alone, and the integrated algorithm is an ideal substitute for such cases. First, the preferred time series model is selected from linear and nonlinear candidates: after selecting the preferred Auto Regressive Moving Average (ARMA) model, the McLeod-Li test is applied to determine whether the nonlinearity condition holds. When the nonlinearity condition is satisfied, the preferred nonlinear model is selected and defined as the preferred time series model. Finally, the preferred of the fuzzy regression and time series models is selected by the Granger-Newbold test. The impact of data preprocessing on fuzzy regression performance is also considered. Monthly electricity consumption of Iran from March 1994 to January 2005 is used as the case study. The superiority of the proposed algorithm is shown by comparing its results with other intelligent tools such as the Genetic Algorithm (GA) and Artificial Neural Network (ANN). (author)

  10. Estimation of Relative Economic Weights of Hanwoo Carcass Traits Based on Carcass Market Price

    Science.gov (United States)

    Choy, Yun Ho; Park, Byoung Ho; Choi, Tae Jung; Choi, Jae Gwan; Cho, Kwang Hyun; Lee, Seung Soo; Choi, You Lim; Koh, Kyung Chul; Kim, Hyo Sun

    2012-01-01

    The objective of this study was to estimate economic weights of Hanwoo carcass traits that can be used to build economic selection indexes for the selection of seedstock. Data from carcass measures for determining beef yield and quality grades were collected and provided by the Korean Institute for Animal Products Quality Evaluation (KAPE). Out of 1,556,971 records, 476,430 records collected from 13 abattoirs from 2008 to 2010, after deletion of outlying observations, were used to estimate the relative economic weights of bid price per kg carcass weight on cold carcass weight (CW), eye muscle area (EMA), backfat thickness (BF) and marbling score (MS), and the phenotypic relationships among the component traits. Carcass price tended to increase linearly as yield grade or quality grade, marginally or in combination, increased. Partial regression coefficients for MS, EMA, BF and CW in original scales were +948.5 won/score, +27.3 won/cm2, −95.2 won/mm and +7.3 won/kg when all three sex categories were taken into account. Among the four grade-determining traits, the relative economic weight of MS was the greatest. Variations in partial regression coefficients across sex categories were great, but the trends in relative weights for each carcass measure were similar. Expressed as integer values when standardized measures were fit in a covariance model, the relative economic weights of the four traits were +4:+1:−1:+1 for MS:EMA:BF:CW. Further research is required to account for the cost of production per unit carcass weight or per unit production under different economic situations. PMID:25049531

  11. Comparison of multiple linear regression, partial least squares and artificial neural networks for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids.

    Science.gov (United States)

    Fragkaki, A G; Farmaki, E; Thomaidis, N; Tsantili-Kakoulidou, A; Angelis, Y S; Koupparis, M; Georgakopoulos, C

    2012-09-21

    The comparison among different modelling techniques, such as multiple linear regression, partial least squares and artificial neural networks, has been performed in order to construct and evaluate models for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. The quantitative structure-retention relationship study using the multiple linear regression and partial least squares techniques had been conducted previously. In the present study, artificial neural network models were constructed and used for the prediction of relative retention times of anabolic androgenic steroids, and their efficiency was compared with that of the models derived from the multiple linear regression and partial least squares techniques. For overall ranking of the models, a novel procedure [Trends Anal. Chem. 29 (2010) 101-109] based on the sum of ranking differences was applied, which permits the best model to be selected. The suggested models are considered useful for the estimation of relative retention times of designer steroids for which no analytical data are available. Copyright © 2012 Elsevier B.V. All rights reserved.
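
    A hedged sketch of the three-way comparison (MLR vs PLS vs ANN) on synthetic descriptor data; the descriptor matrix, target and model settings are illustrative assumptions, not the study's dataset:

    ```python
    # Cross-validated comparison of the three modelling techniques.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(120, 10))   # molecular descriptors (placeholder)
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=120)  # RRT stand-in

    models = {
        "MLR": LinearRegression(),
        "PLS": PLSRegression(n_components=4),
        "ANN": MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                            random_state=0),
    }
    for name, model in models.items():
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(f"{name}: mean CV R^2 = {r2:.3f}")
    ```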

  12. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    Science.gov (United States)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts (York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to the usual quantities, leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well-known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models, i.e. where the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter (Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for the related variance estimators, even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (±σ) for both
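
    For readers who want to reproduce the flavour of a functional-model fit with errors in both X and Y, a minimal orthogonal distance regression sketch with scipy.odr follows; the data and error levels are invented for illustration:

    ```python
    # Errors-in-both-variables straight-line fit via orthogonal distance
    # regression, weighted by the known measurement errors sx and sy.
    import numpy as np
    from scipy import odr

    rng = np.random.default_rng(3)
    x_true = np.linspace(0, 10, 50)
    x = x_true + rng.normal(0, 0.3, 50)                # errors in X
    y = 2.0 * x_true + 1.0 + rng.normal(0, 0.5, 50)    # errors in Y

    linear = odr.Model(lambda beta, x: beta[0] * x + beta[1])
    data = odr.RealData(x, y, sx=0.3, sy=0.5)          # weights from errors
    result = odr.ODR(data, linear, beta0=[1.0, 0.0]).run()
    print(result.beta, result.sd_beta)  # slope/intercept and their errors
    ```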

  13. Impact of regression methods on improved effects of soil structure on soil water retention estimates

    Science.gov (United States)

    Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim

    2015-06-01

    Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large-scale agro-hydrological modeling. Adding significant predictors (e.g., soil structure) and implementing more flexible regression algorithms are among the main strategies for PTF improvement. The aim of this study was to investigate whether the improvement in estimating soil-water content at various matric potentials gained from categorical soil structure information, which has been reported in the literature, could be consistently captured by regression techniques other than the usually applied linear regression. Two data mining techniques, i.e., Support Vector Machines (SVM) and k-Nearest Neighbors (kNN), which have recently been introduced as promising tools for PTF development, were used to test whether the incorporation of soil structure improves PTF accuracy in a context of rather limited training data. The results show that incorporating descriptive soil structure information, i.e., massive, structured and structureless, as a grouping criterion can improve the accuracy of PTFs derived by the SVM approach in the matric potential range of -6 to -33 kPa (average RMSE decreased up to 0.005 m³ m⁻³ after grouping, depending on matric potential). The improvement was primarily attributed to the outperformance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with the kNN technique, at least not in our study, in which the data set became limited in size after grouping. Since the regression technique has an impact on the improvement gained from incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.
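
    A compact sketch of the two data-mining PTFs compared above (SVM- and kNN-based regressions of water content on basic soil properties); features, target and hyperparameters are synthetic placeholders:

    ```python
    # Cross-validated RMSE of SVM- and kNN-based pedotransfer functions.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    X = rng.uniform(size=(150, 4))   # e.g. sand, clay, bulk density, OC
    theta = 0.1 + 0.3 * X[:, 1] - 0.1 * X[:, 2] + rng.normal(0, 0.02, 150)

    for name, model in [("SVM", SVR(C=10.0, epsilon=0.005)),
                        ("kNN", KNeighborsRegressor(n_neighbors=5))]:
        ptf = make_pipeline(StandardScaler(), model)
        rmse = -cross_val_score(ptf, X, theta, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        print(f"{name}-PTF RMSE: {rmse:.4f} m3 m-3")
    ```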

  14. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    Science.gov (United States)

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, existing expression estimation methods usually deal with each RNA-seq sample separately and ignore the fact that read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parametric model to capture the general tendency of non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.

  15. Robust mislabel logistic regression without modeling mislabel probabilities.

    Science.gov (United States)

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term; that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
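
    The weighting idea can be approximated in a few lines: observations that the current fit deems unlikely get weight P(y|x)^γ. The sketch below is an illustrative iterative approximation of that scheme, not the authors' exact estimating-equation algorithm:

    ```python
    # Iteratively reweighted logistic regression: suspected mislabels are
    # down-weighted by the gamma-power of their fitted class probability.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def gamma_logistic(X, y, gamma=0.5, n_iter=10):
        model = LogisticRegression(max_iter=1000)
        w = np.ones(len(y))
        for _ in range(n_iter):
            model.fit(X, y, sample_weight=w)
            p = model.predict_proba(X)[np.arange(len(y)), y]  # P(observed)
            w = p ** gamma          # down-weight suspected mislabels
        return model, w

    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    flip = rng.random(300) < 0.1    # 10% mislabeled responses
    y[flip] = 1 - y[flip]

    model, w = gamma_logistic(X, y)
    print(model.coef_, w[flip].mean(), w[~flip].mean())
    ```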

  16. Approximate median regression for complex survey data with skewed response.

    Science.gov (United States)

    Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi

    2016-12-01

    The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or resampling are often not valid with survey data due to complex survey design features, that is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and the regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution and has a much smaller mean square error than the MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.

  17. Quantile regression theory and applications

    CERN Document Server

    Davino, Cristina; Vistocco, Domenico

    2013-01-01

    A guide to the implementation and interpretation of quantile regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on the validity of the model and diagnostic tools. Each methodological aspect is explored and
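
    As a taste of the basic modelling the book covers, here is a minimal quantile regression example with statsmodels' QuantReg on heteroskedastic synthetic data:

    ```python
    # Fit conditional quantiles (0.1, 0.5, 0.9) of y given x.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    x = rng.uniform(0, 10, 200)
    y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + 0.1 * x, 200)  # spread grows in x
    df = pd.DataFrame({"x": x, "y": y})

    for q in (0.1, 0.5, 0.9):
        fit = smf.quantreg("y ~ x", df).fit(q=q)
        print(f"tau={q}: intercept={fit.params['Intercept']:.2f}, "
              f"slope={fit.params['x']:.2f}")
    ```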

  18. Relative performances of artificial neural network and regression mapping tools in evaluation of spinal loads and muscle forces during static lifting.

    Science.gov (United States)

    Arjmand, N; Ekrami, O; Shirazi-Adl, A; Plamondon, A; Parnianpour, M

    2013-05-31

    Two artificial neural networks (ANNs) are constructed, trained, and tested to map inputs of a complex trunk finite element (FE) model to its outputs for spinal loads and muscle forces. Five input variables (thorax flexion angle, load magnitude, its anterior and lateral positions, and load handling technique, i.e., one- or two-handed static lifting) and four model outputs (L4-L5 and L5-S1 disc compression and anterior-posterior shear forces) for spinal loads and 76 model outputs (forces in individual trunk muscles) are considered. Moreover, full quadratic regression equations mapping the input-output relationships of the model, developed here for muscle forces and previously for spine loads, are used to compare the relative accuracy of these two mapping tools (ANN and regression equations). Results indicate that the ANNs are more accurate in mapping the input-output relationships of the FE model (RMSE = 20.7 N for spinal loads and RMSE = 4.7 N for muscle forces) as compared to the regression equations (RMSE = 120.4 N for spinal loads and RMSE = 43.2 N for muscle forces). Quadratic regression equations map only up to second-order variations of outputs with inputs, while ANNs capture higher-order variations too. Despite satisfactory achievement in estimating overall muscle forces by the ANN, some inadequacies are noted, including assigning force to antagonistic muscles that show no activity in the optimization algorithm of the FE model, or predicting slightly different forces in bilateral muscle pairs in symmetric lifting activities. Using these user-friendly tools, spinal loads and trunk muscle forces during symmetric and asymmetric static lifts can be easily estimated. Copyright © 2013 Elsevier Ltd. All rights reserved.
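
    A small sketch contrasting the two mapping tools, a full quadratic regression versus an ANN, on a synthetic stand-in for the FE model's input-output pairs (the response is deliberately of higher than second order):

    ```python
    # Quadratic polynomial regression vs a small neural network as
    # input-output surrogates for an expensive model.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(7)
    X = rng.uniform(size=(400, 5))                 # five lifting-task inputs
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1]**3   # beyond-quadratic response

    quad = make_pipeline(PolynomialFeatures(degree=2),
                         LinearRegression()).fit(X, y)
    ann = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=5000,
                       random_state=0).fit(X, y)

    for name, model in [("quadratic regression", quad), ("ANN", ann)]:
        rmse = mean_squared_error(y, model.predict(X)) ** 0.5
        print(f"{name}: RMSE = {rmse:.3f}")
    ```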

  19. Using Multiple and Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and Schedule Overrun for Program Managers

    Science.gov (United States)

    2017-03-23

    Logistic Regression to Estimate the Median Will-Cost and Probability of Cost and Schedule Overrun for Program Managers / Ryan C. Trudelle, B.S. ... not the other. We are able to give logistic regression models to program managers that identify several program characteristics for either ... considered acceptable. We recommend the use of our logistic models as a tool to manage a portfolio of programs in order to gain potential elusive

  20. On the use of a regression model for trend estimates from ground-based atmospheric observations in the Southern hemisphere

    CSIR Research Space (South Africa)

    Bencherif, H

    2010-09-01

    Full Text Available The present paper reports on the use of a multi-regression model, adapted at Reunion University, for temperature and ozone trend estimates. Depending on the location of the observing site, the studied geophysical signal is broken down into a sum...

  1. Refinement of regression models to estimate real-time concentrations of contaminants in the Menomonee River drainage basin, southeast Wisconsin, 2008-11

    Science.gov (United States)

    Baldwin, Austin K.; Robertson, Dale M.; Saad, David A.; Magruder, Christopher

    2013-01-01

    In 2008, the U.S. Geological Survey and the Milwaukee Metropolitan Sewerage District initiated a study to develop regression models to estimate real-time concentrations and loads of chloride, suspended solids, phosphorus, and bacteria in streams near Milwaukee, Wisconsin. To collect monitoring data for calibration of models, water-quality sensors and automated samplers were installed at six sites in the Menomonee River drainage basin. The sensors continuously measured four potential explanatory variables: water temperature, specific conductance, dissolved oxygen, and turbidity. Discrete water-quality samples were collected and analyzed for five response variables: chloride, total suspended solids, total phosphorus, Escherichia coli bacteria, and fecal coliform bacteria. Using the first year of data, regression models were developed to continuously estimate the response variables on the basis of the continuously measured explanatory variables. Those models were published in a previous report. In this report, those models are refined using 2 years of additional data, and the relative improvement in model predictability is discussed. In addition, a set of regression models is presented for a new site in the Menomonee River Basin, Underwood Creek at Wauwatosa. The refined models use the same explanatory variables as the original models. The chloride models all used specific conductance as the explanatory variable, except for the model for the Little Menomonee River near Freistadt, which used both specific conductance and turbidity. Total suspended solids and total phosphorus models used turbidity as the only explanatory variable, and bacteria models used water temperature and turbidity as explanatory variables. An analysis of covariance (ANCOVA), used to compare the coefficients in the original models to those in the refined models calibrated using all of the data, showed that only 3 of the 25 original models changed significantly. Root-mean-squared errors (RMSEs
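
    The surrogate-regression idea behind these models can be illustrated in a few lines: regress log concentration on a log-transformed continuous sensor variable, then estimate in real time from new sensor readings. All values below are synthetic placeholders, not the report's calibrated models:

    ```python
    # Log-log surrogate regression of total suspended solids on turbidity.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    turbidity = rng.lognormal(2.0, 0.8, 120)                  # NTU, sensor
    tss = 2.1 * turbidity**0.9 * rng.lognormal(0, 0.2, 120)   # mg/L, sampled

    X = sm.add_constant(np.log(turbidity))
    fit = sm.OLS(np.log(tss), X).fit()
    print(fit.params, fit.rsquared)

    # Real-time estimate from a new sensor reading of 50 NTU
    # (retransformation bias correction is ignored in this sketch).
    print(np.exp(fit.predict([[1.0, np.log(50.0)]])))
    ```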

  2. Comparison of Regression Analysis and Transfer Function in Estimating the Parameters of Central Pulse Waves from Brachial Pulse Wave.

    Science.gov (United States)

    Chai, Rui; Xu, Li-Sheng; Yao, Yang; Hao, Li-Ling; Qi, Lin

    2017-01-01

    This study analyzed the ascending branch slope (A_slope), dicrotic notch height (Hn), diastolic area (Ad), systolic area (As), diastolic blood pressure (DBP), systolic blood pressure (SBP), pulse pressure (PP), subendocardial viability ratio (SEVR), waveform parameter (k), stroke volume (SV), cardiac output (CO), and peripheral resistance (RS) of central pulse waves measured both invasively and non-invasively. Invasively measured parameters were compared with parameters derived from brachial pulse waves by a regression model and by a transfer function model, and the accuracies of the parameters estimated by the two models were compared as well. Findings showed that the k value and the invasively measured central and brachial pulse wave parameters correlated positively. Regression model parameters, including A_slope, DBP, and SEVR, and transfer function model parameters showed good consistency with the invasively measured parameters, with a similar degree of consistency for the two models. SBP, PP, SV, and CO could be calculated through the regression model, but their accuracies were worse than those of the transfer function model.

  3. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    Science.gov (United States)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study using Minnesota, USA, during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots, using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than those of the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
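
    A minimal sketch of per-pixel harmonic regression: ordinary least squares on first-order Fourier terms of day-of-year, yielding the coefficients used as predictor variables. The reflectance series is synthetic:

    ```python
    # First-harmonic regression of a per-pixel time series on day-of-year.
    import numpy as np

    rng = np.random.default_rng(9)
    doy = np.sort(rng.integers(1, 366, 60))       # acquisition days of year
    t = 2 * np.pi * doy / 365.25
    signal = 0.3 + 0.2 * np.cos(t - 1.0) + rng.normal(0, 0.02, doy.size)

    # Design matrix: mean, cosine and sine terms (first harmonic).
    A = np.column_stack([np.ones_like(t), np.cos(t), np.sin(t)])
    coef, *_ = np.linalg.lstsq(A, signal, rcond=None)

    amplitude = np.hypot(coef[1], coef[2])
    phase = np.arctan2(coef[2], coef[1])
    print(coef, amplitude, phase)  # Fourier coefficients used as predictors
    ```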

  4. Role of regression model selection and station distribution on the estimation of oceanic anthropogenic carbon change by eMLR

    Directory of Open Access Journals (Sweden)

    Y. Plancherel

    2013-07-01

    Full Text Available Quantifying oceanic anthropogenic carbon uptake by monitoring interior dissolved inorganic carbon (DIC) concentrations is complicated by the influence of natural variability. The "eMLR method" aims to address this issue by using empirical regression fits of the data instead of the data themselves, inferring the change in anthropogenic carbon over time from the difference between predictions generated by the regressions at each time. The advantages of the method are that it provides, in principle, a means to filter out natural variability, which theoretically becomes the regression residuals, and a way to deal with sparsely and unevenly distributed data. The degree to which these advantages are realized in practice is unclear, however. The ability of the eMLR method to recover the anthropogenic carbon signal is tested here using a global circulation and biogeochemistry model in which the true signal is known. Results show that regression model selection is particularly important when the observational network changes in time. When the observational network is fixed, the likelihood that co-located systematic misfits between the empirical model and the underlying, yet unknown, true model cancel is greater, improving eMLR results. Changing the observational network modifies how the spatio-temporal variance pattern is captured by the respective datasets, resulting in empirical models that are dynamically or regionally inconsistent, leading to systematic errors. In consequence, the use of regression formulae that change in time to represent systematically best-fit models at all times does not guarantee the best estimates of anthropogenic carbon change if the spatial distributions of the stations emphasize hydrographic features differently in time. Other factors, such as a balanced and representative station coverage, vertical continuity of the regression formulae consistent with the hydrographic context and resiliency of the spatial distribution of the residual
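
    Schematically, eMLR fits a regression at each time and differences the predictions rather than the data. A toy version, with synthetic stand-ins for hydrographic predictors and DIC:

    ```python
    # eMLR schematic: the anthropogenic change is the difference of the two
    # regression predictions evaluated on a common grid.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(10)
    predictors_t1 = rng.normal(size=(200, 3))   # e.g. T, S, AOU at time 1
    predictors_t2 = rng.normal(size=(180, 3))   # different stations, time 2
    dic_t1 = predictors_t1 @ [8.0, -3.0, 1.5] + 2000 + rng.normal(0, 2, 200)
    dic_t2 = predictors_t2 @ [8.0, -3.0, 1.5] + 2015 + rng.normal(0, 2, 180)

    fit1 = LinearRegression().fit(predictors_t1, dic_t1)
    fit2 = LinearRegression().fit(predictors_t2, dic_t2)

    grid = rng.normal(size=(500, 3))            # common evaluation grid
    delta_canth = fit2.predict(grid) - fit1.predict(grid)
    print(delta_canth.mean())                   # ~15, the imposed increase
    ```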

  5. Influence diagnostics in meta-regression model.

    Science.gov (United States)

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies influence diagnostics in the meta-regression model, including case deletion diagnostics and local influence analysis. We derive the subset deletion formulae for the estimation of the regression coefficients and the heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are both considered in deriving the results. Internal and external residual and leverage measures are defined. Local influence analysis based on the case-weights perturbation scheme, the responses perturbation scheme, the covariate perturbation scheme, and the within-variance perturbation scheme is explored. We introduce a method of simultaneously perturbing responses, covariates, and within-variance to obtain a local influence measure, which has the advantage of being able to compare the influence magnitudes of influential studies across different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.

  6. Comparison of several measure-correlate-predict models using support vector regression techniques to estimate wind power densities. A case study

    International Nuclear Information System (INIS)

    Díaz, Santiago; Carta, José A.; Matías, José M.

    2017-01-01

    Highlights: • Eight measure-correlate-predict (MCP) models used to estimate the wind power densities (WPDs) at a target site are compared. • Support vector regressions are used as the main prediction techniques in the proposed MCPs. • The most precise MCP uses two sub-models which predict wind speed and air density in an unlinked manner. • The most precise model allows to construct a bivariable (wind speed and air density) WPD probability density function. • MCP models trained to minimise wind speed prediction error do not minimise WPD prediction error. - Abstract: The long-term annual mean wind power density (WPD) is an important indicator of wind as a power source which is usually included in regional wind resource maps as useful prior information to identify potentially attractive sites for the installation of wind projects. In this paper, a comparison is made of eight proposed Measure-Correlate-Predict (MCP) models to estimate the WPDs at a target site. Seven of these models use the Support Vector Regression (SVR) and the eighth the Multiple Linear Regression (MLR) technique, which serves as a basis to compare the performance of the other models. In addition, a wrapper technique with 10-fold cross-validation has been used to select the optimal set of input features for the SVR and MLR models. Some of the eight models were trained to directly estimate the mean hourly WPDs at a target site. Others, however, were firstly trained to estimate the parameters on which the WPD depends (i.e. wind speed and air density) and then, using these parameters, the target site mean hourly WPDs. The explanatory features considered are different combinations of the mean hourly wind speeds, wind directions and air densities recorded in 2014 at ten weather stations in the Canary Archipelago (Spain). The conclusions that can be drawn from the study undertaken include the argument that the most accurate method for the long-term estimation of WPDs requires the execution of a
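
    A condensed sketch of one SVR-based MCP step, assuming a short concurrent record between a reference station and the target site; data, hyperparameters and the constant air density are illustrative assumptions:

    ```python
    # Measure-correlate-predict with SVR: learn target-site wind speed from
    # the reference station over the overlap, then hindcast the long record.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(11)
    ref_speed = rng.weibull(2.0, 2000) * 8.0            # reference station
    target_speed = 0.9 * ref_speed + rng.normal(0, 0.8, 2000)

    concurrent = slice(0, 500)                          # short overlap period
    mcp = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
    mcp.fit(ref_speed[concurrent, None], target_speed[concurrent])

    long_term = mcp.predict(ref_speed[:, None])         # hindcast target site
    rho = 1.2                                           # air density (kg/m3)
    wpd = 0.5 * rho * np.mean(long_term**3)             # wind power density
    print(wpd)
    ```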

  7. Stochastic development regression using method of moments

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold-valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite-dimensional landmark manifolds.

  8. Uncertainty relations for approximation and estimation

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jaeha, E-mail: jlee@post.kek.jp [Department of Physics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033 (Japan); Tsutsui, Izumi, E-mail: izumi.tsutsui@kek.jp [Department of Physics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033 (Japan); Theory Center, Institute of Particle and Nuclear Studies, High Energy Accelerator Research Organization (KEK), 1-1 Oho, Tsukuba, Ibaraki 305-0801 (Japan)

    2016-05-27

    We present a versatile inequality of uncertainty relations which are useful when one approximates an observable and/or estimates a physical parameter based on the measurement of another observable. It is shown that the optimal choice for proxy functions used for the approximation is given by Aharonov's weak value, which also determines the classical Fisher information in parameter estimation, turning our inequality into the genuine Cramér–Rao inequality. Since the standard form of the uncertainty relation arises as a special case of our inequality, and since the parameter estimation is available as well, our inequality can treat both the position–momentum and the time–energy relations in one framework albeit handled differently. - Highlights: • Several inequalities interpreted as uncertainty relations for approximation/estimation are derived from a single ‘versatile inequality’. • The ‘versatile inequality’ sets a limit on the approximation of an observable and/or the estimation of a parameter by another observable. • The ‘versatile inequality’ turns into an elaboration of the Robertson–Kennard (Schrödinger) inequality and the Cramér–Rao inequality. • Both the position–momentum and the time–energy relation are treated in one framework. • In every case, Aharonov's weak value arises as a key geometrical ingredient, deciding the optimal choice for the proxy functions.

  9. Uncertainty relations for approximation and estimation

    International Nuclear Information System (INIS)

    Lee, Jaeha; Tsutsui, Izumi

    2016-01-01

    We present a versatile inequality of uncertainty relations which are useful when one approximates an observable and/or estimates a physical parameter based on the measurement of another observable. It is shown that the optimal choice for proxy functions used for the approximation is given by Aharonov's weak value, which also determines the classical Fisher information in parameter estimation, turning our inequality into the genuine Cramér–Rao inequality. Since the standard form of the uncertainty relation arises as a special case of our inequality, and since the parameter estimation is available as well, our inequality can treat both the position–momentum and the time–energy relations in one framework albeit handled differently. - Highlights: • Several inequalities interpreted as uncertainty relations for approximation/estimation are derived from a single ‘versatile inequality’. • The ‘versatile inequality’ sets a limit on the approximation of an observable and/or the estimation of a parameter by another observable. • The ‘versatile inequality’ turns into an elaboration of the Robertson–Kennard (Schrödinger) inequality and the Cramér–Rao inequality. • Both the position–momentum and the time–energy relation are treated in one framework. • In every case, Aharonov's weak value arises as a key geometrical ingredient, deciding the optimal choice for the proxy functions.

  10. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks, a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory-optimized C++ functions with an R interface for predicting the covariate-specific absolute risks, their confidence intervals, and their confidence bands based on right-censored time-to-event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by-product ... functionals. The software presented here is implemented in the riskRegression package.

  11. Study Heterogeneity and Estimation of Prevalence of Primary Aldosteronism: A Systematic Review and Meta-Regression Analysis.

    Science.gov (United States)

    Käyser, Sabine C; Dekkers, Tanja; Groenewoud, Hans J; van der Wilt, Gert Jan; Carel Bakx, J; van der Wel, Mark C; Hermus, Ad R; Lenders, Jacques W; Deinum, Jaap

    2016-07-01

    For health care planning and allocation of resources, realistic estimation of the prevalence of primary aldosteronism is necessary. Reported prevalences of primary aldosteronism are highly variable, possibly due to study heterogeneity. Our objective was to identify and explain heterogeneity in studies that aimed to establish the prevalence of primary aldosteronism in hypertensive patients. PubMed, EMBASE, Web of Science, Cochrane Library, and reference lists from January 1, 1990, to January 31, 2015, were used as data sources. Studies describing an adult hypertensive patient population with a confirmed diagnosis of primary aldosteronism were included. Data were extracted in duplicate, with quality assessment. Thirty-nine studies provided data on 42,510 patients (nine studies, 5,896 patients from primary care). Prevalence estimates varied from 3.2% to 12.7% in primary care and from 1% to 29.8% in referral centers. Heterogeneity was too high to establish point estimates (I² = 57.6% in primary care; 97.1% in referral centers). Meta-regression analysis showed higher prevalences in studies 1) published after 2000, 2) from Australia, 3) aimed at assessing the prevalence of secondary hypertension, 4) that were retrospective, 5) that selected consecutive patients, and 6) not using a screening test. All studies had minor or major flaws. This study demonstrates that it is pointless to claim a low or high prevalence of primary aldosteronism based on published reports. Because of the significant impact of a diagnosis of primary aldosteronism on health care resources and the necessary facilities, our findings urge a prevalence study whose design takes into account the factors identified in the meta-regression analysis.

  12. Steganalysis using logistic regression

    Science.gov (United States)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study comparing the accuracy and speed of SVM and LR classifiers in the detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, on three image sets.
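
    The probability advantage noted above is easy to see in code; a tiny sketch with random stand-ins for SPAM-style features:

    ```python
    # Logistic regression yields class probabilities alongside labels,
    # unlike a default SVM decision.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(17)
    X = rng.normal(size=(400, 20))                       # feature stand-ins
    y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 400) > 0).astype(int)

    lr = LogisticRegression(max_iter=1000).fit(X, y)
    svm = SVC().fit(X, y)       # no probabilities unless probability=True

    print(lr.predict(X[:3]), lr.predict_proba(X[:3])[:, 1])  # labels + P(stego)
    print(svm.predict(X[:3]))                                 # labels only
    ```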

  13. Ridge Regression Signal Processing

    Science.gov (United States)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
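
    The ridge estimator underlying the recursive version discussed above is the closed form beta = (X'X + kI)^(-1) X'y; a minimal sketch on a deliberately near-collinear (poor-geometry) design:

    ```python
    # Closed-form ridge regression, stabilizing a nearly singular X'X.
    import numpy as np

    rng = np.random.default_rng(12)
    X = rng.normal(size=(100, 3))
    X[:, 2] = X[:, 0] + 1e-3 * rng.normal(size=100)  # near-collinear geometry
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

    def ridge(X, y, k):
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

    print(ridge(X, y, 0.0))   # ordinary least squares: unstable coefficients
    print(ridge(X, y, 1.0))   # ridge: shrunken, stable coefficients
    ```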

  14. Regression relation for pure quantum states and its implications for efficient computing.

    Science.gov (United States)

    Elsayed, Tarek A; Fine, Boris V

    2013-02-15

    We obtain a modified version of the Onsager regression relation for the expectation values of quantum-mechanical operators in pure quantum states of isolated many-body quantum systems. We use the insights gained from this relation to show that high-temperature time correlation functions in many-body quantum systems can be controllably computed without complete diagonalization of the Hamiltonians, using instead the direct integration of the Schrödinger equation for randomly sampled pure states. This method is also applicable to quantum quenches and other situations describable by time-dependent many-body Hamiltonians. The method implies exponential reduction of the computer memory requirement in comparison with the complete diagonalization. We illustrate the method by numerically computing infinite-temperature correlation functions for translationally invariant Heisenberg chains of up to 29 spins 1/2. Thereby, we also test the spin diffusion hypothesis and find it in a satisfactory agreement with the numerical results. Both the derivation of the modified regression relation and the justification of the computational method are based on the notion of quantum typicality.

  15. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    Science.gov (United States)

    Southard, Rodney E.

    2013-01-01

    The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches, and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute selected statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought, defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring states. For streamgages with more than 10 years of record, Kendall's tau was computed to evaluate trends in the streamflow data. If trends were detected, the variable-length method was used to define the period of no trend: water years were removed from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring states that have multiple streamgages on the same streams. Statistical

  16. Precision Interval Estimation of the Response Surface by Means of an Integrated Algorithm of Neural Network and Linear Regression

    Science.gov (United States)

    Lo, Ching F.

    1999-01-01

    The integration of Radial Basis Function Networks and Back Propagation Neural Networks with Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modern Design of Experiments. The integrated method is capable of estimating precision intervals, including confidence and prediction intervals. The power of the innovative method has been demonstrated by applying it to a set of wind tunnel test data in the construction of a response surface and the estimation of precision intervals.

  17. Penalized estimation for competing risks regression with applications to high-dimensional covariates

    DEFF Research Database (Denmark)

    Ambrogi, Federico; Scheike, Thomas H.

    2016-01-01

    While penalized regression is well developed for standard survival data (... Research 19: (1), 29-51), the research regarding competing risks is less developed (Binder and others, 2009. Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics 25: (7), 890-896). The aim of this work is to consider how to do penalized regression in the presence of competing events. The direct binomial regression model of Scheike and others (2008. Predicting cumulative incidence probability by direct binomial regression. Biometrika 95: (1), 205-220) is reformulated in a penalized framework to possibly fit a sparse regression model. The developed approach is easily implementable using existing high-performance software to do penalized regression. Results from simulation studies are presented together with an application to genomic data when the endpoint is progression-free survival. An R function is provided to perform regularized competing risks regression according...

  18. Random regression models to estimate genetic parameters for milk production of Guzerat cows using orthogonal Legendre polynomials

    Directory of Open Access Journals (Sweden)

    Maria Gabriela Campolina Diniz Peixoto

    2014-05-01

    Full Text Available The objective of this work was to compare random regression models for the estimation of genetic parameters for Guzerat milk production, using orthogonal Legendre polynomials. Records (20,524) of test-day milk yield (TDMY) from 2,816 first-lactation Guzerat cows were used. TDMY grouped into 10 monthly classes were analyzed for the additive genetic effect and for the permanent environmental and residual effects (random effects), whereas the contemporary group, calving age (linear and quadratic effects), and the mean lactation curve were analyzed as fixed effects. Trajectories for the additive genetic and permanent environmental effects were modeled by means of a covariance function employing orthogonal Legendre polynomials ranging from the second to the fifth order. Residual variances were considered in one, four, six, or ten variance classes. The best model had six residual variance classes. The heritability estimates for the TDMY records varied from 0.19 to 0.32. The random regression model that used a second-order Legendre polynomial for the additive genetic effect and a fifth-order polynomial for the permanent environmental effect was adequate according to the main criteria employed. The model with a second-order Legendre polynomial for the additive genetic effect and a fourth-order polynomial for the permanent environmental effect could also be employed in these analyses.
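
    A sketch of the Legendre covariate structure used in such random regression models: test days are rescaled to [-1, 1], expanded in Legendre polynomials, and the polynomial coefficient (co)variances induce the covariance function. The orders and the coefficient matrix below are placeholders:

    ```python
    # Legendre basis for a random regression (co)variance function.
    import numpy as np
    from numpy.polynomial import legendre

    dim = np.array([5, 35, 65, 95, 125, 155, 185, 215, 245, 275])  # test days
    x = 2 * (dim - dim.min()) / (dim.max() - dim.min()) - 1        # to [-1, 1]

    def legendre_basis(x, order):
        # Column j holds the Legendre polynomial of degree j evaluated at x.
        return np.column_stack([legendre.legval(x, np.eye(order + 1)[j])
                                for j in range(order + 1)])

    Phi = legendre_basis(x, 2)   # second-order basis for the additive effect
    # Covariance function: Phi @ K @ Phi.T, where K is the estimated
    # coefficient (co)variance matrix (a placeholder diagonal here).
    K = np.diag([1.0, 0.3, 0.1])
    G = Phi @ K @ Phi.T
    print(G.shape)               # 10 x 10 genetic covariances among test days
    ```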

  19. Evaluation of Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR) for Water Quality Monitoring: A Case Study for the Estimation of Salinity

    Science.gov (United States)

    Nazeer, Majid; Bilal, Muhammad

    2018-04-01

    A Landsat-5 Thematic Mapper (TM) dataset has been used to estimate salinity in the coastal area of Hong Kong. Four adjacent Landsat TM images were used in this study and were atmospherically corrected using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) radiative transfer code. The atmospherically corrected images were further used to develop models for salinity using Ordinary Least Squares (OLS) regression and Geographically Weighted Regression (GWR) based on in situ data of October 2009. Results show that the coefficient of determination (R²) of 0.42 between the OLS-estimated and in situ measured salinity is much lower than that of the GWR model, which is two times higher (R² = 0.86). This indicates that the GWR model has more ability than the OLS regression model to predict salinity and better captures its spatial heterogeneity. It was observed that salinity was high in Deep Bay (north-western part of Hong Kong), which might be due to industrial waste disposal, whereas salinity was estimated to be constant (32 practical salinity units) towards the open sea.

  20. State-Level Estimates of Cancer-Related Absenteeism Costs

    Science.gov (United States)

    Tangka, Florence K.; Trogdon, Justin G.; Nwaise, Isaac; Ekwueme, Donatus U.; Guy, Gery P.; Orenstein, Diane

    2016-01-01

    Background Cancer is one of the top five most costly diseases in the United States and leads to substantial work loss. Nevertheless, limited state-level estimates of cancer absenteeism costs have been published. Methods In analyses of data from the 2004–2008 Medical Expenditure Panel Survey, the 2004 National Nursing Home Survey, the U.S. Census Bureau for 2008, and the 2009 Current Population Survey, we used regression modeling to estimate annual state-level absenteeism costs attributable to cancer from 2004 to 2008. Results We estimated that the state-level median number of days of absenteeism per year among employed cancer patients was 6.1 days and that annual state-level cancer absenteeism costs ranged from $14.9 million to $915.9 million (median = $115.9 million) across states in 2010 dollars. Absenteeism costs are approximately 6.5% of the costs of premature cancer mortality. Conclusions The results from this study suggest that lost productivity attributable to cancer is a substantial cost to employees and employers and contributes to estimates of the overall impact of cancer in a state population. PMID:23969498

  1. Fragility estimation for seismically isolated nuclear structures by high confidence low probability of failure values and bi-linear regression

    International Nuclear Information System (INIS)

    Carausu, A.

    1996-01-01

    A method for the fragility estimation of seismically isolated nuclear power plant structures is proposed. The relationship between the ground motion intensity parameter (e.g. peak ground velocity or peak ground acceleration) and the response of isolated structures is expressed in terms of a bi-linear regression line, whose coefficients are estimated by the least-squares method from available data on seismic input and structural response. The notion of the high confidence low probability of failure (HCLPF) value is also used for deriving compound fragility curves for coupled subsystems. (orig.)
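
    A hedged sketch of a bi-linear least-squares fit between an intensity parameter and response, with the breakpoint chosen by scanning candidate knots; the data are synthetic and the knot scan is one simple way to estimate such a regression, not necessarily the author's exact procedure:

    ```python
    # Two-segment (bi-linear) least-squares fit with a scanned breakpoint.
    import numpy as np

    rng = np.random.default_rng(18)
    pga = np.sort(rng.uniform(0.05, 1.0, 60))          # intensity parameter
    resp = (np.where(pga < 0.4, 0.5 * pga, 0.2 + 2.0 * (pga - 0.4))
            + rng.normal(0, 0.02, 60))

    def bilinear_sse(knot):
        # Continuous two-segment line: basis {1, x, max(x - knot, 0)}.
        A = np.column_stack([np.ones_like(pga), pga,
                             np.maximum(pga - knot, 0)])
        coef, *_ = np.linalg.lstsq(A, resp, rcond=None)
        sse = np.sum((A @ coef - resp) ** 2)
        return sse, coef

    knots = np.linspace(0.1, 0.9, 81)
    best = min(knots, key=lambda k: bilinear_sse(k)[0])
    print(best, bilinear_sse(best)[1])  # breakpoint and segment coefficients
    ```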

  2. SEPARATION PHENOMENA LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Ikaro Daniel de Carvalho Barreto

    2014-03-01

    Full Text Available This paper applies concepts from maximum likelihood estimation of the binomial logistic regression model to separation phenomena. Separation generates bias in the estimation, leads to different interpretations of the estimates under the different statistical tests (Wald, Likelihood Ratio and Score), and produces different estimates under the different iterative methods (Newton-Raphson and Fisher Score). We also present an example that demonstrates the direct implications for the validation of the model and of its variables, and for the estimates of odds ratios and confidence intervals generated from the Wald statistic. Furthermore, we briefly present the Firth correction as a way to circumvent the separation phenomena.

  3. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression function. The first paper compares parametric and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences within a nonparametric panel data regression framework. The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric...

  4. Prediction of the distillation temperatures of crude oils using ¹H NMR and support vector regression with estimated confidence intervals.

    Science.gov (United States)

    Filgueiras, Paulo R; Terra, Luciana A; Castro, Eustáquio V R; Oliveira, Lize M S L; Dias, Júlio C M; Poppi, Ronei J

    2015-09-01

    This paper aims to estimate the temperatures equivalent to 10% (T10%), 50% (T50%) and 90% (T90%) of distilled volume in crude oils using ¹H NMR and support vector regression (SVR). Confidence intervals for the predicted values were calculated using a boosting-type ensemble method in a procedure called ensemble support vector regression (eSVR). The estimated confidence intervals obtained by eSVR were compared with previously accepted calculations from partial least squares (PLS) models and a boosting-type ensemble applied in the PLS method (ePLS). By using the proposed boosting strategy, it was possible to identify outliers in the T10% property dataset. The eSVR procedure improved the accuracy of the distillation temperature predictions in relation to standard PLS, ePLS and SVR. For T10%, a root mean square error of prediction (RMSEP) of 11.6°C was obtained, in comparison with 15.6°C for PLS, 15.1°C for ePLS and 28.4°C for SVR. The RMSEPs for T50% were 24.2°C, 23.4°C, 22.8°C and 14.4°C for PLS, ePLS, SVR and eSVR, respectively. For T90%, the values of RMSEP were 39.0°C, 39.9°C and 39.9°C for PLS, ePLS and SVR, respectively. The confidence intervals calculated by the proposed boosting methodology presented acceptable values for the three properties analyzed; however, they were lower than those calculated by the standard methodology for PLS. Copyright © 2015 Elsevier B.V. All rights reserved.
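
    The ensemble idea can be sketched as bootstrap-resampled SVR members whose spread yields interval estimates; this mirrors the spirit, not the exact recipe, of the eSVR procedure, and all data are synthetic:

    ```python
    # Bootstrap ensemble of SVR models for interval estimation.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(13)
    X = rng.normal(size=(80, 6))            # stand-in for 1H NMR features
    y = X @ rng.normal(size=6) * 20 + 250   # stand-in for T50% (deg C)

    preds = []
    for _ in range(100):                    # 100 bootstrap members
        idx = rng.integers(0, len(y), len(y))
        member = make_pipeline(StandardScaler(), SVR(C=100.0, epsilon=1.0))
        member.fit(X[idx], y[idx])
        preds.append(member.predict(X))
    preds = np.array(preds)

    lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)
    print(lower[:3], upper[:3])             # 95% ensemble intervals per sample
    ```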

  5. Estimation of residual stress in welding of dissimilar metals at nuclear power plants using cascaded support vector regression

    Energy Technology Data Exchange (ETDEWEB)

    Koo, Young Do; Yoo, Kwae Hwan; Na, Man Gyun [Dept. of Nuclear Engineering, Chosun University, Gwangju (Korea, Republic of)

    2017-06-15

    Residual stress is a critical element in determining the integrity of parts and the lifetime of welded structures. It is necessary to estimate the residual stress of a welding zone because residual stress is a major cause of primary water stress corrosion cracking in nuclear power plants. That is, it is necessary to estimate the distribution of the residual stress in welding of dissimilar metals under a variety of welding conditions. In this study, a cascaded support vector regression (CSVR) model was presented to estimate the residual stress of a welding zone. The CSVR model was structured serially and consecutively in terms of SVR modules. Using numerical data obtained from finite element analysis by a subtractive clustering method, learning data that explained the characteristic behavior of the residual stress of a welding zone were selected to optimize the proposed model. The results suggest that the CSVR model yielded a better estimation performance when compared with a classic SVR model.
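
    A minimal sketch of a cascaded (serial) SVR, in which a second module receives the first module's output as an additional input; the welding-condition inputs and the stress response are synthetic placeholders:

    ```python
    # Two SVR modules in series: stage 2 refines stage 1's estimate.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(14)
    X = rng.uniform(size=(200, 3))      # e.g. heat input, thickness, position
    stress = 300 * np.sin(3 * X[:, 0]) + 100 * X[:, 1] + rng.normal(0, 10, 200)

    svr1 = SVR(C=100.0).fit(X, stress)  # first SVR module
    stage1 = svr1.predict(X)

    X2 = np.column_stack([X, stage1])   # cascade: augment with stage-1 output
    svr2 = SVR(C=100.0).fit(X2, stress) # second SVR module refines the fit

    final = svr2.predict(X2)
    print(np.sqrt(np.mean((final - stress) ** 2)))  # in-sample RMSE of cascade
    ```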

  6. Estimated prevalence of halitosis: a systematic review and meta-regression analysis.

    Science.gov (United States)

    Silva, Manuela F; Leite, Fábio R M; Ferreira, Larissa B; Pola, Natália M; Scannapieco, Frank A; Demarco, Flávio F; Nascimento, Gustavo G

    2018-01-01

    This study aims to conduct a systematic review to determine the prevalence of halitosis in adolescents and adults. Electronic searches were performed using four different databases without restrictions: PubMed, Scopus, Web of Science, and SciELO. Population-based observational studies that provided data about the prevalence of halitosis in adolescents and adults were included. Additionally, meta-analyses, meta-regression, and sensitivity analyses were conducted to synthesize the evidence. A total of 584 articles were initially found and considered for title and abstract evaluation. Thirteen articles met inclusion criteria. The combined prevalence of halitosis was found to be 31.8% (95% CI 24.6-39.0%). Methodological aspects such as the year of publication and the socioeconomic status of the country where the study was conducted seemed to influence the prevalence of halitosis. Our results demonstrated that the estimated prevalence of halitosis was 31.8%, with high heterogeneity between studies. The results suggest a worldwide trend towards a rise in halitosis prevalence. Given the high prevalence of halitosis and its complex etiology, dental professionals should be aware of their roles in halitosis prevention and treatment.

  7. Testing the transferability of regression equations derived from small sub-catchments to a large area in central Sweden

    Directory of Open Access Journals (Sweden)

    C. Xu

    2003-01-01

    Full Text Available There is an ever-increasing need to apply hydrological models to catchments where streamflow data are unavailable or to large geographical regions where calibration is not feasible. Estimation of model parameters from spatial physical data is the key issue in the development and application of hydrological models at various scales. To investigate the suitability of transferring regression equations relating model parameters to physical characteristics, developed from small sub-catchments, to a large region for estimating model parameters, a conceptual snow and water balance model was optimised on all the sub-catchments in the region. A multiple regression analysis related model parameters to physical data for the catchments, and the regression equations derived from the small sub-catchments were used to calculate regional parameter values for the large basin using spatially aggregated physical data. For the model tested, the results support the suitability of transferring the regression equations to the larger region. Keywords: water balance modelling, large scale, multiple regression, regionalisation

  8. Relative Importance for Linear Regression in R: The Package relaimpo

    Directory of Open Access Journals (Sweden)

    Ulrike Gromping

    2006-09-01

    Full Text Available Relative importance is a topic that has seen a lot of interest in recent years, particularly in applied work. The R package relaimpo implements six different metrics for assessing the relative importance of regressors in the linear model, two of which are recommended: averaging over orderings of regressors, and a newly proposed metric (Feldman 2005) called pmvd. Apart from delivering the metrics themselves, relaimpo also provides (exploratory) bootstrap confidence intervals. This paper offers a brief tutorial introduction to the package. The methods and relaimpo's functionality are illustrated using the data set swiss that is generally available in R. The paper targets readers who have a basic understanding of multiple linear regression. For the background of more advanced aspects, references are provided.

  9. Estimation of snowpack matching ground-truth data and MODIS satellite-based observations by using regression kriging

    Science.gov (United States)

    Juan Collados-Lara, Antonio; Pardo-Iguzquiza, Eulogio; Pulido-Velazquez, David

    2016-04-01

    The estimation of Snow Water Equivalent (SWE) is essential for an appropriate assessment of the available water resources in Alpine catchments. The hydrologic regime in these areas is dominated by the storage of water in the snowpack, which is discharged to rivers throughout the melt season. An accurate estimation of the resources will be necessary for an appropriate analysis of the system operation alternatives using basin-scale management models. In order to obtain an appropriate estimation of the SWE, we need to know the spatial distribution of the snowpack and the snow density within the Snow Cover Area (SCA). Data for these snow variables can be extracted from in-situ point measurements and air-borne/space-borne remote sensing observations. Different interpolation and simulation techniques have been employed for the estimation of the cited variables. In this paper we propose to estimate the snowpack from a reduced number of ground-truth data (1 or 2 campaigns per year, with 23 observation points, from 2000-2014) and MODIS satellite-based observations in the Sierra Nevada Mountains (Southern Spain). Regression-based methodologies have been used to study the snowpack distribution using different kinds of explanatory variables: geographic, topographic, and climatic. Forty explanatory variables were considered: the longitude, latitude, altitude, slope, eastness, northness, radiation, maximum upwind slope, and some mathematical transformations of each of them [ln(v), v⁻¹, v², v^0.5]. Eight different structures of regression models were tested (combining 1, 2, 3 or 4 explanatory variables): (1) Y = B0 + B1·Xi; (2) Y = B0 + B1·Xi·Xj; (3) Y = B0 + B1·Xi + B2·Xj; (4) Y = B0 + B1·Xi + B2·Xj·Xl; (5) Y = B0 + B1·Xi·Xk + B2·Xj·Xl; (6) Y = B0 + B1·Xi + B2·Xj + B3·Xl; (7) Y = B0 + B1·Xi + B2·Xj + B3·Xl·Xk; (8) Y = B0 + B1·Xi + B2·Xj + B3·Xl + B4·Xk, where Y is the snow depth, Xi, Xj, Xl and Xk are the prediction variables (any of the 40 variables), and B0 to B4 are the coefficients to be estimated. The ground data are employed to calibrate the multiple regressions. In
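
    A skeleton of the regression-plus-residual-interpolation workflow (regression kriging): fit the multiple regression on terrain predictors, then spatially interpolate its residuals. For brevity the kriging step is replaced here by an inverse-distance stand-in; all numbers are synthetic:

    ```python
    # Regression trend on terrain predictors + spatial residual correction.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(15)
    coords = rng.uniform(0, 50, (23, 2))      # observation points (km)
    alt = rng.uniform(1500, 3300, 23)         # altitude predictor (m)
    rad = rng.uniform(0.2, 1.0, 23)           # radiation predictor
    depth = 0.002 * alt - 1.5 * rad + rng.normal(0, 0.3, 23)

    terrain = np.column_stack([alt, rad])
    trend = LinearRegression().fit(terrain, depth)
    resid = depth - trend.predict(terrain)

    def interp_residual(pt, power=2.0):
        # Inverse-distance weighting, standing in for kriging the residuals.
        d = np.linalg.norm(coords - pt, axis=1) + 1e-9
        w = d ** -power
        return np.sum(w * resid) / np.sum(w)

    new_pt, new_alt, new_rad = np.array([25.0, 25.0]), 2500.0, 0.6
    estimate = trend.predict([[new_alt, new_rad]])[0] + interp_residual(new_pt)
    print(estimate)
    ```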

  10. Dynamical System and Nonlinear Regression for Estimate Host-Parasitoid Relationship

    Directory of Open Access Journals (Sweden)

    Ileana Miranda Cabrera

    2010-01-01

    Full Text Available Complex relationships between a crop, its pests, their natural enemies, and climate factors exist in all ecosystems, but mathematical models have typically studied only some components in order to establish cause-effect relations. The most studied systems concern the relationship between pests and their natural enemies, such as prey-predator or host-parasitoid systems. The present paper presents a dynamical system for studying the host-parasitoid relationship (Diaphorina citri, Tamarixia radiata) and shows that a nonlinear model permits estimation of the number of parasitized nymphs using the number of healthy nymphs as the known variable. The model captures the functional response of the parasitoid: a point is reached at which parasitoid density no longer increases even though the number of hosts grows, and it becomes necessary to intervene in the ecosystem. A simple algorithm is used to estimate the parasitoid level using the a priori relationship between the host and the climate factors and then the nonlinear model.
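
    A hedged sketch of fitting a saturating functional response with nls on simulated counts; the Holling type II form and all numbers here are illustrative assumptions, not the authors' model:

      set.seed(17)
      host <- seq(5, 100, by = 5)                       # healthy nymph counts
      para <- 0.6 * host / (1 + 0.04 * host) + rnorm(length(host), 0, 0.5)
      fit <- nls(para ~ a * host / (1 + b * host), start = list(a = 0.5, b = 0.02))
      coef(fit)
      # the fitted curve levels off near a/b: past some host density the
      # parasitoid no longer keeps pace, the point the abstract flags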

  11. Drusen regression is associated with local changes in fundus autofluorescence in intermediate age-related macular degeneration.

    Science.gov (United States)

    Toy, Brian C; Krishnadev, Nupura; Indaram, Maanasa; Cunningham, Denise; Cukras, Catherine A; Chew, Emily Y; Wong, Wai T

    2013-09-01

    To investigate the association of spontaneous drusen regression in intermediate age-related macular degeneration (AMD) with changes on fundus photography and fundus autofluorescence (FAF) imaging. Prospective observational case series. Fundus images from 58 eyes (in 58 patients) with intermediate AMD and large drusen were assessed over 2 years for areas of drusen regression that exceeded the area of circle C1 (diameter 125 μm; Age-Related Eye Disease Study grading protocol). Manual segmentation and computer-based image analysis were used to detect and delineate areas of drusen regression. Delineated regions were graded as to their appearance on fundus photographs and FAF images, and changes in FAF signal were graded manually and quantitated using automated image analysis. Drusen regression was detected in approximately half of study eyes using manual (48%) and computer-assisted (50%) techniques. At year-2, the clinical appearance of areas of drusen regression on fundus photography was mostly unremarkable, with a majority of eyes (71%) demonstrating no detectable clinical abnormalities, and the remainder (29%) showing minor pigmentary changes. However, drusen regression areas were associated with local changes in FAF that were significantly more prominent than changes on fundus photography. A majority of eyes (64%-66%) demonstrated a predominant decrease in overall FAF signal, while 14%-21% of eyes demonstrated a predominant increase in overall FAF signal. FAF imaging demonstrated that drusen regression in intermediate AMD was often accompanied by changes in local autofluorescence signal. Drusen regression may be associated with concurrent structural and physiologic changes in the outer retina. Published by Elsevier Inc.

  12. Practical Aspects of Log-ratio Coordinate Representations in Regression with Compositional Response

    Directory of Open Access Journals (Sweden)

    Fišerová Eva

    2016-10-01

    Full Text Available Regression analysis with a compositional response, i.e. observations carrying relative information, is an appropriate tool for statistical modelling in many scientific areas (e.g. medicine, geochemistry, geology, economics). Even though this technique has recently been studied intensively, some practical aspects deserve further analysis. Here we discuss the issue of the coordinate representation of compositional data. It is shown that the linear relation between particular orthonormal coordinates and centred log-ratio coordinates can be utilized to simplify computations concerning regression parameter estimation and hypothesis testing. To enhance the interpretation of regression parameters, the orthogonal coordinates and their relation to orthonormal and centred log-ratio coordinates are presented. Further, we discuss the quality of prediction in different coordinate systems. It is shown that the mean squared error (MSE) for orthonormal coordinates is less than or equal to the MSE for log-transformed data. Finally, an illustrative real-world example from geology is presented.
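
    A minimal sketch of centred log-ratio (clr) coordinates and a regression with a compositional response, assuming strictly positive parts; the data are simulated for illustration:

      clr <- function(x) log(x) - mean(log(x))     # clr coordinates sum to zero
      set.seed(12)
      raw  <- matrix(rexp(300), 100, 3)
      comp <- raw / rowSums(raw)                   # 100 compositions with 3 parts
      Z <- t(apply(comp, 1, clr))
      covar <- rnorm(100)
      fit <- lm(Z ~ covar)                         # multivariate-response regression
      coef(fit)
      # orthonormal (ilr) coordinates are an invertible linear map of these,
      # which is the simplification the abstract exploits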

  13. Linear regressive model structures for estimation and prediction of compartmental diffusive systems

    NARCIS (Netherlands)

    Vries, D; Keesman, K.J.; Zwart, Heiko J.

    In input-output relations of (compartmental) diffusive systems, physical parameters appear non-linearly, resulting in the use of (constrained) non-linear parameter estimation techniques with their shortcomings regarding global optimality and computational effort. Given an LTI system in state space

  14. Linear regressive model structures for estimation and prediction of compartmental diffusive systems

    NARCIS (Netherlands)

    Vries, D.; Keesman, K.J.; Zwart, H.

    2006-01-01

    Abstract In input-output relations of (compartmental) diffusive systems, physical parameters appear non-linearly, resulting in the use of (constrained) non-linear parameter estimation techniques with their shortcomings regarding global optimality and computational effort. Given an LTI system in state

  15. Regression Discontinuity and Randomized Controlled Trial Estimates: An Application to The Mycotic Ulcer Treatment Trials.

    Science.gov (United States)

    Oldenburg, Catherine E; Venkatesh Prajna, N; Krishnan, Tiruvengada; Rajaraman, Revathi; Srinivasan, Muthiah; Ray, Kathryn J; O'Brien, Kieran S; Glymour, M Maria; Porco, Travis C; Acharya, Nisha R; Rose-Nussbaumer, Jennifer; Lietman, Thomas M

    2018-08-01

    We compare results from regression discontinuity (RD) analysis to primary results of a randomized controlled trial (RCT) utilizing data from two contemporaneous RCTs for treatment of fungal corneal ulcers. Patients were enrolled in the Mycotic Ulcer Treatment Trials I and II (MUTT I & MUTT II) based on baseline visual acuity: patients with acuity ≤ 20/400 (logMAR 1.3) enrolled in MUTT I, and >20/400 in MUTT II. MUTT I investigated the effect of topical natamycin versus voriconazole on best spectacle-corrected visual acuity. MUTT II investigated the effect of topical voriconazole plus placebo versus topical voriconazole plus oral voriconazole. We compared the RD estimate (natamycin arm of MUTT I [N = 162] versus placebo arm of MUTT II [N = 54]) to the RCT estimate from MUTT I (topical natamycin [N = 162] versus topical voriconazole [N = 161]). In the RD analysis, patients receiving natamycin had a mean improvement of 4 lines of visual acuity at 3 months (logMAR -0.39, 95% CI: -0.61, -0.17) compared to topical voriconazole plus placebo, and 2 lines in the RCT (logMAR -0.18, 95% CI: -0.30, -0.05) compared to topical voriconazole. The RD and RCT estimates were similar, although the RD design overestimated effects compared to the RCT.

  16. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
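
    A hedged sketch of the symmetric OLS bisector line, using the slope formula published in this line of work (taken here as given rather than re-derived):

      ols_bisector <- function(x, y) {
        b1 <- cov(x, y) / var(x)      # OLS(Y|X) slope
        b2 <- var(y) / cov(x, y)      # OLS(X|Y) slope, expressed in Y-on-X form
        (b1 * b2 - 1 + sqrt((1 + b1^2) * (1 + b2^2))) / (b1 + b2)
      }
      set.seed(11)
      x <- rnorm(100); y <- x + rnorm(100)
      ols_bisector(x, y)              # lies between the two OLS slopes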

  17. Mixed Frequency Data Sampling Regression Models: The R Package midasr

    Directory of Open Access Journals (Sweden)

    Eric Ghysels

    2016-08-01

    Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.
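
    Rather than reproducing the midasr interface, here is a hedged base-R sketch of the core MIDAS idea: a monthly regressor enters a quarterly regression through a parsimonious exponential Almon lag, fitted with nls; all names and numbers are illustrative:

      set.seed(6)
      xm <- rnorm(300)                               # 300 months = 100 quarters
      qe <- seq(6, 300, by = 3)                      # quarter ends with 6 lags available
      almon <- function(th, K = 6) { u <- exp(th * (0:(K - 1))); u / sum(u) }
      xagg  <- function(th) sapply(qe, function(t) sum(almon(th) * xm[t - 0:5]))
      y <- 1 + 2 * xagg(-0.4) + rnorm(length(qe), sd = 0.2)
      fit <- nls(y ~ b0 + b1 * xagg(th), start = list(b0 = 0, b1 = 1, th = -0.1))
      coef(fit)                                      # intercept, slope, lag-shape parameter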

  18. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    Directory of Open Access Journals (Sweden)

    Guoqi Qian

    2016-01-01

    Full Text Available Regression clustering is a combination of unsupervised and supervised statistical learning and data mining methods, found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
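
    A minimal sketch of the iterative partition-and-regression loop on simulated data with two latent regression lines: start from random labels, then alternate between refitting each line and reassigning points to the line that fits them best:

      set.seed(4)
      x <- runif(200); g <- rbinom(200, 1, 0.5)
      y <- ifelse(g == 1, 1 + 2 * x, 4 - 3 * x) + rnorm(200, 0, 0.3)
      lab <- sample(1:2, 200, replace = TRUE)        # random initial labels
      for (it in 1:20) {
        fits <- lapply(1:2, function(k) lm(y ~ x, subset = lab == k))
        res  <- sapply(fits, function(f) (y - predict(f, data.frame(x)))^2)
        lab  <- max.col(-res)                        # reassign to the closest line
      }
      table(lab, g)                                  # recovered vs true partition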

  19. Taking into account latency, amplitude, and morphology: improved estimation of single-trial ERPs by wavelet filtering and multiple linear regression.

    Science.gov (United States)

    Hu, L; Liang, M; Mouraux, A; Wise, R G; Hu, Y; Iannetti, G D

    2011-12-01

    Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLR(d)) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLR(d) method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLR(d) approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLR(d) effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLR(d) can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli.

  20. Comparison of beta-binomial regression model approaches to analyze health-related quality of life data.

    Science.gov (United States)

    Najera-Zuloaga, Josu; Lee, Dae-Jin; Arostegui, Inmaculada

    2017-01-01

    Health-related quality of life has become an increasingly important indicator of health status in clinical trials and epidemiological research. Moreover, the study of the relationship of health-related quality of life with patient and disease characteristics has become one of the primary aims of many health-related quality of life studies. Health-related quality of life scores are usually assumed to be distributed as binomial random variables and are often highly skewed. The use of the beta-binomial distribution in the regression context has been proposed to model such data; however, beta-binomial regression has been performed by means of two different approaches in the literature: (i) beta-binomial distribution with a logistic link; and (ii) hierarchical generalized linear models. None of the existing literature in the analysis of health-related quality of life survey data has performed a comparison of both approaches in terms of adequacy and regression parameter interpretation context. This paper is motivated by the analysis of a real data application of health-related quality of life outcomes in patients with Chronic Obstructive Pulmonary Disease, where the use of both approaches yields contradictory results in terms of covariate effects significance and consequently the interpretation of the most relevant factors in health-related quality of life. We present an explanation of the results in both methodologies through a simulation study and address the need to apply the proper approach in the analysis of health-related quality of life survey data for practitioners, providing an R package.
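
    A hedged sketch of approach (i), a beta-binomial likelihood with a logistic link, fitted by direct maximum likelihood with optim; the data are simulated and the mu/phi parameterisation is one common convention:

      nll <- function(par, y, m, X) {
        beta <- par[1:ncol(X)]
        phi  <- exp(par[ncol(X) + 1])                # precision parameter > 0
        mu   <- plogis(drop(X %*% beta))             # logistic link
        a <- mu * phi; b <- (1 - mu) * phi
        -sum(lchoose(m, y) + lbeta(y + a, m - y + b) - lbeta(a, b))
      }
      set.seed(3)
      X <- cbind(1, rnorm(200)); m <- 20
      mu <- plogis(drop(X %*% c(-0.5, 0.8)))
      y  <- rbinom(200, m, rbeta(200, mu * 5, (1 - mu) * 5))
      fit <- optim(c(0, 0, 0), nll, y = y, m = m, X = X, method = "BFGS")
      fit$par[1:2]                                   # regression coefficients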

  1. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

    Science.gov (United States)

    Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.

    2009-01-01

    Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.

  2. Significance testing in ridge regression for genetic data

    Directory of Open Access Journals (Sweden)

    De Iorio Maria

    2011-09-01

    Full Text Available Abstract Background Technological developments have increased the feasibility of large scale genetic association studies. Densely typed genetic markers are obtained using SNP arrays, next-generation sequencing technologies and imputation. However, SNPs typed using these methods can be highly correlated due to linkage disequilibrium among them, and standard multiple regression techniques fail with these data sets due to their high dimensionality and correlation structure. There has been increasing interest in using penalised regression in the analysis of high dimensional data. Ridge regression is one such penalised regression technique which does not perform variable selection, instead estimating a regression coefficient for each predictor variable. It is therefore desirable to obtain an estimate of the significance of each ridge regression coefficient. Results We develop and evaluate a test of significance for ridge regression coefficients. Using simulation studies, we demonstrate that the performance of the test is comparable to that of a permutation test, with the advantage of a much-reduced computational cost. We introduce the p-value trace, a plot of the negative logarithm of the p-values of ridge regression coefficients with increasing shrinkage parameter, which enables the visualisation of the change in p-value of the regression coefficients with increasing penalisation. We apply the proposed method to a lung cancer case-control data set from EPIC, the European Prospective Investigation into Cancer and Nutrition. Conclusions The proposed test is a useful alternative to a permutation test for the estimation of the significance of ridge regression coefficients, at a much-reduced computational cost. The p-value trace is an informative graphical tool for evaluating the results of a test of significance of ridge regression coefficients as the shrinkage parameter increases, and the proposed test makes its production computationally feasible.
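
    A minimal sketch of ridge coefficients over a grid of shrinkage parameters, the coefficient analogue of the paper's p-value trace (which plots the negative log p-value instead); the correlated predictors are simulated stand-ins for SNPs in linkage disequilibrium:

      ridge_coef <- function(X, y, k) {
        Xs <- scale(X); ys <- y - mean(y)
        solve(crossprod(Xs) + k * diag(ncol(Xs)), crossprod(Xs, ys))
      }
      set.seed(10)
      X <- matrix(rnorm(100 * 5), 100, 5)
      X[, 2] <- X[, 1] + rnorm(100, 0, 0.05)         # near-collinear pair
      y <- X[, 1] - X[, 3] + rnorm(100)
      ks <- 10^seq(-2, 3, length.out = 30)
      path <- sapply(ks, function(k) ridge_coef(X, y, k))
      matplot(log10(ks), t(path), type = "l")        # coefficient trace vs shrinkage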

  3. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    Science.gov (United States)

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither approach solves both problems. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.

  4. Depth-weighted robust multivariate regression with application to sparse data

    KAUST Repository

    Dutta, Subhajit; Genton, Marc G.

    2017-01-01

    A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.

  5. Depth-weighted robust multivariate regression with application to sparse data

    KAUST Repository

    Dutta, Subhajit

    2017-04-05

    A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.
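
    A hedged sketch of the plug-in idea behind such estimators: compute a robust joint location/scatter estimate and read the regression coefficients off the scatter matrix. MASS::cov.rob stands in here for the depth-based scatter of the paper:

      library(MASS)
      set.seed(8)
      X <- matrix(rnorm(200 * 2), 200, 2)
      M <- matrix(c(1, -1, 0.5, 2), 2, 2)            # true coefficient matrix
      Y <- X %*% M + matrix(rnorm(200 * 2, 0, 0.3), 200, 2)
      Y[1:10, ] <- Y[1:10, ] + 10                    # contaminate a few rows
      rob <- cov.rob(cbind(X, Y))                    # robust centre and scatter
      S <- rob$cov
      B <- solve(S[1:2, 1:2], S[1:2, 3:4])           # slope matrix, close to M
      a <- rob$center[3:4] - drop(t(B) %*% rob$center[1:2])   # intercepts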

  6. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    Science.gov (United States)

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
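
    A minimal sketch of the two competing estimators on simulated data with a true relative risk of 1.8; the sandwich and lmtest packages supply the robust variance for the modified Poisson fit:

      library(sandwich); library(lmtest)
      set.seed(1)
      x <- rbinom(500, 1, 0.5)
      y <- rbinom(500, 1, 0.2 * 1.8^x)               # P(y = 1) = 0.2 * RR^x
      fp <- glm(y ~ x, family = poisson(link = "log"))   # modified Poisson
      coeftest(fp, vcov = vcovHC(fp, type = "HC0"))      # robust standard errors
      exp(coef(fp))["x"]                                 # estimated relative risk
      fb <- glm(y ~ x, family = binomial(link = "log"),  # log-binomial; may need
                start = c(log(0.2), 0))                  # start values to converge
      exp(coef(fb))["x"]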

  7. Development and Application of Watershed Regressions for Pesticides (WARP) for Estimating Atrazine Concentration Distributions in Streams

    Science.gov (United States)

    Larson, Steven J.; Crawford, Charles G.; Gilliom, Robert J.

    2004-01-01

    Regression models were developed for predicting atrazine concentration distributions in rivers and streams, using the Watershed Regressions for Pesticides (WARP) methodology. Separate regression equations were derived for each of nine percentiles of the annual distribution of atrazine concentrations and for the annual time-weighted mean atrazine concentration. In addition, seasonal models were developed for two specific periods of the year--the high season, when the highest atrazine concentrations are expected in streams, and the low season, when concentrations are expected to be low or undetectable. Various nationally available watershed parameters were used as explanatory variables, including atrazine use intensity, soil characteristics, hydrologic parameters, climate and weather variables, land use, and agricultural management practices. Concentration data from 112 river and stream stations sampled as part of the U.S. Geological Survey's National Water-Quality Assessment and National Stream Quality Accounting Network Programs were used for computing the concentration percentiles and mean concentrations used as the response variables in regression models. Tobit regression methods, using maximum likelihood estimation, were used for developing the models because some of the concentration values used for the response variables were censored (reported as less than a detection threshold). Data from 26 stations not used for model development were used for model validation. The annual models accounted for 62 to 77 percent of the variability in concentrations among the 112 model development stations. Atrazine use intensity (the amount of atrazine used in the watershed divided by watershed area) was the most important explanatory variable in all models, but additional watershed parameters significantly increased the amount of variability explained by the models. Predicted concentrations from all 10 models were within a factor of 10 of the observed concentrations at most
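
    A hedged sketch of a tobit-type regression for left-censored concentrations, using the common survreg idiom as a stand-in for the maximum-likelihood Tobit fitting described above; variable names and numbers are illustrative assumptions:

      library(survival)
      set.seed(2)
      use  <- runif(200, 0, 3)                       # hypothetical use intensity
      logc <- -1 + 0.8 * use + rnorm(200, 0, 0.5)    # latent log concentration
      det  <- -0.5                                   # log detection threshold
      obs   <- pmax(logc, det)
      event <- logc > det                            # FALSE = censored at threshold
      fit <- survreg(Surv(obs, event, type = "left") ~ use, dist = "gaussian")
      summary(fit)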

  8. Estimating Engineering and Manufacturing Development Cost Risk Using Logistic and Multiple Regression

    National Research Council Canada - National Science Library

    Bielecki, John

    2003-01-01

    .... Previous research has demonstrated that a two-step logistic and multiple regression methodology for predicting cost growth produces desirable results versus traditional single-step regression...

  9. Bias in regression coefficient estimates upon different treatments of ...

    African Journals Online (AJOL)

    MS and PW consistently overestimated the population parameter. EM and RI, on the other hand, tended to consistently underestimate the population parameter under a non-monotonic pattern. Keywords: Missing data, bias, regression, percent missing, non-normality, missing pattern

  10. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    Science.gov (United States)

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
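
    A minimal sketch of the estimator the program implements: the slope is the median of all pairwise slopes and the intercept forces the line through the data medians:

      ktr_line <- function(x, y) {
        ij <- combn(length(x), 2)
        slopes <- (y[ij[2, ]] - y[ij[1, ]]) / (x[ij[2, ]] - x[ij[1, ]])
        b1 <- median(slopes, na.rm = TRUE)     # median pairwise slope
        b0 <- median(y) - b1 * median(x)       # line passes through the medians
        c(intercept = b0, slope = b1)
      }
      set.seed(13)
      x <- rnorm(30); y <- 2 + 0.5 * x + rt(30, 2)   # heavy-tailed noise
      ktr_line(x, y)                                 # resistant to the outliers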

  11. Estimation of trabecular bone parameters in children from multisequence MRI using texture-based regression

    Energy Technology Data Exchange (ETDEWEB)

    Lekadir, Karim, E-mail: karim.lekadir@upf.edu; Hoogendoorn, Corné [Center for Computational Imaging and Simulation Technologies in Biomedicine, Universitat Pompeu Fabra, Barcelona 08018 (Spain); Armitage, Paul [The Academic Unit of Radiology, The University of Sheffield, Sheffield S10 2JF (United Kingdom); Whitby, Elspeth [The Academic Unit of Reproductive and Developmental Medicine, The University of Sheffield, Sheffield S10 2SF (United Kingdom); King, David [The Academic Unit of Child Health, The University of Sheffield, Sheffield S10 2TH (United Kingdom); Dimitri, Paul [The Mellanby Centre for Bone Research, The University of Sheffield, Sheffield S10 2RX (United Kingdom); Frangi, Alejandro F. [Center for Computational Imaging and Simulation Technologies in Biomedicine, The University of Sheffield, Sheffield S1 3JD (United Kingdom)

    2016-06-15

    Purpose: This paper presents a statistical approach for the prediction of trabecular bone parameters from low-resolution multisequence magnetic resonance imaging (MRI) in children, thus addressing the limitations of high-resolution modalities such as HR-pQCT, including the significant exposure of young patients to radiation and the limited applicability of such modalities to peripheral bones in vivo. Methods: A statistical predictive model is constructed from a database of MRI and HR-pQCT datasets, to relate the low-resolution MRI appearance in the cancellous bone to the trabecular parameters extracted from the high-resolution images. The description of the MRI appearance is achieved between subjects by using a collection of feature descriptors, which describe the texture properties inside the cancellous bone, and which are invariant to the geometry and size of the trabecular areas. The predictive model is built by fitting to the training data a nonlinear partial least square regression between the input MRI features and the output trabecular parameters. Results: Detailed validation based on a sample of 96 datasets shows correlations >0.7 between the trabecular parameters predicted from low-resolution multisequence MRI based on the proposed statistical model and the values extracted from high-resolution HR-pQCT. Conclusions: The obtained results indicate the promise of the proposed predictive technique for the estimation of trabecular parameters in children from multisequence MRI, thus reducing the need for high-resolution radiation-based scans for a fragile population that is under development and growth.

  12. Regression Association Analysis of Yield-Related Traits with RAPD Molecular Markers in Pistachio (Pistacia vera L.)

    Directory of Open Access Journals (Sweden)

    Saeid Mirzaei

    2017-10-01

    molecular data (as independent variables) and morphological data (as dependent variables) was performed using multiple regression analysis to identify informative markers associated with the yield-related traits. Multiple regression analysis was conducted using the stepwise method of the linear regression option of SPSS. A Student t-test was performed to assess significant differences between mean trait estimates of genotypes where specific markers were present and absent. Markers showing significant regression values were considered to be associated with the trait under consideration. Results and Discussion: Finally, 11 primers were polymorphic and a total of 56 fragments (loci) were amplified (an average of 5.09 per primer); among these, 36 fragments (64.29%) showed polymorphism, with rates ranging from 25% for primer AJ05 up to 87.5% for primer OPAD02. Polymorphic information content ranged from 0.095 (AJ05 and OPAD14) to 0.39 (OPC05), with an average of 0.23. Stepwise regression analysis between molecular data and traits was performed to identify informative markers associated with yield component traits. Nineteen RAPD fragments were found to be associated with six yield-related traits. Some RAPD markers were associated with more than one trait in the multiple regression analysis, which may be due to a pleiotropic effect of the linked quantitative trait locus on different traits. However, to better understand these relationships, preparation of a segregating population and linkage mapping are necessary. Also, these results could be useful in marker-assisted breeding programs when no other genetic information is available. Conclusion: This investigation of molecular markers associated with yield traits in pistachio has provided clues for identification of the genotypes with higher yield value. In breeding programs selection of quality material is often a time-consuming process, and thus marker-assisted selection could be of great use in identification of

  13. Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

    Science.gov (United States)

    Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

    2018-05-01

    The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields for exploring spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in the regression model is to use robust methods; the resulting model is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy-making process related to air pollution mitigation by developing a standard index model for air pollution (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level, which are the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significant differences between global regression and RGWR in this case, but basic GWR using the Gaussian kernel is the best model for modeling the APSI because it has the smallest AIC.

  14. Estimating the relative utility of screening mammography.

    Science.gov (United States)

    Abbey, Craig K; Eckstein, Miguel P; Boone, John M

    2013-05-01

    The concept of diagnostic utility is a fundamental component of signal detection theory, going back to some of its earliest works. Attaching utility values to the various possible outcomes of a diagnostic test should, in principle, lead to meaningful approaches to evaluating and comparing such systems. However, in many areas of medical imaging, utility is not used because it is presumed to be unknown. In this work, we estimate relative utility (the utility benefit of a detection relative to that of a correct rejection) for screening mammography using its known relation to the slope of a receiver operating characteristic (ROC) curve at the optimal operating point. The approach assumes that the clinical operating point is optimal for the goal of maximizing expected utility and therefore the slope at this point implies a value of relative utility for the diagnostic task, for known disease prevalence. We examine utility estimation in the context of screening mammography using the Digital Mammographic Imaging Screening Trials (DMIST) data. We show how various conditions can influence the estimated relative utility, including characteristics of the rating scale, verification time, probability model, and scope of the ROC curve fit. Relative utility estimates range from 66 to 227. We argue for one particular set of conditions that results in a relative utility estimate of 162 (±14%). This is broadly consistent with values in screening mammography determined previously by other means. At the disease prevalence found in the DMIST study (0.59% at 365-day verification), optimal ROC slopes are near unity, suggesting that utility-based assessments of screening mammography will be similar to those found using Youden's index.
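
    A hedged reconstruction of the relation invoked above, written in LaTeX (sign and utility conventions vary across authors): at the operating point that maximizes expected utility, the slope m* of the ROC curve satisfies

      m^{*} = \frac{1 - p}{p\,U_{rel}} \quad\Longleftrightarrow\quad U_{rel} = \frac{1 - p}{p\,m^{*}},

    where p is the disease prevalence and U_{rel} the relative utility of a detection. With the DMIST prevalence p = 0.0059 and U_{rel} ≈ 162, this gives m* ≈ 1.04, consistent with the near-unity optimal slopes reported above.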

  15. Top Incomes, Heavy Tails, and Rank-Size Regressions

    Directory of Open Access Journals (Sweden)

    Christian Schluter

    2018-03-01

    Full Text Available In economics, rank-size regressions provide popular estimators of tail exponents of heavy-tailed distributions. We discuss the properties of this approach when the tail of the distribution is regularly varying rather than strictly Pareto. The estimator then over-estimates the true value in the leading parametric income models (so the upper income tail is less heavy than estimated), which leads to test size distortions and undermines inference. For practical work, we propose a sensitivity analysis based on regression diagnostics in order to assess the likely impact of the distortion. The methods are illustrated using data on top incomes in the UK.
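
    A minimal sketch of the rank-size tail regression with the rank-1/2 correction of Gabaix and Ibragimov (2011), run on a simulated strictly Pareto tail where the estimator behaves well:

      set.seed(14)
      incomes <- 1 / runif(1e4)^(1 / 1.5)            # Pareto tail, exponent 1.5
      top <- sort(incomes, decreasing = TRUE)[1:500] # upper tail only
      rnk <- seq_along(top)
      fit <- lm(log(rnk - 0.5) ~ log(top))           # log rank on log size
      -coef(fit)[2]                                  # tail exponent estimate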

  16. A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data

    KAUST Repository

    Gazioglu, Suzan

    2013-05-25

    Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.

  17. A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data

    KAUST Repository

    Gazioglu, Suzan; Wei, Jiawei; Jennings, Elizabeth M.; Carroll, Raymond J.

    2013-01-01

    Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.

  18. Factors affecting CO2 emissions in China’s agriculture sector: Evidence from geographically weighted regression model

    International Nuclear Information System (INIS)

    Xu, Bin; Lin, Boqiang

    2017-01-01

    China is currently the world's largest emitter of carbon dioxide. As a large agricultural country, China’s agriculture sector produces carbon emissions that keep growing rapidly. It is, therefore, of great importance to investigate the driving forces of carbon dioxide emissions in this sector. Traditional regression estimation can only obtain “average” and “global” parameter estimates; it excludes the “local” parameter estimates which vary across space in some spatial systems. Geographically weighted regression embeds the latitude and longitude of the sample data into the regression parameters, and uses a locally weighted least squares method to estimate the parameters point-by-point. To reveal the nonstationary spatial effects of the driving forces, a geographically weighted regression model is employed in this paper. The results show that economic growth is positively correlated with emissions, with the impact in the western region being less than that in the central and eastern regions. Urbanization is positively related to emissions but exhibits the opposite spatial pattern of effects. Energy intensity is also correlated with emissions, with a decreasing trend from the eastern region to the central and western regions. Therefore, policymakers should take full account of the spatial nonstationarity of driving forces in designing emission reduction policies. - Highlights: • We explore the driving forces of CO2 emissions in the agriculture sector. • Urbanization is positively related to emissions but exhibits the opposite spatial pattern of effects. • The effect of energy intensity declines from the eastern region to the western region.

  19. Considering a non-polynomial basis for local kernel regression problem

    Science.gov (United States)

    Silalahi, Divo Dharma; Midi, Habshah

    2017-01-01

    A commonly used solution for the local kernel nonparametric regression problem is polynomial regression. In this study, we demonstrate the estimator and its properties using maximum likelihood estimation with a non-polynomial basis, such as the B-spline, replacing the polynomial basis. This estimator allows flexibility in the selection of a bandwidth and a knot. The best estimator was selected by finding an optimal bandwidth and knot through minimizing the generalized cross-validation function.
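
    A minimal sketch of regression on a B-spline basis in place of a local polynomial one, with the interior knots playing the role the abstract assigns to knot selection:

      library(splines)
      set.seed(7)
      x <- sort(runif(200))
      y <- sin(2 * pi * x) + rnorm(200, 0, 0.2)
      fit <- lm(y ~ bs(x, knots = c(0.25, 0.5, 0.75)))   # cubic B-spline basis
      plot(x, y); lines(x, fitted(fit), lwd = 2)          # smooth fitted curve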

  20. Mixture of Regression Models with Single-Index

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2016-01-01

    In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the newly proposed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...

  1. The MIDAS Touch: Mixed Data Sampling Regression Models

    OpenAIRE

    Ghysels, Eric; Santa-Clara, Pedro; Valkanov, Rossen

    2004-01-01

    We introduce Mixed Data Sampling (henceforth MIDAS) regression models. The regressions involve time series data sampled at different frequencies. Technically speaking, MIDAS models specify conditional expectations as a distributed lag of regressors recorded at some higher sampling frequencies. We examine the asymptotic properties of MIDAS regression estimation and compare it with traditional distributed lag models. MIDAS regressions have wide applicability in macroeconomics and finance.

  2. Clearness index in cloudy days estimated with meteorological information by multiple regression analysis; Kisho joho wo riyoshita kaiki bunseki ni yoru dontenbi no seiten shisu no suitei

    Energy Technology Data Exchange (ETDEWEB)

    Nakagawa, S [Maizuru National College of Technology, Kyoto (Japan); Kenmoku, Y; Sakakibara, T [Toyohashi University of Technology, Aichi (Japan); Kawamoto, T [Shizuoka University, Shizuoka (Japan). Faculty of Engineering

    1996-10-27

    A study is under way to predict solar radiation more accurately and so enhance the efficiency of solar energy utilization. Using a technique that roughly estimates the day's clearness index from the weather forecast, the forecast weather (consisting of weather conditions such as 'clear' and 'cloudy,' and adverbs or adjectives such as 'afterward,' 'temporary,' and 'intermittent') has been quantified relative to the clearness index. This index is named the 'weather index' for the purpose of this article. The weather index error rate is highest on cloudy days, which correspond to a weather index falling in the range 0.2-0.5. It has also been found that there is a high correlation between the clearness index and the north-south wind direction component. A multiple regression analysis has therefore been carried out to estimate the clearness index from the maximum temperature and the north-south wind direction component. Compared with estimating the clearness index from the weather index alone, estimation using the weather index and maximum temperature achieves a 3% improvement throughout the year. It has also been found that estimation using the weather index and the north-south wind direction component enables a 2% improvement for summer and a 5% or higher improvement for winter. 2 refs., 6 figs., 4 tabs.

  3. Logistic quantile regression provides improved estimates for bounded avian counts: a case study of California Spotted Owl fledgling production

    Science.gov (United States)

    Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane

    2017-01-01

    Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...
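
    A hedged sketch of the logistic quantile regression idea for a count bounded in [0, 5]: transform the bounded response to the real line, fit a linear quantile regression with quantreg, and back-transform; the offset eps is an illustrative device for handling boundary values, not necessarily the authors' choice:

      library(quantreg)
      set.seed(5)
      x <- runif(300); ymax <- 5
      y <- pmin(ymax, pmax(0, round(1 + 3 * x + rnorm(300))))
      eps <- 0.5                                     # keeps the logit finite at 0 and ymax
      z <- log((y + eps) / (ymax + eps - y))         # logit-type transform
      fit <- rq(z ~ x, tau = 0.5)                    # median regression
      zhat <- predict(fit)
      # back-transform fitted quantiles to the original bounded scale
      yhat <- (ymax * exp(zhat) + eps * (exp(zhat) - 1)) / (1 + exp(zhat))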

  4. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    Science.gov (United States)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.

  5. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

    Full Text Available When data are affected by multicollinearity in the linear regression framework, concurvity will be present in fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in the case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing a type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper provides a general criterion to detect concurvity in nonlinear and nonparametric regression models.

  6. The APT model as reduced-rank regression

    NARCIS (Netherlands)

    Bekker, P.A.; Dobbelstein, P.; Wansbeek, T.J.

    Integrating the two steps of an arbitrage pricing theory (APT) model leads to a reduced-rank regression (RRR) model. So the results on RRR can be used to estimate APT models, making estimation very simple. We give a succinct derivation of estimation of RRR, derive the asymptotic variance of RRR

  7. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    Science.gov (United States)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, maximum likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate the parameters of multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the maximum likelihood approach. Results show that the new proposed model outperforms ML in cases of small datasets.

  8. From Rasch scores to regression

    DEFF Research Database (Denmark)

    Christensen, Karl Bang

    2006-01-01

    Rasch models provide a framework for measurement and modelling latent variables. Having measured a latent variable in a population, a comparison of groups will often be of interest. For this purpose the use of observed raw scores will often be inadequate because these lack interval scale properties. This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables, and latent regression models based on the distribution of the score.

  9. Regression model development and computational procedures to support estimation of real-time concentrations and loads of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-9

    Science.gov (United States)

    Lee, Michael T.; Asquith, William H.; Oden, Timothy D.

    2012-01-01

    In December 2005, the U.S. Geological Survey (USGS), in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (Escherichia coli and total coliform), atrazine, and suspended sediment at two USGS streamflow-gaging stations that represent watersheds contributing to Lake Houston (08068500 Spring Creek near Spring, Tex., and 08070200 East Fork San Jacinto River near New Caney, Tex.). Data from the discrete water-quality samples collected during 2005–9, in conjunction with continuously monitored real-time data that included streamflow and other physical water-quality properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), were used to develop regression models for the estimation of concentrations of water-quality constituents of substantial source watersheds to Lake Houston. The potential explanatory variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, and time (to account for seasonal variations inherent in some water-quality data). The response variables (the selected constituents) at each site were nitrite plus nitrate nitrogen, total phosphorus, total organic carbon, E. coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities to serve as potential surrogate variables to estimate concentrations of the selected constituents through statistical regression. Statistical regression also facilitates accompanying estimates of uncertainty in the form of prediction intervals. Each regression model potentially can be used to estimate concentrations of a given constituent in real time. Among other regression diagnostics, the diagnostics used as indicators of general model reliability and reported herein include the adjusted R-squared, the residual standard error, residual plots, and p-values. Adjusted R-squared values for the Spring Creek models ranged

  10. Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

    NARCIS (Netherlands)

    Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.

    2006-01-01

    The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method is based on linear regression for the estimation of constants. Interval

  11. Using Structured Additive Regression Models to Estimate Risk Factors of Malaria: Analysis of 2010 Malawi Malaria Indicator Survey Data

    Science.gov (United States)

    Chirombo, James; Lowe, Rachel; Kazembe, Lawrence

    2014-01-01

    Background After years of implementing Roll Back Malaria (RBM) interventions, the changing landscape of malaria in terms of risk factors and spatial pattern has not been fully investigated. This paper uses the 2010 malaria indicator survey data to investigate if known malaria risk factors remain relevant after many years of interventions. Methods We adopted a structured additive logistic regression model that allowed for spatial correlation, to more realistically estimate malaria risk factors. Our model included child and household level covariates, as well as climatic and environmental factors. Continuous variables were modelled by assuming second order random walk priors, while spatial correlation was specified as a Markov random field prior, with fixed effects assigned diffuse priors. Inference was fully Bayesian resulting in an under five malaria risk map for Malawi. Results Malaria risk increased with increasing age of the child. With respect to socio-economic factors, the greater the household wealth, the lower the malaria prevalence. A general decline in malaria risk was observed as altitude increased. Minimum temperatures and average total rainfall in the three months preceding the survey did not show a strong association with disease risk. Conclusions The structured additive regression model offered a flexible extension to standard regression models by enabling simultaneous modelling of possible nonlinear effects of continuous covariates, spatial correlation and heterogeneity, while estimating usual fixed effects of categorical and continuous observed variables. Our results confirmed that malaria epidemiology is a complex interaction of biotic and abiotic factors, both at the individual, household and community level and that risk factors are still relevant many years after extensive implementation of RBM activities. PMID:24991915

  12. Multinomial Logistic Regression & Bootstrapping for Bayesian Estimation of Vertical Facies Prediction in Heterogeneous Sandstone Reservoirs

    Science.gov (United States)

    Al-Mudhafar, W. J.

    2013-12-01

    Precise prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships used to estimate the properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution in order to build an accurate reservoir model for optimal future reservoir performance. In this paper, facies estimation has been done through multinomial logistic regression (MLR) with respect to the well logs and core data in a well in the upper sandstone formation of the South Rumaila oil field. The independent variables are gamma ray, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. Firstly, a robust sequential imputation algorithm has been used to impute the missing data. This algorithm starts from a complete subset of the dataset and estimates sequentially the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values. The MLR has been chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies and core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product. A beta distribution of facies has been considered as prior knowledge, and the resulting predicted probability (posterior) has been estimated from MLR based on Bayes' theorem, which relates the predicted (posterior) probability to the conditional probability and the prior knowledge. To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly
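
    A hedged sketch of the two ingredients named above: multinomial logistic estimation (via nnet::multinom rather than the authors' code) and a bootstrap estimate of extra-sample prediction error. The predictors and classes are simulated stand-ins for the well-log variables:

      library(nnet)
      set.seed(15)
      n <- 300
      GR <- rnorm(n); PHI <- rnorm(n)                # stand-ins for log curves
      eta <- cbind(0, 1.5 * GR, -1 + 2 * PHI)        # linear predictors, class 1 = reference
      P <- exp(eta) / rowSums(exp(eta))
      facies <- factor(apply(P, 1, function(p) sample(3, 1, prob = p)))
      df <- data.frame(facies, GR, PHI)
      fit <- multinom(facies ~ GR + PHI, data = df, trace = FALSE)
      errs <- replicate(100, {
        i <- sample(n, replace = TRUE)               # bootstrap resample
        f <- multinom(facies ~ GR + PHI, data = df[i, ], trace = FALSE)
        oob <- setdiff(seq_len(n), i)                # out-of-bag rows
        mean(predict(f, newdata = df[oob, ]) != df$facies[oob])
      })
      mean(errs)                                     # extra-sample error estimate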

  13. A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography

    International Nuclear Information System (INIS)

    Middleton, G.W.; Thomson, W.H.; Davies, I.H.; Morgan, A.

    1989-01-01

    A technique for accurate background subtraction in 99Tcm-DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)
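
    The underlying idea, expressing the renal region-of-interest curve as a linear combination of heart and soft-tissue curves and treating the remainder as true renal activity, can be sketched with an ordinary least-squares fit. The curves and the fitting interval below are synthetic and purely illustrative:

```python
import numpy as np

t = np.linspace(0, 20, 200)                    # minutes (synthetic)
heart = np.exp(-0.15 * t)                      # blood-pool curve
tissue = 0.5 * np.exp(-0.05 * t)               # soft-tissue curve
kidney_true = t * np.exp(-0.2 * t)             # "true" renal signal
renal_roi = kidney_true + 0.8 * heart + 0.6 * tissue   # measured ROI curve

# Fit the background contribution over an early interval where renal
# uptake is still small (here: the first 10 samples).
A = np.column_stack([heart[:10], tissue[:10]])
coef, *_ = np.linalg.lstsq(A, renal_roi[:10], rcond=None)

background = coef[0] * heart + coef[1] * tissue
kidney_est = renal_roi - background            # background-subtracted renogram
print("fitted background weights:", coef)
```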

  14. SNOW DEPTH ESTIMATION USING TIME SERIES PASSIVE MICROWAVE IMAGERY VIA GENETICALLY SUPPORT VECTOR REGRESSION (CASE STUDY URMIA LAKE BASIN)

    Directory of Open Access Journals (Sweden)

    N. Zahir

    2015-12-01

    Lake Urmia is one of the most important ecosystems of the country and is on the verge of disappearing. Many factors contribute to this crisis; among them, precipitation plays an important role. Precipitation takes many forms, one of which is snow. The snow on Sahand Mountain is one of the main sources of Lake Urmia's water. Snow depth (SD) is a vital parameter for estimating the water balance of future years. In this regard, this study focuses on the SD parameter using the Special Sensor Microwave/Imager (SSM/I) instrument on board the Defence Meteorological Satellite Program (DMSP) F16 satellite. The usual statistical methods for retrieving SD include linear and non-linear ones, which use a least squares procedure to estimate the SD model. Recently, kernel-based methods have been widely used for statistical modelling. Among these methods, support vector regression (SVR) achieves high performance. Examination of the obtained data shows the existence of outliers, and a wavelet denoising method is applied to remove them. After removing the outliers, the optimum bands and parameters for SVR must be selected. To address this issue, feature selection methods have shown a direct effect on improving regression performance. We used a genetic algorithm (GA) to select suitable features of the SSM/I bands in order to estimate the SD model. The results for the training and testing data on Sahand Mountain are R²_TEST = 0.9049 and RMSE = 6.9654, which show the high performance of SVR.
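
    A reduced sketch of the regression stage: the GA-based band selection is replaced here by a plain grid search (a deliberate simplification, not the paper's method), with an RBF-kernel SVR from scikit-learn on synthetic brightness-temperature features:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 7))                 # synthetic SSM/I channel features
y = 30 + 5 * X[:, 0] - 3 * X[:, 1] ** 2 + rng.normal(scale=2, size=n)  # SD, cm

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
search = GridSearchCV(
    SVR(kernel="rbf"),
    {"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.1]},
    cv=5,
)
search.fit(X_tr, y_tr)
print("best params:", search.best_params_)
print("test R^2:", search.best_estimator_.score(X_te, y_te))
```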

  15. Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions

    International Nuclear Information System (INIS)

    Bao Min; Shi Quanlin; Zhang Jiamei

    2004-01-01

    This paper describes the calculation of the net peak counts of the nuclide 137Cs at 662 keV in γ spectra from airborne radioactivity measurements using multiple linear regression. A mathematical model is constructed by analyzing every factor that contributes to the Cs peak counts in the spectra, and a multiple linear regression function is established. The calculation adopts stepwise regression, and insignificant factors are eliminated by an F-test. The regression results and their uncertainty are calculated using least squares estimation, from which the net counts of the Cs peak and their uncertainty are obtained. The analysis results for an experimental spectrum are displayed. The influence of energy shift and energy resolution on the results is discussed. In comparison with the spectrum stripping method, the multiple linear regression method needs no stripping ratios, the result depends only on the counts in the Cs peak, and the calculated uncertainty is reduced. (authors)

  16. Regression of esophageal varices and splenomegaly in two patients with hepatitis-C-related liver cirrhosis after interferon and ribavirin combination therapy

    Directory of Open Access Journals (Sweden)

    Soon Jae Lee

    2016-09-01

    Some recent studies have found regression of liver cirrhosis after antiviral therapy in patients with hepatitis C virus (HCV)-related liver cirrhosis, but there have been no reports of complete regression of esophageal varices after interferon/peg-interferon and ribavirin combination therapy. We describe two cases of complete regression of esophageal varices and splenomegaly after interferon-alpha and ribavirin combination therapy in patients with HCV-related liver cirrhosis. Esophageal varices and splenomegaly regressed after 3 and 8 years of sustained virologic response in cases 1 and 2, respectively. To our knowledge, this is the first study demonstrating that complications of liver cirrhosis, such as esophageal varices and splenomegaly, can regress after antiviral therapy in patients with HCV-related liver cirrhosis.

  17. Multitask Quantile Regression under the Transnormal Model.

    Science.gov (United States)

    Fan, Jianqing; Xue, Lingzhou; Zou, Hui

    2016-01-01

    We consider estimating multi-task quantile regression under the transnormal model, with a focus on the high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance matrices, and rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of the alternating direction method of multipliers, a nearest correlation matrix projection is introduced that inherits the sampling properties of the unprojected estimator. Our work combines the strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality in high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves an "oracle"-like convergence rate, and provides provable prediction intervals in the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to protein mass spectroscopy data.
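
    Quantile regression itself (without the rank-based covariance machinery the paper develops for the high-dimensional case) can be illustrated with statsmodels on synthetic heteroscedastic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 10, n)
# Heteroscedastic, heavy-tailed noise: the conditional quantiles fan out with x
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=n) * (0.2 + 0.1 * x)

X = sm.add_constant(x)
for q in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(y, X).fit(q=q)
    print(f"quantile {q}: intercept={fit.params[0]:.2f}, slope={fit.params[1]:.2f}")
```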

  18. Integrating travel behavior with land use regression to estimate dynamic air pollution exposure in Hong Kong.

    Science.gov (United States)

    Tang, Robert; Tian, Linwei; Thach, Thuan-Quoc; Tsui, Tsz Him; Brauer, Michael; Lee, Martha; Allen, Ryan; Yuchi, Weiran; Lai, Poh-Chin; Wong, Paulina; Barratt, Benjamin

    2018-04-01

    Epidemiological studies typically use subjects' residential address to estimate individuals' air pollution exposure. However, in reality this exposure is rarely static, as people move from home to work/study locations and commute during the day. Integrating mobility and time-activity data may reduce errors and biases, thereby improving estimates of health risks. To incorporate land use regression with movement and building infiltration data to estimate time-weighted air pollution exposures stratified by age, sex, and employment status for population subgroups in Hong Kong. A large population-representative survey (N = 89,385) was used to characterize travel behavior and derive the time-activity pattern for each subject. Infiltration factors calculated from indoor/outdoor monitoring campaigns were used to estimate micro-environmental concentrations. We evaluated dynamic and static (residential location-only) exposures in a staged modeling approach to quantify the effects of each component. Higher levels of exposure were found for working adults and students due to increased mobility. Compared to subjects aged 65 or older, exposures to PM2.5, BC, and NO2 were 13%, 39% and 14% higher, respectively, for subjects aged below 18, and 3%, 18% and 11% higher, respectively, for working adults. Exposures of females were approximately 4% lower than those of males. Dynamic exposures were around 20% lower than ambient exposures at residential addresses. The incorporation of infiltration and mobility increased heterogeneity in population exposure and allowed identification of highly exposed groups. The use of ambient concentrations may lead to exposure misclassification which introduces bias, resulting in lower effect estimates than 'true' exposures. Copyright © 2018 Elsevier Ltd. All rights reserved.
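
    The core bookkeeping, a time-weighted average of micro-environmental concentrations with indoor levels scaled by an infiltration factor, reduces to a few lines. All numbers below are made up for illustration:

```python
# Micro-environments visited in a day: (hours, ambient conc., infiltration factor)
# Infiltration = 1.0 outdoors; < 1.0 indoors where the building filters pollutants.
day = [
    (14, 35.0, 0.6),   # home (indoors)
    (8, 50.0, 0.7),    # workplace (indoors)
    (2, 60.0, 1.0),    # commuting (outdoors / in transit)
]

total_hours = sum(h for h, _, _ in day)
exposure = sum(h * c * f for h, c, f in day) / total_hours
print(f"dynamic time-weighted exposure: {exposure:.1f} ug/m3")
```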

  19. Robust logistic regression to narrow down the winner's curse for rare and recessive susceptibility variants.

    Science.gov (United States)

    Kesselmeier, Miriam; Lorenzo Bermejo, Justo

    2017-11-01

    Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at an unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber function and extended the R package 'robustbase' with the re-descending Hampel function to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power, but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
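
    The paper's estimators live in R's robustbase; as a loose illustration of the down-weighting idea only (not the paper's method), here is a one-step Huber-style reweighting of a logistic fit in statsmodels, where observations with large Pearson residuals get weights below one:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
g = rng.binomial(2, 0.05, n)              # genotype coded 0/1/2 (rare variant)
logit = -2.0 + 0.9 * g
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X = sm.add_constant(g.astype(float))

fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# One reweighting step: down-weight observations with large Pearson
# residuals using Huber's weight function (tuning constant c).
c = 1.345
r = fit.resid_pearson
w = np.where(np.abs(r) <= c, 1.0, c / np.abs(r))
refit = sm.GLM(y, X, family=sm.families.Binomial(), var_weights=w).fit()
print("standard GRR:", np.exp(fit.params[1]),
      "one-step reweighted GRR:", np.exp(refit.params[1]))
```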

  20. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    Science.gov (United States)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-breast-height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR technology provides a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of the point cloud, which are unique to different forest classes. An extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna, for a natural growth forest. Coordinates, height, and canopy cover were measured, and species were identified for comparison with LiDAR derivatives. Multiple linear regression was used to derive LiDAR-based DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated and evaluated automatically using Python scripts, with regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yields the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The best equation uses 11 parameters at the 10 m grid size, with an r-squared of 0.604, an AIC of 154.04 and a BIC of 175.08. The combination of parameters may differ among forest classes in further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test for Sphericity (BTS), can be supplemented to help determine the correlation among parameters.
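
    The search the abstract describes, enumerating parameter combinations, fitting an OLS model for each, and keeping the one with the best information criteria, can be sketched as follows. The predictors are synthetic; real work would iterate over all 27 LiDAR metrics:

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, p = 200, 6
X = rng.normal(size=(n, p))                     # stand-ins for LiDAR metrics
dbh = 30 + 4 * X[:, 0] + 2 * X[:, 2] + rng.normal(scale=3, size=n)

best = None
for k in range(1, p + 1):
    for combo in itertools.combinations(range(p), k):
        fit = sm.OLS(dbh, sm.add_constant(X[:, list(combo)])).fit()
        # Select on AIC; BIC or adjusted R^2 could be used the same way.
        if best is None or fit.aic < best[0]:
            best = (fit.aic, fit.bic, fit.rsquared, combo)

aic, bic, r2, combo = best
print(f"best subset {combo}: AIC={aic:.1f} BIC={bic:.1f} R^2={r2:.3f}")
```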

  1. Regression Model to Predict Global Solar Irradiance in Malaysia

    Directory of Open Access Journals (Sweden)

    Hairuniza Ahmed Kutty

    2015-01-01

    A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitation, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE), mean bias error (MBE), and the coefficient of determination (R²) with other models available in the literature. Seven models based on single parameters (PM1 to PM7) and five multiple-parameter models (PM8 to PM12) are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R² ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included in multiple-parameter models, although it performs fairly well in single-parameter prediction models.

  2. An investigation of the speeding-related crash designation through crash narrative reviews sampled via logistic regression.

    Science.gov (United States)

    Fitzpatrick, Cole D; Rakasi, Saritha; Knodler, Michael A

    2017-01-01

    Speed is one of the most important factors in traffic safety, as higher speeds are linked to increased crash risk and higher injury severities. Nearly a third of fatal crashes in the United States are designated as "speeding-related", defined as "the driver behavior of exceeding the posted speed limit or driving too fast for conditions." While many studies have utilized the speeding-related designation in safety analyses, no studies have examined the underlying accuracy of this designation. Herein, we investigate the speeding-related crash designation through the development of a series of logistic regression models that were derived from established speeding-related crash typologies and validated using a blind review, by multiple researchers, of 604 crash narratives. The developed logistic regression model accurately identified crashes which were not originally designated as speeding-related but had crash narratives that suggested speeding as a causative factor. Only 53.4% of crashes designated as speeding-related contained narratives which described speeding as a causative factor. Further investigation of these crashes revealed that the driver contributing code (DCC) of "driving too fast for conditions" was being used in three separate situations. Additionally, this DCC was also incorrectly used when "exceeding the posted speed limit" would likely have been a more appropriate designation. Finally, it was determined that the responding officer only utilized one DCC in 82% of crashes not designated as speeding-related but containing a narrative indicating speed as a contributing causal factor. The use of logistic regression models based upon speeding-related crash typologies offers a promising method by which all possible speeding-related crashes could be identified. Published by Elsevier Ltd.

  3. Multivariate and semiparametric kernel regression

    OpenAIRE

    Härdle, Wolfgang; Müller, Marlene

    1997-01-01

    The paper gives an introduction to the theory and application of multivariate and semiparametric kernel smoothing. Multivariate nonparametric density estimation is an often used pilot tool for examining the structure of data. Regression smoothing helps in investigating the association between covariates and responses. We concentrate on kernel smoothing using local polynomial fitting, which includes the Nadaraya-Watson estimator. Some theory on the asymptotic behavior and bandwidth selection is provided.
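
    For reference, the Nadaraya-Watson estimator mentioned above is just a kernel-weighted local average; a minimal NumPy version on synthetic data:

```python
import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    """m(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)."""
    u = (x_grid[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2)                 # Gaussian kernel
    return (K @ y) / K.sum(axis=1)

rng = np.random.default_rng(6)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)
grid = np.linspace(0, 2 * np.pi, 50)
m_hat = nadaraya_watson(grid, x, y, h=0.4)  # h is the smoothing bandwidth
print(m_hat[:5])
```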

  4. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  5. Estimation of operational parameters for a direct injection turbocharged spark ignition engine by using regression analysis and artificial neural network

    Directory of Open Access Journals (Sweden)

    Tosun Erdi

    2017-01-01

    This study was aimed at estimating the variation of several engine control parameters within the rotational speed-load map, using regression analysis and artificial neural network techniques. Duration of injection, specific fuel consumption, and exhaust gas temperature at the turbine inlet and within the catalytic converter brick were chosen as the output parameters for the models, while engine speed and brake mean effective pressure were selected as independent variables for prediction. Measurements were performed on a turbocharged direct injection spark ignition engine fueled with gasoline. A three-layer feed-forward structure and the back-propagation algorithm were used for training the artificial neural network. It was concluded that this technique is capable of predicting engine parameters with better accuracy than linear and non-linear regression techniques.
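
    A small scikit-learn analogue of the comparison (a generic stand-in, not the paper's exact network or data): a feed-forward network versus linear regression on a nonlinear engine-like response, with speed and load as inputs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 1000
speed = rng.uniform(1000, 6000, n)          # rpm (synthetic)
bmep = rng.uniform(2, 20, n)                # brake mean effective pressure, bar
X = np.column_stack([speed, bmep])
# Synthetic nonlinear response, e.g. a specific-fuel-consumption-like quantity
y = 250 + 0.01 * speed + 30 * np.sin(bmep / 3) + rng.normal(scale=5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lin = LinearRegression().fit(X_tr, y_tr)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                 random_state=0)).fit(X_tr, y_tr)
print("linear R^2:", lin.score(X_te, y_te))
print("MLP R^2:   ", mlp.score(X_te, y_te))
```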

  6. [Application of detecting and taking overdispersion into account in Poisson regression model].

    Science.gov (United States)

    Bouche, G; Lepage, B; Migeot, V; Ingrand, P

    2009-08-01

    Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of the variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact, with an illustration based on data from a study investigating the relationship between use of the Internet to seek health information and the number of primary care consultations. Three methods (overdispersed Poisson, a robust estimator, and negative binomial regression) were applied to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested overdispersion in the Poisson regression model using the ratio of the sum of squared Pearson residuals to the number of degrees of freedom (χ²/df). We then fitted the three models and compared the parameter estimates to those given by the Poisson regression model. The variance of the number of primary care consultations (Var[Y] = 21.03) was greater than the mean (E[Y] = 5.93) and the χ²/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of the estimates for two variables (using the Internet to seek health information, and single-parent family) would have changed according to the model retained, with significance levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (robust estimator) and 0.45 and 0.13 (negative binomial), respectively. Different methods exist to solve the problem of underestimated variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems particularly appropriate because of its theoretical distribution; in addition, this regression is easy to perform with ordinary statistical software packages.
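
    The diagnostic ratio and the negative binomial refit can be sketched with statsmodels on synthetic overdispersed counts:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
mu = np.exp(1.0 + 0.3 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))    # overdispersed counts, E[y] = mu

X = sm.add_constant(x)
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
ratio = pois.pearson_chi2 / pois.df_resid          # chi^2/df; >> 1 flags overdispersion
print("overdispersion ratio:", round(ratio, 2))

nb = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
print("Poisson SE:", pois.bse[1], "negative binomial SE:", nb.bse[1])
```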

  7. Estimation of Production KWS Maize Hybrids Using Nonlinear Regression

    Directory of Open Access Journals (Sweden)

    Florica MORAR

    2018-06-01

    This article approaches the model of non-linear regression and the method of least squares with examples, including calculations for a logarithmic model. This required data obtained from a study which involved observing the phases of growth and development of KWS maize hybrids, in order to analyze the influence of the MMB quality indicator on grain production per hectare.
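
    Fitting a logarithmic model y = a + b ln(x) by least squares is a one-liner with SciPy; the data below are synthetic stand-ins for the MMB/yield measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, b):
    return a + b * np.log(x)

rng = np.random.default_rng(9)
x = rng.uniform(200, 400, 40)                      # e.g. MMB values (synthetic)
y = log_model(x, -50.0, 12.0) + rng.normal(scale=1.0, size=40)  # yield proxy

params, cov = curve_fit(log_model, x, y)
a, b = params
print(f"fit: y = {a:.2f} + {b:.2f} ln(x); std errors: {np.sqrt(np.diag(cov))}")
```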

  8. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit, using nonparametric statistical tests: the one-sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one-sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models, because only slight differences between the errors (presented through the relative residuals) were obtained. Estimation of the significance of the differences in the relative residuals was achieved using the Wilcoxon signed rank test, where the linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.

  9. Allelic drop-out probabilities estimated by logistic regression

    DEFF Research Database (Denmark)

    Tvedebrink, Torben; Eriksen, Poul Svante; Asplund, Maria

    2012-01-01

    We discuss the model for estimating drop-out probabilities presented by Tvedebrink et al. [7] and the concerns that have been raised. The criticism of the model has demonstrated that the model is not perfect. However, the model is very useful for advanced forensic genetic work where allelic drop-out is occurring. With this discussion, we hope to improve the drop-out model so that it can be used for practical forensic genetics, and to stimulate further discussion. We discuss how to estimate drop-out probabilities when using a varying number of PCR cycles and other experimental conditions.

  10. How Do Different Aspects of Spatial Skills Relate to Early Arithmetic and Number Line Estimation?

    Directory of Open Access Journals (Sweden)

    Véronique Cornu

    2017-12-01

    The present study investigated the predictive role of spatial skills for arithmetic and number line estimation in kindergarten children (N = 125). Spatial skills are known to be related to mathematical development, but due to the construct's non-unitary nature, different aspects of spatial skills need to be differentiated. In the present study, a spatial orientation task, a spatial visualization task and a visuo-motor integration task were administered to assess three different aspects of spatial skills. Furthermore, we assessed counting abilities, knowledge of Arabic numerals, quantitative knowledge, as well as verbal working memory and verbal intelligence in kindergarten. Four months later, the same children performed an arithmetic and a number line estimation task to evaluate how the abilities measured at Time 1 predicted early mathematics outcomes. Hierarchical regression analysis revealed that children's performance in arithmetic was predicted by their performance on the spatial orientation and visuo-motor integration tasks, as well as their knowledge of Arabic numerals. Performance in number line estimation was significantly predicted by the children's spatial orientation performance. Our findings emphasize the role of spatial skills, notably spatial orientation, in mathematical development. The relation between spatial orientation and arithmetic was partially mediated by the number line estimation task. Our results further show that some aspects of spatial skills might be more predictive of mathematical development than others, underlining the importance of differentiating within the construct of spatial skills when it comes to understanding numerical development.

  11. Alternative Methods of Regression

    CERN Document Server

    Birkes, David

    2011-01-01

    Of related interest: Nonlinear Regression Analysis and its Applications, Douglas M. Bates and Donald G. Watts. "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics. This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data sets

  12. EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

    Science.gov (United States)

    Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

    2018-01-01

    In a number of environmental studies, relationships between natural processes are often assessed through regression analyses using time series data. Such data are often multi-scale and non-stationary, leading to poor accuracy of the resulting regression models and therefore to results of moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists in applying the empirical mode decomposition (EMD) algorithm to the data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts for the non-stationarity of the data series. Second, the approach acts as a scan for the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology, it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results shed new light on the studied relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, a scale not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more detail than classical models concerning the relationship.
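
    A minimal sketch of the two-stage idea, assuming the third-party PyEMD package (pip name EMD-signal) for the decomposition: decompose the predictor into intrinsic mode functions (IMFs), then regress the response on the components so that each time scale gets its own coefficient. The signals below are synthetic:

```python
import numpy as np
import statsmodels.api as sm
from PyEMD import EMD   # third-party package: pip install EMD-signal

rng = np.random.default_rng(10)
t = np.linspace(0, 10, 1000)
# Multi-scale predictor: slow cycle + fast cycle + trend
temp = np.sin(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * 8 * t) + 0.1 * t
# Response associated with the slow scale only
mortality = 2.0 * np.sin(2 * np.pi * t) + rng.normal(scale=0.5, size=t.size)

imfs = EMD().emd(temp)                # rows: IMFs from fast to slow (last row: residual trend)
X = sm.add_constant(imfs.T)           # one regression column per time scale
fit = sm.OLS(mortality, X).fit()
print(fit.params)                     # reveals which scales carry the association
```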

  13. Spatial correlation in Bayesian logistic regression with misclassification

    DEFF Research Database (Denmark)

    Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose

    2014-01-01

    Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data...

  14. Panel data specifications in nonparametric kernel regression

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...

  15. Establishment of regression dependences. Linear and nonlinear dependences

    International Nuclear Information System (INIS)

    Onishchenko, A.M.

    1994-01-01

    The main problems in determining linear and 19 types of nonlinear regression dependences are discussed in full. It is taken into account that the total dispersions are the sum of the measurement dispersions and the dispersions of parameter variation themselves. Approaches to determining all the dispersions are described. It is shown that the least squares fit gives inconsistent estimates for industrial objects and processes. Correction methods that take into account comparable measurement errors in both variables make it possible to obtain consistent estimates of the regression equation parameters. The condition under which application of the correction technique is expedient is given. A technique for determining nonlinear regression dependences that takes into account the form of the dependence and comparable errors in both variables is described. 6 refs., 1 tab
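
    When both variables carry comparable measurement error, ordinary least squares is biased toward a zero slope; one standard correction of this kind is Deming regression, which uses the error-variance ratio δ. A compact NumPy sketch on synthetic data (Deming regression is given here as a representative errors-in-variables correction, not necessarily the specific technique of the paper):

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Deming regression; delta = var(err_y) / var(err_x)."""
    mx, my = x.mean(), y.mean()
    sxx = np.mean((x - mx) ** 2)
    syy = np.mean((y - my) ** 2)
    sxy = np.mean((x - mx) * (y - my))
    slope = (syy - delta * sxx + np.sqrt((syy - delta * sxx) ** 2
             + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

rng = np.random.default_rng(11)
x_true = np.linspace(0, 10, 100)
x = x_true + rng.normal(scale=0.8, size=100)               # error in x
y = 2.0 * x_true + 1.0 + rng.normal(scale=0.8, size=100)   # error in y

print("OLS slope:   ", np.polyfit(x, y, 1)[0])  # attenuated toward zero
print("Deming slope:", deming(x, y)[0])         # closer to the true value 2.0
```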

  16. Relative Pose Estimation Algorithm with Gyroscope Sensor

    Directory of Open Access Journals (Sweden)

    Shanshan Wei

    2016-01-01

    This paper proposes a novel vision and inertial fusion algorithm, S2fM (Simplified Structure from Motion), for camera relative pose estimation. Unlike existing algorithms, our algorithm estimates the rotation parameter and the translation parameter separately. S2fM employs gyroscopes to estimate the camera rotation parameter, which is later fused with the image data to estimate the camera translation parameter. Our contributions are in two aspects. (1) Given that no inertial sensor can estimate the translation parameter accurately enough, we propose a translation estimation algorithm that fuses gyroscope data and image data. (2) Our S2fM algorithm is efficient and suitable for smart devices. Experimental results validate the efficiency of the proposed S2fM algorithm.

  17. Estimation of genotype × environment interactions, in a grass-based system, for milk yield, body condition score, and body weight using random regression models

    NARCIS (Netherlands)

    Berry, D.P.; Buckley, F.; Dillon, P.; Evans, R.D.; Rath, M.; Veerkamp, R.F.

    2003-01-01

    (Co)variance components for milk yield, body condition score (BCS), body weight (BW), BCS change and BW change over different herd-year mean milk yields (HMY) and nutritional environments (concentrate feeding level, grazing severity and silage quality) were estimated using a random regression model.

  18. A regression approach for zircaloy-2 in-reactor creep constitutive equations

    International Nuclear Information System (INIS)

    Yung Liu, Y.; Bement, A.L.

    1977-01-01

    In this paper the methodology of multiple regression as applied to Zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation are illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. From the data analysis and model development points of view, both the assumption of independence and prior commitment to specific model forms are unacceptable. One would desire means which can not only estimate the required parameters directly from data but also provide a basis for model selection, viz., testing one model against others. Basic understanding of the physics of deformation is important in choosing the forms of the starting physical model equations, but the justification must rely on their ability to correlate the overall data. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) When more than one variable is involved, there is no need to assume that each variable affects the response independently. No separate normalizations are required either, and the estimation of parameters is obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as the R²- and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions, and residual plots, etc., provide diagnostic tools for model selection

  19. The Initial Regression Statistical Characteristics of Intervals Between Zeros of Random Processes

    Directory of Open Access Journals (Sweden)

    V. K. Hohlov

    2014-01-01

    The article substantiates the coefficients of initial regression (CIR) as statistical characteristics of the intervals between zeros of realizations of random processes, and studies the properties that allow these features to be used in autonomous information systems (AIS) of near location (NL). The CIR that minimize the residual sum of squares of multiple initial regression are justified on the basis of vector representations, in which the analyzed signal parameters are treated as a random vector. It is shown that even without covariance-based partial CIR it is possible to predict one random variable from another with respect to the deterministic components. The paper studies how the CIR of the interval sizes between zeros of a narrowband, wide-sense stationary random process depend on its energy spectrum. Particular CIR for random processes with Gaussian and rectangular energy spectra are obtained. It is shown that the considered CIR do not depend on the average frequency of the spectra, are determined by the relative bandwidth of the energy spectra, and depend only weakly on the type of spectrum. These properties enable the use of the CIR as an informative parameter when implementing temporal regression methods of signal processing that are invariant to the average rate and variance of the input realizations. We consider estimates of the average energy-spectrum frequency of a stationary random process obtained by calculating the length of the time interval corresponding to a specified number of intervals between zeros. It is shown that, with increasing relative bandwidth, the relative variance of the estimate of the average energy-spectrum frequency ceases to depend on the particular process realization when more than ten intervals between zeros are processed. The obtained results can be used in AIS NL to solve the tasks of detection and signal recognition, when a decision is made in conditions of unknown mathematical expectations on a limited observation

  20. A novel relational regularization feature selection method for joint regression and classification in AD diagnosis.

    Science.gov (United States)

    Zhu, Xiaofeng; Suk, Heung-Il; Wang, Li; Lee, Seong-Whan; Shen, Dinggang

    2017-05-01

    In this paper, we focus on joint regression and classification for Alzheimer's disease diagnosis and propose a new feature selection method by embedding the relational information inherent in the observations into a sparse multi-task learning framework. Specifically, the relational information includes three kinds of relationships (such as feature-feature relation, response-response relation, and sample-sample relation), for preserving three kinds of the similarity, such as for the features, the response variables, and the samples, respectively. To conduct feature selection, we first formulate the objective function by imposing these three relational characteristics along with an ℓ2,1-norm regularization term, and further propose a computationally efficient algorithm to optimize the proposed objective function. With the dimension-reduced data, we train two support vector regression models to predict the clinical scores of ADAS-Cog and MMSE, respectively, and also a support vector classification model to determine the clinical label. We conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to validate the effectiveness of the proposed method. Our experimental results showed the efficacy of the proposed method in enhancing the performances of both clinical scores prediction and disease status identification, compared to the state-of-the-art methods. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Blind estimation of a ship's relative wave heading

    DEFF Research Database (Denmark)

    Nielsen, Ulrik Dam; Iseki, Toshio

    2012-01-01

    This article proposes a method to estimate a ship's relative heading against the waves. The procedure relies purely on shipboard measurements of global responses such as motion components, accelerations and the bending moment amidships. There is no particular (mathematical) model connected to the estimate, and therefore it is called a 'blind estimate'. In this introductory study, the approach is tested by analysing simulated data. The analysis reveals that it is possible to estimate a ship's relative heading on the basis of shipboard measurements only.

  2. Solving the Omitted Variables Problem of Regression Analysis Using the Relative Vertical Position of Observations

    Directory of Open Access Journals (Sweden)

    Jonathan E. Leightner

    2012-01-01

    The omitted variables problem is one of regression analysis' most serious problems. The standard approach to the omitted variables problem is to find instruments, or proxies, for the omitted variables, but this approach makes strong assumptions that are rarely met in practice. This paper introduces best projection reiterative truncated projected least squares (BP-RTPLS), the third generation of a technique that solves the omitted variables problem without using proxies or instruments. This paper presents a theoretical argument that BP-RTPLS produces unbiased reduced-form estimates when there are omitted variables. This paper also provides simulation evidence that shows OLS produces between 250% and 2450% more errors than BP-RTPLS when there are omitted variables and when measurement and round-off error is 1 percent or less. In an example, the government spending multiplier is estimated using annual data for the USA between 1929 and 2010.

  3. Gaussian Process Regression Model in Spatial Logistic Regression

    Science.gov (United States)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One of the most popular approaches is based on the neighbourhood of the regions. Unfortunately, there are some limitations, such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we focus on spatial modeling with GPR for binomial data with a logit link function. The performance of the model is investigated. We discuss inference, that is, how to estimate the parameters and hyper-parameters, and how to predict. Furthermore, simulation studies are explained in the last section.
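
    scikit-learn's Gaussian process classifier uses essentially this construction (a GP latent function squashed through a logistic link), so a spatial toy example is short. The coordinates and labels below are synthetic:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(12)
coords = rng.uniform(0, 10, size=(200, 2))        # spatial locations (x, y)
# Synthetic binary outcome with smooth spatial structure
p = 1 / (1 + np.exp(-(np.sin(coords[:, 0]) + np.cos(coords[:, 1]))))
y = rng.binomial(1, p)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=2.0)).fit(coords, y)
new_sites = np.array([[2.5, 7.0], [8.0, 1.0]])
print(gpc.predict_proba(new_sites))               # spatial prediction at new sites
```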

  4. Estimation of the Seemingly Unrelated Regression (SUR) Model with the Generalized Least Square (GLS) Method

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2015-04-01

    Regression analysis is a statistical tool used to determine the relationship between two or more quantitative variables, so that one variable can be predicted from the others. A method that can be used to obtain good estimates in regression analysis is the ordinary least squares (OLS) method. The least squares method estimates the parameters of one or more regression equations, but it does not allow for relationships among the errors of the different response equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which the parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model with the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS, because SUR produces smaller errors than OLS.
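
    A sketch of a SUR fit, assuming the third-party linearmodels package (its system-estimation API is an assumption to verify against your installed version); GLS gains efficiency by exploiting the correlation between the two equations' errors. All data are synthetic:

```python
import numpy as np
import pandas as pd
from linearmodels.system import SUR   # third-party: pip install linearmodels

rng = np.random.default_rng(13)
n = 200
income = rng.normal(10, 2, n)
price = rng.normal(5, 1, n)
# Two demand equations whose errors are correlated across equations
e = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
y1 = 1.0 + 0.8 * income - 0.5 * price + e[:, 0]
y2 = 0.5 + 0.6 * income - 0.3 * price + e[:, 1]

X = pd.DataFrame({"const": 1.0, "income": income, "price": price})
eqs = {"market_a": {"dependent": y1, "exog": X},
       "market_b": {"dependent": y2, "exog": X}}
res = SUR(eqs).fit(method="gls")      # GLS exploits the cross-equation correlation
print(res)
```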

  5. Estimation of the Seemingly Unrelated Regression (SUR) Model with the Generalized Least Square (GLS) Method

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2014-06-01

    Regression analysis is a statistical tool used to determine the relationship between two or more quantitative variables, so that one variable can be predicted from the others. A method that can be used to obtain good estimates in regression analysis is the ordinary least squares (OLS) method. The least squares method estimates the parameters of one or more regression equations, but it does not allow for relationships among the errors of the different response equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which the parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model with the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS, because SUR produces smaller errors than OLS.

  6. Estimation of nutrients and organic matter in Korean swine slurry using multiple regression analysis of physical and chemical properties.

    Science.gov (United States)

    Suresh, Arumuganainar; Choi, Hong Lim

    2011-10-01

    Swine waste land application has increased due to organic fertilization, but excess application in an arable system can cause environmental risk. Therefore, in situ characterization of such resources is important prior to application. To explore this, 41 swine slurry samples were collected from Korea, and wide differences were observed in the physico-biochemical properties. However, significant (P < 0.05) relationships were found, and simple measurements with a hydrometer, EC meter, drying oven and pH meter were useful to estimate Mn, Fe, Ca, K, Al, Na, N and 5-day biochemical oxygen demand (BOD₅), with improved R² values of 0.83, 0.82, 0.77, 0.75, 0.67, 0.47, 0.88 and 0.70, respectively. The results from this study suggest that multiple-property regressions can facilitate the prediction of micronutrients and organic matter much better than a single-property regression for livestock waste. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. Mapping of the DLQI scores to EQ-5D utility values using ordinal logistic regression.

    Science.gov (United States)

    Ali, Faraz Mahmood; Kay, Richard; Finlay, Andrew Y; Piguet, Vincent; Kupfer, Joerg; Dalgard, Florence; Salek, M Sam

    2017-11-01

    The Dermatology Life Quality Index (DLQI) and the European Quality of Life-5 Dimension (EQ-5D) are separate measures that may be used to gather health-related quality of life (HRQoL) information from patients. The EQ-5D is a generic measure from which health utility estimates can be derived, whereas the DLQI is a specialty-specific measure to assess HRQoL. To reduce the burden of multiple measures being administered and to enable a more disease-specific calculation of health utility estimates, we explored an established mathematical technique known as ordinal logistic regression (OLR) to develop an appropriate model to map DLQI data to EQ-5D-based health utility estimates. Retrospective data from 4010 patients were randomly divided five times into two groups for the derivation and testing of the mapping model. Split-half cross-validation was utilized resulting in a total of ten ordinal logistic regression models for each of the five EQ-5D dimensions against age, sex, and all ten items of the DLQI. Using Monte Carlo simulation, predicted health utility estimates were derived and compared against those observed. This method was repeated for both OLR and a previously tested mapping methodology based on linear regression. The model was shown to be highly predictive and its repeated fitting demonstrated a stable model using OLR as well as linear regression. The mean differences between OLR-predicted health utility estimates and observed health utility estimates ranged from 0.0024 to 0.0239 across the ten modeling exercises, with an average overall difference of 0.0120 (a 1.6% underestimate, not of clinical importance). This modeling framework developed in this study will enable researchers to calculate EQ-5D health utility estimates from a specialty-specific study population, reducing patient and economic burden.
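
    A sketch of the mapping idea with statsmodels' OrderedModel, using hypothetical column names and synthetic scores; the actual study fits one ordinal model per EQ-5D dimension on age, sex, and all ten DLQI items, then derives utilities by simulation:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(14)
n = 1000
dlqi_total = rng.integers(0, 31, n)                     # DLQI total score 0-30
age = rng.integers(18, 80, n)
# Synthetic EQ-5D dimension response: 3 ordered levels (no/some/extreme problems)
latent = 0.15 * dlqi_total + 0.01 * age + rng.logistic(size=n)
eq5d_dim = pd.cut(latent, [-np.inf, 2, 4, np.inf], labels=False)

X = pd.DataFrame({"dlqi": dlqi_total, "age": age})
fit = OrderedModel(eq5d_dim, X, distr="logit").fit(method="bfgs", disp=False)
print(fit.params)                  # slopes plus threshold parameters
print(fit.predict(X.iloc[:3]))     # predicted level probabilities per subject
```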

  8. Non-Invasive Methodology to Estimate Polyphenol Content in Extra Virgin Olive Oil Based on Stepwise Multilinear Regression.

    Science.gov (United States)

    Martínez Gila, Diego Manuel; Cano Marchal, Pablo; Gómez Ortega, Juan; Gámez García, Javier

    2018-03-25

    Normally, olive oil quality is assessed by chemical analysis according to international standards. These norms define chemical and organoleptic markers, and depending on the markers, the olive oil can be labelled as lampante, virgin, or extra virgin olive oil (EVOO), the last being an indicator of top quality. The polyphenol content is related to EVOO organoleptic features, and different scientific works have studied the positive influence that these compounds have on human health. The work carried out in this paper focuses on the relationship between the polyphenol content of olive oil samples and their spectral response in the near-infrared spectrum. In this context, several acquisition parameters were assessed to optimize the measurement process within the virgin olive oil production process. The best regression model reached a mean error of 156.14 mg/kg in leave-one-out cross-validation, and the highest regression coefficient was 0.81 in holdout validation.

  9. Non-Invasive Methodology to Estimate Polyphenol Content in Extra Virgin Olive Oil Based on Stepwise Multilinear Regression

    Directory of Open Access Journals (Sweden)

    Diego Manuel Martínez Gila

    2018-03-01

    Normally, olive oil quality is assessed by chemical analysis according to international standards. These norms define chemical and organoleptic markers, and depending on the markers, the olive oil can be labelled as lampante, virgin, or extra virgin olive oil (EVOO), the last being an indicator of top quality. The polyphenol content is related to EVOO organoleptic features, and different scientific works have studied the positive influence that these compounds have on human health. The work carried out in this paper focuses on the relationship between the polyphenol content of olive oil samples and their spectral response in the near-infrared spectrum. In this context, several acquisition parameters were assessed to optimize the measurement process within the virgin olive oil production process. The best regression model reached a mean error of 156.14 mg/kg in leave-one-out cross-validation, and the highest regression coefficient was 0.81 in holdout validation.

  10. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

    Science.gov (United States)

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Unwanted pregnancy, not intended by at least one of the parents, has undesirable consequences for the family and society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by stratified and cluster sampling; relevant variables were measured, and logistic regression, discriminant analysis, and probit regression models, with SPSS software version 21, were used to predict unwanted pregnancy. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity, pregnancy spacing, contraceptive methods, household income and the number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.

  11. Estimating and mapping forest biomass using regression models and Spot-6 images (case study: Hyrcanian forests of north of Iran).

    Science.gov (United States)

    Motlagh, Mohadeseh Ghanbari; Kafaky, Sasan Babaie; Mataji, Asadollah; Akhavan, Reza

    2018-05-21

    The Hyrcanian forests of northern Iran are of great importance in terms of various economic and environmental aspects. In this study, Spot-6 satellite images and regression models were applied to estimate above-ground biomass in these forests. The research was carried out in six compartments in three climate types (semi-arid to humid) and two altitude classes. In the first step, ground sampling methods at the compartment level were used to estimate above-ground biomass (Mg/ha). Then, by reviewing the results of other studies, the most appropriate vegetation indices were selected. In this study, three indices were calculated: NDVI, RVI, and TVI. We investigated the relationship between the vegetation indices and above-ground biomass measured at the sample-plot level. Based on the results, the relationship between above-ground biomass values and vegetation indices was linear, with the highest level of significance for NDVI in all compartments. Since the correlation coefficient between NDVI and above-ground biomass was the highest at the compartment level, NDVI was used for mapping above-ground biomass. According to the results of this study, biomass values differed greatly between climatic and altitudinal classes, with the highest biomass observed in the humid climate and high-altitude class.
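
    The index driving the final map is simple to compute from the red and near-infrared bands; a minimal NumPy version with a linear biomass fit, using synthetic reflectances:

```python
import numpy as np

rng = np.random.default_rng(15)
red = rng.uniform(0.02, 0.2, 100)            # red-band reflectance (synthetic)
nir = rng.uniform(0.2, 0.6, 100)             # near-infrared reflectance (synthetic)

ndvi = (nir - red) / (nir + red)             # NDVI in [-1, 1]
biomass = 150 * ndvi + rng.normal(scale=10, size=100)  # Mg/ha, synthetic truth

slope, intercept = np.polyfit(ndvi, biomass, 1)        # linear NDVI-biomass model
print(f"biomass ~ {intercept:.1f} + {slope:.1f} * NDVI (Mg/ha)")
```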

  12. A regression approach for Zircaloy-2 in-reactor creep constitutive equations

    International Nuclear Information System (INIS)

    Yung Liu, Y.; Bement, A.L.

    1977-01-01

    In this paper the methodology of multiple regression as applied to Zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation are illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) When more than one variable is involved, there is no need to assume that each variable affects the response independently. No separate normalizations are required either, and the estimation of parameters is obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as the R²- and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions, and residual plots, etc., provide diagnostic tools for model selection. Multiple regression analysis performed on a set of carefully selected Zircaloy-2 in-reactor creep data leads to a model which provides excellent correlations for the data. (Auth.)

  13. Regression models in the determination of the absorbed dose with extrapolation chamber for ophthalmological applicators

    International Nuclear Information System (INIS)

    Alvarez R, J.T.; Morales P, R.

    1992-06-01

    The absorbed dose to soft-tissue-equivalent material imparted by ophthalmological applicators (90Sr/90Y, 1850 MBq) is determined using an extrapolation chamber with variable electrodes. When the slope of the extrapolation curve is estimated using a simple linear regression model, the dose values are underestimated by 17.7% to 20.4% relative to the estimates obtained with a second-degree polynomial regression model; at the same time, an improvement of up to 50% in the standard error is observed for the quadratic model. Finally, the global uncertainty of the dose is presented, taking into account the reproducibility of the experimental arrangement. In conclusion, for experimental arrangements where the source is in contact with the extrapolation chamber, it is recommended to replace the linear regression model with the quadratic regression model when determining the slope of the extrapolation curve, for more exact and accurate measurements of the absorbed dose. (Author)
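
    The practical point, that the fitted slope at zero gap depends on whether curvature is modelled, can be shown with NumPy's polynomial fits. The readings below are synthetic and the direction of the bias depends on the sign of the curvature; this only illustrates the mechanism:

```python
import numpy as np

rng = np.random.default_rng(16)
gap = np.linspace(0.5, 3.0, 12)                      # electrode spacing, mm
# Synthetic ionization current with slight curvature plus noise
current = 2.0 * gap + 0.15 * gap**2 + rng.normal(scale=0.02, size=gap.size)

lin = np.polyfit(gap, current, 1)                    # linear model
quad = np.polyfit(gap, current, 2)                   # quadratic model

# The dose is proportional to the slope of the extrapolation curve at zero gap.
print("slope at zero, linear fit:   ", lin[0])       # biased by the curvature
print("slope at zero, quadratic fit:", quad[1])      # closer to the true 2.0
```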

  14. The Use of Alternative Regression Methods in Social Sciences and the Comparison of Least Squares and M Estimation Methods in Terms of the Determination of Coefficient

    Science.gov (United States)

    Coskuntuncel, Orkun

    2013-01-01

    The purpose of this study is two-fold: the first aim is to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination (R²). For this purpose,…
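
    The contrast the study draws can be reproduced with statsmodels: OLS versus a Huber M-estimator on data with a few gross outliers, comparing how the fitted slopes (and hence any R²-type summary) react. The data are synthetic:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(17)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)
y[:5] += 25                                   # a handful of gross outliers

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # Huber M-estimator

print("OLS slope:  ", ols.params[1])   # pulled by the outliers
print("Huber slope:", rlm.params[1])   # stays near the true 1.5
```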

  15. A covariance correction that accounts for correlation estimation to improve finite-sample inference with generalized estimating equations: A study on its applicability with structured correlation matrices.

    Science.gov (United States)

    Westgate, Philip M

    2016-01-01

    When generalized estimating equations (GEE) incorporate an unstructured working correlation matrix, the variances of regression parameter estimates can inflate due to the estimation of the correlation parameters. In previous work, an approximation for this inflation that results in a corrected version of the sandwich formula for the covariance matrix of regression parameter estimates was derived. Use of this correction for correlation structure selection also reduces the over-selection of the unstructured working correlation matrix. In this manuscript, we conduct a simulation study to demonstrate that an increase in variances of regression parameter estimates can occur when GEE incorporates structured working correlation matrices as well. Correspondingly, we show the ability of the corrected version of the sandwich formula to improve the validity of inference and correlation structure selection. We also study the relative influences of two popular corrections to a different source of bias in the empirical sandwich covariance estimator.
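
    For context, fitting a GEE with a structured working correlation in statsmodels looks like this on synthetic clustered data; the covariance correction the paper studies is a post-fit adjustment to the sandwich estimator and is not shown here:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(18)
n_clusters, m = 50, 4
groups = np.repeat(np.arange(n_clusters), m)
u = np.repeat(rng.normal(scale=0.5, size=n_clusters), m)   # shared cluster effect
x = rng.normal(size=n_clusters * m)
y = 1.0 + 0.5 * x + u + rng.normal(size=n_clusters * m)

X = sm.add_constant(x)
gee = sm.GEE(y, X, groups=groups,
             cov_struct=sm.cov_struct.Exchangeable(),
             family=sm.families.Gaussian()).fit()
print(gee.summary())
print(gee.cov_struct.summary())        # estimated working correlation
```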

  16. Time-resolved flow reconstruction with indirect measurements using regression models and Kalman-filtered POD ROM

    Science.gov (United States)

    Leroux, Romain; Chatellier, Ludovic; David, Laurent

    2018-01-01

    This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied to a proper orthogonal decomposition reduced-order model, to stabilize it while reducing the effects of the hot-film sensor noise. The method is assessed on the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of 20°. Comparisons with multi-time-delay modified linear stochastic estimation show that both OPLSR and the EnKF combined with OPLSR are more accurate, producing a much lower relative estimation error and providing a faithful reconstruction of the time evolution of the velocity flow fields.
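
    The regression backbone, mapping a sensor signal to field coefficients by partial least squares, has a standard scikit-learn counterpart (ordinary PLS, not the orthonormalized variant of the paper), shown here mapping a scalar signal and its time delays to a few POD-like coefficients. Everything below is synthetic:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(19)
n = 500
signal = rng.normal(size=n + 4)
# Embed the sensor signal with time delays, as in multi-time estimators
X = np.column_stack([signal[k:k + n] for k in range(5)])
# Synthetic "POD coefficients" linearly related to the delayed signal
true_W = rng.normal(size=(5, 3))
Y = X @ true_W + rng.normal(scale=0.1, size=(n, 3))

pls = PLSRegression(n_components=3).fit(X, Y)
print("R^2 of field-coefficient reconstruction:", pls.score(X, Y))
```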

  17. Piecewise linear regression splines with hyperbolic covariates

    International Nuclear Information System (INIS)

    Cologne, John B.; Sposto, Richard

    1992-09-01

    Consider the problem of fitting a curve to data that exhibit a multiphase linear response with smooth transitions between phases. We propose substituting hyperbolas as covariates in piecewise linear regression splines to obtain curves that are smoothly joined. The method provides an intuitive and easy way to extend the two-phase linear hyperbolic response model of Griffiths and Miller and Watts and Bacon to accommodate more than two linear segments. The resulting regression spline with hyperbolic covariates may be fit by nonlinear regression methods to estimate the degree of curvature between adjoining linear segments. The added complexity of fitting nonlinear, as opposed to linear, regression models is not great. The extra effort is particularly worthwhile when investigators are unwilling to assume that the slope of the response changes abruptly at the join points. We can also estimate the join points (the values of the abscissas where the linear segments would intersect if extrapolated) if their number and approximate locations may be presumed known. An example using data on changing age at menarche in a cohort of Japanese women illustrates the use of the method for exploratory data analysis. (author)
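
A minimal sketch of the idea on synthetic data, assuming a single join point: the hyperbolic covariate below smoothly approximates a linear-spline hinge, and the join point and degree of curvature are estimated by nonlinear least squares. The function names and parameter values are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def smooth_hinge(x, knot, gamma):
    """Hyperbolic covariate: approaches max(0, x - knot) as gamma -> 0,
    so it acts like a linear-spline basis with a rounded corner."""
    return 0.5 * ((x - knot) + np.sqrt((x - knot) ** 2 + gamma ** 2))

def two_phase(x, b0, b1, b2, knot, gamma):
    # Two linear segments joined smoothly at `knot`.
    return b0 + b1 * x + b2 * smooth_hinge(x, knot, gamma)

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 120)
y = two_phase(x, 1.0, 0.3, 1.2, 5.0, 0.8) + rng.normal(scale=0.2, size=x.size)

popt, _ = curve_fit(two_phase, x, y, p0=[0.0, 0.0, 1.0, 4.0, 1.0])
print("estimated join point:", popt[3], " curvature parameter:", popt[4])
```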

  18. Testing the Perturbation Sensitivity of Abortion-Crime Regressions

    Directory of Open Access Journals (Sweden)

    Michał Brzeziński

    2012-06-01

    Full Text Available The hypothesis that the legalisation of abortion contributed significantly to the reduction of crime in the United States in the 1990s is one of the most prominent ideas from the recent “economics-made-fun” movement sparked by the book Freakonomics. This paper expands on the existing literature about the computational stability of abortion-crime regressions by testing the sensitivity of coefficients’ estimates to small amounts of data perturbation. In contrast to previous studies, we use a new data set on crime correlates for each of the US states, the original model specification and estimation methodology, and an improved data perturbation algorithm. We find that the coefficients’ estimates in abortion-crime regressions are not computationally stable and, therefore, are unreliable.
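
The general form of such a perturbation test can be sketched as follows on synthetic data; the paper's actual data set and improved perturbation algorithm are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta = np.array([0.5, 1.0, -0.5, 0.25])
y = X @ beta + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

base = ols(X, y)

# Re-estimate after adding tiny uniform noise (here +/- 1e-4) to the
# non-constant columns and the response; for a computationally stable
# regression the coefficient spread should be negligible.
eps = 1e-4
draws = []
for _ in range(500):
    Xp = X.copy()
    Xp[:, 1:] += rng.uniform(-eps, eps, size=(n, 3))
    yp = y + rng.uniform(-eps, eps, size=n)
    draws.append(ols(Xp, yp))

print("baseline coefficients:", base)
print("std. dev. under perturbation:", np.std(draws, axis=0))
```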

  19. Radiologic assessment of third molar tooth and spheno-occipital synchondrosis for age estimation: a multiple regression analysis study.

    Science.gov (United States)

    Demirturk Kocasarac, Husniye; Sinanoglu, Alper; Noujeim, Marcel; Helvacioglu Yigit, Dilek; Baydemir, Canan

    2016-05-01

    For forensic age estimation, radiographic assessment of third molar mineralization is important between 14 and 21 years, a range which coincides with the legal age in most countries. The spheno-occipital synchondrosis (SOS) is an important growth site during development, and its use for age estimation is beneficial when combined with other markers. In this study, we aimed to develop a regression model to estimate and narrow the age range based on the radiologic assessment of the third molar and the SOS in a Turkish subpopulation. Panoramic radiographs and cone beam CT scans of 349 subjects (182 males, 167 females) aged between 8 and 25 years were evaluated. A four-stage system was used to evaluate the fusion degree of the SOS, and Demirjian's eight stages of development were used for third molar calcification. The Pearson correlation indicated a strong positive relationship between age and third molar calcification for both sexes (r = 0.850 for females, r = 0.839 for males; P < 0.001) and also between age and SOS fusion for females (r = 0.814), whereas a moderate relationship was found for males (r = 0.599; P < 0.001). Based on the results obtained, an age determination formula using these scores was established.

  20. Survival analysis II: Cox regression

    NARCIS (Netherlands)

    Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.

    2011-01-01

    In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the Cox regression model.
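
As a brief illustration of fitting a Cox model in practice, assuming the Python lifelines package (which is not referenced by the record), using the Rossi recidivism data shipped with that library:

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                   # recidivism data bundled with lifelines
cph = CoxPHFitter()
# Duration is weeks until arrest; all remaining columns enter as covariates,
# so the hazard ratios are mutually adjusted.
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()                 # hazard ratios with confidence intervals
```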

  1. Stochastic development regression on non-linear manifolds

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    We introduce a regression model for data on non-linear manifolds. The model describes the relation between a set of manifold valued observations, such as shapes of anatomical objects, and Euclidean explanatory variables. The approach is based on stochastic development of Euclidean diffusion processes to the manifold. Defining the data distribution as the transition distribution of the mapped stochastic process, parameters of the model, the non-linear analogue of design matrix and intercept, are found via maximum likelihood. The model is intrinsically related to the geometry encoded in the connection of the manifold. We propose an estimation procedure which applies the Laplace approximation of the likelihood function. A simulation study of the performance of the model is performed and the model is applied to a real dataset of Corpus Callosum shapes.

  2. Multiple regression for physiological data analysis: the problem of multicollinearity.

    Science.gov (United States)

    Slinker, B K; Glantz, S A

    1985-07-01

    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
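
As an illustration of how multicollinearity manifests, the sketch below builds two nearly collinear predictors on synthetic data and shows the resulting unstable coefficients together with variance inflation factors, a common diagnostic (not one proposed by this record):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(12)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2, x3]))
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(size=n)

fit = sm.OLS(y, X).fit()
print(fit.params)   # coefficients on x1/x2 can have the wrong size or sign
print(fit.bse)      # and carry very large standard errors

# Variance inflation factors flag the redundancy: values >> 10 for x1 and x2.
for i in range(1, X.shape[1]):
    print(f"VIF x{i}:", variance_inflation_factor(X, i))
```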

  3. Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)

    Science.gov (United States)

    Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi

    2017-06-01

    Hurdle negative binomial regression is a method that can be used for a discrete dependent variable with excess zeros and under- or overdispersion. It uses a two-part approach. The first part, the zero hurdle model, models whether the dependent variable is zero; the second part, called the truncated negative binomial model, models the non-zero (positive integer) values. The discrete dependent variable in such cases is censored for some values. The type of censoring studied in this research is right censoring. This study aims to obtain the parameter estimator of hurdle negative binomial regression for a right-censored dependent variable. Parameters are estimated by maximum likelihood estimation (MLE). Hurdle negative binomial regression for a right-censored dependent variable is applied to the number of neonatorum tetanus cases in Indonesia. The data are counts that contain zero values in some observations and various positive values in others. This study also aims to obtain the parameter estimator and test statistic of the censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus cases in Indonesia are the percentage of baby health care coverage and neonatal visits.

  4. Estimation of Genetic Parameters for First Lactation Monthly Test-day Milk Yields using Random Regression Test Day Model in Karan Fries Cattle

    Directory of Open Access Journals (Sweden)

    Ajay Singh

    2016-06-01

    Full Text Available A single trait linear mixed random regression test-day model was applied for the first time for analyzing the first lactation monthly test-day milk yield records in Karan Fries cattle. The test-day milk yield data was modeled using a random regression model (RRM) considering different orders of Legendre polynomial for the additive genetic effect (4th order) and the permanent environmental effect (5th order). Data pertaining to 1,583 lactation records spread over a period of 30 years were recorded and analyzed in the study. The variance component, heritability and genetic correlations among test-day milk yields were estimated using RRM. RRM heritability estimates of test-day milk yield varied from 0.11 to 0.22 in different test-day records. The estimates of genetic correlations between different test-day milk yields ranged from 0.01 (test-day 1 [TD-1] and TD-11) to 0.99 (TD-4 and TD-5). The magnitudes of genetic correlations between test-day milk yields decreased as the interval between test-days increased, and adjacent test-days had higher correlations. Additive genetic and permanent environment variances were higher for test-day milk yields at both ends of lactation. The residual variance was observed to be lower than the permanent environment variance for all the test-day milk yields.

  5. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    Science.gov (United States)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for the agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth under different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and the periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and atmospheric pressure (reduced to sea level), collected from five synoptic stations (Kerman, Ahvaz, Tabriz, Saghez, and Rasht) located respectively in hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both the MLR and SVR models was quite good at the surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers, especially 100 cm depth. Moreover, both models performed better in humid climate conditions than in arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance, especially in the case of SVR.
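
A rough sketch of the comparison on synthetic data (the real study used measured station data): both models receive the same inputs, including sin/cos terms for the periodicity component mentioned above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)
n = 730
doy = rng.integers(1, 366, size=n)                    # day of year
tair = 15 + 12 * np.sin(2 * np.pi * doy / 365) + rng.normal(scale=2, size=n)
rh = rng.uniform(20, 90, size=n)                      # relative humidity
# Hypothetical deep-soil temperature: damped, lagged seasonal signal.
tsoil = (14 + 6 * np.sin(2 * np.pi * (doy - 30) / 365)
         + 0.1 * tair + rng.normal(scale=0.8, size=n))

# Periodicity component enters as sin/cos of day of year.
X = np.column_stack([tair, rh,
                     np.sin(2 * np.pi * doy / 365),
                     np.cos(2 * np.pi * doy / 365)])
Xtr, Xte, ytr, yte = train_test_split(X, tsoil, random_state=0)

mlr = LinearRegression().fit(Xtr, ytr)
svr = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1)).fit(Xtr, ytr)

print("MLR R^2:", r2_score(yte, mlr.predict(Xte)))
print("SVR R^2:", r2_score(yte, svr.predict(Xte)))
```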

  6. A Model for Shovel Capital Cost Estimation, Using a Hybrid Model of Multivariate Regression and Neural Networks

    Directory of Open Access Journals (Sweden)

    Abdolreza Yazdani-Chamzini

    2017-12-01

    Full Text Available Cost estimation is an essential issue in feasibility studies in civil engineering. Many different methods can be applied to modelling costs. These methods can be divided into several main groups: (1) artificial intelligence, (2) statistical methods, and (3) analytical methods. In this paper, the multivariate regression (MVR) method, which is one of the most popular linear models, and the artificial neural network (ANN) method, which is widely applied to solving different prediction problems with a high degree of accuracy, have been combined to provide a cost estimate model for a shovel machine. This hybrid methodology is proposed, taking the advantages of MVR and ANN models in linear and nonlinear modelling, respectively. In the proposed model, the unique advantages of the MVR model in linear modelling are used first to recognize the existing linear structure in data, and, then, the ANN for determining nonlinear patterns in preprocessed data is applied. The results with three indices indicate that the proposed model is efficient and capable of increasing the prediction accuracy.

  7. An introduction to using Bayesian linear regression with clinical data.

    Science.gov (United States)

    Baldwin, Scott A; Larson, Michael J

    2017-11-01

    Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
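
The record's replication code is in R; purely as a generic illustration of Bayesian linear regression, the Python sketch below computes the exact posterior for the weights under a conjugate normal prior with a known (assumed) noise variance:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

sigma2 = 0.25   # assumed (known) noise variance, for illustration only
tau2 = 1.0      # prior variance of each weight under a mean-zero prior

# Conjugate posterior: N(mu, S) with
#   S  = (X'X / sigma2 + I / tau2)^-1
#   mu = S X'y / sigma2
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
mu = S @ X.T @ y / sigma2

print("posterior mean of slope:", mu[1])
print("95% credible interval:",
      mu[1] - 1.96 * np.sqrt(S[1, 1]), mu[1] + 1.96 * np.sqrt(S[1, 1]))
```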

  8. Estimation of Total Nitrogen and Phosphorus in New England Streams Using Spatially Referenced Regression Models

    Science.gov (United States)

    Moore, Richard Bridge; Johnston, Craig M.; Robinson, Keith W.; Deacon, Jeffrey R.

    2004-01-01

    The U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency (USEPA) and the New England Interstate Water Pollution Control Commission (NEIWPCC), has developed a water-quality model, called SPARROW (Spatially Referenced Regressions on Watershed Attributes), to assist in regional total maximum daily load (TMDL) and nutrient-criteria activities in New England. SPARROW is a spatially detailed, statistical model that uses regression equations to relate total nitrogen and phosphorus (nutrient) stream loads to nutrient sources and watershed characteristics. The statistical relations in these equations are then used to predict nutrient loads in unmonitored streams. The New England SPARROW models are built using a hydrologic network of 42,000 stream reaches and associated watersheds. Watershed boundaries are defined for each stream reach in the network through the use of a digital elevation model and existing digitized watershed divides. Nutrient source data is from permitted wastewater discharge data from USEPA's Permit Compliance System (PCS), various land-use sources, and atmospheric deposition. Physical watershed characteristics include drainage area, land use, streamflow, time-of-travel, stream density, percent wetlands, slope of the land surface, and soil permeability. The New England SPARROW models for total nitrogen and total phosphorus have R-squared values of 0.95 and 0.94, with mean square errors of 0.16 and 0.23, respectively. Variables that were statistically significant in the total nitrogen model include permitted municipal-wastewater discharges, atmospheric deposition, agricultural area, and developed land area. Total nitrogen stream-loss rates were significant only in streams with average annual flows less than or equal to 2.83 cubic meters per second. In streams larger than this, there is nondetectable in-stream loss of annual total nitrogen in New England. Variables that were statistically significant in the total

  9. Asymptotic theory for regressions with smoothly changing parameters

    DEFF Research Database (Denmark)

    Hillebrand, Eric; Medeiros, Marcelo; Xu, Junyue

    2013-01-01

    We derive asymptotic properties of the quasi maximum likelihood estimator of smooth transition regressions when time is the transition variable. The consistency of the estimator and its asymptotic distribution are examined. It is shown that the estimator converges at the usual √T rate and has an asymptotically normal distribution. Finite sample properties of the estimator are explored in simulations. We illustrate with an application to US inflation and output data.

  10. Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors

    Directory of Open Access Journals (Sweden)

    Xibin Zhang

    2016-04-01

    Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and to the kernel density estimation of gross domestic product (GDP) growth rates among Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.

  11. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  12. Kernel regression with functional response

    OpenAIRE

    Ferraty, Frédéric; Laksaci, Ali; Tadj, Amel; Vieu, Philippe

    2011-01-01

    We consider the kernel regression estimator when both the response variable and the explanatory variable are functional. The rates of uniform almost complete convergence are stated as a function of the small ball probability of the predictor and as a function of the entropy of the set on which uniformity is obtained.

  13. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    Science.gov (United States)

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. The response may depend on a covariate through some function of it. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), in which the monotonicity constraints are built in. With missing data, one often employs the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work, as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
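
For reference, the PAVA fit underlying isotonic regression is available in scikit-learn; a minimal sketch on synthetic monotone data (this shows only the constrained fit, not the paper's empirical-likelihood calibration):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, size=80))
y = np.log1p(x) + rng.normal(scale=0.15, size=80)   # monotone trend plus noise

# sklearn's IsotonicRegression is a PAVA implementation: it returns the
# closest (least-squares) monotone non-decreasing fit to the data.
iso = IsotonicRegression(increasing=True)
y_fit = iso.fit_transform(x, y)

assert np.all(np.diff(y_fit) >= 0)   # the monotonicity constraint holds
```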

  14. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    Science.gov (United States)

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric Society.

  15. A Comparison of Advanced Regression Algorithms for Quantifying Urban Land Cover

    Directory of Open Access Journals (Sweden)

    Akpona Okujeni

    2014-07-01

    Full Text Available Quantitative methods for mapping sub-pixel land cover fractions are gaining increasing attention, particularly with regard to upcoming hyperspectral satellite missions. We evaluated five advanced regression algorithms combined with synthetically mixed training data for quantifying urban land cover from HyMap data at 3.6 and 9 m spatial resolution. Methods included support vector regression (SVR), kernel ridge regression (KRR), artificial neural networks (NN), random forest regression (RFR) and partial least squares regression (PLSR). Our experiments demonstrate that both kernel methods, SVR and KRR, yield high accuracies for mapping complex urban surface types, i.e., rooftops, pavements, grass- and tree-covered areas. SVR and KRR models proved to be stable with regard to the spatial and spectral differences between both images and effectively utilized the higher complexity of the synthetic training mixtures for improving estimates for coarser resolution data. Observed deficiencies mainly relate to known problems arising from spectral similarities or shadowing. The remaining regressors either revealed erratic (NN) or limited (RFR and PLSR) performances when comprehensively mapping urban land cover. Our findings suggest that the combination of kernel-based regression methods, such as SVR and KRR, with synthetically mixed training data is well suited for quantifying urban land cover from imaging spectrometer data at multiple scales.

  16. Combination of supervised and semi-supervised regression models for improved unbiased estimation

    DEFF Research Database (Denmark)

    Arenas-Garía, Jeronimo; Moriana-Varo, Carlos; Larsen, Jan

    2010-01-01

    In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and semisupervised…

  17. Regression Models for Market-Shares

    DEFF Research Database (Denmark)

    Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue

    2005-01-01

    On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretation of the parameters in relation to models for the total sales based on discrete choice models. Key words and phrases: MCI model, discrete choice model, market-shares, price elasticity, regression model.

  18. Factoring vs linear modeling in rate estimation: a simulation study of relative accuracy.

    Science.gov (United States)

    Maldonado, G; Greenland, S

    1998-07-01

    A common strategy for modeling dose-response in epidemiology is to transform ordered exposures and covariates into sets of dichotomous indicator variables (that is, to factor the variables). Factoring tends to increase estimation variance, but it also tends to decrease bias and thus may increase or decrease total accuracy. We conducted a simulation study to examine the impact of factoring on the accuracy of rate estimation. Factored and unfactored Poisson regression models were fit to follow-up study datasets that were randomly generated from 37,500 population model forms that ranged from subadditive to supramultiplicative. In the situations we examined, factoring sometimes substantially improved accuracy relative to fitting the corresponding unfactored model, sometimes substantially decreased accuracy, and sometimes made little difference. The difference in accuracy between factored and unfactored models depended in a complicated fashion on the difference between the true and fitted model forms, the strength of exposure and covariate effects in the population, and the study size. It may be difficult in practice to predict when factoring is increasing or decreasing accuracy. We recommend, therefore, that the strategy of factoring variables be supplemented with other strategies for modeling dose-response.
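
A small sketch of the two strategies on synthetic count data: the unfactored model enters the exposure score as a single linear term, while the factored model uses one indicator per level (all rates are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 2000
exposure = rng.integers(0, 4, size=n)               # ordered exposure, 4 levels
# True log rates are non-linear in the exposure score (hypothetical values).
true_log_rate = np.array([0.0, 0.1, 0.5, 0.6])[exposure]
counts = rng.poisson(np.exp(true_log_rate))
df = pd.DataFrame({"y": counts, "exposure": exposure})

# Unfactored: one linear trend term. Lower variance, but biased whenever the
# true dose-response is not log-linear.
trend = smf.glm("y ~ exposure", data=df, family=sm.families.Poisson()).fit()

# Factored: one indicator per level. Unbiased for any shape, higher variance.
factored = smf.glm("y ~ C(exposure)", data=df, family=sm.families.Poisson()).fit()

print(trend.params)
print(factored.params)
```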

  19. Asymptotic Theory for Regressions with Smoothly Changing Parameters

    DEFF Research Database (Denmark)

    Hillebrand, Eric Tobias; Medeiros, Marcelo C.; Xu, Junyue

    We derive asymptotic properties of the quasi maximum likelihood estimator of smooth transition regressions when time is the transition variable. The consistency of the estimator and its asymptotic distribution are examined. It is shown that the estimator converges at the usual square-root-of-T rate and has an asymptotically normal distribution. Finite sample properties of the estimator are explored in simulations. We illustrate with an application to US inflation and output data.

  20. Regression and direct methods do not give different estimates of digestible and metabolizable energy values of barley, sorghum, and wheat for pigs.

    Science.gov (United States)

    Bolarinwa, O A; Adeola, O

    2016-02-01

    Direct or indirect methods can be used to determine the DE and ME of feed ingredients for pigs. In situations when only the indirect approach is suitable, the regression method presents a robust indirect approach. Three experiments were conducted to compare the direct and regression methods for determining the DE and ME values of barley, sorghum, and wheat for pigs. In each experiment, 24 barrows with an average initial BW of 31, 32, and 33 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g barley, sorghum, or wheat/kg plus minerals and vitamins for the direct method; a corn-soybean meal reference diet (RD); the RD + 300 g barley, sorghum, or wheat/kg; and the RD + 600 g barley, sorghum, or wheat/kg. The 3 corn-soybean meal diets were used for the regression method. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d period of total but separate collection of feces and urine in each experiment. Graded substitution of barley or wheat, but not sorghum, into the RD linearly reduced dietary DE and ME. The direct method-derived DE and ME for barley were 3,669 and 3,593 kcal/kg DM, respectively. The regressions of barley contribution to DE and ME in kilocalories against the quantity of barley DMI in kilograms generated 3,746 kcal DE/kg DM and 3,647 kcal ME/kg DM. The DE and ME for sorghum by the direct method were 4,097 and 4,042 kcal/kg DM, respectively; the corresponding regression-derived estimates were 4,145 and 4,066 kcal/kg DM. Using the direct method, energy values for wheat were 3,953 kcal DE/kg DM and 3,889 kcal ME/kg DM. The regressions of wheat contribution to DE and ME in kilocalories against the quantity of wheat DMI in kilograms generated 3,960 kcal DE/kg DM and 3,874 kcal ME/kg DM. The DE and ME of barley using the direct method were not different from the regression-derived estimates; the same held for sorghum; and the direct method- and regression method-derived DE for wheat (3,953 and 3,960 kcal/kg DM) did not differ.

  1. Physiologic noise regression, motion regression, and TOAST dynamic field correction in complex-valued fMRI time series.

    Science.gov (United States)

    Hahn, Andrew D; Rowe, Daniel B

    2012-02-01

    As more evidence is presented suggesting that the phase, as well as the magnitude, of functional MRI (fMRI) time series may contain important information and that there are theoretical drawbacks to modeling functional response in the magnitude alone, removing noise in the phase is becoming more important. Previous studies have shown that retrospective correction of noise from physiologic sources can remove significant phase variance and that dynamic main magnetic field correction and regression of estimated motion parameters also remove significant phase fluctuations. In this work, we investigate the performance of physiologic noise regression in a framework along with correction for dynamic main field fluctuations and motion regression. Our findings suggest that including physiologic regressors provides some benefit in terms of reduction in phase noise power, but it is small compared to the benefit of dynamic field corrections and use of estimated motion parameters as nuisance regressors. Additionally, we show that the use of all three techniques reduces phase variance substantially, removes undesirable spatial phase correlations and improves detection of the functional response in magnitude and phase. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Estimating maneuvers for precise relative orbit determination using GPS

    Science.gov (United States)

    Allende-Alba, Gerardo; Montenbruck, Oliver; Ardaens, Jean-Sébastien; Wermuth, Martin; Hugentobler, Urs

    2017-01-01

    Precise relative orbit determination is an essential element for the generation of science products from distributed instrumentation of formation flying satellites in low Earth orbit. According to the mission profile, the required formation is typically maintained and/or controlled by executing maneuvers. In order to generate consistent and precise orbit products, a strategy for maneuver handling is mandatory in order to avoid discontinuities or precision degradation before, after and during maneuver execution. Precise orbit determination offers the possibility of maneuver estimation in an adjustment of single-satellite trajectories using GPS measurements. However, a consistent formulation of a precise relative orbit determination scheme requires the implementation of a maneuver estimation strategy which can be used, in addition, to improve the precision of maneuver estimates by drawing upon the use of differential GPS measurements. The present study introduces a method for precise relative orbit determination based on a reduced-dynamic batch processing of differential GPS pseudorange and carrier phase measurements, which includes maneuver estimation as part of the relative orbit adjustment. The proposed method has been validated using flight data from space missions with different rates of maneuvering activity, including the GRACE, TanDEM-X and PRISMA missions. The results show the feasibility of obtaining precise relative orbits without degradation in the vicinity of maneuvers as well as improved maneuver estimates that can be used for better maneuver planning in flight dynamics operations.

  3. Parametric Bayesian Estimation of Differential Entropy and Relative Entropy

    Directory of Open Access Journals (Sweden)

    Maya Gupta

    2010-04-01

    Full Text Available Given iid samples drawn from a distribution with known parametric form, we propose the minimization of expected Bregman divergence to form Bayesian estimates of differential entropy and relative entropy, and derive such estimators for the uniform, Gaussian, Wishart, and inverse Wishart distributions. Additionally, formulas are given for a log gamma Bregman divergence and the differential entropy and relative entropy for the Wishart and inverse Wishart. The results, as always with Bayesian estimates, depend on the accuracy of the prior parameters, but example simulations show that the performance can be substantially improved compared to maximum likelihood or state-of-the-art nonparametric estimators.

  4. Bayesian ARTMAP for regression.

    Science.gov (United States)

    Sasu, L M; Andonie, R

    2013-10-01

    Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. Closed-Loop Surface Related Multiple Estimation

    NARCIS (Netherlands)

    Lopez Angarita, G.A.

    2016-01-01

    Surface-related multiple elimination (SRME) is one of the most commonly used methods for suppressing surface multiples. However, in order to obtain an accurate surface multiple estimation, dense source and receiver sampling is required. The traditional approach to this problem is performing data

  6. Geographically weighted regression and multicollinearity: dispelling the myth

    Science.gov (United States)

    Fotheringham, A. Stewart; Oshan, Taylor M.

    2016-10-01

    Geographically weighted regression (GWR) extends the familiar regression framework by estimating a set of parameters for any number of locations within a study area, rather than producing a single parameter estimate for each relationship specified in the model. Recent literature has suggested that GWR is highly susceptible to the effects of multicollinearity between explanatory variables and has proposed a series of local measures of multicollinearity as an indicator of potential problems. In this paper, we employ a controlled simulation to demonstrate that GWR is in fact very robust to the effects of multicollinearity. Consequently, the contention that GWR is highly susceptible to multicollinearity issues needs rethinking.
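
The core computation GWR repeats at each calibration location is a kernel-weighted least squares fit; a minimal sketch on synthetic spatially varying data (bandwidth and kernel choice are illustrative):

```python
import numpy as np

def gwr_local_fit(coords, X, y, site, bandwidth):
    """Weighted least squares at one location with Gaussian kernel weights,
    the building block that GWR evaluates at every calibration site."""
    d = np.linalg.norm(coords - site, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(8)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
# The slope varies smoothly over space (a spatially non-stationary process).
slope = 1.0 + 0.2 * coords[:, 0]
y = slope * x + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x])

beta_here = gwr_local_fit(coords, X, y, site=np.array([2.0, 5.0]), bandwidth=1.5)
print("local intercept and slope near (2, 5):", beta_here)
```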

  7. Cross-property relations and permeability estimation in model porous media

    International Nuclear Information System (INIS)

    Schwartz, L.M.; Martys, N.; Bentz, D.P.; Garboczi, E.J.; Torquato, S.

    1993-01-01

    Results from a numerical study examining cross-property relations linking fluid permeability to diffusive and electrical properties are presented. Numerical solutions of the Stokes equations in three-dimensional consolidated granular packings are employed to provide a basis of comparison between different permeability estimates. Estimates based on the Λ parameter (a length derived from electrical conduction) and on d_c (a length derived from immiscible displacement) are found to be considerably more reliable than estimates based on rigorous permeability bounds related to pore space diffusion. We propose two hybrid relations based on diffusion which provide more accurate estimates than either of the rigorous permeability bounds.

  8. A modified approach to estimating sample size for simple logistic regression with one continuous covariate.

    Science.gov (United States)

    Novikov, I; Fund, N; Freedman, L S

    2010-01-15

    Different methods for the calculation of sample size for simple logistic regression (LR) with one normally distributed continuous covariate give different results. Sometimes the difference can be large. Furthermore, some methods require the user to specify the prevalence of cases when the covariate equals its population mean, rather than the more natural population prevalence. We focus on two commonly used methods and show through simulations that the power for a given sample size may differ substantially from the nominal value for one method, especially when the covariate effect is large, while the other method performs poorly if the user provides the population prevalence instead of the required parameter. We propose a modification of the method of Hsieh et al. that requires specification of the population prevalence and that employs Schouten's sample size formula for a t-test with unequal variances and group sizes. This approach appears to increase the accuracy of the sample size estimates for LR with one continuous covariate.
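
Analytic formulas aside, a simulation can always cross-check the power achieved by a candidate sample size; the sketch below estimates empirical power for simple logistic regression with one standard-normal covariate (all parameter values are illustrative, and this is not the authors' proposed formula):

```python
import numpy as np
import statsmodels.api as sm

def power_logistic(n, beta0, beta1, n_sim=1000, alpha=0.05, seed=0):
    """Empirical power of the Wald test for one continuous covariate:
    a brute-force check on analytic sample-size calculations."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x = rng.normal(size=n)
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
        y = rng.binomial(1, p)
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        if fit.pvalues[1] < alpha:
            hits += 1
    return hits / n_sim

# beta0 chosen so the population prevalence is roughly 20% (illustrative).
print(power_logistic(n=300, beta0=-1.4, beta1=0.4))
```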

  9. Use of Geographically Weighted Regression (GWR Method to Estimate the Effects of Location Attributes on the Residential Property Values

    Directory of Open Access Journals (Sweden)

    Mohd Faris Dziauddin

    2017-07-01

    Full Text Available This study estimates the effect of locational attributes on residential property values in Kuala Lumpur, Malaysia. Geographically weighted regression (GWR) enables local parameters, rather than a single global parameter, to be estimated, with the results presented in map form. The results of this study reveal that residential property values are mainly determined by the property’s physical (structural) attributes, but proximity to locational attributes also contributes marginally. The use of GWR in this study is considered a better approach than other methods to examine the effect of locational attributes on residential property values. GWR has the capability to produce meaningful results in which different locational attributes have differential spatial effects across a geographical area on residential property values. This method has the ability to determine the factors on which premiums depend, and in turn it can assist the government in taxation matters.

  10. Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use?

    Science.gov (United States)

    Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun

    2014-12-01

    Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
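
A toy comparison of the two approaches on synthetic pixel data, using scikit-learn's multinomial logistic regression against a one-vs-rest ensemble of binary logistic models (the study's covariates and accuracy figures are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 3000
X = rng.normal(size=(n, 4))              # pixel-level driving factors
logits = X @ rng.normal(size=(4, 3))     # three hypothetical land-use types
labels = logits.argmax(axis=1)

Xtr, Xte, ytr, yte = train_test_split(X, labels, random_state=0)

# Separate binary logistic models (one per land-use type) vs. a single
# multinomial model fit jointly over all types.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Xtr, ytr)
mnl = LogisticRegression(max_iter=1000).fit(Xtr, ytr)  # multinomial by default

print("one-vs-rest accuracy:", ovr.score(Xte, yte))
print("multinomial accuracy:", mnl.score(Xte, yte))
```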

  11. Meta-analytical synthesis of regression coefficients under different categorization scheme of continuous covariates.

    Science.gov (United States)

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-11-30

    Recently, the number of clinical prediction models sharing the same regression task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these regression models have not been sufficiently studied, particularly in meta-analysis settings where only regression coefficients are available. One of the difficulties lies in the differences between the categorization schemes of continuous covariates across different studies. In general, categorization methods using cutoff values are study specific across available models, even if they focus on the same covariates of interest. Differences in the categorization of covariates could lead to serious bias in the estimated regression coefficients and thus in subsequent syntheses. To tackle this issue, we developed synthesis methods for linear regression models with different categorization schemes of covariates. A 2-step approach to aggregate the regression coefficient estimates is proposed. The first step is to estimate the joint distribution of covariates by introducing a latent sampling distribution, which uses one set of individual participant data to estimate the marginal distribution of covariates with categorization. The second step is to use a nonlinear mixed-effects model with correction terms for the bias due to categorization to estimate the overall regression coefficients. Especially in terms of precision, numerical simulations show that our approach outperforms conventional methods, which only use studies with common covariates or ignore the differences between categorization schemes. The method developed in this study is also applied to a series of WHO epidemiologic studies on white blood cell counts. Copyright © 2017 John Wiley & Sons, Ltd.

  12. On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.

    Science.gov (United States)

    Pek, Jolynn; Chalmers, R Philip; Monette, Georges

    2016-01-01

    When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.

  13. New robust statistical procedures for the polytomous logistic regression models.

    Science.gov (United States)

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  14. Differentiating regressed melanoma from regressed lichenoid keratosis.

    Science.gov (United States)

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. The estimation and prediction of the inventories for the liquid and gaseous radwaste systems using the linear regression analysis

    International Nuclear Information System (INIS)

    Kim, J. Y.; Shin, C. H.; Kim, J. K.; Lee, J. K.; Park, Y. J.

    2003-01-01

    The trends in the inventories of the liquid radwaste system and of the radioactive gas released in containment, together with their predicted values according to the operation histories of Yonggwang (YGN) 3 and 4, were analyzed using linear regression analysis. The results show that the inventories of those systems increase linearly with operation history, but the inventories released to the environment are considerably lower than the recommended values based on the FSAR. This suggests that some conservatism was built into the estimation methodology during the FSAR preparation stage.

  16. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  17. Estimating life expectancies for US small areas: a regression framework

    Science.gov (United States)

    Congdon, Peter

    2014-01-01

    Analysis of area mortality variations and estimation of area life tables raise methodological questions relevant to assessing spatial clustering, and socioeconomic inequalities in mortality. Existing small area analyses of US life expectancy variation generally adopt ad hoc amalgamations of counties to alleviate potential instability of mortality rates involved in deriving life tables, and use conventional life table analysis which takes no account of correlated mortality for adjacent areas or ages. The alternative strategy here uses structured random effects methods that recognize correlations between adjacent ages and areas, and allows retention of the original county boundaries. This strategy generalizes to include effects of area category (e.g. poverty status, ethnic mix), allowing estimation of life tables according to area category, and providing additional stabilization of estimated life table functions. This approach is used here to estimate stabilized mortality rates, derive life expectancies in US counties, and assess trends in clustering and in inequality according to county poverty category.

  18. The nuisance of nuisance regression: spectral misspecification in a common approach to resting-state fMRI preprocessing reintroduces noise and obscures functional connectivity.

    Science.gov (United States)

    Hallquist, Michael N; Hwang, Kai; Luna, Beatriz

    2013-11-15

    Recent resting-state functional connectivity fMRI (RS-fcMRI) research has demonstrated that head motion during fMRI acquisition systematically influences connectivity estimates despite bandpass filtering and nuisance regression, which are intended to reduce such nuisance variability. We provide evidence that the effects of head motion and other nuisance signals are poorly controlled when the fMRI time series are bandpass-filtered but the regressors are unfiltered, resulting in the inadvertent reintroduction of nuisance-related variation into frequencies previously suppressed by the bandpass filter, as well as suboptimal correction for noise signals in the frequencies of interest. This is important because many RS-fcMRI studies, including some focusing on motion-related artifacts, have applied this approach. In two cohorts of individuals (n=117 and 22) who completed resting-state fMRI scans, we found that the bandpass-regress approach consistently overestimated functional connectivity across the brain, typically on the order of r=.10-.35, relative to a simultaneous bandpass filtering and nuisance regression approach. Inflated correlations under the bandpass-regress approach were associated with head motion and cardiac artifacts. Furthermore, distance-related differences in the association of head motion and connectivity estimates were much weaker for the simultaneous filtering approach. We recommend that future RS-fcMRI studies ensure that the frequencies of nuisance regressors and fMRI data match prior to nuisance regression, and we advocate a simultaneous bandpass filtering and nuisance regression strategy that better controls nuisance-related variability. Copyright © 2013 Elsevier Inc. All rights reserved.
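
The remedy the authors recommend, matching the frequency content of data and regressors, can be sketched as follows on synthetic time series: both the voxel data and the nuisance regressors pass through the same bandpass filter before the regression (cutoffs and dimensions are illustrative):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(data, low, high, fs, order=2):
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, data, axis=0)

rng = np.random.default_rng(10)
fs, n_vol = 0.5, 600                  # TR = 2 s, 600 volumes (illustrative)
ts = rng.normal(size=(n_vol, 10))     # voxel time series
motion = rng.normal(size=(n_vol, 6))  # six estimated motion parameters

# Key point from the record: filter the nuisance regressors with the SAME
# bandpass as the data, so the regression cannot reintroduce noise into
# frequencies the filter already suppressed.
ts_f = bandpass(ts, 0.009, 0.08, fs)
motion_f = bandpass(motion, 0.009, 0.08, fs)

X = np.column_stack([np.ones(n_vol), motion_f])
beta, *_ = np.linalg.lstsq(X, ts_f, rcond=None)
residuals = ts_f - X @ beta           # cleaned series for connectivity analysis
```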

  19. Estimation of Household Waste in the Republic of Serbia using R software

    Directory of Open Access Journals (Sweden)

    Melinda Tokai

    2016-06-01

    Full Text Available This paper deals with the problem of estimating the annual amount of waste generated by households in the Republic of Serbia. Waste generated by households is a part of municipal waste, which also includes waste generated by trade and services activities as well as by tourists. In order to estimate pure household waste, regression analysis was performed with reference to Cammarota et al. (2005), "A proposal for the estimation of household waste". To address this problem, regression models were constructed for municipal waste based on non-domestic variables related to trade and services activities and tourism. The part of the municipal waste that could not be explained by a model based on non-domestic variables was ascribed to pure household waste. To check the validity of the results, the model residuals were then related to domestic variables (usual population and the average number of inhabitants per occupied dwelling). The regression models were fitted using R software.

  20. Parameter Estimation of Nonlinear Models in Forestry.

    OpenAIRE

    Fekedulegn, Desta; Mac Siúrtáin, Máirtín Pádraig; Colbert, Jim J.

    1999-01-01

    Partial derivatives of the negative exponential, monomolecular, Mitscherlich, Gompertz, logistic, Chapman-Richards, von Bertalanffy, Weibull and Richards nonlinear growth models are presented. The application of these partial derivatives in estimating the model parameters is illustrated. The parameters are estimated using the Marquardt iterative method of nonlinear regression relating top height to age of Norway spruce (Picea abies L.) from the Bowmont Norway Spruce Thinnin...

  1. Finite Algorithms for Robust Linear Regression

    DEFF Research Database (Denmark)

    Madsen, Kaj; Nielsen, Hans Bruun

    1990-01-01

    The Huber M-estimator for robust linear regression is analyzed. Newton type methods for solution of the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate efficiency and indicate that this kind of approach may...
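
    For orientation, the Huber M-estimation problem can be solved with off-the-shelf IRLS software (not the finite Newton-type algorithms analyzed in the paper); a small sketch with synthetic outliers:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = np.linspace(0, 10, 60)
    y = 1.5 + 0.8 * x + rng.normal(0, 0.4, x.size)
    y[::15] += 8                                  # gross outliers

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()
    huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimator via IRLS
    print("OLS:  ", ols.params.round(2))          # dragged toward the outliers
    print("Huber:", huber.params.round(2))        # close to the true (1.5, 0.8)
    ```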

  2. Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

    Energy Technology Data Exchange (ETDEWEB)

    Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.

    2017-11-01

    Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and its relationship with weather conditions are examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales was examined. Maximum temperature and precipitation were identified as important across all zones, while the importance of other weather variables was zone-specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
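
    A toy version of the approach, assuming synthetic weather features and an L1 penalty for sparsity (the paper's exact penalty, variables, and zone structure are not reproduced here):

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    n = 400
    tmax = rng.normal(30, 6, n)            # synthetic daily max temperature (C)
    precip = rng.gamma(1.5, 2.0, n)        # synthetic precipitation (mm)
    humidity = rng.uniform(20, 90, n)      # an uninformative extra variable
    logit = -14 + 0.45 * tmax - 0.15 * precip
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = stress day

    # The L1 penalty shrinks weak coefficients to zero, keeping the model
    # sparse and interpretable at the expense of some bias.
    X = StandardScaler().fit_transform(np.column_stack([tmax, precip, humidity]))
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
    print(dict(zip(["tmax", "precip", "humidity"], clf.coef_[0].round(2))))
    ```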

  3. The Efficiency of OLS Estimators of Structural Parameters in a Simple Linear Regression Model in the Calibration of the Averages Scheme

    Directory of Open Access Journals (Sweden)

    Kowal Robert

    2016-12-01

    Full Text Available A simple linear regression model is one of the pillars of classic econometrics, and multiple areas of research function within its scope. One of the many fundamental questions concerning the model is how to prove the efficiency of the most commonly used OLS estimators and examine their properties. The literature offers some treatments of this question and certain solutions in that regard; methodically, they are borrowed from the multiple regression model or from a boundary partial model. Not everything here, however, is complete and consistent. In this paper a completely new scheme is proposed, based on applying the Cauchy-Schwarz inequality to a constraint aggregated from appropriately calibrated secondary unbiasedness constraints; choosing a suitable calibrator for each variable then leads directly to the efficiency property. The choice of such a calibrator is a separate issue. These deliberations, on account of their volume and the kinds of calibration involved, were divided into several parts. In this one, the efficiency of OLS estimators is proven in a mixed scheme of calibration by averages, that is, a preliminary scheme within the most basic frame of the proposed methodology. Within this frame, the outlines and general premises underlying further generalizations are established.

  4. Estimation of Chinese surface NO2 concentrations combining satellite data and Land Use Regression

    Science.gov (United States)

    Anand, J.; Monks, P.

    2016-12-01

    Monitoring surface-level air quality is often limited by in-situ instrument placement and issues arising from harmonisation over long timescales. Satellite instruments can offer a synoptic view of regional pollution sources, but in many cases only a total or tropospheric column can be measured. In this work a new technique of estimating surface NO2 combining both satellite and in-situ data is presented, in which a Land Use Regression (LUR) model is used to create high resolution pollution maps based on known predictor variables such as population density, road networks, and land cover. By employing a mixed effects approach, it is possible to take advantage of the spatiotemporal variability in the satellite-derived column densities to account for daily and regional variations in surface NO2 caused by factors such as temperature, elevation, and wind advection. In this work, surface NO2 maps are modelled over the North China Plain and Pearl River Delta during high-pollution episodes by combining in-situ measurements and tropospheric columns from the Ozone Monitoring Instrument (OMI). The modelled concentrations show good agreement with in-situ data and surface NO2 concentrations derived from the MACC-II global reanalysis.

  5. Genetic parameters for body condition score, body weight, milk yield, and fertility estimated using random regression models.

    Science.gov (United States)

    Berry, D P; Buckley, F; Dillon, P; Evans, R D; Rath, M; Veerkamp, R F

    2003-11-01

    Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields from 8725 multiparous Holstein-Friesian cows. A cubic random regression was sufficient to model the changing genetic variances for BCS, BW, and milk across different days in milk. The genetic correlations between BCS and fertility changed little over the lactation; genetic correlations between BCS and interval to first service and between BCS and pregnancy rate to first service varied from -0.47 to -0.31, and from 0.15 to 0.38, respectively. This suggests that maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in midlactation when the genetic variance for BCS is largest. Selection for increased BW resulted in shorter intervals to first service, but more services and poorer pregnancy rates; genetic correlations between BW and pregnancy rate to first service varied from -0.52 to -0.45. Genetic selection for higher lactation milk yield alone through selection on increased milk yield in early lactation is likely to have a more deleterious effect on genetic merit for fertility than selection on higher milk yield in late lactation.

  6. Building vulnerability to hydro-geomorphic hazards: Estimating damage probability from qualitative vulnerability assessment using logistic regression

    Science.gov (United States)

    Ettinger, Susanne; Mounaud, Loïc; Magill, Christina; Yao-Lafourcade, Anne-Françoise; Thouret, Jean-Claude; Manville, Vern; Negulescu, Caterina; Zuccaro, Giulio; De Gregorio, Daniela; Nardone, Stefano; Uchuchoque, Juan Alexis Luque; Arguedas, Anita; Macedo, Luisa; Manrique Llerena, Nélida

    2016-10-01

    The focus of this study is an analysis of building vulnerability through investigating impacts from the 8 February 2013 flash flood event along the Avenida Venezuela channel in the city of Arequipa, Peru. On this day, 124.5 mm of rain fell within 3 h (monthly mean: 29.3 mm) triggering a flash flood that inundated at least 0.4 km2 of urban settlements along the channel, affecting more than 280 buildings, 23 of a total of 53 bridges (pedestrian, vehicle and railway), and leading to the partial collapse of sections of the main road, paralyzing central parts of the city for more than one week. This study assesses the aspects of building design and site specific environmental characteristics that render a building vulnerable by considering the example of a flash flood event in February 2013. A statistical methodology is developed that enables estimation of damage probability for buildings. The applied method uses observed inundation height as a hazard proxy in areas where more detailed hydrodynamic modeling data is not available. Building design and site-specific environmental conditions determine the physical vulnerability. The mathematical approach considers both physical vulnerability and hazard related parameters and helps to reduce uncertainty in the determination of descriptive parameters, parameter interdependency and respective contributions to damage. This study aims to (1) enable the estimation of damage probability for a certain hazard intensity, and (2) obtain data to visualize variations in damage susceptibility for buildings in flood prone areas. Data collection is based on a post-flood event field survey and the analysis of high (sub-metric) spatial resolution images (Pléiades 2012, 2013). An inventory of 30 city blocks was collated in a GIS database in order to estimate the physical vulnerability of buildings. As many as 1103 buildings were surveyed along the affected drainage and 898 buildings were included in the statistical analysis. Univariate and

  7. TREE STEM AND CANOPY BIOMASS ESTIMATES FROM TERRESTRIAL LASER SCANNING DATA

    Directory of Open Access Journals (Sweden)

    K. Olofsson

    2017-10-01

    Full Text Available In this study an automatic method for estimating both the tree stem and the tree canopy biomass is presented. The point cloud tree extraction technique operates on TLS data and models the biomass using the estimated stem and canopy volumes as independent variables. The regression model fit error is on the order of less than 5 kg, which gives a relative model error of about 5 % for the stem estimate and 10–15 % for the spruce and pine canopy biomass estimates. The canopy biomass estimate was improved by separating the models by tree species, which indicates that the method is allometry-dependent and that the regression models need to be recomputed for areas with different climate and vegetation.

  8. The Growth Points of Regional Economy and Regression Estimation for Branch Investment Multipliers

    Directory of Open Access Journals (Sweden)

    Nina Pavlovna Goridko

    2018-03-01

    Full Text Available The article develops a methodology for using investment multipliers to identify growth points in a regional economy. The paper discusses various options for assessing the multiplicative effects caused by investments in certain sectors of the economy. All calculations are carried out on the example of the economy of the Republic of Tatarstan for the period 2005–2015. Regression modeling using the method of least squares permits the estimation of sectoral and cross-sectoral investment multipliers in the economy of the Republic of Tatarstan. Moreover, this method allows assessment of the elasticity of gross output of the regional economy and its individual sectors with respect to investment in various sectors of the economy. The calculation results identified three growth points in the economy of the Republic of Tatarstan: the mining industry, the manufacturing industry and construction. The success of a particular industry or sub-industry in a country or region should be measured not only by its share in the macro-system's gross output or value added, but also by the multiplicative effect that investments in the industry have on the development of other industries, on employment, and on the overall national or regional product. In recent years, the growth of the Russian economy was close to zero. Thus, it is crucial to understand the structural consequences of increasing investments in various sectors of the Russian economy. In this regard, the problems solved in the article are relevant for a number of countries and regions with a similar economic situation. The obtained results can be applied to similar estimations of investment multipliers, as well as multipliers of government spending and other components of aggregate demand, in various countries and regions to identify growth points. Investments in these growth points will induce the greatest and most evident increment in the outcome of the macro-system's economic activities.

  9. Outlier detection algorithms for least squares time series regression

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    We review recent asymptotic results on some robust methods for multiple regression. The regressors include stationary and non-stationary time series as well as polynomial terms. The methods include the Huber-skip M-estimator, 1-step Huber-skip M-estimators, in particular the Impulse Indicator Sat...

  10. Estimating relative demand for wildlife: Conservation activity indicators

    Science.gov (United States)

    Gray, Gary G.; Larson, Joseph S.

    1982-09-01

    An alternative method of estimating relative demand among nonconsumptive uses of wildlife and among wildlife species is proposed. A demand intensity score (DIS), derived from the relative extent of an individual's involvement in outdoor recreation and conservation activities, is used as a weighting device to adjust the importance of preference rankings for wildlife uses and wildlife species relative to other members of a survey population. These adjusted preference rankings were considered to reflect relative demand levels (RDLs) for wildlife uses and for species by the survey population. This technique may be useful where it is not possible or desirable to estimate demand using traditional economic means. In one of the findings from a survey of municipal conservation commission members in Massachusetts, presented as an illustration of this methodology, poisonous snakes were ranked third in preference among five groups of reptiles. The relative demand level for poisonous snakes, however, was last among the five groups.

  11. Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression

    Science.gov (United States)

    Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.

    2013-01-01

    Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…

  12. Logistic regression models for predicting physical and mental health-related quality of life in rheumatoid arthritis patients.

    Science.gov (United States)

    Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar

    2008-01-01

    The aim of this work was to develop two logistic regression models capable of predicting physical and mental health related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study which was conducted during 2006 in the outpatient rheumatology clinic of our university hospital, Short Form 36 (SF-36) was used for HRQOL measurements in 411 RA patients. A cutoff point to define poor versus good HRQOL was calculated using the first quartiles of SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two distinct logistic regression models were used to derive predictive variables including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below 300 US$, comorbidity, patient global assessment of disease activity or PGA, and depression (odds ratios: 1.1; 1.004; 15.5; 1.1; 1.02; 2.08, respectively). The variables that entered into the poor mental HRQOL prediction model were monthly family income below 300 US$, comorbidity, PGA, and bodily pain (odds ratios: 6.7; 1.1; 1.01; 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff point of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for mental HRQOL. Sensitivity, specificity, and accuracy of the physical and mental models were 73.8, 87, 83.7% and 90.38, 70.36, 75.43%, respectively. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models can be of use in the clinical decision-making of RA patients and to recognize patients with poor physical or mental HRQOL in advance, for better management.

  13. Application of Fourier transform infrared spectroscopy and orthogonal projections to latent structures/partial least squares regression for estimation of procyanidins average degree of polymerisation.

    Science.gov (United States)

    Passos, Cláudia P; Cardoso, Susana M; Barros, António S; Silva, Carlos M; Coimbra, Manuel A

    2010-02-28

    Fourier transform infrared (FTIR) spectroscopy has been emphasised as a widespread technique for the rapid assessment of food components. In this work, procyanidins were extracted with methanol and acetone/water from the seeds of white and red grape varieties. Fractionation by graded methanol/chloroform precipitations yielded 26 samples, which were characterised using thiolysis as a pre-treatment followed by HPLC-UV and MS detection. The average degree of polymerisation (DPn) of the procyanidins in the samples ranged from 2 to 11 flavan-3-ol residues. FTIR spectroscopy within the wavenumber region of 1800-700 cm(-1) allowed a partial least squares (PLS1) regression model with 8 latent variables (LVs) to be built for the estimation of the DPn, giving a RMSECV of 11.7%, with a R(2) of 0.91 and a RMSEP of 2.58. The application of orthogonal projection to latent structures (O-PLS1) clarifies the interpretation of the regression model vectors. Moreover, the O-PLS procedure removed 88% of the variation not correlated with the DPn, making it possible to relate the increase of the absorbance peaks at 1203 and 1099 cm(-1) to the increase of the DPn due to the higher proportion of substitutions in the aromatic ring of the polymerised procyanidin molecules. Copyright 2009 Elsevier B.V. All rights reserved.
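
    A schematic of the PLS1 workflow with 8 latent variables, using random stand-in spectra rather than the actual FTIR data; RMSECV is computed by cross-validation:

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(4)
    n_samples, n_points = 26, 550             # 26 fractions; mock 1800-700 cm-1 grid
    spectra = rng.standard_normal((n_samples, n_points))
    dpn = rng.uniform(2, 11, n_samples)       # stand-in DPn reference values
    spectra[:, 200] += 0.5 * dpn              # embed one DPn-correlated band

    # PLS1 with 8 latent variables, as in the paper; RMSECV via cross-validation.
    pred = cross_val_predict(PLSRegression(n_components=8), spectra, dpn, cv=5).ravel()
    print("RMSECV:", round(float(np.sqrt(np.mean((pred - dpn) ** 2))), 2), "DPn units")
    ```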

  14. Parametric Bayesian Estimation of Differential Entropy and Relative Entropy

    OpenAIRE

    Gupta; Srivastava

    2010-01-01

    Given iid samples drawn from a distribution with known parametric form, we propose the minimization of expected Bregman divergence to form Bayesian estimates of differential entropy and relative entropy, and derive such estimators for the uniform, Gaussian, Wishart, and inverse Wishart distributions. Additionally, formulas are given for a log gamma Bregman divergence and the differential entropy and relative entropy for the Wishart and inverse Wishart. The results, as always with Bayesian est...

  15. Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data

    Science.gov (United States)

    Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.

    2014-01-01

    In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified; unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438

  16. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  17. Advanced colorectal neoplasia risk stratification by penalized logistic regression.

    Science.gov (United States)

    Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F

    2016-08-01

    Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the L1-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regression, indicating superior discriminative performance. © The Author(s) 2013.

  18. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
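
    As a sketch of one of the compared techniques, principal component regression can be assembled as a pipeline; the two collinear basin descriptors below are invented for illustration:

    ```python
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(5)
    n = 30                                     # small sample, as in regional regression
    log_area = rng.normal(4, 1, n)             # invented basin descriptors
    precip = 0.9 * log_area + rng.normal(0, 0.1, n)   # strongly collinear with log-area
    X = np.column_stack([log_area, precip])
    y = 0.7 * log_area + 0.3 * precip + rng.normal(0, 0.3, n)   # low-flow statistic

    # PCR: regress on the leading principal component(s) of the standardized
    # predictors, trading a little bias for much lower estimator variance.
    pcr = make_pipeline(StandardScaler(), PCA(n_components=1), LinearRegression()).fit(X, y)
    print("in-sample R^2 (PCR, 1 component):", round(pcr.score(X, y), 2))
    ```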

  19. Alternative regression models to assess increase in childhood BMI.

    Science.gov (United States)

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-09-08

    Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches for predicting childhood BMI were compared in terms of goodness-of-fit measures and ease of interpretation, including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data on 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. With respect to the generalized Akaike information criterion, GAMLSS showed a much better fit than common GLMs in estimating risk factor effects on transformed and untransformed BMI data. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely, significantly associated with body composition in every model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely, significantly associated with body composition in the GLM models, but they were in GAMLSS and partly in quantile regression models. Risk-factor-specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
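
    A brief illustration of the quantile regression piece, with simulated right-skewed BMI-like data and a single hypothetical risk factor; the 0.9 quantile stands in for an overweight-relevant percentile:

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    n = 500
    tv = rng.uniform(0, 4, n)                        # hypothetical risk factor (h/day)
    bmi = 15 + 0.6 * tv + rng.gamma(2, 0.8, n)       # right-skewed outcome
    df = pd.DataFrame({"bmi": bmi, "tv": tv})

    # Quantile regression models prespecified quantiles directly -- e.g. the
    # 90th percentile relevant to overweight -- with no normality assumption.
    for q in (0.5, 0.9):
        fit = smf.quantreg("bmi ~ tv", df).fit(q=q)
        print(f"q={q}: TV effect = {fit.params['tv']:.3f}")
    ```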

  20. Estimation of Constituent Concentrations, Loads, and Yields in Streams of Johnson County, Northeast Kansas, Using Continuous Water-Quality Monitoring and Regression Models, October 2002 through December 2006

    Science.gov (United States)

    Rasmussen, Teresa J.; Lee, Casey J.; Ziegler, Andrew C.

    2008-01-01

    Johnson County is one of the most rapidly developing counties in Kansas. Population growth and expanding urban land use affect the quality of county streams, which are important for human and environmental health, water supply, recreation, and aesthetic value. This report describes estimates of streamflow and constituent concentrations, loads, and yields in relation to watershed characteristics in five Johnson County streams using continuous in-stream sensor measurements. Specific conductance, pH, water temperature, turbidity, and dissolved oxygen were monitored in five watersheds from October 2002 through December 2006. These continuous data were used in conjunction with discrete water samples to develop regression models for continuously estimating concentrations of other constituents. Continuous regression-based concentrations were estimated for suspended sediment, total suspended solids, dissolved solids and selected major ions, nutrients (nitrogen and phosphorus species), and fecal-indicator bacteria. Continuous daily, monthly, seasonal, and annual loads were calculated from concentration estimates and streamflow. The data are used to describe differences in concentrations, loads, and yields and to explain these differences relative to watershed characteristics. Water quality at the five monitoring sites varied according to hydrologic conditions; contributing drainage area; land use (including degree of urbanization); relative contributions from point and nonpoint constituent sources; and human activity within each watershed. Dissolved oxygen (DO) concentrations were less than the Kansas aquatic-life-support criterion of 5.0 mg/L less than 10 percent of the time at all sites except Indian Creek, which had DO concentrations less than the criterion about 15 percent of the time. Concentrations of suspended sediment, chloride (winter only), indicator bacteria, and pesticides were substantially larger during periods of increased streamflow. Suspended

  1. A new method to estimate parameters of linear compartmental models using artificial neural networks

    International Nuclear Information System (INIS)

    Gambhir, Sanjiv S.; Keppenne, Christian L.; Phelps, Michael E.; Banerjee, Pranab K.

    1998-01-01

    At present, the preferred tool for parameter estimation in compartmental analysis is an iterative procedure: weighted nonlinear regression. For a large number of applications, observed data can be fitted to sums of exponentials whose parameters are directly related to the rate constants/coefficients of the compartmental models. Since weighted nonlinear regression often has to be repeated for many different data sets, the process of fitting data from compartmental systems can be very time consuming. Furthermore, the minimization routine often converges to a local (as opposed to global) minimum. In this paper, we examine the possibility of using artificial neural networks instead of weighted nonlinear regression in order to estimate model parameters. We train simple feed-forward neural networks to produce as outputs the parameter values of a given model when kinetic data are fed to the networks' input layer. The artificial neural networks produce unbiased estimates and are orders of magnitude faster than regression algorithms. At noise levels typical of many real applications, the neural networks are found to produce lower variance estimates than weighted nonlinear regression in the estimation of parameters from mono- and biexponential models. These results are primarily due to the inability of weighted nonlinear regression to converge. These results establish that artificial neural networks are powerful tools for estimating parameters for simple compartmental models. (author)
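
    A compact sketch of the idea, assuming a biexponential model and an off-the-shelf multilayer perceptron (the original work used its own networks, not this library): the network is trained on simulated noisy curves with known parameters, then inverts new curves in a single forward pass:

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(7)
    t = np.linspace(0.1, 60, 40)                    # sampling times (min)

    def biexp(a1, k1, a2, k2):
        return a1 * np.exp(-k1 * t) + a2 * np.exp(-k2 * t)

    # Train on simulated noisy curves whose parameters are known.
    params = rng.uniform([0.5, 0.05, 0.1, 0.005], [2.0, 0.5, 0.5, 0.05], (5000, 4))
    curves = np.array([biexp(*p) for p in params]) + rng.normal(0, 0.01, (5000, t.size))

    # The network maps a sampled curve straight to parameter estimates, so
    # inference is one forward pass with no iterative minimization to diverge.
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300).fit(curves, params)
    true = np.array([1.0, 0.2, 0.3, 0.02])
    print("true:", true, "\nestimated:", net.predict([biexp(*true)])[0].round(3))
    ```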

  2. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may have adverse health effects. We developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
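
    A rough analogue of the model comparison using kernel ridge regression (a kernel-based regularized least squares estimator) versus linear regression on synthetic, nonlinearly generated data; predictor names and values are invented:

    ```python
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    n = 414                                       # number of road segments in the paper
    X = rng.uniform(0, 1, (n, 3))                 # scaled mock covariates
    y = 20 + 30 * X[:, 0] ** 2 + 15 * np.exp(-3 * X[:, 2]) + rng.normal(0, 3, n)

    models = {"linear": LinearRegression(),
              "KRLS-like": KernelRidge(kernel="rbf", alpha=1.0, gamma=1.0)}
    # Cross-validated R^2 for each model; the kernel method can learn the
    # nonlinear covariate effects that the linear model misses.
    for name, model in models.items():
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(name, round(r2, 2))
    ```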

  3. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  4. Accounting for measurement error in log regression models with applications to accelerated testing.

    Science.gov (United States)

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  5. Accounting for measurement error in log regression models with applications to accelerated testing.

    Directory of Open Access Journals (Sweden)

    Robert Richardson

    Full Text Available In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
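
    A generic sketch of the IRLS device described above (not the paper's exact weighting, which is derived from the reduced Eyring equation): with additive error on the original scale, Var[log y] is approximately sigma^2/mu^2, so the log-scale fit can be reweighted by the squared fitted mean and iterated to convergence:

    ```python
    import numpy as np

    rng = np.random.default_rng(9)
    n = 200
    x = rng.uniform(1, 5, n)
    mu = np.exp(3.0 + 0.5 * x)                 # log-linear mean
    y = np.clip(mu + rng.normal(0, 5, n), 1e-3, None)   # ADDITIVE error; logs kept defined

    # Iteratively Re-weighted Least Squares on log(y), weights = fitted mean^2.
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]   # unweighted start
    for _ in range(50):
        w = np.exp(X @ beta) ** 2                         # weights from current fit
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ np.log(y))
        if np.allclose(beta_new, beta, atol=1e-10):
            break
        beta = beta_new
    print("intercept, slope:", beta.round(3))             # near (3.0, 0.5)
    ```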

  6. Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant

    International Nuclear Information System (INIS)

    Chan, Yea-Kuang; Tsai, Yu-Ching

    2017-01-01

    The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.

  7. Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant

    Energy Technology Data Exchange (ETDEWEB)

    Chan, Yea-Kuang; Tsai, Yu-Ching [Institute of Nuclear Energy Research, Taoyuan City, Taiwan (China). Nuclear Engineering Division

    2017-03-15

    The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.

  8. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.
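
    The core of GWLR can be sketched as a locally weighted logistic fit: each focal location gets its own coefficient estimates, with observations down-weighted by distance through a kernel. The data, kernel, and bandwidth below are illustrative assumptions:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(10)
    n = 300
    coords = rng.uniform(0, 10, (n, 2))              # borrower locations (invented)
    income = rng.standard_normal(n)
    beta_local = np.where(coords[:, 0] > 5, -1.5, -0.3)   # effect differs east vs. west
    p = 1 / (1 + np.exp(-(-1 + beta_local * income)))
    default = (rng.random(n) < p).astype(float)
    X = sm.add_constant(income)

    def gwlr_at(point, bandwidth=2.0):
        """Logistic fit weighted by a Gaussian kernel of distance to the focal point."""
        d = np.linalg.norm(coords - point, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        return sm.GLM(default, X, family=sm.families.Binomial(), var_weights=w).fit().params

    print("west:", gwlr_at(np.array([2.0, 5.0])).round(2))   # local coefficients
    print("east:", gwlr_at(np.array([8.0, 5.0])).round(2))   # differ by region
    ```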

  9. Teaching the Concept of Breakdown Point in Simple Linear Regression.

    Science.gov (United States)

    Chan, Wai-Sum

    2001-01-01

    Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…

  10. Empirical estimation of the grades of hearing impairment among industrial workers based on new artificial neural networks and classical regression methods.

    Science.gov (United States)

    Farhadian, Maryam; Aliabadi, Mohsen; Darvishi, Ebrahim

    2015-01-01

    Prediction models are used in a variety of medical domains, and they are frequently built from experience which constitutes data acquired from actual cases. This study aimed to analyze the potential of artificial neural networks and logistic regression techniques for estimation of hearing impairment among industrial workers. A total of 210 workers employed in a steel factory (in West of Iran) were selected, and their occupational exposure histories were analyzed. The hearing loss thresholds of the studied workers were determined using a calibrated audiometer. The personal noise exposures were also measured using a noise dosimeter in the workstations. Data obtained from five variables, which can influence the hearing loss, were used as input features, and the hearing loss thresholds were considered as target feature of the prediction methods. Multilayer feedforward neural networks and logistic regression were developed using MATLAB R2011a software. Based on the World Health Organization classification for the grades of hearing loss, 74.2% of the studied workers have normal hearing thresholds, 23.4% have slight hearing loss, and 2.4% have moderate hearing loss. The accuracy and kappa coefficient of the best developed neural networks for prediction of the grades of hearing loss were 88.6 and 66.30, respectively. The accuracy and kappa coefficient of the logistic regression were also 84.28 and 51.30, respectively. Neural networks could provide more accurate predictions of the hearing loss than logistic regression. The prediction method can provide reliable and comprehensible information for occupational health and medicine experts.

  11. Relative index of inequality and slope index of inequality: a structured regression framework for estimation

    NARCIS (Netherlands)

    Moreno-Betancur, Margarita; Latouche, Aurélien; Menvielle, Gwenn; Kunst, Anton E.; Rey, Grégoire

    2015-01-01

    The relative index of inequality and the slope index of inequality are the two major indices used in epidemiologic studies for the measurement of socioeconomic inequalities in health. Yet the current definitions of these indices are not adapted to their main purpose, which is to provide summary

  12. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia

    2018-04-10

    Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework, quantile regression has typically been carried out by exploiting the Asymmetric Laplace Distribution as a working likelihood. Although this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure beyond point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend it to the case of discrete responses, where there is no one-to-one relationship between quantiles and the distribution's parameters, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.

  13. Linear regression metamodeling as a tool to summarize and present simulation model results.

    Science.gov (United States)

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
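
    A minimal sketch of the metamodeling step, with a made-up decision model and three hypothetical PSA inputs; regressing the outcome on standardized inputs yields an intercept near the base-case value and coefficients that rank parameter influence:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    n = 10_000                                  # PSA iterations
    p_cure = rng.beta(20, 80, n)                # made-up model inputs
    cost_tx = rng.gamma(100, 200, n)
    utility = rng.beta(60, 40, n)

    # Stand-in decision model: net benefit at a willingness-to-pay of 50k.
    nb = 50_000 * (10 * p_cure * utility) - cost_tx + rng.normal(0, 2_000, n)

    # Metamodel: regress the outcome on STANDARDIZED inputs. The intercept
    # estimates the expected (base-case) outcome; each slope is the outcome
    # change per 1-SD move in a parameter -- a one-number sensitivity measure.
    Z = np.column_stack([(v - v.mean()) / v.std() for v in (p_cure, cost_tx, utility)])
    fit = sm.OLS(nb, sm.add_constant(Z)).fit()
    print(fit.params.round(0))   # [mean NB, effect of p_cure, cost, utility]
    ```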

  14. Flood quantile estimation at ungauged sites by Bayesian networks

    Science.gov (United States)

    Mediero, L.; Santillán, D.; Garrote, L.

    2012-04-01

    Estimating flood quantiles at a site for which no observed measurements are available is essential for water resources planning and management. Ungauged sites have no observations about the magnitude of floods, but some site and basin characteristics are known. The most common technique used is the multiple regression analysis, which relates physical and climatic basin characteristic to flood quantiles. Regression equations are fitted from flood frequency data and basin characteristics at gauged sites. Regression equations are a rigid technique that assumes linear relationships between variables and cannot take the measurement errors into account. In addition, the prediction intervals are estimated in a very simplistic way from the variance of the residuals in the estimated model. Bayesian networks are a probabilistic computational structure taken from the field of Artificial Intelligence, which have been widely and successfully applied to many scientific fields like medicine and informatics, but application to the field of hydrology is recent. Bayesian networks infer the joint probability distribution of several related variables from observations through nodes, which represent random variables, and links, which represent causal dependencies between them. A Bayesian network is more flexible than regression equations, as they capture non-linear relationships between variables. In addition, the probabilistic nature of Bayesian networks allows taking the different sources of estimation uncertainty into account, as they give a probability distribution as result. A homogeneous region in the Tagus Basin was selected as case study. A regression equation was fitted taking the basin area, the annual maximum 24-hour rainfall for a given recurrence interval and the mean height as explanatory variables. Flood quantiles at ungauged sites were estimated by Bayesian networks. Bayesian networks need to be learnt from a huge enough data set. As observational data are reduced, a

  15. A robust ridge regression approach in the presence of both multicollinearity and outliers in the data

    Science.gov (United States)

    Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah

    2017-08-01

    Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. The situation is more severe in the presence of outliers, which produce fatter tails in the error distribution than the normal distribution. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is expected to be affected by the presence of outliers because of assumptions imposed in the modeling procedure. Thus, a robust version of the existing ridge method, with some modification in the inverse matrix and the estimated response value, is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators, namely Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The proposed method is able to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.
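
    One readily available approximation of a robust ridge estimator (not the GM-estimate-based method proposed in the paper) is Huber loss combined with an L2 penalty, as in scikit-learn's HuberRegressor:

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge, HuberRegressor

    rng = np.random.default_rng(12)
    n = 100
    x1 = rng.standard_normal(n)
    x2 = x1 + rng.normal(0, 0.05, n)        # near-duplicate regressor: multicollinearity
    X = np.column_stack([x1, x2])
    y = 2 * x1 + 2 * x2 + rng.normal(0, 0.5, n)
    y[:5] += 25                             # vertical outliers

    # HuberRegressor minimizes Huber loss plus an L2 (ridge) penalty, so with
    # alpha > 0 it addresses the outliers and the collinearity together.
    print("ridge:       ", Ridge(alpha=1.0).fit(X, y).coef_.round(2))
    print("robust ridge:", HuberRegressor(alpha=1.0).fit(X, y).coef_.round(2))
    ```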

  16. Maximum likelihood estimation for cytogenetic dose-response curves

    International Nuclear Information System (INIS)

    Frome, E.L.; DuFrain, R.J.

    1986-01-01

    In vitro dose-response curves are used to describe the relation between chromosome aberrations and radiation dose for human lymphocytes. The lymphocytes are exposed to low-LET radiation, and the resulting dicentric chromosome aberrations follow the Poisson distribution. The expected yield depends on both the magnitude and the temporal distribution of the dose. A general dose-response model that describes this relation has been presented by Kellerer and Rossi (1972, Current Topics on Radiation Research Quarterly 8, 85-158; 1978, Radiation Research 75, 471-488) using the theory of dual radiation action. Two special cases of practical interest are split-dose and continuous exposure experiments, and the resulting dose-time-response models are intrinsically nonlinear in the parameters. A general-purpose maximum likelihood estimation procedure is described, and estimation for the nonlinear models is illustrated with numerical examples from both experimental designs. Poisson regression analysis is used for estimation, hypothesis testing, and regression diagnostics. Results are discussed in the context of exposure assessment procedures for both acute and chronic human radiation exposure
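
    For flavor, a linear-quadratic dicentric yield curve can be fitted as a Poisson regression with an identity link, with the number of cells scored entering the design matrix; the dose-yield numbers below are invented, not the paper's data:

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Invented dicentric data: dose (Gy), cells scored, aberrations observed.
    dose = np.array([0.0, 0.25, 0.5, 1.0, 2.0, 3.0, 4.0])
    cells = np.array([5000, 4000, 3000, 2000, 1000, 800, 600])
    dicentrics = np.array([5, 21, 48, 112, 211, 373, 493])

    # Linear-quadratic yield per cell, Y = c + alpha*D + beta*D^2, fitted as a
    # Poisson regression with identity link: E[count] = cells * (c + aD + bD^2),
    # so the design columns are cells, cells*D and cells*D^2 with no intercept.
    X = np.column_stack([cells, cells * dose, cells * dose ** 2])
    fam = sm.families.Poisson(link=sm.families.links.Identity())
    fit = sm.GLM(dicentrics, X, family=fam).fit()
    print("c, alpha, beta:", fit.params)     # per-cell yield coefficients
    ```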

  17. Regularized Regression and Density Estimation based on Optimal Transport

    KAUST Repository

    Burger, M.; Franek, M.; Schonlieb, C.-B.

    2012-01-01

    for estimating densities and for preserving edges in the case of total variation regularization. In order to compute solutions of the variational problems, a regularized optimal transport problem needs to be solved, for which we discuss several formulations

  18. Methods for estimating low-flow statistics for Massachusetts streams

    Science.gov (United States)

    Ries, Kernell G.; Friesz, Paul J.

    2000-01-01

    Methods and computer software are described in this report for determining flow duration, low-flow frequency statistics, and August median flows. These low-flow statistics can be estimated for unregulated streams in Massachusetts using different methods depending on whether the location of interest is at a streamgaging station, a low-flow partial-record station, or an ungaged site where no data are available. Low-flow statistics for streamgaging stations can be estimated using standard U.S. Geological Survey methods described in the report. The MOVE.1 mathematical method and a graphical correlation method can be used to estimate low-flow statistics for low-flow partial-record stations. The MOVE.1 method is recommended when the relation between measured flows at a partial-record station and daily mean flows at a nearby, hydrologically similar streamgaging station is linear, and the graphical method is recommended when the relation is curved. Equations are presented for computing the variance and equivalent years of record for estimates of low-flow statistics for low-flow partial-record stations when either a single or multiple index stations are used to determine the estimates. The drainage-area ratio method or regression equations can be used to estimate low-flow statistics for ungaged sites where no data are available. The drainage-area ratio method is generally as accurate as or more accurate than regression estimates when the drainage-area ratio for an ungaged site is between 0.3 and 1.5 times the drainage area of the index data-collection site. Regression equations were developed to estimate the natural, long-term 99-, 98-, 95-, 90-, 85-, 80-, 75-, 70-, 60-, and 50-percent duration flows; the 7-day, 2-year and the 7-day, 10-year low flows; and the August median flow for ungaged sites in Massachusetts. Streamflow statistics and basin characteristics for 87 to 133 streamgaging stations and low-flow partial-record stations were used to develop the equations. The
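
    The drainage-area ratio method reduces to a one-line proportional transfer; a tiny sketch with hypothetical numbers:

    ```python
    def drainage_area_ratio(q_index, area_index, area_ungauged):
        """Transfer a flow statistic from a gauged index site to an ungauged site
        in proportion to drainage area (suited to ratios of roughly 0.3-1.5)."""
        return q_index * (area_ungauged / area_index)

    # Hypothetical numbers: a 7-day, 10-year low flow of 1.2 cfs at a 40 mi^2
    # index station, transferred to a nearby ungauged 25 mi^2 site.
    print(drainage_area_ratio(1.2, 40.0, 25.0), "cfs")   # -> 0.75 cfs
    ```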

  19. Regularized Regression and Density Estimation based on Optimal Transport

    KAUST Repository

    Burger, M.

    2012-03-11

    The aim of this paper is to investigate a novel nonparametric approach for estimating and smoothing density functions as well as probability densities from discrete samples based on a variational regularization method with the Wasserstein metric as a data fidelity. The approach allows a unified treatment of discrete and continuous probability measures and is hence attractive for various tasks. In particular, the variational model for special regularization functionals yields a natural method for estimating densities and for preserving edges in the case of total variation regularization. In order to compute solutions of the variational problems, a regularized optimal transport problem needs to be solved, for which we discuss several formulations and provide a detailed analysis. Moreover, we compute special self-similar solutions for standard regularization functionals and we discuss several computational approaches and results. © 2012 The Author(s).

  20. ESTIMATION OF GENETIC PARAMETERS IN TROPICARNE CATTLE WITH RANDOM REGRESSION MODELS USING B-SPLINES

    Directory of Open Access Journals (Sweden)

    Joel Domínguez Viveros

    2015-04-01

    Full Text Available The objectives were to estimate variance components and direct (h2) and maternal (m2) heritabilities for the growth of Tropicarne cattle, based on a random regression model using B-splines to model the random effects. Information from 12,890 monthly weighings of 1787 calves, from birth to 24 months of age, was analyzed. The pedigree included 2504 animals. The random effects model included genetic and permanent environmental effects (direct and maternal) of cubic order, and residuals. The fixed effects included contemporary groups (year-season of weighing), sex and the covariate age of the cow (linear and quadratic). The B-splines were defined on four knots across the growth period analyzed. Analyses were performed with the software Wombat. The phenotypic and residual variances showed similar behavior: from 7 to 12 months of age the trend was negative; from birth to 6 months and from 13 to 18 months it was positive; after 19 months it remained constant. The m2 estimates were low and near zero, with an average of 0.06 over an interval of 0.04 to 0.11; the h2 estimates were also close to zero, with an average of 0.10 over an interval of 0.03 to 0.23.

  1. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Science.gov (United States)

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for estimating daily and monthly pan evaporation as functions of the available meteorological data of temperature, relative humidity, and wind speed. The data used for the modeling are daily measurements with substantially continuous coverage over a period of 17 years, between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. The multiple linear regression technique is used with a variable-selection procedure for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data, using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results in reasonable agreement with observed values. PMID:23226984
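
    A hedged sketch of the fitting strategy on synthetic data, with illustrative (not fitted) transform exponents: a power-transformed temperature and an exponentially transformed humidity enter an ordinary least squares fit alongside wind speed.

        import numpy as np

        # hypothetical daily records: temperature T (deg C), relative humidity
        # RH (%), wind speed W (km/h), and measured pan evaporation E (mm/day)
        rng = np.random.default_rng(3)
        T = rng.uniform(15, 48, 500)
        RH = rng.uniform(5, 80, 500)
        W = rng.uniform(0, 30, 500)
        E = 0.004 * T**2 + 3 * np.exp(-0.02 * RH) + 0.05 * W + rng.normal(0, 0.3, 500)

        # linearize the curvilinear relations (power of T, exponential of RH),
        # then fit by ordinary least squares
        X = np.column_stack([np.ones_like(T), T**2, np.exp(-0.02 * RH), W])
        coef, *_ = np.linalg.lstsq(X, E, rcond=None)
        print(coef)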

  2. Regression models for estimating concentrations of atrazine plus deethylatrazine in shallow groundwater in agricultural areas of the United States

    Science.gov (United States)

    Stackelberg, Paul E.; Barbash, Jack E.; Gilliom, Robert J.; Stone, Wesley W.; Wolock, David M.

    2012-01-01

    Tobit regression models were developed to predict the summed concentration of atrazine [6-chloro-N-ethyl-N'-(1-methylethyl)-1,3,5-triazine-2,4-diamine] and its degradate deethylatrazine [6-chloro-N-(1-methylethyl)-1,3,5,-triazine-2,4-diamine] (DEA) in shallow groundwater underlying agricultural settings across the conterminous United States. The models were developed from atrazine and DEA concentrations in samples from 1298 wells and explanatory variables that represent the source of atrazine and various aspects of the transport and fate of atrazine and DEA in the subsurface. One advantage of these newly developed models over previous national regression models is that they predict concentrations (rather than detection frequency), which can be compared with water quality benchmarks. Model results indicate that variability in the concentration of atrazine residues (atrazine plus DEA) in groundwater underlying agricultural areas is more strongly controlled by the history of atrazine use in relation to the timing of recharge (groundwater age) than by processes that control the dispersion, adsorption, or degradation of these compounds in the saturated zone. Current (1990s) atrazine use was found to be a weak explanatory variable, perhaps because it does not represent the use of atrazine at the time of recharge of the sampled groundwater and because the likelihood that these compounds will reach the water table is affected by other factors operating within the unsaturated zone, such as soil characteristics, artificial drainage, and water movement. Results show that only about 5% of agricultural areas have greater than a 10% probability of exceeding the USEPA maximum contaminant level of 3.0 μg L-1. These models are not developed for regulatory purposes but rather can be used to (i) identify areas of potential concern, (ii) provide conservative estimates of the concentrations of atrazine residues in deeper potential drinking water supplies, and (iii) set priorities
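
    The Tobit machinery for left-censored concentrations can be sketched directly; the following is a generic type I Tobit maximum-likelihood fit on synthetic data (the detection limit dl and all values are hypothetical), not the authors' national model:

        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import norm

        rng = np.random.default_rng(4)
        n = 400
        X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + predictor
        beta_true, sigma_true, dl = np.array([0.2, 0.8]), 1.0, 0.5
        latent = X @ beta_true + sigma_true * rng.normal(size=n)
        y = np.maximum(latent, dl)          # concentrations reported at the DL
        censored = latent <= dl             # nondetects

        def negloglik(params):
            beta, sigma = params[:-1], np.exp(params[-1])
            mu = X @ beta
            ll_obs = norm.logpdf(y[~censored], mu[~censored], sigma)
            ll_cen = norm.logcdf((dl - mu[censored]) / sigma)  # censored mass
            return -(ll_obs.sum() + ll_cen.sum())

        res = minimize(negloglik, x0=np.zeros(3), method="BFGS")
        print(res.x[:2], np.exp(res.x[2]))  # beta estimates and sigma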

  3. Flexible competing risks regression modeling and goodness-of-fit

    DEFF Research Database (Denmark)

    Scheike, Thomas; Zhang, Mei-Jie

    2008-01-01

    In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause-specific hazards. We describe a flexible class of regression models that is easy to fit and contains the Fine-Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy to use. We illustrate the use of the flexible regression models to analyze competing risks data when non-proportionality is present in the data.

  4. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when, in a stepwise regression, a descriptor is included in or excluded from the regression. The consequence is an unpredictable change in the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", which arises when optimal descriptors are selected from a large pool of descriptors. At different steps of the stepwise regression, this process typically causes several previously used descriptors to be replaced by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on the boiling points of nonanes, which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of a greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined, showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  5. Boosting structured additive quantile regression for longitudinal childhood obesity data.

    Science.gov (United States)

    Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael

    2013-07-25

    Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, both on population and on individual-specific level, adjusted for further risk factors and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.
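
    For flavor, a tree-based stand-in (sklearn's gradient boosting with the check loss) estimates an upper BMI quantile curve on synthetic data; the paper instead boosts structured additive base-learners with individual-specific effects:

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(5)
        age = rng.uniform(0, 10, 2000)                    # years
        bmi = 15 + 0.4 * age + rng.gamma(2, 0.6, 2000)    # skewed around the trend

        # functional gradient boosting of the pinball (check) loss at tau = 0.9
        q90 = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(
            age.reshape(-1, 1), bmi)
        print(q90.predict(np.array([[2.0], [8.0]])))      # upper-quantile BMI curve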

  6. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study considers indicators of the predictive power of the generalized linear model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This measure of predictive power is defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] in the Poisson regression model, where the dependent variable is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional one in terms of bias and root mean square error (RMSE).
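
    The traditional coefficient that the paper modifies is simply the correlation between Y and the fitted means E(Y|X); a minimal statsmodels sketch on synthetic data (the proposed modification itself is not reproduced):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)
        X = sm.add_constant(rng.normal(size=(300, 2)))
        y = rng.poisson(np.exp(X @ np.array([0.3, 0.5, -0.2])))

        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        # regression correlation coefficient: corr(Y, E(Y|X)) with fitted means
        r = np.corrcoef(y, fit.fittedvalues)[0, 1]
        print(r)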

  7. Alternative regression models to assess increase in childhood BMI

    Directory of Open Access Journals (Sweden)

    Mansmann Ulrich

    2008-09-01

    Full Text Available Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared, including generalized linear models (GLMs), quantile regression, and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factor effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely, significantly associated with body composition in every model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely, significantly associated with body composition in GLM models, but they were in GAMLSS and partly in quantile regression models. Risk-factor-specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  8. Temperature-related mortality estimates after accounting for the cumulative effects of air pollution in an urban area.

    Science.gov (United States)

    Stanišić Stojić, Svetlana; Stanišić, Nemanja; Stojić, Andreja

    2016-07-11

    To propose a new method for including the cumulative mid-term effects of air pollution in the traditional Poisson regression model and compare the temperature-related mortality risk estimates, before and after including air pollution data. The analysis comprised a total of 56,920 residents aged 65 years or older who died from circulatory and respiratory diseases in Belgrade, Serbia, and daily mean PM10, NO2, SO2 and soot concentrations obtained for the period 2009-2014. After accounting for the cumulative effects of air pollutants, the risk associated with cold temperatures was significantly lower and the overall temperature-attributable risk decreased from 8.80% to 3.00%. Furthermore, the optimum range of temperature, within which no excess temperature-related mortality is expected to occur, was very broad, between -5 and 21 °C, which differs from the previous findings that most of the attributable deaths were associated with mild temperatures. These results suggest that, in polluted areas of developing countries, most of the mortality risk, previously attributed to cold temperatures, can be explained by the mid-term effects of air pollution. The results also showed that the estimated relative importance of PM10 was the smallest of the four examined pollutant species, and thus, including PM10 data only is clearly not the most effective way to control for the effects of air pollution.

  9. bayesQR: A Bayesian Approach to Quantile Regression

    Directory of Open Access Journals (Sweden)

    Dries F. Benoit

    2017-01-01

    Full Text Available After its introduction by Koenker and Bassett (1978), quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for the typical quantile regression with continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso approach is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots, posterior histograms and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of this paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.
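
    The asymmetric-Laplace connection can be sketched with a tiny random-walk Metropolis sampler (flat prior, scale fixed at 1); this is an illustration of the idea, not the bayesQR implementation:

        import numpy as np

        def ald_loglik(y, X, beta, tau):
            """Asymmetric Laplace log-likelihood (up to a constant, scale = 1):
            minus the check loss at quantile level tau."""
            u = y - X @ beta
            return -np.sum(u * (tau - (u < 0)))

        rng = np.random.default_rng(0)
        n, tau = 200, 0.75
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

        beta = np.zeros(2)
        ll = ald_loglik(y, X, beta, tau)
        draws = []
        for it in range(5000):
            prop = beta + 0.1 * rng.normal(size=2)     # random-walk proposal
            llp = ald_loglik(y, X, prop, tau)
            if np.log(rng.uniform()) < llp - ll:       # Metropolis acceptance
                beta, ll = prop, llp
            draws.append(beta)
        print(np.mean(draws[1000:], axis=0))           # posterior mean after burn-in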

  10. Genetic correlations among body condition score, yield, and fertility in first-parity cows estimated by random regression models.

    Science.gov (United States)

    Veerkamp, R F; Koenen, E P; De Jong, G

    2001-10-01

    Twenty type classifiers scored body condition (BCS) of 91,738 first-parity cows from 601 sires and 5518 maternal grandsires. Fertility data during first lactation were extracted for 177,220 cows, of which 67,278 also had a BCS observation, and first-lactation 305-d milk, fat, and protein yields were added for 180,631 cows. Heritabilities and genetic correlations were estimated using a sire-maternal grandsire model. Heritability of BCS was 0.38. Heritabilities for fertility traits were low (0.01 to 0.07), but genetic standard deviations were substantial: 9 d for days to first service and calving interval, 0.25 for number of services, and 5% for first-service conception. Phenotypic correlations between fertility and yield or BCS were small (-0.15 to 0.20). Genetic correlations between yield and all fertility traits were unfavorable (0.37 to 0.74). Genetic correlations with BCS were between -0.4 and -0.6 for calving interval and days to first service. Random regression analysis (RR) showed that correlations for BCS changed with days in milk. Little agreement was found between variances and correlations from RR and analyses including a single month (mo 1 to 10) of BCS data, especially during early and late lactation. However, this was due to excluding data from the conventional analysis, rather than to the polynomials used. RR and a conventional five-trait model, where BCS in mo 1, 4, 7, and 10 was treated as separate traits (plus yield or fertility), gave similar results. Thus a parsimonious random regression model gave more realistic estimates of the (co)variances than a series of bivariate analyses on subsets of the BCS data. A higher genetic merit for yield has unfavorable effects on fertility, but the genetic correlation suggests that BCS (at some stages of lactation) might help to alleviate the unfavorable effect of selection for higher yield on fertility.

  11. Adaptive metric kernel regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    2000-01-01

    Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate regression by minimising a cross-validation estimate of the generalisation error. This allows the importance of different dimensions to be adjusted automatically. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard approach.

  12. Differential item functioning (DIF) analyses of health-related quality of life instruments using logistic regression

    DEFF Research Database (Denmark)

    Scott, Neil W; Fayers, Peter M; Aaronson, Neil K

    2010-01-01

    Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application.
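
    The standard logistic-regression DIF test can be sketched as follows, with hypothetical variable names: the group main effect probes uniform DIF and the score-by-group interaction probes non-uniform DIF.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(7)
        n = 1000
        score = rng.normal(0, 1, n)            # rest-score on the HRQoL subscale
        group = rng.integers(0, 2, n)          # e.g., 0 = women, 1 = men
        # simulate uniform DIF: the same score yields different item endorsement
        p = 1 / (1 + np.exp(-(0.2 + 1.1 * score + 0.6 * group)))
        item = rng.binomial(1, p)
        df = pd.DataFrame({"item": item, "score": score, "group": group})

        # group -> uniform DIF; score:group -> non-uniform DIF
        fit = smf.logit("item ~ score + group + score:group", data=df).fit(disp=0)
        print(fit.summary())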

  13. Subset selection in regression

    CERN Document Server

    Miller, Alan

    2002-01-01

    Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition:
    - A separate chapter on Bayesian methods
    - Complete revision of the chapter on estimation
    - A major example from the field of near infrared spectroscopy
    - More emphasis on cross-validation
    - Greater focus on bootstrapping
    - Stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible
    - Software available on the Internet for implementing many of the algorithms presented
    - More examples
    Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...

  14. Supremum Norm Posterior Contraction and Credible Sets for Nonparametric Multivariate Regression

    NARCIS (Netherlands)

    Yoo, W.W.; Ghosal, S

    2016-01-01

    In the setting of nonparametric multivariate regression with unknown error variance, we study asymptotic properties of a Bayesian method for estimating a regression function f and its mixed partial derivatives. We use a random series of tensor products of B-splines with normal basis coefficients as a prior.

  15. Applied parameter estimation for chemical engineers

    CERN Document Server

    Englezos, Peter

    2000-01-01

    Formulation of the parameter estimation problem; computation of parameters in linear models-linear regression; Gauss-Newton method for algebraic models; other nonlinear regression methods for algebraic models; Gauss-Newton method for ordinary differential equation (ODE) models; shortcut estimation methods for ODE models; practical guidelines for algorithm implementation; constrained parameter estimation; Gauss-Newton method for partial differential equation (PDE) models; statistical inferences; design of experiments; recursive parameter estimation; parameter estimation in nonlinear thermodynamic models.

  16. Spectral density regression for bivariate extremes

    KAUST Repository

    Castro Camilo, Daniela

    2016-05-11

    We introduce a density regression model for the spectral density of a bivariate extreme value distribution that allows us to assess how extremal dependence can change over a covariate. Inference is performed through a double kernel estimator, which can be seen as an extension of the Nadaraya–Watson estimator where the usual scalar responses are replaced by mean-constrained densities on the unit interval. Numerical experiments with the methods illustrate their resilience in a variety of contexts of practical interest. An extreme temperature dataset is used to illustrate our methods. © 2016 Springer-Verlag Berlin Heidelberg

  17. Regression dilution bias: tools for correction methods and sample size calculation.

    Science.gov (United States)

    Berglund, Lars

    2012-08-01

    Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
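
    A minimal numpy sketch of the classical correction, assuming a reliability study with two replicate measurements: the naive slope is divided by the estimated reliability ratio.

        import numpy as np

        rng = np.random.default_rng(8)
        n = 500
        true_x = rng.normal(0, 1, n)             # e.g., long-term fasting insulin
        y = 1.5 * true_x + rng.normal(0, 1, n)   # insulin sensitivity
        x_obs = true_x + rng.normal(0, 0.7, n)   # single noisy measurement

        naive_slope = np.polyfit(x_obs, y, 1)[0] # attenuated toward zero

        # reliability study: a second measurement on a subsample
        m = 150
        x1 = true_x[:m] + rng.normal(0, 0.7, m)
        x2 = true_x[:m] + rng.normal(0, 0.7, m)
        s2_error = np.mean((x1 - x2) ** 2) / 2   # within-person error variance
        reliability = 1 - s2_error / np.var(x_obs, ddof=1)

        print(naive_slope, naive_slope / reliability)  # corrected slope ~ 1.5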

  18. A generalized right truncated bivariate Poisson regression model with applications to health data.

    Science.gov (United States)

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and for over- or underdispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute, and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  19. Exploring factors associated with traumatic dental injuries in preschool children: a Poisson regression analysis.

    Science.gov (United States)

    Feldens, Carlos Alberto; Kramer, Paulo Floriani; Ferreira, Simone Helena; Spiguel, Mônica Hermann; Marquezan, Marcela

    2010-04-01

    This cross-sectional study aimed to investigate the factors associated with dental trauma in preschool children using Poisson regression analysis with robust variance. The study population comprised 888 children aged 3 to 5 years attending public nurseries in Canoas, southern Brazil. Questionnaires assessing information related to the independent variables (age, gender, race, mother's educational level and family income) were completed by the parents. Clinical examinations were carried out by five trained examiners in order to assess traumatic dental injuries (TDI) according to Andreasen's classification. One of the five examiners was calibrated to assess orthodontic characteristics (open bite and overjet). Multivariable Poisson regression analysis with robust variance was used to determine the factors associated with dental trauma as well as the strengths of association. Traditional logistic regression was also performed in order to compare the estimates obtained by both methods of statistical analysis. Overall, 36.4% (323/888) of the children suffered dental trauma, and there was no difference in prevalence rates from 3 to 5 years of age. Poisson regression analysis showed that the probability of the outcome was almost 30% higher for children whose mothers had more than 8 years of education (Prevalence Ratio = 1.28; 95% CI = 1.03-1.60) and 63% higher for children with an overjet greater than 2 mm (Prevalence Ratio = 1.63; 95% CI = 1.31-2.03). Odds ratios clearly overestimated the size of the effect when compared with prevalence ratios. These findings indicate the need for preventive orientation regarding TDI, in order to educate parents and caregivers about supervising infants, particularly those with increased overjet and whose mothers have a higher level of education. Poisson regression with robust variance represents a better alternative than logistic regression to estimate the risk of dental trauma in preschool children.
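
    The robust-variance Poisson approach for a binary outcome can be sketched with statsmodels on synthetic data (HC1 is one of several sandwich-variance options); the exponentiated coefficient is the prevalence ratio:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(9)
        n = 888
        overjet = rng.binomial(1, 0.3, n)             # overjet > 2 mm
        p = 0.30 * np.where(overjet == 1, 1.63, 1.0)  # build in PR = 1.63
        trauma = rng.binomial(1, np.clip(p, 0, 1))

        X = sm.add_constant(overjet)
        # Poisson regression with robust (sandwich) variance for a binary outcome
        fit = sm.GLM(trauma, X, family=sm.families.Poisson()).fit(cov_type="HC1")
        print(np.exp(fit.params[1]), np.exp(fit.conf_int()[1]))  # PR and 95% CI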

  20. ESTIMATION ACCURACY OF EXPONENTIAL DISTRIBUTION PARAMETERS

    Directory of Open Access Journals (Sweden)

    Muhammad Zahid Rashid

    2011-04-01

    Full Text Available The exponential distribution is commonly used to model the behavior of units that have a constant failure rate. The two-parameter exponential distribution provides a simple but nevertheless useful model for the analysis of lifetimes, especially when investigating the reliability of technical equipment. This paper is concerned with estimation of the parameters of the two-parameter (location and scale) exponential distribution. We used the least squares method (LSM), the relative least squares method (RELS), the ridge regression method (RR), moment estimators (ME), modified moment estimators (MME), maximum likelihood estimators (MLE), and modified maximum likelihood estimators (MMLE). We used the mean square error (MSE) and total deviation (TD) as measures for the comparison between these methods. We determined the best method for estimation using different values of the parameters and different sample sizes.
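
    Two of the compared estimators are compact enough to show; a sketch of the maximum likelihood and bias-corrected (modified) estimators for the two-parameter exponential distribution:

        import numpy as np

        rng = np.random.default_rng(10)
        loc, scale, n = 2.0, 3.0, 50
        x = loc + rng.exponential(scale, n)

        # maximum likelihood estimators
        loc_mle = x.min()
        scale_mle = x.mean() - x.min()

        # bias-corrected (modified) versions: E[X(1)] = loc + scale/n
        scale_mod = n / (n - 1) * scale_mle
        loc_mod = x.min() - scale_mod / n

        print(loc_mle, scale_mle, loc_mod, scale_mod)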

  1. Regression analysis of case K interval-censored failure time data in the presence of informative censoring.

    Science.gov (United States)

    Wang, Peijie; Zhao, Hui; Sun, Jianguo

    2016-12-01

    Interval-censored failure time data occur in many fields such as demography, economics, medical research, and reliability, and many inference procedures for them have been developed (Sun, 2006; Chen, Sun, and Peace, 2012). However, most of the existing approaches assume that the mechanism that yields interval censoring is independent of the failure time of interest, and it is clear that this may not be true in practice (Zhang et al., 2007; Ma, Hu, and Sun, 2015). In this article, we consider regression analysis of case K interval-censored failure time data when the censoring mechanism may be related to the failure time of interest. For the problem, an estimated sieve maximum-likelihood approach is proposed for data arising from the proportional hazards frailty model, and a two-step procedure is presented for estimation. In addition, the asymptotic properties of the proposed estimators of the regression parameters are established, and an extensive simulation study suggests that the method works well. Finally, we apply the method to a set of real interval-censored data that motivated this study. © 2016, The International Biometric Society.

  2. The number of subjects per variable required in linear regression analyses

    NARCIS (Netherlands)

    P.C. Austin (Peter); E.W. Steyerberg (Ewout)

    2015-01-01

    Objectives To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients.

  3. Lowland rice yield estimates based on air temperature and solar radiation

    International Nuclear Information System (INIS)

    Pedro Júnior, M.J.; Sentelhas, P.C.; Moraes, A.V.C.; Villela, O.V.

    1995-01-01

    Two regression equations were developed to estimate lowland rice yield as a function of air temperature and incoming solar radiation during the crop yield production period in Pindamonhangaba, SP, Brazil. The following rice cultivars were used: IAC-242, IAC-100, IAC-101 and IAC-102. The optimum air temperature obtained was 25.0°C and the optimum global solar radiation was 475 cal·cm⁻²·day⁻¹. The best agrometeorological model was a multiple linear regression relating yield to the deviations of air temperature and solar radiation from their optimum values. The yield values estimated by the model showed good fit to actual yields of lowland rice (deviations of less than 10%). (author)

  4. Online Censoring for Large-Scale Regressions with Application to Streaming Big Data.

    Science.gov (United States)

    Berberidis, Dimitris; Kekatos, Vassilis; Giannakis, Georgios B

    2016-08-01

    On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is sequentially found using first- and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored or when the aim is to save communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non)asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated too. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
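
    A hedged sketch of the censoring idea for streaming least squares: update only when the innovation is large relative to the noise level (the paper's corrected updates and convergence guarantees are not reproduced here).

        import numpy as np

        rng = np.random.default_rng(11)
        p, sigma, tau, lr = 5, 0.5, 1.0, 0.01
        w_true = rng.normal(size=p)
        w = np.zeros(p)
        kept = 0

        for t in range(20000):                 # streaming observations
            x = rng.normal(size=p)
            y = x @ w_true + sigma * rng.normal()
            r = y - x @ w                      # innovation
            if abs(r) > tau * sigma:           # censor uninformative samples
                w += lr * r * x                # first-order (LMS-style) update
                kept += 1

        print(kept / 20000, np.linalg.norm(w - w_true))  # fraction kept, error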

  5. Estimation of morbidity effects

    International Nuclear Information System (INIS)

    Ostro, B.

    1994-01-01

    Many researchers have related exposure to ambient air pollution to respiratory morbidity. To be included in this review and analysis, however, several criteria had to be met. First, a careful study design and a methodology that generated quantitative dose-response estimates were required. Therefore, there was a focus on time-series regression analyses relating daily incidence of morbidity to air pollution in a single city or metropolitan area. Studies that used weekly or monthly average concentrations or that involved particulate measurements in poorly characterized metropolitan areas (e.g., one monitor representing a large region) were not included in this review. Second, studies that minimized confounding and omitted variables were included. For example, research that compared two cities or regions characterized as 'high' and 'low' pollution areas was not included because of potential confounding by other factors in the respective areas. Third, concern for the effects of seasonality and weather had to be demonstrated. This could be accomplished either by stratifying and analyzing the data by season, by examining the independent effects of temperature and humidity, and/or by correcting the model for possible autocorrelation. A fourth criterion for study inclusion was that the study had to include a reasonably complete analysis of the data. Such analysis would include a careful exploration of the primary hypothesis as well as possible examination of the robustness and sensitivity of the results to alternative functional forms, specifications, and influential data points. When studies reported the results of these alternative analyses, the quantitative estimates judged most representative of the overall findings were those summarized in this paper. Finally, for inclusion in the review of particulate matter, the study had to provide a measure of particle concentration that could be converted into PM10, particulate matter below 10 μm in aerodynamic diameter.

  6. Adaptive Metric Kernel Regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    1998-01-01

    Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard approach.
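
    In the spirit of the method (though not the authors' exact algorithm), a diagonal-metric Nadaraya-Watson estimator can adapt per-dimension bandwidths by minimizing a leave-one-out error:

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(12)
        n, d = 200, 3
        X = rng.normal(size=(n, d))
        y = np.sin(2 * X[:, 0]) + 0.05 * rng.normal(size=n)  # only dim 0 matters

        def loo_error(log_h):
            h = np.exp(log_h)                              # per-dimension bandwidths
            D = ((X[:, None, :] - X[None, :, :]) / h) ** 2
            K = np.exp(-0.5 * D.sum(axis=2))               # Gaussian product kernel
            np.fill_diagonal(K, 0.0)                       # leave-one-out weights
            pred = K @ y / (K.sum(axis=1) + 1e-12)
            return np.mean((y - pred) ** 2)

        res = minimize(loo_error, x0=np.zeros(d), method="Nelder-Mead")
        # expect a small bandwidth on dim 0 and large ones on irrelevant dims
        print(np.exp(res.x))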

  7. Estimate the contribution of incubation parameters influence egg hatchability using multiple linear regression analysis.

    Science.gov (United States)

    Khalil, Mohamed H; Shebl, Mostafa K; Kosba, Mohamed A; El-Sabrout, Karim; Zaki, Nesma

    2016-08-01

    This research was conducted to determine the parameters most affecting the hatchability of eggs of indigenous and improved local chickens. Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) in four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the one most influencing hatchability. The results showed significant differences in commercial and scientific hatchability among strains. The Alexandria strain had the highest significant commercial hatchability (80.70%). Highly significant differences in hatching chick weight among the studied strains were also observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens.

  8. Direct integral linear least square regression method for kinetic evaluation of hepatobiliary scintigraphy

    International Nuclear Information System (INIS)

    Shuke, Noriyuki

    1991-01-01

    In hepatobiliary scintigraphy, kinetic model analysis, which provides kinetic parameters such as hepatic extraction or excretion rate, has been used for quantitative evaluation of liver function. In this analysis, unknown model parameters are usually determined using the nonlinear least squares regression method (NLS method), which requires iterative calculation and initial estimates for the unknown parameters. As a simple alternative to the NLS method, the direct integral linear least squares regression method (DILS method), which can determine model parameters by a simple calculation without initial estimates, is proposed, and its applicability to the analysis of hepatobiliary scintigraphy is tested. In order to see whether the DILS method could determine model parameters as well as the NLS method, and to determine an appropriate weight for the DILS method, simulated theoretical data based on prefixed parameters were fitted to a one-compartment model using both the DILS method with various weightings and the NLS method. The parameter values obtained were then compared with the prefixed values used for data generation. The effect of various weights on the error of the parameter estimates was examined, and the inverse of time was found to be the best weight for minimizing the error. When using this weight, the DILS method gave parameter values close to those obtained by the NLS method, and both sets of parameter values were very close to the prefixed values. With appropriate weighting, the DILS method could provide reliable parameter estimates that are relatively insensitive to noise in the data. In conclusion, the DILS method can be used as a simple alternative to the NLS method, providing reliable parameter estimates. (author)
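
    The DILS idea for a one-compartment washout is easy to sketch: integrating C(t) = C0·exp(-kt) gives C(t) = C0 - k·∫C ds, which is linear in the unknowns, so a single weighted least-squares solve recovers them without iteration (synthetic data; inverse-time weights capped at the first sample time):

        import numpy as np
        from scipy.integrate import cumulative_trapezoid

        rng = np.random.default_rng(13)
        t = np.linspace(0.0, 60.0, 41)                # minutes
        C0_true, k_true = 100.0, 0.05
        C = C0_true * np.exp(-k_true * t) * (1 + 0.02 * rng.normal(size=t.size))

        I = cumulative_trapezoid(C, t, initial=0.0)   # direct integral of the data
        A = np.column_stack([np.ones_like(t), I])     # regressors for [C0, -k]
        w = 1.0 / np.maximum(t, t[1])                 # inverse-time weights (capped)
        Wsq = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(Wsq[:, None] * A, Wsq * C, rcond=None)
        print(coef[0], -coef[1])                      # C0 and k, no iteration needed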

  9. Causal correlation of foliar biochemical concentrations with AVIRIS spectra using forced entry linear regression

    Science.gov (United States)

    Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.

    1995-01-01

    A major goal of airborne imaging spectrometry is to estimate the biochemical composition of vegetation canopies from reflectance spectra. Remotely-sensed estimates of foliar biochemical concentrations of forests would provide valuable indicators of ecosystem function at regional and eventually global scales. Empirical research has shown that a relationship exists between the amount of radiation reflected from absorption features and the concentration of given biochemicals in leaves and canopies (Matson et al., 1994; Johnson et al., 1994). A technique commonly used to determine which wavelengths have the strongest correlation with the biochemical of interest is unguided (stepwise) multiple regression. Wavelengths are entered into a multivariate regression equation in their order of importance, each contributing to the reduction of the variance in the measured biochemical concentration. A significant problem with the use of stepwise regression for determining the correlation between biochemical concentration and spectra is that of 'overfitting', as there are significantly more wavebands than biochemical measurements. This can result in the selection of wavebands that may be more accurately attributable to noise or canopy effects. In addition, there is a real problem of collinearity in that the individual biochemical concentrations may covary. A strong correlation between the reflectance at a given wavelength and the concentration of a biochemical of interest, therefore, may be due to the effect of another biochemical which is closely related. Furthermore, it is not always possible to account for potentially suitable waveband omissions in the stepwise selection procedure. This concern about the suitability of stepwise regression has been identified and acknowledged in a number of recent studies (Wessman et al., 1988; Curran, 1989; Curran et al., 1992; Peterson and Hubbard, 1992; Martin and Aber, 1994; Kupiec, 1994). These studies have pointed to the lack of a physical basis for the statistically selected wavebands.

  10. Econometric Analysis of the Demand for Pulses in Sri Lanka: An Almost Ideal Estimation with a Censored Regression

    Directory of Open Access Journals (Sweden)

    Lokuge Dona Manori Nimanthika Lokuge

    2015-06-01

    Full Text Available Due to the high prevalence of dietary diseases and malnutrition in Sri Lanka, it is essential to assess food consumption patterns. Because pulses are a major source of nutrients, this paper employed the Linear Approximation of the Almost Ideal Demand System (LA/AIDS) to estimate price and expenditure elasticities for six types of pulses, utilizing the Household Income and Expenditure Survey, 2006/07. The infrequency of purchases, a typical problem encountered in LA/AIDS estimation, is circumvented by using a probit regression in the first stage to capture the effect of demographic factors on the consumption choice. Results reveal that the buying decision for pulses is influenced by the sector (rural, urban, and estate), household size, education level, presence of children, and prevalence of blood pressure and diabetes. All pulse types except dhal are highly responsive to their own prices. Dhal is identified as the most prominent choice among all the alternatives and hence is distinguished as a necessity, whereas the rest behave as luxuries with respect to income. Because dhal is an imported product, consumption of dhal may be severely affected by any action that exporting countries introduce, while the rest of the pulses will be affected by both price- and income-oriented policies.

  11. Models for Estimating Genetic Parameters of Milk Production Traits Using Random Regression Models in Korean Holstein Cattle

    Directory of Open Access Journals (Sweden)

    C. I. Cho

    2016-05-01

    Full Text Available The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of first-parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of the National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3–L5), fixed effects of herd-test day and year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials × 3 types of residual variance: L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60) were compared using Akaike information criterion (AIC) and/or Schwarz Bayesian information criterion (BIC) statistics to identify the model(s) of best fit for the respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit best. In general, the BIC value of the HET15 model for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and the types of residual variances affect the goodness of fit of the models. Also, the heterogeneity of residual variances should be considered in test-day analyses. The heritability estimates from the best-fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first parity.

  12. Generic global regression models for growth prediction of Salmonella in ground pork and pork cuts

    DEFF Research Database (Denmark)

    Buschhardt, Tasja; Hansen, Tina Beck; Bahl, Martin Iain

    2017-01-01

    Introduction and Objectives Models for the prediction of bacterial growth in fresh pork are primarily developed using two-step regression (i.e. primary models followed by secondary models). These models are also generally based on experiments in liquids or ground meat and neglect surface growth. It has been shown that one-step global regressions can result in more accurate models and that bacterial growth on intact surfaces can substantially differ from growth in liquid culture. Material and Methods We used a global-regression approach to develop predictive models for the growth of Salmonella. One part of the obtained log-transformed cell counts was used for model development and another for model validation. The Ratkowsky square root model and the relative lag time (RLT) model were integrated into the logistic model with delay. Fitted parameter estimates were compared to investigate the effect...

  13. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
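
    A minimal sketch of the skewness logic on synthetic data: with a skewed predictor and normal errors, residuals of the correctly oriented model stay symmetric, while residuals of the reversed model inherit skewness from the predictor.

        import numpy as np
        from scipy.stats import skew

        rng = np.random.default_rng(14)
        n = 5000
        x = rng.exponential(1.0, n)               # skewed "true" predictor
        y = 1.0 + 0.8 * x + rng.normal(0, 1, n)   # normal error in the causal model

        def residuals(a, b):
            """OLS residuals of a regressed on b (with intercept)."""
            B = np.column_stack([np.ones(n), b])
            coef, *_ = np.linalg.lstsq(B, a, rcond=None)
            return a - B @ coef

        # near-zero skewness for y ~ x; clearly nonzero for the reversed x ~ y
        print(skew(residuals(y, x)), skew(residuals(x, y)))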

  14. Adaptive Estimation of Heteroscedastic Money Demand Model of Pakistan

    Directory of Open Access Journals (Sweden)

    Muhammad Aslam

    2007-07-01

    Full Text Available For the problem of estimation of Money demand model of Pakistan, money supply (M1 shows heteroscedasticity of the unknown form. For estimation of such model we compare two adaptive estimators with ordinary least squares estimator and show the attractive performance of the adaptive estimators, namely, nonparametric kernel estimator and nearest neighbour regression estimator. These comparisons are made on the basis standard errors of the estimated coefficients, standard error of regression, Akaike Information Criteria (AIC value, and the Durban-Watson statistic for autocorrelation. We further show that nearest neighbour regression estimator performs better when comparing with the other nonparametric kernel estimator.

  15. APLIKASI SPLINE ESTIMATOR TERBOBOT

    Directory of Open Access Journals (Sweden)

    I Nyoman Budiantara

    2001-01-01

    Full Text Available We consider the nonparametric regression model: Zj = X(tj) + ej, j = 1, 2, ..., n, where X(tj) is the regression curve and the random errors ej are independently normally distributed with zero mean and variance s²/bj, bj > 0. The estimate of X is obtained by minimizing a weighted penalized least squares criterion; the solution of this optimization is a weighted natural polynomial spline. We then give an application of the weighted spline estimator in nonparametric regression. Keywords: weighted spline, nonparametric regression, penalized least squares.

  16. Electricity consumption forecasting in Italy using linear regression models

    Energy Technology Data Exchange (ETDEWEB)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio [DIAM, Seconda Universita degli Studi di Napoli, Via Roma 29, 81031 Aversa (CE) (Italy)

    2009-09-15

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  17. Electricity consumption forecasting in Italy using linear regression models

    International Nuclear Information System (INIS)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio

    2009-01-01

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  18. A method for nonlinear exponential regression analysis

    Science.gov (United States)

    Junkin, B. G.

    1971-01-01

    A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
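
    A compact sketch of the procedure for y = a·exp(-kt): a log-linear fit supplies the initial nominal estimates, then Gauss-Newton correction steps are applied (a fixed iteration count stands in for a convergence criterion):

        import numpy as np

        rng = np.random.default_rng(15)
        t = np.linspace(0, 10, 50)
        y = 5.0 * np.exp(-0.4 * t) + 0.05 * rng.normal(size=t.size)

        # initial nominal estimates from a linear fit of log(y) (valid while y > 0)
        b = np.polyfit(t, np.log(np.clip(y, 1e-6, None)), 1)
        a, k = np.exp(b[1]), -b[0]

        for _ in range(10):                        # Gauss-Newton iterations
            f = a * np.exp(-k * t)
            J = np.column_stack([f / a, -t * f])   # df/da and df/dk
            delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)  # correction step
            a, k = a + delta[0], k + delta[1]

        print(a, k)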

  19. Simple and multiple linear regression: sample size considerations.

    Science.gov (United States)

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi

    2014-01-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both