variable regression 4-level: Topics by WorldWideScience.org

Sample records for variable regression 4-level

Latent Variable Regression 4-Level Hierarchical Model Using Multisite Multiple-Cohorts Longitudinal Data. CRESST Report 801

Science.gov (United States)

Choi, Kilchan

2011-01-01

This report explores a new latent variable regression 4-level hierarchical model for monitoring school performance over time using multisite multiple-cohorts longitudinal data. This kind of data set has a 4-level hierarchical structure: time-series observation nested within students who are nested within different cohorts of students. These…
Moderation analysis using a two-level regression model.

Science.gov (United States)

Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

2014-10-01

Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
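As a rough illustration of the baseline the abstract refers to, the following sketch fits moderated multiple regression (MMR) by least squares on simulated data. It is not the authors' NML algorithm or their R package; the function name `fit_mmr`, the simulated coefficients, and the sample size are assumptions made for this example. The product-term coefficient `b3` is what MMR reads as the moderation effect, since the slope of `y` on `x` is `b1 + b3 * z`.

```python
import numpy as np

def fit_mmr(y, x, z):
    """Least-squares fit of y = b0 + b1*x + b2*z + b3*(x*z) + e."""
    # Design matrix: intercept, predictor, moderator, and their product term.
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(scale=0.5, size=n)

b0, b1, b2, b3 = fit_mmr(y, x, z)
print(f"estimated product-term coefficient: {b3:.3f}")
```

The errors here are homoscedastic, which is the setting where the abstract reports that LS with MMR and NML with the two-level model give essentially the same results.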
Variable importance in latent variable regression models

NARCIS (Netherlands)

Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.

2014-01-01

The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable
Variable Selection for Regression Models of Percentile Flows

Science.gov (United States)

Fouad, G.

2017-12-01

Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high
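The definition of a percentile flow given at the start of this abstract can be sketched in a few lines. The helper name `percentile_flow` and the toy flow record are assumptions for illustration, and the sketch relies on NumPy's default linear interpolation between order statistics, which may differ from the convention used in the study.

```python
import numpy as np

def percentile_flow(flows, exceed_pct):
    """Flow magnitude equaled or exceeded `exceed_pct` percent of the time."""
    flows = np.asarray(flows, dtype=float)
    # "Exceeded p% of the time" corresponds to the (100 - p)th percentile of the record.
    return float(np.percentile(flows, 100.0 - exceed_pct))

# Hypothetical gauged record: flows of 1, 2, ..., 101 (arbitrary units).
daily_flow = np.arange(1.0, 102.0)
q90 = percentile_flow(daily_flow, 90)  # low flow, exceeded 90% of the time
q10 = percentile_flow(daily_flow, 10)  # high flow, exceeded 10% of the time
print(q90, q10)
```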
Variable and subset selection in PLS regression

DEFF Research Database (Denmark)

Høskuldsson, Agnar

2001-01-01

The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
Independent variable complexity for regional regression of the flow duration curve in ungauged basins

Science.gov (United States)

Fouad, Geoffrey; Skupin, André; Hope, Allen

2016-04-01

The flow duration curve (FDC) is one of the most widely used tools to quantify streamflow. Its percentile flows are often required for water resource applications, but these values must be predicted for ungauged basins with insufficient or no streamflow data. Regional regression is a commonly used approach for predicting percentile flows that involves identifying hydrologic regions and calibrating regression models to each region. The independent variables used to describe the physiographic and climatic setting of the basins are a critical component of regional regression, yet few studies have investigated their effect on resulting predictions. In this study, the complexity of the independent variables needed for regional regression is investigated. Different levels of variable complexity are applied for a regional regression consisting of 918 basins in the US. Both the hydrologic regions and regression models are determined according to the different sets of variables, and the accuracy of resulting predictions is assessed. The different sets of variables include (1) a simple set of three variables strongly tied to the FDC (mean annual precipitation, potential evapotranspiration, and baseflow index), (2) a traditional set of variables describing the average physiographic and climatic conditions of the basins, and (3) a more complex set of variables extending the traditional variables to include statistics describing the distribution of physiographic data and temporal components of climatic data. The latter set of variables is not typically used in regional regression, and is evaluated for its potential to predict percentile flows. The simplest set of only three variables performed similarly to the other more complex sets of variables. Traditional variables used to describe climate, topography, and soil offered little more to the predictions, and the experimental set of variables describing the distribution of basin data in more detail did not improve predictions
transformation of independent variables in polynomial regression ...

African Journals Online (AJOL)

Ada

preferable when possible to work with a simple functional form in transformed variables rather than with a more complicated form in the original variables. In this paper, it is shown that linear transformations applied to independent variables in polynomial regression models affect the t ratio and hence the statistical ...
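A small numerical sketch can make the claim about t ratios concrete. It assumes one particular linear transformation (centering the independent variable at its mean) and a quadratic model on simulated data; the helper names and all numbers are hypothetical. In this setup the t ratio of the linear term changes substantially under centering, while the t ratio of the highest-order term does not.

```python
import numpy as np

def ols_t_ratios(X, y):
    """Return OLS coefficients and their t ratios (coefficient / standard error)."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance matrix
    return coef, coef / np.sqrt(np.diag(cov))

def quadratic_design(x):
    """Design matrix for y = b0 + b1*x + b2*x^2."""
    return np.column_stack([np.ones_like(x), x, x**2])

rng = np.random.default_rng(1)
x = rng.uniform(10.0, 20.0, size=200)
y = 2.0 + 0.5 * x + 0.1 * x**2 + rng.normal(size=200)

_, t_raw = ols_t_ratios(quadratic_design(x), y)
_, t_centered = ols_t_ratios(quadratic_design(x - x.mean()), y)
print("t ratios, raw x:     ", np.round(t_raw, 2))
print("t ratios, centered x:", np.round(t_centered, 2))
```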
Prediction of radiation levels in residences: A methodological comparison of CART [Classification and Regression Tree Analysis] and conventional regression

International Nuclear Information System (INIS)

Janssen, I.; Stebbings, J.H.

1990-01-01

In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection

KAUST Repository

Chen, Lisha

2012-12-01

The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
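For orientation, here is a minimal sketch of plain reduced-rank regression, without the sparsity-inducing group-lasso penalty that the paper proposes. It uses the classical construction with identity weighting: fit ordinary least squares, then project the coefficients onto the leading right singular vectors of the fitted values. The function name and the simulated dimensions are assumptions for this example.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Minimise ||Y - X B||_F subject to rank(B) <= rank (identity weighting)."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Right singular vectors of the OLS fitted values span the best response subspace.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T
    return B_ols @ V_r @ V_r.T

# Simulated data with a rank-2 coefficient matrix (hypothetical sizes).
rng = np.random.default_rng(2)
n, p, q, r = 300, 8, 5, 2
B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

B_hat = reduced_rank_regression(X, Y, rank=r)
print("rank of estimate:", np.linalg.matrix_rank(B_hat, tol=1e-8))
```

The rank constraint means only `r * (p + q - r)` free parameters are estimated instead of `p * q`, which is the parameter reduction the abstract mentions.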
A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

Science.gov (United States)

Meaney, Christopher; Moineddin, Rahim

2014-01-24

In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
How Robust Is Linear Regression with Dummy Variables?

Science.gov (United States)

Blankmeyer, Eric

2006-01-01

Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
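One way to see why sparsely populated groups can matter is to look at leverage in a regression with dummy variables. The sketch below is an illustration under assumed, simulated data (three groups, one with only five observations) and is not taken from the paper; leverage is used here as a generic diagnostic, not as the author's robustness measure.

```python
import numpy as np

rng = np.random.default_rng(3)
group = np.repeat([0, 1, 2], [50, 45, 5])   # third group has only 5 observations
x = rng.normal(size=group.size)
group_effect = np.array([0.0, 0.5, -1.0])[group]
y = 1.0 + 2.0 * x + group_effect + rng.normal(scale=0.3, size=group.size)

# Dummy coding with group 0 as the reference category.
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
X = np.column_stack([np.ones_like(x), x, d1, d2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Leverage is the diagonal of the hat matrix X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)
for g in range(3):
    print(f"group {g}: mean leverage = {leverage[group == g].mean():.3f}")
```

Each observation in the five-member group has leverage of at least 1/5, so a single aberrant value there can move that group's dummy coefficient far more than an outlier in a large group.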
Integrated Multiscale Latent Variable Regression and Application to Distillation Columns

Directory of Open Access Journals (Sweden)

Muddu Madakyaru

2013-01-01

Proper control of distillation columns requires estimating some key variables that are challenging to measure online (such as compositions), which are usually estimated using inferential models. Commonly used inferential models include latent variable regression (LVR) techniques, such as principal component regression (PCR), partial least squares (PLS), and regularized canonical correlation analysis (RCCA). Unfortunately, measured practical data are usually contaminated with errors, which degrade the prediction abilities of inferential models. Therefore, noisy measurements need to be filtered to enhance the prediction accuracy of these models. Multiscale filtering has been shown to be a powerful feature extraction tool. In this work, the advantages of multiscale filtering are utilized to enhance the prediction accuracy of LVR models by developing an integrated multiscale LVR (IMSLVR) modeling algorithm that integrates modeling and feature extraction. The idea behind the IMSLVR modeling algorithm is to filter the process data at different decomposition levels, model the filtered data from each level, and then select the LVR model that optimizes a model selection criterion. The performance of the developed IMSLVR algorithm is illustrated using three examples, one using synthetic data, one using simulated distillation column data, and one using experimental packed bed distillation column data. All examples clearly demonstrate the effectiveness of the IMSLVR algorithm over the conventional methods.
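Of the latent variable regression techniques named above, principal component regression is the simplest to sketch. The example below shows only that building block, with no multiscale filtering and no model selection step, so it is not the IMSLVR algorithm; the function name `pcr_fit` and the simulated data are assumptions.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading PCs of centered X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                      # loadings
    T = Xc @ V                                   # scores (the latent variables)
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = V @ gamma                             # coefficients in the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
y = 1.0 + X @ np.array([0.5, -1.0, 0.0, 2.0]) + 0.1 * rng.normal(size=100)

intercept, beta = pcr_fit(X, y, n_components=4)  # all components: matches OLS
_, beta_2 = pcr_fit(X, y, n_components=2)        # truncated model
print(np.round(beta, 3), np.round(beta_2, 3))
```

Keeping every component reproduces ordinary least squares; dropping trailing components trades some bias for lower variance.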
Regression calibration with more surrogates than mismeasured variables

KAUST Repository

Kipnis, Victor; Midthune, Douglas; Freedman, Laurence S.; Carroll, Raymond J.

2012-06-29

In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach consisting of substituting an estimated conditional expectation of the true covariate given observed data in the logistic regression. The other is a novel two-stage approach when the logistic regression is fitted to multiple surrogates, and then a linear combination of estimated slopes is formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors asserted superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.
Variable selection and model choice in geoadditive regression models.

Science.gov (United States)

Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

2009-06-01

Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
The number of subjects per variable required in linear regression analyses

NARCIS (Netherlands)

P.C. Austin (Peter); E.W. Steyerberg (Ewout)

2015-01-01

Objectives: To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression
Bayesian approach to errors-in-variables in regression models

Science.gov (United States)

Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad

2017-05-01

In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, an Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, can lead to misleading statistical inference and analysis. Therefore, our goal is to examine the relationship between the outcome variable and the unobserved exposure variable, given the observed mismeasured surrogate, by applying the Bayesian formulation to the EIV model. We extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model, the Poisson regression model. We then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.
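The remark that uncorrected measurement error distorts inference can be illustrated with the textbook case of classical additive error in a linear model. This is a deliberately simpler setting than the Bayesian Poisson EIV model of the abstract, and every quantity in the simulation is an assumption. With error variance equal to the exposure variance, the naive slope is expected to shrink by about one half.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
x_true = rng.normal(size=n)                       # unobserved exposure, variance 1
w = x_true + rng.normal(scale=1.0, size=n)        # observed surrogate, error variance 1
y = 2.0 * x_true + rng.normal(scale=0.5, size=n)  # outcome depends on the true exposure

def slope(x, y):
    """Simple-regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

naive = slope(w, y)        # regress on the surrogate: attenuated toward zero
oracle = slope(x_true, y)  # regress on the true exposure (not available in practice)
print(f"naive slope = {naive:.2f}, oracle slope = {oracle:.2f}")
```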
Temporal Synchronization Analysis for Improving Regression Modeling of Fecal Indicator Bacteria Levels

Science.gov (United States)

Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Spatial variability in levels of benzene, formaldehyde, and total benzene, toluene, ethylbenzene and xylenes in New York City: a land-use regression study.

Science.gov (United States)

Kheirbek, Iyad; Johnson, Sarah; Ross, Zev; Pezeshki, Grant; Ito, Kazuhiko; Eisl, Holger; Matte, Thomas

2012-07-31

Hazardous air pollutant exposures are common in urban areas contributing to increased risk of cancer and other adverse health outcomes. While recent analyses indicate that New York City residents experience significantly higher cancer risks attributable to hazardous air pollutant exposures than the United States as a whole, limited data exist to assess intra-urban variability in air toxics exposures. To assess intra-urban spatial variability in exposures to common hazardous air pollutants, street-level air sampling for volatile organic compounds and aldehydes was conducted at 70 sites throughout New York City during the spring of 2011. Land-use regression models were developed using a subset of 59 sites and validated against the remaining 11 sites to describe the relationship between concentrations of benzene, total BTEX (benzene, toluene, ethylbenzene, xylenes) and formaldehyde to indicators of local sources, adjusting for temporal variation. Total BTEX levels exhibited the most spatial variability, followed by benzene and formaldehyde (coefficient of variation of temporally adjusted measurements of 0.57, 0.35, 0.22, respectively). Total roadway length within 100 m, traffic signal density within 400 m of monitoring sites, and an indicator of temporal variation explained 65% of the total variability in benzene while 70% of the total variability in BTEX was accounted for by traffic signal density within 450 m, density of permitted solvent-use industries within 500 m, and an indicator of temporal variation. Measures of temporal variation, traffic signal density within 400 m, road length within 100 m, and interior building area within 100 m (indicator of heating fuel combustion) predicted 83% of the total variability of formaldehyde. The models built with the modeling subset were found to predict concentrations well, predicting 62% to 68% of monitored values at validation sites. Traffic and point source emissions cause substantial variation in street-level exposures
Cluster regression model and level fluctuation features of Van Lake, Turkey

Directory of Open Access Journals (Sweden)

Z. Şen

1999-02-01

Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are the climate change, greenhouse effects and ozone layer depletions which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value, apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities that are indistinguishable from the original data.

Key words. Hydrology (hydrologic budget; stochastic processes) · Meteorology and atmospheric dynamics (ocean-atmosphere interactions)

Spatial variability in levels of benzene, formaldehyde, and total benzene, toluene, ethylbenzene and xylenes in New York City: a land-use regression study

Directory of Open Access Journals (Sweden)

Kheirbek, Iyad

2012-07-01

Background: Hazardous air pollutant exposures are common in urban areas contributing to increased risk of cancer and other adverse health outcomes. While recent analyses indicate that New York City residents experience significantly higher cancer risks attributable to hazardous air pollutant exposures than the United States as a whole, limited data exist to assess intra-urban variability in air toxics exposures. Methods: To assess intra-urban spatial variability in exposures to common hazardous air pollutants, street-level air sampling for volatile organic compounds and aldehydes was conducted at 70 sites throughout New York City during the spring of 2011. Land-use regression models were developed using a subset of 59 sites and validated against the remaining 11 sites to describe the relationship between concentrations of benzene, total BTEX (benzene, toluene, ethylbenzene, xylenes) and formaldehyde to indicators of local sources, adjusting for temporal variation. Results: Total BTEX levels exhibited the most spatial variability, followed by benzene and formaldehyde (coefficient of variation of temporally adjusted measurements of 0.57, 0.35, 0.22, respectively). Total roadway length within 100 m, traffic signal density within 400 m of monitoring sites, and an indicator of temporal variation explained 65% of the total variability in benzene while 70% of the total variability in BTEX was accounted for by traffic signal density within 450 m, density of permitted solvent-use industries within 500 m, and an indicator of temporal variation. Measures of temporal variation, traffic signal density within 400 m, road length within 100 m, and interior building area within 100 m (indicator of heating fuel combustion) predicted 83% of the total variability of formaldehyde. The models built with the modeling subset were found to predict concentrations well, predicting 62% to 68% of monitored values at validation sites. Conclusions: Traffic and
Fasting Glucose and the Risk of Depressive Symptoms: Instrumental-Variable Regression in the Cardiovascular Risk in Young Finns Study.

Science.gov (United States)

Wesołowska, Karolina; Elovainio, Marko; Hintsa, Taina; Jokela, Markus; Pulkki-Råback, Laura; Pitkänen, Niina; Lipsanen, Jari; Tukiainen, Janne; Lyytikäinen, Leo-Pekka; Lehtimäki, Terho; Juonala, Markus; Raitakari, Olli; Keltikangas-Järvinen, Liisa

2017-12-01

Type 2 diabetes (T2D) has been associated with depressive symptoms, but the causal direction of this association and the underlying mechanisms, such as increased glucose levels, remain unclear. We used instrumental-variable regression with a genetic instrument (Mendelian randomization) to examine a causal role of increased glucose concentrations in the development of depressive symptoms. Data were from the population-based Cardiovascular Risk in Young Finns Study (n = 1217). Depressive symptoms were assessed in 2012 using a modified Beck Depression Inventory (BDI-I). Fasting glucose was measured concurrently with depressive symptoms. A genetic risk score for fasting glucose (with 35 single nucleotide polymorphisms) was used as an instrumental variable for glucose. Glucose was not associated with depressive symptoms in the standard linear regression (B = -0.04, 95% CI [-0.12, 0.04], p = .34), but the instrumental-variable regression showed an inverse association between glucose and depressive symptoms (B = -0.43, 95% CI [-0.79, -0.07], p = .020). The difference between the estimates of standard linear regression and instrumental-variable regression was significant (p = .026). Conclusion: Our results suggest that the association between T2D and depressive symptoms is unlikely to be caused by increased glucose concentrations. It seems possible that T2D might be linked to depressive symptoms due to low glucose levels.
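The contrast between standard and instrumental-variable regression described above can be sketched with a two-stage least squares toy example. The data are simulated, the variable `g` is only a stand-in for a genetic risk score, and the effect sizes are assumptions that bear no relation to the study's estimates. An unobserved confounder `u` biases the ordinary slope, while the instrument-based estimate stays close to the simulated causal effect.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50000
g = rng.normal(size=n)                    # instrument (stand-in for a genetic risk score)
u = rng.normal(size=n)                    # unobserved confounder
x = 0.5 * g + u + rng.normal(size=n)      # exposure
y = -0.4 * x + u + rng.normal(size=n)     # outcome; simulated causal effect is -0.4

def slope(a, b):
    """Simple-regression slope of b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

ols = slope(x, y)                                  # confounded by u
x_hat = x.mean() + slope(g, x) * (g - g.mean())    # stage 1: exposure predicted from g
iv = slope(x_hat, y)                               # stage 2: outcome on predicted exposure
print(f"OLS slope = {ols:.2f}, IV (2SLS) slope = {iv:.2f}")
```

The sketch reports point estimates only; valid standard errors for the second stage need the usual 2SLS correction, which is omitted here.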
The number of subjects per variable required in linear regression analyses.

Science.gov (United States)

Austin, Peter C; Steyerberg, Ewout W

2015-06-01

To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R² of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R², although adjusted R² estimates behaved well. The bias in estimating the model R² statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Bayesian Group Bridge for Bi-level Variable Selection.

Science.gov (United States)

Mallick, Himel; Yi, Nengjun

2017-06-01

A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

Science.gov (United States)

Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

2008-04-01

Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
Robust best linear estimation for regression analysis using surrogate and instrumental variables.

Science.gov (United States)

Wang, C Y

2012-04-01

We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection

KAUST Repository

Chen, Lisha; Huang, Jianhua Z.

2012-01-01

and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group
Uncovering state-dependent relationships in shallow lakes using Bayesian latent variable regression.

Science.gov (United States)

Vitense, Kelsey; Hanson, Mark A; Herwig, Brian R; Zimmer, Kyle D; Fieberg, John

2018-03-01

Ecosystems sometimes undergo dramatic shifts between contrasting regimes. Shallow lakes, for instance, can transition between two alternative stable states: a clear state dominated by submerged aquatic vegetation and a turbid state dominated by phytoplankton. Theoretical models suggest that critical nutrient thresholds differentiate three lake types: highly resilient clear lakes, lakes that may switch between clear and turbid states following perturbations, and highly resilient turbid lakes. For effective and efficient management of shallow lakes and other systems, managers need tools to identify critical thresholds and state-dependent relationships between driving variables and key system features. Using shallow lakes as a model system for which alternative stable states have been demonstrated, we developed an integrated framework using Bayesian latent variable regression (BLR) to classify lake states, identify critical total phosphorus (TP) thresholds, and estimate steady state relationships between TP and chlorophyll a (chl a) using cross-sectional data. We evaluated the method using data simulated from a stochastic differential equation model and compared its performance to k-means clustering with regression (KMR). We also applied the framework to data comprising 130 shallow lakes. For simulated data sets, BLR had high state classification rates (median/mean accuracy >97%) and accurately estimated TP thresholds and state-dependent TP-chl a relationships. Classification and estimation improved with increasing sample size and decreasing noise levels. Compared to KMR, BLR had higher classification rates and better approximated the TP-chl a steady state relationships and TP thresholds. We fit the BLR model to three different years of empirical shallow lake data, and managers can use the estimated bifurcation diagrams to prioritize lakes for management according to their proximity to thresholds and chance of successful rehabilitation. Our model improves upon
A simulation study on Bayesian Ridge regression models for several collinearity levels

Science.gov (United States)

Efendi, Achmad; Effrihan

2017-12-01

When data are analyzed with a multiple regression model and collinearity is present, one or several predictor variables are usually omitted from the model. Sometimes, however, there are reasons, for instance medical or economic ones, why all the predictors are important and should be included in the model. Ridge regression is commonly used in such research to cope with collinearity. In this approach, weights for the predictor variables are used in estimating the parameters. The subsequent estimation can follow the likelihood concept. A Bayesian version of the estimation is now also available as an alternative. This estimation method has not matched the likelihood method in popularity because of difficulties such as computation. Nevertheless, with recent improvements in computational methodology, this caveat should no longer be a problem. This paper discusses a simulation process for evaluating the characteristics of Bayesian ridge regression parameter estimates. There are several simulation settings based on a variety of collinearity levels and sample sizes. The results show that the Bayesian method performs better for relatively small sample sizes, and that for the other settings it performs similarly to the likelihood method.
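To make the role of the ridge penalty concrete, here is a minimal sketch of the classical (non-Bayesian) ridge estimate for two centered predictors, solved in closed form. It is not the paper's simulation code; the function name and toy data are hypothetical. Under a zero-mean Gaussian prior on the coefficients with a fixed variance ratio, this estimate coincides with the posterior mode, which is one common way to motivate a Bayesian ridge model.

```python
def ridge_two_predictors(x1, x2, y, penalty):
    """Ridge estimate (X'X + penalty*I)^-1 X'y for two predictors, after centering."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    # Invert the penalized 2x2 cross-product matrix explicitly.
    det = (s11 + penalty) * (s22 + penalty) - s12 * s12
    b1 = ((s22 + penalty) * s1y - s12 * s2y) / det
    b2 = ((s11 + penalty) * s2y - s12 * s1y) / det
    return b1, b2


# Toy data (invented): orthogonal predictors, y = 2*x1 + 3*x2.
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
y = [-5, -1, 1, 5]
unpenalized = ridge_two_predictors(x1, x2, y, 0.0)  # ordinary least squares
shrunk = ridge_two_predictors(x1, x2, y, 4.0)       # coefficients pulled toward zero

# Perfectly collinear predictors: least squares is undefined (det = 0),
# but any positive penalty yields a unique solution.
collinear = ridge_two_predictors(x1, x1, [-2, 2, -2, 2], 1.0)
```

A positive penalty makes the cross-product matrix invertible even when the predictors are exactly collinear, which is why ridge regression lets all predictors stay in the model.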
Geographically weighted negative binomial regression applied to zonal level safety performance models.

Science.gov (United States)

Gomes, Marcos José Timbó Lima; Cunto, Flávio; da Silva, Alan Ricardo

2017-09-01

Generalized Linear Models (GLM) with negative binomial distribution for errors, have been widely used to estimate safety at the level of transportation planning. The limited ability of this technique to take spatial effects into account can be overcome through the use of local models from spatial regression techniques, such as Geographically Weighted Poisson Regression (GWPR). Although GWPR is a system that deals with spatial dependency and heterogeneity and has already been used in some road safety studies at the planning level, it fails to account for the possible overdispersion that can be found in the observations on road-traffic crashes. Two approaches were adopted for the Geographically Weighted Negative Binomial Regression (GWNBR) model to allow discrete data to be modeled in a non-stationary form and to take note of the overdispersion of the data: the first examines the constant overdispersion for all the traffic zones and the second includes the variable for each spatial unit. This research conducts a comparative analysis between non-spatial global crash prediction models and spatial local GWPR and GWNBR at the level of traffic zones in Fortaleza/Brazil. A geographic database of 126 traffic zones was compiled from the available data on exposure, network characteristics, socioeconomic factors and land use. The models were calibrated by using the frequency of injury crashes as a dependent variable and the results showed that GWPR and GWNBR achieved a better performance than GLM for the average residuals and likelihood as well as reducing the spatial autocorrelation of the residuals, and the GWNBR model was more able to capture the spatial heterogeneity of the crash frequency. Copyright © 2017 Elsevier Ltd. All rights reserved.
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

Science.gov (United States)

Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim

2012-01-01

Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Modified Regression Correlation Coefficient for Poisson Regression Model

Science.gov (United States)

Kaengthong, Nattacha; Domthong, Uthumporn

2017-09-01

This study examines widely used indicators of the predictive power of the Generalized Linear Model (GLM), which often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
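As a rough illustration of the traditional regression correlation coefficient mentioned above, the sketch below computes the Pearson correlation between observed counts Y and fitted Poisson means E(Y|X) = exp(b0 + b'x). It assumes the coefficients have already been estimated elsewhere. The function names and numbers are hypothetical, and the modified coefficient proposed by the authors is not reproduced here because the abstract does not define it.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def regression_correlation(y, rows, intercept, coefficients):
    """Correlation between observed counts and fitted Poisson means (log link)."""
    fitted = [
        math.exp(intercept + sum(b * v for b, v in zip(coefficients, row)))
        for row in rows
    ]
    return pearson_correlation(y, fitted)


# Toy example (invented): one predictor, fitted means 1, 2, 4.
rows = [[0.0], [1.0], [2.0]]
coefficients = [math.log(2.0)]
perfect = regression_correlation([1, 2, 4], rows, 0.0, coefficients)
reversed_order = regression_correlation([4, 2, 1], rows, 0.0, coefficients)
```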
Exhaustive Search for Sparse Variable Selection in Linear Regression

Science.gov (United States)

Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato

2018-04-01

We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
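The exhaustive part of the idea can be sketched in a few lines: fit least squares on every size-K subset of predictors and rank the subsets by residual sum of squares. This is a simplified illustration, not the authors' ES-K implementation. It assumes centered data with no intercept and non-collinear subsets, and all names and numbers are hypothetical. The sorted list of residuals is the raw material from which a density of states could be tabulated.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_sum_of_squares(columns, y):
    """Least-squares fit of y on the given predictor columns (no intercept)."""
    k = len(columns)
    gram = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve_linear_system(gram, rhs)
    fitted = [sum(beta[j] * columns[j][i] for j in range(k)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def exhaustive_k_sparse_search(predictors, y, k):
    """Return (rss, subset) for every size-k subset, best fit first."""
    results = [
        (residual_sum_of_squares([predictors[j] for j in subset], y), subset)
        for subset in combinations(range(len(predictors)), k)
    ]
    results.sort()
    return results


# Toy data (invented): y depends only on predictors 0 and 2.
predictors = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
y = [5, -5, -1, 1]
ranking = exhaustive_k_sparse_search(predictors, y, 2)
```

The number of subsets grows combinatorially with the number of predictors, which is the situation the abstract's approximate AES-K method is meant to address.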
Regression Analysis to Identify Factors Associated with Urinary Iodine Concentration at the Sub-National Level in India, Ghana, and Senegal

Directory of Open Access Journals (Sweden)

Jacky Knowles

2018-04-01

Single and multiple variable regression analyses were conducted using data from stratified, cluster sample design, iodine surveys in India, Ghana, and Senegal to identify factors associated with urinary iodine concentration (UIC) among women of reproductive age (WRA) at the national and sub-national level. Subjects were survey household respondents, typically WRA. For all three countries, UIC was significantly different (p < 0.05) by household salt iodine category. Other significant differences were by strata and by household vulnerability to poverty in India and Ghana. In multiple variable regression analysis, UIC was significantly associated with strata and household salt iodine category in India and Ghana (p < 0.001). Estimated UIC was 1.6 (95% confidence intervals (CI) 1.3, 2.0) times higher (India) and 1.4 (95% CI 1.2, 1.6) times higher (Ghana) among WRA from households using adequately iodised salt than among WRA from households using non-iodised salt. Other significant associations with UIC were found in India, with having heard of iodine deficiency (1.2 times higher; CI 1.1, 1.3; p < 0.001) and having improved dietary diversity (1.1 times higher, CI 1.0, 1.2; p = 0.015); and in Ghana, with the level of tomato paste consumption the previous week (p = 0.029) (UIC for highest consumption level was 1.2 times lowest level; CI 1.1, 1.4). No significant associations were found in Senegal. Sub-national data on iodine status are required to assess equity of access to optimal iodine intake and to develop strategic responses as needed.
Variable selection in Logistic regression model with genetic algorithm.

Science.gov (United States)

Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi

2018-02-01

Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. Variable selection is thus the method of choosing the most relevant attributes from the database in order to build robust learning models and, in turn, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle, generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, as is the case in present-day clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
Predictors of Placement Stability at the State Level: The Use of Logistic Regression to Inform Practice

Science.gov (United States)

Courtney, Jon R.; Prophet, Retta

2011-01-01

Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
Introduction to statistical modelling 2: categorical variables and interactions in linear regression.

Science.gov (United States)

Lunt, Mark

2015-07-01

In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
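A minimal sketch of the model described above, with one continuous predictor, one 0/1 group indicator and their product as the interaction term, is given below. It is an illustration with hypothetical names and invented data, not code from the article. The coefficient on the product term is the estimated difference in slope between the two groups, which is the quantity the article recommends testing directly.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interaction_model(x, group, y):
    """Least squares for y = b0 + b1*x + b2*group + b3*(x*group), group coded 0/1."""
    # Design matrix columns: intercept, predictor, dummy variable, interaction.
    rows = [[1.0, xi, gi, xi * gi] for xi, gi in zip(x, group)]
    p = 4
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(gram, rhs)


# Toy data (invented): group 0 follows y = 1 + 2x, group 1 follows y = 4 + 0.5x.
x = [0, 1, 2, 0, 1, 2]
group = [0, 0, 0, 1, 1, 1]
y = [1, 3, 5, 4, 4.5, 5]
b0, b1, b2, b3 = fit_interaction_model(x, group, y)
# b2 is the difference in intercepts and b3 the difference in slopes (group 1 minus group 0).
```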
Sea-level variability over five glacial cycles.

Science.gov (United States)

Grant, K M; Rohling, E J; Ramsey, C Bronk; Cheng, H; Edwards, R L; Florindo, F; Heslop, D; Marra, F; Roberts, A P; Tamisiea, M E; Williams, F

2014-09-25

Research on global ice-volume changes during Pleistocene glacial cycles is hindered by a lack of detailed sea-level records for time intervals older than the last interglacial. Here we present the first robustly dated, continuous and highly resolved records of Red Sea sea level and rates of sea-level change over the last 500,000 years, based on tight synchronization to an Asian monsoon record. We observe maximum 'natural' (pre-anthropogenic forcing) sea-level rise rates below 2 m per century following periods with up to twice present-day ice volumes, and substantially higher rise rates for greater ice volumes. We also find that maximum sea-level rise rates were attained within 2 kyr of the onset of deglaciations, for 85% of such events. Finally, multivariate regressions of orbital parameters, sea-level and monsoon records suggest that major meltwater pulses account for millennial-scale variability and insolation-lagged responses in Asian monsoon records.
Advanced statistics: linear regression, part I: simple linear regression.

Science.gov (United States)

Marill, Keith A

2004-01-01

Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
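The least squares line and the leverage of each observation, two of the concepts listed above, can be sketched as follows. The function name and the data are hypothetical and are not taken from the article. The leverage formula used is the standard one for simple linear regression, h_i = 1/n + (x_i - mean(x))^2 / Sxx.

```python
def simple_linear_regression(x, y):
    """Least squares slope and intercept, plus the leverage of each observation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Leverage grows with the squared distance of x_i from the mean of x.
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return slope, intercept, leverage


# Toy data (invented): the last point lies far from the other x values.
x = [1, 2, 3, 4, 10]
y = [3, 5, 7, 9, 21]
slope, intercept, leverage = simple_linear_regression(x, y)
```

The leverages always sum to the number of fitted parameters (two here), so a single value close to 1 signals an observation that largely determines the fitted line on its own.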

A Spline-Based Lack-Of-Fit Test for Independent Variable Effect in Poisson Regression.

Science.gov (United States)

Li, Chin-Shang; Tu, Wanzhu

2007-05-01

In regression analysis of count data, independent variables are often modeled by their linear effects under the assumption of log-linearity. In reality, the validity of such an assumption is rarely tested, and its use is at times unjustifiable. A lack-of-fit test is proposed for the adequacy of a postulated functional form of an independent variable within the framework of semiparametric Poisson regression models based on penalized splines. It offers added flexibility in accommodating the potentially non-loglinear effect of the independent variable. A likelihood ratio test is constructed for the adequacy of the postulated parametric form, for example log-linearity, of the independent variable effect. Simulations indicate that the proposed model performs well, and that a misspecified parametric model has much reduced power. An example is given.
Groundwater level prediction of landslide based on classification and regression tree

Directory of Open Access Journals (Sweden)

Yannan Zhao

2016-09-01

Based on groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected according to the response relationship between influential factors, such as rainfall and reservoir level, and changes in groundwater level. A classification and regression tree (CART) model was then constructed from this subset and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. This indicates that the CART model not only has better fitting and generalization ability, but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
Analysis of Student and School Level Variables Related to Mathematics Self-Efficacy Level Based on PISA 2012 Results for China-Shanghai, Turkey, and Greece

Science.gov (United States)

Usta, H. Gonca

2016-01-01

This study aims to analyze the student and school level variables that affect students' self-efficacy levels in mathematics in China-Shanghai, Turkey, and Greece based on PISA 2012 results. In line with this purpose, the hierarchical linear regression model (HLM) was employed. The interschool variability is estimated at approximately 17% in…
Continuous-variable quantum Gaussian process regression and quantum singular value decomposition of nonsparse low-rank matrices

Science.gov (United States)

Das, Siddhartha; Siopsis, George; Weedbrook, Christian

2018-02-01

With the significant advancement in quantum computation during the past couple of decades, the exploration of machine-learning subroutines using quantum strategies has become increasingly popular. Gaussian process regression is a widely used technique in supervised classical machine learning. Here we introduce an algorithm for Gaussian process regression using continuous-variable quantum systems that can be realized with technology based on photonic quantum computers under certain assumptions regarding distribution of data and availability of efficient quantum access. Our algorithm shows that by using a continuous-variable quantum computer a dramatic speedup in computing Gaussian process regression can be achieved, i.e., the possibility of exponentially reducing the time to compute. Furthermore, our results also include a continuous-variable quantum-assisted singular value decomposition method of nonsparse low rank matrices and forms an important subroutine in our Gaussian process regression algorithm.
Avoiding and Correcting Bias in Score-Based Latent Variable Regression with Discrete Manifest Items

Science.gov (United States)

Lu, Irene R. R.; Thomas, D. Roland

2008-01-01

This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…
History of Aral Sea level variability and current scientific debates

Science.gov (United States)

Cretaux, Jean-François; Letolle, René; Bergé-Nguyen, Muriel

2013-11-01

The Aral Sea has shrunk drastically over the past 50 years, largely due to water abstraction from the Amu Darya and Syr Darya rivers for land irrigation. Over a longer timescale, Holocene palaeolimnological reconstruction of variability in water levels of the Aral Sea since 11,700 BP indicates a long history of alternating phases of regression and transgression, which have been attributed variously to climate, tectonic and anthropogenic forcing. The hydrological history of the Aral Sea has been investigated by application of a variety of scientific approaches, including archaeology, palaeolimnological palaeoclimate reconstruction, geophysics, sedimentology, and more recently, space science. Many issues concerning lake level variability over the Holocene and more recent timescales, and the processes that drive the changes, are still a matter for active debate. Our aim in this article is to review the current debates regarding key issues surrounding the causes and magnitude of Aral Sea level variability on a variety of timescales from months to thousands of years. Many researchers have shown that the main driving force of Aral Sea regressions and transgressions is climate change, while other authors have argued that anthropogenic forcing is the main cause of Aral Sea water level variations over the Holocene. Particular emphasis is placed on contributions from satellite remote sensing data in order to improve our understanding of the influence of groundwater on the current hydrological water budget of the Aral Sea since 2005. Over this period of time, water balance computation has been performed and has shown that the underground water inflow to the Aral Sea is close to zero with an uncertainty of 3 km³/year.
Topex-Poseidon analysis of sea level variability over the Atlantic Ocean

Science.gov (United States)

Catalan P-U, M.; Villares, P.; Catalan, M.; Gomez-Enri, J.

2003-04-01

The variability of sea level and surface geostrophic currents in the Atlantic Ocean is investigated using 333 cycles of altimeter data obtained by the TOPEX-POSEIDON satellite. Following improvements in orbit accuracy, the most important concern for studies of sea level variability from altimeter height data is the formalism used for modelling the altimetric measurement corrections. Presently, one of the main sources of potential error is the correction for atmospheric pressure loading, the so-called ‘inverse barometer effect’. As is well known, this correction is intended to adjust the sea surface elevation for the static effects of the downward force of the mass of the atmosphere on the sea surface, which this oversimplified model sets at 1 cm/mbar. The exact response of the sea surface to atmospheric pressure loading depends on the space and time scales of the pressure field and is of particular concern at high latitudes, where atmospheric pressure fluctuations are large because of the intensity of low pressure fields and where there is additional uncertainty in the model estimates of local sea level pressure. To study these effects over the whole Atlantic Ocean, we compute a linear regression adjustment and an Empirical Orthogonal Functions Decomposition (EOFD) between sea level variation without inverse barometer correction and atmospheric pressure at all the Topex-Poseidon cross points over the whole Atlantic, including both the Arctic and Antarctic Oceans. We use the barometric factor computed from the linear regression to correct the satellite mean sea level variation and compare the correlation with the pressure. Our results show an important improvement in the decorrelation between the sea level and atmospheric pressure time series, compared with the use of the inverse barometer model, at most of the satellite cross points. The complicated nature of sea level variability at high latitudes justify that EOFD analysis conclusions
Coupled variable selection for regression modeling of complex treatment patterns in a clinical cancer registry.

Science.gov (United States)

Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H

2014-12-30

For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns. Copyright © 2014 John Wiley & Sons
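The combination step named above, Fisher's method, can be sketched as follows. The sketch assumes that each model's score statistic for a covariate has already been converted to a p-value and that the p-values are independent. The function name and inputs are hypothetical and this is not the authors' implementation. For k p-values the statistic -2 * sum(ln p_i) is compared with a chi-square distribution on 2k degrees of freedom, whose upper tail has a closed form because the degrees of freedom are even.

```python
import math


def fisher_combined_p_value(p_values):
    """Fisher's method: return (chi-square statistic, combined p-value)."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Upper tail of chi-square with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical per-model p-values for one covariate in two cause-specific models.
statistic, combined = fisher_combined_p_value([0.04, 0.04])
```

In a coupled forward selection, the covariate with the smallest combined p-value across the linked models would be the candidate to enter all of them at that step.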
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

Science.gov (United States)

Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

2017-04-01

Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
Penalized regression procedures for variable selection in the potential outcomes framework.

Science.gov (United States)

Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L

2015-05-10

A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple 'impute, then select' class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data, and imputation are drawn. A difference least absolute shrinkage and selection operator algorithm is defined, along with its multiple imputation analogs. The procedures are illustrated using a well-known right-heart catheterization dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Effects of categorization method, regression type, and variable distribution on the inflation of Type-I error rate when categorizing a confounding variable.

Science.gov (United States)

Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A

2015-03-15

The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.
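The mechanism can be reproduced in a small Monte Carlo sketch. The setup below is an assumption for illustration, not one of the 9600 scenarios from the paper: a standard normal confounder drives both exposure and outcome, the exposure has no true effect, and the confounder is either adjusted for as measured or dichotomized at its median. The test uses a normal approximation to the t-test.

```python
import numpy as np

def x_coefficient_rejected(y, x, confounder, z_crit=1.96):
    """Fit y ~ 1 + x + confounder by OLS and test the x coefficient."""
    design = np.column_stack([np.ones_like(x), x, confounder])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = beta[1] / np.sqrt(cov[1, 1])
    return abs(z) > z_crit  # normal approximation to the t-test

def type1_error_rates(n_sims=500, n=200, seed=0):
    """Return rejection rates with continuous and dichotomized adjustment."""
    rng = np.random.default_rng(seed)
    hits_cont = hits_cat = 0
    for _ in range(n_sims):
        c = rng.normal(size=n)
        x = c + rng.normal(size=n)  # exposure depends on the confounder
        y = c + rng.normal(size=n)  # outcome depends on the confounder only
        c_cat = (c > np.median(c)).astype(float)  # median split
        hits_cont += x_coefficient_rejected(y, x, c)
        hits_cat += x_coefficient_rejected(y, x, c_cat)
    return hits_cont / n_sims, hits_cat / n_sims

rate_continuous, rate_dichotomized = type1_error_rates()
```

Because the median split leaves residual confounding, the second rate should sit far above the nominal 5% level, while the first stays close to it.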
Development of planning level transportation safety tools using Geographically Weighted Poisson Regression.

Science.gov (United States)

Hadayeghi, Alireza; Shalaby, Amer S; Persaud, Bhagwant N

2010-03-01

A common technique used for the calibration of collision prediction models is the Generalized Linear Modeling (GLM) procedure with the assumption of Negative Binomial or Poisson error distribution. In this technique, fixed coefficients that represent the average relationship between the dependent variable and each explanatory variable are estimated. However, the stationary relationship assumed may hide some important spatial factors of the number of collisions at a particular traffic analysis zone. Consequently, the accuracy of such models for explaining the relationship between the dependent variable and the explanatory variables may be suspected since collision frequency is likely influenced by many spatially defined factors such as land use, demographic characteristics, and traffic volume patterns. The primary objective of this study is to investigate the spatial variations in the relationship between the number of zonal collisions and potential transportation planning predictors, using the Geographically Weighted Poisson Regression modeling technique. The secondary objective is to build on knowledge comparing the accuracy of Geographically Weighted Poisson Regression models to that of Generalized Linear Models. The results show that the Geographically Weighted Poisson Regression models are useful for capturing spatially dependent relationships and generally perform better than the conventional Generalized Linear Models. Copyright 2009 Elsevier Ltd. All rights reserved.
Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

Science.gov (United States)

Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

2017-03-01

Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several models, and each model has two components, parametric and nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic variables on the use of information technology. More specifically, the response variables are the percentage of households with access to the internet and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification and the percentage of economic growth. Based on identification of the relationship between the response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that the multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination, 90 percent.
Advanced statistics: linear regression, part II: multiple linear regression.

Science.gov (United States)

Marill, Keith A

2004-01-01

The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Regression Analysis to Identify Factors Associated with Urinary Iodine Concentration at the Sub-National Level in India, Ghana, and Senegal

Science.gov (United States)

Knowles, Jacky; Kupka, Roland; Dumble, Sam; Garrett, Greg S.; Pandav, Chandrakant S.; Yadav, Kapil; Touré, Ndeye Khady; Foriwa Amoaful, Esi; Gorstein, Jonathan

2018-01-01

Single and multiple variable regression analyses were conducted using data from stratified, cluster sample design, iodine surveys in India, Ghana, and Senegal to identify factors associated with urinary iodine concentration (UIC) among women of reproductive age (WRA) at the national and sub-national level. Subjects were survey household respondents, typically WRA. For all three countries, UIC was significantly different (p regression analysis, UIC was significantly associated with strata and household salt iodine category in India and Ghana (p < 0.001). Estimated UIC was 1.6 (95% confidence intervals (CI) 1.3, 2.0) times higher (India) and 1.4 (95% CI 1.2, 1.6) times higher (Ghana) among WRA from households using adequately iodised salt than among WRA from households using non-iodised salt. Other significant associations with UIC were found in India, with having heard of iodine deficiency (1.2 times higher; CI 1.1, 1.3; p < 0.001) and having improved dietary diversity (1.1 times higher, CI 1.0, 1.2; p = 0.015); and in Ghana, with the level of tomato paste consumption the previous week (p = 0.029) (UIC for highest consumption level was 1.2 times lowest level; CI 1.1, 1.4). No significant associations were found in Senegal. Sub-national data on iodine status are required to assess equity of access to optimal iodine intake and to develop strategic responses as needed. PMID:29690505
The use of cognitive ability measures as explanatory variables in regression analysis.

Science.gov (United States)

Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J

2012-12-01

Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.
A comparison on parameter-estimation methods in multiple regression analysis with existence of multicollinearity among independent variables

Directory of Open Access Journals (Sweden)

Hukharnsusatrue, A.

2005-11-01

Full Text Available The objective of this research is to compare methods for estimating multiple regression coefficients in the presence of multicollinearity among independent variables. The estimation methods are the Ordinary Least Squares method (OLS), Restricted Least Squares method (RLS), Restricted Ridge Regression method (RRR) and Restricted Liu method (RL), when the restrictions are true and when they are not true. The study used the Monte Carlo simulation method. The experiment was repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlations is high, and also provides the smallest AMSE for all levels of correlations and all sample sizes when the standard deviation is equal to 5. However, the RL method provides the smallest AMSE when the level of correlations is low or medium, except in the case of standard deviation equal to 3 and small sample sizes, where the RRR method provides the smallest AMSE. The AMSE varies directly with, from most to least, the level of correlations, the standard deviation and the number of independent variables, but inversely with sample size. CASE 2: The restrictions are not true. In all cases, the RRR method provides the smallest AMSE, except in the case of standard deviation equal to 1 and error of restrictions equal to 5%, where the OLS method provides the smallest AMSE when the level of correlations is low or medium and the sample size is large, while for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlations and all sample sizes, except when the level of correlations is high and sample sizes are small. Moreover, the case OLS method provides the smallest AMSE, the most RLS method has a smaller AMSE than
Serum bilirubin levels are positively associated with glycemic variability in women with type 2 diabetes.

Science.gov (United States)

Kim, Lee Kyung; Roh, Eun; Kim, Min Joo; Kim, Min Kyeong; Park, Kyeong Seon; Kwak, Soo Heon; Cho, Young Min; Park, Kyong Soo; Jang, Hak Chul; Jung, Hye Seung

2016-11-01

Glycemic variability is known to induce oxidative stress. We investigated the relationships between glycemic variability and serum bilirubin levels, an endogenous anti-oxidant, in patients with diabetes. A cross-sectional study was carried out with 77 patients with type 2 diabetes who had been recruited to two clinical studies from 2008 to 2014. There were no participants with diseases of the pancreas, liver, biliary tract and chronic renal insufficiency. Glycemic variation was calculated by a continuous glucose monitoring system, and correlation analyses were carried out to evaluate their association with bilirubin levels. Multiple linear regression was carried out to identify independent factors influencing bilirubin levels and glycemic variation. Among the participants, 42.3% were men. The mean (standard deviation) age was 61.5 years (10.4 years), body mass index was 24.2 kg/m² (2.8 kg/m²), diabetes duration was 17.7 years (9.5 years), hemoglobin A1c was 60.7 mmol/mol (7.1 mmol/mol; 7.7 [0.7]%) and bilirubin was 11.8 μmol/L (4.10 μmol/L). Serum bilirubin levels were not different according to age, body mass index and hemoglobin A1c. However, the mean amplitude of glucose excursion was positively associated with bilirubin levels in women (r = 0.588, P bilirubin and mean amplitude of glucose excursion remained significant (r = 0.566, P bilirubin was an independent determinant for the mean amplitude of glucose excursion in women. 1,5-Anhydroglucitol was also associated with bilirubin levels in women. Bilirubin level within the physiological range might be an independent predictor for glycemic variability in women with type 2 diabetes. © 2016 The Authors. Journal of Diabetes Investigation published by Asian Association for the Study of Diabetes (AASD) and John Wiley & Sons Australia, Ltd.
Vector regression introduced

Directory of Open Access Journals (Sweden)

Mok Tik

2014-06-01

Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables) also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
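A minimal sketch of the idea, under assumptions not taken from the paper: 2-D vectors are encoded as complex numbers, the coefficients are complex, and the fit is ordinary least squares. The variable names (`steering`, `pressure`, `track`) and the synthetic data are hypothetical. The second half shows one way to rewrite the complex model as an equivalent real model, in the spirit of the transformation the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

def as_complex(xy):
    """Encode rows (x, y) of a 2-D vector series as x + iy."""
    return xy[:, 0] + 1j * xy[:, 1]

# Hypothetical predictors: one vector-valued, one scalar
steering = as_complex(rng.normal(size=(n, 2)))
pressure = rng.normal(size=n)

# Synthetic dependent vector variable with known complex coefficients
true_coef = np.array([0.8 + 0.3j, -0.5 + 0.1j])
noise = 0.05 * as_complex(rng.normal(size=(n, 2)))
track = true_coef[0] * steering + true_coef[1] * pressure + noise

# Complex least squares: each coefficient scales and rotates its predictor
design = np.column_stack([steering, pressure.astype(complex)])
coef, *_ = np.linalg.lstsq(design, track, rcond=None)

# Equivalent real formulation: stack real and imaginary parts
real_design = np.block([[design.real, -design.imag],
                        [design.imag, design.real]])
real_target = np.concatenate([track.real, track.imag])
real_coef, *_ = np.linalg.lstsq(real_design, real_target, rcond=None)
coef_from_real = real_coef[:2] + 1j * real_coef[2:]
```

Both routes minimize the same sum of squared residuals, so `coef` and `coef_from_real` should agree to numerical precision.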
Statistical modelling for precision agriculture: A case study in optimal environmental schedules for Agaricus Bisporus production via variable domain functional regression

Science.gov (United States)

Panayi, Efstathios; Kyriakides, George

2017-01-01

Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields. PMID:28961254

Impact of Psychological Variables on Playing Ability of University Level Soccer Players

Directory of Open Access Journals (Sweden)

Ertan Tufekcioglu

2014-10-01

Full Text Available The purpose of the study was to find out the relationship between psychological variables and soccer playing ability among the university level male players. 42 soccer players representing different universities who participated in inter university competitions were selected as the subjects of the study. The dependent variable was soccer playing ability and independent variables were the selected psychological variables. Soccer playing ability was determined through a 10 point scale at the time of competitions. Psychological variables included achievement motivation, anxiety, self-concept and aggression. The data was statistically analyzed using Karl Pearson’s correlation coefficient and multiple regression analysis using SPSS. It was concluded that soccer playing ability has a positive correlation with achievement motivation and self-concept whereas anxiety and aggression have a negative correlation with soccer playing ability.
Multiple linear stepwise regression of liver lipid levels: proton MR spectroscopy study in vivo at 3.0 T

International Nuclear Information System (INIS)

Xu Li; Liang Changhong; Xiao Yuanqiu; Zhang Zhonglin

2010-01-01

Objective: To analyze the correlations between liver lipid level determined by liver 3.0 T ¹H-MRS in vivo and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver ¹H-MRS was performed with 3.0 T system and eight-channel torso phased-array coils using PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm×2 cm×2 cm, NSA of 64 times. The acquired raw proton MRS data were processed by using a software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the signal from lipid and water to obtain percentage lipid within the liver. The statistical description of height, weight, age and BMI, line width and water suppression were recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to set the statistical model for the prediction of liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m², line width (18.9±4.4) and the water suppression (90.7±6.5)% had significant correlation with liver lipid content (0.00 to 0.96%, median 0.02%), r were 0.11, 0.44, 0.40, 0.52, -0.73 respectively (P<0.05). But only age, BMI, line width, and the water suppression entered into the multiple linear regression equation. Liver lipid content prediction equation was as follows: Y= 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age), and the coefficient of determination was 0.613, corrected coefficient of determination was 0.59. Conclusion: The regression model fitted well, since the variables of age, BMI, line width, and water suppression can explain about 60% of liver lipid content changes. (authors)
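The reported prediction equation can be written as a small function for checking values by hand. The coefficients come from the abstract; the function name and the assumed units (water suppression in %, BMI in kg/m², age in years, line width in whatever unit the study recorded) are illustrative assumptions, and the abstract does not state the unit of Y explicitly.

```python
def predicted_liver_lipid(water_suppression, bmi, line_width, age):
    """Evaluate the regression equation reported in the abstract."""
    return (1.395
            - 0.021 * water_suppression
            + 0.022 * bmi
            + 0.014 * line_width
            - 0.004 * age)

# Evaluate at the sample means reported in the abstract
at_means = predicted_liver_lipid(water_suppression=90.7, bmi=23.3,
                                 line_width=18.9, age=39.1)
```

At the reported sample means the equation gives roughly 0.111, which falls inside the observed range of 0.00 to 0.96.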
An improved multiple linear regression and data analysis computer program package

Science.gov (United States)

Sidik, S. M.

1972-01-01

NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
New insights on water level variability for Lake Turkana for the past 15 ka and at 150 ka from relict beaches

Science.gov (United States)

Forman, S. L.; Wright, D.

2015-12-01

Relict beaches adjacent to Lake Turkana provide a record of water level variability for the Late Quaternary. This study focused on deciphering the geomorphology, sedimentology, stratigraphy and ¹⁴C chronology of strand plain sequences in the Kalokol and Lothagam areas. Nine >30 m oscillations in water level were documented between ca. 15 and 4 ka. The earliest oscillation between ca. 14.5 and 13 ka is not well constrained with water level to at least 70 m above the present surface and subsequently fell to at least 50 m. Lake level increased to ~ 90 m between ca. 11.2 and 10.4 ka, post Younger Dryas cooling. Water level fell by >30 m by 10.2 ka, with another potential rise at ca. 8.5 ka to >70 m above current level. Lake level regressed by > 40 m at 8.2 ka coincident with cooling in the equatorial Eastern Atlantic Ocean. Two major >70 m lake level oscillations centered at 6.6 and 5.2 ka may reflect enhanced convection with warmer sea surface temperatures in the Western Indian Ocean. The end of the African Humid Period occurred from ca. 8.0 to 4.5 ka and was characterized by variable lake level (± > 40 m), rather than one monotonic fall in water level. This lake level variability reflects a complex response to variations in the extent and intensity of the East and West African Monsoons near geographic and topographic limits within the catchment of Lake Turkana. Also, for this closed lake basin excess and deficits in water input are amplified with a cascading lake effect in the East Rift Valley and through the Chew Bahir Basin. The final regression from a high stand of > 90 m began at ca. 5.2 ka and water level was below 20 m by 4.5 ka and for the remainder of the Holocene. This sustained low stand is associated with weakening of the West African Monsoon, a shift of the mean position of Congo Air Boundary west of the Lake Turkana catchment and with meter-scale variability in lake level linked to Walker circulation across the Indian Ocean. A surprising observation is
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

Science.gov (United States)

Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

2017-06-01

A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation in the model is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the dengue fever patient-count categories.
[Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

Science.gov (United States)

Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

2018-04-01

Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both of those factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation coefficient-based method for significance evaluation of hydrological dependence based on the auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divides the degree of significance of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficient at each order of the series, we found that the correlation coefficient is mainly determined by the magnitude of the auto-correlation coefficients from order 1 to order p, which clarifies the theoretical basis of this method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte Carlo experiments to classify the relationship between the correlation coefficient and the auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
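For the first-order case, the link between the two coefficients can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it simulates an AR(1) series with Gaussian noise, takes the dependence component to be phi times the lagged value, and compares its correlation with the series against the lag-1 auto-correlation. The classification thresholds from the paper are not reproduced because the abstract does not give them.

```python
import numpy as np

def ar1_dependence_correlation(phi, n=20000, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] and return two correlations.

    r  : correlation between the series and its dependence component
    r1 : sample lag-1 auto-correlation of the series
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n)
    series = np.empty(n)
    series[0] = noise[0]
    for t in range(1, n):
        series[t] = phi * series[t - 1] + noise[t]
    dependence = phi * series[:-1]  # deterministic part of x[t]
    r = np.corrcoef(series[1:], dependence)[0, 1]
    r1 = np.corrcoef(series[1:], series[:-1])[0, 1]
    return r, r1

r, r1 = ar1_dependence_correlation(phi=0.6)
```

For a stationary AR(1) process the lag-1 auto-correlation equals phi, so with phi = 0.6 both values should land near 0.6.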
Polychotomization of continuous variables in regression models based on the overall C index

Directory of Open Access Journals (Sweden)

Bax Leon

2006-12-01

Full Text Available Abstract Background When developing multivariable regression models for diagnosis or prognosis, continuous independent variables can be categorized to make a prediction table instead of a prediction formula. Although many methods have been proposed to dichotomize prognostic variables, to date there has been no integrated method for polychotomization. The latter is necessary when dichotomization results in too much loss of information or when central values refer to normal states and more dispersed values refer to less preferable states, a situation that is not unusual in medical settings (e.g. body temperature, blood pressure). The goal of our study was to develop a theoretical and practical method for polychotomization. Methods We used the overall discrimination index C, introduced by Harrell, as a measure of the predictive ability of an independent regressor variable and derived a method for polychotomization mathematically. Since the naïve application of our method, like some existing methods, gives rise to positive bias, we developed a parametric method that minimizes this bias and assessed its performance by the use of Monte Carlo simulation. Results The overall C is closely related to the area under the ROC curve and the produced di(poly)chotomized variable's predictive performance is comparable to that of the original continuous variable. The simulation shows that the parametric method is essentially unbiased for both the estimates of performance and the cutoff points. Application of our method to the predictor variables of a previous study on rhabdomyolysis shows that it can be used to make probability profile tables that are applicable to the diagnosis or prognosis of individual patient status. Conclusion We propose a polychotomization (including dichotomization) method for independent continuous variables in regression models based on the overall discrimination index C and clarified its meaning mathematically. To avoid positive bias in
Straight line fitting and predictions: On a marginal likelihood approach to linear regression and errors-in-variables models

Science.gov (United States)

Christiansen, Bo

2015-04-01

Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first order approximations and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration that should be treated as the independent variable. These decisions are often not easy to make but they may have a considerable impact on the results. We seek to give a unified probabilistic - Bayesian with flat priors - treatment of univariate linear regression and prediction by taking, as starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference can not be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.
A Poisson regression approach to model monthly hail occurrence in Northern Switzerland using large-scale environmental variables

Science.gov (United States)

Madonna, Erica; Ginsbourger, David; Martius, Olivia

2018-05-01

In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure; however, little is known about its long-term variability. To study the variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month) and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear and the month. This model captures the intra-annual variability well and slightly underestimates the inter-annual variability. The regression model is applied to the reanalysis data back in time to 1980. The resulting hail day time series shows an increase of the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared to two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.
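The four-predictor model described in this abstract can be written out as a Poisson regression. The following is only a sketch: the log link (the canonical choice for Poisson regression) and all symbol names are assumptions, since the abstract does not state them.

```latex
% Assumed notation: N_{m,y} = number of hail days in month m of year y;
% T', (\log \mathrm{CAPE})', S' = monthly anomalies of 2 m temperature,
% log-CAPE and wind shear; \alpha_m = effect of the categorical predictor "month".
\begin{align}
  N_{m,y} &\sim \mathrm{Poisson}(\lambda_{m,y}), \\
  \log \lambda_{m,y} &= \beta_0 + \alpha_m
      + \beta_1\, T'_{m,y}
      + \beta_2\, (\log \mathrm{CAPE})'_{m,y}
      + \beta_3\, S'_{m,y}.
\end{align}
```

Under this form, a unit increase in a given anomaly multiplies the expected number of hail days by the exponential of the corresponding coefficient.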
Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science

International Nuclear Information System (INIS)

Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei

2007-01-01

Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age
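The three fitting approaches named in this abstract differ only in how the slope is formed from the sample variances and covariance. A minimal sketch using the textbook formulas follows; the function name and sample data are illustrative and not taken from the paper, and the orthogonal slope assumes equal error variances in both variables.

```python
import math

def regression_slopes(x, y):
    """Return OLS (y on x), orthogonal, and geometric mean slopes."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # OLS: all error is assigned to y.
    ols = sxy / sxx
    # Orthogonal regression: minimizes perpendicular distances
    # (requires sxy != 0).
    orthogonal = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # Geometric mean regression: ratio of standard deviations,
    # carrying the sign of the covariance.
    geometric_mean = math.copysign(math.sqrt(syy / sxx), sxy)
    return {"ols": ols, "orthogonal": orthogonal, "geometric_mean": geometric_mean}

# Illustrative data only.
print(regression_slopes([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```

For perfectly collinear data the three slopes coincide; with scatter, the OLS slope is never larger in magnitude than the geometric mean slope, because it equals the geometric mean slope multiplied by the correlation coefficient.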
Two-step variable selection in quantile regression models

Directory of Open Access Journals (Sweden)

FAN Yali

2015-06-01

We propose a two-step variable selection procedure for high dimensional quantile regressions, in which the dimension of the covariates, p_n, is much larger than the sample size n. In the first step, we apply an ℓ1 penalty, and we demonstrate that the first step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional one to a model whose size has the same order as that of the true model, and the selected model can cover the true model. The second step excludes the remaining irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys the model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.
Assessment of serum HE4 levels throughout the normal menstrual cycle.

Science.gov (United States)

Moore, Richard G; Plante, Beth; Hartnett, Erin; Mitchel, Jessica; Raker, Christine A; Vitek, Wendy; Eklund, Elizabeth; Lambert-Messerlian, Geralyn

2017-07-01

Human epididymis protein 4 is a serum biomarker to aid in differentiating benign and malignant disease in women with a pelvic mass. Interpretation of human epididymis protein 4 results relies on robust normative data. The purpose of this study was to evaluate whether human epididymis protein 4 levels are variable in women during the normal menstrual cycle. Healthy women, 18-45 years old, with regular menstrual cycles were recruited from community gynecologic practices in Rhode Island. Women consented to enroll and to participate by the donation of blood and urine samples at 5 specific times over the course of each cycle. Levels of reproductive hormones and human epididymis protein 4 were determined. Data were analyzed with the use of linear regression after log transformation. Among 74 enrolled cycles, 53 women had confirmed ovulation during the menstrual cycle and completed all 5 sample collections. Levels of estradiol, progesterone, and luteinizing hormone displayed the expected menstrual cycle patterns. Levels of human epididymis protein 4 in serum were relatively stable across the menstrual cycle, except for a small ovulatory (median, 37.0 pM) increase. Levels of human epididymis protein 4 in urine, after correction for creatinine, displayed the same pattern of secretion observed in serum. Serum human epididymis protein 4 levels are relatively stable across the menstrual cycle of reproductive-aged women and can be determined on any day to evaluate risk of ovarian malignancy. A slight increase is expected at ovulation; but even with this higher human epididymis protein 4 level, results are well within the healthy reference range for women (<120 pM). Levels of human epididymis protein 4 in urine warrant further investigation for use in clinical practice as a simple and convenient sample. Copyright © 2017 Elsevier Inc. All rights reserved.
Standardizing effect size from linear regression models with log-transformed variables for meta-analysis.

Science.gov (United States)

Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José

2017-03-17

Meta-analysis is very useful to summarize the effect of a treatment or a risk factor for a given disease. Often studies report results based on log-transformed variables in order to achieve the principal assumptions of a linear regression model. If this is the case for some, but not all studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow including all results in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that significantly differ from the normal distribution. We illustrate this procedure with an application to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The procedure proposed has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.
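For orientation, the standard identities that relate absolute and relative changes under log transformation are sketched below. They assume natural logarithms and are textbook results, not necessarily the exact formulae derived in the paper; the symbols Δ (absolute change in X) and p (relative change in X, as a fraction) are notation introduced here.

```latex
\begin{align}
  % Log-transformed dependent variable: \log Y = \alpha + \beta X
  \frac{Y(x + \Delta)}{Y(x)} &= e^{\beta \Delta}
    \quad\Rightarrow\quad \text{relative change in } Y = 100\,\bigl(e^{\beta \Delta} - 1\bigr)\,\% \\
  % Log-transformed independent variable: Y = \alpha + \beta \log X
  Y\bigl((1 + p)\,x\bigr) - Y(x) &= \beta \log(1 + p) \\
  % Both variables log-transformed: \log Y = \alpha + \beta \log X
  \frac{Y\bigl((1 + p)\,x\bigr)}{Y(x)} &= (1 + p)^{\beta}
\end{align}
```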
Does the Magnitude of the Link between Unemployment and Crime Depend on the Crime Level? A Quantile Regression Approach

Directory of Open Access Journals (Sweden)

Horst Entorf

2015-07-01

Two alternative hypotheses – referred to as opportunity- and stigma-based behavior – suggest that the magnitude of the link between unemployment and crime also depends on preexisting local crime levels. In order to analyze conjectured nonlinearities between both variables, we use quantile regressions applied to German district panel data. While both conventional OLS and quantile regressions confirm the positive link between unemployment and crime for property crimes, results for assault differ with respect to the method of estimation. Whereas conventional mean regressions do not show any significant effect (which would confirm the usual result found for violent crimes in the literature), quantile regression reveals that size and importance of the relationship are conditional on the crime rate. The partial effect is significantly positive for moderately low and median quantiles of local assault rates.
Understanding logistic regression analysis

OpenAIRE

Sperandei, Sandro

2014-01-01

Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
Independent contrasts and PGLS regression estimators are equivalent.

Science.gov (United States)

Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

2012-05-01

We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
PATH ANALYSIS WITH LOGISTIC REGRESSION MODELS : EFFECT ANALYSIS OF FULLY RECURSIVE CAUSAL SYSTEMS OF CATEGORICAL VARIABLES

OpenAIRE

Nobuoki, Eshima; Minoru, Tabata; Geng, Zhi; Department of Medical Information Analysis, Faculty of Medicine, Oita Medical University; Department of Applied Mathematics, Faculty of Engineering, Kobe University; Department of Probability and Statistics, Peking University

2001-01-01

This paper discusses path analysis of categorical variables with logistic regression models. The total, direct and indirect effects in fully recursive causal systems are considered by using model parameters. These effects can be explained in terms of log odds ratios, uncertainty differences, and an inner product of explanatory variables and a response variable. A study on food choice of alligators as a numerical example is reanalysed to illustrate the present approach.
Partitioning the variability of fasting plasma glucose levels in pedigrees. Genetic and environmental factors.

Science.gov (United States)

Boehnke, M; Moll, P P; Kottke, B A; Weidman, W H

1987-04-01

Fasting plasma glucose measurements made in 1972-1977 on normoglycemic individuals in three-generation Caucasian pedigrees from Rochester, Minnesota were analyzed. The authors determined the contributions of polygenic loci and environmental factors to fasting plasma glucose variability in these pedigrees. To that end, fasting plasma glucose measurements were normalized by an inverse normal scores transformation and then regressed separately for males and females on measured concomitants including age, body mass index (weight/height²), season of measurement, sex hormone use, and diuretic use. The authors found that 27.7% of the variability in normalized fasting plasma glucose in these pedigrees is explained by these measured concomitants. Subsequent variance components analysis suggested that unmeasured polygenic loci and unmeasured shared environmental factors together account for at least an additional 36.7% of the variability in normalized fasting plasma glucose, with genes alone accounting for at least 27.3%. These results are consistent with the known familiality of diabetes, for which fasting plasma glucose level is an important predictor. Further, these familial factors provide an explanation for at least half the variability in normalized fasting plasma glucose which remains after regression on known concomitants.
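An inverse normal scores transformation replaces each measurement by the standard normal quantile of its rank. A minimal sketch follows; the plotting position (rank - 0.5)/n is an assumption, since the abstract does not say which convention the authors used, and the sample values are illustrative.

```python
from statistics import NormalDist

def inverse_normal_scores(values):
    """Map each value to the standard normal quantile of its rank.

    Assumed convention: probability (rank - 0.5) / n.
    Ties are broken by input order in this simplified version.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        scores[index] = standard_normal.inv_cdf((rank - 0.5) / n)
    return scores

# Illustrative values only.
print(inverse_normal_scores([5.1, 4.2, 6.3, 4.9]))
```

The transformed values are symmetric around zero regardless of the skewness of the input, which is the reason for normalizing before the regression step.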
Reconstruction of Local Sea Levels at South West Pacific Islands—A Multiple Linear Regression Approach (1988-2014)

Science.gov (United States)

Kumar, V.; Melet, A.; Meyssignac, B.; Ganachaud, A.; Kessler, W. S.; Singh, A.; Aucan, J.

2018-02-01

Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years has been up to 3 times the global average. In this study, we aim at reconstructing sea levels at selected sites in the region (Suva, Lautoka—Fiji, and Nouméa—New Caledonia) as a multilinear regression (MLR) of atmospheric and oceanic variables. We focus on sea level variability at interannual-to-interdecadal time scales, and trend over the 1988-2014 period. Local sea levels are first expressed as a sum of steric and mass changes. Then a dynamical approach is used based on wind stress curl as a proxy for the thermosteric component, as wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. Statistically significant predictors among wind stress curl, halosteric sea level, zonal/meridional wind stress components, and sea surface temperature are used to construct a MLR model simulating local sea levels. Although we are focusing on the local scale, the global mean sea level needs to be adjusted for. Our reconstructions provide insights on key drivers of sea level variability at the selected sites, showing that while local dynamics and the global signal modulate sea level to a given extent, most of the variance is driven by regional factors. On average, the MLR model is able to reproduce 82% of the variance in island sea level, and could be used to derive local sea level projections via downscaling of climate models.
Quantile Regression Methods

DEFF Research Database (Denmark)

Fitzenberger, Bernd; Wilke, Ralf Andreas

2015-01-01

Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based...
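The principle behind quantile regression can be illustrated with its loss function: the asymmetric "check" (pinball) loss, whose minimizer over a constant is a sample quantile. The sketch below covers only this intercept-only case, and the function names and data are illustrative.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(y, tau):
    """Return the observed value that minimizes the total check loss.

    Searching over the observed values is sufficient because the total
    loss is piecewise linear with kinks only at the data points.
    """
    return min(sorted(y), key=lambda q: sum(pinball_loss(v - q, tau) for v in y))

data = [1, 2, 3, 4, 100]
print(best_constant(data, 0.5))   # the sample median
print(best_constant(data, 0.25))  # a lower quantile
```

Unlike the squared-error loss that underlies mean regression, this loss is insensitive to the size of the outlier at 100, which is one reason quantile estimates can reveal relationships that the conditional mean obscures.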

Modeling Source Water TOC Using Hydroclimate Variables and Local Polynomial Regression.

Science.gov (United States)

Samson, Carleigh C; Rajagopalan, Balaji; Summers, R Scott

2016-04-19

To control disinfection byproduct (DBP) formation in drinking water, an understanding of the source water total organic carbon (TOC) concentration variability can be critical. Previously, TOC concentrations in water treatment plant source waters have been modeled using streamflow data. However, the lack of streamflow data or unimpaired flow scenarios makes it difficult to model TOC. In addition, TOC variability under climate change further exacerbates the problem. Here we propose a modeling approach based on local polynomial regression that uses climate, e.g., temperature, and land surface, e.g., soil moisture, variables as predictors of TOC concentration, obviating the need for streamflow. The local polynomial approach has the ability to capture non-Gaussian and nonlinear features that might be present in the relationships. The utility of the methodology is demonstrated using source water quality and climate data in three case study locations with surface source waters including river and reservoir sources. The models show good predictive skill in general at these locations, with lower skills at locations with the most anthropogenic influences in their streams. Source water TOC predictive models can provide water treatment utilities important information for making treatment decisions for DBP regulation compliance under future climate scenarios.
Multiple linear regression analysis

Science.gov (United States)

Edwards, T. R.

1980-01-01

Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
A Matlab program for stepwise regression

Directory of Open Access Journals (Sweden)

Yanhong Qi

2016-03-01

Stepwise linear regression is a multi-variable regression method for identifying statistically significant variables in the linear regression equation. In the present study, we present a Matlab program for stepwise regression.
Logic regression and its extensions.

Science.gov (United States)

Schwender, Holger; Ruczinski, Ingo

2010-01-01

Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
Forecasting monthly groundwater level fluctuations in coastal aquifers using hybrid Wavelet packet–Support vector regression

Directory of Open Access Journals (Sweden)

N. Sujay Raghavendra

2015-12-01

This research demonstrates the state-of-the-art capability of Wavelet packet analysis in improving the forecasting efficiency of Support vector regression (SVR) through the development of a novel hybrid Wavelet packet–Support vector regression (WP–SVR) model for forecasting monthly groundwater level fluctuations observed in three shallow unconfined coastal aquifers. The Sequential Minimal Optimization Algorithm-based SVR model is also employed for comparative study with WP–SVR model. The input variables used for modeling were monthly time series of total rainfall, average temperature, mean tide level, and past groundwater level observations recorded during the period 1996–2006 at three observation wells located near Mangalore, India. The Radial Basis function is employed as a kernel function during SVR modeling. Model parameters are calibrated using the first seven years of data, and the remaining three years data are used for model validation using various input combinations. The performance of both the SVR and WP–SVR models is assessed using different statistical indices. From the comparative result analysis of the developed models, it can be seen that WP–SVR model outperforms the classic SVR model in predicting groundwater levels at all the three well locations (e.g. NRMSE(WP–SVR) = 7.14, NRMSE(SVR) = 12.27; NSE(WP–SVR) = 0.91, NSE(SVR) = 0.8 during the test phase with respect to well location at Surathkal). Therefore, using the WP–SVR model is highly acceptable for modeling and forecasting of groundwater level fluctuations.
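The two indices quoted in this abstract can be sketched as follows. The Nash-Sutcliffe efficiency (NSE) uses its standard definition; the normalization of the RMSE by the observed range, expressed in percent, is an assumption, because the abstract does not state which normalization the authors applied. The sample series are illustrative.

```python
import math

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / spread

def nrmse_percent(observed, simulated):
    """RMSE divided by the observed range, in percent (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
    return 100.0 * rmse / (max(observed) - min(observed))

# Illustrative groundwater levels only.
obs = [3.2, 3.8, 4.1, 3.5, 2.9]
sim = [3.0, 3.9, 4.0, 3.7, 3.1]
print(nse(obs, sim), nrmse_percent(obs, sim))
```

A lower NRMSE and an NSE closer to 1 both indicate a better forecast, which is the direction of the WP–SVR versus SVR comparison reported above.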
Understanding logistic regression analysis.

Science.gov (United States)

Sperandei, Sandro

2014-01-01

Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
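The link between a logistic regression coefficient and an odds ratio is easiest to see with a single binary explanatory variable, where the maximum likelihood fit has a closed form. The sketch below covers only that special case; the counts and function names are illustrative and not taken from the article.

```python
import math

def odds_ratio(exposed_events, exposed_nonevents, unexposed_events, unexposed_nonevents):
    """Odds ratio from a 2x2 table of counts."""
    return (exposed_events / exposed_nonevents) / (unexposed_events / unexposed_nonevents)

def logistic_fit_binary(exposed_events, exposed_nonevents, unexposed_events, unexposed_nonevents):
    """Closed-form logistic fit for one binary predictor.

    The intercept is the log odds in the unexposed group and the
    slope is the log odds ratio, so exp(slope) is the odds ratio.
    """
    intercept = math.log(unexposed_events / unexposed_nonevents)
    slope = math.log(odds_ratio(exposed_events, exposed_nonevents,
                                unexposed_events, unexposed_nonevents))
    return intercept, slope

# Illustrative counts only.
intercept, slope = logistic_fit_binary(30, 70, 10, 90)
print(math.exp(slope))
```

With several explanatory variables there is no closed form and the coefficients are found iteratively, but each exponentiated coefficient is still read as an odds ratio adjusted for the other variables.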
Predicting hyperketonemia by logistic and linear regression using test-day milk and performance variables in early-lactation Holstein and Jersey cows.

Science.gov (United States)

Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M

2018-03-01

Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). Diagnosis of HYK was at a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK of Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83.4
Improved Dietary Guidelines for Vitamin D: Application of Individual Participant Data (IPD)-Level Meta-Regression Analyses

Science.gov (United States)

Cashman, Kevin D.; Ritz, Christian; Kiely, Mairead

2017-01-01

Dietary Reference Values (DRVs) for vitamin D have a key role in the prevention of vitamin D deficiency. However, despite adopting similar risk assessment protocols, estimates from authoritative agencies over the last 6 years have been diverse. This may have arisen from diverse approaches to data analysis. Modelling strategies for pooling of individual subject data from cognate vitamin D randomized controlled trials (RCTs) are likely to provide the most appropriate DRV estimates. Thus, the objective of the present work was to undertake the first-ever individual participant data (IPD)-level meta-regression, which is increasingly recognized as best practice, from seven winter-based RCTs (with 882 participants ranging in age from 4 to 90 years) of the vitamin D intake–serum 25-hydroxyvitamin D (25(OH)D) dose-response. Our IPD-derived estimates of vitamin D intakes required to maintain 97.5% of 25(OH)D concentrations >25, 30, and 50 nmol/L across the population are 10, 13, and 26 µg/day, respectively. In contrast, standard meta-regression analyses with aggregate data (as used by several agencies in recent years) from the same RCTs estimated that a vitamin D intake requirement of 14 µg/day would maintain 97.5% of 25(OH)D >50 nmol/L. These first IPD-derived estimates offer improved dietary recommendations for vitamin D because the underpinning modeling captures the between-person variability in response of serum 25(OH)D to vitamin D intake. PMID:28481259
Improved Dietary Guidelines for Vitamin D: Application of Individual Participant Data (IPD)-Level Meta-Regression Analyses

Directory of Open Access Journals (Sweden)

Kevin D. Cashman

2017-05-01

Dietary Reference Values (DRVs) for vitamin D have a key role in the prevention of vitamin D deficiency. However, despite adopting similar risk assessment protocols, estimates from authoritative agencies over the last 6 years have been diverse. This may have arisen from diverse approaches to data analysis. Modelling strategies for pooling of individual subject data from cognate vitamin D randomized controlled trials (RCTs) are likely to provide the most appropriate DRV estimates. Thus, the objective of the present work was to undertake the first-ever individual participant data (IPD)-level meta-regression, which is increasingly recognized as best practice, from seven winter-based RCTs (with 882 participants ranging in age from 4 to 90 years) of the vitamin D intake–serum 25-hydroxyvitamin D (25(OH)D) dose-response. Our IPD-derived estimates of vitamin D intakes required to maintain 97.5% of 25(OH)D concentrations >25, 30, and 50 nmol/L across the population are 10, 13, and 26 µg/day, respectively. In contrast, standard meta-regression analyses with aggregate data (as used by several agencies in recent years) from the same RCTs estimated that a vitamin D intake requirement of 14 µg/day would maintain 97.5% of 25(OH)D >50 nmol/L. These first IPD-derived estimates offer improved dietary recommendations for vitamin D because the underpinning modeling captures the between-person variability in response of serum 25(OH)D to vitamin D intake.
Dynamic and Regression Modeling of Ocean Variability in the Tide-Gauge Record at Seasonal and Longer Periods

Science.gov (United States)

Hill, Emma M.; Ponte, Rui M.; Davis, James L.

2007-01-01

Comparison of monthly mean tide-gauge time series to corresponding model time series based on a static inverted barometer (IB) for pressure-driven fluctuations and an ocean general circulation model (OM) reveals that the combined model successfully reproduces seasonal and interannual changes in relative sea level at many stations. Removal of the OM and IB from the tide-gauge record produces residual time series with a mean global variance reduction of 53%. The OM is mis-scaled for certain regions, and 68% of the residual time series contain a significant seasonal variability after removal of the OM and IB from the tide-gauge data. Including OM admittance parameters and seasonal coefficients in a regression model for each station, with IB also removed, produces residual time series with mean global variance reduction of 71%. Examination of the regional improvement in variance caused by scaling the OM, including seasonal terms, or both, indicates weakness in the model at predicting sea-level variation for constricted ocean regions. The model is particularly effective at reproducing sea-level variation for stations in North America, Europe, and Japan. The RMS residual for many stations in these areas is 25-35 mm. The production of "cleaner" tide-gauge time series, with oceanographic variability removed, is important for future analysis of nonsecular and regionally differing sea-level variations. Understanding the ocean model's strengths and weaknesses will allow for future improvements of the model.
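The variance reduction figures quoted in this abstract can be read as the share of the observed variance that disappears once the model series is subtracted. A minimal sketch of that calculation follows, assuming the usual definition (the abstract does not spell it out); the series are illustrative.

```python
from statistics import pvariance

def variance_reduction_percent(observed, modeled):
    """Percentage of the observed variance removed by subtracting the model."""
    residual = [o - m for o, m in zip(observed, modeled)]
    return 100.0 * (1.0 - pvariance(residual) / pvariance(observed))

# Illustrative monthly sea-level anomalies (mm) only.
tide_gauge = [12.0, -5.0, 30.0, 8.0, -20.0, 15.0]
model = [10.0, -2.0, 24.0, 5.0, -15.0, 11.0]
print(variance_reduction_percent(tide_gauge, model))
```

The value can be negative when subtracting the model adds variance, which is one way a mis-scaled model shows up at individual stations.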
Solving the Omitted Variables Problem of Regression Analysis Using the Relative Vertical Position of Observations

Directory of Open Access Journals (Sweden)

Jonathan E. Leightner

2012-01-01

The omitted variables problem is one of regression analysis’ most serious problems. The standard approach to the omitted variables problem is to find instruments, or proxies, for the omitted variables, but this approach makes strong assumptions that are rarely met in practice. This paper introduces best projection reiterative truncated projected least squares (BP-RTPLS), the third generation of a technique that solves the omitted variables problem without using proxies or instruments. This paper presents a theoretical argument that BP-RTPLS produces unbiased reduced form estimates when there are omitted variables. This paper also provides simulation evidence that shows OLS produces between 250% and 2450% more errors than BP-RTPLS when there are omitted variables and when measurement and round-off error is 1 percent or less. In an example, the government spending multiplier is estimated using annual data for the USA between 1929 and 2010.
Linear regression in astronomy. II

Science.gov (United States)

Feigelson, Eric D.; Babu, Gutti J.

1992-01-01

A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Mechanisms of long-term mean sea level variability in the North Sea

Science.gov (United States)

Dangendorf, Sönke; Calafat, Francisco; Øie Nilsen, Jan Even; Richter, Kristin; Jensen, Jürgen

2015-04-01

We examine mean sea level (MSL) variations in the North Sea on timescales ranging from months to decades under the consideration of different forcing factors since the late 19th century. We use multiple linear regression models, which are validated for the second half of the 20th century against the output of a state-of-the-art tide+surge model (HAMSOM), to determine the barotropic response of the ocean to fluctuations in atmospheric forcing. We demonstrate that local atmospheric forcing mainly triggers MSL variability on timescales up to a few years, with the inverted barometric effect dominating the variability along the UK and Norwegian coastlines and wind (piling up the water along the coast) controlling the MSL variability in the south from Belgium up to Denmark. However, in addition to the large inter-annual sea level variability there is also a considerable fraction of decadal scale variability. We show that on decadal timescales MSL variability in the North Sea mainly reflects steric changes, which are mostly remotely forced. A spatial correlation analysis of altimetry observations and baroclinic ocean model outputs suggests evidence for a coherent signal extending from the Norwegian shelf down to the Canary Islands. This supports the theory of longshore wind forcing along the eastern boundary of the North Atlantic causing coastally trapped waves to propagate along the continental slope. With a combination of oceanographic and meteorological measurements we demonstrate that ~80% of the decadal sea level variability in the North Sea can be explained as response of the ocean to longshore wind forcing, including boundary wave propagation in the Northeast Atlantic. These findings have important implications for (i) detecting significant accelerations in North Sea MSL, (ii) the conceptual set up of regional ocean models in terms of resolution and boundary conditions, and (iii) the development of adequate and realistic regional climate change projections.
Régression orthogonale de trois variables liées (Orthogonal Regression of Linked Variables)

Directory of Open Access Journals (Sweden)

Phelizon J. -F.

2006-11-01

This article proposes an algorithm for determining the parameters of the equation for the orthogonal regression of three variables linked by a linear relation. The algorithm is remarkably simple in that it does not require the eigenvalues of the covariance matrix to be calculated. In addition, the equation obtained (that of a straight line in three-dimensional space) is shown to also characterize a straight line in a triangular diagram, which makes the results immediately interpretable. The theoretical explanation is followed by two examples that were actually run on a computer.
Assessing risk factors for periodontitis using regression

Science.gov (United States)

Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

2013-10-01

Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we can assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL). The regression coefficients are obtained together with the respective p-values from the significance tests. In the logistic model, a case (individual) was classified by the extent of the destruction of periodontal tissues, defined as an Attachment Loss greater than or equal to 4 mm in at least 25% of the sites surveyed (AL≥4mm/≥25%). The association measures include the Odds Ratios together with the corresponding 95% confidence intervals.
Multiple Regression and Mediator Variables can be used to Avoid Double Counting when Economic Values are Derived using Stochastic Herd Simulation

DEFF Research Database (Denmark)

Østergaard, Søren; Ettema, Jehan Frans; Hjortø, Line

Multiple regression and model building with mediator variables were addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent variable, while using the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases as mediator variables. The economic value of metritis was estimated to be €78 per 100 cow-years for each 1% increase of metritis in the period of 1-100 days in milk ... in multiparous cows. The merit of using this approach was demonstrated since the economic value of metritis was estimated to be 81% higher when no mediator variables were included in the multiple regression analysis.
Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

Science.gov (United States)

Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

2013-06-01

This study models columnar ozone over Peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period 2003-2008, was used to develop models that predict the value of columnar ozone (O3) in the study area. A combined method, using multiple regression together with principal component analysis (PCA), was applied to improve the prediction accuracy of columnar ozone. Separate analyses were carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM seasons. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model. It was found that an increase in columnar O3 is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. Fitting the best models for columnar O3 using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both seasons was SSKT.
An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis

Directory of Open Access Journals (Sweden)

Wen-Tsao Pan

2016-01-01

Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiment result showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.
Susceptibility assessment of earthquake-triggered landslides in El Salvador using logistic regression

Science.gov (United States)

García-Rodríguez, M. J.; Malpica, J. A.; Benito, B.; Díaz, M.

2008-03-01

This work has evaluated the probability of earthquake-triggered landslide occurrence in the whole of El Salvador, with a Geographic Information System (GIS) and a logistic regression model. Slope gradient, elevation, aspect, mean annual precipitation, lithology, land use, and terrain roughness are the predictor variables used to determine the dependent variable of occurrence or non-occurrence of landslides within an individual grid cell. The results illustrate the importance of terrain roughness and soil type as key factors within the model — using only these two variables the analysis returned a significance level of 89.4%. The results obtained from the model within the GIS were then used to produce a map of relative landslide susceptibility.
Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

Directory of Open Access Journals (Sweden)

Hardt Jochen

2012-12-01

Background: Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods: A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r = .10) vs. moderate (r = .50) correlations with the X’s and Y. Results: The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion: More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.

Boosted beta regression.

Directory of Open Access Journals (Sweden)

Matthias Schmid

Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.

Science.gov (United States)

van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B

2016-11-24

Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
CD4 cell levels during treatment for tuberculosis (TB) in Ethiopian adults and clinical markers associated with CD4 lymphocytopenia.

Directory of Open Access Journals (Sweden)

Sten Skogmar

BACKGROUND: The clinical correlations and significance of subnormal CD4 levels in HIV-negative patients with TB are unclear. We have determined CD4 cell levels longitudinally during anti-tuberculosis treatment (ATT) in patients, with and without HIV co-infection, and their associations with clinical variables. METHOD: Adults diagnosed with TB (maximum duration of ATT of 2 weeks, and with no history of antiretroviral therapy (ART) in HIV-positive subjects) were included consecutively in eight out-patient clinics in Ethiopia. Healthy individuals were recruited for comparison at one of the study health centers. Data on patient characteristics and physical findings were collected by trained nurses following a structured questionnaire at inclusion and on follow-up visits at 2 and 6 months. In parallel, peripheral blood CD4 cell levels were determined. The evolution of CD4 cell levels during ATT was assessed, and the association between clinical characteristics and low CD4 cell levels at baseline was investigated using regression analysis. RESULTS: In total, 1116 TB patients were included (307 HIV-infected). Among 809 HIV-negative patients, 200 (25%) had subnormal CD4 cell counts (<500 cells/mm³), with <350 cells/mm³ in 82 (10%) individuals. CD4 cell levels increased significantly during the course of ATT in both HIV+ and HIV- TB-patients, but did not reach the levels in healthy subjects (median 896 cells/mm³). Sputum smear status, signs of wasting (low mid-upper arm circumference (MUAC)), and bedridden state were significantly associated with low CD4 cell counts. CONCLUSION: A high proportion of Ethiopian TB patients have subnormal CD4 cell counts before starting treatment. Low CD4 cell levels are associated with smear positive disease and signs of wasting. The continuous increase of CD4 cell counts during the course of ATT suggests a reversible impact of active TB on CD4 cell homeostasis, which may be considered in interpretation of CD4 cell counts in HIV
The Analysis of Nonstationary Time Series Using Regression, Correlation and Cointegration with an Application to Annual Mean Temperature and Sea Level

DEFF Research Database (Denmark)

Johansen, Søren

There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model based inference. Finally we analyse some data on annual mean temperature and sea level, by applying the cointegrated vector autoregressive model, which explicitly takes into account the nonstationarity of the variables.
Effect of Genetic Variability in the CYP4F2, CYP4F11, and CYP4F12 Genes on Liver mRNA Levels and Warfarin Response

Directory of Open Access Journals (Sweden)

J. E. Zhang

2017-05-01

Genetic polymorphisms in the gene encoding cytochrome P450 (CYP) 4F2, a vitamin K oxidase, affect stable warfarin dose requirements and time to therapeutic INR. CYP4F2 is part of the CYP4F gene cluster, which is highly polymorphic and exhibits a high degree of linkage disequilibrium, making it difficult to define causal variants. Our objective was to examine the effect of genetic variability in the CYP4F gene cluster on expression of the individual CYP4F genes and warfarin response. mRNA levels of the CYP4F gene cluster were quantified in human liver samples (n = 149) obtained from a well-characterized liver bank, and fine mapping of the CYP4F gene cluster encompassing CYP4F2, CYP4F11, and CYP4F12 was performed. Genome-wide association study (GWAS) data from a prospective cohort of warfarin-treated patients (n = 711) were also analyzed for genetic variations across the CYP4F gene cluster. In addition, SNP-gene expression in human liver tissues and interactions between CYP4F genes were explored in silico using publicly available data repositories. We found that SNPs in CYP4F2, CYP4F11, and CYP4F12 were associated with mRNA expression in the CYP4F gene cluster. In particular, CYP4F2 rs2108622 was associated with increased CYP4F2 expression while CYP4F11 rs1060467 was associated with decreased CYP4F2 expression. Interestingly, these CYP4F2 and CYP4F11 SNPs showed similar effects with warfarin stable dose, where CYP4F11 rs1060467 was associated with a reduction in daily warfarin dose requirement (∼1 mg/day, Pc = 0.017), an effect opposite to that previously reported with CYP4F2 (rs2108622). However, inclusion of either or both of these SNPs in a pharmacogenetic algorithm consisting of age, body mass index (BMI), gender, baseline clotting factor II level, CYP2C9∗2 rs1799853, CYP2C9∗3 rs1057910, and VKORC1 rs9923231 improved warfarin dose variability only by 0.5–0.7% with an improvement in dose prediction accuracy of ∼1–2%. Although there is complex
Sea level rise and variability around Peninsular Malaysia

Science.gov (United States)

Tkalich, Pavel; Luu, Quang-Hung; Tay, Tze-Wei

2014-05-01

Peninsular Malaysia is bounded on the west by the Malacca Strait and the Andaman Sea, both connected to the Indian Ocean, and on the east by the South China Sea, the largest marginal sea in the Pacific Basin. As a result, sea level along the Peninsular Malaysia coast is assumed to be governed by various regional phenomena associated with the adjacent parts of the Indian and Pacific Oceans. At the annual scale, sea level anomalies (SLAs) are generated by the Asian monsoon; interannual sea level variability is determined by the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD); and the long-term sea level trend is governed by global climate change. To quantify the relative impacts of these multi-scale phenomena on sea level trend and variability around Peninsular Malaysia, long-term tide gauge records and satellite altimetry are used. During 1984-2011, relative sea level rise (SLR) rates in the waters of the Malacca Strait and eastern Peninsular Malaysia are found to be 2.4 ± 0.8 mm/yr and 2.7 ± 0.6 mm/yr, respectively. After discounting their vertical land movements (0.8 ± 2.6 mm/yr and 0.9 ± 2.2 mm/yr, respectively), the pure SLR rates are 1.6 ± 3.4 mm/yr and 1.8 ± 2.8 mm/yr, respectively, which are lower than the global tendency. At the interannual scale, ENSO affects sea level over the Malaysian east coast in the range of ± 5 cm with a very high correlation coefficient. Meanwhile, IOD modulates sea level anomalies in the Malacca Strait in the range of ± 2 cm with a high correlation coefficient. Interannual regional sea level drops are associated with El Niño events and positive phases of the IOD index, while the rises are correlated with La Niña episodes and the negative periods of the IOD index. Seasonally, SLAs are mainly monsoon-driven, on the order of 10-25 cm. Geographically, sea level responds differently to the monsoon: two cycles per year are observed in the Malacca Strait, presumably due to South Asian - Indian Monsoon; while single
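
The "pure" SLR figures quoted above can be reproduced from the relative rates and the vertical land movements with a short calculation. The sketch below is only a check of that arithmetic: the function name is hypothetical, and the linear addition of the uncertainties is an inference from the quoted numbers (0.8 + 2.6 = 3.4 and 0.6 + 2.2 = 2.8), not a method stated in the abstract.

```python
def pure_slr_rate(relative, land_movement):
    """Subtract vertical land movement from a relative sea level rise rate.

    Each argument is a (value, uncertainty) pair in mm/yr. The uncertainties
    are added linearly because that reproduces the figures in the abstract;
    this is an assumption inferred from the numbers, not a documented method.
    """
    value = relative[0] - land_movement[0]
    uncertainty = relative[1] + land_movement[1]
    return round(value, 1), round(uncertainty, 1)


# Values taken from the abstract (mm/yr).
malacca_strait = pure_slr_rate((2.4, 0.8), (0.8, 2.6))
east_coast = pure_slr_rate((2.7, 0.6), (0.9, 2.2))
```

Note that in both cases the combined uncertainty exceeds the rate itself, so the pure SLR rates are not distinguishable from zero on these figures alone.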
Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis

Science.gov (United States)

Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried

2018-03-01

This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used cover a period of 36 years between 1980 and 2015. Similar to the observed decrease ( P 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth region-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations, as well as information that may help optimize planting dates for improved radiation use efficiency in the study area.
To resuscitate or not to resuscitate: a logistic regression analysis of physician-related variables influencing the decision.

Science.gov (United States)

Einav, Sharon; Alon, Gady; Kaufman, Nechama; Braunstein, Rony; Carmel, Sara; Varon, Joseph; Hersch, Moshe

2012-09-01

To determine whether variables in physicians' backgrounds influenced their decision to forego resuscitating a patient they did not previously know. Questionnaire survey of a convenience sample of 204 physicians working in the departments of internal medicine, anaesthesiology and cardiology in 11 hospitals in Israel. Twenty per cent of the participants had elected to forego resuscitating a patient they did not previously know without additional consultation. Physicians who had more frequently elected to forego resuscitation had practised medicine for more than 5 years (p=0.013), estimated the number of resuscitations they had performed as being higher (p=0.009), and perceived their experience in resuscitation as sufficient (p=0.001). The variable that predicted the outcome of always performing resuscitation in the logistic regression model was less than 5 years of experience in medicine (OR 0.227, 95% CI 0.065 to 0.793; p=0.02). Physicians' level of experience may affect the probability of a patient's receiving resuscitation, whereas the physicians' personal beliefs and values did not seem to affect this outcome.
Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

Science.gov (United States)

Granato, Gregory E.

2006-01-01

The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
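
The two rules stated above (slope as the median of all pairwise slopes, intercept chosen so the line passes through the medians of the data) can be sketched in a few lines. This is an illustrative single-line version only, not the KTRLine program itself: the function name and the data are hypothetical, and the multisegment model, error component, and bias correction factor are not covered.

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(xs, ys):
    """Return (slope, intercept) of the Kendall-Theil robust line."""
    # Slope: median of the slopes between every pair of points.
    # Pairs with identical x are skipped because their slope is undefined.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    # Intercept: chosen so the line passes through (median x, median y).
    intercept = median(ys) - slope * median(xs)
    return slope, intercept


# Hypothetical data: four points on y = 2x + 1 plus one gross outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 40.0]
slope, intercept = kendall_theil_line(xs, ys)
```

In this example six of the ten pairwise slopes equal 2, so the outlier does not move the median slope, which illustrates the resistance to outliers described in the abstract.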
Simple and multiple linear regression: sample size considerations.

Science.gov (United States)

Hanley, James A

2016-11-01

The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
Two levels ARIMAX and regression models for forecasting time series data with calendar variation effects

Science.gov (United States)

Suhartono; Lee, Muhammad Hisyam; Prastyo, Dedy Dwi

2015-12-01

The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two-level ARIMAX and regression methods. The two-level ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as a case study. In general, the two-level calendar variation model yields two models: the first reconstructs the sales pattern that has already occurred, and the second forecasts the increase in sales due to Eid ul-Fitr, which affects sales in the same month and in the previous months. The results show that the proposed two-level calendar variation model based on ARIMAX and regression methods yields better forecasts than the seasonal ARIMA model and Neural Networks.
Risk assessment of groundwater level variability using variable Kriging methods

Science.gov (United States)

Spanoudaki, Katerina; Kampanis, Nikolaos A.

2015-04-01

Assessment of the spatial variability of the water table level in aquifers provides useful information regarding optimal groundwater management. This information becomes more important in basins where the water table level has fallen significantly. The spatial variability of the water table level in this work is estimated based on hydraulic head measured during the wet period of the hydrological year 2007-2008, in a sparsely monitored basin in Crete, Greece, which is of high socioeconomic and agricultural interest. Three Kriging-based methodologies are implemented in the Matlab environment to estimate the spatial variability of the water table level in the basin. The first methodology is based on the Ordinary Kriging approach; the second incorporates auxiliary information from a Digital Elevation Model by means of Residual Kriging; and the third calculates, by means of Indicator Kriging, the probability that the groundwater level falls below a predefined minimum value that could cause significant problems in groundwater resources availability. The Box-Cox methodology is applied to normalize both the data and the residuals for improved prediction results. In addition, various classical variogram models are applied to determine the spatial dependence of the measurements. The Matérn model proves to be optimal and, in combination with the Kriging methodologies, provides the most accurate cross-validation estimates. Groundwater level and probability maps are constructed to examine the spatial variability of the groundwater level in the basin and the associated risk that certain locations exhibit regarding a predefined minimum value that has been set for the sustainability of the basin's groundwater resources. Acknowledgement The work presented in this paper has been funded by the Greek State Scholarships Foundation (IKY), Fellowships of Excellence for Postdoctoral Studies (Siemens Program), 'A simulation-optimization model for assessing the best practices for the
REGSTEP - stepwise multivariate polynomial regression with singular extensions

International Nuclear Information System (INIS)

Davierwalla, D.M.

1977-09-01

The program REGSTEP determines a polynomial approximation, in the least squares sense, to tabulated data. The polynomial may be univariate or multivariate. The computational method is that of stepwise regression. A variable is inserted into the regression basis if it is significant with respect to an appropriate F-test at a preselected risk level. In addition, should a variable already in the basis become nonsignificant (again with respect to an appropriate F-test) after the entry of a new variable, it is expelled from the model. Thus only significant variables are retained in the model. Although REGSTEP was written expressly to be incorporated into CORCOD, a code for predicting nuclear cross sections for given values of power, temperature, void fractions, boron content, etc., there is nothing to limit its use to nuclear applications, as the examples demonstrate. A separate version has been incorporated into RSYST for the general user. (Auth.)
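The enter-and-expel loop described in this record can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the REGSTEP code: the function names (`solve`, `rss`, `stepwise`), the fixed thresholds `f_enter` and `f_remove` (standing in for F-quantiles at a preselected risk level), and the data are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def stepwise(candidates, y, f_enter=4.0, f_remove=3.9):
    """Stepwise selection; `candidates` maps a name to a data column.

    f_enter and f_remove are assumed fixed partial-F thresholds.
    """
    n = len(y)
    selected = []
    for _ in range(10 * len(candidates)):  # bounded loop as a safety net
        changed = False
        # Entry step: add the candidate with the largest partial F above f_enter.
        rss_cur = rss([candidates[k] for k in selected], y)
        best, best_f = None, f_enter
        for name in candidates:
            df = n - len(selected) - 2  # residual d.o.f. after adding one term
            if name in selected or df <= 0:
                continue
            rss_new = rss([candidates[k] for k in selected + [name]], y)
            f = (rss_cur - rss_new) / max(rss_new / df, 1e-12)
            if f > best_f:
                best, best_f = name, f
        if best is not None:
            selected.append(best)
            changed = True
        # Removal step: expel the variable with the smallest partial F below f_remove.
        full = rss([candidates[k] for k in selected], y)
        df = n - len(selected) - 1
        worst, worst_f = None, f_remove
        for name in selected:
            rest = [candidates[k] for k in selected if k != name]
            f = (rss(rest, y) - full) / max(full / df, 1e-12)
            if f < worst_f:
                worst, worst_f = name, f
        if worst is not None:
            selected.remove(worst)
            changed = True
        if not changed:
            break
    return selected


# Hypothetical data: y depends on x1 only; x2 is an uninformative column.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.2, 0.15, -0.05, -0.1, 0.2, -0.15, 0.05]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, noise)]
selected = stepwise({"x1": x1, "x2": x2}, y)
print(selected)
```

Keeping `f_enter` slightly above `f_remove` is a common way to stop a variable from being expelled and re-admitted in alternation; a production implementation would derive both thresholds from the F distribution at the chosen risk level.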
Bias correction by use of errors-in-variables regression models in studies with K-X-ray fluorescence bone lead measurements.

Science.gov (United States)

Lamadrid-Figueroa, Héctor; Téllez-Rojo, Martha M; Angeles, Gustavo; Hernández-Ávila, Mauricio; Hu, Howard

2011-01-01

In-vivo measurement of bone lead by means of K-X-ray fluorescence (KXRF) is the preferred biological marker of chronic exposure to lead. Unfortunately, considerable measurement error associated with KXRF estimations can introduce bias in estimates of the effect of bone lead when this variable is included as the exposure in a regression model. Estimates of uncertainty reported by the KXRF instrument reflect the variance of the measurement error and, although they can be used to correct the measurement error bias, they are seldom used in epidemiological statistical analyses. Errors-in-variables regression (EIV) allows for correction of bias caused by measurement error in predictor variables, based on the knowledge of the reliability of such variables. The authors propose a way to obtain reliability coefficients for bone lead measurements from uncertainty data reported by the KXRF instrument and compare, by the use of Monte Carlo simulations, results obtained using EIV regression models vs. those obtained by the standard procedures. Results of the simulations show that Ordinary Least Squares (OLS) regression models provide severely biased estimates of effect, and that EIV provides nearly unbiased estimates. Although EIV effect estimates are more imprecise, their mean squared error is much smaller than that of OLS estimates. In conclusion, EIV is a better alternative than OLS to estimate the effect of bone lead when measured by KXRF. Copyright © 2010 Elsevier Inc. All rights reserved.
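For a single error-prone predictor, the textbook form of this correction divides the OLS slope by the reliability of the predictor. The Python sketch below illustrates that idea only; it is not the authors' procedure. The function names, the reliability formula (one minus mean error variance over observed variance) and the numbers are assumptions made for illustration.

```python
import statistics


def reliability(observed, error_sds):
    """Assumed estimate: 1 - (mean measurement-error variance / observed variance)."""
    error_var = sum(s ** 2 for s in error_sds) / len(error_sds)
    return 1.0 - error_var / statistics.variance(observed)


def ols_slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def eiv_slope(x, y, rel):
    """Disattenuated slope: the OLS slope divided by the reliability."""
    return ols_slope(x, y) / rel


# Hypothetical measurements with per-observation uncertainty (SD) estimates.
bone_lead = [1.0, 2.0, 3.0, 4.0, 5.0]
uncertainty = [1.0, 1.0, 1.0, 1.0, 1.0]
outcome = [2.0, 4.0, 6.0, 8.0, 10.0]

rel = reliability(bone_lead, uncertainty)
print(rel, ols_slope(bone_lead, outcome), eiv_slope(bone_lead, outcome, rel))
```

Because the reliability lies between 0 and 1, the corrected slope is always larger in magnitude than the OLS slope, which matches the direction of the attenuation bias described in the abstract.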
CYP1A2 and NAT2 phenotyping and 3-aminobiphenyl and 4-aminobiphenyl hemoglobin adduct levels in smokers and non-smokers

International Nuclear Information System (INIS)

Sarkar, Mohamadi; Stabbert, Regina; Kinser, Robin D.; Oey, Jan; Rustemeier, Klaus; Holt, Klaus von; Schepers, Georg; Walk, Roger A.; Roethig, Hans J.

2006-01-01

Some aromatic amines are considered to be putative bladder carcinogens. Hemoglobin (Hb) adducts of 3-aminobiphenyl (3-ABP) and 4-aminobiphenyl (4-ABP) have been used as biomarkers of exposure to aromatic amines from cigarette smoke. One of the goals of this study was to determine intra- and inter-individual variability in 3-ABP and 4-ABP Hb adducts and to explore the predictability of ABP Hb adduct levels based on caffeine phenotyping. The study was conducted in adult smokers (S, n = 65) and non-smokers (NS, n = 65). The subjects were phenotyped for CYP1A2 and NAT2 using urinary caffeine metabolites. Blood samples were collected twice within 6 weeks and adducts measured by GC/MS. The levels of 4-ABP Hb adducts were significantly (p < 0.0001) greater in S (34.5 ± 21.06 pg/g Hb) compared to NS (6.3 ± 3.02 pg/g Hb). The levels of 3-ABP Hb adducts were below the limit of quantification (BLOQ) in most (82%) of the NS and about 10-fold lower in S (3.6 ± 3.29 pg/g Hb) compared to 4-ABP Hb adducts. No differences were observed in the adduct levels between weeks 1 and 6 in the smokers, suggesting that a single sample would be adequate to monitor cigarette smoke exposure. The regression model developed with CYP1A2, NAT2 phenotype and number of cigarettes smoked (NCIG) accounted for 47% of the variability in 3-ABP adducts, whereas 32% of the variability in 4-ABP adducts was accounted for by CYP1A2 and NCIG. The ratio of 4-ABP Hb adducts in adult S:NS was ∼ 5:1, whereas 3-ABP Hb adduct levels were BLOQ in some S and exhibited large interindividual variability (∼ 91% compared to 57% for 4-ABP Hb) and a poor dose-response relationship. Therefore, 4-ABP Hb adduct levels may be a more useful biomarker of aminobiphenyl exposure from cigarette smoke.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

Directory of Open Access Journals (Sweden)

Maarten van Smeden

2016-11-01

Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
[From clinical judgment to linear regression model].

Science.gov (United States)

Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

2013-01-01

When we think about mathematical models, such as the linear regression model, we tend to assume that these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line: Y = a + bX, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when the variable "X" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
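The quantities named in this abstract (intercept "a", slope "b" and the coefficient of determination) can be computed directly from their least-squares definitions. The short Python sketch below is illustrative only; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of Y = a + b*X; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope: change in Y per unit change in X
    a = mean_y - b * mean_x       # intercept: value of Y when X equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical data lying exactly on the line Y = 1 + 2X.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b, r2 = fit_line(x, y)
print(a, b, r2)
```

With real clinical data the points scatter around the line, so the coefficient of determination falls below 1 and reports the share of the variance in Y that the fitted line accounts for.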
Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis.

Science.gov (United States)

Oguntunde, Philip G; Lischeid, Gunnar; Dietrich, Ottfried

2018-03-01

This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used were for a period of 36 years between 1980 and 2015. Similar to the observed decrease (P 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth, region-specific climate-rice linkage for screening of better cultivars that can respond positively to future climate fluctuations, as well as information that may help optimize planting dates for improved radiation use efficiency in the study area.
Multilevel covariance regression with correlated random effects in the mean and variance structure.

Science.gov (United States)

Quintero, Adrian; Lesaffre, Emmanuel

2017-09-01

Multivariate regression methods generally assume a constant covariance matrix for the observations. In case a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches available in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The Effect of Latent Binary Variables on the Uncertainty of the Prediction of a Dichotomous Outcome Using Logistic Regression Based Propensity Score Matching.

Science.gov (United States)

Szekér, Szabolcs; Vathy-Fogarassy, Ágnes

2018-01-01

Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon draws the attention of analysts to an important point, which must be taken into account when drawing conclusions.

Variability and change of sea level and its components in the Indo-Pacific region during the altimetry era

Science.gov (United States)

Wu, Quran; Zhang, Xuebin; Church, John A.; Hu, Jianyu

2017-03-01

Previous studies have shown that regional sea level exhibits interannual and decadal variations associated with the modes of climate variability. A better understanding of those low-frequency sea level variations benefits the detection and attribution of climate change signals. Nonetheless, the contributions of thermosteric, halosteric, and mass sea level components to sea level variability and trend patterns remain unclear. By focusing on signals associated with dominant climate modes in the Indo-Pacific region, we estimate the interannual and decadal fingerprints and trend of each sea level component utilizing a multivariate linear regression of two adjoint-based ocean reanalyses. Sea level interannual, decadal, and trend patterns primarily come from thermosteric sea level (TSSL). Halosteric sea level (HSSL) is of regional importance in the Pacific Ocean on decadal time scale and dominates sea level trends in the northeast subtropical Pacific. The compensation between TSSL and HSSL is identified in their decadal variability and trends. The interannual and decadal variability of temperature generally peak at subsurface around 100 m but that of salinity tend to be surface-intensified. Decadal temperature and salinity signals extend deeper into the ocean in some regions than their interannual equivalents. Mass sea level (MassSL) is critical for the interannual and decadal variability of sea level over shelf seas. Inconsistencies exist in MassSL trend patterns among various estimates. This study highlights regions where multiple processes work together to control sea level variability and change. Further work is required to better understand the interaction of different processes in those regions.
Joint Bayesian variable and graph selection for regression models with network-structured predictors

Science.gov (United States)

Peterson, C. B.; Stingo, F. C.; Vannucci, M.

2015-01-01

In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications since it allows the identification of pathways of functionally related genes or proteins which impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings, and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival. PMID:26514925
Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

Science.gov (United States)

Ng, Kar Yong; Awang, Norhashidah

2018-01-06

Frequent haze occurrences in Malaysia have made the management of PM10 (particulate matter with aerodynamic diameter less than 10 μm) pollution a critical task. This requires knowledge of the factors associated with PM10 variation and good forecasts of PM10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were a multiple linear regression (MLR) model with lagged predictor variables (MLR1), an MLR model with lagged predictor variables and PM10 concentrations (MLR2), and a regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM10 variation in Peninsular Malaysia. Comparison among the three models showed that the MLR2 model was on the same level as the RTSE model in terms of forecasting accuracy, while the MLR1 model was the worst.
The analysis of nonstationary time series using regression, correlation and cointegration – with an application to annual mean temperature and sea level

DEFF Research Database (Denmark)

Johansen, Søren

There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model based inference. Finally we analyse some data on annual mean temperature...... and sea level, by applying the cointegrated vector autoregressive model, which explicitly takes into account the nonstationarity of the variables....
Sea-Level Trend Uncertainty With Pacific Climatic Variability and Temporally-Correlated Noise

Science.gov (United States)

Royston, Sam; Watson, Christopher S.; Legrésy, Benoît; King, Matt A.; Church, John A.; Bos, Machiel S.

2018-03-01

Recent studies have identified climatic drivers of the east-west see-saw of Pacific Ocean satellite altimetry era sea level trends and a number of sea-level trend and acceleration assessments attempt to account for this. We investigate the effect of Pacific climate variability, together with temporally-correlated noise, on linear trend error estimates and determine new time-of-emergence (ToE) estimates across the Indian and Pacific Oceans. Sea-level trend studies often advocate the use of auto-regressive (AR) noise models to adequately assess formal uncertainties, yet sea level often exhibits colored but non-AR(1) noise. Standard error estimates are over- or under-estimated by an AR(1) model for much of the Indo-Pacific sea level. Allowing for PDO and ENSO variability in the trend estimate only reduces standard errors across the tropics and we find noise characteristics are largely unaffected. Of importance for trend and acceleration detection studies, formal error estimates remain on average up to 1.6 times those from an AR(1) model for long-duration tide gauge data. There is an even chance that the observed trend from the satellite altimetry era exceeds the noise in patches of the tropical Pacific and Indian Oceans and the south-west and north-east Pacific gyres. By including climate indices in the trend analysis, the time it takes for the observed linear sea-level trend to emerge from the noise reduces by up to 2 decades.
Exploring emergency department 4-hour target performance and cancelled elective operations: a regression analysis of routinely collected and openly reported NHS trust data.

Science.gov (United States)

Keogh, Brad; Culliford, David; Guerrero-Ludueña, Richard; Monks, Thomas

2018-05-24

To quantify the effect of intrahospital patient flow on emergency department (ED) performance targets and indicate if the expectations set by the National Health Service (NHS) England 5-year forward review are realistic in returning emergency services to previous performance levels. Linear regression analysis of routinely reported trust activity and performance data using a series of cross-sectional studies. NHS trusts in England submitting routine nationally reported measures to NHS England. 142 acute non-specialist trusts operating in England between 2012 and 2016. The primary outcome measures were proportion of 4-hour waiting time breaches and cancelled elective operations. Univariate and multivariate linear regression models were used to show relationships between the outcome measures and various measures of trust activity including empty day beds, empty night beds, day bed to night bed ratio, ED conversion ratio and delayed transfers of care. Univariate regression results using the outcome of 4-hour breaches showed clear relationships with empty night beds and ED conversion ratio between 2012 and 2016. The day bed to night bed ratio showed an increasing ability to explain variation in performance between 2015 and 2016. Delayed transfers of care showed little evidence of an association. Multivariate model results indicated that the ability of patient flow variables to explain 4-hour target performance had reduced between 2012 and 2016 (19% to 12%), and had increased in explaining cancelled elective operations (7% to 17%). The flow of patients through trusts is shown to influence ED performance; however, performance has become less explainable by intratrust patient flow between 2012 and 2016. Some commonly stated explanatory factors such as delayed transfers of care showed limited evidence of being related. The results indicate some of the measures proposed by NHS England to reduce pressure on EDs may not have the desired impact on returning services to previous
Regression: The Apple Does Not Fall Far From the Tree.

Science.gov (United States)

Vetter, Thomas R; Schober, Patrick

2018-05-15

Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
PAI-1 4G/5G polymorphism and plasma levels association in patients with coronary artery disease.

Science.gov (United States)

Lima, Luciana Moreira; Carvalho, Maria das Graças; Fonseca Neto, Cirilo Pereira; Garcia, José Carlos Faria; Sousa, Marinez Oliveira

2011-12-01

Type-1 plasminogen activator inhibitor (PAI-1) 4G/5G polymorphism may influence PAI-1 expression. High plasma levels of PAI-1 are associated with coronary artery disease (CAD). This study investigated the influence of PAI-1 4G/5G polymorphism on plasma PAI-1 levels and its association with CAD assessed by coronary angiography. Blood samples from 35 individuals with angiographically normal coronary arteries, 31 individuals presenting mild/moderate atheromatosis, 57 individuals presenting severe atheromatosis and 38 healthy individuals (controls) were evaluated. In patients and controls, the PAI-1 4G/5G polymorphism was determined by PCR amplification using allele-specific primers. Plasma PAI-1 levels were quantified by ELISA assay (American Diagnostica). No difference was found between groups regarding age, gender and body mass index. Plasma PAI-1 levels and 4G/4G genotype frequency were significantly higher in the severe atheromatosis group compared to the other groups (p5G/5G genotype (r=0.02, p=0.4511). In addition, in a multiple logistic regression model, adjusted for all the other variables, PAI-1 was observed to be independently associated with CAD > 70% (p<0.001). The most important finding of this study was the association between 4G/4G genotype, high plasma PAI-1 levels and coronary stenosis higher than 70% in Brazilian individuals. Whether high plasma PAI-1 levels are a decisive factor for atherosclerosis worsening or a consequence of it remains to be established.
The correlation between serum free thyroxine and regression of dyslipidemia in adult males: A 4.5-year prospective study.

Science.gov (United States)

Wang, Haoyu; Liu, Aihua; Zhou, Yingying; Xiao, Yue; Yan, Yumeng; Zhao, Tong; Gong, Xun; Pang, Tianxiao; Fan, Chenling; Zhao, Jiajun; Teng, Weiping; Shan, Zhongyan; Lai, Yaxin

2017-09-01

Elevated free thyroxine (FT4) levels may play a protective role in the development of dyslipidemia. However, few prospective studies have been performed to define the effects of thyroid hormones on the improvement of dyslipidemia and its components. Thus, this study aims to clarify the association between thyroid hormones within normal range and reversal of dyslipidemia in the absence of intervention. A prospective analysis including 134 adult males was performed between 2010 and 2014. Anthropometric parameters, thyroid function, and lipid profile were measured at baseline and during follow-up. Logistic regression and receiver operating characteristic (ROC) analysis were conducted to identify the variables in forecasting the reversal of dyslipidemia and its components. During 4.5-year follow-up, 36.6% (49/134) of patients resolved their dyslipidemia status without drug intervention. Compared with the continuous dyslipidemia group, subjects in the reversal group had elevated FT4 and high-density lipoprotein cholesterol (HDL-C) levels, as well as decreased total cholesterol (TC), triglycerides (TG), and low-density lipoprotein cholesterol (LDL-C) levels at baseline. Furthermore, baseline FT4 is negatively associated with the change percentages of TG (r = -0.286, P = .001), while positively associated with HDL-C (r = 0.227, P = .008). However, no correlation of lipid profile change percentages with FT3 and TSH was observed. Furthermore, the improving effects of baseline FT4 on dyslipidemia, high TG, and low HDL-C status were still observed after multivariable adjustment. In ROC analysis, areas under curve (AUCs) for FT4 in predicting the reversal of dyslipidemia, high TG, and low HDL-C were 0.666, 0.643, and 0.702, respectively (P = .001 for dyslipidemia, .018 for high TG, and .001 for low HDL-C). Higher FT4 value within normal range may ameliorate dyslipidemia, especially high TG and low HDL-C status, in males without drug intervention. This suggests
Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000-2015 using quantile and multiple line regression models

Science.gov (United States)

Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo

2016-11-01

The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. Dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for the 99th and 90th percentiles and performed better in autumn and winter for the 10th percentile. QR models also performed better in suburban and rural areas for the 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently drawn from six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. We also found that the effect of solar radiation would be enhanced during extreme ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone showed no significant changes before and after the 2010 Asian Games.
Land use regression modeling of intra-urban residential variability in multiple traffic-related air pollutants

Directory of Open Access Journals (Sweden)

Baxter Lisa K

2008-05-01

Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R2 = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R2 = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R2 = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R2 = 0.56). Conclusion: Each pollutant examined displayed somewhat different spatial patterns
Sea level trend and variability around Peninsular Malaysia

Science.gov (United States)

Luu, Q. H.; Tkalich, P.; Tay, T. W.

2015-08-01

Sea level rise due to climate change is non-uniform globally, necessitating regional estimates. Peninsular Malaysia is located in the middle of Southeast Asia, bounded from the west by the Malacca Strait, from the east by the South China Sea (SCS), and from the south by the Singapore Strait. The sea level along the peninsula may be influenced by various regional phenomena native to the adjacent parts of the Indian and Pacific oceans. To examine the variability and trend of sea level around the peninsula, tide gauge records and satellite altimetry are analyzed taking into account vertical land movements (VLMs). At annual scale, sea level anomalies (SLAs) around Peninsular Malaysia on the order of 5-25 cm are mainly monsoon driven. Sea levels at eastern and western coasts respond differently to the Asian monsoon: two peaks per year in the Malacca Strait due to South Asian-Indian monsoon; an annual cycle in the remaining region mostly due to the East Asian-western Pacific monsoon. At interannual scale, regional sea level variability in the range of ±6 cm is correlated with El Nino-Southern Oscillation (ENSO). SLAs in the Malacca Strait side are further correlated with the Indian Ocean Dipole (IOD) in the range of ±5 cm. Interannual regional sea level falls are associated with El Nino events and positive phases of IOD, whilst rises are correlated with La Nina episodes and negative values of the IOD index. At seasonal to interannual scales, we observe the separation of the sea level patterns in the Singapore Strait, between the Raffles Lighthouse and Tanjong Pagar tide stations, likely caused by a dynamic constriction in the narrowest part. During the observation period 1986-2013, average relative rates of sea level rise derived from tide gauges in Malacca Strait and along the east coast of the peninsula are 3.6±1.6 and 3.7±1.1 mm yr-1, respectively. Correcting for respective VLMs (0.8±2.6 and 0.9±2.2 mm yr-1), their corresponding geocentric sea level rise rates
Impact of multicollinearity on small sample hydrologic regression models

Science.gov (United States)

Kroll, Charles N.; Song, Peter

2013-06-01

Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
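The VIF screening mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `variance_inflation_factors` and the synthetic data are assumptions made for illustration. Each predictor is regressed on the remaining predictors, and its VIF is 1 / (1 - R2) of that auxiliary regression.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression of predictor j on the other predictors (with intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic example (hypothetical data): x2 is nearly collinear with x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this sketch the two collinear columns receive large VIFs while the independent column stays near 1. Any screening threshold is a rule of thumb rather than something stated in the abstract.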
From Rasch scores to regression

DEFF Research Database (Denmark)

Christensen, Karl Bang

2006-01-01

Rasch models provide a framework for measuring and modelling latent variables. Having measured a latent variable in a population, a comparison of groups will often be of interest. For this purpose the use of observed raw scores will often be inadequate because these lack interval scale properties ... This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables and latent regression models based on the distribution of the score....
Regression-based season-ahead drought prediction for southern Peru conditioned on large-scale climate variables

Science.gov (United States)

Mortensen, Eric; Wu, Shu; Notaro, Michael; Vavrus, Stephen; Montgomery, Rob; De Piérola, José; Sánchez, Carlos; Block, Paul

2018-01-01

Located at a complex topographic, climatic, and hydrologic crossroads, southern Peru is a semiarid region that exhibits high spatiotemporal variability in precipitation. The economic viability of the region hinges on this water, yet southern Peru is prone to water scarcity caused by seasonal meteorological drought. Meteorological droughts in this region are often triggered during El Niño episodes; however, other large-scale climate mechanisms also play a noteworthy role in controlling the region's hydrologic cycle. An extensive season-ahead precipitation prediction model is developed to help bolster the existing capacity of stakeholders to plan for and mitigate deleterious impacts of drought. In addition to existing climate indices, large-scale climatic variables, such as sea surface temperature, are investigated to identify potential drought predictors. A principal component regression framework is applied to 11 potential predictors to produce an ensemble forecast of regional January-March precipitation totals. Model hindcasts of 51 years, compared to climatology and another model conditioned solely on an El Niño-Southern Oscillation index, achieve notable skill and perform better for several metrics, including ranked probability skill score and a hit-miss statistic. The information provided by the developed model and ancillary modeling efforts, such as extending the lead time of and spatially disaggregating precipitation predictions to the local level as well as forecasting the number of wet-dry days per rainy season, may further assist regional stakeholders and policymakers in preparing for drought.
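A principal component regression of the kind named in this abstract can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' forecast model: the helper names `pcr_fit` and `pcr_predict` are hypothetical, the data are synthetic (51 samples and 11 predictors, chosen only to echo the counts in the abstract), and predictors are centered but not standardized.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the leading principal components of X; return (intercept, beta)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T            # loadings, shape (p, k)
    scores = Xc @ V                    # component scores, shape (n, k)
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                   # coefficients mapped back to the original predictors
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def pcr_predict(X, intercept, beta):
    return intercept + np.asarray(X, dtype=float) @ beta

# Synthetic data (hypothetical): 51 "years" and 11 candidate predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(51, 11))
y = 2.0 + X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=51)

intercept_3, beta_3 = pcr_fit(X, y, n_components=3)
intercept_full, beta_full = pcr_fit(X, y, n_components=11)
hindcast = pcr_predict(X, intercept_3, beta_3)
```

Keeping fewer components than predictors trades some bias for lower variance. Keeping all components reproduces ordinary least squares, which is a convenient sanity check.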
Diagnosis of cranial hemangioma: Comparison between logistic regression analysis and neuronal network

International Nuclear Information System (INIS)

Arana, E.; Marti-Bonmati, L.; Bautista, D.; Paredes, R.

1998-01-01

To study the utility of logistic regression and the neuronal network in the diagnosis of cranial hemangiomas. Fifteen patients presenting hemangiomas were selected from a total of 167 patients with cranial lesions. All were evaluated by plain radiography and computed tomography (CT). Nineteen variables in their medical records were reviewed. Logistic regression and neuronal network models were constructed and validated by the jackknife (leave-one-out) approach. The yields of the two models were compared by means of ROC curves, using the area under the curve as parameter. Seven men and 8 women presented hemangiomas. The mean age of these patients was 38.4 ± 15.4 years (mean ± standard deviation). Logistic regression identified as significant variables the shape, soft tissue mass and periosteal reaction. The neuronal network lent more importance to the existence of ossified matrix, ruptured cortical vein and the mixed calcified-blastic (trabeculated) pattern. The neuronal network showed a greater yield than logistic regression (Az, 0.9409 ± 0.004 versus 0.7211 ± 0.075; p<0.001). The neuronal network discloses hidden interactions among the variables, providing a higher yield in the characterization of cranial hemangiomas and constituting a medical diagnostic aid. (Author) 29 refs
Purposeful selection of variables in logistic regression

Directory of Open Access Journals (Sweden)

Williams David Keith

2008-12-01

Full Text Available Abstract Background The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results We show that this approach is advantageous when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones, they should consider this macro as an alternative tool.
What Are the Odds of that? A Primer on Understanding Logistic Regression

Science.gov (United States)

Huang, Francis L.; Moon, Tonya R.

2013-01-01

The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
Hierarchical regression analysis in structural Equation Modeling

NARCIS (Netherlands)

de Jong, P.F.

1999-01-01

In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main
Regression analysis by example

CERN Document Server

Chatterjee, Samprit

2012-01-01

Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

Blood glucose level prediction based on support vector regression using mobile platforms.

Science.gov (United States)

Reymann, Maximilian P; Dorschky, Eva; Groh, Benjamin H; Martindale, Christine; Blank, Peter; Eskofier, Bjoern M

2016-08-01

The correct treatment of diabetes is vital to a patient's health: Staying within defined blood glucose levels prevents dangerous short- and long-term effects on the body. Mobile devices informing patients about their future blood glucose levels could enable them to take counter-measures to prevent hypo or hyper periods. Previous work addressed this challenge by predicting the blood glucose levels using regression models. However, these approaches required a physiological model, representing the human body's response to insulin and glucose intake, or are not directly applicable to mobile platforms (smart phones, tablets). In this paper, we propose an algorithm for mobile platforms to predict blood glucose levels without the need for a physiological model. Using an online software simulator program, we trained a Support Vector Regression (SVR) model and exported the parameter settings to our mobile platform. The prediction accuracy of our mobile platform was evaluated with pre-recorded data of a type 1 diabetes patient. The blood glucose level was predicted with an error of 19 % compared to the true value. Considering the permitted error of commercially used devices of 15 %, our algorithm is the basis for further development of mobile prediction algorithms.
Spatial variability of excess mortality during prolonged dust events in a high-density city: a time-stratified spatial regression approach.

Science.gov (United States)

Wong, Man Sing; Ho, Hung Chak; Yang, Lin; Shi, Wenzhong; Yang, Jinxin; Chan, Ta-Chien

2017-07-24

Dust events have long been recognized to be associated with a higher mortality risk. However, no study has investigated how prolonged dust events affect the spatial variability of mortality across districts in a downwind city. In this study, we applied a spatial regression approach to estimate the district-level mortality during two extreme dust events in Hong Kong. We compared spatial and non-spatial models to evaluate the ability of each regression to estimate mortality. We also compared prolonged dust events with non-dust events to determine the influences of community factors on mortality across the city. The density of a built environment (estimated by the sky view factor) had a positive association with excess mortality in each district, while socioeconomic deprivation, driven by lower income and lower education, led to a higher mortality impact in each territory planning unit during a prolonged dust event. Based on the model comparison, spatial error modelling with the 1st order of queen contiguity consistently outperformed other models. The high-risk areas with a higher increase in mortality were located in an urban high-density environment with higher socioeconomic deprivation. Our model design shows the ability to predict spatial variability of mortality risk during an extreme weather event that cannot be estimated with traditional time-series analysis or ecological studies. Our spatial protocol can be used for public health surveillance, sustainable planning and disaster preparation when relevant data are available.
FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

Science.gov (United States)

Lorenzo-Seva, Urbano; Ferrando, Pere J

2011-03-01

We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Support vector regression model based predictive control of water level of U-tube steam generators

Energy Technology Data Exchange (ETDEWEB)

Kavaklioglu, Kadir, E-mail: kadir.kavaklioglu@pau.edu.tr

2014-10-15

Highlights: • Water level of U-tube steam generators was controlled in a model predictive fashion. • Models for steam generator water level were built using support vector regression. • Cost function minimization for future optimal controls was performed by using the steepest descent method. • The results indicated the feasibility of the proposed method. - Abstract: A predictive control algorithm using support vector regression based models was proposed for controlling the water level of U-tube steam generators of pressurized water reactors. Steam generator data were obtained using a transfer function model of U-tube steam generators. Support vector regression based models were built using a time series type model structure for five different operating powers. Feedwater flow controls were calculated by minimizing a cost function that includes the level error, the feedwater change and the mismatch between feedwater and steam flow rates. The proposed algorithm was applied to a scenario consisting of a level setpoint change and a steam flow disturbance. The results showed that the steam generator level can be controlled effectively at all powers by the proposed method.
Post-processing through linear regression

Science.gov (United States)

van Schaeybroeck, B.; Vannitsem, S.

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz 63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
Sea level variability at Adriatic coast and its relationship to atmospheric forcing

Energy Technology Data Exchange (ETDEWEB)

Bergant, K. [Centre for Atmospheric Research, Nova Gorica Polytechnic, Nova Gorica (Slovenia); Susnik, M.; Strojan, I. [Dept. of Hydrology, Environmental Agency of the Republic of Slovenia, Ljubljana (Slovenia); Shaw, A.G.P. [James Rennel Div., National Oceanography Centre, Empress Dock, Southampton (United Kingdom)

2005-07-01

Sea level (SLH) variability at the Adriatic coast was investigated for the period 1872-2001 using monthly average values of observations at 13 tide gauge stations. Linear trends and seasonal cycles were investigated first and removed afterwards from the data. Empirical orthogonal functions (EOF) analysis was used further on remaining anomalies (SLA) to extract the regional intermonthly variability of SLH. It was found that the leading EOF and its principal component (PC) explain a major part of SLA variability (92%). The correlation between the reconstructed SLA, based on leading EOF and its PC, and overlapping observed SLA values for selected tide gauge stations is between 0.93 and 0.99. Actual SLH values at tide gauge stations can be reconstructed and some gaps in the data can be filled in on the basis of estimated SLA values if reasonable estimates of long-term trends and seasonal cycles are also available. A strong, seasonally dependent relationship between SLA at the Adriatic coast and atmospheric forcing, represented by sea level pressure (SLP) fields, was also found. Comparing the time series of leading PC and gridded SLP data for the period 1948-2001, the highest correlation coefficients (r) of -0.92 in winter, -0.84 in spring, -0.66 in summer, and -0.91 in autumn were estimated for a SLP grid point located in northern Italy. The SLP variability on this grid point contains information about the isostatic response of the sea level at the Adriatic coast, but can also be treated as a sort of teleconnection index representing the large-scale SLP variability across central and southern Europe. To some extent the large-scale SLP variability that affects the SLA at the Adriatic coast can be related to the North Atlantic Oscillation (NAO), because significant correlations were found between the NAO index and the first PC of SLA (r_winter = -0.56, r_spring = -0.45, r_summer = -0.48, and r_autumn = -0.43) for the period 1872-2001. The use of partial least
Mixed geographically weighted regression (MGWR) model with weighted adaptive bi-square for case of dengue hemorrhagic fever (DHF) in Surakarta

Science.gov (United States)

Astuti, H. N.; Saputro, D. R. S.; Susanti, Y.

2017-06-01

The MGWR model combines the linear regression model and the geographically weighted regression (GWR) model; it therefore yields some parameter estimates that are global and others that are local, varying with the observation location. The linkage between observation locations is expressed through a specific weighting, here the adaptive bi-square kernel. In this research, we applied the MGWR model with adaptive bi-square weighting to cases of DHF in Surakarta, based on 10 factors (variables) presumed to influence the number of people with DHF. The observation units are 51 urban villages, and the variables are number of inhabitants, number of houses, house index, number of public places, number of healthy homes, number of Posyandu, area width, level of population density, family welfare, and high-region. Based on this research, we obtained 51 MGWR models. The MGWR models were divided into 4 groups, with house index significant as a global variable, area width significant as a local variable, and the remaining significant variables differing in each group. Global variables are variables that significantly affect all locations, while local variables are variables that significantly affect a specific location.
Prevalence of antibiotic prescription in southern Italian outpatients: real-world data analysis of socioeconomic and sociodemographic variables at a municipality level.

Science.gov (United States)

Russo, Veronica; Monetti, Valeria Marina; Guerriero, Francesca; Trama, Ugo; Guida, Antonella; Menditto, Enrica; Orlando, Valentina

2018-01-01

The aim of this study was to analyze the geographic variation in systemic antibiotic prescription at a regional level and to explore the influence of socioeconomic and sociodemographic variables. This study was a retrospective analysis of reimbursement pharmacy records in the outpatient settings of Italy's Campania Region in 2016. Standardized antibiotic prescription rates were calculated at municipality and Local Health Unit (LHU) level. Antibiotic consumption was analyzed as defined daily doses (DDD)/1000 inhabitants per day (DID). Logistic regression was performed to evaluate the association between antibiotic prescription and sociodemographic and socioeconomic determinants at a municipality level. The average antibiotic prevalence rate was 46.8%. At LHU level, the age-adjusted prevalence rates ranged from 41.1% in Benevento to 51.0% in Naples2. Significant differences were found among municipalities, from 15.2% in Omignano (Salerno LHU [Sa-LHU]) to 61.9% in Moschiano (Avellino [Av-LHU]). The geographic distribution also showed significant differences in terms of antibiotic consumption, from 6.7 DID in Omignano to 41.6 in San Marcelino (Caserta [Ce-LHU]). Logistic regression showed that both municipality type and average annual income level were the main determinants of antibiotic prescription. Urban municipalities were more than eight times as likely to have antibiotic high prevalence rates compared to rural municipalities (adjusted odds ratio [OR]: 8.62; 95% confidence interval [CI]: 4.06-18.30, P <0.001). Low average annual income level municipalities were more than eight times as likely to have antibiotic high prevalence rates compared to high average annual income level municipalities (adjusted OR: 8.48; 95% CI: 3.45-20.81, P <0.001). We provide a snapshot of Campania's antibiotic consumption, evidencing the impact of both socioeconomic and sociodemographic factors on the prevalence of antibiotic prescription. The observed intraregional variability
Estimation of Geographically Weighted Regression Case Study on Wet Land Paddy Productivities in Tulungagung Regency

Directory of Open Access Journals (Sweden)

Danang Ariyanto

2017-11-01

Full Text Available Regression is a method that relates independent variables to a dependent variable, with parameter estimates as its output. A principal problem with this method arises when it is applied to spatial data. The Geographically Weighted Regression (GWR) method is used to address this problem. GWR is a regression technique that extends the traditional regression framework by allowing the estimation of local rather than global parameters. In other words, GWR runs a regression for each location, instead of a sole regression for the entire study area. The purpose of this research is to analyze the factors influencing wet land paddy productivities in Tulungagung Regency. The method used in this research is GWR with a cross-validation bandwidth and weighting by an adaptive Gaussian kernel function. This research uses 4 variables presumed to affect wet land paddy productivity: the rate of rainfall (X1), the average cost of fertilizer per hectare (X2), the average cost of pesticides per hectare (X3) and the allocation of subsidized NPK fertilizer for the food crops sub-sector (X4). Based on the results, X1, X2, X3 and X4 have different effects in each district. Thus, improving the productivity of wet land paddy in Tulungagung Regency requires a special policy based on the GWR model in each district.
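The idea that GWR "runs a regression for each location" can be sketched as a kernel-weighted least-squares fit repeated at every observation point. The sketch below is a simplification and an assumption, not the study's procedure: it uses a fixed Gaussian bandwidth rather than the adaptive kernel with cross-validated bandwidth described in the abstract, and the function name `gwr_coefficients` and all data are hypothetical.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one Gaussian-weighted least-squares regression per location.

    Returns an array of shape (n, 1 + p): local intercept and slopes for each point.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        weights = np.exp(-0.5 * (dist / bandwidth) ** 2)   # fixed Gaussian kernel
        sw = np.sqrt(weights)
        # Weighted least squares via row scaling by sqrt(weight).
        betas[i], *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return betas

# Synthetic example (hypothetical): the slope drifts from west to east.
rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(200, 2))
x = rng.normal(size=200)
true_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + true_slope * x + 0.1 * rng.normal(size=200)

local_betas = gwr_coefficients(coords, x[:, None], y, bandwidth=1.5)
global_like = gwr_coefficients(coords, x[:, None], y, bandwidth=1e6)
```

With a very large bandwidth every weight is close to 1, so each local fit collapses to the single global regression. A small bandwidth lets the estimated slope follow the spatial drift.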
Determinants of Non-Performing Assets in India - Panel Regression

Directory of Open Access Journals (Sweden)

Saikat Ghosh Roy

2014-12-01

Full Text Available It is well known that the level of banks' credit plays an important role in economic development. The Indian banking sector has played a seminal role in supporting economic growth in India. Recently, Indian banks have been experiencing a consistent increase in non-performing assets (NPA). In this perspective, this paper investigates the trends in NPA in Indian banks and its determinants. Fixed-effect panel regression allows evaluating the impact of selected macroeconomic variables on the NPA. The panel regression results indicate that GDP growth, change in exchange rate and global volatility have major effects on the NPA level of the Indian banking sector.
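A fixed-effect panel regression of the type mentioned here is commonly estimated with the within transformation: each variable is demeaned inside its entity before ordinary least squares is applied. The sketch below assumes that approach. The function name `within_estimator`, the synthetic "bank" panel and the coefficient values are hypothetical and are not taken from the paper.

```python
import numpy as np

def within_estimator(entity, X, y):
    """Fixed-effects slopes: demean X and y within each entity, then run OLS."""
    entity = np.asarray(entity)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_dm, y_dm = X.copy(), y.copy()
    for e in np.unique(entity):
        mask = entity == e
        X_dm[mask] -= X[mask].mean(axis=0)   # removes the entity-specific level
        y_dm[mask] -= y[mask].mean()
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Synthetic panel (hypothetical): 20 banks observed for 15 periods.
rng = np.random.default_rng(3)
n_banks, n_periods = 20, 15
entity = np.repeat(np.arange(n_banks), n_periods)
bank_effect = rng.normal(scale=2.0, size=n_banks)
true_beta = np.array([-0.5, 0.3])
# Regressors are correlated with the unobserved bank effect, which would bias pooled OLS.
X = rng.normal(size=(n_banks * n_periods, 2)) + 0.5 * bank_effect[entity][:, None]
y = bank_effect[entity] + X @ true_beta + 0.1 * rng.normal(size=n_banks * n_periods)

beta_fe = within_estimator(entity, X, y)
print(beta_fe)
```

Demeaning removes any time-invariant entity effect, so the slopes are identified from variation within each entity over time.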
Variability of fasting and post-methionine plasma homocysteine levels in normo- and hyperhomocysteinaemic individuals

NARCIS (Netherlands)

van den Berg, M.; de Jong, S.C.; Devilli, W.; Rauwerda, J.A.; Jakobs, C.A.J.M.; Pals, G.; Boers, G.H.J.; Stehouwer, C.D.A.

1999-01-01

To assess the variability of plasma homocysteine levels, fasting and post-methionine homocysteine levels were measured twice, at baseline and after follow-up of 1-4 months, in 16 individuals with normal and 26 with elevated homocysteine levels after methionine loading. The intra-individual
Gaussian process regression analysis for functional data

CERN Document Server

Shi, Jian Qing

2011-01-01

Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime
Eddy covariance flux measurements confirm extreme CH4 emissions from a Swiss hydropower reservoir and resolve their short-term variability

Directory of Open Access Journals (Sweden)

S. Sobek

2011-09-01

Full Text Available Greenhouse gas budgets quantified via land-surface eddy covariance (EC) flux sites differ significantly from those obtained via inverse modeling. A possible reason for the discrepancy between methods may be our gap in quantitative knowledge of methane (CH4) fluxes. In this study we carried out EC flux measurements during two intensive campaigns in summer 2008 to quantify methane flux from a hydropower reservoir and link its temporal variability to environmental driving forces: water temperature and pressure changes (atmospheric and due to changes in lake level). Methane fluxes were extremely high and highly variable, but consistently showed gas efflux from the lake when the wind was approaching the EC sensors across the open water, as confirmed by floating chamber flux measurements. The average flux was 3.8 ± 0.4 μg C m−2 s−1 (mean ± SE) with a median of 1.4 μg C m−2 s−1, which is quite high even compared to tropical reservoirs. Floating chamber fluxes from four selected days confirmed such high fluxes with 7.4 ± 1.3 μg C m−2 s−1. Fluxes increased exponentially with increasing temperatures, but were decreasing exponentially with increasing atmospheric and/or lake level pressure. A multiple regression using lake surface temperatures (0.1 m depth), temperature at depth (10 m deep in front of the dam), atmospheric pressure, and lake level was able to explain 35.4% of the overall variance. This best fit included each variable averaged over a 9-h moving window, plus the respective short-term residuals thereof. We estimate that an annual average of 3% of the particulate organic matter (POM) input via the river is sufficient to sustain these large CH4 fluxes. To compensate the global warming potential associated with the CH4 effluxes from this hydropower reservoir a 1.3 to 3.7 times larger terrestrial area with net carbon dioxide uptake is needed if a European-scale compilation of grasslands, croplands and forests is taken as reference. This
Regression Analysis to Identify Factors Associated with Household Salt Iodine Content at the Sub-National Level in Bangladesh, India, Ghana and Senegal

Science.gov (United States)

Knowles, Jacky; Kupka, Roland; Dumble, Sam; Garrett, Greg S.; Pandav, Chandrakant S.; Yadav, Kapil; Nahar, Baitun; Touré, Ndeye Khady; Amoaful, Esi Foriwa; Gorstein, Jonathan

2018-01-01

Regression analyses of data from stratified, cluster sample, household iodine surveys in Bangladesh, India, Ghana and Senegal were conducted to identify factors associated with household access to adequately iodised salt. For all countries, in single variable analyses, household salt iodine was significantly different (p < 0.05) between strata (geographic areas with representative data, defined by survey design), and significantly higher (p < 0.05) among households: with better living standard scores, where the respondent knew about iodised salt and/or looked for iodised salt at purchase, using salt bought in a sealed package, or using refined grain salt. Other country-level associations were also found. Multiple variable analyses showed a significant association between salt iodine and strata (p < 0.001) in India, Ghana and Senegal and that salt grain type was significantly associated with estimated iodine content in all countries (p < 0.001). Salt iodine relative to the reference (coarse salt) ranged from 1.3 (95% CI 1.2, 1.5) times higher for fine salt in Senegal to 3.6 (95% CI 2.6, 4.9) times higher for washed and 6.5 (95% CI 4.9, 8.8) times higher for refined salt in India. Sub-national data are required to monitor equity of access to adequately iodised salt. Improving household access to refined iodised salt in sealed packaging, would improve iodine intake from household salt in all four countries in this analysis, particularly in areas where there is significant small-scale salt production. PMID:29671774
Post-processing through linear regression

Directory of Open Access Journals (Sweden)

B. Van Schaeybroeck

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR), which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Separating decadal global water cycle variability from sea level rise.

Science.gov (United States)

Hamlington, B D; Reager, J T; Lo, M-H; Karnauskas, K B; Leben, R R

2017-04-20

Under a warming climate, amplification of the water cycle and changes in precipitation patterns over land are expected to occur, subsequently impacting the terrestrial water balance. On global scales, such changes in terrestrial water storage (TWS) will be reflected in the water contained in the ocean and can manifest as global sea level variations. Naturally occurring climate-driven TWS variability can temporarily obscure the long-term trend in sea level rise, in addition to modulating the impacts of sea level rise through natural periodic undulation in regional and global sea level. The internal variability of the global water cycle, therefore, confounds both the detection and attribution of sea level rise. Here, we use a suite of observations to quantify and map the contribution of TWS variability to sea level variability on decadal timescales. In particular, we find that decadal sea level variability centered in the Pacific Ocean is closely tied to low frequency variability of TWS in key areas across the globe. The unambiguous identification and clean separation of this component of variability is the missing step in uncovering the anthropogenic trend in sea level and understanding the potential for low-frequency modulation of future TWS impacts including flooding and drought.
Interval ridge regression (iRR) as a fast and robust method for quantitative prediction and variable selection applied to edible oil adulteration.

Science.gov (United States)

Jović, Ozren; Smrečki, Neven; Popović, Zora

2016-04-01

A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is performed on six data sets of FTIR, two data sets of UV-vis and one data set of DSC. The obtained results show that models built with ridge regression on optimal variables selected with iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for poil, a well known health beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that using iRR on FTIR and UV-vis data, each particular oil can be very successfully quantified (in all 8 cases RMSEPoil (R(2)>0.99). Copyright © 2015 Elsevier B.V. All rights reserved.
Correlation and simple linear regression.

Science.gov (United States)

Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

2003-06-01

In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
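The distinction drawn in this abstract between the Pearson coefficient, the Spearman rho and a simple linear regression fit can be sketched in a few lines. This is a minimal illustration on made-up data (the cubic `y` values are an assumption chosen to be monotone but nonlinear), not the data set or software used by the authors.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def simple_linear_regression(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Hypothetical data: y = x**3, monotone but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r = pearson(x, y)        # below 1 because the relation is curved
rho = spearman(x, y)     # 1 because the relation is perfectly monotone
slope, intercept = simple_linear_regression(x, y)
print(round(r, 3), round(rho, 3), round(slope, 1), round(intercept, 1))
```

On these values the rank-based coefficient reaches 1 while the Pearson coefficient stays below 1, which is the practical reason for reporting both when a relationship may be monotone but not linear.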
Analysis models for variables associated with breastfeeding duration

Directory of Open Access Journals (Sweden)

Edson Theodoro dos S. Neto

2013-09-01

OBJECTIVE: To analyze the factors associated with breastfeeding duration by two statistical models. METHODS: A population-based cohort study was conducted with 86 mothers and newborns from two areas primarily covered by the National Health System, with high rates of infant mortality, in Vitória, Espírito Santo, Brazil. During 30 months, 67 (78%) children and mothers were visited seven times at home by trained interviewers, who filled out survey forms. Data on food and sucking habits, socioeconomic and maternal characteristics were collected. Variables were analyzed by Cox regression models, considering duration of breastfeeding as the dependent variable, and by logistic regression, in which the dependent variable was the presence of a breastfeeding child at different post-natal ages. RESULTS: In the logistic regression model, pacifier sucking (adjusted Odds Ratio: 3.4; 95%CI 1.2-9.55) and bottle feeding (adjusted Odds Ratio: 4.4; 95%CI 1.6-12.1) increased the chance of weaning a child before one year of age. Variables associated with breastfeeding duration in the Cox regression model were pacifier sucking (adjusted Hazard Ratio 2.0; 95%CI 1.2-3.3) and bottle feeding (adjusted Hazard Ratio 2.0; 95%CI 1.2-3.5). However, protective factors (maternal age and family income) differed between the two models. CONCLUSIONS: Risk and protective factors associated with cessation of breastfeeding may be analyzed by different models of statistical regression. Cox Regression Models are adequate to analyze such factors in longitudinal studies.
Sea level trend and variability around the Peninsular Malaysia

Science.gov (United States)

Luu, Q. H.; Tkalich, P.; Tay, T. W.

2014-06-01

Peninsular Malaysia is bounded on the west by the Malacca Strait and the Andaman Sea, both connected to the Indian Ocean, and on the east by the South China Sea, the largest marginal sea in the Pacific Basin. The resulting sea level along the Peninsular Malaysia coast is assumed to be governed by various regional phenomena associated with the adjacent parts of the Indian and Pacific Oceans. At annual scale, sea level anomalies (SLAs) are generated by the Asian monsoon; interannual sea level variability is determined by the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD); while the long-term sea level trend is related to global climate change. To quantify the relative impacts of these multi-scale phenomena on sea level trend and variability around Peninsular Malaysia, long-term tide gauge records and satellite altimetry are used. During 1984-2011, relative sea level rise (SLR) rates in the waters of the Malacca Strait and eastern Peninsular Malaysia are found to be 2.4 ± 1.6 mm yr-1 and 2.7 ± 1.0 mm yr-1, respectively. Allowing for corresponding vertical land movements (VLM; 0.8 ± 2.6 mm yr-1 and 0.9 ± 2.2 mm yr-1), their absolute SLR rates are 3.2 ± 4.2 mm yr-1 and 3.6 ± 3.2 mm yr-1, respectively. For the common period 1993-2009, absolute SLR rates obtained from both tide gauge and satellite altimetry in Peninsular Malaysia are similar, and they are slightly higher than the global tendency. This further underlines that VLM should be taken into account to get better estimates of SLR observations. At interannual scale, ENSO affects sea level over the Malaysian coast in the range of ±5 cm with a very high correlation. Meanwhile, IOD modulates sea level anomalies mainly in the Malacca Strait in the range of ±2 cm with a high correlation coefficient. Interannual regional sea level drops are associated with El Niño events and positive phases of the IOD index, while the rises are correlated with La Niña episodes and the negative periods of the IOD index.
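The absolute rates quoted in this abstract can be reproduced from the relative rates and the vertical land movement terms. The sketch below assumes that the absolute rate is the sum of the two rates and that the two uncertainties are added directly; that combination rule is inferred from the published numbers and is not stated in the abstract. The function name `absolute_slr` is hypothetical.

```python
def absolute_slr(relative, vlm):
    """Combine (rate, uncertainty) pairs in mm/yr.

    Assumption inferred from the quoted figures: rates are summed and
    uncertainties are added linearly (not in quadrature).
    """
    rate = relative[0] + vlm[0]
    uncertainty = relative[1] + vlm[1]
    # Round to one decimal to match the precision used in the abstract.
    return round(rate, 1), round(uncertainty, 1)

# Relative SLR and VLM values as quoted for 1984-2011.
malacca = absolute_slr((2.4, 1.6), (0.8, 2.6))
east_coast = absolute_slr((2.7, 1.0), (0.9, 2.2))

print(malacca)     # (3.2, 4.2)
print(east_coast)  # (3.6, 3.2)
```

Both results match the quoted 3.2 ± 4.2 and 3.6 ± 3.2 mm yr-1, which also shows why the absolute rates carry wider uncertainty than the relative ones.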

A gentle introduction to quantile regression for ecologists

Science.gov (United States)

Cade, B.S.; Noon, B.R.

2003-01-01

Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, all the factors that affect ecological processes are not measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
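The idea of estimating a quantile instead of a mean can be illustrated with the asymmetric "check" (pinball) loss that quantile regression minimizes. The sketch below covers only the intercept-only case on made-up numbers: it searches the observed values for the constant with the smallest total loss, which coincides with a sample quantile. It is an illustration of the loss function, not the estimation procedure described in the primer.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative."""
    return residual * (tau - (1 if residual < 0 else 0))

def total_loss(y, q, tau):
    """Total check loss when every observation is predicted by the constant q."""
    return sum(pinball_loss(value - q, tau) for value in y)

def best_constant(y, tau):
    """Observed value that minimizes the total check loss for quantile tau."""
    return min(sorted(y), key=lambda q: total_loss(y, q, tau))

# Hypothetical response values with one extreme observation.
y = [1, 2, 3, 4, 100]

median_fit = best_constant(y, 0.5)   # the median, unaffected by the outlier
upper_fit = best_constant(y, 0.9)    # an upper quantile
lower_fit = best_constant(y, 0.25)   # a lower quantile
mean_fit = sum(y) / len(y)           # least-squares constant, for contrast

print(median_fit, upper_fit, lower_fit, mean_fit)
```

With predictors included, the same loss is minimized over regression coefficients rather than a single constant, so different values of `tau` can yield different slopes for different parts of the response distribution.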
Estimating severity of sideways fall using a generic multi linear regression model based on kinematic input variables.

Science.gov (United States)

van der Zijden, A M; Groen, B E; Tanck, E; Nienhuis, B; Verdonschot, N; Weerdesteyn, V

2017-03-21

Many research groups have studied fall impact mechanics to understand how fall severity can be reduced to prevent hip fractures. Yet, direct impact force measurements with force plates are restricted to a very limited repertoire of experimental falls. The purpose of this study was to develop a generic model for estimating hip impact forces (i.e. fall severity) in in vivo sideways falls without the use of force plates. Twelve experienced judokas performed sideways Martial Arts (MA) and Block ('natural') falls on a force plate, both with and without a mat on top. Data were analyzed to determine the hip impact force and to derive 11 selected (subject-specific and kinematic) variables. Falls from kneeling height were used to perform a stepwise regression procedure to assess the effects of these input variables and build the model. The final model includes four input variables, involving one subject-specific measure and three kinematic variables: maximum upper body deceleration, body mass, shoulder angle at the instant of 'maximum impact' and maximum hip deceleration. The results showed that estimated and measured hip impact forces were linearly related (explained variances ranging from 46 to 63%). Hip impact forces of MA falls onto the mat from a standing position (3650±916N) estimated by the final model were comparable with measured values (3698±689N), even though these data were not used for training the model. In conclusion, a generic linear regression model was developed that enables the assessment of fall severity through kinematic measures of sideways falls, without using force plates. Copyright © 2017 Elsevier Ltd. All rights reserved.
Ordinal regression models to describe tourist satisfaction with Sintra's world heritage

Science.gov (United States)

Mouriño, Helena

2013-10-01

In Tourism Research, ordinal regression models are becoming a very powerful tool in modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering Tourist Survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focuses on two main points: tourists' perception of the entrance fees and their overall level of satisfaction with this heritage site. To attain these goals, ordinal regression models were developed. We concluded that tourists' nationality was the only significant variable to describe the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge about Sintra's World Heritage status.
Landslide Hazard Mapping in Rwanda Using Logistic Regression

Science.gov (United States)

Piller, A.; Anderson, E.; Ballard, H.

2015-12-01

Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life, to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (i.e. slope, soil type, land cover, etc.) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
Linear regression

CERN Document Server

Olive, David J

2017-01-01

This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
The prediction of intelligence in preschool children using alternative models to regression.

Science.gov (United States)

Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

2011-12-01

Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children are used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than did the more traditional regression approach. Implications of these results are discussed.
Applied Regression Modeling A Business Approach

CERN Document Server

Pardoe, Iain

2012-01-01

An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
bayesQR: A Bayesian Approach to Quantile Regression

Directory of Open Access Journals (Sweden)

Dries F. Benoit

2017-01-01

After its introduction by Koenker and Bassett (1978), quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for the typical quantile regression with continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots, posterior histograms and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of this paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.
A Simulation Investigation of Principal Component Regression.

Science.gov (United States)

Allen, David E.

Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…
Regression analysis with categorized regression calibrated exposure: some interesting findings

Directory of Open Access Journals (Sweden)

Hjartåker Anette

2006-07-01

Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a
Introduction to the use of regression models in epidemiology.

Science.gov (United States)

Bender, Ralf

2009-01-01

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Multicollinearity and Regression Analysis

Science.gov (United States)

Daoud, Jamal I.

2017-12-01

In regression analysis, a correlation between the response and the predictor(s) is expected, but correlation among the predictors is undesirable. The number of predictors included in the regression model depends on many factors, among them historical data, experience, etc. In the end, the selection of the most important predictors is up to the researcher. Multicollinearity is a phenomenon in which two or more predictors are correlated; if this happens, the standard error of the coefficients will increase [8]. Increased standard errors mean that the coefficients for some or all independent variables may be found not to be significantly different from 0. In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on multicollinearity and its causes and consequences for the reliability of the regression model.
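The inflation of standard errors described in this abstract is commonly quantified with the variance inflation factor (VIF). The sketch below handles only the two-predictor case, where the VIF reduces to 1 / (1 - r^2) with r the correlation between the predictors; the data are made up for illustration and the VIF is a standard diagnostic, not necessarily the one used in the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r**2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors.
x1 = [1, 2, 3, 4, 5, 6]
x2_unrelated = [2, 1, 3, 3, 1, 2]              # uncorrelated with x1
x2_collinear = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]  # nearly a copy of x1

vif_low = vif_two_predictors(x1, x2_unrelated)
vif_high = vif_two_predictors(x1, x2_collinear)

# The coefficient standard error grows by the square root of the VIF.
print(round(vif_low, 2), round(vif_high, 1), round(sqrt(vif_high), 1))
```

A VIF of 1 means no inflation, while the nearly collinear pair gives a VIF above 100, so the coefficient standard errors are more than ten times larger than they would be with uncorrelated predictors.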
Evaluation of heat transfer mathematical models and multiple linear regression to predict the inside variables in semi-solar greenhouse

Directory of Open Access Journals (Sweden)

M Taki

2017-05-01

Introduction: Controlling the greenhouse microclimate not only influences the growth of plants but is also critical to the spread of diseases inside the greenhouse. The microclimate parameters considered were inside air, greenhouse roof and soil temperature, relative humidity and solar radiation intensity. Predicting the microclimate conditions inside a greenhouse and enabling the use of automatic control systems are the two main objectives of a greenhouse climate model. The microclimate inside a greenhouse can be predicted by conducting experiments or by using simulation. Static and dynamic models are used for this purpose as a function of the meteorological conditions and the parameters of the greenhouse components. Several studies up to 2015 simulated and predicted the inside variables in different greenhouse structures. Simulation usually has many problems in predicting the inside climate of a greenhouse, and the simulation errors reported in the literature are high. The main objective of this paper is to compare heat transfer and regression models for predicting inside air and roof temperature in a semi-solar greenhouse at Tabriz University. Materials and Methods: In this study, a semi-solar greenhouse was designed and constructed in the north-west of Iran, in Azerbaijan Province (geographical location of 38°10′ N and 46°18′ E, with an elevation of 1364 m above sea level). The shape and orientation of the greenhouse were selected from among several common greenhouse shapes so as to receive maximum solar radiation throughout the year. An internal thermal screen and a cement north wall were also used to store heat and prevent heat loss during the cold period of the year. For this reason the structure is called a 'semi-solar' greenhouse. It was covered with glass (4 mm thickness). It occupies a surface of approximately 15.36 m2 and a volume of 26.4 m3. The orientation of this greenhouse was East–West and perpendicular to the direction of the wind prevailing
Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network.

Science.gov (United States)

Pralle, R S; Weigel, K W; White, H M

2018-05-01

Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbott Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed
A Seemingly Unrelated Poisson Regression Model

OpenAIRE

King, Gary

1989-01-01

This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Penalized variable selection in competing risks regression.

Science.gov (United States)

Fu, Zhixuan; Parikh, Chirag R; Zhou, Bingqing

2017-07-01

Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of multiple mutually exclusive events, known as competing risks. The proportional subdistribution hazard (PSH) model proposed by Fine and Gray (J Am Stat Assoc 94:496-509, 1999) has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish the asymptotic properties of the proposed penalized estimators and modify the coordinate descent algorithm for implementation. Simulation studies are conducted to demonstrate the good performance of the proposed method. Data from deceased donor kidney transplants from the United Network of Organ Sharing illustrate the utility of the proposed method.
Secondary mediation and regression analyses of the PTClinResNet database: determining causal relationships among the International Classification of Functioning, Disability and Health levels for four physical therapy intervention trials.

Science.gov (United States)

Mulroy, Sara J; Winstein, Carolee J; Kulig, Kornelia; Beneck, George J; Fowler, Eileen G; DeMuth, Sharon K; Sullivan, Katherine J; Brown, David A; Lane, Christianne J

2011-12-01

Each of the 4 randomized clinical trials (RCTs) hosted by the Physical Therapy Clinical Research Network (PTClinResNet) targeted a different disability group (low back disorder in the Muscle-Specific Strength Training Effectiveness After Lumbar Microdiskectomy [MUSSEL] trial, chronic spinal cord injury in the Strengthening and Optimal Movements for Painful Shoulders in Chronic Spinal Cord Injury [STOMPS] trial, adult stroke in the Strength Training Effectiveness Post-Stroke [STEPS] trial, and pediatric cerebral palsy in the Pediatric Endurance and Limb Strengthening [PEDALS] trial for children with spastic diplegic cerebral palsy) and tested the effectiveness of a muscle-specific or functional activity-based intervention on primary outcomes that captured pain (STOMPS, MUSSEL) or locomotor function (STEPS, PEDALS). The focus of these secondary analyses was to determine causal relationships among outcomes across levels of the International Classification of Functioning, Disability and Health (ICF) framework for the 4 RCTs. With the database from PTClinResNet, we used 2 separate secondary statistical approaches (mediation analysis for the MUSSEL and STOMPS trials and regression analysis for the STEPS and PEDALS trials) to test relationships among muscle performance, primary outcomes (pain related and locomotor related), activity and participation measures, and overall quality of life. Predictive models were stronger for the 2 studies with pain-related primary outcomes. Change in muscle performance mediated or predicted reductions in pain for the MUSSEL and STOMPS trials and, to some extent, walking speed for the STEPS trial. Changes in primary outcome variables were significantly related to changes in activity and participation variables for all 4 trials. Improvement in activity and participation outcomes mediated or predicted increases in overall quality of life for the 3 trials with adult populations. Variables included in the statistical models were limited to those
Growth hormone responsiveness: peak stimulated growth hormone levels and other variables in idiopathic short stature (ISS): data from the National Cooperative Growth Study.

Science.gov (United States)

Moore, Wayne V; Dana, Ken; Frane, James; Lippe, Barbara

2008-09-01

In children with idiopathic short stature (ISS), growth hormone (GH) response to a provocative test will be inversely related to the first year response to hGH and be a variable accounting for a degree of responsiveness. Because high levels of GH are a characteristic of GH insensitivity, such as in Laron syndrome, it is possible that a high stimulated GH is associated with a lower first year height velocity among children diagnosed as having ISS. We examined the relationship between the peak stimulated GH levels in 3 ISS groups; GH >10 -40 ng/mL and the first year growth response to rhGH therapy. We also looked at 8 other predictor variables (age, sex, height SDS, height age, body mass index (BMI), bone age, dose, and SDS deficit from target parental height). Multiple regression analysis with the first year height as the dependent variable and peak stimulated GH was the primary endpoint. The predictive value of adding each of the other variables was then assessed. Mean change in height velocity was similar among the three groups, with a maximum difference among the groups of 0.6 cm/yr. There was a small but statistically significant correlation (r=-0.12) between the stimulated GH and first year height velocity. The small correlation between first year growth response and peak GH is not clinically relevant in defining GH resistance. No cut off level by peak GH could be determined to enhance the usefulness of this measure to predict response. Baseline age was the only clinically significant predictor, R-squared, 6.4%. All other variables contributed less than an additional 2% to the R-squared.
Predicting farm-level animal populations using environmental and socioeconomic variables.

Science.gov (United States)

van Andel, Mary; Jewell, Christopher; McKenzie, Joanna; Hollings, Tracey; Robinson, Andrew; Burgman, Mark; Bingham, Paul; Carpenter, Tim

2017-09-15

Accurate information on the geographic distribution of domestic animal populations helps biosecurity authorities to efficiently prepare for and rapidly eradicate exotic diseases, such as Foot and Mouth Disease (FMD). Developing and maintaining sufficiently high-quality data resources is expensive and time consuming. Statistical modelling of population density and distribution has only begun to be applied to farm animal populations, although it is commonly used in wildlife ecology. We developed zero-inflated Poisson regression models in a Bayesian framework using environmental and socioeconomic variables to predict the counts of livestock units (LSUs) and of cattle on spatially referenced farm polygons in a commercially available New Zealand farm database, Agribase. Farm-level counts of cattle and of LSUs varied considerably by region, because of the heterogeneous farming landscape in New Zealand. The amount of high quality pasture per farm was significantly associated with the presence of both cattle and LSUs. Internal model validation (predictive performance) showed that the models were able to predict the count of the animal population on groups of farms that were located in randomly selected 3km zones with a high level of accuracy. Predicting cattle or LSU counts on individual farms was less accurate. Predicted counts were statistically significantly more variable for farms that were contract grazing dry stock, such as replacement dairy heifers and dairy cattle not currently producing milk, compared with other farm types. This analysis presents a way to predict numbers of LSUs and cattle for farms using environmental and socio-economic data. The technique has the potential to be extrapolated to predicting other pastoral based livestock species. Copyright © 2017 Elsevier B.V. All rights reserved.
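The zero-inflated Poisson (ZIP) distribution named in the abstract above can be illustrated with a minimal sketch. It is not the authors' Bayesian regression model: the parameter values `lam` and `pi` are hypothetical, and in a regression setting each would be linked to covariates rather than fixed. The sketch only shows how the extra probability mass at zero (for example, farms that hold no cattle at all) enters the probability of a count.

```python
import math


def zip_pmf(k, lam, pi):
    """Probability of count k under a zero-inflated Poisson distribution.

    lam: mean of the Poisson component (hypothetical value)
    pi:  probability of a structural zero (hypothetical value)
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the structural-zero part or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam, pi):
    """Expected count: the Poisson mean scaled by the non-zero-inflated share."""
    return (1.0 - pi) * lam


# Illustrative values only, not estimates from the study.
example_lam, example_pi = 4.0, 0.3
for count in range(4):
    print(count, round(zip_pmf(count, example_lam, example_pi), 4))
```

With `pi = 0`, the function reduces to the ordinary Poisson probability, which is a quick way to check the sketch.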
Regression Analysis to Identify Factors Associated with Household Salt Iodine Content at the Sub-National Level in Bangladesh, India, Ghana and Senegal

Directory of Open Access Journals (Sweden)

Jacky Knowles

2018-04-01

Regression analyses of data from stratified, cluster sample, household iodine surveys in Bangladesh, India, Ghana and Senegal were conducted to identify factors associated with household access to adequately iodised salt. For all countries, in single variable analyses, household salt iodine was significantly different (p < 0.05) between strata (geographic areas with representative data, defined by survey design), and significantly higher (p < 0.05) among households: with better living standard scores, where the respondent knew about iodised salt and/or looked for iodised salt at purchase, using salt bought in a sealed package, or using refined grain salt. Other country-level associations were also found. Multiple variable analyses showed a significant association between salt iodine and strata (p < 0.001) in India, Ghana and Senegal and that salt grain type was significantly associated with estimated iodine content in all countries (p < 0.001). Salt iodine relative to the reference (coarse salt) ranged from 1.3 (95% CI 1.2, 1.5) times higher for fine salt in Senegal to 3.6 (95% CI 2.6, 4.9) times higher for washed and 6.5 (95% CI 4.9, 8.8) times higher for refined salt in India. Sub-national data are required to monitor equity of access to adequately iodised salt. Improving household access to refined iodised salt in sealed packaging would improve iodine intake from household salt in all four countries in this analysis, particularly in areas where there is significant small-scale salt production.

Biostatistics Series Module 6: Correlation and Linear Regression.

Science.gov (United States)

Hazra, Avijit; Gogtay, Nithya

2016-01-01

Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
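The quantities described in the abstract above (Pearson's r, Spearman's rho, the coefficient of determination, and the least-squares line y = a + bx) can be sketched in a few lines of standard-library Python. The data values and function names below are illustrative assumptions, not taken from the article.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + b x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b


# Made-up paired observations for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
a, b = least_squares(x, y)
print("r =", r, "r squared =", r ** 2)
print("rho =", spearman_rho(x, y))
print("fitted line: y =", a, "+", b, "* x")
```

As the abstract stresses, these numbers are only meaningful once a scatter plot has confirmed that the relationship is roughly linear and free of distorting outliers.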
Few crystal balls are crystal clear : eyeballing regression

International Nuclear Information System (INIS)

Wittebrood, R.T.

1998-01-01

The theory of regression and statistical analysis as it applies to reservoir analysis was discussed. It was argued that regression lines are not always the final truth. It was suggested that regression lines and eyeballed lines are often equally accurate. The many conditions that must be fulfilled to calculate a proper regression were discussed. Mentioned among these conditions were the distribution of the data, hidden variables, knowledge of how the data was obtained, the need for causal correlation of the variables, and knowledge of the manner in which the regression results are going to be used. 1 tab., 13 figs
A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

Science.gov (United States)

Smith, Paul F; Ganesh, Siva; Liu, Ping

2013-10-30

Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R(2) values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R(2) values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
Bayesian modeling of measurement error in predictor variables

NARCIS (Netherlands)

Fox, Gerardus J.A.; Glas, Cornelis A.W.

2003-01-01

It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, that may be defined at any level of an hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between
RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,

Science.gov (United States)

This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)
Neighborhood social capital and crime victimization: comparison of spatial regression analysis and hierarchical regression analysis.

Science.gov (United States)

Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro

2012-11-01

Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to residents of Arakawa Ward, Tokyo, aged 20-69 years. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.
Yet another look at MIDAS regression

NARCIS (Netherlands)

Ph.H.B.F. Franses (Philip Hans)

2016-01-01

A MIDAS regression involves a dependent variable observed at a low frequency and independent variables observed at a higher frequency. This paper relates a true high frequency data generating process, where also the dependent variable is observed (hypothetically) at the high frequency,
Sparse reduced-rank regression with covariance estimation

KAUST Repository

Chen, Lisha

2014-12-08

Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
Sparse reduced-rank regression with covariance estimation

KAUST Repository

Chen, Lisha; Huang, Jianhua Z.

2014-01-01

Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
Evaluating an Organizational-Level Occupational Health Intervention in a Combined Regression Discontinuity and Randomized Control Design.

Science.gov (United States)

Sørensen, By Ole H

2016-10-01

Organizational-level occupational health interventions have great potential to improve employees' health and well-being. However, they often compare unfavourably to individual-level interventions. This calls for improving methods for designing, implementing and evaluating organizational interventions. This paper presents and discusses the regression discontinuity design because, like the randomized control trial, it is a strong summative experimental design, but it typically fits organizational-level interventions better. The paper explores advantages and disadvantages of a regression discontinuity design with an embedded randomized control trial. It provides an example from an intervention study focusing on reducing sickness absence in 196 preschools. The paper demonstrates that such a design fits the organizational context, because it allows management to focus on organizations or workgroups with the most salient problems. In addition, organizations may accept an embedded randomized design because the organizations or groups with most salient needs receive obligatory treatment as part of the regression discontinuity design. Copyright © 2016 John Wiley & Sons, Ltd.
A regression modeling approach for studying carbonate system variability in the northern Gulf of Alaska

Science.gov (United States)

Evans, Wiley; Mathis, Jeremy T.; Winsor, Peter; Statscewich, Hank; Whitledge, Terry E.

2013-01-01

The northern Gulf of Alaska (GOA) shelf experiences carbonate system variability on seasonal and annual time scales, but little information exists to resolve higher frequency variability in this region. To resolve this variability using platforms-of-opportunity, we present multiple linear regression (MLR) models constructed from hydrographic data collected along the Northeast Pacific Global Ocean Ecosystems Dynamics (GLOBEC) Seward Line. The empirical algorithms predict dissolved inorganic carbon (DIC) and total alkalinity (TA) using observations of nitrate (NO₃⁻), temperature, salinity and pressure from the surface to 500 m, with R² values > 0.97 and RMSE values of 11 µmol kg⁻¹ for DIC and 9 µmol kg⁻¹ for TA. We applied these relationships to high-resolution NO₃⁻ data sets collected during a novel 20 h glider flight and a GLOBEC mesoscale SeaSoar survey. Results from the glider flight demonstrated time/space along-isopycnal variability of aragonite saturations (Ωarag) associated with a dicothermal layer (a cold near-surface layer found in high latitude oceans) that rivaled changes seen vertically through the thermocline. The SeaSoar survey captured the uplift to aragonite saturation horizon (depth where Ωarag = 1) shoaled to a previously unseen depth in the northern GOA. This work is similar to recent studies aimed at predicting the carbonate system in continental margin settings, but it demonstrates that a NO₃⁻-based approach can be applied to high-latitude data collected from platforms capable of high-frequency measurements.
Nonparametric instrumental regression with non-convex constraints

International Nuclear Information System (INIS)

Grasmair, M; Scherzer, O; Vanhems, A

2013-01-01

This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition. (paper)
Nonparametric instrumental regression with non-convex constraints

Science.gov (United States)

Grasmair, M.; Scherzer, O.; Vanhems, A.

2013-03-01

This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Principal component regression for crop yield estimation

CERN Document Server

Suryanarayana, T M V

2016-01-01

This book highlights the estimation of crop yield in Central Gujarat, especially with regard to the development of Multiple Regression Models and Principal Component Regression (PCR) models using climatological parameters as independent variables and crop yield as a dependent variable. It subsequently compares the multiple linear regression (MLR) and PCR results, and discusses the significance of PCR for crop yield estimation. In this context, the book also covers Principal Component Analysis (PCA), a statistical procedure used to reduce a number of correlated variables into a smaller number of uncorrelated variables called principal components (PC). This book will be helpful to the students and researchers, starting their works on climate and agriculture, mainly focussing on estimation models. The flow of chapters takes the readers in a smooth path, in understanding climate and weather and impact of climate change, and gradually proceeds towards downscaling techniques and then finally towards development of ...
Regression Analysis by Example. 5th Edition

Science.gov (United States)

Chatterjee, Samprit; Hadi, Ali S.

2012-01-01

Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Logistic Regression: Concept and Application

Science.gov (United States)

Cokluk, Omay

2010-01-01

The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
BOX-COX REGRESSION METHOD IN TIME SCALING

Directory of Open Access Journals (Sweden)

ATİLLA GÖKTAŞ

2013-06-01

The Box-Cox regression method, with power transformations λj for j = 1, 2, ..., k, can be used when the dependent variable and error term of the linear regression model do not satisfy the continuity and normality assumptions. The case in which the optimum power transformation λj, for j = 1, 2, ..., k, of Y yields the smallest mean square error is discussed. The Box-Cox regression method is especially appropriate for adjusting existing skewness or heteroscedasticity of error terms for a nonlinear functional relationship between dependent and explanatory variables. In this study, the advantages and disadvantages of the Box-Cox regression method are discussed in the differentiation and differential analysis of the time scale concept.
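As a rough illustration of the power transformation discussed in this abstract, the sketch below implements the standard one-parameter Box-Cox transform and picks λ from a grid by maximizing the usual profile log-likelihood under a normal model. This is an assumption-laden simplification: it handles a single λ for a sample without regressors, not the λj-per-variable scheme or the mean-square-error criterion of the article, and the sample values and grid are made up.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam is 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


def profile_log_likelihood(data, lam):
    """Normal-model profile log-likelihood of lam, up to an additive constant."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    z_mean = sum(z) / n
    variance = sum((v - z_mean) ** 2 for v in z) / n
    jacobian = (lam - 1.0) * sum(math.log(y) for y in data)
    return -0.5 * n * math.log(variance) + jacobian


def best_lambda(data, grid):
    """Grid value of lam with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))


# Made-up right-skewed sample whose logarithms are evenly spaced around zero.
sample = [math.exp(v) for v in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
grid = [i / 10 for i in range(-20, 21)]
chosen = best_lambda(sample, grid)
print("selected lambda:", chosen)
```

For regression use, the variance term would be replaced by the residual variance of the fitted model on the transformed response; the grid search itself stays the same.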
Regression models for variables expressed as a continuous proportion

Directory of Open Access Journals (Sweden)

Aarón Salinas-Rodríguez

2006-10-01

the Public Health field. MATERIAL AND METHODS: From the National Reproductive Health Survey performed in 2003, the proportion of individual coverage in the family planning program, proposed in one study carried out in the National Institute of Public Health in Cuernavaca, Morelos, Mexico (2005), was modeled using the Normal, Gamma, Beta and quasi-likelihood regression models. The Akaike Information Criterion (AIC) proposed by McQuarrie and Tsai was used to define the best model. Then, using a simulation (Monte Carlo/Markov Chains) approach, a variable with a Beta distribution was generated to evaluate the behavior of the 4 models while varying the sample size from 100 to 18 000 observations. RESULTS: Results showed that the best statistical option for the analysis of continuous proportions was the Beta regression model, since its assumptions are easily accomplished and because it had the lowest AIC value. Simulation evidenced that while the sample size increases the Gamma, and even more so the quasi-likelihood, models come significantly close to the Beta regression model. CONCLUSIONS: The use of parametric Beta regression is highly recommended to model continuous proportions and the normal model should be avoided. If the sample size is large enough, the use of quasi-likelihood model represents a good alternative.
Association among retinol-binding protein 4, small dense LDL cholesterol and oxidized LDL levels in dyslipidemia subjects.

Science.gov (United States)

Wu, Jia; Shi, Yong-hui; Niu, Dong-mei; Li, Han-qing; Zhang, Chun-ni; Wang, Jun-jun

2012-06-01

To investigate retinol-binding protein 4 (RBP4), small dense low-density lipoprotein cholesterol (sdLDL-C) and oxidized low-density lipoprotein (ox-LDL) levels and their associations in dyslipidemia subjects. We determined RBP4, sdLDL-C, ox-LDL levels in 150 various dyslipidemia subjects and 50 controls. The correlation analysis and multiple linear regression analysis were performed. The RBP4, sdLDL-C and ox-LDL levels were found increased in various dyslipidemia subjects. The sdLDL-C levels were positively correlated with RBP4 (r=0.273, P=0.001) and ox-LDL (r=0.273, P=0.001). RBP4 levels were also correlated with ox-LDL (r=0.167, P=0.043). The multiple regression analysis showed that only sdLDL-C was a significant independent predictor for RBP4 (β coefficient=0.219, P=0.009; adjusted R(2)=0.041) and ox-LDL (β coefficient=0.253, P=0.003; adjusted R(2)=0.057) levels, respectively. The independent associations of sdLDL-C with RBP4 and ox-LDL were observed in dyslipidemia subjects. RBP4 may play an important role in lipid metabolism of atherosclerosis, particularly in formation of sdLDL. Copyright © 2012 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

Science.gov (United States)

Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

2013-02-01

Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local

Heterozygosity level and its relationship with genetic variability mechanisms in beans

Directory of Open Access Journals (Sweden)

Rita Carolina de Melo

Heterozygosity is an extremely important resource in early breeding programs using autogamous plants because it is usually associated with the presence of genetic variability. Induced mutation and artificial hybridization can increase distinctly the proportion of loci in heterozygosis. This study aimed to compare segregating and mutant populations and relate the mechanisms used to generate variability with their respective heterozygosity levels tested. The treatments mutant populations (M2, M3, M4, M5, M6 and M7), segregating populations (F4, F5 and F6) and lines (BRS Pérola and IPR Uirapuru) were evaluated by multivariate analysis and compared by orthogonal contrasts. The canonical discriminant analysis revealed which response variables contributed to differentiate the treatments assessed. All orthogonal contrasts involving the mutant populations showed significant differences, except the contrast between M2 vs. M3, M4, M5, M6, M7. The orthogonal contrast between the mutant and segregating populations denotes a significant variation in the interest in genetic breeding. The traits stem diameter (1.41) and number of legumes per plant (2.72) showed the highest canonical weight in this contrast. Conversely, number of grains per plant (-3.58) approached the mutant and segregating populations. No significant difference was observed in the linear comparison of means F5 vs. F6. The traits are fixed early in the segregant populations, unlike the mutant populations. Comparatively, induced mutation provides more loci in heterozygosis than artificial hybridization. Selection pressure should vary according to the variability creation mechanism used at the beginning of the breeding program.
Significance tests to determine the direction of effects in linear regression models.

Science.gov (United States)

Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander

2015-02-01

Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. © 2014 The British Psychological Society.
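The central idea of this abstract, that residuals of the correctly specified model are more symmetric than those of the reversed model when the variables are non-normal, can be illustrated with a small simulation. This is only a sketch of the principle, not any of the five significance tests the authors discuss; the data-generating process (an exponential predictor, normal errors, the coefficients, the seed) is an assumption chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_residuals(predictor, response):
    """Residuals from the simple least-squares regression of response on predictor."""
    mp, mr = mean(predictor), mean(response)
    slope = (sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
             / sum((p - mp) ** 2 for p in predictor))
    intercept = mr - slope * mp
    return [r - (intercept + slope * p) for p, r in zip(predictor, response)]


def skewness(values):
    """Moment-based sample skewness: third central moment over variance**1.5."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Assumed true model: X is skewed (exponential) and causes Y, with normal error.
rng = random.Random(42)
n = 5000
x = [rng.expovariate(1.0) for _ in range(n)]
y = [0.5 + 0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]

skew_y_on_x = skewness(ols_residuals(x, y))  # correctly specified direction
skew_x_on_y = skewness(ols_residuals(y, x))  # reversed direction

print("residual skewness, Y regressed on X:", skew_y_on_x)
print("residual skewness, X regressed on Y:", skew_x_on_y)
```

In this setup the residuals of Y on X are close to the symmetric error term, while the residuals of X on Y inherit part of the skewness of X. A formal decision would still require one of the significance tests described in the paper.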
Principal component regression analysis with SPSS.

Science.gov (United States)

Liu, R X; Kuang, J; Gong, Q; Hou, X L

2003-06-01

The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.
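The paper works through principal component regression in SPSS 10.0; the same idea can be sketched in a few lines of NumPy. The function name `pcr_fit`, the simulated collinear predictors and the choice of two retained components are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression; returns (intercept, coefs) on the original X scale."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    Z = (X - x_mean) / x_std                      # standardize predictors
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    V = vt[:n_components].T                       # p x k loadings
    scores = Z @ V                                # n x k component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    coefs = (V @ gamma) / x_std                   # map back to original predictors
    intercept = y.mean() - x_mean @ coefs
    return intercept, coefs

# Simulated example with two nearly collinear predictors (illustrative only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + x1 + x2 + 0.5 * x3 + rng.normal(size=n)

intercept_2, coefs_2 = pcr_fit(X, y, n_components=2)  # drops the near-degenerate direction
intercept_3, coefs_3 = pcr_fit(X, y, n_components=3)  # all components: reproduces OLS
ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
print("PCR (2 components):", coefs_2)
print("OLS:               ", ols[1:])
```

Keeping all components reproduces ordinary least squares exactly, which is a convenient sanity check; dropping the smallest component trades a little bias for much more stable coefficients on the collinear pair.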
Indo-Pacific sea level variability during recent decades

Science.gov (United States)

Yamanaka, G.; Tsujino, H.; Nakano, H.; Urakawa, S. L.; Sakamoto, K.

2016-12-01

Decadal variability of sea level in the Indo-Pacific region is investigated using a historical OGCM simulation. The OGCM driven by the atmospheric forcing removing long-term trends clearly exhibits decadal sea level variability in the Pacific Ocean, which is associated with eastern tropical Pacific thermal anomalies. During the period of 1977-1987, the sea level anomalies are positive in the eastern equatorial Pacific and show deviations from a north-south symmetric distribution, with strongly negative anomalies in the western tropical South Pacific. During the period of 1996-2006, in contrast, the sea level anomalies are negative in the eastern equatorial Pacific and show a nearly north-south symmetric pattern, with positive anomalies in both hemispheres. Concurrently, sea level anomalies in the south-eastern Indian Ocean vary with those in the western tropical Pacific. These sea level variations are closely related to large-scale wind fields. Indo-Pacific sea level distributions are basically determined by wind anomalies over the equatorial region as well as wind stress curl anomalies over the off-equatorial region.
Poisson Mixture Regression Models for Heart Disease Prediction.

Science.gov (United States)

Mufudza, Chipo; Erol, Hamza

2016-01-01

Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Poisson Mixture Regression Models for Heart Disease Prediction

Science.gov (United States)

Erol, Hamza

2016-01-01

Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
Area-level poverty and preterm birth risk: A population-based multilevel analysis

Science.gov (United States)

DeFranco, Emily A; Lian, Min; Muglia, Louis A; Schootman, Mario

2008-01-01

Background Preterm birth is a complex disease with etiologic influences from a variety of social, environmental, hormonal, genetic, and other factors. The purpose of this study was to utilize a large population-based birth registry to estimate the independent effect of county-level poverty on preterm birth risk. To accomplish this, we used a multilevel logistic regression approach to account for multiple co-existent individual-level variables and county-level poverty rate. Methods Population-based study utilizing Missouri's birth certificate database (1989–1997). We conducted a multilevel logistic regression analysis to estimate the effect of county-level poverty on PTB risk. Of 634,994 births nested within 115 counties in Missouri, two levels were considered. Individual-level variables included demographics factors, prenatal care, health-related behavioral risk factors, and medical risk factors. The area-level variable included the percentage of the population within each county living below the poverty line (US census data, 1990). Counties were divided into quartiles of poverty; the first quartile (lowest rate of poverty) was the reference group. Results PTB rate of PTB poverty and increased through the 4th quartile (4.9%), p poverty was significantly associated with PTB risk. PTB risk (poverty, adjusted odds ratio (adjOR) 1.18 (95% CI 1.03, 1.35), with a similar effect at earlier gestational ages (birth, above other underlying risk factors. Although the risk increase is modest, it affects a large number of pregnancies. PMID:18793437
Density dependence and climate effects in Rocky Mountain elk: an application of regression with instrumental variables for population time series with sampling error.

Science.gov (United States)

Creel, Scott; Creel, Michael

2009-11-01

1. Sampling error in annual estimates of population size creates two widely recognized problems for the analysis of population growth. First, if sampling error is mistakenly treated as process error, one obtains inflated estimates of the variation in true population trajectories (Staples, Taper & Dennis 2004). Second, treating sampling error as process error is thought to overestimate the importance of density dependence in population growth (Viljugrein et al. 2005; Dennis et al. 2006). 2. In ecology, state-space models are used to account for sampling error when estimating the effects of density and other variables on population growth (Staples et al. 2004; Dennis et al. 2006). In econometrics, regression with instrumental variables is a well-established method that addresses the problem of correlation between regressors and the error term, but requires fewer assumptions than state-space models (Davidson & MacKinnon 1993; Cameron & Trivedi 2005). 3. We used instrumental variables to account for sampling error and fit a generalized linear model to 472 annual observations of population size for 35 Elk Management Units in Montana, from 1928 to 2004. We compared this model with state-space models fit with the likelihood function of Dennis et al. (2006). We discuss the general advantages and disadvantages of each method. Briefly, regression with instrumental variables is valid with fewer distributional assumptions, but state-space models are more efficient when their distributional assumptions are met. 4. Both methods found that population growth was negatively related to population density and winter snow accumulation. Summer rainfall and wolf (Canis lupus) presence had much weaker effects on elk (Cervus elaphus) dynamics [though limitation by wolves is strong in some elk populations with well-established wolf populations (Creel et al. 2007; Creel & Christianson 2008)]. 5. Coupled with predictions for Montana from global and regional climate models, our results
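The point that instrumental variables correct for error in a regressor can be sketched with a generic errors-in-variables simulation. This is not the elk model from the paper: the variable names, the unit error variances, the true slope of 2 and the use of an independent second measurement as the instrument are all assumptions chosen for illustration.

```python
import numpy as np

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(1)
n = 20000
true_slope = 2.0
x_true = rng.normal(size=n)              # unobserved true regressor
x_obs = x_true + rng.normal(size=n)      # regressor observed with sampling error
z = x_true + rng.normal(size=n)          # instrument: independent second measurement
y = true_slope * x_true + rng.normal(size=n)

# OLS on the noisy regressor is attenuated toward zero.
ols_slope = cov(x_obs, y) / cov(x_obs, x_obs)
# Simple IV estimator: uses only the variation in x_obs that is shared with z.
iv_slope = cov(z, y) / cov(z, x_obs)
print(f"OLS slope: {ols_slope:.3f}, IV slope: {iv_slope:.3f}, truth: {true_slope}")
```

Because the instrument's error is independent of the error in the observed regressor, the ratio of covariances is consistent for the true slope, whereas OLS is biased toward zero by the error variance.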
Advanced colorectal neoplasia risk stratification by penalized logistic regression.

Science.gov (United States)

Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F

2016-08-01

Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interactions terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
Nonlinear Forecasting With Many Predictors Using Kernel Ridge Regression

DEFF Research Database (Denmark)

Exterkate, Peter; Groenen, Patrick J.F.; Heij, Christiaan

This paper puts forward kernel ridge regression as an approach for forecasting with many predictors that are related nonlinearly to the target variable. In kernel ridge regression, the observed predictor variables are mapped nonlinearly into a high-dimensional space, where estimation of the predi...
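A minimal sketch of kernel ridge regression, assuming a Gaussian (RBF) kernel and a noise-free sine target; the kernel choice, the penalty `lam`, the bandwidth `gamma` and all function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam, gamma):
    """Dual coefficients alpha = (K + lam * I)^{-1} y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    """Predictions are kernel-weighted sums of the dual coefficients."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Nonlinear target that a linear ridge regression could not capture.
X_train = np.linspace(-3, 3, 50)[:, None]
y_train = np.sin(X_train[:, 0])
alpha = krr_fit(X_train, y_train, lam=1e-3, gamma=0.5)

X_new = np.linspace(-2.5, 2.5, 11)[:, None]
y_pred = krr_predict(X_train, alpha, X_new, gamma=0.5)
max_err = float(np.max(np.abs(y_pred - np.sin(X_new[:, 0]))))
print(f"largest absolute prediction error: {max_err:.4f}")
```

The high-dimensional feature map never has to be formed explicitly: only the n x n kernel matrix is needed, which is what makes the approach practical with many predictors.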
Relationship between the curve of Spee and craniofacial variables: A regression analysis.

Science.gov (United States)

Halimi, Abdelali; Benyahia, Hicham; Azeroual, Mohamed-Faouzi; Bahije, Loubna; Zaoui, Fatima

2018-06-01

The aim of this regression analysis was to identify the determining factors, which impact the curve of Spee during its genesis, its therapeutic reconstruction, and its stability, within a continuously evolving craniofacial morphology throughout life. We selected a total of 107 patients, according to the inclusion criteria. A morphological and functional clinical examination was performed for each patient: plaster models, tracing of the curve of Spee, crowding, Angle's classification, overjet and overbite were thus recorded. Then, we made a cephalometric analysis based on the standardized lateral cephalograms. In the sagittal dimension, we measured the values of angles ANB, SNA, SNB, SND, I/i; and the following distances: AoBo, I/NA, i/NB, SE and SL. In the vertical dimension, we measured the values of angles FMA, GoGn/SN, the occlusal plane, and the following distances: SAr, ArD, Ar/Con, Con/Gn, GoPo, HFP, HFA and IF. The statistical analysis was performed using the SPSS software with a significance level of 0.05. Our sample of 107 subjects was composed of 77 female patients (71.3%) and 30 male patients (27.8%), with 7 hypodivergent patients (6.5%), 56 hyperdivergent patients (52.3%) and 44 normodivergent patients (41.1%). Patients' mean age was 19.35±5.95 years. The hypodivergent patients presented more pronounced curves of Spee compared to the normodivergent and the hyperdivergent populations; patients in skeletal Class I presented less pronounced curves of Spee compared to patients in skeletal Class II and Class III. These differences were not significant (P>0.05). The curve of Spee was positively and moderately correlated with Angle's classification, overjet, overbite, sellion-articulare distance, and breathing type (P0.05). Seventy-five percent (75%) of the hyperdivergent patients with oral breathing presented an overbite of 3mm, which is quite excessive given the characteristics often admitted for this typology; this parameter could explain the overbite
Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

Science.gov (United States)

Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

2015-11-01

One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used the multivariate adaptive regression splines (MARS) model and the semi-parametric splines technique for predicting stock price in this study. The MARS model, as a nonparametric method, is an adaptive method for regression and is suited to problems with high dimensions and several variables. The semi-parametric splines technique was also used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and the semi-parametric splines technique. After investigating the models, we selected 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables for predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.
Casemix funding for a specialist paediatrics hospital: a hedonic regression approach.

Science.gov (United States)

Bridges, J F; Hanson, R M

2000-01-01

This paper inquires into the effects that Diagnosis Related Groups (DRGs) have had on the ability to explain patient-level costs in a specialist paediatrics hospital. Two hedonic models are estimated using 1996/97 New Children's Hospital (NCH) patient level cost data, one with and one without a casemix index (CMI). The results show that the inclusion of a casemix index as an explanatory variable leads to a better accounting of cost. The full hedonic model is then used to simulate a funding model for the 1997/98 NCH cost data. These costs are highly correlated with the actual costs reported for that year. In addition, univariate regression indicates that there has been inflation in costs in the order of 4.8% between the two years. In conclusion, hedonic analysis can provide valuable evidence for the design of funding models that account for casemix.
Sea-level rise impacts on the temporal and spatial variability of extreme water levels: A case study for St. Peter-Ording, Germany

Science.gov (United States)

Santamaria-Aguilar, S.; Arns, A.; Vafeidis, A. T.

2017-04-01

Both the temporal and spatial variability of storm surge water level (WL) curves are usually not taken into account in flood risk assessments as observational data are often scarce. In addition, sea-level rise (SLR) can further affect the variability of WLs. We analyze the temporal and spatial variability of the WL curve of 75 historical storm surge events that have been numerically simulated for St. Peter-Ording at the German North Sea coast, considering the effects induced by three SLR scenarios (RCP 4.5, RCP 8.5, and a RCP 8.5 high end scenario). We assess potential impacts of these scenarios on two parameters related to flooding: overflow volumes and fullness. Our results indicate that due to both the temporal and spatial variability of those events the resulting overflow volume can be two or even three times greater. We observe a steepening of the WL curve with an increase of the tidal range under the three SLR scenarios, although SLR induced effects are relatively higher for the RCP 4.5. The steepening of the WL curve with SLR produces a reduction of the fullness, but the changes in overflow volumes also depend on the magnitude of the storm surge event.
Tutorial on Using Regression Models with Count Outcomes Using R

Directory of Open Access Journals (Sweden)

A. Alexander Beaujean

2016-02-01

Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used to run the example analyses is included in the Appendix.
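The tutorial itself uses R; as a language-neutral illustration of what a Poisson regression fit involves, the following sketch implements iteratively reweighted least squares for the log link in NumPy. The simulated data, the true coefficients and the function name `poisson_irls` are assumptions for illustration and are unrelated to the skipped-class data set used in the article.

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit a log-link Poisson regression by iteratively reweighted least squares.

    X must contain a column of ones as its first column (the intercept).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta                    # linear predictor
        mu = np.exp(eta)                  # fitted mean counts
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # working weights equal mu for the log link
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Simulated count outcome with one predictor (illustrative values).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = poisson_irls(X, y)
print("estimated coefficients:", beta_hat)
print("rate ratio per unit of x:", np.exp(beta_hat[1]))
```

Coefficients are on the log scale, so exponentiating a slope gives the multiplicative change in the expected count per unit increase in the predictor.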
Better Autologistic Regression

Directory of Open Access Journals (Sweden)

Mark A. Wolters

2017-11-01

Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine) to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding—the two numbers used to represent the two possible states of the variables—might differ. Common coding choices are (zero, one) and (minus one, plus one). Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.
Genetic variability, partial regression, Co-heritability studies and their implication in selection of high yielding potato gen

International Nuclear Information System (INIS)

Iqbal, Z.M.; Khan, S.A.

2003-01-01

Partial regression coefficient, genotypic and phenotypic variabilities, heritability, co-heritability and genetic advance were studied in 15 potato varieties of exotic and local origin. Both genotypic and phenotypic coefficients of variation were high for scab and rhizoctonia incidence percentage. A significant partial regression coefficient for emergence percentage indicated its relative importance in tuber yield. High heritability (broad-sense) estimates coupled with high genetic advance for plant height, number of stems per plant and scab percentage revealed a substantial contribution of additive genetic variance in the expression of these traits. Hence, selection based on these characters could play a significant role in their improvement. The dominance and epistatic variance was more important for character expression of yield ha⁻¹, emergence and rhizoctonia percentage. This phenomenon is mainly due to the accumulative effects of low heritability and low to moderate genetic advance. The high co-heritability coupled with negative genotypic and phenotypic covariance revealed that selection of varieties having low scab and rhizoctonia percentage resulted in more potato yield. (author)
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

Science.gov (United States)

Gorgees, HazimMansoor; Mahdi, FatimahAssim

2018-05-01

This article is concerned with comparing the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE) than the other stated methods.
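For reference, the ordinary ridge estimator has the closed form (X'X + kI)^{-1} X'y. The sketch below applies it to simulated, nearly collinear predictors and reports the condition number of X'X. The data, the grid of k values and the function name are assumptions for illustration; the specific rule for choosing k from the condition number that the article evaluates is not reproduced here.

```python
import numpy as np

def ridge(X, y, k):
    """Ordinary ridge estimator (X'X + k I)^{-1} X'y for standardized X and centered y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Two nearly collinear predictors (illustrative simulation).
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X_raw = np.column_stack([x1, x2])
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
y_raw = 3.0 * x1 + rng.normal(size=n)
y = y_raw - y_raw.mean()

condition_number = float(np.linalg.cond(X.T @ X))
ks = [0.0, 0.1, 1.0, 10.0]
betas = [ridge(X, y, k) for k in ks]
norms = [float(np.linalg.norm(b)) for b in betas]
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"condition number of X'X: {condition_number:.0f}")
for k, b in zip(ks, betas):
    print(f"k = {k:>4}: coefficients = {b}")
```

Setting k = 0 recovers least squares, and the coefficient vector shrinks monotonically as k grows, which is the mechanism by which ridge regression stabilizes estimates under near-exact linear relationships.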
Implicit collinearity effect in linear regression: Application to basal ...

African Journals Online (AJOL)

Collinearity of predictor variables is a severe problem in the least square regression analysis. It contributes to the instability of regression coefficients and leads to a wrong prediction accuracy. Despite these problems, studies are conducted with a large number of observed and derived variables linked with a response ...
Regression assumptions in clinical psychology research practice-a systematic review of common misconceptions.

Science.gov (United States)

Ernst, Anja F; Albers, Casper J

2017-01-01

Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.

Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

Science.gov (United States)

Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

2017-12-01

The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Radon in Rented Accommodation and Variables Determining Its Level

DEFF Research Database (Denmark)

Rasmussen, Torben Valdbjørn

2017-01-01

Indoor radon levels were measured in 221 homes in rented accommodation. In addition, buildings were registered for a series of variables describing building characteristics and used materials. The mean year value of the indoor radon level was 30.7 (1~250) Bq/m3. The indoor radon level exceeded 100 Bq/m3 in 5.9% of the homes. Of the investigated variables, only homes in single-family terraced houses were statistically significant. Approx. 75% of homes exceeding 100 Bq/m3 indoor radon level had levels between 100 and 200 Bq/m3 and 25% had indoor radon levels exceeding 200 Bq/m3. Significant differences in indoor radon levels were found in homes located in multi-occupant houses. Additionally, the risk of indoor radon levels exceeding 100 Bq/m3 in homes in multi-occupant houses was found to be very low, but the risk was the highest on the ground floor in a building constructed with slab on ground.
Comparison of Classical Linear Regression and Orthogonal Regression According to the Sum of Squares Perpendicular Distances

OpenAIRE

KELEŞ, Taliha; ALTUN, Murat

2016-01-01

Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was the trivial presentation of the equation for orthogonal regression (OR) and the comparison of classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were shown by an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...
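The comparison described above can be sketched directly: fit the classical least-squares line and the orthogonal regression line, then compute the sum of squared perpendicular distances for each. The simulated data and function names are assumptions for illustration; the orthogonal line is obtained here from the leading eigenvector of the sample covariance matrix, which is one standard way to compute it and not necessarily the derivation used in the article.

```python
import numpy as np

def perp_ss(x, y, intercept, slope):
    """Sum of squared perpendicular distances to the line y = intercept + slope * x."""
    return float(np.sum((y - intercept - slope * x) ** 2) / (1.0 + slope**2))

def clr_line(x, y):
    """Classical linear regression of y on x (minimizes vertical distances)."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean(), slope

def or_line(x, y):
    """Orthogonal regression: line through the centroid along the first principal axis."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, y))
    vx, vy = eigvecs[:, -1]               # eigenvector of the largest eigenvalue
    slope = vy / vx
    return y.mean() - slope * x.mean(), slope

# Illustrative data only.
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

ss_clr = perp_ss(x, y, *clr_line(x, y))
ss_or = perp_ss(x, y, *or_line(x, y))
print(f"perpendicular SS, classical:  {ss_clr:.3f}")
print(f"perpendicular SS, orthogonal: {ss_or:.3f}")
```

Orthogonal regression minimizes this criterion over all non-vertical lines, so its perpendicular sum of squares can never exceed that of the classical line.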
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

Science.gov (United States)

Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

2017-04-01

A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity between explanatory variables biases the parameter estimates and also causes classification errors. In general, stepwise regression is used to overcome multicollinearity in regression. Another method for overcoming multicollinearity, which involves all variables in prediction, is Principal Component Analysis (PCA). However, classical PCA applies only to numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of contraceptive method use by the stepwise method (58.66%) is better than by the logistic regression model (57.39%) and CATPCA (57.39%). Evaluating the logistic regression results by sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and the stepwise method (92.05%). Because this study focuses on classifying the major class (using a contraceptive method), the selected model is CATPCA, since it raises the accuracy for the major class.
Individual relocation decisions after tornadoes: a multi-level analysis.

Science.gov (United States)

Cong, Zhen; Nejat, Ali; Liang, Daan; Pei, Yaolin; Javid, Roxana J

2018-04-01

This study examines how multi-level factors affected individuals' relocation decisions after EF4 and EF5 (Enhanced Fujita Tornado Intensity Scale) tornadoes struck the United States in 2013. A telephone survey was conducted with 536 respondents, including oversampled older adults, one year after these two disaster events. Respondents' addresses were used to associate individual information with block group-level variables recorded by the American Community Survey. Logistic regression revealed that residential damage and homeownership are important predictors of relocation. There was also significant interaction between these two variables, indicating less difference between homeowners and renters at higher damage levels. Homeownership diminished the likelihood of relocation among younger respondents. Random effects logistic regression found that the percentage of homeownership and of higher income households in the community buffered the effect of damage on relocation; the percentage of older adults reduced the likelihood of this group relocating. The findings are assessed from the standpoint of age difference, policy implications, and social capital and vulnerability. © 2018 The Author(s). Disasters © Overseas Development Institute, 2018.
[Logistic regression model of noninvasive prediction for portal hypertensive gastropathy in patients with hepatitis B associated cirrhosis].

Science.gov (United States)

Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo

2015-05-12

To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
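The reported equation can be turned into a predicted probability with the inverse logit, P = 1 / (1 + exp(-Logit P)). The sketch below uses the published coefficients, but the abstract does not state how X1 to X4 are coded or scaled, so the function names and the example inputs are hypothetical and serve only to show the arithmetic.

```python
import math

# Coefficients as reported in the abstract.
COEFS = {"intercept": -2.667, "x1": 2.186, "x2": -2.167, "x3": 0.725, "x4": 0.976}

def phg_logit(x1, x2, x3, x4):
    """Linear predictor Logit P = -2.667 + 2.186*X1 - 2.167*X2 + 0.725*X3 + 0.976*X4."""
    return (COEFS["intercept"] + COEFS["x1"] * x1 + COEFS["x2"] * x2
            + COEFS["x3"] * x3 + COEFS["x4"] * x4)

def phg_probability(x1, x2, x3, x4):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-phg_logit(x1, x2, x3, x4)))

# Hypothetical inputs: the coding of X1..X4 is an assumption, not given in the abstract.
p_baseline = phg_probability(0, 0, 0, 0)
p_x1_only = phg_probability(1, 0, 0, 0)
odds_ratio_x1 = math.exp(COEFS["x1"])
print(f"baseline probability: {p_baseline:.4f}")
print(f"probability with X1 = 1: {p_x1_only:.4f}")
print(f"odds ratio for a one-unit increase in X1: {odds_ratio_x1:.2f}")
```

Exponentiating a coefficient gives the odds ratio for a one-unit change in that variable, which is how the per-factor odds ratios mentioned in the abstract relate to the fitted equation.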
Influence of school-level and family-level variables on Chinese college students' aggression.

Science.gov (United States)

Zhou, Jiawei; Yang, Jiarun; Yu, Yunmiao; Wang, Lin; Han, Dong; Zhu, Xiongzhao; He, Jincai; Qiu, Xiaohui; Yang, Xiuxian; Qiao, Zhengxue; Sui, Hong; Yang, Yanjie

2017-08-01

With the frequent occurrence of campus violence, scholars have devoted increasing attention to college students' aggression. This study aims to estimate the prevalence of aggression in Chinese university students and identify factors that could influence their aggression. We can thus find methods to reduce the incidence of college students' aggression in the future. A multi-stage stratified sampling procedure was used to select university students (N = 4565) aged 16-25 years in Harbin. The Aggression Questionnaire, the Adolescent Self-Rating Life Events Checklist and the Social Support Revalued Scale were used to collect data. Females reported lower levels of aggression than males (p […] aggression, and the model was highly significant (R² = .233, adjusted R² = .230, p […] aggression is affected by gender, family-level and school-level variables. Aggression scores are significantly correlated with not only family-level or school-level variables independently, but their combination as well. We find that the risk factors for aggression include a dissatisfying profession, higher levels of study pressure, poor parental relationships, poor interpersonal relationships, the presence of siblings, punishment, health maladjustment, less subjective support, and lower levels of utilization of social support.
Detecting overdispersion in count data: A zero-inflated Poisson regression analysis

Science.gov (United States)

Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah

2017-09-01

This study analyses count data on butterfly communities in Jasin, Melaka. For a count dependent variable, the Poisson regression model is known as the benchmark model for regression analysis. Continuing from previous literature that used Poisson regression analysis, this study uses zero-inflated Poisson (ZIP) regression analysis to analyse the butterfly count data more precisely. Poisson regression should be abandoned in favour of count data models that take the extra zeros into account explicitly, and the ZIP regression model is one of the most popular of these. The data, referred to in this study as the number of subjects, were collected in Jasin, Melaka and consisted of 131 observations of subjects visiting Jasin, Melaka. The data set covers five families of butterfly, which represent the five variables (the types of subjects) involved in the analysis. The ZIP analysis used the SAS procedure for overdispersion to analyse zero values, and the main purpose of continuing the previous study is to compare which model performs better when the count data contain zero values. The analysis used AIC, BIC and the Vuong test at the 5% significance level to achieve the objectives. The findings indicate the presence of overdispersion in analysing zero values. The ZIP regression model is better than the Poisson regression model when zero values exist.
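The contrast between the two models can be sketched in a few lines. The example below is an illustration only: it defines the standard Poisson and ZIP probability mass functions and a simple variance-to-mean dispersion index. The counts are made up and are not the butterfly data, and the study itself used SAS, not Python.

```python
import math
from statistics import mean, variance


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a draw from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def dispersion_index(counts):
    """Sample variance divided by sample mean; a Poisson model implies a value near 1."""
    return variance(counts) / mean(counts)


# Made-up counts with many zeros (assumption, for illustration only).
counts = [0, 0, 0, 0, 0, 0, 3, 4, 2, 5, 0, 0, 6, 3, 0, 4]
print(dispersion_index(counts))          # well above 1 suggests overdispersion
print(poisson_pmf(0, 2.0), zip_pmf(0, 2.0, 0.3))  # ZIP puts more mass on zero
```

A dispersion index well above 1 is only a first hint; formal comparisons such as AIC, BIC and the Vuong test named in the abstract require fitting both models.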
Uncertainties in Future Regional Sea Level Trends: How to Deal with the Internal Climate Variability?

Science.gov (United States)

Becker, M.; Karpytchev, M.; Hu, A.; Deser, C.; Lennartz-Sassinek, S.

2017-12-01

Today, climate models (CM) are the main tools for forecasting sea level rise (SLR) at global and regional scales. The CM forecasts are accompanied by inherent uncertainties. Understanding and reducing these uncertainties is becoming a matter of increasing urgency in order to provide robust estimates of SLR impact on coastal societies, which need sustainable choices of climate adaptation strategy. These CM uncertainties are linked to structural model formulation, initial conditions, emission scenario and internal variability. The internal variability is due to complex non-linear interactions within the Earth Climate System and can induce diverse quasi-periodic oscillatory modes and long-term persistences. To quantify the effects of internal variability, most studies used multi-model ensembles or sea level projections from a single model run with perturbed initial conditions. However, large ensembles are generally either unavailable or too small, and they are computationally expensive. In this study, we use a power-law scaling of sea level fluctuations, as observed in many other geophysical signals and natural systems, which can be used to characterize the internal climate variability. From this specific statistical framework, we (1) use the pre-industrial control run of the National Center for Atmospheric Research Community Climate System Model (NCAR-CCSM) to test the robustness of the power-law scaling hypothesis; (2) employ the power-law statistics as a tool for assessing the spread of regional sea level projections due to the internal climate variability for the 21st century NCAR-CCSM; (3) compare the uncertainties in predicted sea level changes obtained from NCAR-CCSM multi-member ensemble simulations with estimates derived for power-law processes; and (4) explore the sensitivity of spatial patterns of the internal variability and its effects on regional sea level projections.
Logistic regression applied to natural hazards: rare event logistic regression with replications

Science.gov (United States)

Guns, M.; Vanacker, V.

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Linear regression in astronomy. I

Science.gov (United States)

Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

1990-01-01

Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
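The five slope estimators named above can be sketched from their standard textbook definitions. The code below is an illustration written from those definitions, not a transcription of the paper's formulas, and it omits the uncertainty estimates the paper provides. It assumes the sample covariance is nonzero; the function name and data are hypothetical.

```python
import math


def five_slopes(x, y):
    """Slopes of five bivariate line fits, all expressed as dy/dx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = math.copysign(1.0, sxy)

    b1 = sxy / sxx  # OLS of Y on X
    b2 = syy / sxy  # OLS of X on Y, inverted so it is a slope in the (x, y) plane
    # Line bisecting the angle between the two OLS lines.
    b3 = (b1 * b2 - 1.0 + math.sqrt((1.0 + b1**2) * (1.0 + b2**2))) / (b1 + b2)
    # Orthogonal regression: minimizes perpendicular distances.
    d = b2 - 1.0 / b1
    b4 = 0.5 * (d + sign * math.sqrt(4.0 + d**2))
    # Reduced major axis: geometric mean of the two OLS slopes.
    b5 = sign * math.sqrt(b1 * b2)
    return {"ols_yx": b1, "ols_xy": b2, "bisector": b3, "orthogonal": b4, "rma": b5}


# Hypothetical noisy data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
for name, slope in five_slopes(x, y).items():
    print(f"{name}: {slope:.4f}")
```

For each slope, the intercept follows as mean(y) minus slope times mean(x), since all five lines pass through the centroid of the data.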
Stochastic variability in stress, sleep duration, and sleep quality across the distribution of body mass index: insights from quantile regression.

Science.gov (United States)

Yang, Tse-Chuan; Matthews, Stephen A; Chen, Vivian Y-J

2014-04-01

Obesity has become a problem in the USA and identifying modifiable factors at the individual level may help to address this public health concern. A burgeoning literature has suggested that sleep and stress may be associated with obesity; however, little is known about whether these two factors moderate each other and even less is known about whether their impacts on obesity differ by gender. This study investigates whether sleep and stress are each associated with body mass index (BMI), explores whether the combination of stress and sleep is also related to BMI, and demonstrates how these associations vary across the distribution of BMI values. We analyze the data from 3,318 men and 6,689 women in the Philadelphia area using quantile regression (QR) to evaluate the relationships between sleep, stress, and obesity by gender. Our substantive findings include: (1) high and/or extreme stress were related to an increase of roughly 1.2 in BMI after accounting for other covariates; (2) the pathways linking sleep and BMI differed by gender, with BMI for men increasing by 0.77-1 units with reduced sleep duration and BMI for women declining by 0.12 units with a 1-unit increase in sleep quality; (3) stress- and sleep-related variables were confounded, but there was little evidence for moderation between these two; (4) the QR results demonstrate that the association between high and/or extreme stress and BMI varied stochastically across the distribution of BMI values, with an upward trend, suggesting that stress played a more important role among adults with higher BMI (i.e., BMI > 26 for both genders); and (5) the QR plots of sleep-related variables show similar patterns, with stronger effects on BMI at the upper end of the BMI distribution. Our findings suggested that sleep and stress were two seemingly independent predictors of BMI and their relationships with BMI were not constant across the BMI distribution.
Robustness of observation-based decadal sea level variability in the Indo-Pacific Ocean

Science.gov (United States)

Nidheesh, A. G.; Lengaigne, M.; Vialard, J.; Izumo, T.; Unnikrishnan, A. S.; Meyssignac, B.; Hamlington, B.; de Boyer Montegut, C.

2017-07-01

We examine the consistency of Indo-Pacific decadal sea level variability in 10 gridded, observation-based sea level products for the 1960-2010 period. Decadal sea level variations are robust in the Pacific, with more than 50% of variance explained by decadal modulation of two flavors of El Niño-Southern Oscillation (classical ENSO and Modoki). Amplitude of decadal sea level variability is weaker in the Indian Ocean than in the Pacific. All data sets indicate a transmission of decadal sea level signals from the western Pacific to the northwest Australian coast through the Indonesian throughflow. The southern tropical Indian Ocean sea level variability is associated with decadal modulations of ENSO in reconstructions but not in reanalyses or in situ data set. The Pacific-independent Indian Ocean decadal sea level variability is not robust but tends to be maximum in the southwestern tropical Indian Ocean. The inconsistency of Indian Ocean decadal variability across the sea level products calls for caution in making definitive conclusions on decadal sea level variability in this basin.
How to regress and predict in a Bland-Altman plot? Review and contribution based on tolerance intervals and correlated-errors-in-variables models.

Science.gov (United States)

Francq, Bernard G; Govaerts, Bernadette

2016-06-30

Two main methodologies for assessing equivalence in method-comparison studies are presented separately in the literature. The first one is the well-known and widely applied Bland-Altman approach with its agreement intervals, where two methods are considered interchangeable if their differences are not clinically significant. The second approach is based on errors-in-variables regression in a classical (X,Y) plot and focuses on confidence intervals, whereby two methods are considered equivalent when providing similar measures notwithstanding the random measurement errors. This paper reconciles these two methodologies and shows their similarities and differences using both real data and simulations. A new consistent correlated-errors-in-variables regression is introduced as the errors are shown to be correlated in the Bland-Altman plot. Indeed, the coverage probabilities collapse and the biases soar when this correlation is ignored. Novel tolerance intervals are compared with agreement intervals with or without replicated data, and novel predictive intervals are introduced to predict a single measure in an (X,Y) plot or in a Bland-Altman plot with excellent coverage probabilities. We conclude that the (correlated)-errors-in-variables regressions should not be avoided in method comparison studies, although the Bland-Altman approach is usually applied to avert their complexity. We argue that tolerance or predictive intervals are better alternatives than agreement intervals, and we provide guidelines for practitioners regarding method comparison studies. Copyright © 2016 John Wiley & Sons, Ltd.
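For orientation, the classical Bland-Altman agreement interval that the paper takes as its starting point can be sketched as follows. This shows only the conventional bias ± 1.96 SD limits, not the tolerance or predictive intervals the authors propose; the function name and the paired measurements are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return per-pair means and differences, the bias, and the classical
    95% agreement interval (bias +/- 1.96 * SD of the differences)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical paired measurements from two methods.
a = [10.1, 12.0, 9.8, 11.5, 10.9]
b = [9.9, 12.3, 9.6, 11.0, 11.1]
means, diffs, bias, (lower, upper) = bland_altman(a, b)
print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f})")
```

The Bland-Altman plot itself is a scatter of `diffs` against `means` with horizontal lines at the bias and the two limits.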
Prediction accuracy and stability of regression with optimal scaling transformations

NARCIS (Netherlands)

Kooij, van der Anita J.

2007-01-01

The central topic of this thesis is the CATREG approach to nonlinear regression. This approach finds optimal quantifications for categorical variables and/or nonlinear transformations for numerical variables in regression analysis. (CATREG is implemented in SPSS Categories by the author of the
Poisson regression for modeling count and frequency outcomes in trauma research.

Science.gov (United States)

Gagnon, David R; Doron-LaMarca, Susan; Bell, Margret; O'Farrell, Timothy J; Taft, Casey T

2008-10-01

The authors describe how the Poisson regression method for analyzing count or frequency outcome variables can be applied in trauma studies. The outcome of interest in trauma research may represent a count of the number of incidents of behavior occurring in a given time interval, such as acts of physical aggression or substance abuse. Traditional linear regression approaches assume a normally distributed outcome variable with equal variances over the range of predictor variables, and may not be optimal for modeling count outcomes. An application of Poisson regression is presented using data from a study of intimate partner aggression among male patients in an alcohol treatment program and their female partners. Results of Poisson regression and linear regression models are compared.
HIV-, HCV-, and co-infections and associated risk factors among drug users in southwestern China: a township-level ecological study incorporating spatial regression.

Directory of Open Access Journals (Sweden)

Yi-Biao Zhou

BACKGROUND: The human immunodeficiency virus (HIV) and hepatitis C virus (HCV) are major public health problems. Many studies have been performed to investigate the association between demographic and behavioral factors and HIV or HCV infection. However, some of the results of these studies have been in conflict. METHODOLOGY/PRINCIPAL FINDINGS: The data of all entrants in the 11 national methadone clinics in the Yi Autonomous Prefecture from March 2004 to December 2012 were collected from the national database. Several spatial regression models were used to analyze specific community characteristics associated with the prevalence of HIV and HCV infection at the township level. The study enrolled 6,417 adult patients. The prevalence of HIV infection, HCV infection and co-infection was 25.4%, 30.9%, and 11.0%, respectively. Prevalence exhibited stark geographical variations in the area studied. The four regression models showed Yi ethnicity to be associated with both the prevalence of HIV and of HIV/HCV co-infection. The male drug users in some northwestern counties had greater odds of being infected with HIV than female drug users, but the opposite was observed in some eastern counties. The 'being in drug rehabilitation' variable was found to be positively associated with prevalence of HCV infection in some southern townships; however, it was found to be negatively associated with it in some northern townships. CONCLUSIONS/SIGNIFICANCE: The spatial modeling creates better representations of data such that public health interventions must focus on areas with high frequency of HIV/HCV to prevent further transmission of both HIV and HCV.
Exploring factors associated with traumatic dental injuries in preschool children: a Poisson regression analysis.

Science.gov (United States)

Feldens, Carlos Alberto; Kramer, Paulo Floriani; Ferreira, Simone Helena; Spiguel, Mônica Hermann; Marquezan, Marcela

2010-04-01

This cross-sectional study aimed to investigate the factors associated with dental trauma in preschool children using Poisson regression analysis with robust variance. The study population comprised 888 children aged 3 to 5 years attending public nurseries in Canoas, southern Brazil. Questionnaires assessing information related to the independent variables (age, gender, race, mother's educational level and family income) were completed by the parents. Clinical examinations were carried out by five trained examiners in order to assess traumatic dental injuries (TDI) according to Andreasen's classification. One of the five examiners was calibrated to assess orthodontic characteristics (open bite and overjet). Multivariable Poisson regression analysis with robust variance was used to determine the factors associated with dental trauma as well as the strengths of association. Traditional logistic regression was also performed in order to compare the estimates obtained by both methods of statistical analysis. Overall, 36.4% (323/888) of the children suffered dental trauma and there was no difference in prevalence rates from 3 to 5 years of age. Poisson regression analysis showed that the probability of the outcome was almost 30% higher for children whose mothers had more than 8 years of education (Prevalence Ratio = 1.28; 95% CI = 1.03-1.60) and 63% higher for children with an overjet greater than 2 mm (Prevalence Ratio = 1.63; 95% CI = 1.31-2.03). Odds ratios clearly overestimated the size of the effect when compared with prevalence ratios. These findings indicate the need for preventive orientation regarding TDI, in order to educate parents and caregivers about supervising infants, particularly those with increased overjet and whose mothers have a higher level of education. Poisson regression with robust variance represents a better alternative than logistic regression to estimate the risk of dental trauma in preschool children.
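Why odds ratios overstate the effect when the outcome is common (here about 36% prevalence) can be seen from a single 2×2 table. The sketch below uses hypothetical cell counts, not the study's data, and computes crude (unadjusted) ratios only; the study's estimates came from multivariable models.

```python
def prevalence_ratio(a, b, c, d):
    """a, b: exposed with/without the outcome; c, d: unexposed with/without."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical table: 50% prevalence among exposed, 30% among unexposed.
a, b, c, d = 50, 50, 30, 70
pr = prevalence_ratio(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(f"PR={pr:.3f}, OR={or_:.3f}")  # the OR lies further from 1 than the PR
```

When the outcome is rare the two measures nearly coincide, which is why the discrepancy matters mainly for common outcomes such as the one in this study.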
Regression assumptions in clinical psychology research practice—a systematic review of common misconceptions

Science.gov (United States)

Ernst, Anja F.

2017-01-01

Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971
Regression assumptions in clinical psychology research practice—a systematic review of common misconceptions

Directory of Open Access Journals (Sweden)

Anja F. Ernst

2017-05-01

Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.

Stochastic development regression using method of moments

DEFF Research Database (Denmark)

Kühnel, Line; Sommer, Stefan Horst

2017-01-01

This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite dimensional landmark manifolds.
Model-based Quantile Regression for Discrete Data

KAUST Repository

Padellini, Tullia

2018-04-10

Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Although this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure besides point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend it to the case of discrete responses, where there is no 1-to-1 relationship between quantiles and the distribution's parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.
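The working-likelihood idea mentioned above rests on the check (pinball) loss: maximizing an Asymmetric Laplace likelihood in its location parameter is equivalent to minimizing this loss. The sketch below illustrates that background fact for the simplest case, an unconditional quantile of a small made-up sample; it is not the model-based method the abstract proposes.

```python
def check_loss(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(q, ys, tau):
    """Summed check loss of candidate quantile q over the sample ys."""
    return sum(check_loss(y - q, tau) for y in ys)


def sample_quantile_by_loss(ys, tau):
    """Pick the sample value that minimizes the summed check loss."""
    return min(ys, key=lambda q: total_loss(q, ys, tau))


# Made-up sample with one outlier (assumption, for illustration only).
ys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sample_quantile_by_loss(ys, 0.5))   # a median of the sample
print(sample_quantile_by_loss(ys, 0.75))  # an upper-quartile value
```

Because the loss depends on the data only through signs and distances, the outlier at 100 barely moves the fitted quantiles, which is one reason quantile methods are considered robust.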
Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

Science.gov (United States)

Prahutama, Alan; Sudarno

2018-05-01

The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country’s economic development. A high infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One regression model that can be used to analyze the relationship between a dependent variable Y in the form of discrete data and an independent variable X is the Poisson regression model. Regression models currently used for data with a discrete dependent variable include, among others, Poisson regression, negative binomial regression and generalized Poisson regression. In this research, generalized Poisson regression modeling gives a better AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable that has the most influence on the infant mortality rate is the average breastfeeding (X9).
Logistic regression applied to natural hazards: rare event logistic regression with replications

Directory of Open Access Journals (Sweden)

M. Guns

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Prediction of hearing outcomes by multiple regression analysis in patients with idiopathic sudden sensorineural hearing loss.

Science.gov (United States)

Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki

2014-12-01

This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Estimating water equivalent snow depth from related meteorological variables

International Nuclear Information System (INIS)

Steyaert, L.T.; LeDuc, S.K.; Strommen, N.D.; Nicodemus, M.L.; Guttman, N.B.

1980-05-01

Engineering design must take into consideration natural loads and stresses caused by meteorological elements such as wind, snow, precipitation and temperature. The purpose of this study was to determine a relationship of water equivalent snow depth measurements to meteorological variables. Several predictor models were evaluated for use in estimating water equivalent values. These models include linear regression, principal component regression, and non-linear regression models. Linear, non-linear and Scandinavian models are used to generate annual water equivalent estimates for approximately 1100 cooperative data stations where predictor variables are available, but which have no water equivalent measurements. These estimates are used to develop probability estimates of snow load for each station. Map analyses for 3 probability levels are presented.
Micro-macro multilevel latent class models with multiple discrete individual-level variables

NARCIS (Netherlands)

Bennink, M.; Croon, M.A.; Kroon, B.; Vermunt, J.K.

2016-01-01

An existing micro-macro method for a single individual-level variable is extended to the multivariate situation by presenting two multilevel latent class models in which multiple discrete individual-level variables are used to explain a group-level outcome. As in the univariate case, the
10 km running performance predicted by a multiple linear regression model with allometrically adjusted variables.

Science.gov (United States)

Abad, Cesar C C; Barros, Ronaldo V; Bertuzzi, Romulo; Gagliardi, João F L; Lima-Silva, Adriano E; Lambert, Mike I; Pires, Flavio O

2016-06-01

The aim of this study was to verify the power of VO2max, peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO2max and PTV; 2) a constant submaximal run at 12 km·h⁻¹ on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO2max, PTV and RE) and adjusted variables (VO2max^0.72, PTV^0.72 and RE^0.60) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO2max. Significant correlations (p […] 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV^0.72 and RE^0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PTV accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation.
Linear regression and the normality assumption.

Science.gov (United States)

Schmidt, Amand F; Finan, Chris

2017-12-16

Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings, such transformations are often unnecessary, and worse, may bias model estimates. Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage; i.e., the number of times the 95% confidence interval included the true slope coefficient. Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to unbiasedly estimate standard errors, and hence confidence intervals and P-values. However, in large sample sizes (e.g., where the number of observations per variable is >10) violations of this normality assumption often do not noticeably impact results. In contrast, assumptions on the parametric model, absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large sample size settings. Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and worse, may bias estimates due to the practice of outcome transformations. Copyright © 2017 Elsevier Inc. All rights reserved.
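The coverage criterion described above can be illustrated with a small simulation. The sketch below is not the authors' simulation: the sample size, replication count, error distribution (a centred exponential, chosen because it is clearly skewed) and the use of the normal quantile 1.96 instead of a t quantile are all assumptions made for illustration.

```python
import math
import random


def slope_ci_covers(n, true_slope, rng):
    """Fit simple OLS to one simulated data set with skewed errors and report
    whether the approximate 95% CI for the slope contains the true value."""
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Exponential(1) errors shifted to mean zero: non-normal and right-skewed.
    y = [1.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    half_width = 1.96 * se  # normal approximation (assumption)
    return slope - half_width <= true_slope <= slope + half_width


def coverage(n=200, reps=1000, true_slope=0.5, seed=0):
    """Fraction of replications whose interval includes the true slope."""
    rng = random.Random(seed)
    hits = sum(slope_ci_covers(n, true_slope, rng) for _ in range(reps))
    return hits / reps


print(coverage())  # expected to be close to the nominal 0.95
```

Rerunning with a much smaller `n` is one way to explore how the approximation degrades when the sample is small.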
Calculating the true level of predictors significance when carrying out the procedure of regression equation specification

Directory of Open Access Journals (Sweden)

Nikita A. Moiseev

2017-01-01

The paper is devoted to a new randomization method that yields unbiased adjustments of p-values for the predictors of linear regression models by incorporating the number of potential explanatory variables, their variance-covariance matrix and its uncertainty, based on the number of observations. This adjustment helps to control type I errors in scientific studies, significantly decreasing the number of publications that report false relations to be authentic ones. Comparative analysis with such existing methods as the Bonferroni correction and the Shehata and White adjustments explicitly shows their imperfections, especially when the number of observations and the number of potential explanatory variables are approximately equal. The comparative analysis also showed that when the variance-covariance matrix of a set of potential predictors is diagonal, i.e. the data are independent, the proposed simple correction is the best and easiest way to implement the method to obtain unbiased corrections of traditional p-values. However, in the presence of strongly correlated data, a simple correction overestimates the true p-values, which can lead to type II errors. It was also found that the corrected p-values depend on the number of observations, the number of potential explanatory variables and the sample variance-covariance matrix. For example, if there are only two potential explanatory variables competing for one position in the regression model, then if they are weakly correlated, the corrected p-value will be lower than when the number of observations is smaller and vice versa; if the data are highly correlated, the case with a larger number of observations will show a lower corrected p-value. With increasing correlation, all corrections, regardless of the number of observations, tend to the original p-value. This phenomenon is easy to explain: as correlation coefficient tends to one, two variables almost linearly depend on each
European Wintertime Windstorms and its Links to Large-Scale Variability Modes

Science.gov (United States)

Befort, D. J.; Wild, S.; Walz, M. A.; Knight, J. R.; Lockwood, J. F.; Thornton, H. E.; Hermanson, L.; Bett, P.; Weisheimer, A.; Leckebusch, G. C.

2017-12-01

Winter storms associated with extreme wind speeds and heavy precipitation are the most costly natural hazard in several European countries. Improved understanding and seasonal forecast skill of winter storms will thus help society, policy-makers and (re-) insurance industry to be better prepared for such events. We firstly assess the ability to represent extra-tropical windstorms over the Northern Hemisphere of three seasonal forecast ensemble suites: ECMWF System3, ECMWF System4 and GloSea5. Our results show significant skill for inter-annual variability of windstorm frequency over parts of Europe in two of these forecast suites (ECMWF-S4 and GloSea5) indicating the potential use of current seasonal forecast systems. In a regression model we further derive windstorm variability using the forecasted NAO from the seasonal model suites thus estimating the suitability of the NAO as the only predictor. We find that the NAO as the main large-scale mode over Europe can explain some of the achieved skill and is therefore an important source of variability in the seasonal models. However, our results show that the regression model fails to reproduce the skill level of the directly forecast windstorm frequency over large areas of central Europe. This suggests that the seasonal models also capture other sources of variability/predictability of windstorms than the NAO. In order to investigate which other large-scale variability modes steer the interannual variability of windstorms we develop a statistical model using a Poisson GLM. We find that the Scandinavian Pattern (SCA) in fact explains a larger amount of variability for Central Europe during the 20th century than the NAO. This statistical model is able to skilfully reproduce the interannual variability of windstorm frequency especially for the British Isles and Central Europe with correlations up to 0.8.
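A Poisson GLM of the kind mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: it uses a single hypothetical predictor `index` (standing in for a standardized mode index such as the NAO or SCA), a log link, made-up counts, and Newton-Raphson (Fisher scoring) on the Poisson log-likelihood.

```python
import math

def fit_poisson_glm(x, y, iters=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[h00, h01], [h01, h11]]
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * d = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

if __name__ == "__main__":
    # Hypothetical data: seasonal index values and storm counts per winter.
    index = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
    storms = [1, 2, 3, 3, 5, 8]
    b0, b1 = fit_poisson_glm(index, storms)
    print("intercept:", b0, "slope:", b1)
```

With a log link, `exp(b1)` is read as the multiplicative change in expected storm count per unit increase of the index.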
A comparison of the performances of an artificial neural network and a regression model for GFR estimation.

Science.gov (United States)

Liu, Xun; Li, Ning-shan; Lv, Lin-sheng; Huang, Jian-hua; Tang, Hua; Chen, Jin-xia; Ma, Hui-juan; Wu, Xiao-ming; Lou, Tan-qi

2013-12-01

Accurate estimation of glomerular filtration rate (GFR) is important in clinical practice. Current models derived from regression are limited by the imprecision of GFR estimates. We hypothesized that an artificial neural network (ANN) might improve the precision of GFR estimates. A study of diagnostic test accuracy. 1,230 patients with chronic kidney disease were enrolled, including the development cohort (n=581), internal validation cohort (n=278), and external validation cohort (n=371). Estimated GFR (eGFR) using a new ANN model and a new regression model using age, sex, and standardized serum creatinine level derived in the development and internal validation cohort, and the CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) 2009 creatinine equation. Measured GFR (mGFR). GFR was measured using a diethylenetriaminepentaacetic acid renal dynamic imaging method. Serum creatinine was measured with an enzymatic method traceable to isotope-dilution mass spectrometry. In the external validation cohort, mean mGFR was 49±27 (SD) mL/min/1.73 m2 and biases (median difference between mGFR and eGFR) for the CKD-EPI, new regression, and new ANN models were 0.4, 1.5, and -0.5 mL/min/1.73 m2, respectively (P30% from mGFR) were 50.9%, 77.4%, and 78.7%, respectively (Psource of systematic bias in comparisons of new models to CKD-EPI, and both the derivation and validation cohorts consisted of a group of patients who were referred to the same institution. An ANN model using 3 variables did not perform better than a new regression model. Whether ANN can improve GFR estimation using more variables requires further investigation. Copyright © 2013 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.
The M Word: Multicollinearity in Multiple Regression.

Science.gov (United States)

Morrow-Howell, Nancy

1994-01-01

Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…
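One common way to detect the problem described here is the variance inflation factor (VIF). The abstract is truncated before it names a detection method, so the sketch below is an assumption about a typical diagnostic rather than the article's own procedure. It covers the two-predictor case, where VIF reduces to 1 / (1 - r²) with r the Pearson correlation between the predictors; the data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    # Raises ZeroDivisionError when the predictors are perfectly collinear.
    return 1.0 / (1.0 - r * r)

if __name__ == "__main__":
    # Hypothetical, strongly intercorrelated predictors.
    x1 = [1, 2, 3, 4, 5]
    x2 = [1, 2, 3, 4, 6]
    print("VIF:", vif_two_predictors(x1, x2))
```

A frequently quoted rule of thumb treats VIF values above about 10 as a sign of serious multicollinearity, although the threshold is a convention rather than a test.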
Analysis of sparse data in logistic regression in medical research: A newer approach

Directory of Open Access Journals (Sweden)

S Devika

2016-01-01

Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. A simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999), whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups as was found using PLR, whereas there was an overestimation of risk, OR: 10.76 (95% CI: 2.17, 53.41), using the conventional method. The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
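The separation problem reported in this abstract can be reproduced from the counts it gives. The sketch below computes crude (unadjusted) odds ratios from 2x2 tables and applies a 0.5 continuity correction (often called the Haldane-Anscombe correction) to show why shrinking away from an empty cell yields a finite estimate. This correction is a much simpler stand-in chosen for illustration; it is not the PLR method the authors used, and the crude values are not expected to match the adjusted ORs quoted in the abstract.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, correction=0.0):
    """Crude odds ratio from a 2x2 table, with an optional cell correction."""
    a = exposed_cases + correction
    b = unexposed_cases + correction
    c = exposed_controls + correction
    d = unexposed_controls + correction
    if b * c == 0:
        return math.inf  # an empty cell in the denominator: OR is unbounded
    return (a * d) / (b * c)

if __name__ == "__main__":
    # Gender, from the abstract: 23 of 23 cases male, 46 of 50 controls male.
    print("gender, crude:", odds_ratio(23, 0, 46, 4))
    print("gender, 0.5 correction:", odds_ratio(23, 0, 46, 4, correction=0.5))
    # Hyponatremia, from the abstract: 9 of 23 cases, 4 of 50 controls.
    print("hyponatremia, crude:", odds_ratio(9, 14, 4, 46))
```

The first call returns infinity because no case is female, which is the situation that statistical packages typically report as an OR above 999.999.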
Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

Science.gov (United States)

Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

2005-09-01

To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as the control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999 and 2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We correctly classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for the final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was found successful in predicting patients at risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of a multivariate statistical method such as multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.
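The classification figures quoted in this abstract follow directly from its counts. As a short worked check (the function and variable names are illustrative; the counts are those given above, with false negatives and false positives obtained by subtraction):

```python
def classification_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),   # correctly classified cases
        "specificity": tn / (tn + fp),   # correctly classified controls
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

if __name__ == "__main__":
    # 197 of 225 cases and 84 of 126 controls were classified correctly.
    summary = classification_summary(tp=197, fn=225 - 197, tn=84, fp=126 - 84)
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

The three ratios are 197/225, 84/126 and 281/351, which round to the 88%, 67% and 80.1% reported in the abstract.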
Determining Balıkesir’s Energy Potential Using a Regression Analysis Computer Program

Directory of Open Access Journals (Sweden)

Bedri Yüksel

2014-01-01

Solar power and wind energy are used concurrently during specific periods, while at other times only the more efficient is used, and hybrid systems make this possible. When establishing a hybrid system, the extent to which these two energy sources support each other needs to be taken into account. This paper is a study of the effects of wind speed, insolation levels, and the meteorological parameters of temperature and humidity on the energy potential in Balıkesir, in the Marmara region of Turkey. The relationship between the parameters was studied using a multiple linear regression method. Using a designed-for-purpose computer program, two different regression equations were derived, with wind speed being the dependent variable in the first and insolation levels in the second. The regression equations yielded accurate results. The computer program allowed for the rapid calculation of different acceptance rates. The results of the statistical analysis proved the reliability of the equations. An estimate of identified meteorological parameters and unknown parameters could be produced with a specified precision by using the regression analysis method. The regression equations also worked for the evaluation of energy potential.
Bounded Gaussian process regression

DEFF Research Database (Denmark)

Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan

2013-01-01

We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...... with the proposed explicit noise-model extension....
Air pollution and heart rate variability: effect modification by chronic lead exposure.

Science.gov (United States)

Park, Sung Kyun; O'Neill, Marie S; Vokonas, Pantel S; Sparrow, David; Wright, Robert O; Coull, Brent; Nie, Huiling; Hu, Howard; Schwartz, Joel

2008-01-01

Outdoor air pollution and lead exposure can disturb cardiac autonomic function, but the effects of both these exposures together have not been studied. We examined whether higher cumulative lead exposures, as measured by bone lead, modified cross-sectional associations between air pollution and heart rate variability among 384 elderly men from the Normative Aging Study. We used linear regression, controlling for clinical, demographic, and environmental covariates. We found graded, significant reductions in both high-frequency and low-frequency powers of heart rate variability in relation to ozone and sulfate across the quartiles of tibia lead. Interquartile range increases in ozone and sulfate were associated respectively, with 38% decrease (95% confidence interval = -54.6% to -14.9%) and 22% decrease (-40.4% to 1.6%) in high frequency, and 38% decrease (-51.9% to -20.4%) and 12% decrease (-28.6% to 9.3%) in low frequency, in the highest quartile of tibia lead after controlling for potential confounders. We observed similar but weaker effect modification by tibia lead adjusted for education and cumulative traffic (residuals of the regression of tibia lead on education and cumulative traffic). Patella lead modified only the ozone effect on heart rate variability. People with long-term exposure to higher levels of lead may be more sensitive to cardiac autonomic dysfunction on high air pollution days. Efforts to understand how environmental exposures affect the health of an aging population should consider both current levels of pollution and history of lead exposure as susceptibility factors.
Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

Science.gov (United States)

Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

2015-01-01

Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
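The area under the ROC curve used to compare the models in this abstract can be computed without plotting the curve, because it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties counted as one half). A minimal sketch with hypothetical scores, not the study's data:

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))

if __name__ == "__main__":
    # Hypothetical predicted probabilities from any of the three models.
    positives = [0.9, 0.8, 0.4]
    negatives = [0.7, 0.3, 0.2]
    print("AUC:", auc(positives, negatives))
```

Because the statistic depends only on the ranking of the scores, logistic and probit models that order the observations almost identically will have nearly equal AUC values, which is consistent with the 0.735 and 0.733 reported above.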
Targeting: Logistic Regression, Special Cases and Extensions

Directory of Open Access Journals (Sweden)

Helmut Schaeben

2014-12-01

Logistic regression is a classical linear model for logit-transformed conditional probabilities of a binary target variable. It recovers the true conditional probabilities if the joint distribution of predictors and the target is of log-linear form. Weights-of-evidence is an ordinary logistic regression with parameters equal to the differences of the weights of evidence if all predictor variables are discrete and conditionally independent given the target variable. The hypothesis of conditional independence can be tested in terms of log-linear models. If the assumption of conditional independence is violated, the application of weights-of-evidence corrupts not only the predicted conditional probabilities, but also their rank transform. Logistic regression models that include interaction terms can account for the lack of conditional independence; appropriate interaction terms compensate exactly for violations of conditional independence. Multilayer artificial neural nets may be seen as nested regression-like models, with some sigmoidal activation function. Most often, the logistic function is used as the activation function. If the net topology, i.e., its control, is sufficiently versatile to mimic interaction terms, artificial neural nets are able to account for violations of conditional independence and yield very similar results. Weights-of-evidence cannot reasonably include interaction terms; subsequent modifications of the weights, as often suggested, cannot emulate the effect of interaction terms.
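The link between weights of evidence and logistic regression stated in this abstract can be checked numerically for the simplest case of one binary predictor. In that case the difference of the two weights (the contrast) equals the log odds ratio of the 2x2 table, which is also the maximum-likelihood slope of a logistic regression on that single predictor. The counts and names below are hypothetical and serve only as an illustration.

```python
import math

def weights_of_evidence(n_bt, n_nbt, n_bn, n_nbn):
    """Return (W+, W-) for a binary predictor B and a binary target T.

    n_bt  : predictor present, target present
    n_nbt : predictor absent,  target present
    n_bn  : predictor present, target absent
    n_nbn : predictor absent,  target absent
    """
    p_b_given_t = n_bt / (n_bt + n_nbt)
    p_b_given_not_t = n_bn / (n_bn + n_nbn)
    w_plus = math.log(p_b_given_t / p_b_given_not_t)
    w_minus = math.log((1 - p_b_given_t) / (1 - p_b_given_not_t))
    return w_plus, w_minus

if __name__ == "__main__":
    w_plus, w_minus = weights_of_evidence(30, 10, 20, 40)
    contrast = w_plus - w_minus
    log_odds_ratio = math.log((30 * 40) / (10 * 20))
    print("contrast:", contrast, "log odds ratio:", log_odds_ratio)
```

With several predictors, summing such contrasts reproduces the logistic model only under the conditional independence the abstract describes; otherwise interaction terms are needed.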

Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

Science.gov (United States)

Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

2003-01-01

Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity
Identification of Determinants of Sports Skill Level in Badminton Players Using the Multiple Regression Model

Directory of Open Access Journals (Sweden)

Jaworski Janusz

2016-03-01

Purpose. The aim of the study was to evaluate somatic and functional determinants of sports skill level in badminton players at three consecutive stages of training. Methods. The study examined 96 badminton players aged 11 to 19 years. The scope of the study included somatic characteristics, physical abilities and neurosensory abilities. Thirty-nine variables were analysed in each athlete. Coefficients of multiple determination were used to evaluate the effect of structural and functional parameters on sports skill level in badminton players. Results. In the group of younger cadets, quality and effectiveness of playing were mostly determined by the level of physical abilities. In the group of cadets, the most important determinants were physical abilities, followed by somatic characteristics. In this group, coordination abilities were also important. In juniors, the most pronounced was a set of the variables that reflect physical abilities. Conclusions. Models of determination of sports skill level are most noticeable in the group of cadets. In all three groups of badminton players, the dominant effect on the quality of playing is due to a set of the variables that determine physical abilities.
Market Designs for High Levels of Variable Generation: Preprint

Energy Technology Data Exchange (ETDEWEB)

Milligan, M.; Holttinen, H.; Kiviluoma, J.; Orths, A.; Lynch, M.; Soder, L.

2014-10-01

Variable renewable generation is increasing in penetration in modern power systems, leading to higher variability in the supply and price of electricity as well as lower average spot prices. This raises new challenges, particularly in ensuring sufficient capacity and flexibility from conventional technologies. Because the fixed costs and lifetimes of electricity generation investments are significant, designing markets and regulations that ensure the efficient integration of renewable generation is a significant challenge. This paper reviews the state of play of market designs for high levels of variable generation in the United States and Europe and considers new developments in both regions.
Establishment of regression dependences. Linear and nonlinear dependences

International Nuclear Information System (INIS)

Onishchenko, A.M.

1994-01-01

The main problems of determining linear and 19 types of nonlinear regression dependences are discussed in full. It is taken into consideration that the total dispersions are the sum of the measurement dispersions and the dispersions of the parameter variations themselves. Approaches to determining all of these dispersions are described. It is shown that the least-squares fit gives inconsistent estimates for industrial objects and processes. Correction methods that take into account comparable measurement errors in both variables make it possible to obtain consistent estimates of the regression equation parameters. The condition under which applying the correction technique is expedient is given. The technique for determining nonlinear regression dependences, taking into account the form of the dependence and comparable errors in both variables, is described. 6 refs., 1 tab
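The abstract does not say which correction the author applies when both variables carry measurement error. One standard technique for that situation is Deming regression, sketched here as a swapped-in illustration rather than the paper's method. The parameter `delta` is the assumed ratio of the error variance in y to the error variance in x; `delta = 1.0` corresponds to comparable errors in both variables (orthogonal regression). The data are hypothetical.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Deming regression; returns (intercept, slope).

    delta is the ratio (error variance of y) / (error variance of x).
    Requires a nonzero sample covariance between x and y.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff * diff + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = my - slope * mx
    return intercept, slope

if __name__ == "__main__":
    # Hypothetical paired measurements, both subject to error.
    x = [1.0, 2.1, 2.9, 4.2, 5.0]
    y = [2.9, 5.2, 6.8, 9.1, 11.2]
    print(deming_fit(x, y, delta=1.0))
```

Ordinary least squares treats x as error-free, which biases the slope toward zero when x is noisy; allowing for error in both variables is what removes that inconsistency.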
Area-level poverty and preterm birth risk: A population-based multilevel analysis

Directory of Open Access Journals (Sweden)

Muglia Louis A

2008-09-01

Background: Preterm birth is a complex disease with etiologic influences from a variety of social, environmental, hormonal, genetic, and other factors. The purpose of this study was to utilize a large population-based birth registry to estimate the independent effect of county-level poverty on preterm birth risk. To accomplish this, we used a multilevel logistic regression approach to account for multiple co-existent individual-level variables and county-level poverty rate. Methods: Population-based study utilizing Missouri's birth certificate database (1989–1997). We conducted a multilevel logistic regression analysis to estimate the effect of county-level poverty on PTB risk. Of 634,994 births nested within 115 counties in Missouri, two levels were considered. Individual-level variables included demographic factors, prenatal care, health-related behavioral risk factors, and medical risk factors. The area-level variable included the percentage of the population within each county living below the poverty line (US census data, 1990). Counties were divided into quartiles of poverty; the first quartile (lowest rate of poverty) was the reference group. Results: PTB th quartile (4.9%, p adjOR 1.18 (95% CI 1.03, 1.35), with a similar effect at earlier gestational ages (adjOR 1.27 (95% CI 1.06, 1.52)). Conclusion: Women residing in socioeconomically deprived areas are at increased risk of preterm birth, above other underlying risk factors. Although the risk increase is modest, it affects a large number of pregnancies.
Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.

Science.gov (United States)

Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui

2017-07-15

New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while a penalty can be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods, a SKI-Cox method and a wLASSO-Cox method, to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier
Mixed Frequency Data Sampling Regression Models: The R Package midasr

Directory of Open Access Journals (Sweden)

Eric Ghysels

2016-08-01

When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr, which enables estimating regression models with variables sampled at different frequencies within the MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess the forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.
How efficient are referral hospitals in Uganda? A data envelopment analysis and tobit regression approach.

Science.gov (United States)

Mujasi, Paschal N; Asbu, Eyob Z; Puig-Junoy, Jaume

2016-07-08

Hospitals represent a significant proportion of health expenditures in Uganda, accounting for about 26% of total health expenditure. Improving the technical efficiency of hospitals in Uganda can result in large savings which can be devoted to expand access to services and improve quality of care. This paper explores the technical efficiency of referral hospitals in Uganda during the 2012/2013 financial year. This was a cross-sectional study using secondary data. Input and output data were obtained from the Uganda Ministry of Health annual health sector performance report for the period July 1, 2012 to June 30, 2013 for the 14 public sector regional referral and 4 large private not-for-profit hospitals. We assumed an output-oriented model with Variable Returns to Scale to estimate the efficiency score for each hospital using Data Envelopment Analysis (DEA) with STATA13. Using a Tobit model DEA, efficiency scores were regressed against selected institutional and contextual/environmental factors to estimate their impacts on efficiency. The average variable returns to scale (Pure) technical efficiency score was 91.4% and the average scale efficiency score was 87.1%, while the average constant returns to scale technical efficiency score was 79.4%. Technically inefficient hospitals could have become more efficient by increasing the outpatient department visits by 45,943 and inpatient days by 31,425 without changing the total number of inputs. Alternatively, they would achieve efficiency by, for example, transferring the excess 216 medical staff and 454 beds to other levels of the health system without changing the total number of outputs. Tobit regression indicates that significant factors in explaining hospital efficiency are: hospital size (p Uganda.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

Energy Technology Data Exchange (ETDEWEB)

Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.

2017-11-01

Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and its relationship with weather conditions are examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Analysis of 4th Grade Students' Problem Solving Skills in Terms of Several Variables

Science.gov (United States)

Sungur, Gülcan; Bal, Pervin Nedim

2016-01-01

The aim of this study is to examine whether the problem-solving level of primary school students differs according to some demographic variables. The research is descriptive, follows the general survey method, and was carried out with quantitative research techniques. The sample of the study consisted of 587 primary school students in Grade 4. The…
Predicting Social Trust with Binary Logistic Regression

Science.gov (United States)

Adwere-Boamah, Joseph; Hufstedler, Shirley

2015-01-01

This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
General Nature of Multicollinearity in Multiple Regression Analysis.

Science.gov (United States)

Liu, Richard

1981-01-01

Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
Modeling temporal and spatial variability of traffic-related air pollution: Hourly land use regression models for black carbon

Science.gov (United States)

Dons, Evi; Van Poppel, Martine; Kochan, Bruno; Wets, Geert; Int Panis, Luc

2013-08-01

Land use regression (LUR) modeling is a statistical technique used to determine exposure to air pollutants in epidemiological studies. Time-activity diaries can be combined with LUR models, enabling detailed exposure estimation and limiting exposure misclassification, both in shorter and longer time lags. In this study, the traffic related air pollutant black carbon was measured with μ-aethalometers on a 5-min time base at 63 locations in Flanders, Belgium. The measurements show that hourly concentrations vary between different locations, but also over the day. Furthermore the diurnal pattern is different for street and background locations. This suggests that annual LUR models are not sufficient to capture all the variation. Hourly LUR models for black carbon are developed using different strategies: by means of dummy variables, with dynamic dependent variables and/or with dynamic and static independent variables. The LUR model with 48 dummies (weekday hours and weekend hours) performs not as good as the annual model (explained variance of 0.44 compared to 0.77 in the annual model). The dataset with hourly concentrations of black carbon can be used to recalibrate the annual model, resulting in many of the original explaining variables losing their statistical significance, and certain variables having the wrong direction of effect. Building new independent hourly models, with static or dynamic covariates, is proposed as the best solution to solve these issues. R2 values for hourly LUR models are mostly smaller than the R2 of the annual model, ranging from 0.07 to 0.8. Between 6 a.m. and 10 p.m. on weekdays the R2 approximates the annual model R2. Even though models of consecutive hours are developed independently, similar variables turn out to be significant. Using dynamic covariates instead of static covariates, i.e. hourly traffic intensities and hourly population densities, did not significantly improve the models' performance.
Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

Science.gov (United States)

Song, Chao; Kwan, Mei-Po; Zhu, Jiping

2017-04-08

An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Asymptotics of Multivariate Regression with Consecutively Added Dependent Variables

NARCIS (Netherlands)

Raats, V.M.; van der Genugten, B.B.; Moors, J.J.A.

2004-01-01

We consider multivariate regression where new dependent variables are consecutively added during the experiment (or in time). So, viewed at the end of the experiment, the number of observations decreases with each added variable. The explanatory variables are observed throughout. In a previous paper
Model building strategy for logistic regression: purposeful selection.

Science.gov (United States)

Zhang, Zhongheng

2016-03-01

Logistic regression is one of the most commonly used models to account for confounders in the medical literature. The article introduces how to perform the purposeful selection model building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable will have a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effect on the response variable. The model should be checked for goodness-of-fit (GOF), in other words, for how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
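The likelihood ratio test mentioned in this abstract can be sketched in a few lines. The article itself works in R; the Python sketch below is only an illustration, it assumes the two nested models differ by exactly one parameter, and the function name and log-likelihood values are hypothetical.

```python
import math

def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Return (statistic, p_value) for dropping ONE parameter from a nested model."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # For 1 degree of freedom, P(chi-square > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only (not from the article).
stat, p = likelihood_ratio_test_1df(loglik_full=-120.4, loglik_reduced=-123.1)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

A small p-value suggests that the deleted variable contributed to model fit and may need to stay in the model.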
Regression modeling of ground-water flow

Science.gov (United States)

Cooley, R.L.; Naff, R.L.

1985-01-01

Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)

Science.gov (United States)

Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi

2017-06-01

Hurdle negative binomial regression is a method that can be used for a discrete dependent variable with excess zeros and under- or overdispersion. It uses a two-part approach. The first part, the zero hurdle model, estimates the zero elements of the dependent variable; the second part, the truncated negative binomial model, estimates the nonzero elements (positive integers). The discrete dependent variable in such cases is censored for some values. The type of censoring studied in this research is right censoring. This study aims to obtain the parameter estimator of hurdle negative binomial regression for a right-censored dependent variable. Parameters are estimated by maximum likelihood estimation (MLE). The hurdle negative binomial regression model for a right-censored dependent variable is applied to the number of neonatorum tetanus cases in Indonesia. The data are count data that contain zero values in some observations and a variety of other values. This study also aims to obtain the parameter estimator and test statistic of the censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus cases in Indonesia are the percentage of baby health care coverage and neonatal visits.
Background stratified Poisson regression analysis of cohort data.

Science.gov (United States)

Richardson, David B; Langholz, Bryan

2012-03-01

Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
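As a hedged sketch of why the stratum coefficients need not be estimated, consider a generic stratified Poisson model. The notation here is an assumption, not taken from the paper: $y_{si}$ is the event count in cell $i$ of stratum $s$, $T_{si}$ the person-time, $\alpha_s$ the stratum (nuisance) coefficient, and $\phi$ the rate function of interest. Conditioning on the stratum totals turns each stratum into a multinomial, and the factors $e^{\alpha_s}$ cancel:

```latex
y_{si} \sim \mathrm{Poisson}(\mu_{si}), \qquad
\mu_{si} = T_{si}\, e^{\alpha_s}\, \phi(z_{si};\beta)

L_c(\beta) \;=\; \prod_{s} \prod_{i}
\left(
  \frac{T_{si}\,\phi(z_{si};\beta)}
       {\sum_{j} T_{sj}\,\phi(z_{sj};\beta)}
\right)^{y_{si}}
```

Because $L_c$ depends on $\beta$ only, the number of strata can grow without adding parameters to the fit.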
Multiple regression for physiological data analysis: the problem of multicollinearity.

Science.gov (United States)

Slinker, B K; Glantz, S A

1985-07-01

Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
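One common diagnostic for the multicollinearity described above is the variance inflation factor (VIF). The abstract does not name a specific diagnostic, so the sketch below is illustrative only. It covers the special case of two predictors, where the VIF reduces to 1 / (1 - r²), with r the Pearson correlation between the predictors. The data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors: x2_parallel changes almost in parallel with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_parallel = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]
x2_unrelated = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

print("VIF, nearly parallel predictors:", vif_two_predictors(x1, x2_parallel))
print("VIF, weakly related predictors: ", vif_two_predictors(x1, x2_unrelated))
```

A VIF near 1 indicates little redundancy, while a large VIF means the coefficient's standard error is inflated by correlation with the other predictor.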

Instrumental Variables in the Long Run

DEFF Research Database (Denmark)

Casey, Gregory; Klemp, Marc Patrick Brag

2017-01-01

In the study of long-run economic growth, it is common to use historical or geographical variables as instruments for contemporary endogenous regressors. We study the interpretation of these conventional instrumental variable (IV) regressions in a general, yet simple, framework. Our aim...... quantitative implications for the field of long-run economic growth. We also use our framework to examine related empirical techniques. We find that two prominent regression methodologies - using gravity-based instruments for trade and including ancestry-adjusted variables in linear regression models - have...... is to estimate the long-run causal effect of changes in the endogenous explanatory variable. We find that conventional IV regressions generally cannot recover this parameter of interest. To estimate this parameter, therefore, we develop an augmented IV estimator that combines the conventional regression...
The solar forcing on the 7Be-air concentration variability at ground level

International Nuclear Information System (INIS)

Talpos, Simona

2004-01-01

This paper analyses the correlation between the temporal and spatial variability of 7Be-air concentration at ground level and the amount of precipitation. Measured data from 26 stations distributed across North America, South America, Australia and Antarctica were used. The variability study was made using EOF and principal components analysis. The presented results show that the variability of 7Be air concentration at ground level is simultaneously influenced by the solar cycle and some atmospheric processes like precipitation, turbulent transport, advection, etc. The solar forcing on the 7Be variability at ground level was outlined for time scales longer than 1 year and can be considered a global phenomenon. The atmospheric processes influence the 7Be variability for time scales shorter than one year and can be considered a local phenomenon. (author)
Prediction of Vitamin D Deficiency Among Tabriz Elderly and Nursing Home Residents Using Stereotype Regression Model

Directory of Open Access Journals (Sweden)

Zohreh Razzaghi

2011-07-01

Objectives: Vitamin D deficiency is one of the most important health problems of any society. It is more common in the elderly, even in those dwelling in rest homes. By now, several studies have been conducted on vitamin D deficiency using current statistical models. In this study, corresponding proportional odds and stereotype regression methods were used to identify threatening factors related to vitamin D deficiency in elderly people living in rest homes and to compare them with those who live outside such places. Methods & Materials: In this case-control study, there were 140 older persons living in rest homes and 140 not dwelling in these centers. In the present study, the 25(OH)D serum level was regarded as the response variable, and age, sex, body mass index, and duration of exposure to sunlight as predictive variables for vitamin D deficiency. The analyses were carried out using corresponding proportional odds and stereotype regression methods and estimating the parameters of these two models. Deviation statistics (AIC) were used to evaluate and compare the mentioned methods. Stata 9.1 software was used to conduct the analyses. Results: Average serum level of 25(OH)D was 16.10±16.65 ng/ml and 39.62±24.78 ng/ml in individuals living in rest homes and those not living there, respectively (P=0.001). Prevalence of vitamin D deficiency (less than 20 ng/ml) was observed in 75% of members of the group consisting of those living in rest homes and 23.78% of members of the other group. Using corresponding proportional odds and stereotype regression methods, age, sex, body mass index, duration of exposure to sunlight, and rest home membership were fitted. In both models, the variables of group and duration of exposure to sunlight were significant (P<0.001). Stereotype regression model included group variable (odds ratio for a group suffering from severe vitamin D deficiency was 42.85, 95%CI:9.93-185.67 and
A regression approach for Zircaloy-2 in-reactor creep constitutive equations

International Nuclear Information System (INIS)

Yung Liu, Y.; Bement, A.L.

1977-01-01

In this paper the methodology of multiple regression as applied to Zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation is illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) When more than one variable is involved, there is no need to make the assumption that each variable affects the response independently. No separate normalizations are required either, and the estimation of parameters is obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as R²- and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions and residual plots, etc., provide diagnostic tools for model selection. Multiple regression analysis performed on a set of carefully selected Zircaloy-2 in-reactor creep data leads to a model which provides excellent correlations for the data. (Auth.)
Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

Science.gov (United States)

Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

2018-05-01

The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields for exploring the spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in the regression model is to use robust models; the resulting model is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy making process related to air pollution mitigation by developing a standard index model for air polluter (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level: the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significant differences between Regression and RGWR in this case, but basic GWR using the Gaussian kernel is the best model for modeling APSI because it has the smallest AIC.
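The basic GWR idea with a Gaussian kernel can be sketched as a weighted least-squares fit repeated at each target location. This is a minimal illustration with one predictor and a fixed bandwidth, not the authors' robust procedure. The site coordinates, data, bandwidth, and function names are all hypothetical.

```python
import math

def gaussian_weights(target, locations, bandwidth):
    """Gaussian kernel weights: nearby sites count more than distant ones."""
    return [math.exp(-0.5 * (math.dist(target, loc) / bandwidth) ** 2)
            for loc in locations]

def local_fit(target, locations, x, y, bandwidth):
    """Weighted least-squares fit of y = intercept + slope * x at one location."""
    w = gaussian_weights(target, locations, bandwidth)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical sites in two clusters with different local relationships:
# y = 2x near the origin, y = 5x near (5.5, 5.5).
locations = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0, 5.0, 10.0, 15.0, 20.0]

for target in [(0.5, 0.5), (5.5, 5.5)]:
    intercept, slope = local_fit(target, locations, x, y, bandwidth=1.0)
    print(target, "local slope:", round(slope, 3))
```

A single global regression would average the two relationships, whereas the local fits recover a different slope in each cluster, which is the spatial heterogeneity GWR is meant to expose.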
An empirical tool to evaluate the safety of cyclists: Community based, macro-level collision prediction models using negative binomial regression.

Science.gov (United States)

Wei, Feng; Lovegrove, Gordon

2013-12-01

Today, North American governments are more willing to consider compact neighborhoods with increased use of sustainable transportation modes. Bicycling, one of the most effective modes for short trips with distances less than 5km is being encouraged. However, as vulnerable road users (VRUs), cyclists are more likely to be injured when involved in collisions. In order to create a safe road environment for them, evaluating cyclists' road safety at a macro level in a proactive way is necessary. In this paper, different generalized linear regression methods for collision prediction model (CPM) development are reviewed and previous studies on micro-level and macro-level bicycle-related CPMs are summarized. On the basis of insights gained in the exploration stage, this paper also reports on efforts to develop negative binomial models for bicycle-auto collisions at a community-based, macro-level. Data came from the Central Okanagan Regional District (CORD), of British Columbia, Canada. The model results revealed two types of statistical associations between collisions and each explanatory variable: (1) An increase in bicycle-auto collisions is associated with an increase in total lane kilometers (TLKM), bicycle lane kilometers (BLKM), bus stops (BS), traffic signals (SIG), intersection density (INTD), and arterial-local intersection percentage (IALP). (2) A decrease in bicycle collisions was found to be associated with an increase in the number of drive commuters (DRIVE), and in the percentage of drive commuters (DRP). These results support our hypothesis that in North America, with its current low levels of bicycle use (macro-level CPMs. Copyright © 2012. Published by Elsevier Ltd.
Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

Science.gov (United States)

Shen, Chung-Wei; Chen, Yi-Hau

2018-03-13

We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
Testosterone levels in healthy men correlate negatively with serotonin 4 receptor binding

DEFF Research Database (Denmark)

Perfalk, Erik; Cunha-Bang, Sofi da; Holst, Klaus K

2017-01-01

The serotonergic system integrates sex steroid information and plays a central role in mood and stress regulation, cognition, appetite and sleep. This interplay may be critical for likelihood of developing depressive episodes, at least in a subgroup of sensitive individuals. The serotonin 4...... positron emission tomography in a group of 41 healthy men. We estimated global 5-HT4R binding using a latent variable model framework, which models shared correlation between 5-HT4R across multiple brain regions (hippocampus, amygdala, posterior and anterior cingulate, thalamus, pallidostriatum...... and neocortex). We tested whether testosterone and estradiol predict global 5-HT4R, adjusting for age. We found that testosterone, but not estradiol, correlated negatively with global 5-HT4R levels (p=0.02) suggesting that men with high levels of testosterone have higher cerebral serotonergic tonus. Our...
Survival analysis II: Cox regression

NARCIS (Netherlands)

Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.

2011-01-01

In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the
Childhood Depression: Relation to Adaptive, Clinical and Predictor Variables

Directory of Open Access Journals (Sweden)

Maite Garaigordobil

2017-05-01

The study had two goals: (1) to explore the relations between self-assessed childhood depression and other adaptive and clinical variables; (2) to identify predictor variables of childhood depression. Participants were 420 students aged 7–10 years old (53.3% boys, 46.7% girls). Results revealed: (1) positive correlations between depression and clinical maladjustment, school maladjustment, emotional symptoms, internalizing and externalizing problems, problem behaviors, emotional reactivity, and childhood stress; and (2) negative correlations between depression and personal adaptation, global self-concept, social skills, and resilience (sense of competence and affiliation). Linear regression analysis including the global dimensions revealed 4 predictors of childhood depression that explained 50.6% of the variance: high clinical maladjustment, low global self-concept, high level of stress, and poor social skills. However, upon introducing the sub-dimensions, 9 predictor variables emerged that explained 56.4% of the variance: many internalizing problems, low family self-concept, high anxiety, low responsibility, low personal self-assessment, high social stress, few aggressive behaviors toward peers, many health/psychosomatic problems, and external locus of control. The discussion addresses the importance of implementing prevention programs for childhood depression at early ages.
Do high fetal catecholamine levels affect heart rate variability and ...

African Journals Online (AJOL)

Objectives. To determine the relationship between umbilical arterial catecholamine levels and fetal heart rate variability and meconium passage. Study design. A prospective descriptive study was performed. Umbilical artery catecholamine levels were measured in 55 newborns and correlated with fetal heart rate before ...
Multi-step polynomial regression method to model and forecast malaria incidence.

Directory of Open Access Journals (Sweden)

Chandrajit Chatterjee

Malaria is one of the most severe problems faced by the world even today. Understanding the causative factors such as age, sex, social factors, environmental variability etc. as well as the underlying transmission dynamics of the disease is important for epidemiological research on malaria and its eradication. Thus, development of a suitable modeling approach and methodology, based on the available data on the incidence of the disease and other related factors, is of utmost importance. In this study, we developed a simple non-linear regression methodology for modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with a high confidence level. We considered three types of data to develop the regression methodology: a longer time series of Slide Positivity Rates (SPR) of malaria; a shorter time series (deaths due to Plasmodium vivax) of one year; and spatial data (zonal distribution of P. vivax deaths) for the city, along with the climatic factors, population and previous incidence of the disease. We performed variable selection by a simple correlation study, identified the initial relationship between variables through non-linear curve fitting, and used multi-step methods for induction of variables in the non-linear regression analysis, along with applied Gauss-Markov models and ANOVA for testing the prediction and validity and constructing the confidence intervals. The results demonstrate the applicability of our method to different types of data and the autoregressive nature of forecasting, and show high prediction power for both SPR and P. vivax deaths, where the one-lag SPR values play an influential role and prove useful for better prediction. Different climatic factors are identified as playing a crucial role in shaping the disease curve. Further, disease incidence at zonal level and the effect of causative factors on different zonal clusters indicate the pattern of malaria prevalence in the city
Individual- and contextual-level factors associated with client-initiated HIV testing

Directory of Open Access Journals (Sweden)

Claudia Renata dos Santos Barros

Background: Knowing the reasons for seeking HIV testing is central for HIV prevention. Despite the availability of free HIV counseling and testing in Brazil, coverage remains lacking. Methods: A survey of 4,760 respondents from urban areas was analyzed. Individual-level variables included sociodemographic characteristics; sexual and reproductive health; HIV/AIDS treatment knowledge and beliefs; being personally acquainted with a person with HIV/AIDS; and holding discriminatory ideas about people living with HIV. Contextual-level variables included the Human Development Index (HDI) of the municipality; prevalence of HIV/AIDS; and availability of local HIV counseling and testing (CT) services. The dependent variable was client-initiated testing. Multilevel Poisson regression models with random intercepts were used to assess associated factors. Results: Common individual-level variables among men and women included being personally acquainted with a person with HIV/AIDS and age; whereas discordant variables included those related to sexual and reproductive health and experiencing sexual violence. Among contextual-level factors, availability of CT services was a variable associated with client-initiated testing among women only. The contextual-level variable “HDI of the municipality” was associated with client-initiated testing among women. Conclusion: Thus, marked gender differences in HIV testing were found, with a lack of HIV testing among married women and heterosexual men, groups that do not spontaneously seek testing.
Physical activity levels of community-dwelling older adults are influenced by winter weather variables.

Science.gov (United States)

Jones, G R; Brandon, C; Gill, D P

2017-07-01

Winter weather conditions may negatively influence participation of older adults in daily physical activity (PA). Assess the influence of winter meteorological variables, day-time peak ambient temperature, windchill, humidity, and snow accumulation on the ground on accelerometer-measured PA values in older adults. 50 community-dwelling older adults (77.4±4.7 yrs; range 71-89; 12 females) living in Southwestern Ontario (Latitude 42.9°N, Longitude 81.2°W), Canada, wore a waist-borne accelerometer during active waking hours (12 h) for 7 consecutive days between February and April 2007. Hourly temperature, windchill, humidity, and snowfall accumulation were obtained from meteorological records and time locked to hourly accelerometer PA values. Regression analysis revealed significant relationships between time of day, ambient daytime high temperature and humidity for participation in PA. Windchill temperature added no additional influence over PA acclamation already influenced by ambient day-time temperature, and the observed variability in PA patterns relative to snow accumulation over the study period was too great to warrant its inclusion in the model. Most PA was completed in the morning hours and increased as the winter months transitioned to spring (February through April). An equation was developed to adjust for winter weather conditions using temperature, humidity and time of day. Accurate PA assessment during the winter months must account for the ambient daytime high temperatures, humidity, and time of day. These older adults were more physically active during the morning hours and became more active as the winter season transitioned to spring. Copyright © 2017 Elsevier B.V. All rights reserved.
Significance testing in ridge regression for genetic data

Directory of Open Access Journals (Sweden)

De Iorio Maria

2011-09-01

Full Text Available Abstract Background Technological developments have increased the feasibility of large scale genetic association studies. Densely typed genetic markers are obtained using SNP arrays, next-generation sequencing technologies and imputation. However, SNPs typed using these methods can be highly correlated due to linkage disequilibrium among them, and standard multiple regression techniques fail with these data sets due to their high dimensionality and correlation structure. There has been increasing interest in using penalised regression in the analysis of high dimensional data. Ridge regression is one such penalised regression technique which does not perform variable selection, instead estimating a regression coefficient for each predictor variable. It is therefore desirable to obtain an estimate of the significance of each ridge regression coefficient. Results We develop and evaluate a test of significance for ridge regression coefficients. Using simulation studies, we demonstrate that the performance of the test is comparable to that of a permutation test, with the advantage of a much-reduced computational cost. We introduce the p-value trace, a plot of the negative logarithm of the p-values of ridge regression coefficients with increasing shrinkage parameter, which enables the visualisation of the change in p-value of the regression coefficients with increasing penalisation. We apply the proposed method to a lung cancer case-control data set from EPIC, the European Prospective Investigation into Cancer and Nutrition. Conclusions The proposed test is a useful alternative to a permutation test for the estimation of the significance of ridge regression coefficients, at a much-reduced computational cost. The p-value trace is an informative graphical tool for evaluating the results of a test of significance of ridge regression coefficients as the shrinkage parameter increases, and the proposed test makes its production computationally feasible.
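As a rough illustration of what a significance test for ridge coefficients involves, the sketch below computes ridge estimates for a given shrinkage parameter and attaches approximate Wald-type z statistics. It is not the authors' test: the function name `ridge_wald`, the normal approximation for the p-values, and the residual degrees of freedom `n - trace(H)` are all assumptions made for illustration.

```python
import math
import numpy as np

def ridge_wald(X, y, lam):
    """Ridge estimates with approximate Wald standard errors and p-values.

    Assumes the columns of X are standardised and y is centred (no intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX = X.T @ X
    A_inv = np.linalg.inv(XtX + lam * np.eye(p))    # (X'X + lam I)^-1
    beta = A_inv @ X.T @ y                           # ridge coefficients
    H = X @ A_inv @ X.T                              # hat matrix
    resid = y - X @ beta
    df = n - np.trace(H)                             # assumed residual degrees of freedom
    sigma2 = float(resid @ resid) / df
    cov = sigma2 * (A_inv @ XtX @ A_inv)             # covariance of the ridge estimator
    se = np.sqrt(np.diag(cov))
    z = beta / se
    # Two-sided p-values under a normal approximation (an assumption).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, se, pvals
```

Evaluating `pvals` over a grid of `lam` values and plotting the negative logarithm of each p-value against `lam` would give a plot in the spirit of the p-value trace described in the abstract.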
ENSO-induced inter-annual sea level variability in the Singapore strait

Digital Repository Service at National Institute of Oceanography (India)

Soumya, M.; Vethamony, P.; Tkalich, P.

Sea level data from four tide gauge stations in the SS (Tanjong Pagar, Sultan Shoal, Sembawang and Raffles Lighthouse) for the period 1970-2012 were extracted to study the ENSO-induced interannual sea level variability. Sea level during this period...
Kernel regression with functional response

OpenAIRE

Ferraty, Frédéric; Laksaci, Ali; Tadj, Amel; Vieu, Philippe

2011-01-01

We consider kernel regression estimate when both the response variable and the explanatory one are functional. The rates of uniform almost complete convergence are stated as function of the small ball probability of the predictor and as function of the entropy of the set on which uniformity is obtained.
Identifying individual changes in performance with composite quality indicators while accounting for regression to the mean.

Science.gov (United States)

Gajewski, Byron J; Dunton, Nancy

2013-04-01

Almost a decade ago Morton and Torgerson indicated that perceived medical benefits could be due to "regression to the mean." Despite this caution, the regression to the mean "effects on the identification of changes in institutional performance do not seem to have been considered previously in any depth" (Jones and Spiegelhalter). As a response, Jones and Spiegelhalter provide a methodology to adjust for regression to the mean when modeling recent changes in institutional performance for one-variable quality indicators. Therefore, in our view, Jones and Spiegelhalter provide a breakthrough methodology for performance measures. At the same time, in the interests of parsimony, it is useful to aggregate individual quality indicators into a composite score. Our question is, can we develop and demonstrate a methodology that extends the "regression to the mean" literature to allow for composite quality indicators? Using a latent variable modeling approach, we extend the methodology to the composite indicator case. We demonstrate the approach on 4 indicators collected by the National Database of Nursing Quality Indicators. A simulation study further demonstrates its "proof of concept."
On the shape of posterior densities and credible sets in instrumental variable regression models with reduced rank: an application of flexible sampling methods using neural networks

NARCIS (Netherlands)

Hoogerheide, L.F.; Kaashoek, J.F.; van Dijk, H.K.

2007-01-01

Likelihoods and posteriors of instrumental variable (IV) regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating posterior
Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

Science.gov (United States)

Prahutama, Alan; Suparti; Wahyu Utami, Tiani

2018-03-01

Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression imposes strict assumptions on the model, whereas nonparametric regression does not require an assumed model form. Time series data are observations of a variable recorded over time, so to model a time series by regression, the response and predictor variables must be determined first. The response variable is the value at time t (yt), while the predictor variables are the significant lags. One developing approach in nonparametric regression modeling is the Fourier series approach. One advantage of nonparametric regression using Fourier series is its ability to handle data with a trigonometric pattern. Modeling with Fourier series requires a parameter K, which can be determined by the Generalized Cross Validation method. In inflation modeling for the transportation, communication and financial services sector, the Fourier series approach yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
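The choice of K by Generalized Cross Validation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the basis (intercept, linear term, and cosine and sine terms up to order K) and the function names `fourier_design`, `gcv_score` and `select_K` are assumptions. In the time-series setting of the abstract, `x` would be a lagged value of the series and `y` the value at time t.

```python
import numpy as np

def fourier_design(x, K):
    """Columns: intercept, linear term, then cos(kx) and sin(kx) for k = 1..K."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    for k in range(1, K + 1):
        cols.append(np.cos(k * x))
        cols.append(np.sin(k * x))
    return np.column_stack(cols)

def gcv_score(x, y, K):
    """Generalized cross-validation score of a least-squares Fourier fit."""
    B = fourier_design(x, K)
    y = np.asarray(y, dtype=float)
    n = len(y)
    coef = np.linalg.lstsq(B, y, rcond=None)[0]
    rss = float(np.sum((y - B @ coef) ** 2))
    # For a least-squares projection, the trace of the hat matrix equals rank(B).
    trace_h = np.linalg.matrix_rank(B)
    return (rss / n) / (1.0 - trace_h / n) ** 2

def select_K(x, y, K_max):
    """Return the K in 1..K_max with the smallest GCV score, plus all scores."""
    scores = {K: gcv_score(x, y, K) for K in range(1, K_max + 1)}
    return min(scores, key=scores.get), scores
```

The GCV score penalises the residual sum of squares by the number of fitted parameters, so increasing K only pays off while the extra harmonics still reduce the residuals appreciably.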

Drivers of Variability in Public-Supply Water Use Across the Contiguous United States

Science.gov (United States)

Worland, Scott C.; Steinschneider, Scott; Hornberger, George M.

2018-03-01

This study explores the relationship between municipal water use and an array of climate, economic, behavioral, and policy variables across the contiguous U.S. The relationship is explored using Bayesian-hierarchical regression models for over 2,500 counties, 18 covariates, and three higher-level grouping variables. Additionally, a second analysis is included for 83 cities where water price and water conservation policy information is available. A hierarchical model using the nine climate regions (product of National Oceanic and Atmospheric Administration) as the higher-level groups results in the best out-of-sample performance, as estimated by the Widely Applicable Information Criterion, compared to counties grouped by urban continuum classification or primary economic activity. The regression coefficients indicate that the controls on water use are not uniform across the nation: e.g., counties in the Northeast and Northwest climate regions are more sensitive to social variables, whereas counties in the Southwest and East North Central climate regions are more sensitive to environmental variables. For the national city-level model, it appears that arid cities with a high cost of living and relatively low water bills sell more water per customer, but as with the county-level model, the effect of each variable depends heavily on where a city is located.
Background stratified Poisson regression analysis of cohort data

International Nuclear Information System (INIS)

Richardson, David B.; Langholz, Bryan

2012-01-01

Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models. (orig.)
Sea level variability in the Arctic Ocean observed by satellite altimetry

OpenAIRE

Prandi, P.; Ablain, M.; Cazenave, A.; Picot, N.

2012-01-01

We investigate sea level variability in the Arctic Ocean from observations. Variability estimates are derived both at the basin scale and on smaller local spatial scales. The periods of the signals studied vary from high frequency (intra-annual) to long term trends. We also investigate the mechanisms responsible for the observed variability. Different data types are used, the main one being a recent reprocessing of satellite altimetry data...
Generation of daily global solar irradiation with support vector machines for regression

International Nuclear Information System (INIS)

Antonanzas-Torres, F.; Urraca, R.; Antonanzas, J.; Fernandez-Ceniceros, J.; Martinez-de-Pison, F.J.

2015-01-01

Highlights: • New methodology for estimation of daily solar irradiation with SVR. • Automatic procedure for training models and selecting meteorological features. • This methodology outperforms other well-known parametric and numeric techniques. - Abstract: Solar global irradiation is barely recorded in isolated rural areas around the world. Traditionally, solar resource estimation has been performed using parametric-empirical models based on the relationship of solar irradiation with other atmospheric and commonly measured variables, such as temperatures, rainfall, and sunshine duration, achieving a relatively high level of certainty. Considerable improvement in soft-computing techniques, which have been applied extensively in many research fields, has led to improvements in solar global irradiation modeling, although most of these techniques lack spatial generalization. This new methodology proposes support vector machines for regression with optimized variable selection via genetic algorithms to generate non-locally dependent and accurate models. A case study in Spain has demonstrated the value of this methodology. It achieved a striking reduction in the mean absolute error (MAE), 41.4% and 19.9%, as compared to the classic parametric models of Bristow & Campbell and Antonanzas-Torres et al., respectively.
On the shape of posterior densities and credible sets in instrumental variable regression models with reduced rank: an application of flexible sampling methods using neural networks

NARCIS (Netherlands)

L.F. Hoogerheide (Lennart); J.F. Kaashoek (Johan); H.K. van Dijk (Herman)

2005-01-01

Likelihoods and posteriors of instrumental variable regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating such contours
Analysis of Relationship Between Personality and Favorite Places with Poisson Regression Analysis

Directory of Open Access Journals (Sweden)

Yoon Song Ha

2018-01-01

Full Text Available A relationship between human personality and preferred locations has long been conjectured in human mobility research. In this paper, we analyze the relationship between personality and visited places with Poisson regression. Poisson regression can analyze the correlation between a countable dependent variable and independent variables. For this analysis, personality data provided by 33 volunteers and data on 49 location categories are used. Raw location data are preprocessed and normalized into visit rates, and outlier data are pruned. For the regression analysis, the independent variables are the personality data and the dependent variables are the preprocessed location data. Several meaningful results are found. For example, persons who frequently visit a university laboratory tend to have high conscientiousness and low openness. Other meaningful location categories are also presented in this paper.
Science Curriculum Guide, Level 4.

Science.gov (United States)

Newark School District, DE.

The fourth of four levels in a K-12 science curriculum is outlined. In Level 4 (grades 9-12), science areas include earth science, biology, chemistry, and physics. Six major themes provide the basis for study in all levels (K-12). These are: Change, Continuity, Diversity, Interaction, Limitation, and Organization. In Level 4, all six themes are…
Poisson regression approach for modeling fatal injury rates amongst Malaysian workers

International Nuclear Information System (INIS)

Kamarulzaman Ibrahim; Heng Khai Theng

2005-01-01

Many safety studies are based on the analysis carried out on injury surveillance data. The injury surveillance data gathered for the analysis include information on the number of employees at risk of injury in each of several strata, where the strata are defined in terms of a series of important predictor variables. Further insight into the relationship between fatal injury rates and predictor variables may be obtained by the Poisson regression approach. Poisson regression is widely used in analyzing count data. In this study, Poisson regression is used to model the relationship between fatal injury rates and predictor variables, which are year (1995-2002), gender, recording system and industry type. Data for the analysis were obtained from PERKESO and Jabatan Perangkaan Malaysia. It is found that the assumption that the data follow a Poisson distribution is violated. After correction for the problem of overdispersion, the predictor variables that are found to be significant in the model are gender, system of recording, industry type, and two interaction effects (interaction between recording system and industry type and between year and industry type). Introduction Regression analysis is one of the most popular
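A Poisson rate model with a check for overdispersion can be sketched as below. The abstract does not say how overdispersion was corrected, so this is only one common option, under stated assumptions: the log of the number at risk enters as an offset, the model is fitted by iteratively reweighted least squares, and dispersion is estimated by the Pearson statistic divided by the residual degrees of freedom. The function names `fit_poisson` and `pearson_dispersion` are hypothetical.

```python
import numpy as np

def fit_poisson(X, y, offset=None, n_iter=50, tol=1e-10):
    """Poisson log-linear regression fitted by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    off = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    # Starting values: least squares on the log scale (0.5 guards against log(0)).
    beta = np.linalg.lstsq(X, np.log(y + 0.5) - off, rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta + off
        mu = np.exp(eta)
        z = (eta - off) + (y - mu) / mu          # working response
        XtW = X.T * mu                            # X'W with working weights W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta

def pearson_dispersion(X, y, beta, offset=None):
    """Pearson chi-square divided by residual degrees of freedom."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    off = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    mu = np.exp(X @ beta + off)
    return float(np.sum((y - mu) ** 2 / mu)) / (len(y) - X.shape[1])
```

A dispersion estimate well above 1 suggests overdispersion; in a quasi-Poisson analysis the standard errors would then be inflated by the square root of that estimate.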
Logistic regression for dichotomized counts.

Science.gov (United States)

Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

2016-12-01

Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Estimation of the Seemingly Unrelated Regression (SUR) Model with the Generalized Least Squares (GLS) Method

Directory of Open Access Journals (Sweden)

Ade Widyaningsih

2015-04-01

Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can be used to obtain a good estimate in regression analysis is the ordinary least squares (OLS) method. The least squares method is used to estimate the parameters of one or more regressions, but it does not allow for relationships among the errors of the different equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model using the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS because SUR produces smaller errors than OLS.
Estimation of the Seemingly Unrelated Regression (SUR) Model with the Generalized Least Squares (GLS) Method

Directory of Open Access Journals (Sweden)

Ade Widyaningsih

2014-06-01

Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can be used to obtain a good estimate in regression analysis is the ordinary least squares (OLS) method. The least squares method is used to estimate the parameters of one or more regressions, but it does not allow for relationships among the errors of the different equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model using the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS because SUR produces smaller errors than OLS.
Multivariate regression model of the water level and production rate time series of the geothermal reservoir Waiwera (New Zealand)

Science.gov (United States)

Kühn, Michael; Schöne, Tim

2017-04-01

Water management tools are essential to ensure the conservation of natural resources. The geothermal hot water reservoir below the village of Waiwera, on the North Island of New Zealand, has been used commercially since 1863. The continuous production of 50 °C hot geothermal water, to supply hotels and spas, has a negative impact on the reservoir. Until 1969, the warm water flow from all drilled wells was artesian. Due to overproduction, the water now needs to be pumped up. Further, between 1975 and 1976 the warm water seeps on the beach of Waiwera ran dry. In order to protect the reservoir and the historical and tourist site, a water management plan was deployed in the early 1980s. The "Auckland Council" established guidelines to enable a sustainable management of the resource [1]. The management plan demands that the water level in the council's official observation well averages 0.5 m above sea level throughout the year. Almost four decades of data (from 1978 until today) are now available [2]. For sustainable water management, it is necessary to be able to forecast the water level as a function of the production rates in the production wells. The best predictions are provided by a multivariate regression model of the water level and production rate time series, which takes into account the production rates of individual wells. It is based on the inversely proportional relationship between the independent variable (production rate) and the dependent variable (measured water level). In production scenarios, a maximum total production rate of approx. 1,100 m³/day is determined in order to comply with the guidelines of the "Auckland Council". [1] Kühn M., Stöfen H. (2005) A reactive flow model of the geothermal reservoir Waiwera, New Zealand. Hydrogeology Journal 13, 606-626, doi: 10.1007/s10040-004-0377-6 [2] Kühn M., Altmannsberger C. (2016) Assessment of data driven and process based water management tools for
Multinomial logistic regression in workers' health

Science.gov (United States)

Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana

2017-11-01

In European countries, particularly in Portugal, it is common to hear people say that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as the Services sector. A representative sample was collected from a Portuguese Services organization by applying an internationally validated survey whose variables were measured on a Likert-type scale with five ordered categories. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable, general health perception, where, among other independent variables, burnout appears as statistically significant.
Estimation of lung tumor position from multiple anatomical features on 4D-CT using multiple regression analysis.

Science.gov (United States)

Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro

2017-09-01

To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approaches, and to evaluate the impact of each approach on internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and by MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from the SRA and MRA approaches were compared with the ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions, and the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with the conventional ITV, the average percentage decreases in ITV were 31.9% and 38.3% using the SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided a smaller ITV than the conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
Sensori-motor synchronisation variability decreases as the number of metrical levels in the stimulus signal increases.

Science.gov (United States)

Madison, Guy

2014-03-01

Timing performance becomes less precise for longer intervals, which makes it difficult to achieve simultaneity in synchronisation with a rhythm. The metrical structure of music, characterised by hierarchical levels of binary or ternary subdivisions of time, may function to increase precision by providing additional timing information when the subdivisions are explicit. This hypothesis was tested by comparing synchronisation performance across different numbers of metrical levels conveyed by loudness of sounds, such that the slowest level was loudest and the fastest was softest. Fifteen participants moved their hand with one of 9 inter-beat intervals (IBIs) ranging from 524 to 3,125 ms in 4 metrical level (ML) conditions ranging from 1 (one movement for each sound) to 4 (one movement for every 8th sound). The lowest relative variability (SD/IBI<1.5%) was obtained for the 3 longest IBIs (1600-3,125 ms) and MLs 3-4, significantly less than the smallest value (4-5% at 524-1024 ms) for any ML 1 condition in which all sounds are identical. Asynchronies were also more negative with higher ML. In conclusion, metrical subdivision provides information that facilitates temporal performance, which suggests an underlying neural multi-level mechanism capable of integrating information across levels. © 2013.
The contextual effects of social capital on health: a cross-national instrumental variable analysis.

Science.gov (United States)

Kim, Daniel; Baum, Christopher F; Ganz, Michael L; Subramanian, S V; Kawachi, Ichiro

2011-12-01

Past research on the associations between area-level/contextual social capital and health has produced conflicting evidence. However, interpreting this rapidly growing literature is difficult because estimates using conventional regression are prone to major sources of bias including residual confounding and reverse causation. Instrumental variable (IV) analysis can reduce such bias. Using data on up to 167,344 adults in 64 nations in the European and World Values Surveys and applying IV and ordinary least squares (OLS) regression, we estimated the contextual effects of country-level social trust on individual self-rated health. We further explored whether these associations varied by gender and individual levels of trust. Using OLS regression, we found higher average country-level trust to be associated with better self-rated health in both women and men. Instrumental variable analysis yielded qualitatively similar results, although the estimates were more than double in size in both sexes when country population density and corruption were used as instruments. The estimated health effects of raising the percentage of a country's population that trusts others by 10 percentage points were at least as large as the estimated health effects of an individual developing trust in others. These findings were robust to alternative model specifications and instruments. Conventional regression and to a lesser extent IV analysis suggested that these associations are more salient in women and in women reporting social trust. In a large cross-national study, our findings, including those using instrumental variables, support the presence of beneficial effects of higher country-level trust on self-rated health. Previous findings for contextual social capital using traditional regression may have underestimated the true associations. Given the close linkages between self-rated health and all-cause mortality, the public health gains from raising social capital within and across
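The contrast between OLS and instrumental variable estimates can be illustrated with a minimal two-stage least squares sketch. It handles one endogenous regressor and returns point estimates only; the function names `ols` and `two_stage_least_squares` are hypothetical, and nothing here reproduces the study's actual model, covariates or instruments.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x_endog, Z):
    """2SLS for one endogenous regressor.

    Z holds the instrument(s), one column each, without an intercept column.
    Returns [intercept, slope]. Standard errors are not computed here.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x_endog, dtype=float)
    Z1 = np.column_stack([np.ones(len(y)), np.asarray(Z, dtype=float)])
    x_hat = Z1 @ ols(Z1, x)                        # first stage: project x on the instruments
    X2 = np.column_stack([np.ones(len(y)), x_hat])
    return ols(X2, y)                              # second stage: regress y on fitted x
```

In a toy data set where an unobserved confounder raises both the regressor and the outcome, the OLS slope is biased away from the true effect while the 2SLS slope recovers it, provided the instrument is unrelated to the confounder.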
Improving sensitivity of linear regression-based cell type-specific differential expression deconvolution with per-gene vs. global significance threshold.

Science.gov (United States)

Glass, Edmund R; Dozmorov, Mikhail G

2016-10-06

The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics. The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions
QUANTITATIVE ELECTRONIC STRUCTURE - ACTIVITY RELATIONSHIP OF ANTIMALARIAL COMPOUND OF ARTEMISININ DERIVATIVES USING PRINCIPAL COMPONENT REGRESSION APPROACH

Directory of Open Access Journals (Sweden)

Paul Robert Martin Werfette

2010-06-01

Full Text Available A quantitative structure-activity relationship (QSAR) analysis of a series of antimalarial artemisinin derivatives has been carried out using principal component regression. The descriptors for the QSAR study represent the electronic structure, i.e. atomic net charges of the artemisinin skeleton calculated by the AM1 semi-empirical method. The antimalarial activity of the compounds was expressed as log 1/IC50, taken from experimental data. The main purpose of the principal component analysis approach is to transform a large data set of atomic net charges into a simpler data set of so-called latent variables. The best QSAR equation for log 1/IC50 is obtained from the regression as a linear function of several latent variables, i.e. x1, x2, x3, x4 and x5. The best QSAR model is expressed in the following equation: [equation missing in the source record]. Keywords: QSAR, antimalarial, artemisinin, principal component regression
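Principal component regression as outlined in this abstract can be sketched in a few lines. This is a generic illustration, not the authors' procedure: the function name `pcr_fit` is hypothetical, the descriptors are only mean-centred (no scaling), and the number of latent variables is passed in by the caller rather than selected from the data.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression.

    Returns (intercept, coef), with coef expressed on the original descriptors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                                  # centre the descriptors
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                          # loadings of the retained components
    T = Xc @ V                                       # scores, i.e. the latent variables
    # The scores are orthogonal, so each regression coefficient is a simple ratio.
    gamma = (T.T @ (y - y_mean)) / (s[:n_components] ** 2)
    coef = V @ gamma                                 # map back to the original descriptors
    intercept = y_mean - x_mean @ coef
    return intercept, coef
```

Predictions for new compounds would be `intercept + X_new @ coef`. When every component is retained, the result coincides with ordinary least squares; dropping components trades some bias for lower variance.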
REGRES: A FORTRAN-77 program to calculate nonparametric and "structural" parametric solutions to bivariate regression equations

Science.gov (United States)

Rock, N. M. S.; Duffy, T. R.

REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits and Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x), including: (1) major axis (λ = 1); (2) reduced major axis (λ = variance of y/variance of x); (3) Y on X (λ = infinity); or (4) X on Y (λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.
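The nonparametric fit based on medians of pairwise slopes and intercepts can be sketched as follows. This is an illustrative Python version, not a port of the FORTRAN-77 program: the function name `pairwise_median_line` is hypothetical, pairs with equal x values are simply skipped, and no confidence limits or outlier rejection are included.

```python
from itertools import combinations
from statistics import median

def pairwise_median_line(x, y):
    """Median of all pairwise slopes and of the matching pairwise intercepts."""
    slopes, intercepts = [], []
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue                       # slope undefined for a vertical pair
        b = (yj - yi) / (xj - xi)
        slopes.append(b)
        intercepts.append(yi - b * xi)     # intercept of the line through this pair
    return median(slopes), median(intercepts)

# Five points on y = 2x + 1, with the last one replaced by a gross outlier.
slope, intercept = pairwise_median_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
```

Because medians are used, the single outlier in the example does not pull the fitted line away from the trend of the other four points, which is the property that makes this kind of estimator attractive when normal errors cannot be assumed.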
Variable selection methods in PLS regression - a comparison study on metabolomics data

DEFF Research Database (Denmark)

Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach

. The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using LC-MS based metabolomic approach. References 1. Lê Cao KA, Rossouw D, Robert-Granié C, Besse P: A Sparse PLS for Variable Selection when...... integrated approach. Due to the high number of variables in data sets (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need to be related. Variable selection (or removal of irrelevant...... different strategies for variable selection on PLSR method were considered and compared with respect to selected subset of variables and the possibility for biological validation. Sparse PLSR [1] as well as PLSR with Jack-knifing [2] was applied to data in order to achieve variable selection prior...

Fluid/structure interaction in BERDYNE (Level 4)

International Nuclear Information System (INIS)

Fox, M.J.H.

1988-02-01

A fluid-structure interaction capability has been developed for Level 4 of the finite element dynamics code BERDYNE, as part of the BERSAFE structural analysis system. This permits analysis of small amplitude free or forced vibration of systems comprising elastic structural components and inviscid volumes of possibly compressible fluid. Free fluid surfaces under the influence of gravity may be present. The formulation chosen uses the rigid walled fluid modes, calculated in a preliminary stage, as a basis for description of the coupled system, providing symmetric system matrices for which efficient solution procedures are available. The inclusion of the fluid modal variables within the system matrices is carried out through the use of the BERDYNE 'substructuring' feature, which allows the inclusion of very general 'super-elements' among the normal structural elements. The program also has a seismic analysis capability, used for the analysis of fluid-structure systems subjected to a specified support acceleration time history. In this case analysis is carried out in terms of relative structural motions, but absolute fluid pressures. Application of the BERDYNE fluid/structure interaction capability to some simple test cases produced results in good agreement with results obtained by analytic or independent numerical techniques. Full instructions on the use of the facility will be included in the BERDYNE Level 4 documentation. Interim documentation for the pre-release version is available from the author. (author)
Modelling the co-evolution of indirect genetic effects and inherited variability.

Science.gov (United States)

Marjanovic, Jovana; Mulder, Han A; Rönnegård, Lars; Bijma, Piter

2018-03-28

When individuals interact, their phenotypes may be affected not only by their own genes but also by genes in their social partners. This phenomenon is known as Indirect Genetic Effects (IGEs). In aquaculture species and some plants, however, competition not only affects trait levels of individuals, but also inflates variability of trait values among individuals. In the field of quantitative genetics, the variability of trait values has been studied as a quantitative trait in itself, and is often referred to as inherited variability. Such studies, however, consider only the genetic effect of the focal individual on trait variability and do not make a connection to competition. Although the observed phenotypic relationship between competition and variability suggests an underlying genetic relationship, the current quantitative genetic models of IGE and inherited variability do not allow for such a relationship. The lack of quantitative genetic models that connect IGEs to inherited variability limits our understanding of the potential of variability to respond to selection, both in nature and agriculture. Models of trait levels, for example, show that IGEs may considerably change heritable variation in trait values. Currently, we lack the tools to investigate whether this result extends to variability of trait values. Here we present a model that integrates IGEs and inherited variability. In this model, the target phenotype, say growth rate, is a function of the genetic and environmental effects of the focal individual and of the difference in trait value between the social partner and the focal individual, multiplied by a regression coefficient. The regression coefficient is a genetic trait, which is a measure of cooperation; a negative value indicates competition, a positive value cooperation, and an increasing value due to selection indicates the evolution of cooperation. In contrast to the existing quantitative genetic models, our model allows for co-evolution of
Ultracentrifuge separative power modeling with multivariate regression using covariance matrix

International Nuclear Information System (INIS)

Migliavacca, Elder

2004-01-01

In this work, the least-squares methodology with a covariance matrix is applied to fit a performance function for the separative power δU of an ultracentrifuge as a function of experimentally controlled variables. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties of these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables that significantly influence the δU values are chosen to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure P_p. After validation of the model goodness-of-fit, a residual analysis is carried out to verify the assumptions of randomness and independence and, mainly, to check for residual heteroscedasticity with respect to any explanatory variable of the regression model. Surface curves relating the separative power to the control variables F, θ and P_p are plotted to compare the fitted model with the experimental data and, finally, to calculate their optimized values. (author)
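A fit that carries a full input covariance matrix is commonly written as generalized least squares. The block below is a sketch under that assumption; the abstract does not give the authors' exact formulation, and the symbols X (design matrix built from F, θ and P_p), y (measured δU values) and V (input covariance matrix) are introduced here only for illustration.

```latex
\hat{\beta} = \left(X^{\mathsf{T}} V^{-1} X\right)^{-1} X^{\mathsf{T}} V^{-1} y,
\qquad
\operatorname{Cov}(\hat{\beta}) = \left(X^{\mathsf{T}} V^{-1} X\right)^{-1},
\qquad
\chi^{2} = \left(y - X\hat{\beta}\right)^{\mathsf{T}} V^{-1} \left(y - X\hat{\beta}\right)
```

With a diagonal V this reduces to weighted least squares, and comparing χ² with its degrees of freedom is one common goodness-of-fit check.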
Assessment of deforestation using regression; Hodnotenie odlesnenia s vyuzitim regresie

Energy Technology Data Exchange (ETDEWEB)

Juristova, J. [Univerzita Komenskeho, Prirodovedecka fakulta, Katedra kartografie, geoinformatiky a DPZ, 84215 Bratislava (Slovakia)

2013-04-16

This work is devoted to the evaluation of deforestation using regression methods in the Idrisi Taiga software. Deforestation is evaluated by logistic regression. The dependent variable takes the discrete values '0' and '1', indicating whether or not deforestation occurred. The independent variables have continuous values, expressing the distance of the edge of the deforested forest areas from urban areas, the river network and the road network. The results were also used to predict the probability of deforestation in subsequent periods. The result is a map showing the output probability of deforestation for the periods 1990/2000 and 2000/2006 in accordance with predetermined coefficients (values of independent variables). (authors)
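How fitted coefficients become a per-cell deforestation probability can be sketched as follows. This is a minimal illustration of the logistic link, not the Idrisi Taiga implementation; the function name and argument layout are assumptions, and any coefficient values passed in are hypothetical because the abstract does not report them.

```python
import math


def deforestation_probability(intercept, coefficients, distances):
    """Logistic-regression probability that a cell belongs to class '1'.

    coefficients and distances are matched by position, e.g. distance to
    urban areas, to the river network and to the road network.
    """
    if len(coefficients) != len(distances):
        raise ValueError("one coefficient is needed per predictor")
    z = intercept + sum(b * d for b, d in zip(coefficients, distances))
    # Two algebraically equal forms keep exp() from overflowing.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Evaluating this function for every raster cell yields the kind of probability map the abstract describes; a negative coefficient on a distance means the probability falls as the cell lies farther from that feature.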
Testing Heteroscedasticity in Robust Regression

Czech Academy of Sciences Publication Activity Database

Kalina, Jan

2011-01-01

Roč. 1, č. 4 (2011), s. 25-28 ISSN 2045-3345 Grant - others:GA ČR(CZ) GA402/09/0557 Institutional research plan: CEZ:AV0Z10300504 Keywords : robust regression * heteroscedasticity * regression quantiles * diagnostics Subject RIV: BB - Applied Statistics , Operational Research http://www.researchjournals.co.uk/documents/Vol4/06%20Kalina.pdf
Radio variability in the Phoenix Deep Survey at 1.4 GHz

Science.gov (United States)

Hancock, P. J.; Drury, J. A.; Bell, M. E.; Murphy, T.; Gaensler, B. M.

2016-09-01

We use archival data from the Phoenix Deep Survey to investigate the variable radio source population above 1 mJy beam⁻¹ at 1.4 GHz. Given the similarity of this survey to other such surveys we take the opportunity to investigate the conflicting results which have appeared in the literature. Two previous surveys for variability conducted with the Very Large Array (VLA) achieved a sensitivity of 1 mJy beam⁻¹. However, one survey found an areal density of radio variables on time-scales of decades that is a factor of ~4 times greater than a second survey which was conducted on time-scales of less than a few years. In the Phoenix deep field we measure the density of variable radio sources to be ρ = 0.98 deg⁻² on time-scales of 6 months to 8 yr. We make use of Wide-field Infrared Survey Explorer infrared cross-ids, and identify all variable sources as an active galactic nucleus of some description. We suggest that the discrepancy between previous VLA results is due to the different time-scales probed by each of the surveys, and that radio variability at 1.4 GHz is greatest on time-scales of 2-5 yr.
Fuzzy multiple linear regression: A computational approach

Science.gov (United States)

Juang, C. H.; Huang, X. H.; Fleming, J. W.

1992-01-01

This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.
Modeling Typhoon Event-Induced Landslides Using GIS-Based Logistic Regression: A Case Study of Alishan Forestry Railway, Taiwan

Directory of Open Access Journals (Sweden)

Sheng-Chuan Chen

2013-01-01

This study develops a model for evaluating the hazard level of landslides at Alishan Forestry Railway, Taiwan, by using logistic regression with the assistance of a geographical information system (GIS). A typhoon event-induced landslide inventory, independent variables, and a triggering factor were used to build the model. Environmental factors such as bedrock lithology from the geology database; topographic aspect, terrain roughness, profile curvature, and distance to river from the topographic database; and the vegetation index value from SPOT 4 satellite images were used as variables that influence landslide occurrence. The area under the curve (AUC) of a receiver operating characteristic (ROC) curve was used to validate the model. Effects of parameters on landslide occurrence were assessed from the corresponding coefficients in the logistic regression function. Thereafter, the model was applied to predict the probability of landslides for rainfall data of different return periods. Using a predicted map of probability, the study area was classified into four ranks of landslide susceptibility: low, medium, high, and very high. As a result, most high susceptibility areas are located in the western portion of the study area. Several train stations and railways are located on sites with a high susceptibility ranking.
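The validation and ranking steps in this record can be sketched in a few lines. The AUC is computed here through its pairwise-comparison definition, and the three cut points that separate the four susceptibility ranks are hypothetical placeholders, since the abstract does not state the thresholds the study used.

```python
from bisect import bisect_right

RANKS = ("low", "medium", "high", "very high")


def roc_auc(labels, scores):
    """AUC as the chance that a landslide cell outscores a stable cell.

    labels are 1 (landslide) or 0 (no landslide); ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def susceptibility_rank(probability, cuts=(0.25, 0.5, 0.75)):
    """Map a predicted probability to one of four ranks (cut points assumed)."""
    return RANKS[bisect_right(cuts, probability)]
```

An AUC of 0.5 corresponds to a model that ranks cells no better than chance, and 1.0 to one that ranks every landslide cell above every stable cell.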
Testing homogeneity in Weibull-regression models.

Science.gov (United States)

Bolfarine, Heleno; Valença, Dione M

2005-10-01

In survival studies with families or geographical units it may be of interest to test whether such groups are homogeneous for given explanatory variables. In this paper we consider score-type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model yields survival times that, conditioned on the random effect, have an accelerated failure time representation. The test statistic requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and, in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.
Short-term load forecasting with increment regression tree

Energy Technology Data Exchange (ETDEWEB)

Yang, Jingfei; Stenzel, Juergen [Darmstadt University of Technology, Darmstadt 64283 (Germany)

2006-06-15

This paper presents a new regression tree method for short-term load forecasting. Both increment and non-increment trees are built from the historical data to provide the data space partition and input variable selection. A support vector machine is applied to the samples of the regression tree nodes for further fine regression. The results of the different tree nodes are integrated through a weighted average to obtain the comprehensive forecasting result. The effectiveness of the proposed method is demonstrated through its application to an actual system. (author)
American State Gun Law Strength and State Resident Differences in Neuroticism Levels

Directory of Open Access Journals (Sweden)

Stewart J. H. McCann

2016-04-01

Relations between state gun law strength and state-aggregated levels of Republican leaning, gun ownership, and resident Big Five neuroticism (based on 619,397 residents nationally) were determined in a state-level analysis of the 50 American states using multiple regression strategies with state socioeconomic status, white population percent, and urban population percent statistically controlled. In a standard hierarchical model with state gun law strength as the criterion, the three demographic variables accounted for 44.4% of the variance and the Big Five accounted for another 21.9%. When the Big Five entered stepwise after the demographics, neuroticism was the sole significant personality predictor, accounting for another 13.4% of the variance. Greater state gun law strength was associated with higher state resident neuroticism. Further hierarchical regression analyses showed that state Republican leaning and gun ownership could account separately and jointly for significant variance in state gun law strength but not with state resident neuroticism controlled.
The relationship between venture capital investment and macro economic variables via statistical computation method

Science.gov (United States)

Aygunes, Gunes

2017-07-01

The objective of this paper is to survey and determine the macroeconomic factors affecting the level of venture capital (VC) investments in a country. The literature focuses on the quality of venture capitalists and on countries' venture capital investments. The aim of this paper is to describe the relationship between venture capital investment and macroeconomic variables using a statistical computation method. We investigate the countries and the macroeconomic variables and derive the correlation between venture capital investments and macroeconomic variables. According to the logistic regression model (logit regression or logit model), the macroeconomic variables are correlated with each other in three groups. Venture capitalists regard the correlations as an indicator. Finally, we give the correlation matrix of our results.
Improved Genetic Algorithm with Two-Level Approximation for Truss Optimization by Using Discrete Shape Variables

Directory of Open Access Journals (Sweden)

Shen-yan Chen

2015-01-01

This paper presents an Improved Genetic Algorithm with Two-Level Approximation (IGATA) to minimize truss weight by simultaneously optimizing size, shape, and topology variables. On the basis of a previously presented truss sizing/topology optimization method based on two-level approximation and genetic algorithm (GA), a new method for adding shape variables is presented, in which the nodal positions correspond to a set of coordinate lists. A uniform optimization model including size/shape/topology variables is established. First, a first-level approximate problem is constructed to transform the original implicit problem into an explicit problem. To solve this explicit problem, which involves size/shape/topology variables, GA is used to optimize individuals which include discrete topology variables and shape variables. When calculating the fitness value of each member in the current generation, a second-level approximation method is used to optimize the continuous size variables. With the introduction of shape variables, the original optimization algorithm was improved in individual coding strategy as well as GA execution techniques. Meanwhile, the update strategy of the first-level approximation problem was also improved. The results of numerical examples show that the proposed method is effective in dealing with the three kinds of design variables simultaneously, and the required computational cost for structural analysis is quite small.
Comparison of cranial sex determination by discriminant analysis and logistic regression.

Science.gov (United States)

Amores-Ampuero, Anabel; Alemán, Inmaculada

2016-04-05

Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).
Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized poisson regression model.

Science.gov (United States)

Islam, Mohammad Mafijul; Alam, Morshed; Tariquzaman, Md; Kabir, Mohammad Alamgir; Pervin, Rokhsona; Begum, Munni; Khan, Md Mobarak Hossain

2013-01-08

Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. To our knowledge, most of the available studies that addressed the issue of malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study the malnutrition variable (i.e. the outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find some predictors of this outcome variable. The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children are analysed using various statistical techniques, namely the Chi-square test and the GPR model. The GPR model (as compared to the standard Poisson regression and negative binomial regression) is found to be justified for studying the above-mentioned outcome variable because of its under-dispersion (variance < mean) [...] variable namely mother's education, father's education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Consistencies of our findings in light of many other studies suggest that the GPR model is an ideal alternative to other statistical models for analysing the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh.
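Why a generalized Poisson model can accommodate under-dispersion is easiest to see from its probability mass function. The sketch below assumes Consul's parameterization (rate theta, dispersion lam), which the abstract does not confirm as the one used in the study; the function names are hypothetical.

```python
import math


def generalized_poisson_pmf(y, theta, lam):
    """P(Y = y) for Consul's generalized Poisson distribution.

    theta > 0 is the rate and lam < 1 the dispersion parameter:
    lam = 0 recovers the ordinary Poisson, lam > 0 gives over-dispersion
    and lam < 0 gives under-dispersion.
    """
    if y < 0 or theta <= 0 or lam >= 1:
        raise ValueError("need y >= 0, theta > 0 and lam < 1")
    base = theta + lam * y
    if base <= 0:  # outside the support when lam is negative
        return 0.0
    log_p = (math.log(theta) + (y - 1) * math.log(base)
             - base - math.lgamma(y + 1))
    return math.exp(log_p)


def gp_mean(theta, lam):
    """Mean of the generalized Poisson distribution."""
    return theta / (1.0 - lam)


def gp_variance(theta, lam):
    """Variance; smaller than the mean whenever lam is negative."""
    return theta / (1.0 - lam) ** 3
```

Because the variance equals the mean divided by (1 - lam)², a negative lam produces variance below the mean, which neither the standard Poisson (variance equal to the mean) nor the negative binomial (variance above the mean) can represent.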
Common pitfalls in statistical analysis: Linear regression analysis

Directory of Open Access Journals (Sweden)

Rakesh Aggarwal

2017-01-01

In a previous article in this series, we explained correlation analysis, which describes the strength of the relationship between two continuous variables. In this article, we deal with linear regression analysis, which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
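Predicting one continuous variable from another comes down to estimating an intercept and a slope. The following is a minimal least-squares sketch with hypothetical function names; it is not taken from the article and it does not check the assumptions (linearity, independent errors, constant variance) that the article discusses.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def predict(intercept, slope, x_new):
    """Predicted y at x_new; only trustworthy inside the observed x range."""
    return intercept + slope * x_new
```

The warning in `predict` reflects one familiar pitfall: extrapolating the fitted line beyond the range of the data used to estimate it.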
Regression dilution bias: tools for correction methods and sample size calculation.

Science.gov (United States)

Berglund, Lars

2012-08-01

Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
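The slope correction described in this record can be sketched under the classical measurement-error model (measured value = true value + independent random error). This is an illustration of the general idea rather than the software the authors provide; the function names are hypothetical, and the reliability study is assumed to supply two measurements of the risk factor per individual.

```python
from statistics import mean, variance


def reliability_ratio(first, second):
    """Share of the measured variance that is true between-person variance.

    first and second are repeated measurements on the same individuals.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two measurements for at least two individuals")
    # Half the mean squared difference estimates the error variance.
    within = mean([(a - b) ** 2 for a, b in zip(first, second)]) / 2.0
    total = (variance(first) + variance(second)) / 2.0
    return 1.0 - within / total


def corrected_slope(observed_slope, ratio):
    """Undo regression dilution by dividing by the reliability ratio."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("the reliability ratio must lie in (0, 1]")
    return observed_slope / ratio
```

A ratio of 1 means the risk factor is measured without random error and the slope is left unchanged; smaller ratios inflate the observed slope back toward the slope that the true risk factor would give.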
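The influence of several variables on the choice of inventory valuation method in manufacturing companies; Pengaruh beberapa variable terhadap Pemilihan Metode Penilaian Persediaan pada Perusahaan Manufaktur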

Directory of Open Access Journals (Sweden)

Herlin Tundjung Setijaningsih

2009-03-01

This study aims to provide empirical evidence on whether company size, inventory intensity, variability of cost of goods sold, and accounting earnings variability influence the choice of inventory valuation method. The objects of this research were manufacturing companies listed on the Indonesia Stock Exchange in the period 2005-2009. Thirty-nine samples were selected using several criteria. The statistical analysis tool used was logistic regression with a significance level of 5%. The test results show that, both partially and simultaneously, company size, inventory intensity, variability of cost of goods sold, and income variability have significance levels above 5%. This shows that these variables did not significantly influence the choice of inventory valuation method.
Multiple variables explain the variability in the decrement in VO2max during acute hypobaric hypoxia.

Science.gov (United States)

Robergs, R A; Quintana, R; Parker, D L; Frankel, C C

1998-06-01

We used multiple regression analyses to determine the relationships between the decrement in sea level (SL, 760 Torr) VO2max during hypobaric hypoxia (HH) and variables that could alter or be related to the decrement in VO2max. HH conditions consisted of 682 Torr, 632 Torr, and 566 Torr, and the measured independent variables were SL-VO2max, SL lactate threshold (SL-LT), the change in hemoglobin saturation at VO2max between 760 and 566 Torr (delta SaO2max), lean body mass (LBM), and gender. Male (N = 14) and female (N = 14) subjects of varied fitness, training status, and residential altitude (1,640-2,460 m) completed cycle ergometry tests of VO2max at each HH condition under randomized and single-blinded conditions. VO2max decreased significantly from 760 Torr after 682 Torr (approximately 915 m) (3.5 ± 0.9 to 3.4 ± 0.8 L·min⁻¹, P = 0.0003). Across all HH conditions, the slope of the relative decrement in VO2max (%VO2max) during HH was -9.2%/100 mm Hg (-8.1%/1000 m) with an initial decrease from 100% estimated to occur below 705 Torr (610 m). Step-wise multiple regression revealed that SL-VO2max, SL-LT, delta SaO2max, LBM, and gender each significantly combined to account for 89.03% of the variance in the decrement in VO2max (760-566 Torr) (P < ...) [...] decrement in VO2max during HH. The unique variance explanation afforded by SL-LT, LBM, and gender suggests that issues pertaining to oxygen diffusion within skeletal muscle may add to the explanation of between subjects variability in the decrement in VO2max during HH.
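The group-level slope and threshold quoted in this record imply a simple piecewise-linear relation between barometric pressure and relative VO2max. The sketch below encodes only those two reported numbers; the function name is hypothetical, and the result is a group average that ignores the between-subject variability the study sets out to explain.

```python
def percent_vo2max(pressure_torr, threshold_torr=705.0,
                   slope_pct_per_100_torr=9.2):
    """Group-average VO2max as a percentage of the sea-level value.

    No decrement is assumed at or above the threshold pressure; below it,
    VO2max falls linearly with the pressure deficit.
    """
    if pressure_torr >= threshold_torr:
        return 100.0
    deficit = threshold_torr - pressure_torr
    return 100.0 - slope_pct_per_100_torr * deficit / 100.0
```

At the lowest pressure tested, 566 Torr, the deficit below the threshold is 139 Torr, so the estimated group-average decrement is 9.2 × 1.39, or roughly 12.8% of the sea-level value.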
Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

Science.gov (United States)

NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

2017-08-01

Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

Directory of Open Access Journals (Sweden)

Minh Vu Trieu

2017-03-01

This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

Science.gov (United States)

Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

2017-03-01

This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
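The models in this record are ranked by R-squared, which can be computed for any predictor, regression or fuzzy logic, from observed and predicted penetration rates. The sketch below uses the usual definition (one minus the ratio of residual to total sum of squares); whether the paper used exactly this definition is an assumption, and the function name is hypothetical.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    centre = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - centre) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have no variation")
    return 1.0 - ss_res / ss_tot
```

Under this definition, the reported 0.714 would mean the fuzzy logic model leaves about 29% of the variation in measured ROP unexplained, against about 33% for the best regression model.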
A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

Science.gov (United States)

Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

2014-01-01

As fraudulent financial statements by enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance, 85.71%.
Tumor regression patterns in retinoblastoma

International Nuclear Information System (INIS)

Zafar, S.N.; Siddique, S.N.; Zaheer, N.

2016-01-01

Objective: To observe the types of tumor regression after treatment, and identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, dilated fundus examination under anesthesia was performed to record the response to treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Out of 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 tumors belonged to ICRB group C, and the remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of the focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, varies with the ICRB grouping of the tumor. (author)
Group-wise partial least square regression

NARCIS (Netherlands)

Camacho, José; Saccenti, Edoardo

2018-01-01

This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are
Cyclic deformation and fatigue data for Ti–6Al–4V ELI under variable amplitude loading

Directory of Open Access Journals (Sweden)

Patricio E. Carrion

2017-08-01

This article presents the strain-based experimental data for Ti–6Al–4V ELI under non-constant amplitude cyclic loading. Uniaxial strain-controlled fatigue experiments were conducted under three different loading conditions: two-level block loading (i.e. high-low and low-high), periodic overload, and variable amplitude loading. Tests were performed under fully-reversed and mean strain/stress conditions. For each test conducted, two sets of data were collected: the cyclic stress–strain response (i.e. hysteresis loops in log10 increments), and the peak and valley values of stress and strain for each cycle. Residual fatigue lives are reported for tests with two-level block loading, while for periodic overload and variable amplitude experiments, fatigue lives are reported in terms of number of blocks to failure.
Monitoring the variability of sea level and surface circulation with satellite altimetry

NARCIS (Netherlands)

Volkov, Denis L. "Jr"

2004-01-01

Variability in the ocean plays an important role in determining global weather and climate conditions. The advent of satellite altimetry has significantly facilitated the study of the variability of sea level and surface circulation. Satellites provide high-quality regular and nearly global
About hidden influence of predictor variables: Suppressor and mediator variables

Directory of Open Access Journals (Sweden)

Milovanović Boško

2013-01-01

This paper presents a procedure for investigating the hidden influence of predictor variables in regression models and for identifying suppressor and mediator variables. It also shows that detecting suppressor and mediator variables can provide refined information about the research problem. As an example of applying this procedure, the relation between Atlantic atmospheric centers and air temperature and precipitation amounts in Serbia is chosen. [Project of the Ministry of Science of the Republic of Serbia, no. 47007
The Determinants of Equity Risk and Their Forecasting Implications: A Quantile Regression Perspective

Directory of Open Access Journals (Sweden)

Giovanni Bonaccolto

2016-07-01

Several market and macro-level variables influence the evolution of equity risk in addition to the well-known volatility persistence. However, the impact of those covariates might change depending on the risk level, being different between low and high volatility states. By combining equity risk estimates, obtained from the Realized Range Volatility, corrected for microstructure noise and jumps, and quantile regression methods, we evaluate the forecasting implications of the equity risk determinants in different volatility states and, without distributional assumptions on the realized range innovations, we recover both the points and the conditional distribution forecasts. In addition, we analyse how the relationships among the involved variables evolve over time, through a rolling window procedure. The results show evidence of the selected variables’ relevant impacts and, particularly during periods of market stress, highlight heterogeneous effects across quantiles.
Selecting a Regression Saturated by Indicators

DEFF Research Database (Denmark)

Hendry, David F.; Johansen, Søren; Santos, Carlos

We consider selecting a regression model, using a variant of Gets, when there are more variables than observations, in the special case that the variables are impulse dummies (indicators) for every observation. We show that the setting is unproblematic if tackled appropriately, and obtain the finite-sample distribution of estimators of the mean and variance in a simple location-scale model under the null that no impulses matter. A Monte Carlo simulation confirms the null distribution, and shows power against an alternative of interest.
Item-level informant discrepancies across obese-overweight children and their parents on the PedsQL™ 4.0 instrument: an iterative hybrid ordinal logistic regression.

Science.gov (United States)

Jafari, Peyman; Allahyari, Elahe; Salarzadeh, Mina; Bagheri, Zahra

2016-01-01

Child obesity has become a major health concern worldwide. In order to provide successful intervention strategies, it is necessary to understand how obese-overweight children and their parents perceive obesity and its consequences on child's health-related quality of life (HRQoL). This study aimed to assess measurement equivalence of the PedsQL™ 4.0 across obese-overweight children and their parents. The items in the PedsQL™ 4.0 were analysed for differential item functioning (DIF) across obese-overweight children and their parents using an iterative hybrid ordinal logistic regression/item response theory approach. The sample included 647 overweight-obese children and their parents, who completed child and parent reports of the PedsQL™ 4.0, respectively. Overall, 17 out of 23 (74%) items were flagged with DIF across two groups: eight items exhibited uniform DIF and nine items non-uniform DIF. In addition, parents of obese children rated the child's HRQoL significantly lower than their children in all domains of the PedsQL™ 4.0, and this finding did not change whether or not items with uniform DIF were included. Although obese-overweight children and their parents interpret items of the PedsQL™ 4.0 in a conceptually different manner, removing or retaining DIF items in the subscales had no significant effects on group differences. Accordingly, it appears that observed differences in HRQoL scores across child and parent reports are a true difference and not a reflection of measurement artefact.
The intermediate endpoint effect in logistic and probit regression

Science.gov (United States)

MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM

2010-01-01

Background: An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose: The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods: The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results: Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations: More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions: Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted
Canonical basis for type A4 (II) - Polynomial elements in one variable

International Nuclear Information System (INIS)

Hu Yuwang; Ye Jiachen

2003-12-01

All the 62 monomial elements in the canonical basis B of the quantized enveloping algebra for type A4 have been determined. According to Lusztig's idea, the elements in the canonical basis B consist of monomials and linear combinations of monomials (for convenience, we call them polynomials). In this note, we compute all the 144 polynomial elements in one variable in the canonical basis B of the quantized enveloping algebra for type A4 based on our joint note. We conjecture that there are other polynomial elements in two or three variables in the canonical basis B, which include independent variables and dependent variables. Moreover, it is conjectured that there are no polynomial elements in the canonical basis B with four or more variables. (author)
Multicollinearity is a red herring in the search for moderator variables: A guide to interpreting moderated multiple regression models and a critique of Iacobucci, Schneider, Popovich, and Bakamitsos (2016).

Science.gov (United States)

McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron

2017-02-01

Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.
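One point behind this argument can be checked numerically: mean-centering the predictors changes the lower-order coefficients of a moderated multiple regression, but it leaves the product-term coefficient unchanged. The following is a minimal sketch in plain Python, not the authors' code; the data values and the helper names (`solve`, `ols`, `mmr_fit`) are made up for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def mmr_fit(x, z, y, center):
    """Fit y = b0 + b1*x + b2*z + b3*x*z, optionally mean-centering x and z."""
    mx = sum(x) / len(x) if center else 0.0
    mz = sum(z) / len(z) if center else 0.0
    rows = [[1.0, xi - mx, zi - mz, (xi - mx) * (zi - mz)] for xi, zi in zip(x, z)]
    return ols(rows, y)


# Hypothetical data: x and z are strongly correlated on purpose.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15]
y = [1 + 0.5 * xi - 0.3 * zi + 0.2 * xi * zi + e for xi, zi, e in zip(x, z, noise)]

raw = mmr_fit(x, z, y, center=False)
centered = mmr_fit(x, z, y, center=True)
print("interaction coefficient, raw:     ", raw[3])
print("interaction coefficient, centered:", centered[3])
```

In this sketch the two printed interaction coefficients agree up to rounding, while the coefficient of x shifts by the interaction coefficient times the mean of z. Centering therefore changes what the lower-order terms mean, not the moderator estimate.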
Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

Science.gov (United States)

Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

2017-09-01

Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The resulting discriminative yet compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms state-of-the-art algorithms. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.
Poisson Regression Analysis of Illness and Injury Surveillance Data

Energy Technology Data Exchange (ETDEWEB)

Frome E.L., Watkins J.P., Ellis E.D.

2012-12-12

The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
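The over-dispersion check described in this abstract can be illustrated with a much simpler model than the report's log-linear main effects model. The sketch below assumes a single common event rate across strata (an intercept-only Poisson model with person-time as exposure), computes the Pearson dispersion estimate, and inflates the standard error by its square root in the usual quasi-likelihood way. The function name and the numbers are hypothetical.

```python
import math


def common_rate_dispersion(counts, person_time):
    """Pearson dispersion for a Poisson model with one common rate.

    counts      -- observed event counts per stratum
    person_time -- person-time at risk per stratum (the exposure)
    """
    rate = sum(counts) / sum(person_time)       # maximum likelihood rate
    expected = [rate * t for t in person_time]  # fitted counts
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, expected))
    dof = len(counts) - 1                       # one fitted parameter
    phi = pearson / dof                         # method-of-moments dispersion
    se_poisson = math.sqrt(1.0 / sum(counts))   # SE of log(rate) under Poisson
    se_adjusted = se_poisson * math.sqrt(phi)   # quasi-likelihood adjustment
    return rate, phi, se_poisson, se_adjusted


# Hypothetical strata: counts and person-years at risk.
rate, phi, se, se_adj = common_rate_dispersion([5, 30, 25], [100, 200, 300])
print(rate, phi, se, se_adj)
```

A dispersion estimate well above 1 suggests extra-Poisson variation or lack of fit. A value near 1 is what a well-fitting Poisson model would produce.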
Morphometric variability of mandible linear characteristics depending on level of teeth alveolus position

Directory of Open Access Journals (Sweden)

Olga Yu. Aleshkina

2017-05-01

Results and Conclusion ― The highest altitude was marked at levels of incisors and 3rd molar, the smallest one – at level of 1st and 2nd molars; maximum mandible thickness was defined at level of 2nd molar, minimum – at levels of canine and 1st – 2nd premolars on both sides of mandible; average thickness was revealed at levels of incisors, 1st and 2nd molars and had the same statistical values. Bilateral variability of thickness was significantly dominating on the right side and only at levels of 1st – 2nd premolars and 1st molar. Average values of altitude and thickness from both sides of mandible and at all levels had medium degree of variability.
Hydration level is an internal variable for computing motivation to obtain water rewards in monkeys.

Science.gov (United States)

Minamimoto, Takafumi; Yamada, Hiroshi; Hori, Yukiko; Suhara, Tetsuya

2012-05-01

In the process of motivation to engage in a behavior, valuation of the expected outcome is comprised of not only external variables (i.e., incentives) but also internal variables (i.e., drive). However, the exact neural mechanism that integrates these variables for the computation of motivational value remains unclear. Besides, the signal of physiological needs, which serves as the primary internal variable for this computation, remains to be identified. Concerning fluid rewards, the osmolality level, one of the physiological indices for the level of thirst, may be an internal variable for valuation, since an increase in the osmolality level induces drinking behavior. Here, to examine the relationship between osmolality and the motivational value of a water reward, we repeatedly measured the blood osmolality level, while 2 monkeys continuously performed an instrumental task until they spontaneously stopped. We found that, as the total amount of water earned increased, the osmolality level progressively decreased (i.e., the hydration level increased) in an individual-dependent manner. There was a significant negative correlation between the error rate of the task (the proportion of trials with low motivation) and the osmolality level. We also found that the increase in the error rate with reward accumulation can be well explained by a formula describing the changes in the osmolality level. These results provide a biologically supported computational formula for the motivational value of a water reward that depends on the hydration level, enabling us to identify the neural mechanism that integrates internal and external variables.
Variables Affecting a Level of Practice and Quality of Educational Quality Assurance in Basic Education Schools

Directory of Open Access Journals (Sweden)

Jakkapong Prongprommarat

2016-10-01

variables explained sixty-eight percent of the variation in the practice of educational quality assurance in basic education schools at the 0.01 significance level. 4. The variables affecting the level of external quality assessment in basic education schools were the level of director leadership (β = 0.02), the level of the directors' working responsibility (β = 0.14), the level of the teachers' working responsibility (β = 0.14), the level of cooperation between school directors and teachers (β = 0.33), and the level of practice of educational quality assurance in basic education schools (β = 0.21). These five variables explained sixty percent of the variation in the level of external quality assessment in basic education schools at the 0.01 significance level.

SPSS macros to compare any two fitted values from a regression model.

Science.gov (United States)

Weaver, Bruce; Dubois, Sacha

2012-12-01

In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests, particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
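A standard matrix formulation of this kind of comparison is: for two covariate rows x_a and x_b, the difference in fitted values is c'b with c = x_a - x_b, and its variance is c'Vc, where V is the coefficient covariance matrix. The Python sketch below illustrates that general formula only. It is not a port of the !OLScomp or !MLEcomp macros, whose internals are not described here, and the coefficients, covariance matrix, and normal critical value are hypothetical.

```python
import math


def fitted_value_difference(x_a, x_b, coefs, cov, crit=1.96):
    """Difference between two fitted values with its SE and a confidence interval.

    x_a, x_b -- design rows (including the intercept term) for the two points
    coefs    -- estimated regression coefficients
    cov      -- covariance matrix of the coefficient estimates
    crit     -- critical value (1.96 assumes a large-sample normal approximation)
    """
    c = [a - b for a, b in zip(x_a, x_b)]
    diff = sum(c_i * b_i for c_i, b_i in zip(c, coefs))
    k = len(c)
    var = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return diff, se, (diff - crit * se, diff + crit * se)


# Hypothetical quadratic model y = b0 + b1*x + b2*x^2; compare x = 3 with x = 1.
coefs = [1.0, 2.0, 0.5]
cov = [
    [0.04, 0.0, 0.0],
    [0.0, 0.01, -0.002],
    [0.0, -0.002, 0.0025],
]
diff, se, ci = fitted_value_difference([1, 3, 9], [1, 1, 1], coefs, cov)
print(diff, se, ci)
```

No single coefficient of the quadratic model gives this comparison directly, which is the situation the abstract describes for models with polynomial or interaction terms.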
Regression Methods for Virtual Metrology of Layer Thickness in Chemical Vapor Deposition

DEFF Research Database (Denmark)

Purwins, Hendrik; Barak, Bernd; Nagi, Ahmed

2014-01-01

The quality of wafer production in semiconductor manufacturing cannot always be monitored by a costly physical measurement. Instead of measuring a quantity directly, it can be predicted by a regression method (Virtual Metrology). In this paper, a survey on regression methods is given to predict...... average Silicon Nitride cap layer thickness for the Plasma Enhanced Chemical Vapor Deposition (PECVD) dual-layer metal passivation stack process. Process and production equipment Fault Detection and Classification (FDC) data are used as predictor variables. Various variable sets are compared: one most...... algorithm, and Support Vector Regression (SVR). On a test set, SVR outperforms the other methods by a large margin, being more robust towards changes in the production conditions. The method performs better on high-dimensional multivariate input data than on the most predictive variables alone. Process...
Multiple Response Regression for Gaussian Mixture Models with Known Labels.

Science.gov (United States)

Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng

2012-12-01

Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.
Comparative Analysis of the Support Vector Regression (SVR) and Decision Tree C4.5 Techniques in Data Mining

OpenAIRE

Astuti, Yuniar Andi

2011-01-01

This study examines the Support Vector Regression (SVR) and Decision Tree C4.5 techniques as used in studies in various fields, in order to identify the advantages and disadvantages of both techniques in data mining. Across the ten studies that used both techniques, the analysis showed an accuracy of 59.64% for SVR and 76.97% for C4.5, so the study concludes that C4.5 performs better than SVR.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.

Science.gov (United States)

Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

2015-10-01

The purpose of this study was to develop a regression model of the organizational climate of the central libraries of Iran's universities. This is an applied study. The statistical population consisted of 96 employees of the central libraries of Iran's public universities, selected by stratified sampling from the 117 universities affiliated with the Ministry of Health (510 people). A localized version of the ClimateQUAL questionnaire was used as the research instrument. Multivariate linear regression and a path diagram were used to predict the organizational climate pattern of the libraries. Of the 9 variables affecting organizational climate, 5 (innovation, teamwork, customer service, psychological safety, and deep diversity) play a major role in predicting the organizational climate of Iran's libraries. The results also indicate that each of these variables predicts organizational climate with a different coefficient, but the climate score for psychological safety (0.94) plays a crucial role. The path diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity, and innovation directly affect the organizational climate variable, with teamwork contributing more than any other variable. Among the ClimateQUAL indicators of organizational climate, teamwork also contributes the most, so reinforcing teamwork in academic libraries may be especially effective in improving the organizational climate of such libraries.
A binary logistic regression model with complex sampling design of ...

African Journals Online (AJOL)

2017-09-03

Sep 3, 2017 ... Bi-variable and multi-variable binary logistic regression model with complex sampling design was fitted. .... Data was entered into STATA-12 and analyzed using. SPSS-21. .... lack of access/too far or costs too much. 35. 1.2.
Patient and organisational variables associated with pressure ulcer prevalence in hospital settings: a multilevel analysis.

Science.gov (United States)

Bredesen, Ida Marie; Bjøro, Karen; Gunningberg, Lena; Hofoss, Dag

2015-08-27

To investigate the association of ward-level differences in the odds of hospital-acquired pressure ulcers (HAPUs) with selected ward organisational variables and patient risk factors. Multilevel approach to data from 2 cross-sectional studies. 4 hospitals in Norway were studied. 1056 patients at 84 somatic wards. HAPU. Significant variance in the odds of HAPUs was found across wards. A regression model using only organisational variables left a significant variance in the odds of HAPUs across wards but patient variables eliminated the across-ward variance. In the model including organisational and patient variables, significant ward-level HAPU variables were ward type (rehabilitation vs surgery/internal medicine: OR 0.17 (95% CI 0.04 to 0.66)), use of preventive measures (yes vs no: OR 2.02 (95% CI 1.12 to 3.64)) and ward patient safety culture (OR 0.97 (95% CI 0.96 to 0.99)). Significant patient-level predictors were age >70 vs organisation of care improvements, that is, by improving the patient safety culture and implementation of preventive measures. Some wards may prevent pressure ulcers better than other wards. The fact that ward-level variation was eliminated when patient-level HAPU variables were included in the model indicates that even wards with the best HAPU prevention will be challenged by an influx of high-risk patients.
A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

Directory of Open Access Journals (Sweden)

Suduan Chen

2014-01-01

As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting them has become an important question for academic research and financial practice. After screening the important variables with stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study uses both financial and nonfinancial variables to help build the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance, at 85.71%.
Is plasma C3 and C4 levels useful in young cerebral ischemic stroke patients? Associations with prognosis at 3 months.

Science.gov (United States)

Zhang, Bin; Yang, Ning; Gao, Cong

2015-02-01

Plasma complement C3 and C4 act as risk factors for vascular diseases related to atherosclerosis. The association of C3 and C4 levels with prognosis in young ischemic stroke patients is still unknown. We conducted this study to establish the significance of admission C3 and C4 levels as a possible predictor of 3-month prognosis in young patients with acute ischemic stroke. The study included 1,451 young Chinese patients, with prognosis determined by the modified Rankin Scale at 3 months. Bivariate logistic regression analyses were used to determine the risk factors of outcome in male and female patients. Stepwise logistic regression analysis confirmed that only the lowest quartile of C3 level (0.17-0.90 g/L) was independently associated with prognosis in male patients after adjustment for confounding stroke risk factors [0.558 (0.382-0.815); P = 0.003]; no such association was found for plasma C4 levels. Meanwhile, serum SUA and WBC concentrations and TIA history were related to prognosis at 3 months after acute ischemic stroke. Our analysis provides compelling information regarding baseline complement C3 levels in young ischemic stroke patients as possible predictors of early prognosis 3 months after the acute phase. However, our results must be seen as a hypothesis only and will have to be confirmed in larger trials.
A simple approach to power and sample size calculations in logistic regression and Cox regression models.

Science.gov (United States)

Vaeth, Michael; Skovlund, Eva

2004-06-15

For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
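For logistic regression, the recipe in this abstract can be sketched as follows. The code reflects one reading of the abstract, not the authors' implementation: the two groups' log-odds differ by the slope times twice the covariate standard deviation, their average event probability equals the overall response probability, and the usual two-proportion formula gives the sample size. The function names and the example inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def equivalent_two_sample(beta, sd_x, p_overall):
    """Find p1, p2 with logit(p2) - logit(p1) = 2*beta*sd_x and mean p_overall."""
    delta = 2 * beta * sd_x
    lo, hi = -30.0, 30.0  # bisection on the log-odds of the first group
    for _ in range(200):
        mid = (lo + hi) / 2
        if (expit(mid) + expit(mid + delta)) / 2 < p_overall:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return expit(a), expit(a + delta)


def total_sample_size(beta, sd_x, p_overall, alpha=0.05, power=0.8):
    """Total n from the standard two-proportion formula with equal group sizes."""
    p1, p2 = equivalent_two_sample(beta, sd_x, p_overall)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return 2 * math.ceil(num / (p2 - p1) ** 2)


# Hypothetical inputs: log-odds slope 0.5 per unit of x, SD of x = 1, 30% events.
p1, p2 = equivalent_two_sample(beta=0.5, sd_x=1.0, p_overall=0.3)
print(p1, p2, total_sample_size(0.5, 1.0, 0.3))
```

A steeper slope widens the gap between the two group probabilities, so the required sample size falls.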
Reliability of the Load-Velocity Relationship Obtained Through Linear and Polynomial Regression Models to Predict the One-Repetition Maximum Load.

Science.gov (United States)

Pestaña-Melero, Francisco Luis; Haff, G Gregory; Rojas, Francisco Javier; Pérez-Castilla, Alejandro; García-Ramos, Amador

2017-12-18

This study aimed to compare the between-session reliability of the load-velocity relationship between (1) linear vs. polynomial regression models and (2) concentric-only vs. eccentric-concentric bench press variants, as well as (3) the within-participants vs. the between-participants variability of the velocity attained at each percentage of the one-repetition maximum (%1RM). The load-velocity relationships of 30 men (age: 21.2±3.8 y; height: 1.78±0.07 m; body mass: 72.3±7.3 kg; bench press 1RM: 78.8±13.2 kg) were evaluated by means of linear and polynomial regression models in the concentric-only and eccentric-concentric bench press variants in a Smith machine. Two sessions were performed with each bench press variant. The main findings were: (1) first-order polynomials (CV: 4.39%-4.70%) provided the load-velocity relationship with higher reliability than second-order polynomials (CV: 4.68%-5.04%); (2) the reliability of the load-velocity relationship did not differ between the concentric-only and eccentric-concentric bench press variants; (3) the within-participants variability of the velocity attained at each %1RM was markedly lower than the between-participants variability. Taken together, these results highlight that, regardless of the bench press variant considered, the individual determination of the load-velocity relationship by a linear regression model could be recommended to monitor and prescribe the relative load in the Smith machine bench press exercise.
Soil moisture estimation using multi linear regression with terraSAR-X data

Directory of Open Access Journals (Sweden)

G. García

2016-06-01

The first five centimeters of soil form an interface where the main heat flux exchanges between the land surface and the atmosphere occur. Besides ground measurements, remote sensing has proven to be an excellent tool for monitoring spatially and temporally distributed data on the most relevant Earth surface parameters, including soil parameters. Indeed, active microwave sensors (Synthetic Aperture Radar, SAR) offer the opportunity to monitor soil moisture (HS) at global, regional and local scales by monitoring the processes involved. Several inversion algorithms that derive geophysical information such as HS from SAR data have been developed. Many of them use electromagnetic models for simulating the backscattering coefficient and are based on statistical techniques, such as neural networks, inversion methods and regression models. Recent studies have shown that simple multiple regression techniques yield satisfactory results. The geophysical variables involved in these methodologies describe the soil structure, microwave characteristics and land use. Therefore, in this paper we aim to develop a multiple linear regression model to estimate HS on flat agricultural regions using TerraSAR-X satellite data and data from a ground weather station. The results show that the backscatter, the precipitation and the relative humidity are the explanatory variables of HS. The results obtained presented an RMSE of 5.4 and an R2 of about 0.6.
The Spatial Association Between Federally Qualified Health Centers and County-Level Reported Sexually Transmitted Infections: A Spatial Regression Approach.

Science.gov (United States)

Owusu-Edusei, Kwame; Gift, Thomas L; Leichliter, Jami S; Romaguera, Raul A

2018-02-01

The number of categorical sexually transmitted disease (STD) clinics is declining in the United States. Federally qualified health centers (FQHCs) have the potential to supplement the needed sexually transmitted infection (STI) services. In this study, we describe the spatial distribution of FQHC sites and determine whether reported county-level nonviral STI morbidity was associated with having FQHC(s) using spatial regression techniques. We extracted map data from the Health Resources and Services Administration data warehouse on FQHCs (ie, geocoded health care service delivery [HCSD] sites) and extracted county-level data on the reported rates of chlamydia, gonorrhea, and primary and secondary (P&S) syphilis (2008-2012) from surveillance data. A 3-equation seemingly unrelated regression estimation procedure (with a spatial regression specification that controlled for county-level multiyear (2008-2012) demographic and socioeconomic factors) was used to determine the association between reported county-level STI morbidity and HCSD sites. Counties with HCSD sites had higher STI, poverty, unemployment, and violent crime rates than counties with no HCSD sites (P < 0.05). The number of HCSD sites was associated (P < 0.01) with increases in the temporally smoothed rates of chlamydia, gonorrhea, and P&S syphilis, but there was no significant association between the number of HCSD sites per 100,000 population and reported STI rates. There is a positive association between STI morbidity and the number of HCSD sites; however, this association does not exist when adjusting for population size. Further work may determine the extent to which HCSD sites can meet unmet needs for safety net STI services.
An artificial pancreas provided a novel model of blood glucose level variability in beagles.

Science.gov (United States)

Munekage, Masaya; Yatabe, Tomoaki; Kitagawa, Hiroyuki; Takezaki, Yuka; Tamura, Takahiko; Namikawa, Tsutomu; Hanazaki, Kazuhiro

2015-12-01

Although the effects of blood glucose level variability on prognosis have gained increasing attention, it is unclear whether blood glucose level variability itself worsens prognosis or whether it is a manifestation of pathological conditions that do. Moreover, no variability models of perioperative blood glucose levels have previously been reported. The aim of this study is to establish a novel variability model of blood glucose concentration using an artificial pancreas. We maintained six healthy, male beagles. After anesthesia induction, a 20-G venous catheter was inserted in the right femoral vein and an artificial pancreas (STG-22, Nikkiso Co. Ltd., Tokyo, Japan) was connected for continuous blood glucose monitoring and glucose management. After achieving muscle relaxation, total pancreatectomy was performed. After 1 h of stabilization, automatic blood glucose control was initiated using the artificial pancreas. The blood glucose level was varied for 8 h, alternating between the target blood glucose values of 170 and 70 mg/dL. Eight hours later, the experiment was concluded. Total pancreatectomy took 62 ± 13 min. Blood glucose swings were achieved 9.8 ± 2.3 times. The average blood glucose level was 128.1 ± 5.1 mg/dL with an SD of 44.6 ± 3.9 mg/dL. The potassium levels after stabilization and at the end of the experiment were 3.5 ± 0.3 and 3.1 ± 0.5 mmol/L, respectively. In conclusion, the results of the present study demonstrated that an artificial pancreas contributed to the establishment of a novel variability model of blood glucose levels in beagles.
An Investigation of the Variables Predicting Faculty of Education Students' Speaking Anxiety through Ordinal Logistic Regression Analysis

Science.gov (United States)

Bozpolat, Ebru

2017-01-01

The purpose of this study is to determine whether Cumhuriyet University Faculty of Education students' levels of speaking anxiety are predicted by the variables of gender, department, grade, such sub-dimensions of "Speaking Self-Efficacy Scale for Pre-Service Teachers" as "public speaking," "effective speaking,"…
Comparing Kriging and Regression Approaches for Mapping Soil Clay Content in a diverse Danish Landscape

DEFF Research Database (Denmark)

Adhikari, Kabindra; Bou Kheir, Rania; Greve, Mette Balslev

2013-01-01

Information on the spatial variability of soil texture including soil clay content in a landscape is very important for agricultural and environmental use. Different prediction techniques are available to assess and map spatial variability of soil properties, but selecting the most suitable technique at a given site has always been a major issue in all soil mapping applications. We studied the prediction performance of ordinary kriging (OK), stratified OK (OKst), regression trees (RT), and rule-based regression kriging (RKrr) for digital mapping of soil clay content at 30.4-m grid size using 6... the prediction in OKst compared with that in OK, whereas RT showed the lowest performance of all (R2 = 0.52; RMSE = 0.52; and RPD = 1.17). We found RKrr to be an effective prediction method and recommend this method for any future soil mapping activities in Denmark.
Analysis of the Influence of Quantile Regression Model on Mainland Tourists' Service Satisfaction Performance

Science.gov (United States)

Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

2014-01-01

It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
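A short Python sketch of what fitting a model "under different quantiles" means. A quantile regression at level tau minimizes the check (pinball) loss rather than squared error. The helper names and toy data are assumptions for illustration, and the example fits only a constant, not the multi-predictor model used in the study.

```python
def pinball_loss(y, prediction, tau):
    """Average check loss of a constant prediction at quantile level tau."""
    total = 0.0
    for yi in y:
        u = yi - prediction
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimizes the pinball loss (a sample quantile)."""
    return min(sorted(y), key=lambda c: pinball_loss(y, c, tau))


# Toy satisfaction scores (hypothetical data).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lower_quartile_fit = best_constant(scores, 0.25)
median_fit = best_constant(scores, 0.50)
```

Replacing the constant with a linear function of the predictors and minimizing the same loss gives the quantile regression fits (Q0.25 and so on) that the abstract compares.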
Soft Sensor Modeling Based on Multiple Gaussian Process Regression and Fuzzy C-mean Clustering

Directory of Open Access Journals (Sweden)

Xianglin ZHU

2014-06-01

In order to overcome the difficulties of online measurement of some crucial biochemical variables in fermentation processes, a new soft sensor modeling method is presented based on Gaussian process regression and fuzzy C-mean clustering. Considering that a typical fermentation process can be divided into 4 phases (lag phase, exponential growth phase, stable phase and dead phase), the training samples are classified into 4 sub-categories using the fuzzy C-mean clustering algorithm. For each sub-category, the samples are trained using Gaussian process regression and the corresponding soft-sensing sub-model is established. For a new sample, the memberships between this sample and the sub-models are computed based on the Euclidean distance, and the prediction output of the soft sensor is then obtained as the weighted sum. Taking lysine fermentation as an example, simulation and experiment are carried out, and the results show that the presented method achieves better fitting and generalization ability than a radial basis function neural network and a single Gaussian process regression model.
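The combination step in this abstract (distance-based memberships followed by a weighted sum of sub-model outputs) can be sketched in Python as follows. The membership formula is the usual fuzzy C-means one with fuzzifier m, which is an assumption because the abstract does not state it. The sub-models here are placeholder functions rather than trained Gaussian process regressors, and all names are hypothetical.

```python
import math


def memberships(x, centers, m=2.0):
    """Fuzzy C-means style memberships of sample x to each cluster center."""
    d = [math.dist(x, c) for c in centers]
    # A sample lying exactly on a center belongs fully to that cluster.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]


def soft_sensor_predict(x, centers, sub_models, m=2.0):
    """Membership-weighted sum of the sub-model predictions."""
    u = memberships(x, centers, m)
    return sum(ui * model(x) for ui, model in zip(u, sub_models))


# Two hypothetical phases with constant placeholder sub-models.
demo_centers = [(0.0,), (10.0,)]
demo_models = [lambda x: 1.0, lambda x: 3.0]
demo_prediction = soft_sensor_predict((5.0,), demo_centers, demo_models)
```

A sample halfway between the two centers receives equal memberships, so its prediction is the plain average of the two sub-model outputs.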

High prevalence of cesarean section births in private sector health facilities- analysis of district level household survey-4 (DLHS-4) of India.

Science.gov (United States)

Singh, Priyanka; Hashmi, Gulfam; Swain, Prafulla Kumar

2018-05-10

The worldwide rise in cesarean section (CS) births is an issue of concern. In India, the increase in institutional deliveries has been accompanied by an increase in cesarean section births. The aim of the study is to quantify the prevalence of cesarean section births in public and private health facilities, and also to determine the factors associated with cesarean section births. We analyzed data from the District Level Household Survey-4 (DLHS-4) combined individual-level dataset for 19 states/UTs of India, comprising 24,398 deliveries resulting in 22,111 live births for the year 2011. Percentages and chi-square statistics were computed for selected variables (socio-demographic, maternal, antenatal care and delivery related) by type of birth (CS vs. normal births). A multiple logistic regression model was used to identify the potential risk factors associated with CS births. Of the 22,111 live births analyzed, 49.2% were delivered in the public sector, 31.9% in the private sector and 18.9% were home deliveries. The prevalence of CS births was 13.7% (95% CI: 13.0-14.3%) and 37.9% (95% CI: 36.7-39.0%) in the public and private sectors, respectively. Higher odds of CS births were observed with delivery at a private health facility (OR 3.79; 95% CI 3.06-4.72), urban residence (OR 1.15; 95% CI 1.00-1.35), first delivery after 35 years of maternal age (OR 5.5; 95% CI 1.85-16.4), hypertension in pregnancy (OR 1.32; 95% CI 1.06-1.65) and breech presentation (OR 2.37; 95% CI 1.63-3.43). Our findings show that CS births are nearly three times more common in private than in public sector health facilities. The higher rates of CS births, especially in the private sector, not only increase the cost of care but may also pose unnecessary risks to women (when there are no indications for CS). The government of India needs to take measures to strengthen existing public health facilities as well as ensure that cesarean sections are performed based upon medical indications in both public and private
Logarithmic Transformations in Regression: Do You Transform Back Correctly?

Science.gov (United States)

Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.

2009-01-01

The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
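The abstract is truncated here, so the following Python sketch does not reproduce the authors' argument. It illustrates a standard result that bears on the question in the title: if log Y is normal with mean mu and standard deviation sigma, then exp(mu) is the median of Y, while the mean of Y is exp(mu + sigma^2/2). The function names and simulation settings are assumptions.

```python
import math
import random


def naive_back_transform(mu_log):
    """exp of the mean on the log scale; this estimates the median of Y."""
    return math.exp(mu_log)


def lognormal_mean(mu_log, sigma):
    """Mean of Y when log Y ~ Normal(mu_log, sigma^2)."""
    return math.exp(mu_log + 0.5 * sigma ** 2)


# Simulated check with a fixed seed (hypothetical parameter values).
rng = random.Random(0)
mu_log, sigma = 1.0, 1.0
sample = [rng.lognormvariate(mu_log, sigma) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)

naive_error = abs(sample_mean - naive_back_transform(mu_log))
corrected_error = abs(sample_mean - lognormal_mean(mu_log, sigma))
```

In this simulation the naive back-transform falls well below the sample mean, whereas the corrected value sits close to it. The correction shown assumes normal errors on the log scale.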
Comparative study of airborne Alternaria conidia levels in two cities in Castilla-La Mancha (central Spain), and correlations with weather-related variables.

Science.gov (United States)

Sabariego, Silvia; Bouso, Veronica; Pérez-Badia, Rosa

2012-01-01

Alternaria conidia are among the airborne biological particles known to trigger allergic respiratory diseases. This paper reports on a study of seasonal variations in airborne Alternaria conidia concentrations in 2 cities in the central Spanish region of Castilla-La Mancha, Albacete and Toledo. The influence of weather-related variables on airborne conidia levels and distribution was also analysed. Sampling was carried out from 2008 to 2010 using a Hirst sampler, following the methodology established by the Spanish Aerobiology Network. Annual airborne Alternaria conidia counts were higher in Toledo (annual mean 3,936 conidia) than in Albacete (annual mean 2,268 conidia). Conidia were detected in the air throughout the year, but levels peaked between May and September. Considerable year-on-year variations were recorded both in total annual counts and in seasonal distribution. A significant positive correlation was generally found between mean daily Alternaria counts and both temperature and hours of sunlight, while a significant negative correlation was recorded for relative humidity, daily and cumulative rainfall, and wind speed. Regression models indicated that between 31% and 52% of the variation in airborne Alternaria conidia concentrations could be explained by weather-related variables.
[Application of detecting and taking overdispersion into account in Poisson regression model].

Science.gov (United States)

Bouche, G; Lepage, B; Migeot, V; Ingrand, P

2009-08-01

Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of the variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact with an illustration based on the data of a study investigating the relationship between use of the Internet to seek health information and number of primary care consultations. Three methods (overdispersed Poisson, a robust estimator, and negative binomial regression) were applied to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested for overdispersion in the Poisson regression model using the ratio of the sum of squared Pearson residuals (the Pearson chi-square) to the number of degrees of freedom (chi(2)/df). We then fitted the three models and compared their parameter estimates with those given by the Poisson regression model. The variance of the number of primary care consultations (Var[Y]=21.03) was greater than the mean (E[Y]=5.93) and the chi(2)/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of estimates from two variables (using the Internet to seek health information and single parent family) would have changed according to the model retained, with significance levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (use of a robust estimator) and 0.45 and 0.13 (negative binomial), respectively. Different methods exist to solve the problem of underestimating variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems to be particularly accurate because of its theoretical distribution; in addition, this regression is easy to perform with ordinary statistical software packages.
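The chi(2)/df check mentioned in this abstract can be sketched in Python as below. The function name and the toy counts are assumptions, and the fitted means come from an intercept-only model so that the example stays self-contained. In practice the means would come from the fitted Poisson regression.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values well above 1 suggest overdispersion relative to the Poisson model.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Toy counts (hypothetical) with an intercept-only fit: every mean is y-bar.
counts = [0, 1, 2, 3, 14]
mean_count = sum(counts) / len(counts)
ratio = pearson_dispersion(counts, [mean_count] * len(counts), n_params=1)

# Under the overdispersed (quasi-) Poisson approach, standard errors are
# inflated by the square root of the ratio; for the 3.26 reported in the
# abstract that factor is about 1.8.
se_inflation = math.sqrt(3.26)
```

For an intercept-only model the ratio reduces to the sample variance divided by the sample mean, which is why comparing Var[Y] with E[Y] is a quick first check.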
Regression Levels of Selected Affective Factors on Science Achievement: A Structural Equation Model with TIMSS 2011 Data

Science.gov (United States)

Akilli, Mustafa

2015-01-01

The aim of this study is to demonstrate the science success regression levels of chosen emotional features of 8th grade students using Structural Equation Model. The study was conducted by the analysis of students' questionnaires and science success in TIMSS 2011 data using SEM. Initially, the factors that are thought to have an effect on science…
COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

Directory of Open Access Journals (Sweden)

ELİF BULUT

2013-06-01

Partial Least Squares Regression (PLSR) is a multivariate statistical method that combines partial least squares and multiple linear regression analysis. Explanatory variables, X, that exhibit multicollinearity are reduced to components that explain a large share of the covariance between the explanatory and response variables. These components are few in number and do not suffer from multicollinearity. Multiple linear regression analysis is then applied to those components to model the response variable Y. There are various PLSR algorithms. In this study, the NIPALS and PLS-Kernel algorithms are studied and illustrated on a real data set.
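A compact Python sketch of the NIPALS scheme for the single-response case (often called PLS1), written with plain lists. The function and variable names and the toy data are assumptions, and the paper's own implementation is not shown in the abstract. X and y are assumed to be column-centered beforehand.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def column(X, j):
    return [row[j] for row in X]


def pls1_nipals(X, y, n_components):
    """Extract PLS components one at a time, deflating X and y after each."""
    X = [list(row) for row in X]
    y = list(y)
    n_cols = len(X[0])
    T, W, P, C = [], [], [], []
    for _ in range(n_components):
        # Weights: direction in X with maximal covariance with y.
        w = [dot(column(X, j), y) for j in range(n_cols)]
        norm = dot(w, w) ** 0.5
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in X]                          # scores
        tt = dot(t, t)
        p = [dot(column(X, j), t) / tt for j in range(n_cols)]  # X loadings
        c = dot(y, t) / tt                                      # y loading
        # Deflation: remove what this component explains.
        X = [[xij - ti * pj for xij, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - c * ti for yi, ti in zip(y, t)]
        T.append(t); W.append(w); P.append(p); C.append(c)
    return T, W, P, C, y  # y is now the residual response


# Toy centered data (hypothetical).
X_demo = [[-1.5, -1.0], [-0.5, 1.0], [0.5, -1.0], [1.5, 1.0]]
y_demo = [-2.0, 0.5, -0.5, 2.0]
scores, weights, loadings, y_loadings, residual = pls1_nipals(X_demo, y_demo, 2)
```

The extracted score vectors are mutually orthogonal, which is what removes the multicollinearity problem before the final regression step.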
Differentiating regressed melanoma from regressed lichenoid keratosis.

Science.gov (United States)

Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

2017-04-01

Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

Science.gov (United States)

Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

2011-01-01

Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
JT-60 configuration parameters for feedback control determined by regression analysis

Energy Technology Data Exchange (ETDEWEB)

Matsukawa, Makoto; Hosogane, Nobuyuki; Ninomiya, Hiromasa (Japan Atomic Energy Research Inst., Naka, Ibaraki (Japan). Naka Fusion Research Establishment)

1991-12-01

The stepwise regression procedure was applied to obtain measurement formulas for equilibrium parameters used in the feedback control of JT-60. This procedure automatically selects variables necessary for the measurements, and selects a set of variables which are not likely to be picked up by physical considerations. Regression equations with stable and small multicollinearity were obtained and it was experimentally confirmed that the measurement formulas obtained through this procedure were accurate enough to be applicable to the feedback control of plasma configurations in JT-60. (author).
Development and application of the variable focus laser leveling gage

International Nuclear Information System (INIS)

Gong Kun; Ma Jinglong

2005-01-01

A variable-focus laser leveling gauge was developed, and its performance and structure are described. Several alignments and tests of a KrF laser angle multi-path optical system were carried out with it. Its application to other optical equipment is also discussed. (author)
Using Logistic Regression to Predict the Probability of Debris Flows in Areas Burned by Wildfires, Southern California, 2003-2006

Science.gov (United States)

Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.

2008-01-01

Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of
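A small Python sketch of how a fitted multivariate logistic regression model turns basin measurements into a debris-flow probability. The predictor names, intercept and coefficients below are hypothetical placeholders chosen for illustration; they are not the values of any of the five models in the report.

```python
import math


def logistic(x):
    """Numerically stable logistic function, e^x / (1 + e^x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def debris_flow_probability(intercept, coefficients, basin):
    """Probability from a linear predictor built from basin variables."""
    x = intercept + sum(b * basin[name] for name, b in coefficients.items())
    return logistic(x)


# Hypothetical model: NOT the report's coefficients.
HYPOTHETICAL_INTERCEPT = -3.0
HYPOTHETICAL_COEFS = {
    "pct_high_burn_severity": 0.03,   # percent of basin burned at high severity
    "peak_3h_rain_mm_per_h": 0.10,    # 3-hour peak rainfall intensity
}

lightly_burned = {"pct_high_burn_severity": 0.0, "peak_3h_rain_mm_per_h": 0.0}
severely_burned = {"pct_high_burn_severity": 80.0, "peak_3h_rain_mm_per_h": 20.0}

p_low = debris_flow_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, lightly_burned)
p_high = debris_flow_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, severely_burned)
```

Evaluating such a function for every basin in a geographic information system yields the kind of probability map the abstract describes.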
Discriminative Elastic-Net Regularized Linear Regression.

Science.gov (United States)

Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen

2017-03-01

In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to make the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html.
Predicting company growth using logistic regression and neural networks

Directory of Open Access Journals (Sweden)

Marijana Zekić-Sušac

2016-12-01

The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used, with the relevant industry sector, financial ratios, income, and assets in the input space, and a binomial dependent variable indicating whether a company was high-growth, defined as annualized growth in assets of more than 20% a year over a three-year period. Because of the large number of input variables, factor analysis was performed in the pre-processing stage in order to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required the application of two data mining methods: logistic regression as a parametric method and neural networks as a non-parametric method. The methods were tested on models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce significantly higher classification accuracy in the model that incorporates all available variables. The paper further discusses the advantages and disadvantages of both approaches, i.e. logistic regression and neural networks, in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers, as it provides support for recognizing companies with growth potential, especially during times of economic downturn.
Sea Level Trend and Variability in the Straits of Singapore and Malacca

Science.gov (United States)

Luu, Q.; Tkalich, P.

2013-12-01

The Straits of Singapore and Malacca (SSM) connect the Andaman Sea located northeast of the Indian Ocean to the South China Sea, the largest marginal sea situated in the tropical Pacific Ocean. Consequently, sea level in the SSM is assumed to be governed by various regional phenomena associated with the adjacent parts of the Indian and Pacific Oceans. At the annual scale, sea level variability is dominated by the Asian monsoon. Interannual sea level signals are modulated by the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD). In the long term, regional sea level is driven by global climate change. However, the relative impacts of these multi-scale phenomena on regional sea level in the SSM are yet to be quantified. In the present study, publicly available tide gauge records and satellite altimetry data are used to derive the long-term sea level trend and variability in the SSM. We used data from research-quality stations, including four located in the Singapore Strait (Tanjong Pagar, Raffles Lighthouse, Sultan Shoal and Sembawang) and seven situated in the Malacca Strait (Kelang, Keling, Kukup, Langkawi, Lumut, Penang and Ko Taphao Noi), each with 25-39 years of data up to the year 2011. Harmonic analysis is performed to filter out astronomic tides from the tide gauge records when necessary, and missing data are reconstructed using identified relationships between sea level and the governing phenomena. The obtained sea level anomalies (SLAs) and reconstructed mean sea level are then validated against satellite altimetry data from AVISO. At the multi-decadal scale, annual measured sea level in the SSM varies with global mean sea level, rising over the period 1984-2009 at a rate of 1.8-2.3 mm/year in the Singapore Strait and 1.1-2.8 mm/year in the Malacca Strait. Interannual regional sea level drops are associated with El Niño events, while the rises are correlated with La Niña episodes; both variations are in the range of ±5 cm with correlation coefficient
Caudal Regression and Encephalocele: Rare Manifestations of Expanded Goldenhar Complex

Directory of Open Access Journals (Sweden)

Gabriella D’Angelo

2017-01-01

Oculoauriculovertebral spectrum, or Goldenhar Syndrome, is a condition characterized by variable degrees of uni- or bilateral involvement of craniofacial structures, ocular anomalies, and vertebral defects. Its expressivity is variable; therefore, the term “expanded Goldenhar complex” has been coined. The Goldenhar Syndrome usually involves anomalies in craniofacial structures, but it is known that nervous system anomalies, including encephalocele or caudal regression, may, rarely, occur in this condition. We report two rare cases of infants affected by Goldenhar Syndrome, associated with neural tube defects, specifically caudal regression syndrome and nasal encephaloceles, to underline the extremely complex and heterogeneous clinical features of this oculoauriculovertebral spectrum. These additional particular cases could increase the number of new variable spectrums to be included in the “expanded Goldenhar complex.”
LINEAR REGRESSION MODEL ESTIMATION FOR RIGHT CENSORED DATA

Directory of Open Access Journals (Sweden)

Ersin Yılmaz

2016-05-01

In this study, we first define right-censored data. Briefly, right-censoring occurs when values above a certain threshold are not observed exactly, which may be related to the limits of the measuring device. We then use a response variable obtained from right-censored explanatory variables and estimate the linear regression model. To account for the censoring, Kaplan-Meier weights are used in estimating the model; with these weights the regression estimator is consistent and unbiased. A semiparametric regression method is also available for censored data and likewise gives useful results. This study may also be useful for health studies, since censored data commonly arise in medical research.
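The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes the common setting in which the response itself is right-censored and uses Kaplan-Meier weights in the form usually attributed to Stute; the abstract does not give the exact estimator, so the function names, the simulated data and the censoring mechanism below are all hypothetical.

```python
import numpy as np

def kaplan_meier_weights(time, observed):
    """Kaplan-Meier (Stute-type) weights, returned in the original row order.

    time     : observed value min(T, C) for each case
    observed : 1 if the value is exact, 0 if it is right-censored
    """
    time = np.asarray(time, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n = len(time)
    # Sort by value; among ties, exact observations come before censored ones.
    order = np.lexsort((1 - observed, time))
    d = observed[order]
    ranks = np.arange(1, n + 1)
    # Factor ((n - j) / (n - j + 1)) ** delta_(j) for each sorted position j.
    factors = ((n - ranks) / (n - ranks + 1.0)) ** d
    # Product over j < i (the empty product is 1 for the first position).
    cumulative = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w_sorted = d / (n - ranks + 1.0) * cumulative
    weights = np.empty(n)
    weights[order] = w_sorted
    return weights

def weighted_least_squares(x, y, weights):
    """Weighted least squares fit of y on x with an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef  # [intercept, slope]

# Hypothetical simulated data: true line y = 1 + 2x, censored from the right.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
y_true = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=500)
censor = rng.uniform(1.5, 5.0, size=500)
y_obs = np.minimum(y_true, censor)
delta = (y_true <= censor).astype(int)

coef = weighted_least_squares(x, y_obs, kaplan_meier_weights(y_obs, delta))
print("intercept, slope:", coef)
```

Censored cases receive zero weight and their mass is redistributed to larger exact observations; with no censoring at all, every weight equals 1/n and the fit reduces to ordinary least squares.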
The Collinearity Free and Bias Reduced Regression Estimation Project: The Theory of Normalization Ridge Regression. Report No. 2.

Science.gov (United States)

Bulcock, J. W.; And Others

Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…
Gender effects in gaming research: a case for regression residuals?

Science.gov (United States)

Pfister, Roland

2011-10-01

Numerous recent studies have examined the impact of video gaming on various dependent variables, including the players' affective reactions, positive as well as detrimental cognitive effects, and real-world aggression. These target variables are typically analyzed as a function of game characteristics and player attributes-especially gender. However, findings on the uneven distribution of gaming experience between males and females, on the one hand, and the effect of gaming experience on several target variables, on the other hand, point at a possible confound when gaming experiments are analyzed with a standard analysis of variance. This study uses simulated data to exemplify analysis of regression residuals as a potentially beneficial data analysis strategy for such datasets. As the actual impact of gaming experience on each of the various dependent variables differs, the ultimate benefits of analysis of regression residuals entirely depend on the research question, but it offers a powerful statistical approach to video game research whenever gaming experience is a confounding factor.
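A minimal simulated sketch of the residual-based strategy described above: the outcome is first regressed on gaming experience, and the gender comparison is then made on the residuals. The effect sizes, the 0/1 gender coding and the variable names are hypothetical and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group = 1000
gender = np.repeat([0, 1], n_per_group)        # hypothetical coding: 0 = female, 1 = male

# Gaming experience is unevenly distributed across the two groups ...
experience = 2.0 + 3.0 * gender + rng.normal(size=gender.size)
# ... and the outcome depends on experience only (no direct gender effect).
outcome = 1.5 * experience + rng.normal(size=gender.size)

def group_difference(values, group):
    """Mean of group 1 minus mean of group 0."""
    return values[group == 1].mean() - values[group == 0].mean()

# Step 1: regress the outcome on gaming experience.
slope, intercept = np.polyfit(experience, outcome, deg=1)
residuals = outcome - (intercept + slope * experience)

# Step 2: compare the groups on the raw outcome and on the residuals.
raw_gap = group_difference(outcome, gender)
residual_gap = group_difference(residuals, gender)
print(f"raw gap: {raw_gap:.2f}, residual gap: {residual_gap:.2f}")
```

In this simulation the raw group difference is large although gender has no direct effect, while the residual difference is close to zero, which is the confound the abstract points to.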
On macroeconomic values investigation using fuzzy linear regression analysis

Directory of Open Access Journals (Sweden)

Richard Pospíšil

2017-06-01

The theoretical background for the abstract formalization of vague phenomena in complex systems is fuzzy set theory. In the paper, vague data are defined as specialized fuzzy sets (fuzzy numbers), and a fuzzy linear regression model is described as a fuzzy function with fuzzy numbers as vague parameters. A genetic algorithm is used to identify the fuzzy coefficients of the model. The linear approximation of the vague function, together with its possibility area, is expressed analytically and graphically. A suitable application is carried out in time series fuzzy regression analysis. The time trend and seasonal cycles, including their possibility areas, are calculated and expressed. The examples are taken from the field of economics, namely the development over time of unemployment, agricultural production and construction between 2009 and 2011 in the Czech Republic. The results are shown in the form of fuzzy regression models of the time series variables. For the period 2009-2011, the assumptions of the analysis about the seasonal behaviour of the variables and the relationships between them were confirmed; in 2010, the system behaved in a fuzzier way and the relationships between the variables were vaguer, which has many causes, from the different elasticity of demand, through state interventions, to globalization and transnational impacts.

Sea level trend and variability in the Singapore Strait

Digital Repository Service at National Institute of Oceanography (India)

Tkalich, P.; Vethamony, P.; Luu, Q.-H.; Babu, M.T.

www.ocean-sci.net/9/293/2013/ doi:10.5194/os-9-293-2013 © Author(s) 2013. CC Attribution 3.0 License. ... Sci., 9, 293–300, 2013. P. Tkalich et al.: Sea level in Singapore Strait. ... likely to be the cause for modulating the inter-annual sea level variability associated with ENSO. On the Sunda Shelf and particularly in SS, our...
Robust Methods for Moderation Analysis with a Two-Level Regression Model.

Science.gov (United States)

Yang, Miao; Yuan, Ke-Hai

2016-01-01

Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
TNF-α levels in cancer patients relate to social variables

Science.gov (United States)

Marucha, Phillip T.; Crespin, Timothy R.; Shelby, Rebecca A.; Andersen, Barbara L.

2008-01-01

Tumor necrosis factor-α (TNF-α) is an important cytokine associated with tumor regression and increased survival time for cancer patients. Research evidence relates immune factors (e.g., natural killer (NK) cell counts, NK cell lysis, lymphocyte profile, and lymphocyte proliferation) to the frequency and quality of social relations among cancer patients. We hypothesized that disruptions in social relations would be associated with lower TNF-α responses and, conversely, that reports of positive changes in social relations would correlate with stronger responses. A prospective design measured changes in social activity and relationship satisfaction with a partner in 44 breast cancer patients at the time of cancer diagnosis and initial surgery, and 12 months later. Results indicated that patients reporting increased social activities or satisfaction exhibited stronger stimulated TNF-α responses. This is the first study to link changes in patient social relations with a cancer-relevant immune variable. PMID:15890493
Depth-weighted robust multivariate regression with application to sparse data

KAUST Repository

Dutta, Subhajit; Genton, Marc G.

2017-01-01

A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.
Depth-weighted robust multivariate regression with application to sparse data

KAUST Repository

Dutta, Subhajit

2017-04-05

A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.
Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable.

Science.gov (United States)

Austin, Peter C; Steyerberg, Ewout W

2012-06-20

When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
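For the equal-variance binormal case, the relationship described above can be written as c = Φ(σβ/√2), where σ is the common standard deviation and β the log-odds ratio per unit of the explanatory variable; this follows from c = Φ((μ1 − μ0)/(σ√2)) together with β = (μ1 − μ0)/σ². The sketch below checks that expression against a simulated empirical c-statistic. The parameter values and function names are hypothetical and are not taken from the paper.

```python
import math
import numpy as np

def predicted_c_statistic(sigma, log_odds_ratio):
    """c = Phi(sigma * beta / sqrt(2)) under equal-variance binormality."""
    z = sigma * log_odds_ratio / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

def empirical_c_statistic(x_cases, x_controls):
    """Mann-Whitney estimate of P(X_case > X_control); assumes no ties."""
    combined = np.concatenate([x_cases, x_controls])
    ranks = combined.argsort().argsort() + 1          # ranks 1..N
    n1, n0 = len(x_cases), len(x_controls)
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    return u / (n1 * n0)

rng = np.random.default_rng(1)
sigma, mu0, mu1 = 2.0, 0.0, 1.0                       # hypothetical values
beta = (mu1 - mu0) / sigma**2                         # implied log-odds ratio
cases = rng.normal(mu1, sigma, size=20000)
controls = rng.normal(mu0, sigma, size=20000)

c_pred = predicted_c_statistic(sigma, beta)
c_emp = empirical_c_statistic(cases, controls)
print(f"predicted c = {c_pred:.3f}, empirical c = {c_emp:.3f}")
```

Doubling σ while holding β fixed raises the predicted c-statistic, which illustrates why the odds ratio alone does not determine discriminative ability.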
Variable selection in multivariate calibration based on clustering of variable concept.

Science.gov (United States)

Farrokhnia, Maryam; Karimi, Sadegh

2016-01-01

Recently we proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). Here the same idea is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into different clusters by clustering algorithms. The spectral data of each cluster are then subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part of the data from the redundant part, and that a stable model can then be reached on the basis of the informative clusters. Copyright © 2015 Elsevier B.V. All rights reserved.
Adaptive metric kernel regression

DEFF Research Database (Denmark)

Goutte, Cyril; Larsen, Jan

2000-01-01

Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate... regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...
Explaining the heterogeneous scrapie surveillance figures across Europe: a meta-regression approach

Directory of Open Access Journals (Sweden)

Ru Giuseppe

2007-06-01

Background: Two annual surveys, the abattoir and the fallen stock, monitor the presence of scrapie across Europe. A simple comparison between the prevalence estimates in different countries reveals that, in 2003, the abattoir survey appears to detect more scrapie in some countries. This is contrary to evidence suggesting the greater ability of the fallen stock survey to detect the disease. We applied meta-analysis techniques to study this apparent heterogeneity in the behaviour of the surveys across Europe. Furthermore, we conducted a meta-regression analysis to assess the effect of country-specific characteristics on the variability. We have chosen the odds ratios between the two surveys to inform the underlying relationship between them and to allow comparisons between the countries under the meta-regression framework. Baseline risks, those of the slaughtered populations across Europe, and country-specific covariates, available from the European Commission Report, were inputted in the model to explain the heterogeneity. Results: Our results show the presence of significant heterogeneity in the odds ratios between countries and no reduction in the variability after adjustment for the different risks in the baseline populations. Three countries contributed the most to the overall heterogeneity: Germany, Ireland and The Netherlands. The inclusion of country-specific covariates did not, in general, reduce the variability except for one variable: the proportion of the total adult sheep population sampled as fallen stock by each country. A large residual heterogeneity remained in the model, indicating the presence of substantial effect variability between countries. Conclusion: The meta-analysis approach was useful to assess the level of heterogeneity in the implementation of the surveys and to explore the reasons for the variation between countries.
Regression away from the mean: Theory and examples.

Science.gov (United States)

Schwarz, Wolf; Reike, Dennis

2018-02-01

Using a standard repeated measures model with arbitrary true score distribution and normal error variables, we present some fundamental closed-form results which explicitly indicate the conditions under which regression effects towards (RTM) and away from the mean are expected. Specifically, we show that for skewed and bimodal distributions many or even most cases will show a regression effect that is in expectation away from the mean, or that is not just towards but actually beyond the mean. We illustrate our results in quantitative detail with typical examples from experimental and biometric applications, which exhibit a clear regression away from the mean ('egression from the mean') signature. We aim not to repeal cautionary advice against potential RTM effects, but to present a balanced view of regression effects, based on a clear identification of the conditions governing the form that regression effects take in repeated measures designs. © 2017 The British Psychological Society.
ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL

Directory of Open Access Journals (Sweden)

Constantin Anghelache

2011-11-01

The information obtained through the use of simple linear regression is not always enough to characterize the evolution of an economic phenomenon and, furthermore, to identify its possible future evolution. To remedy these drawbacks, the specialized literature includes multiple regression models, in which the evolution of the dependent variable is defined in terms of two or more factorial variables.
A regression approach for zircaloy-2 in-reactor creep constitutive equations

International Nuclear Information System (INIS)

Yung Liu, Y.; Bement, A.L.

1977-01-01

In this paper the methodology of multiple regression, as applied to zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation, is illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. From the points of view of data analysis and model development, both the assumption of independence and prior commitment to specific model forms are unacceptable. One would desire means which can not only estimate the required parameters directly from data but also provide a basis for model selection, viz., one model against others. Basic understanding of the physics of deformation is important in choosing the forms of the starting physical model equations, but the justification must rely on their ability to correlate the overall data. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) when more than one variable is involved, there is no need to assume that each variable affects the response independently. No separate normalizations are required either, and the parameter estimates are obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets; (2) regression statistics such as the R²- and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained; (3) special regression techniques such as step-wise, ridge, and robust regressions, residual plots, etc., provide diagnostic tools for model selection.
Area-Level and Individual-Level Factors for Teenage Motherhood: A Multilevel Analysis in Japan.

Science.gov (United States)

Baba, Sachiko; Iso, Hiroyasu; Fujiwara, Takeo

2016-01-01

Teenage motherhood is strongly associated with a range of disadvantages for both the mother and the child. No epidemiological studies have examined related factors for teenage motherhood at both area and individual levels among Japanese women. Therefore, we performed a multilevel analysis of nationwide data in Japan to explore the association of area- and individual-level factors with teenage motherhood. The study population comprised 21,177 mothers living in 47 prefectures who had their first, singleton baby between 10 and 17 January or between 10 and 17 July, 2001. Information on the prefecture in which the mothers resided was linked to prefecture-level variables. Primary outcomes were area-level characteristics (single-mother households, three-generation households, college enrollment, abortions, juvenile crime, and per capita income) and individual-level characteristics, and divided into tertiles or quintiles based on their variable distributions. Multilevel logistic regression analysis was then performed. There were 440 teenage mothers (2.1%) in this study. In addition to individual low level of education [adjusted odds ratio (OR), 7.40; 95% confidence interval (CI), 5.59-9.78], low income [4.23 (2.95-6.08)], and smoking [1.65 (1.31-2.07)], high proportions of single-mother households [1.72 (1.05-2.80)] and three-generation households [1.81 (1.17-2.78)], and per capita income [2.19 (1.06-3.81)] at an area level were positively associated, and high level of college enrollment [0.46 (0.25-0.83)] and lower crime rate [0.62 (0.40-0.98)] at an area level were inversely associated with teenage motherhood compared with the corresponding women living in prefectures with the lowest levels of these variables. Our findings suggest that encouraging the completion of higher education and reducing the number of single-mother households at an area level may be important public health strategies to reduce teenage motherhood.
Regression analysis using dependent Polya trees.

Science.gov (United States)

Schörgendorfer, Angela; Branscum, Adam J

2013-11-30

Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Puffed-up but shaky selves: State self-esteem level and variability in narcissists.

Science.gov (United States)

Geukes, Katharina; Nestler, Steffen; Hutteman, Roos; Dufner, Michael; Küfner, Albrecht C P; Egloff, Boris; Denissen, Jaap J A; Back, Mitja D

2017-05-01

Different theoretical conceptualizations characterize grandiose narcissists by high, yet fragile self-esteem. Empirical evidence, however, has been inconsistent, particularly regarding the relationship between narcissism and self-esteem fragility (i.e., self-esteem variability). Here, we aim at unraveling this inconsistency by disentangling the effects of two theoretically distinct facets of narcissism (i.e., admiration and rivalry) on the two aspects of state self-esteem (i.e., level and variability). We report on data from a laboratory-based and two field-based studies (total N = 596) in realistic social contexts, capturing momentary, daily, and weekly fluctuations of state self-esteem. To estimate unbiased effects of narcissism on the level and variability of self-esteem within one model, we applied mixed-effects location scale models. Results of the three studies and their meta-analytical integration indicated that narcissism is positively linked to self-esteem level and variability. When distinguishing between admiration and rivalry, however, an important dissociation was identified: Admiration was related to high (and rather stable) levels of state self-esteem, whereas rivalry was related to (rather low and) fragile self-esteem. Analyses on underlying processes suggest that effects of rivalry on self-esteem variability are based on stronger decreases in self-esteem from one assessment to the next, particularly after a perceived lack of social inclusion. The revealed differentiated effects of admiration and rivalry explain why the analysis of narcissism as a unitary concept has led to the inconsistent past findings and provide deeper insights into the intrapersonal dynamics of grandiose narcissism governing state self-esteem. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
A Diagrammatic Exposition of Regression and Instrumental Variables for the Beginning Student

Science.gov (United States)

Foster, Gigi

2009-01-01

Some beginning students of statistics and econometrics have difficulty with traditional algebraic approaches to explaining regression and related techniques. For these students, a simple and intuitive diagrammatic introduction as advocated by Kennedy (2008) may prove a useful framework to support further study. The author presents a series of…
Regression and artificial neural network modeling for the prediction of gray leaf spot of maize.

Science.gov (United States)

Paul, P A; Munkvold, G P

2005-04-01

Regression and artificial neural network (ANN) modeling approaches were combined to develop models to predict the severity of gray leaf spot of maize, caused by Cercospora zeae-maydis. In all, 329 cases consisting of environmental, cultural, and location-specific variables were collected for field plots in Iowa between 1998 and 2002. Disease severity on the ear leaf at the dough to dent plant growth stage was used as the response variable. Correlation and regression analyses were performed to select potentially useful predictor variables. Predictors from the best 9 of 80 regression models were used to develop ANN models. A random sample of 60% of the cases was used to train the networks, and 20% each for testing and validation. Model performance was evaluated based on coefficient of determination (R²) and mean square error (MSE) for the validation data set. The best models had R² ranging from 0.70 to 0.75 and MSE ranging from 174.7 to 202.8. The most useful predictor variables were hours of daily temperatures between 22 and 30 degrees C (85.50 to 230.50 h) and hours of nightly relative humidity ≥90% (122 to 330 h) for the period between growth stages V4 and V12, mean nightly temperature (65.26 to 76.56 degrees C) for the period between growth stages V12 and R2, longitude (90.08 to 95.14 degrees W), maize residue on the soil surface (0 to 100%), planting date (in day of the year; 112 to 182), and gray leaf spot resistance rating (2 to 7; based on a 1-to-9 scale, where 1 = most susceptible to 9 = most resistant).
Optimal Inference for Instrumental Variables Regression with non-Gaussian Errors

DEFF Research Database (Denmark)

Cattaneo, Matias D.; Crump, Richard K.; Jansson, Michael

This paper is concerned with inference on the coefficient on the endogenous regressor in a linear instrumental variables model with a single endogenous regressor, nonrandom exogenous regressors and instruments, and i.i.d. errors whose distribution is unknown. It is shown that under mild smoothness...
Adaptive Metric Kernel Regression

DEFF Research Database (Denmark)

Goutte, Cyril; Larsen, Jan

1998-01-01

Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...
Applied Prevalence Ratio estimation with different Regression models: An example from a cross-national study on substance use research.

Science.gov (United States)

Espelt, Albert; Marí-Dell'Olmo, Marc; Penelo, Eva; Bosque-Prous, Marina

2016-06-14

To examine the differences between Prevalence Ratio (PR) and Odds Ratio (OR) in a cross-sectional study and to provide tools to calculate PR using two statistical packages widely used in substance use research (STATA and R). We used cross-sectional data from 41,263 participants of 16 European countries participating in the Survey on Health, Ageing and Retirement in Europe (SHARE). The dependent variable, hazardous drinking, was calculated using the Alcohol Use Disorders Identification Test - Consumption (AUDIT-C). The main independent variable was gender. Other variables used were: age, educational level and country of residence. The PR of hazardous drinking in men relative to women was estimated using the Mantel-Haenszel method, log-binomial regression models and Poisson regression models with robust variance. These estimates were compared to the OR calculated using logistic regression models. Prevalence of hazardous drinkers varied among countries. Generally, men have a higher prevalence of hazardous drinking than women [PR=1.43 (1.38-1.47)]. The estimated PR was identical regardless of the method and the statistical package used. However, the OR overestimated the PR, depending on the prevalence of hazardous drinking in the country. In cross-sectional studies, where comparisons between countries with differences in the prevalence of the disease or condition are made, it is advisable to use PR instead of OR.
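Illustrative note: the way the OR drifts away from the PR as prevalence rises can be seen from a plain 2x2 table. The sketch below is a minimal example under stated assumptions; the counts are hypothetical and are not taken from SHARE, and the two helper functions are illustrative names rather than anything from the paper's STATA or R tools.

```python
def prevalence_ratio(a, b, c, d):
    """PR for a 2x2 table: a/b = exposed with/without the outcome,
    c/d = unexposed with/without the outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR for the same 2x2 table layout."""
    return (a * d) / (b * c)


# Hypothetical counts (not from the study): same PR, different prevalence.
low_prevalence = (30, 970, 20, 980)     # 3% vs 2% prevalence
high_prevalence = (600, 400, 400, 600)  # 60% vs 40% prevalence

for label, table in (("low", low_prevalence), ("high", high_prevalence)):
    pr = prevalence_ratio(*table)
    orr = odds_ratio(*table)
    print(f"{label} prevalence: PR = {pr:.2f}, OR = {orr:.2f}")
```

With rare outcomes the two measures nearly coincide; with common outcomes the OR is noticeably further from 1 than the PR, which is the pattern the abstract reports across countries.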

Application of range-test in multiple linear regression analysis in ...

African Journals Online (AJOL)

Application of range-test in multiple linear regression analysis in the presence of outliers is studied in this paper. First, the plot of the explanatory variables (i.e. Administration, Social/Commercial, Economic services and Transfer) on the dependent variable (i.e. GDP) was done to identify the statistical trend over the years.
Integrating High Levels of Variable Renewable Energy into Electric Power Systems

Energy Technology Data Exchange (ETDEWEB)

Kroposki, Benjamin D. [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

2017-08-01

As more variable renewable energy is integrated into electric power systems, there are a range of challenges and solutions to accommodating very high penetration levels. This presentation highlights some of the recent research in this area.
Measuring the surgical 'learning curve': methods, variables and competency.

Science.gov (United States)

Khan, Nuzhath; Abboudi, Hamid; Khan, Mohammed Shamim; Dasgupta, Prokar; Ahmed, Kamran

2014-03-01

To describe how learning curves are measured and what procedural variables are used to establish a 'learning curve' (LC). To assess whether LCs are a valuable measure of competency. A review of the surgical literature pertaining to LCs was conducted using the Medline and OVID databases. Variables should be fully defined and when possible, patient-specific variables should be used. Trainee's prior experience and level of supervision should be quantified; the case mix and complexity should ideally be constant. Logistic regression may be used to control for confounding variables. Ideally, a learning plateau should reach a predefined/expert-derived competency level, which should be fully defined. When the group splitting method is used, smaller cohorts should be used in order to narrow the range of the LC. Simulation technology and competence-based objective assessments may be used in training and assessment in LC studies. Measuring the surgical LC has potential benefits for patient safety and surgical education. However, standardisation in the methods and variables used to measure LCs is required. Confounding variables, such as participant's prior experience, case mix, difficulty of procedures and level of supervision, should be controlled. Competency and expert performance should be fully defined. © 2013 The Authors. BJU International © 2013 BJU International.
Effects of soybean resistance on variability in life history traits of the higher trophic level parasitoid Meteorus pulchricornis (Hymenoptera: Braconidae).

Science.gov (United States)

Li, X; Li, B; Xing, G; Meng, L

2017-02-01

To extrapolate the influence of plant cultivars varying in resistance levels to hosts on parasitoid life history traits, we estimated variation in parasitoid developmental and reproductive performances as a function of resistance in soybean cultivars, which were randomly chosen from a line of resistant genotypes. Our study showed that the parasitoid Meteorus pulchricornis varied widely in offspring survival and lifetime fecundity, but varied slightly in development time and adult body size, in response to the soybean cultivars that varied in resistance to the host Spodoptera litura. Furthermore, the variability in survival and lifetime fecundity was different between attacking the 2nd and the 4th instar host larvae, varying more in survival but less in lifetime fecundity when attacking the 4th than 2nd instar larvae. Our study provides further evidence supporting that plant resistance to herbivorous hosts have variable effects on different life history traits of higher trophic level parasitoids.
Possible Increase in Serum FABP4 Level Despite Adiposity Reduction by Canagliflozin, an SGLT2 Inhibitor.

Directory of Open Access Journals (Sweden)

Masato Furuhashi

Fatty acid-binding protein 4 (FABP4/A-FABP/aP2) is secreted from adipocytes in association with catecholamine-induced lipolysis, and elevated serum FABP4 level is associated with obesity, insulin resistance and atherosclerosis. Secreted FABP4 as a novel adipokine leads to insulin resistance via increased hepatic glucose production (HGP). Sodium-glucose cotransporter 2 (SGLT2) inhibitors decrease blood glucose level via increased urinary glucose excretion, though HGP is enhanced. Here we investigated whether canagliflozin, an SGLT2 inhibitor, modulates serum FABP4 level. Canagliflozin (100 mg/day) was administered to type 2 diabetic patients (n = 39) for 12 weeks. Serum FABP4 level was measured before and after treatment. At baseline, serum FABP4 level was correlated with adiposity, renal dysfunction and noradrenaline level. Treatment with canagliflozin significantly decreased adiposity and levels of fasting glucose and HbA1c but increased average serum FABP4 level by 10.3% (18.0 ± 1.0 vs. 19.8 ± 1.2 ng/ml, P = 0.008), though elevation of FABP4 level after treatment was observed in 26 (66.7%) out of 39 patients. Change in FABP4 level was positively correlated with change in levels of fasting glucose (r = 0.329, P = 0.044), HbA1c (r = 0.329, P = 0.044) and noradrenaline (r = 0.329, P = 0.041) but was not significantly correlated with change in adiposity or other variables. Canagliflozin paradoxically increases serum FABP4 level in some diabetic patients despite amelioration of glucose metabolism and adiposity reduction, possibly via induction of catecholamine-induced lipolysis in adipocytes. Increased FABP4 level by canagliflozin may undermine the improvement of glucose metabolism and might be a possible mechanism of increased HGP by inhibition of SGLT2. UMIN-CTR Clinical Trial UMIN000018151.
A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

Science.gov (United States)

Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

2017-01-01

Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.
The Evolution of Power System Planning with High Levels of Variable Renewable Generation

Energy Technology Data Exchange (ETDEWEB)

Katz, Jessica [National Renewable Energy Lab. (NREL), Golden, CO (United States); Milligan, Michael [National Renewable Energy Lab. (NREL), Golden, CO (United States)

2016-09-01

Greening the Grid provides technical assistance to energy system planners, regulators, and grid operators to overcome challenges associated with integrating variable renewable energy into the grid. This document, part of Greening the Grid, introduces the evolution of power system planning with high levels of variable renewable generation.
Seasonal Variability of Aragonite Saturation State in the North Pacific Ocean Predicted by Multiple Linear Regression

Science.gov (United States)

Kim, T. W.; Park, G. H.

2014-12-01

Seasonal variation of aragonite saturation state (Ωarag) in the North Pacific Ocean (NPO) was investigated, using multiple linear regression (MLR) models produced from the PACIFICA (Pacific Ocean interior carbon) dataset. Data within depth ranges of 50-1200 m were used to derive MLR models, and three parameters (potential temperature, nitrate, and apparent oxygen utilization (AOU)) were chosen as predictor variables because these parameters are associated with vertical mixing, DIC (dissolved inorganic carbon) removal and release, which all affect Ωarag in the water column directly or indirectly. The PACIFICA dataset was divided into 5° × 5° grids, and an MLR model was produced in each grid, giving a total of 145 independent MLR models over the NPO. Mean RMSE (root mean square error) and r² (coefficient of determination) of all derived MLR models were approximately 0.09 and 0.96, respectively. Then the obtained MLR coefficients for each of the predictor variables and an intercept were interpolated over the study area, thereby making it possible to allocate MLR coefficients to data-sparse ocean regions. Predictability from the interpolated coefficients was evaluated using Hawaiian time-series data, and as a result the mean residual between measured and predicted Ωarag values was approximately 0.08, which is less than the mean RMSE of our MLR models. The interpolated MLR coefficients were combined with seasonal climatology of World Ocean Atlas 2013 (1° × 1°) to produce seasonal Ωarag distributions over various depths. Large seasonal variability in Ωarag was manifested in the mid-latitude Western NPO (24-40°N, 130-180°E) and low-latitude Eastern NPO (0-12°N, 115-150°W). In the Western NPO, seasonal fluctuations of water column stratification appeared to be responsible for the seasonal variation in Ωarag (~ 0.5 at 50 m) because it closely followed temperature variations in a layer of 0-75 m. In contrast, remineralization of organic matter was the main cause for the seasonal
Identification of variables and their influence on the human resources planning in the territorial level

Energy Technology Data Exchange (ETDEWEB)

Martínez Vivar, R.; Sánchez Rodríguez, A.; Pérez Campdesuñer, R.; García Vidal, G.

2016-07-01

The purpose of this paper is to identify, experimentally and with empirical tools, the set of variables that influence human resources planning at the territorial level, together with their interrelationships. The methodology used to verify the existence of these variables consists of two phases. The first is a qualitative study of the variables that influence human resources planning, in which the variables raised explicitly or implicitly in the literature are assessed by analyzing the main contributions and limitations expressed by each of the authors consulted. The second is a confirmatory (quantitative) phase that tests the existence of the dimensions of human resources planning at the territorial level using multivariate statistics, combining expert analysis with factorial grouping techniques. Using empirical methods, the study identifies the variables that affect human resources planning at the territorial level and groups them into essential dimensions; it also describes a theoretical model, systemic and prospective in nature, that integrates the essential dimensions and relationships affecting human resources planning at the regional level. The literature shows two streams that address a wide range of approaches to human resources planning. The first takes the business as its object and the second takes management as its starting point; work at the territorial level is limited, which has contributed to the fragmented treatment of human resources planning, and of management in general, at this level. The originality of this paper lies in the creation and adaptation, on a scientific basis, of a theoretical model developed from the conceptual contribution of this process at the territorial level where the key variables that affect this
Comparison of Linear and Non-linear Regression Analysis to Determine Pulmonary Pressure in Hyperthyroidism.

Science.gov (United States)

Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan

2017-01-01

This study aimed at assessing the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension, the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method described by a first-degree equation, the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of the AIC criterion (criterion 3) and use of the F-test (criterion 4). From the H-group, 47% have pulmonary hypertension that is completely reversible when euthyroidism is obtained. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxin, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically, parabola-type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second
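Illustrative note: three of the four comparison criteria named in this abstract (determination coefficient, AIC, F-test for the extra quadratic term) can be sketched in a few lines. This is a minimal sketch, not the authors' procedure: the data are hypothetical, the function names are illustrative, and the AIC is the common Gaussian least-squares form n·ln(RSS/n) + 2k, which is one of several equivalent conventions for comparing models fitted to the same data.

```python
import math


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via normal equations."""
    m = degree + 1
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


def rss(x, y, coef):
    """Residual sum of squares of a fitted polynomial."""
    return sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))


def compare_models(x, y):
    """R^2 and AIC for linear and quadratic fits, plus the nested-model F statistic."""
    n = len(x)
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    out = {}
    for name, degree in (("linear", 1), ("quadratic", 2)):
        r = rss(x, y, polyfit(x, y, degree))
        out[name] = {"r2": 1 - r / tss,
                     "aic": n * math.log(r / n) + 2 * (degree + 1)}
        out[name]["rss"] = r
    r1, r2 = out["linear"]["rss"], out["quadratic"]["rss"]
    # One extra parameter in the quadratic model; n - 3 residual degrees of freedom.
    out["F"] = (r1 - r2) / (r2 / (n - 3))
    return out


# Hypothetical data with visible curvature (not from the study).
factor = [1, 2, 3, 4, 5, 6, 7, 8]
response = [24.8, 21.8, 20.6, 19.6, 20.7, 22.3, 24.4, 27.8]
result = compare_models(factor, response)
print(result)
```

A higher R², a lower AIC and a large F statistic all point toward the quadratic model for these made-up values; on real data the criteria need not agree, which is why the abstract also appeals to clinical judgment.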
On logistic regression analysis of dichotomized responses.

Science.gov (United States)

Lu, Kaifeng

2017-01-01

We study the properties of treatment effect estimate in terms of odds ratio at the study end point from logistic regression model adjusting for the baseline value when the underlying continuous repeated measurements follow a multivariate normal distribution. Compared with the analysis that does not adjust for the baseline value, the adjusted analysis produces a larger treatment effect as well as a larger standard error. However, the increase in standard error is more than offset by the increase in treatment effect so that the adjusted analysis is more powerful than the unadjusted analysis for detecting the treatment effect. On the other hand, the true adjusted odds ratio implied by the normal distribution of the underlying continuous variable is a function of the baseline value and hence is unlikely to be able to be adequately represented by a single value of adjusted odds ratio from the logistic regression model. In contrast, the risk difference function derived from the logistic regression model provides a reasonable approximation to the true risk difference function implied by the normal distribution of the underlying continuous variable over the range of the baseline distribution. We show that different metrics of treatment effect have similar statistical power when evaluated at the baseline mean. Copyright © 2016 John Wiley & Sons, Ltd.
Relationships of Cerebrospinal Fluid Monoamine Metabolite Levels With Clinical Variables in Major Depressive Disorder.

Science.gov (United States)

Yoon, Hyung Shin; Hattori, Kotaro; Ogawa, Shintaro; Sasayama, Daimei; Ota, Miho; Teraishi, Toshiya; Kunugi, Hiroshi

Many studies have investigated cerebrospinal fluid (CSF) monoamine metabolite levels in depressive disorders. However, their clinical significance is still unclear. We tried to determine whether CSF monoamine metabolite levels could be a state-dependent marker for major depressive disorder (MDD) based on analyses stratified by clinical variables in a relatively large sample. Subjects were 75 patients with MDD according to DSM-IV criteria and 87 healthy controls, matched for age, sex, and ethnicity (Japanese). They were recruited between May 2010 and November 2013. We measured homovanillic acid (HVA), 5-hydroxyindoleacetic acid (5-HIAA), and 3-methoxy-4-hydroxyphenylethyleneglycol (MHPG) in CSF samples by high-performance liquid chromatography. We analyzed the relationships of the metabolite levels with age, sex, diagnosis, psychotropic medication use, and depression severity. There was a weak positive correlation between age and 5-HIAA levels in controls (ρ = 0.26, P 12) were significantly lower than those in controls (P .1), were related to depression severity. CSF 5-HIAA and HVA levels could be state-dependent markers in MDD patients. Since 5-HIAA levels greatly decrease with the use of antidepressants, HVA levels might be more useful in the clinical setting. © Copyright 2017 Physicians Postgraduate Press, Inc.
Vocabulary of preschool children with typical language development and socioeducational variables.

Science.gov (United States)

Moretti, Thaís Cristina da Freiria; Kuroishi, Rita Cristina Sadako; Mandrá, Patrícia Pupin

2017-03-09

To investigate the correlation between age, socioeconomic status (SES), and performance on emissive and receptive vocabulary tests in children with typical language development. The study sample was composed of 60 preschool children of both genders, aged 3 years to 5 years 11 months, with typical language development divided into three groups: G I (mean age=3 years 6 months), G II (mean age=4 years 4 months) and G III (mean age=5 years 9 months). The ABFW Child Language Test - Vocabulary and the Peabody Picture Vocabulary Test (PPVT) for emissive and receptive language were applied to the preschoolers. The socioeconomic classification questionnaire of the Brazilian Association of Survey Companies (ABEP) was applied to the preschoolers' parents/legal guardians. Data were analyzed according to the criteria of the aforementioned instruments and were arranged in Excel spreadsheet for Windows XP®. A multiple linear regression model was used, adopting a statistical significance level of 5%, to analyze the correlation between age, SES, and performance on the receptive and emissive vocabulary tests. In the ABEP questionnaire, participants were classified mostly into social level C (63.3%), followed by levels B (26.6%) and D (10%). The preschoolers investigated presented emissive and receptive vocabulary adequate for the age groups. No statistically significant difference was found for the variables age and SES regarding emissive and receptive vocabulary. Higher test scores were observed with increased age and SES, for social levels "B" compared with "D" and for "C" with "D". The variables age and socioeconomic status influenced the performance on emissive and receptive vocabulary tests in the study group.
Exact rational expectations, cointegration, and reduced rank regression

DEFF Research Database (Denmark)

Johansen, Søren; Swensen, Anders Rygh

2008-01-01

We interpret the linear relations from exact rational expectations models as restrictions on the parameters of the statistical model called the cointegrated vector autoregressive model for non-stationary variables. We then show how reduced rank regression, Anderson (1951), plays an important role...
Clinical evaluation of a novel population-based regression analysis for detecting glaucomatous visual field progression.

Science.gov (United States)

Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C

2011-04-01

The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included into this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF
The process and utility of classification and regression tree methodology in nursing research.

Science.gov (United States)

Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

2014-06-01

This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
Regression Equations for Birth Weight Estimation using ...

African Journals Online (AJOL)

In this study, Birth Weight has been estimated from anthropometric measurements of hand and foot. Linear regression equations were formed from each of the measured variables. These simple equations can be used to estimate Birth Weight of new born babies, in order to identify those with low birth weight and referred to ...
Why is the Groundwater Level Rising? A Case Study Using HARTT to Simulate Groundwater Level Dynamic.

Science.gov (United States)

Yihdego, Yohannes; Danis, Cara; Paffard, Andrew

2017-12-01

Groundwater from a shallow unconfined aquifer at a site in coastal New South Wales has been causing recent water logging issues. A trend of rising groundwater level has been anecdotally observed over the last 10 years. It was not clear whether the changes in groundwater levels were solely natural variations within the groundwater system or whether human interference was driving the level up. Time series topographic images revealed significant surrounding land use changes and human modification to the environment of the groundwater catchment. A statistical model utilising HARTT (multiple linear regression hydrograph analysis method) simulated the groundwater level dynamics at five key monitoring locations and successfully showed a trend of rising groundwater level. Utilising hydrogeological input from field investigations, the model successfully simulated the rise in the water table over time to the present day levels, whilst taking into consideration rainfall and land changes. The underlying geological/land conditions were found to be just as significant as the impact of climate variation. The correlation coefficients for the monitoring bores (MB), excluding MB4, show that the groundwater level fluctuation can be explained by the climate variable (rainfall), with the lag time between the atypical rainfall and groundwater level ranging from 4 to 7 months. The low R² value for MB4 indicates that there are factors missing in the model which are primarily related to human interference. The elevated groundwater levels in the affected area are the result of long term cumulative land use changes, instigated by humans, which have directly resulted in detrimental changes to the groundwater aquifer properties.

Analysis of Factors Affecting the Number of Motor Vehicle Theft Crimes (Curanmor) Using the Geographically Weighted Poisson Regression (GWPR) Model

OpenAIRE

Haris, Muhammad; Yasin, Hasbi; Hoyyi, Abdul

2015-01-01

Theft is the act of taking someone else's property, partially or entirely, with the intention of possessing it illegally. Motor vehicle theft is one of the most prominent types of crime and one that disturbs communities. Regression analysis is a statistical analysis for modeling the relationship between a response variable and predictor variables. If the response variable follows a Poisson distribution or consists of count data, the regression model used is Poisson regression. Geographically Weighted Poi...
Climate Variability and Mangrove Cover Dynamics at Species Level in the Sundarbans, Bangladesh

Directory of Open Access Journals (Sweden)

Manoj Kumer Ghosh

2017-05-01

Mangrove ecosystems are complex in nature. For monitoring the impact of climate variability in this ecosystem, a multidisciplinary approach is a prerequisite. Changes in temperature and rainfall pattern have been suggested as an influential factor responsible for the change in mangrove species composition and spatial distribution. The main aim of this study was to assess the relationship between temperature, rainfall pattern and dynamics of mangrove species in the Sundarbans, Bangladesh, over a 38-year time period from 1977 to 2015. To assess the relationship, a three-stage analytical process was employed. First, the trends of temperature and rainfall over the study period were identified using a linear trend model. Second, the supervised maximum likelihood classifier technique was employed to classify images recorded by the Landsat series, and post-classification comparison techniques were used to detect changes at species level; the rate of change of different mangrove species was also estimated in this stage. Finally, the relationship between temperature, rainfall and the dynamics of mangroves at species level was determined using a simple linear regression model. The results show a significant statistical relationship between temperature, rainfall and the dynamics of mangrove species. The trends of change for Heritiera fomes and Sonneratia apetala show a strong relationship with temperature and rainfall, while Ceriops decandra shows a weak relationship. In contrast, Excoecaria agallocha and Xylocarpus mekongensis do not show any significant relationship with temperature and rainfall. On the basis of our results, it can be concluded that temperature and rainfall are important climatic factors influencing the dynamics of three major mangrove species, viz. H. fomes, S. apetala and C. decandra, in the Sundarbans.
Gráfico de controle de regressão aplicado na monitoração de processos Regression control chart applied in process monitoring

Directory of Open Access Journals (Sweden)

Luciane Flores Jacobi

2002-01-01

The main purpose of this research is to apply the regression control chart as a statistical control tool for monitoring production processes in which a state variable of interest can be expressed as a function of a control variable. Several studies address the control of variables in production processes, but most of them treat each variable separately and therefore cannot be used for a comparative study. This research therefore presents an efficient technique for the simultaneous control of correlated variables.
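A minimal sketch of the idea behind a regression control chart, in Python. The abstract does not state how the limits are set, so the band used here (fitted line plus or minus k residual standard deviations, with k = 2) is an assumption, and the function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def regression_control_limits(x, y, k=2.0):
    """Return a function mapping a control-variable value to (lower, center, upper)."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))

    def limits(x_new):
        center = a + b * x_new
        return center - k * s, center, center + k * s

    return limits

def out_of_control(limits, x_new, y_new):
    """True when the observed state variable falls outside the band."""
    lower, _, upper = limits(x_new)
    return y_new < lower or y_new > upper
```

The limits follow the fitted line instead of being horizontal, so a state variable is judged relative to the value expected for the current setting of the control variable.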
PTH levels and not serum phosphorus levels are a predictor of the progression of kidney disease in elderly patients with advanced chronic kidney disease.

Science.gov (United States)

Toapanta Gaibor, Néstor Gabriel; Nava Pérez, Nathasha Carolina; Martínez Echevers, Yeleine; Montes Delgado, Rafael; Guerrero Riscos, María Ángeles

At present, there is a high incidence of elderly patients with advanced chronic kidney disease (CKD) and it is important to know the long term progression and the factors that influence it. To analyse the progression of advanced CKD in elderly patients and the influence of bone-mineral metabolism. Retrospective study of 125 patients ≥70 years of age with CKD stages 4-5 who started follow-up from January 1, 2007 to December 31, 2008, showing the progression of CKD (measured by the slope of the regression line of the estimated glomerular filtration rate [eGFR] by MDRD-4) over 5 years. Progression in the entire group (median and 25th and 75th percentiles): -1.15 (-2.8/0.17) ml/min/1.73 m²/year, CKD-4: -1.3 (-2.8/0.03) ml/min/1.73 m²/year, CKD-5: -1.03 (-3.0/0.8) ml/min/1.73 m²/year; the slope of the regression line was positive in 35 patients (28%: CKD does not progress) and negative in 90 patients (72%: CKD progresses). Negative correlation (Spearman) (slower progression): PTH, albumin/Cr ratio and daily Na excretion (all baseline measurements). No correlation with eGFR, serum P, urinary P excretion, protein intake and intake of P (all baseline measurements). In the linear regression analysis (dependent variable: slope of progression): albuminuria and PTH (both at baseline measurements) influenced this variable independently. Logistic regression (progresses vs. does not progress): PTH, albuminuria and eGFR (all at baseline measurements) influenced significantly. In our group of elderly patients, impairment of renal function is slow, particularly in CKD-5 patients. Albuminuria and PTH at baseline levels are prognostic factors in the evolution of renal function. Copyright © 2016 Sociedad Española de Nefrología. Published by Elsevier España, S.L.U. All rights reserved.
An Introduction to Macro- Level Spatial Nonstationarity: a Geographically Weighted Regression Analysis of Diabetes and Poverty.

Science.gov (United States)

Siordia, Carlos; Saenz, Joseph; Tom, Sarah E

2012-01-01

Type II diabetes is a growing health problem in the United States. Understanding geographic variation in diabetes prevalence will inform where resources for management and prevention should be allocated. Investigations of the correlates of diabetes prevalence have largely ignored how spatial nonstationarity might play a role in the macro-level distribution of diabetes. This paper introduces the reader to the concept of spatial nonstationarity, that is, variance in statistical relationships as a function of geographical location. Since spatial nonstationarity means different predictors can have varying effects on model outcomes, we make use of a geographically weighted regression to calculate correlates of diabetes as a function of geographic location. By doing so, we demonstrate an exploratory example in which the diabetes-poverty macro-level statistical relationship varies as a function of location. In particular, we provide evidence that when predicting macro-level diabetes prevalence, poverty is not always positively associated with diabetes.
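To illustrate how a geographically weighted regression lets a coefficient vary by location, the sketch below fits a separate weighted least squares line at each target location. The Gaussian distance kernel, the single predictor, and the names `gwr_local_fit` and `bandwidth` are assumptions made for illustration; the abstract does not specify the kernel or software used.

```python
import math

def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted least squares of y on x, with weights that decay with
    distance from `target`. Returns the local (intercept, slope)."""
    # Gaussian kernel: nearby observations count more than distant ones.
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope
```

Calling the function at many target locations yields a map of local slopes; a slope that changes sign across locations is the kind of nonstationarity the abstract describes for poverty and diabetes.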
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict

Science.gov (United States)

Ismail, Mohd Tahir; Alias, Siti Nor Shadila

2014-07-01

For many years Malaysia has faced the problem of drug addiction. The most serious issue is relapse among treated drug addicts (those who have undergone the rehabilitation programme at a Narcotic Addiction Rehabilitation Centre, PUSPEN). The main objective of this study is therefore to find the most significant factors that contribute to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the relapse status of the drug addict (yes coded as 1, no coded as 0). The predictors are age, age at first drug use, family history, education level, family crisis, community support and self-motivation. The sample comprises 200 cases, with data provided by AADK (National Antidrug Agency). The study found that age and self-motivation are statistically significant predictors of relapse.
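As a sketch of what fitting a binary logistic regression involves, the Python code below estimates an intercept and one slope by Newton-Raphson. It is reduced to a single predictor for brevity, and the function names are hypothetical; it does not reproduce the study's model or data.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.
    x: predictor values, y: outcomes coded 1 (relapse) or 0 (no relapse)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} * gradient, using the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predict_probability(b0, b1, x_new):
    """Predicted probability that the outcome is 1 at x_new."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x_new)))
```

A negative slope for a predictor such as a self-motivation score would mean that higher scores are associated with lower odds of relapse. Note that the iteration does not converge if the outcome is perfectly separated by the predictor.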
Direction of Effects in Multiple Linear Regression Models.

Science.gov (United States)

Wiedermann, Wolfgang; von Eye, Alexander

2015-01-01

Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
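The residual-based reasoning can be sketched for the simplest bivariate case. Assuming a skewed predictor and a symmetric error term, residuals of the correctly specified model are close to symmetric, while residuals of the reversed model inherit skewness from the predictor. The decision rule below (prefer the direction with the smaller absolute residual skewness) is a simplification adopted for illustration, not the authors' inference procedure, and all names are hypothetical.

```python
import random

def ols_residuals(x, y):
    """Residuals from the least squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def skewness(values):
    """Standardized third central moment."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def preferred_direction(x, y):
    """Pick the model whose residuals are closer to symmetric."""
    skew_x_to_y = abs(skewness(ols_residuals(x, y)))
    skew_y_to_x = abs(skewness(ols_residuals(y, x)))
    return "x -> y" if skew_x_to_y < skew_y_to_x else "y -> x"

def simulate(n, seed):
    """Simulated data in which x (skewed) truly causes y (normal error)."""
    rng = random.Random(seed)
    x = [rng.expovariate(1.0) for _ in range(n)]
    y = [0.8 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y
```

In practice the difference in skewness would be tested formally, for example with the skewness difference or bootstrap tests mentioned in the abstract, instead of being compared directly.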
Minimizing the effects of multicollinearity in the polynomial regression of age relationships and sex differences in serum levels of pregnenolone sulfate in healthy subjects.

Science.gov (United States)

Meloun, Milan; Hill, Martin; Vceláková-Havlíková, Helena

2009-01-01

Pregnenolone sulfate (PregS) is known as a steroid conjugate positively modulating N-methyl-D-aspartate receptors on neuronal membranes. These receptors are responsible for permeability of calcium channels and activation of neuronal function. Neuroactivating effect of PregS is also exerted via non-competitive negative modulation of GABA(A) receptors regulating the chloride influx. Recently, a penetrability of blood-brain barrier for PregS was found in rat, but some experiments in agreement with this finding were reported even earlier. It is known that circulating levels of PregS in human are relatively high depending primarily on age and adrenal activity. Concerning the neuromodulating effect of PregS, we recently evaluated age relationships of PregS in both sexes using polynomial regression models known to bring about the problems of multicollinearity, i.e., strong correlations among independent variables. Several criteria for the selection of suitable bias are demonstrated. Biased estimators based on the generalized principal component regression (GPCR) method avoiding multicollinearity problems are described. Significant differences were found between men and women in the course of the age dependence of PregS. In women, a significant maximum was found around the 30th year followed by a rapid decline, while the maximum in men was achieved almost 10 years earlier and changes were minor up to the 60th year. The investigation of gender differences and age dependencies in PregS could be of interest given its well-known neurostimulating effect, relatively high serum concentration, and the probable partial permeability of the blood-brain barrier for the steroid conjugate. GPCR in combination with the MEP (mean quadric error of prediction) criterion is extremely useful and appealing for constructing biased models. It can also be used for achieving such estimates with regard to keeping the model course corresponding to the data trend, especially in polynomial type
A brief introduction to regression designs and mixed-effects modelling by a recent convert

OpenAIRE

Balling, Laura Winther

2008-01-01

This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...
Development of hybrid genetic-algorithm-based neural networks using regression trees for modeling air quality inside a public transportation bus.

Science.gov (United States)

Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok

2013-02-01

The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized using regression trees, referred to as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 µm sized particle numbers, 0.4-0.5 µm sized particle numbers, particulate matter (PM) concentrations less than 1.0 µm (PM1.0), and PM concentrations less than 2.5 µm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complementary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture the majority of the variance in the monitored in-bus contaminants. The genetic
Functional data analysis of generalized regression quantiles

KAUST Repository

Guo, Mengmeng; Zhou, Lan; Huang, Jianhua Z.; Härdle, Wolfgang Karl

2013-01-01

Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
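The iterative least asymmetrically weighted squares idea can be illustrated on the simplest possible case, a single sample expectile. The paper estimates whole families of curves with splines and principal component functions; the scalar version below is only a sketch of the reweighting step, and the function name is hypothetical.

```python
def expectile(y, tau, max_iter=100, tol=1e-12):
    """Sample expectile at level tau (0 < tau < 1).
    Minimizes sum(|tau - 1[y_i <= m]| * (y_i - m)**2) over m by
    repeatedly solving a weighted least squares problem."""
    m = sum(y) / len(y)  # start from the mean (the tau = 0.5 expectile)
    for _ in range(max_iter):
        # Observations above the current estimate get weight tau,
        # the others get weight 1 - tau.
        w = [tau if yi > m else 1.0 - tau for yi in y]
        m_new = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

With tau = 0.5 the weights are equal and the result is the ordinary mean; values of tau near 0 or 1 move the estimate into the tails, which is where the abstract says the interest lies.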
A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

Directory of Open Access Journals (Sweden)

Jun-He Yang

2017-01-01

Reservoirs are important for households and impact the national economy. This paper proposes a time-series forecasting model, based on estimating missing values followed by variable selection, to forecast a reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets were concatenated, based on the ordering of the data, into an integrated research dataset. The proposed time-series forecasting model has three main components. First, this study uses five imputation methods to directly delete the missing value. Second, the key variables were identified via factor analysis, and the unimportant variables were then deleted sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level, which is compared with the listed methods in terms of forecasting error. The experimental results indicate that the Random Forest forecasting model, when applied to variable selection with full variables, has better forecasting performance than the listed models. In addition, the experiment shows that the proposed variable selection can help the five forecasting methods used here improve their forecasting capability.
Variable Selection via Partial Correlation.

Science.gov (United States)

Li, Runze; Liu, Jingyuan; Lou, Lejia

2017-07-01

A partial correlation based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as a comparable alternative to regularization methods for variable selection. This paper addresses two important issues related to the partial correlation based variable selection method: (a) whether this method is sensitive to the normality assumption, and (b) whether this method is valid when the dimension of the predictor increases at an exponential rate of the sample size. To address issue (a), we systematically study this method for elliptical linear regression models. Our finding indicates that the original proposal may lead to inferior performance when the marginal kurtosis of the predictor is not close to that of the normal distribution. Our simulation results further confirm this finding. To ensure the superior performance of the partial correlation based variable selection procedure, we propose a thresholded partial correlation (TPC) approach to select significant variables in linear regression models. We establish the selection consistency of the TPC in the presence of ultrahigh dimensional predictors. Since the TPC procedure includes the original proposal as a special case, our theoretical results address issue (b) directly. As a by-product, the sure screening property of the first step of TPC was obtained. The numerical examples also illustrate that the TPC is competitively comparable to the commonly used regularization methods for variable selection.
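The quantity being thresholded can be sketched with a first-order partial correlation, computed from pairwise Pearson correlations. The fixed cutoff in `keep_variable` is a hypothetical stand-in: the abstract indicates that the appropriate threshold depends on kurtosis, and the paper's actual threshold is not reproduced here.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def keep_variable(x, y, z, threshold):
    """Keep predictor x only if its partial correlation with the response y,
    given the already selected variable z, exceeds the threshold."""
    return abs(partial_corr(x, y, z)) > threshold
```

A predictor can be strongly correlated with the response marginally yet have a partial correlation near zero once a shared driver is controlled for, which is why the selection is based on partial and not marginal correlation.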
Retro-regression--another important multivariate regression improvement.

Science.gov (United States)

Randić, M

2001-01-01

We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.
The crux of the method: assumptions in ordinary least squares and logistic regression.

Science.gov (United States)

Long, Rebecca G

2008-10-01

Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
PARAMETRIC AND NON PARAMETRIC (MARS: MULTIVARIATE ADDITIVE REGRESSION SPLINES) LOGISTIC REGRESSIONS FOR PREDICTION OF A DICHOTOMOUS RESPONSE VARIABLE WITH AN EXAMPLE FOR PRESENCE/ABSENCE OF AMPHIBIANS

Science.gov (United States)

The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Research and analyze of physical health using multiple regression analysis

Directory of Open Access Journals (Sweden)

T. S. Kyi

2014-01-01

This paper presents research that aims to create a mathematical model of "healthy people" using regression analysis. The factors are the physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight-height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.), and the response variable is an indicator of physical working capacity. Multiple regression analysis yielded useful multiple regression models that can predict the physical performance of boys aged fourteen to seventeen years. This paper presents the development of the regression model for sixteen-year-old boys and the analysis of the results.
Advanced supersonic propulsion study, phases 3 and 4. [variable cycle engines

Science.gov (United States)

Allan, R. D.; Joy, W.

1977-01-01

An evaluation of various advanced propulsion concepts for supersonic cruise aircraft resulted in the identification of the double-bypass variable cycle engine as the most promising concept. This engine design utilizes special variable geometry components and an annular exhaust nozzle to provide high take-off thrust and low jet noise. The engine also provides good performance at both supersonic cruise and subsonic cruise. Emission characteristics are excellent. The advanced technology double-bypass variable cycle engine offers an improvement in aircraft range performance relative to earlier supersonic jet engine designs and yet at a lower level of engine noise. Research and technology programs required in certain design areas for this engine concept to realize its potential benefits include refined parametric analysis of selected variable cycle engines, screening of additional unconventional concepts, and engine preliminary design studies. Required critical technology programs are summarized.
Predict the Medicare Functional Classification Level (K-level) using the Amputee Mobility Predictor in people with unilateral transfemoral and transtibial amputation: A pilot study.

Science.gov (United States)

Dillon, Michael P; Major, Matthew J; Kaluf, Brian; Balasanov, Yuri; Fatone, Stefania

2018-04-01

While Amputee Mobility Predictor scores differ between Medicare Functional Classification Levels (K-level), this does not demonstrate that the Amputee Mobility Predictor can accurately predict K-level. To determine how accurately K-level could be predicted using the Amputee Mobility Predictor in combination with patient characteristics for persons with transtibial and transfemoral amputation. Prediction. A cumulative odds ordinal logistic regression was built to determine the effect that the Amputee Mobility Predictor, in combination with patient characteristics, had on the odds of being assigned to a particular K-level in 198 people with transtibial or transfemoral amputation. For people assigned to the K2 or K3 level by their clinician, the Amputee Mobility Predictor predicted the clinician-assigned K-level more than 80% of the time. For people assigned to the K1 or K4 level by their clinician, the prediction of clinician-assigned K-level was less accurate. The odds of being in a higher K-level improved with younger age and transfemoral amputation. Ordinal logistic regression can be used to predict the odds of being assigned to a particular K-level using the Amputee Mobility Predictor and patient characteristics. This pilot study highlighted critical method design issues, such as potential predictor variables and sample size requirements for future prospective research. Clinical relevance: This pilot study demonstrated that the odds of being assigned a particular K-level could be predicted using the Amputee Mobility Predictor score and patient characteristics. While the model seemed sufficiently accurate to predict clinician assignment to the K2 or K3 level, further work is needed in larger and more representative samples, particularly for people with low (K1) and high (K4) levels of mobility, to be confident in the model's predictive value prior to use in clinical practice.
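A cumulative odds (proportional odds) model turns one linear predictor into a probability for each ordered category. The sketch below shows that step only. The sign convention logit P(Y ≤ k) = θ_k − η, the threshold values, and the function name are assumptions for illustration; no fitted coefficients from the study are used.

```python
import math

def category_probabilities(eta, thresholds):
    """Probabilities of each ordered category under a cumulative odds model.
    eta: linear predictor (for example, a weighted sum of a mobility score
         and patient characteristics).
    thresholds: increasing cut-points theta_1 < ... < theta_{K-1}."""
    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs
```

With three thresholds the function returns four probabilities, which could correspond to K1 through K4; under this convention a larger linear predictor shifts probability toward the higher categories.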

Psychological and biographical differences between secondary school teachers experiencing high and low levels of burnout.

Science.gov (United States)

Pierce, C M; Molloy, G N

1990-02-01

A total of 750 teachers from 16 government and non-government schools from areas of contrasted socio-economic status (SES) responded to a questionnaire designed to investigate associations between selected aspects of burnout among teachers working in secondary schools in Victoria, Australia. By comparing high and low burnout groups on biographic, psychological and work pattern variables, differences between teachers experiencing high and low levels of burnout were identified. Multiple regression analyses assessed the relative importance of these variables in accounting for the variance in each of the three burnout subscales. School type was related to perceptions of stress and burnout. Higher levels of burnout were associated with poorer physical health, higher rates of absenteeism, lower self-confidence and more frequent use of regressive coping strategies. Teachers classified as experiencing high levels of burnout attributed most of the stress in their lives to teaching and reported low levels of career commitment and satisfaction. Further, teachers who recorded high levels of burnout were characterised by lower levels of the personality disposition of hardiness, lower levels of social support, higher levels of role stress and more custodial pupil control ideologies than their low-burnout counterparts. Psychological variables were found to be more significant predictors of burnout than biographical variables.
Testing of variables which affect stability of cement solidified low-level waste

International Nuclear Information System (INIS)

Boris, G.F.

1989-01-01

This paper describes the test program undertaken to investigate variables which could affect the stability of cement solidified low-level waste and to evaluate the effect of these variables on certain tests prescribed in the Technical Position on Waste Form. The majority of the testing was performed on solidified undepleted bead resin, however, six additional waste types, suggested by the NRC, were tested. The tested variables included waste loading, immersion duration, depletion level, ambient cure duration, curing environment, immersion medium and waste type. Of these, lower waste loadings, longer ambient cures prior to testing and immersion in demineralized water versus simulated sea water and potable water resulted in higher compressive strengths for bead resin samples. Immersion times longer than 90 days did not affect the resin samples. Compressive strengths for other waste types varied depending upon the waste. The strengths of all waste types exceeded the minimum criterion by at least a factor of four, up to a factor of forty. The higher waste loadings exhibit strengths less than the lower waste loadings
Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.

Science.gov (United States)

Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J

2009-11-01

Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.
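The step from a regression equation to a normed score can be sketched as follows. The intercept, coefficients, and standard error of the estimate in the usage note are hypothetical numbers chosen for illustration; they are not taken from the study.

```python
def expected_score(predictors, intercept, coefficients):
    """Expected test score from a linear regression equation.
    predictors and coefficients are dicts keyed by predictor name."""
    return intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )

def rbn_z_score(observed, predictors, intercept, coefficients, see):
    """Standardized difference between observed and expected score.
    see: standard error of the estimate of the normative regression."""
    expected = expected_score(predictors, intercept, coefficients)
    return (observed - expected) / see
```

For example, with hypothetical coefficients of -0.2 for age, 1.0 for years of education and 0.1 for estimated IQ, an intercept of 30 and a standard error of the estimate of 6, a 50-year-old with 12 years of education and an estimated IQ of 100 has an expected score of 42, so an observed score of 36 corresponds to a z-score of -1.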
Total levels of hippocampal histone acetylation predict normal variability in mouse behavior.

Directory of Open Access Journals (Sweden)

Addie May I Nesbitt

Genetic, pharmacological, and environmental interventions that alter total levels of histone acetylation in specific brain regions can modulate behaviors and treatment responses. Efforts have been made to identify specific genes that are affected by alterations in total histone acetylation and to propose that such gene specific modulation could explain the effects of total histone acetylation levels on behavior - the implication being that under naturalistic conditions variability in histone acetylation occurs primarily around the promoters of specific genes. Here we challenge this hypothesis by demonstrating with a novel flow cytometry based technique that normal variability in open field exploration, a hippocampus-related behavior, was associated with total levels of histone acetylation in the hippocampus but not in other brain regions. Results suggest that modulation of total levels of histone acetylation may play a role in regulating biological processes. We speculate in the discussion that endogenous regulation of total levels of histone acetylation may be a mechanism through which organisms regulate cellular plasticity. Flow cytometry provides a useful approach to measure total levels of histone acetylation at the single cell level. Relating such information to behavioral measures and treatment responses could inform drug delivery strategies to target histone deacetylase inhibitors and other chromatin modulators to places where they may be of benefit while avoiding areas where correction is not needed and could be harmful.
Distributed Monitoring of the R2 Statistic for Linear Regression

Data.gov (United States)

National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...
Nucleotide variability and linkage disequilibrium patterns in the porcine MUC4 gene

Directory of Open Access Journals (Sweden)

Yang Ming

2012-07-01

Full Text Available Abstract Background MUC4 is a type of membrane-anchored glycoprotein and serves as the major constituent of mucus that covers epithelial surfaces of many tissues such as trachea, colon and cervix. MUC4 plays important roles in the lubrication and protection of the surface epithelium, cell proliferation and differentiation, immune response, cell adhesion and cancer development. To gain insights into the evolution of the porcine MUC4 gene, we surveyed the nucleotide variability and linkage disequilibrium (LD) within this gene in Chinese indigenous breeds and Western commercial breeds. Results A total of 53 SNPs covering the MUC4 gene were genotyped on 5 wild boars and 307 domestic pigs representing 11 Chinese breeds and 3 Western breeds. The nucleotide variability, haplotype phylogeny and LD extent of MUC4 were analyzed in these breeds. Both Chinese and Western breeds had considerable nucleotide diversity at the MUC4 locus. Western pig breeds like Duroc and Large White have nucleotide diversity comparable to that of many Chinese breeds; thus artificial selection for lean pork production has not reduced the genetic variability of MUC4 in Western commercial breeds. Haplotype phylogeny analyses indicated that MUC4 had evolved divergently in Chinese and Western pigs. The dendrogram of genetic differentiation between breeds generally reflected demographic history and geographical distribution of these breeds. LD patterns were unexpectedly similar between Chinese and Western breeds, in which LD usually extended less than 20 kb. This is different from the presumed high LD extent (more than 100 kb) in Western commercial breeds. The significant positive Tajima's D, and Fu and Li's D statistics in a few Chinese and Western breeds implied that MUC4 might undergo balancing selection in domestic breeds. Nevertheless, we caution that the significant statistics could be upward biased by the SNP ascertainment process. Conclusions Chinese and Western breeds have
Regression models in the determination of the absorbed dose with extrapolation chamber for ophthalmological applicators

International Nuclear Information System (INIS)

Alvarez R, J.T.; Morales P, R.

1992-06-01

The absorbed dose to soft-tissue-equivalent material imparted by ophthalmological applicators (90Sr/90Y, 1850 MBq) is determined using a variable-electrode extrapolation chamber. When the slope of the extrapolation curve is estimated with a simple linear regression model, the dose values are underestimated by 17.7 percent to 20.4 percent relative to the estimate obtained with a second-degree polynomial regression model; at the same time, the standard error improves by up to 50% for the quadratic model. Finally, the global uncertainty of the dose is presented, taking into account the reproducibility of the experimental arrangement. In conclusion, for experimental arrangements in which the source is in contact with the extrapolation chamber, it is recommended to replace the linear regression model with the quadratic regression model when determining the slope of the extrapolation curve, for more exact and accurate measurements of the absorbed dose. (Author)
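The comparison described in this abstract can be illustrated with a minimal sketch, assuming the extrapolation curve is a set of (electrode gap, ionization current) pairs and that the quantity of interest is the slope at zero gap. The function names and the synthetic readings below are hypothetical and are not data from the report.

```python
def fit_linear(x, y):
    """Ordinary least squares for y = c0 + c1*x; returns (c0, c1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x, y):
    """Least squares for y = c0 + c1*x + c2*x**2 via the normal
    equations, solved with Cramer's rule; returns (c0, c1, c2)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum((xi ** k) * yi for xi, yi in zip(x, y)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(normal)
    coeffs = []
    for j in range(3):
        replaced = [row[:] for row in normal]
        for i in range(3):
            replaced[i][j] = t[i]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


# Hypothetical readings generated from a curved response, I = 2*d - 0.1*d**2.
gaps = [0.5, 1.0, 1.5, 2.0, 2.5]
currents = [2.0 * d - 0.1 * d * d for d in gaps]

_, linear_slope = fit_linear(gaps, currents)
_, slope_at_zero, _ = fit_quadratic(gaps, currents)
underestimate_pct = 100.0 * (slope_at_zero - linear_slope) / slope_at_zero
```

With these synthetic readings the straight-line slope is lower than the quadratic slope at zero gap, which is the direction of the bias the abstract reports; the size of the effect here is a property of the made-up data only.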
Minima of interannual sea-level variability in the Indian Ocean

Digital Repository Service at National Institute of Oceanography (India)

Shankar, D.; Aparna, S.G.; McCreary, J.P.; Suresh, I.; Neetu, S.; Durand, F.; Shenoi, S.S.C.; Al Saafani, M.A.

of interannual sea-level variability in the Indian Ocean D. Shankar a, S. G. Aparna a, J. P. McCreary b, I. Suresh a, S. Neetu a, F. Durand c, S. S. C. Shenoi a, M. A. Al Saafani a,d a National Institute of Oceanography, Dona Paula, Goa 403 004, India. b SOEST..., for example, the review by Schott and McCreary, 2001) implies that changes in sea level can be forced at a given location by winds blowing elsewhere earlier in the season. This phenomenon, called remote forcing, “merges the equatorial Indian Ocean, the Arabian...
Comparative Analysis of A, B Type and Exchange Traded Funds Performances with Mutual Fund Performance Measures, Regression Analysis and Manova Technique.

Directory of Open Access Journals (Sweden)

Mehmet Arslan

2010-06-01

Full Text Available The objective of the study is to evaluate the risk-reward relationship and relative performance of 4 different groups of mutual funds. To this end, daily return data for 12 mutual funds (3 A type variable funds, 3 B type variable funds, 3 A type stock funds and 3 A type exchange traded funds), together with the daily return of the market index (imkb100) and the daily riskless rate, were used for the period from January 2006 to February 2010. The 180-day maturity T-bill was selected to represent the riskless rate. To determine the performance of the mutual funds, the Sharpe ratio, M2 measure, Treynor index, Jensen index, Sortino ratio, T2 ratio and Valuation ratio were applied, and these indicators produced conflicting results in ranking the mutual funds. Then the timing and selection capability of the fund managers was determined by applying simple regression and quadratic regression. Interestingly, all funds were found to have positive coefficients, indicating positive selection capability of managers; in terms of timing capability, however, only one fund manager showed success. Finally, an analysis of the extent to which mean returns differ between the mutual funds, the market index (imkb100) and the riskless rate (180-day T-bill) revealed that the mean returns of the individual securities differ at the P ≤ 0.01 level. This indicates instability in returns and poor ex-ante forecast modeling capability.
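Three of the performance measures named in this abstract have widely used textbook definitions, sketched below. This is an illustration under assumptions, not the study's own computation: the riskless rate is treated as a single per-period constant, and the return series are invented.

```python
from statistics import mean, stdev


def beta(fund, market):
    """Slope of the simple regression of fund returns on market returns."""
    mf, mm = mean(fund), mean(market)
    cov = sum((f - mf) * (m - mm) for f, m in zip(fund, market))
    var = sum((m - mm) ** 2 for m in market)
    return cov / var


def sharpe_ratio(fund, riskless):
    """Mean excess return per unit of total risk (sample std. deviation)."""
    return (mean(fund) - riskless) / stdev(fund)


def treynor_index(fund, market, riskless):
    """Mean excess return per unit of systematic risk (beta)."""
    return (mean(fund) - riskless) / beta(fund, market)


def jensen_alpha(fund, market, riskless):
    """Mean fund return minus the return implied by beta and the market."""
    expected = riskless + beta(fund, market) * (mean(market) - riskless)
    return mean(fund) - expected


# Invented per-period returns, for illustration only.
fund_returns = [0.02, 0.04, 0.06]
market_returns = [0.01, 0.02, 0.03]
riskless_rate = 0.01

measures = {
    "sharpe": sharpe_ratio(fund_returns, riskless_rate),
    "treynor": treynor_index(fund_returns, market_returns, riskless_rate),
    "jensen": jensen_alpha(fund_returns, market_returns, riskless_rate),
}
```

Because the Sharpe ratio scales by total risk while the Treynor and Jensen measures depend on beta, rankings based on them need not agree, which is consistent with the conflicting rankings the abstract mentions.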
Sea level anomaly in the North Atlantic and seas around Europe: Long-term variability and response to North Atlantic teleconnection patterns.

Science.gov (United States)

Iglesias, Isabel; Lorenzo, M Nieves; Lázaro, Clara; Fernandes, M Joana; Bastos, Luísa

2017-12-31

Sea level anomaly (SLA), provided globally by satellite altimetry, is considered a valuable proxy for detecting long-term changes of the global ocean, as well as short-term and annual variations. In this manuscript, monthly sea level anomaly grids for the period 1993-2013 are used to characterise the North Atlantic Ocean variability at inter-annual timescales and its response to the North Atlantic main patterns of atmospheric circulation variability (North Atlantic Oscillation, Eastern Atlantic, Eastern Atlantic/Western Russia, Scandinavian and Polar/Eurasia) and main driving factors such as sea level pressure, sea surface temperature and wind fields. SLA variability and long-term trends are analysed for the North Atlantic Ocean and several sub-regions (North, Baltic and Mediterranean and Black seas, Bay of Biscay extended to the west coast of the Iberian Peninsula, and the northern North Atlantic Ocean), depicting the SLA fluctuations at basin and sub-basin scales, aiming at representing the regions of maximum sea level variability. A significant correlation between SLA and the different phases of the teleconnection patterns due to the generated winds, sea level pressure and sea surface temperature anomalies, with a strong variability on temporal and spatial scales, has been identified. Long-term analysis reveals the existence of non-stationary inter-annual SLA fluctuations in terms of the temporal scale. Spectral density analysis has shown the existence of long-period signals in the SLA inter-annual component, with periods of ~10, 5, 4 and 2 years, depending on the analysed sub-region. Also, a non-uniform increase in sea level since 1993 is identified for all sub-regions, with trend values between 2.05 mm/year, for the Bay of Biscay region, and 3.98 mm/year for the Baltic Sea (no GIA correction considered). The obtained results demonstrated a strong link between the atmospheric patterns and SLA, as well as strong long-period fluctuations of this variable in spatial and
Influence of different levels of concentrate and ruminally undegraded protein on digestive variables in beef heifers.

Science.gov (United States)

Pina, D S; Valadares Filho, S C; Tedeschi, L O; Barbosa, A M; Valadares, R F D

2009-03-01

This experiment evaluated the effect of 2 levels of diet concentrate (20 and 40% of DM) and 2 levels of ruminally undegraded protein (RUP: 25 and 40% of CP) on nutrient intake, total and partial apparent nutrient digestibility, microbial protein synthesis, and ruminal and physiological variables. Eight Nellore heifers (233 ± 14 kg of BW) fitted with ruminal, abomasal, and ileal cannulas were used. The animals were held in individual sheltered pens of approximately 15 m² and fed twice daily at 0800 and 1600 h for ad libitum intake. Heifers were allocated in two 4 x 4 Latin square designs, containing 8 heifers, 4 experimental periods, and 4 treatments in a 2 x 2 factorial arrangement. All statistical analyses were performed using PROC MIXED of SAS. Titanium dioxide (TiO2) and chromic oxide (Cr2O3) were used to estimate digesta fluxes and fecal excretion. Purine derivative (PD) excretion and abomasal purine bases were used to estimate the microbial N (MN) synthesis. No significant interaction (P > 0.10) between dietary levels of RUP and concentrate was observed. There was no effect of treatment (P = 0.24) on DMI. Both markers led to the same estimates of fecal, abomasal, and ileal DM fluxes, and digestibilities of DM and individual nutrients. Ruminal pH was affected by sampling time (P RUP, whereas a quadratic effect (P RUP. The higher level of dietary concentrate led to greater MN yield regardless of the level of RUP. The MN yield and the efficiency of microbial yield estimated from urinary PD excretion produced greater (P RUP and concentrate were observed for ruminal and digestive parameters. Neither RUP nor concentrate level affected DMI. Titanium dioxide proved to be similar to Cr2O3 as an external marker to measure digestibility and nutrient fluxes in cattle.
Adaptive regression for modeling nonlinear relationships

CERN Document Server

Knafl, George J

2016-01-01

This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
Temporal Variability of Upper-level Winds at the Eastern Range, Western Range and Wallops Flight Facility

Science.gov (United States)

Decker, Ryan; Barbre, Robert E.

2014-01-01

Space launch vehicles incorporate upper-level wind profiles to determine wind effects on the vehicle and for a commit-to-launch decision. These assessments incorporate wind profiles measured hours prior to launch and may not represent the actual wind the vehicle will fly through. Uncertainty in the upper-level winds over the time period between the assessment and launch can be mitigated by a statistical analysis of wind change over time periods of interest using historical data from the launch range. Five sets of temporal wind pairs at various time separations (0.75, 1.5, 2, 3 and 4 h) at the Eastern Range, Western Range and Wallops Flight Facility were developed for use in upper-level wind assessments. Database development procedures as well as statistical analysis of temporal wind variability at each launch range will be presented.
BRGLM, Interactive Linear Regression Analysis by Least Square Fit

International Nuclear Information System (INIS)

Ringland, J.T.; Bohrer, R.E.; Sherman, M.E.

1985-01-01

1 - Description of program or function: BRGLM is an interactive program written to fit general linear regression models by least squares and to provide a variety of statistical diagnostic information about the fit. Stepwise and all-subsets regression can be carried out also. There are facilities for interactive data management (e.g. setting missing value flags, data transformations) and tools for constructing design matrices for the more commonly-used models such as factorials, cubic splines, and auto-regressions. 2 - Method of solution: The least squares computations are based on the orthogonal (QR) decomposition of the design matrix obtained using the modified Gram-Schmidt algorithm. 3 - Restrictions on the complexity of the problem: The current release of BRGLM allows maxima of 1000 observations, 99 variables, and 3000 words of main memory workspace. For a problem with N observations and P variables, the number of words of main memory storage required is MAX(N*(P+6), N*P+P*P+3*N, 3*P*P+6*N). Any linear model may be fit although the in-memory workspace will have to be increased for larger problems.
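The method of solution named in this record, least squares through a QR decomposition built with modified Gram-Schmidt, can be sketched as follows. This is a generic illustration, not BRGLM's source code: the function names are hypothetical, the design matrix is assumed to have full column rank, and the last helper simply evaluates the workspace formula quoted in the record.

```python
def mgs_qr(rows):
    """Modified Gram-Schmidt QR of an n x p matrix given as a list of rows.
    Returns (Q, R) with Q stored as a list of p orthonormal columns and
    R as a p x p upper-triangular list of rows."""
    n, p = len(rows), len(rows[0])
    cols = [[rows[i][j] for i in range(n)] for j in range(p)]
    q_cols = []
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):
        r[j][j] = sum(v * v for v in cols[j]) ** 0.5
        q = [v / r[j][j] for v in cols[j]]
        q_cols.append(q)
        # Remove the new direction from every remaining column right away;
        # this immediate update is what distinguishes the modified variant.
        for k in range(j + 1, p):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return q_cols, r


def least_squares(rows, y):
    """Solve min ||X b - y|| by back-substitution on R b = Q^T y."""
    q_cols, r = mgs_qr(rows)
    p = len(r)
    z = [sum(a * b for a, b in zip(q, y)) for q in q_cols]
    b = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(r[j][k] * b[k] for k in range(j + 1, p))
        b[j] = (z[j] - tail) / r[j][j]
    return b


def workspace_words(n, p):
    """Workspace formula quoted in the record for N observations, P variables."""
    return max(n * (p + 6), n * p + p * p + 3 * n, 3 * p * p + 6 * n)


# Intercept column plus one predictor; y = 1 + 2*x exactly.
design = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
response = [1.0, 3.0, 5.0, 7.0]
coefficients = least_squares(design, response)
```

Working from the QR factors avoids forming the cross-product matrix of the design, which is the usual numerical argument for this approach over solving the normal equations directly.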
Incorporating wind availability into land use regression modelling of air quality in mountainous high-density urban environment.

Science.gov (United States)

Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward

2017-08-01

Urban air quality is an important factor in the quality of urban life. Land use regression (LUR) modelling of air quality is essential for conducting health impact assessment but is more challenging in mountainous high-density urban scenarios due to the complexities of the urban environment. In this study, a total of 21 LUR models are developed for seven kinds of air pollutants (gaseous air pollutants CO, NO2, NOx, O3, SO2 and particulate air pollutants PM2.5, PM10) with reference to three different time periods (summertime, wintertime and annual average of 5-year long-term hourly monitoring data from the local air quality monitoring network) in Hong Kong. Under the mountainous high-density urban scenario, we improved the traditional LUR modelling method by incorporating wind availability information into LUR modelling based on surface geomorphometrical analysis. As a result, 269 independent variables were examined to develop the LUR models by using the "ADDRESS" independent variable selection method and stepwise multiple linear regression (MLR). Cross validation has been performed for each resultant model. The results show that wind-related variables are included in most of the resultant models as statistically significant independent variables. Compared with the traditional method, a maximum increase of 20% was achieved in the prediction performance of annual averaged NO2 concentration level by incorporating wind-related variables into LUR model development. Copyright © 2017 Elsevier Inc. All rights reserved.
Genetic variability of the pattern of night melatonin blood levels in relation to coat changes development in rabbits

Directory of Open Access Journals (Sweden)

Chemineau Philippe

2004-03-01

Full Text Available Abstract To assess the genetic variability in both the nocturnal increase pattern of melatonin concentration and photoresponsiveness in coat changes, an experiment on 422 Rex rabbits (from 23 males) raised under a constant light programme from birth was performed. The animals were sampled at 12 weeks of age, according to 4 periods over a year. Blood samples were taken 7 times during the dark phase and up to 1 h after the lighting began. Maturity of the fur was assessed at pelting. Heritability estimates of blood melatonin concentration (0.42, 0.17 and 0.11 at mid-night, 13 and 15 h after lights-out respectively) and strong genetic correlations between fur maturity and melatonin levels at the end of the dark phase (-0.64) indicate that (i) the variability of the nocturnal pattern of melatonin levels is under genetic control and (ii) the duration of the nocturnal melatonin increase is a genetic component of photoresponsiveness in coat changes.
A Predictive Logistic Regression Model of World Conflict Using Open Source Data

Science.gov (United States)

2015-03-26

No correlation between the error terms and the independent variables 9. Absence of perfect multicollinearity (Menard, 2001) When assumptions are...some of the variables before initial model building. Multicollinearity, or near-linear dependence among the variables will cause problems in the...model. High multicollinearity tends to produce unreasonably high logistic regression coefficients and can result in coefficients that are not
Completing the Remedial Sequence and College-Level Credit-Bearing Math: Comparing Binary, Cumulative, and Continuation Ratio Logistic Regression Models

Science.gov (United States)

Davidson, J. Cody

2016-01-01

Mathematics is the most common subject area of remedial need and the majority of remedial math students never pass a college-level credit-bearing math class. The majority of studies that investigate this phenomenon are conducted at community colleges and use some type of regression model; however, none have used a continuation ratio model. The…
Regression Models For Multivariate Count Data.

Science.gov (United States)

Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

2017-01-01

Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.
Model selection in kernel ridge regression

DEFF Research Database (Denmark)

Exterkate, Peter

2013-01-01

Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...
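As a rough illustration of the technique this abstract describes, the sketch below fits kernel ridge regression with a Gaussian kernel and scores one tuning-parameter pair by leave-one-out cross-validation. It is a minimal sketch under assumptions: the parameterization of the kernel width `sigma` and penalty `lam`, the function names, and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_kernel(x, z, sigma):
    """exp(-||x - z||^2 / (2 sigma^2)) for two points given as lists."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def krr_fit(xs, ys, lam, sigma):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    gram = [[gaussian_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve(gram, ys)


def krr_predict(xs, alpha, x_new, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(x, x_new, sigma) for a, x in zip(alpha, xs))


def loo_cv_error(xs, ys, lam, sigma):
    """Mean squared leave-one-out prediction error for one (lam, sigma)."""
    total = 0.0
    for i in range(len(xs)):
        x_train, y_train = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        alpha = krr_fit(x_train, y_train, lam, sigma)
        total += (ys[i] - krr_predict(x_train, alpha, xs[i], sigma)) ** 2
    return total / len(xs)


# Toy data: four one-dimensional inputs.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 4.0, 9.0]
alpha = krr_fit(train_x, train_y, lam=0.1, sigma=1.0)
cv_error = loo_cv_error(train_x, train_y, lam=0.1, sigma=1.0)
```

Selecting tuning parameters from a small grid, as the abstract suggests, would amount to evaluating `loo_cv_error` for each candidate pair and keeping the pair with the smallest error.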

Regression models for estimating concentrations of atrazine plus deethylatrazine in shallow groundwater in agricultural areas of the United States

Science.gov (United States)

Stackelberg, Paul E.; Barbash, Jack E.; Gilliom, Robert J.; Stone, Wesley W.; Wolock, David M.

2012-01-01

Tobit regression models were developed to predict the summed concentration of atrazine [6-chloro-N-ethyl-N'-(1-methylethyl)-1,3,5-triazine-2,4-diamine] and its degradate deethylatrazine [6-chloro-N-(1-methylethyl)-1,3,5-triazine-2,4-diamine] (DEA) in shallow groundwater underlying agricultural settings across the conterminous United States. The models were developed from atrazine and DEA concentrations in samples from 1298 wells and explanatory variables that represent the source of atrazine and various aspects of the transport and fate of atrazine and DEA in the subsurface. One advantage of these newly developed models over previous national regression models is that they predict concentrations (rather than detection frequency), which can be compared with water quality benchmarks. Model results indicate that variability in the concentration of atrazine residues (atrazine plus DEA) in groundwater underlying agricultural areas is more strongly controlled by the history of atrazine use in relation to the timing of recharge (groundwater age) than by processes that control the dispersion, adsorption, or degradation of these compounds in the saturated zone. Current (1990s) atrazine use was found to be a weak explanatory variable, perhaps because it does not represent the use of atrazine at the time of recharge of the sampled groundwater and because the likelihood that these compounds will reach the water table is affected by other factors operating within the unsaturated zone, such as soil characteristics, artificial drainage, and water movement. Results show that only about 5% of agricultural areas have greater than a 10% probability of exceeding the USEPA maximum contaminant level of 3.0 μg L⁻¹. These models are not developed for regulatory purposes but rather can be used to (i) identify areas of potential concern, (ii) provide conservative estimates of the concentrations of atrazine residues in deeper potential drinking water supplies, and (iii) set priorities
Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

Science.gov (United States)

Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

2014-05-01

Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.
Demand analysis of flood insurance by using logistic regression model and genetic algorithm

Science.gov (United States)

Sidi, P.; Mamat, M. B.; Sukono; Supian, S.; Putra, A. S.

2018-03-01

The Citarum River floods in the South Bandung area of Indonesia often damage buildings belonging to people living in the vicinity. One way to alleviate the risk of building damage is to have flood insurance. The main obstacle is that not all people in the Citarum basin decide to buy flood insurance. In this paper, we analyse the decision to buy flood insurance. It is assumed that eight variables influence the decision to purchase flood insurance: income level, education level, distance of the house from the river, building elevation relative to the road, experience of flood frequency, flood prediction, perception of the insurance company, and perception of government efforts in handling floods. The analysis uses a logistic regression model whose parameters are estimated with a genetic algorithm. The results show that all eight variables significantly influence the demand for flood insurance. These results are expected to be considered by insurance companies seeking to influence the community's willingness to buy flood insurance.
Tidal and sub-tidal sea level variability at the northern shelf of the Brazilian Northeast Region.

Science.gov (United States)

Frota, Felipe F; Truccolo, Eliane C; Schettini, Carlos A F

2016-09-01

A characterization of the sea level variability at tidal and sub-tidal frequencies at the northern shore of the Brazilian Northeast shelf for the period 2009-2011 is presented. The sea level data used was obtained from the Permanent Geodetic Tide Network from the Brazilian Institute of Geography and Statistics for the Fortaleza gauge station. Local wind data was also used to assess its effects on the low-frequency sea level variability. The variability of the sea level was investigated by classical harmonic analysis and by morphology assessment over the tidal signal. The low frequencies were obtained by low-pass filtering. The tidal range oscillated with the highest value of 3.3 m during the equinox and the lowest value of 0.7 m during the solstice. Differences between the spring and neap tides were as high as 1 m. A total of 59 tidal constituents were obtained from harmonic analysis, and the regional tide was classified as semi-diurnal pure with a form number of 0.11. An assessment of the monthly variability of the main tidal constituents (M2, S2, N2, O1, and K1) indicated that the main semi-diurnal solar S2 presented the highest variability, ranging from 0.21 to 0.41 m; it was the main element altering the form number through the years. The low frequency sea-level variability is negligible, although there is a persistent signal with an energy peak in the 10-15 day period, and it cannot be explained by the effects of local winds.
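The form number used in this abstract to classify the tide is conventionally the ratio of the main diurnal to the main semi-diurnal constituent amplitudes, F = (K1 + O1) / (M2 + S2). The sketch below applies that ratio with the commonly cited class boundaries of 0.25, 1.5 and 3.0; the amplitudes are hypothetical values chosen for illustration, not the Fortaleza results.

```python
def form_number(k1, o1, m2, s2):
    """Ratio of diurnal (K1, O1) to semi-diurnal (M2, S2) amplitudes."""
    return (k1 + o1) / (m2 + s2)


def classify_tide(f):
    """Commonly used classification of the tidal regime by form number."""
    if f < 0.25:
        return "semidiurnal"
    if f < 1.5:
        return "mixed, mainly semidiurnal"
    if f < 3.0:
        return "mixed, mainly diurnal"
    return "diurnal"


# Hypothetical constituent amplitudes in metres (illustration only).
amplitudes = {"K1": 0.07, "O1": 0.06, "M2": 0.95, "S2": 0.30}
f_example = form_number(amplitudes["K1"], amplitudes["O1"],
                        amplitudes["M2"], amplitudes["S2"])
regime = classify_tide(f_example)
```

Because S2 sits in the denominator, month-to-month changes in its amplitude shift the form number even when the diurnal constituents are steady, which matches the abstract's remark that S2 was the main element altering the form number.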
On Weighted Support Vector Regression

DEFF Research Database (Denmark)

Han, Xixuan; Clemmensen, Line Katrine Harder

2014-01-01

We propose a new type of weighted support vector regression (SVR), motivated by modeling local dependencies in time and space in prediction of house prices. The classic weights of the weighted SVR are added to the slack variables in the objective function (OF‐weights). This procedure directly...... shrinks the coefficient of each observation in the estimated functions; thus, it is widely used for minimizing influence of outliers. We propose to additionally add weights to the slack variables in the constraints (CF‐weights) and call the combination of weights the doubly weighted SVR. We illustrate...... the differences and similarities of the two types of weights by demonstrating the connection between the Least Absolute Shrinkage and Selection Operator (LASSO) and the SVR. We show that an SVR problem can be transformed to a LASSO problem plus a linear constraint and a box constraint. We demonstrate...
MORTALITY FROM SUICIDE AND ALCOHOLISM, DEPENDING ON THE LEVEL OF ALCOHOL CONSUMPTION

Directory of Open Access Journals (Sweden)

L. A. Radkevich

2017-01-01

Full Text Available According to the WHO, approximately 500 000 suicides and 7 million suicide attempts take place in the world every year. Since 1994, Russia has ranked 2nd in the world, after Lithuania, in the level of suicides, and is among the countries with a linear dependence of suicide frequency on the level of alcohol consumption. Purpose. To establish a quantitative relationship between the frequency of suicide, alcohol consumption and mortality from alcoholism in the world. Material and method. The study used the mortality coefficient (MK) from suicide and alcohol abuse (number of people per 100 thousand of the age-standardized population) in 159 countries according to the WHO in 2004, and the average daily consumption levels of alcoholic beverages: spirits, wine and beer (g/person/day) according to the FAO (Food and Agriculture Organization of the United Nations). Correlation and regression methods were used for data analysis. Results. We found a significant positive correlation of the MK from suicide for men and women with consumption of alcoholic beverages (spirits, wine and beer) and with mortality from alcoholism. Gender differences were revealed. The independent variables included in the regression model (levels of alcohol consumption and mortality from alcoholism) explain 66% and 52% of the variability in the frequency of suicides of men and women (dependent variables), respectively. A complete rejection of alcohol consumption reduces the MK from suicide of men by 39.5% in the world and by 76.5% in Russia, and that of women by 37.9% in the world and by 54.3% in Russia. According to the regression analysis, the average daily level of consumption of strong alcohol for men is 10.4 g (3.8 kg per year) in the world and 91.8 g (33.5 kg per year) in Russia. An increase in the consumption of strong alcohol by 3 g per day (1 kg per year) increases the MK from suicide in men by 10.8% (1.6 people) in the world and by 2.4% (1.6 people) in Russia. The increase in the MK of alcoholism of men
Multiple regression equations modelling of groundwater of Ajmer-Pushkar railway line region, Rajasthan (India).

Science.gov (United States)

Mathur, Praveen; Sharma, Sarita; Soni, Bhupendra

2010-01-01

In the present work, an attempt is made to formulate multiple regression equations using all possible regressions method for groundwater quality assessment of Ajmer-Pushkar railway line region in pre- and post-monsoon seasons. Correlation studies revealed the existence of linear relationships (r 0.7) for electrical conductivity (EC), total hardness (TH) and total dissolved solids (TDS) with other water quality parameters. The highest correlation was found between EC and TDS (r = 0.973). EC showed highly significant positive correlation with Na, K, Cl, TDS and total solids (TS). TH showed highest correlation with Ca and Mg. TDS showed significant correlation with Na, K, SO4, PO4 and Cl. The study indicated that most of the contamination present was water soluble or ionic in nature. Mg was present as MgCl2; K mainly as KCl and K2SO4, and Na was present as the salts of Cl, SO4 and PO4. On the other hand, F and NO3 showed no significant correlations. The r2 values and F values (at 95% confidence limit, alpha = 0.05) for the modelled equations indicated high degree of linearity among independent and dependent variables. Also the error % between calculated and experimental values was contained within +/- 15% limit.
Prediction of Currency Volume Issued in Taiwan Using a Hybrid Artificial Neural Network and Multiple Regression Approach

Directory of Open Access Journals (Sweden)

Yuehjen E. Shao

2013-01-01

Full Text Available Because the volume of currency issued by a country always affects its interest rate, price index, income levels, and many other important macroeconomic variables, the prediction of currency volume issued has attracted considerable attention in recent years. In contrast to the typical single-stage forecast model, this study proposes a hybrid forecasting approach to predict the volume of currency issued in Taiwan. The proposed hybrid models consist of artificial neural network (ANN) and multiple regression (MR) components. The MR component of the hybrid models is established for a selection of fewer explanatory variables, wherein the selected variables are of higher importance. The ANN component is then designed to generate forecasts based on those important explanatory variables. Subsequently, the model is used to analyze a real dataset of Taiwan's currency from 1996 to 2011 and twenty associated explanatory variables. The prediction results reveal that the proposed hybrid scheme exhibits superior forecasting performance for predicting the volume of currency issued in Taiwan.
Exploratory regression analysis: a tool for selecting models and determining predictor importance.

Science.gov (United States)

Braun, Michael T; Oswald, Frederick L

2011-06-01

Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
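The enumeration of every non-empty predictor subset mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' Excel program: the function names (r_squared, all_submodels) and the synthetic data are assumptions made for illustration, and only the R-squared of each submodel is computed, not the importance indices the article reviews.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X, with an intercept added."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def all_submodels(X, y):
    """Map each non-empty tuple of predictor indices to the R^2 of that submodel."""
    p = X.shape[1]
    results = {}
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            results[subset] = r_squared(X[:, list(subset)], y)
    return results


# Synthetic data (hypothetical): predictors 0 and 2 drive the criterion.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=200)

fits = all_submodels(X, y)
print(len(fits))  # one entry per non-empty subset of the 4 predictors
best_single = max((s for s in fits if len(s) == 1), key=fits.get)
print(best_single, round(fits[best_single], 3))
```

Because the number of submodels doubles with each added predictor, this brute-force approach is practical only for a modest number of predictors.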
Knee-Extension Torque Variability and Subjective Knee Function in Patients with a History of Anterior Cruciate Ligament Reconstruction.

Science.gov (United States)

Goetschius, John; Hart, Joseph M

2016-01-01

When returning to physical activity, patients with a history of anterior cruciate ligament reconstruction (ACL-R) often experience limitations in knee-joint function that may be due to chronic impairments in quadriceps motor control. Assessment of knee-extension torque variability may demonstrate underlying impairments in quadriceps motor control in patients with a history of ACL-R. To identify differences in maximal isometric knee-extension torque variability between knees that have undergone ACL-R and healthy knees and to determine the relationship between knee-extension torque variability and self-reported knee function in patients with a history of ACL-R. Descriptive laboratory study. Laboratory. A total of 53 individuals with primary, unilateral ACL-R (age = 23.4 ± 4.9 years, height = 1.7 ± 0.1 m, mass = 74.6 ± 14.8 kg) and 50 individuals with no history of substantial lower extremity injury or surgery who served as controls (age = 23.3 ± 4.4 years, height = 1.7 ± 0.1 m, mass = 67.4 ± 13.2 kg). Torque variability, strength, and central activation ratio (CAR) were calculated from 3-second maximal knee-extension contraction trials (90° of flexion) with a superimposed electrical stimulus. All participants completed the International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form, and we determined the number of months after surgery. Group differences were assessed using independent-samples t tests. Correlation coefficients were calculated among torque variability, strength, CAR, months after surgery, and IKDC scores. Torque variability, strength, CAR, and months after surgery were regressed on IKDC scores using stepwise, multiple linear regression. Torque variability was greater and strength, CAR, and IKDC scores were lower in the ACL-R group than in the control group (P ...). Torque variability and strength were correlated with IKDC scores (P ...). Torque variability, strength, and CAR were correlated with each other (P ...). Torque variability alone...
A generalization of voxel-wise procedures for high-dimensional statistical inference using ridge regression

DEFF Research Database (Denmark)

Sjöstrand, Karl; Cardenas, Valerie A.; Larsen, Rasmus

2008-01-01

Whole-brain morphometry denotes a group of methods with the aim of relating clinical and cognitive measurements to regions of the brain. Typically, such methods require the statistical analysis of a data set with many variables (voxels and exogenous variables) paired with few observations (subjects... regression to address this issue, allowing for a gradual introduction of correlation information into the model. We make the connections between ridge regression and voxel-wise procedures explicit and discuss relations to other statistical methods. Results are given on an in-vivo data set of deformation...
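One way to see how a ridge penalty can interpolate between per-variable analysis and a joint model is the following sketch. It is an illustration under stated assumptions, not the authors' derivation: the data are synthetic, the names ridge_dual and cosine are hypothetical, and the per-variable quantity used is the vector of marginal cross-products X^T y. For a very large penalty the ridge coefficients become proportional to X^T y, so correlations between columns are ignored; for smaller penalties the correlation structure enters the fit.

```python
import numpy as np


def ridge_dual(X, y, lam):
    """Ridge coefficients via the dual form X^T (X X^T + lam I)^{-1} y.

    The dual form solves an n-by-n system, which is convenient when the
    number of variables p is much larger than the number of observations n.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


def cosine(a, b):
    """Cosine similarity between two coefficient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic "many variables, few observations" data (hypothetical).
rng = np.random.default_rng(1)
n, p = 20, 500
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X[:, :5].sum(axis=1) + rng.normal(size=n)
y -= y.mean()

marginal = X.T @ y  # one cross-product per variable, computed independently
similarity = {}
for lam in (1e-2, 1e2, 1e6):
    similarity[lam] = cosine(ridge_dual(X, y, lam), marginal)
    print(lam, round(similarity[lam], 4))
```

The printed similarity approaches 1 as the penalty grows, which is the sense in which a heavily penalized ridge fit behaves like a variable-by-variable analysis.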
Steric sea level variability (1993-2010) in an ensemble of ocean reanalyses and objective analyses

Science.gov (United States)

Storto, Andrea; Masina, Simona; Balmaseda, Magdalena; Guinehut, Stéphanie; Xue, Yan; Szekely, Tanguy; Fukumori, Ichiro; Forget, Gael; Chang, You-Soon; Good, Simon A.; Köhl, Armin; Vernieres, Guillaume; Ferry, Nicolas; Peterson, K. Andrew; Behringer, David; Ishii, Masayoshi; Masuda, Shuhei; Fujii, Yosuke; Toyoda, Takahiro; Yin, Yonghong; Valdivieso, Maria; Barnier, Bernard; Boyer, Tim; Lee, Tony; Gourrion, Jérome; Wang, Ou; Heimback, Patrick; Rosati, Anthony; Kovach, Robin; Hernandez, Fabrice; Martin, Matthew J.; Kamachi, Masafumi; Kuragano, Tsurane; Mogensen, Kristian; Alves, Oscar; Haines, Keith; Wang, Xiaochun

2017-08-01

Quantifying the effect of the seawater density changes on sea level variability is of crucial importance for climate change studies, as the sea level cumulative rise can be regarded as both an important climate change indicator and a possible danger for human activities in coastal areas. In this work, as part of the Ocean Reanalysis Intercomparison Project, the global and regional steric sea level changes are estimated and compared from an ensemble of 16 ocean reanalyses and 4 objective analyses. These estimates are initially compared with a satellite-derived (altimetry minus gravimetry) dataset for a short period (2003-2010). The ensemble mean exhibits a significant high correlation at both global and regional scale, and the ensemble of ocean reanalyses outperforms that of objective analyses, in particular in the Southern Ocean. The reanalysis ensemble mean thus represents a valuable tool for further analyses, although large uncertainties remain for the inter-annual trends. Within the extended intercomparison period that spans the altimetry era (1993-2010), we find that the ensemble of reanalyses and objective analyses are in good agreement, and both detect a trend of the global steric sea level of 1.0 and 1.1 ± 0.05 mm/year, respectively. However, the spread among the products of the halosteric component trend exceeds the mean trend itself, questioning the reliability of its estimate. This is related to the scarcity of salinity observations before the Argo era. Furthermore, the impact of deep ocean layers is non-negligible on the steric sea level variability (22 and 12 % for the layers below 700 and 1500 m of depth, respectively), although the small deep ocean trends are not significant with respect to the products spread.
Overcoming multicollinearity in multiple regression using correlation coefficient

Science.gov (United States)

Zainodin, H. J.; Yap, S. J.

2013-09-01

Multicollinearity happens when there are high correlations among independent variables. In this case, it is difficult to distinguish the contributions of these independent variables to the dependent variable, as they may compete to explain much of the same variance. Multicollinearity also violates an assumption of multiple regression: that there is no collinearity among the possible independent variables. Thus, an alternative approach is introduced to overcome the multicollinearity problem and arrive at a well-represented model. This approach removes the variables that are the source of multicollinearity on the basis of the correlation coefficient values in the full correlation matrix. Using the full correlation matrix facilitates the use of Excel functions in removing these variables. It is found that this procedure is easier and time-saving, especially when dealing with a greater number of independent variables in a model and a large number of all possible models. Hence, in this paper the procedure is shown in detail, compared and implemented.
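A correlation-matrix-based removal step of the kind described here might look like the sketch below. The specific rule is an assumption, not the authors' procedure: while any pair of predictors has an absolute correlation above a threshold (0.95 here, chosen arbitrarily), the member of the most correlated pair that is less correlated with the dependent variable is dropped. The function name drop_collinear and the data are hypothetical.

```python
import numpy as np


def drop_collinear(X, y, names, threshold=0.95):
    """Greedily drop predictors until no pair has |r| above the threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        # Absolute correlations among the predictors still in the model.
        R = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(R, 0.0)
        i, j = np.unravel_index(np.argmax(R), R.shape)
        if R[i, j] <= threshold:
            break
        # Assumed tie-break: keep the predictor more correlated with y.
        ri = abs(np.corrcoef(X[:, keep[i]], y)[0, 1])
        rj = abs(np.corrcoef(X[:, keep[j]], y)[0, 1])
        keep.pop(i if ri < rj else j)
    return [names[k] for k in keep]


# Synthetic data (hypothetical): x2 is nearly a copy of x1.
rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + 0.05 * rng.normal(size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])
y = x1 + x3 + 0.5 * rng.normal(size=300)

kept = drop_collinear(X, y, ["x1", "x2", "x3"])
print(kept)
```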
Multivariate regression applied to the performance optimization of a countercurrent ultracentrifuge - a preliminary study

International Nuclear Information System (INIS)

Migliavacca, Elder; Andrade, Delvonei Alves de

2004-01-01

In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting in order to obtain a performance function for the separative power δU of an ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 173 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties related to these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process control variables, which significantly influence the δU values, are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow F and cut θ. After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any regression model variable. The response curves are made relating the separative power with the control variables F and θ, to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)
Examining Decision Making Level of Wrestlers in Terms of Some Variable

Science.gov (United States)

Yigit, Sihmehmet; Dalbudak, Ibrahim; Musa, Mihriay; Gürkan, Alper C.; Dalkiliç, Mehmet

2016-01-01

The aim of this research is to examine decision making level of wrestlers who joined Turkey inter university wrestling championship, according to variables as wrestlers' sex, age, grade, department, and education type. Study group consists of 34 females and 196 males, totally 230 athletes, who joined Turkey Inter University Wrestling Championship…
Unbalanced Regressions and the Predictive Equation

DEFF Research Database (Denmark)

Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoretical predictive equation by suggesting a data generating process, where returns are generated as linear functions of a lagged latent I(0) risk process. The observed predictor is a function of this latent I(0) process, but it is corrupted by a fractionally integrated noise. Such a process may arise due to aggregation or unexpected level shifts. In this setup, the practitioner estimates a misspecified, unbalanced, and endogenous predictive regression. We show that the OLS estimate of this regression is inconsistent, but standard inference is possible. To obtain a consistent slope estimate, we then suggest...
Multinomial logistic regression analysis for differentiating 3 treatment outcome trajectory groups for headache-associated disability.

Science.gov (United States)

Lewis, Kristin Nicole; Heckman, Bernadette Davantes; Himawan, Lina

2011-08-01

Growth mixture modeling (GMM) identified latent groups based on treatment outcome trajectories of headache disability measures in patients in headache subspecialty treatment clinics. Using a longitudinal design, 219 patients in headache subspecialty clinics in 4 large cities throughout Ohio provided data on their headache disability at pretreatment and 3 follow-up assessments. GMM identified 3 treatment outcome trajectory groups: (1) patients who initiated treatment with elevated disability levels and who reported statistically significant reductions in headache disability (high-disability improvers; 11%); (2) patients who initiated treatment with elevated disability but who reported no reductions in disability (high-disability nonimprovers; 34%); and (3) patients who initiated treatment with moderate disability and who reported statistically significant reductions in headache disability (moderate-disability improvers; 55%). Based on the final multinomial logistic regression model, a dichotomized treatment appointment attendance variable was a statistically significant predictor for differentiating high-disability improvers from high-disability nonimprovers. Three-fourths of patients who initiated treatment with elevated disability levels did not report reductions in disability after 5 months of treatment with new preventive pharmacotherapies. Preventive headache agents may be most efficacious for patients with moderate levels of disability and for patients with high disability levels who attend all treatment appointments. Copyright © 2011 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Understanding Variability in Beach Slope to Improve Forecasts of Storm-induced Water Levels

Science.gov (United States)

Doran, K. S.; Stockdon, H. F.; Long, J.

2014-12-01

The National Assessment of Hurricane-Induced Coastal Erosion Hazards combines measurements of beach morphology with storm hydrodynamics to produce forecasts of coastal change during storms for the Gulf of Mexico and Atlantic coastlines of the United States. Wave-induced water levels are estimated using modeled offshore wave height and period and measured beach slope (from dune toe to shoreline) through the empirical parameterization of Stockdon et al. (2006). Spatial and temporal variability in beach slope leads to corresponding variability in predicted wave setup and swash. Seasonal and storm-induced changes in beach slope can lead to differences on the order of a meter in wave runup elevation, making accurate specification of this parameter essential to skillful forecasts of coastal change. Spatial variation in beach slope is accounted for through alongshore averaging, but temporal variability in beach slope is not included in the final computation of the likelihood of coastal change. Additionally, input morphology may be years old and potentially very different from the conditions present during the forecast storm. In order to improve our forecasts of hurricane-induced coastal erosion hazards, the temporal variability of beach slope must be included in the final uncertainty of modeled wave-induced water levels. Frequently collected field measurements of lidar-based beach morphology are examined for study sites in Duck, North Carolina, Treasure Island, Florida, Assateague Island, Virginia, and Dauphin Island, Alabama, with some records extending over a period of 15 years. Understanding the variability of slopes at these sites will help provide estimates of associated water level uncertainty which can then be applied to other areas where lidar observations are infrequent, and improve the overall skill of future forecasts of storm-induced coastal change. Stockdon, H. F., Holman, R. A., Howd, P. A., and Sallenger Jr, A. H. (2006). Empirical parameterization of setup
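The sensitivity of runup to beach slope described in this abstract can be sketched numerically. The formula below is the general 2% exceedance runup expression as it is commonly quoted from Stockdon et al. (2006); the coefficients are reproduced from memory and should be checked against the paper before any real use, and the separate expression for highly dissipative beaches is omitted. The function name runup_2pct and the wave conditions (2 m offshore height, 10 s period) are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def runup_2pct(h0, t0, beta):
    """2% exceedance runup (m) from offshore wave height h0 (m),
    peak period t0 (s) and foreshore beach slope beta (dimensionless).

    General form as commonly quoted from Stockdon et al. (2006);
    coefficients are an assumption to be verified against the source.
    """
    l0 = G * t0 ** 2 / (2.0 * math.pi)  # deep-water wavelength
    setup = 0.35 * beta * math.sqrt(h0 * l0)
    swash = math.sqrt(h0 * l0 * (0.563 * beta ** 2 + 0.004)) / 2.0
    return 1.1 * (setup + swash)


# Same waves, three beach slopes: runup grows with slope.
for beta in (0.05, 0.10, 0.15):
    print(beta, round(runup_2pct(2.0, 10.0, beta), 2))
```

Holding the wave forcing fixed and varying only the slope isolates the term the abstract identifies as a source of forecast uncertainty.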
Estimating the prevalence of 26 health-related indicators at neighbourhood level in the Netherlands using structured additive regression.

Science.gov (United States)

van de Kassteele, Jan; Zwakhals, Laurens; Breugelmans, Oscar; Ameling, Caroline; van den Brink, Carolien

2017-07-01

Local policy makers increasingly need information on health-related indicators at smaller geographic levels like districts or neighbourhoods. Although more large data sources have become available, direct estimates of the prevalence of a health-related indicator cannot be produced for neighbourhoods for which only small samples or no samples are available. Small area estimation provides a solution, but unit-level models for binary-valued outcomes that can handle both non-linear effects of the predictors and spatially correlated random effects in a unified framework are rarely encountered. We used data on 26 binary-valued health-related indicators collected on 387,195 persons in the Netherlands. We associated the health-related indicators at the individual level with a set of 12 predictors obtained from national registry data. We formulated a structured additive regression model for small area estimation. The model captured potential non-linear relations between the predictors and the outcome through additive terms in a functional form using penalized splines and included a term that accounted for spatially correlated heterogeneity between neighbourhoods. The registry data were used to predict individual outcomes which in turn are aggregated into higher geographical levels, i.e. neighbourhoods. We validated our method by comparing the estimated prevalences with observed prevalences at the individual level and by comparing the estimated prevalences with direct estimates obtained by weighting methods at municipality level. We estimated the prevalence of the 26 health-related indicators for 415 municipalities, 2599 districts and 11,432 neighbourhoods in the Netherlands. We illustrate our method on overweight data and show that there are distinct geographic patterns in the overweight prevalence. Calibration plots show that the estimated prevalences agree very well with observed prevalences at the individual level. The estimated prevalences agree reasonably well with the
Application of Robust Regression and Bootstrap in Productivity Analysis of GERD Variable in EU27

Directory of Open Access Journals (Sweden)

Dagmar Blatná

2014-06-01

GERD is one of the Europe 2020 headline indicators tracked within the Europe 2020 strategy. The headline target is for GERD to reach 3% within the EU by 2020. Eurostat defines “GERD” as total gross domestic expenditure on research and experimental development as a percentage of GDP. GERD depends on numerous factors of a general economic background, namely of employment, innovation and research, science and technology. The values of these indicators vary among the European countries, and consequently the occurrence of outliers can be anticipated in corresponding analyses. In such a case, a classical statistical approach (the least squares method) can be highly unreliable, and robust regression methods represent an acceptable and useful tool. The aim of the present paper is to demonstrate the advantages of robust regression and the applicability of the bootstrap approach in regression based on both classical and robust methods.
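The contrast between least squares and a robust fit in the presence of outliers, together with a bootstrap interval for each, can be sketched as follows. The abstract does not state which robust estimators the paper uses; the Huber M-estimator fitted by iteratively reweighted least squares is an assumption chosen for illustration, as are the function names, the tuning constant 1.345, and the synthetic data (27 points, echoing EU27, with three contaminated observations).

```python
import numpy as np


def ols(A, y):
    """Least squares coefficients for design matrix A."""
    return np.linalg.lstsq(A, y, rcond=None)[0]


def huber_irls(A, y, c=1.345, n_iter=50):
    """Huber M-estimate via iteratively reweighted least squares."""
    beta = ols(A, y)
    for _ in range(n_iter):
        r = y - A @ beta
        # Robust residual scale from the median absolute deviation.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.maximum(np.abs(r) / scale, 1e-12)
        w = np.minimum(1.0, c / u)  # weight 1 inside the threshold, smaller outside
        sw = np.sqrt(w)
        beta = ols(A * sw[:, None], y * sw)
    return beta


def bootstrap_slope_ci(A, y, fit, n_boot=200, seed=0):
    """Percentile bootstrap interval for the slope, resampling cases."""
    rng = np.random.default_rng(seed)
    n = len(y)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        slopes[b] = fit(A[idx], y[idx])[1]
    return np.percentile(slopes, [2.5, 97.5])


# Synthetic data (hypothetical): true slope 2, three large outliers.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 27)
y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=27)
y[-3:] += 8.0
A = np.column_stack([np.ones_like(x), x])

slope_ols = ols(A, y)[1]
slope_huber = huber_irls(A, y)[1]
ci_ols = bootstrap_slope_ci(A, y, ols)
ci_huber = bootstrap_slope_ci(A, y, huber_irls)
print("OLS slope:", round(slope_ols, 2), "CI:", np.round(ci_ols, 2))
print("Huber slope:", round(slope_huber, 2), "CI:", np.round(ci_huber, 2))
```

The Huber estimator limits the influence of large residuals but not of unusual predictor values, so other robust methods may be preferable when outliers are also high-leverage points.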
