Uncovering state-dependent relationships in shallow lakes using Bayesian latent variable regression.
Vitense, Kelsey; Hanson, Mark A; Herwig, Brian R; Zimmer, Kyle D; Fieberg, John
2018-03-01
Ecosystems sometimes undergo dramatic shifts between contrasting regimes. Shallow lakes, for instance, can transition between two alternative stable states: a clear state dominated by submerged aquatic vegetation and a turbid state dominated by phytoplankton. Theoretical models suggest that critical nutrient thresholds differentiate three lake types: highly resilient clear lakes, lakes that may switch between clear and turbid states following perturbations, and highly resilient turbid lakes. For effective and efficient management of shallow lakes and other systems, managers need tools to identify critical thresholds and state-dependent relationships between driving variables and key system features. Using shallow lakes as a model system for which alternative stable states have been demonstrated, we developed an integrated framework using Bayesian latent variable regression (BLR) to classify lake states, identify critical total phosphorus (TP) thresholds, and estimate steady state relationships between TP and chlorophyll a (chl a) using cross-sectional data. We evaluated the method using data simulated from a stochastic differential equation model and compared its performance to k-means clustering with regression (KMR). We also applied the framework to data comprising 130 shallow lakes. For simulated data sets, BLR had high state classification rates (median/mean accuracy >97%) and accurately estimated TP thresholds and state-dependent TP-chl a relationships. Classification and estimation improved with increasing sample size and decreasing noise levels. Compared to KMR, BLR had higher classification rates and better approximated the TP-chl a steady state relationships and TP thresholds. We fit the BLR model to three different years of empirical shallow lake data, and managers can use the estimated bifurcation diagrams to prioritize lakes for management according to their proximity to thresholds and chance of successful rehabilitation. Our model improves upon
Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi
2018-04-01
Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation-coefficient-based method for evaluating the significance of hydrological dependence, based on an auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divides the significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficient at each order of the series, we found that the correlation coefficient is mainly determined by the magnitude of the auto-correlation coefficients from order 1 to order p, which clarifies the theoretical basis of this method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte-Carlo experiments to classify the relationship between correlation coefficient and auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
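As a rough illustration of the quantity this abstract describes, the sketch below simulates a first-order auto-regression series and computes the correlation between the series and its dependence component. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the coefficient 0.6, the series length, and the use of the known coefficient (rather than an estimated one) are all illustrative choices. For an AR(1) process the theoretical value of this correlation is the absolute value of the auto-regression coefficient; the classification thresholds are not given in the abstract, so none are applied here.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n + 200):  # 200 extra steps serve as burn-in
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x[-n:]  # keep only the last n values


def dependence_correlation(series, phi):
    """Correlation between the series and its AR(1) dependence component."""
    dependence = [phi * value for value in series[:-1]]  # phi * x_{t-1}
    return pearson(series[1:], dependence)


# Hypothetical example values, chosen only for illustration.
series = simulate_ar1(phi=0.6, n=20000)
r = dependence_correlation(series, phi=0.6)
print(f"correlation with dependence component: {r:.3f}")
```

With a long simulated series the printed value should sit close to 0.6, in line with the abstract's statement that the correlation coefficient is governed by the auto-correlation coefficients.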
Variable importance in latent variable regression models
Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.
2014-01-01
The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable
Establishment of regression dependences. Linear and nonlinear dependences
International Nuclear Information System (INIS)
Onishchenko, A.M.
1994-01-01
The main problems in determining linear regression dependences and 19 types of nonlinear regression dependences are discussed in full. It is taken into consideration that total dispersions are the sum of the measurement dispersions and the dispersions of the parameter variations themselves. Approaches to determining all dispersions are described. It is shown that the least squares fit gives inconsistent estimates for industrial objects and processes. Correction methods that take into account comparable measurement errors for both variables make it possible to obtain consistent estimates of the regression equation parameters. The condition under which applying the correction technique is expedient is given. The technique for determining nonlinear regression dependences, taking into account the form of the dependence and comparable errors in both variables, is described. 6 refs., 1 tab
Variable and subset selection in PLS regression
DEFF Research Database (Denmark)
Høskuldsson, Agnar
2001-01-01
The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
transformation of independent variables in polynomial regression ...
African Journals Online (AJOL)
Ada
preferable when possible to work with a simple functional form in transformed variables rather than with a more complicated form in the original variables. In this paper, it is shown that linear transformations applied to independent variables in polynomial regression models affect the t ratio and hence the statistical ...
Creel, Scott; Creel, Michael
2009-11-01
1. Sampling error in annual estimates of population size creates two widely recognized problems for the analysis of population growth. First, if sampling error is mistakenly treated as process error, one obtains inflated estimates of the variation in true population trajectories (Staples, Taper & Dennis 2004). Second, treating sampling error as process error is thought to overestimate the importance of density dependence in population growth (Viljugrein et al. 2005; Dennis et al. 2006). 2. In ecology, state-space models are used to account for sampling error when estimating the effects of density and other variables on population growth (Staples et al. 2004; Dennis et al. 2006). In econometrics, regression with instrumental variables is a well-established method that addresses the problem of correlation between regressors and the error term, but requires fewer assumptions than state-space models (Davidson & MacKinnon 1993; Cameron & Trivedi 2005). 3. We used instrumental variables to account for sampling error and fit a generalized linear model to 472 annual observations of population size for 35 Elk Management Units in Montana, from 1928 to 2004. We compared this model with state-space models fit with the likelihood function of Dennis et al. (2006). We discuss the general advantages and disadvantages of each method. Briefly, regression with instrumental variables is valid with fewer distributional assumptions, but state-space models are more efficient when their distributional assumptions are met. 4. Both methods found that population growth was negatively related to population density and winter snow accumulation. Summer rainfall and wolf (Canis lupus) presence had much weaker effects on elk (Cervus elaphus) dynamics [though limitation by wolves is strong in some elk populations with well-established wolf populations (Creel et al. 2007; Creel & Christianson 2008)]. 5. Coupled with predictions for Montana from global and regional climate models, our results
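The problem that instrumental variables address here, correlation between a mismeasured regressor and the error term, can be shown with a small simulation. The sketch below is a generic illustration and not the authors' elk model: the variable names, the true slope of 2.0, the noise levels, and the choice of a second independent noisy measurement as the instrument are all assumptions made for the example. It compares the ordinary least squares slope with the simple instrumental-variable slope cov(z, y) / cov(z, x).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


rng = random.Random(1)
n = 20000

# Hypothetical data-generating process: y depends on the true regressor.
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 * t + rng.gauss(0.0, 0.5) for t in true_x]

# The regressor is observed with sampling error.
x_obs = [t + rng.gauss(0.0, 1.0) for t in true_x]

# Assumed instrument: a second measurement with independent error.
z = [t + rng.gauss(0.0, 1.0) for t in true_x]

# OLS on the noisy regressor is attenuated toward zero.
ols_slope = cov(x_obs, y) / cov(x_obs, x_obs)

# The IV estimator uses only the variation in x_obs that z explains.
iv_slope = cov(z, y) / cov(z, x_obs)

print(f"OLS slope: {ols_slope:.2f}, IV slope: {iv_slope:.2f}")
```

In this setup the OLS slope lands near half the true value because the error variance equals the variance of the true regressor, while the IV slope stays near 2.0. The price is a noisier estimate, which matches the abstract's remark that state-space models are more efficient when their distributional assumptions hold.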
How Robust Is Linear Regression with Dummy Variables?
Blankmeyer, Eric
2006-01-01
Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
Asymptotics of Multivariate Regression with Consecutively Added Dependent Variables
Raats, V.M.; van der Genugten, B.B.; Moors, J.J.A.
2004-01-01
We consider multivariate regression where new dependent variables are consecutively added during the experiment (or in time). So, viewed at the end of the experiment, the number of observations decreases with each added variable. The explanatory variables are observed throughout. In a previous paper
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Purposeful selection of variables in logistic regression
Directory of Open Access Journals (Sweden)
Williams David Keith
2008-12-01
Background: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods: In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well-documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results: We show that this approach is advantageous when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion: If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones, they should consider this macro as an alternative tool.
Penalized variable selection in competing risks regression.
Fu, Zhixuan; Parikh, Chirag R; Zhou, Bingqing
2017-07-01
Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of multiple mutually exclusive events, known as competing risks. The proportional subdistribution hazard (PSH) model proposed by Fine and Gray (J Am Stat Assoc 94:496-509, 1999) has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish the asymptotic properties of the proposed penalized estimators and modify the coordinate descent algorithm for implementation. Simulation studies are conducted to demonstrate the good performance of the proposed method. Data from deceased donor kidney transplants from the United Network of Organ Sharing illustrate the utility of the proposed method.
Variable Selection for Regression Models of Percentile Flows
Fouad, G.
2017-12-01
Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high
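The definition in the first sentence of this abstract (the flow magnitude equaled or exceeded for a given percent of time) can be sketched in a few lines. This is a minimal illustration rather than the study's method: the function name, the sample flow values, and the interpolation rule along the ranked series are assumptions, and agencies use several different plotting-position conventions.

```python
def percentile_flow(flows, exceed_pct):
    """Flow equaled or exceeded `exceed_pct` percent of the time.

    Flows are ranked from largest to smallest and the requested
    exceedance percentage is located by linear interpolation along
    that ranked series (an assumed convention).
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    pos = exceed_pct / 100.0 * (n - 1)  # 0 = largest flow, n - 1 = smallest
    lower = int(pos)
    upper = min(lower + 1, n - 1)
    frac = pos - lower
    return ranked[lower] + frac * (ranked[upper] - ranked[lower])


# Hypothetical daily flows in arbitrary units.
daily_flows = [12.0, 3.5, 8.1, 20.4, 5.0, 7.7, 15.2, 4.2, 9.9, 6.3, 2.8]

q10 = percentile_flow(daily_flows, 10)  # high flow, exceeded rarely
q50 = percentile_flow(daily_flows, 50)  # median flow
q90 = percentile_flow(daily_flows, 90)  # low flow, exceeded most of the time
print(q10, q50, q90)
```

Note the inverted reading compared with ordinary percentiles: a low exceedance percentage corresponds to a high flow, so the 10 percent flow is larger than the 90 percent flow.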
Avoiding and Correcting Bias in Score-Based Latent Variable Regression with Discrete Manifest Items
Lu, Irene R. R.; Thomas, D. Roland
2008-01-01
This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…
Regression calibration with more surrogates than mismeasured variables
Kipnis, Victor
2012-06-29
In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach consisting of substituting an estimated conditional expectation of the true covariate given observed data in the logistic regression. The other is a novel two-stage approach when the logistic regression is fitted to multiple surrogates, and then a linear combination of estimated slopes is formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors asserted superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.
Regression calibration with more surrogates than mismeasured variables
Kipnis, Victor; Midthune, Douglas; Freedman, Laurence S.; Carroll, Raymond J.
2012-01-01
Variable selection and model choice in geoadditive regression models.
Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard
2009-06-01
Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Variable selection in Logistic regression model with genetic algorithm.
Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi
2018-02-01
Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. Variable selection is thus the method of choosing the most relevant attributes from the database in order to build robust learning models and, in turn, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle, generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in today's clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
Bayesian approach to errors-in-variables in regression models
Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad
2017-05-01
In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, an Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, leads to misleading statistical inference and analysis. Therefore, our goal is to examine the relationship between the outcome variable and the unobserved exposure variable, given the observed mismeasured surrogate, by applying the Bayesian formulation to the EIV model. We extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model, the Poisson regression model. We then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.
Exhaustive Search for Sparse Variable Selection in Linear Regression
Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato
2018-04-01
We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
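The exhaustive part of the ES-K idea, testing every combination of exactly K explanatory variables, can be sketched as follows. This is a minimal sketch under stated assumptions and not the authors' implementation: the function names, the synthetic data, and the use of the in-sample residual sum of squares as the score are illustrative choices (the abstract does not say which criterion the paper uses). The sorted list of scores over all subsets is the raw material from which a density of states could be histogrammed.

```python
import itertools
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss_for_subset(X, y, subset):
    """Residual sum of squares of a least squares fit (with intercept)."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))


def exhaustive_search(X, y, k):
    """Score every k-variable subset; return (rss, subset) pairs, best first."""
    n_vars = len(X[0])
    scored = [(rss_for_subset(X, y, subset), subset)
              for subset in itertools.combinations(range(n_vars), k)]
    return sorted(scored)


# Hypothetical data: 6 candidate variables, only variables 0 and 2 matter.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0.0, 0.1) for row in X]

results = exhaustive_search(X, y, k=2)
best_rss, best_subset = results[0]
print(best_subset, round(best_rss, 2))
```

The number of subsets grows combinatorially with the number of variables, which is the situation the abstract's approximate method (AES-K) is meant to handle; comparing scores across different K would also need a criterion that penalizes model size, since in-sample RSS only decreases as variables are added.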
Integrated Multiscale Latent Variable Regression and Application to Distillation Columns
Directory of Open Access Journals (Sweden)
Muddu Madakyaru
2013-01-01
Proper control of distillation columns requires estimating some key variables that are challenging to measure online (such as compositions), which are usually estimated using inferential models. Commonly used inferential models include latent variable regression (LVR) techniques, such as principal component regression (PCR), partial least squares (PLS), and regularized canonical correlation analysis (RCCA). Unfortunately, measured practical data are usually contaminated with errors, which degrade the prediction abilities of inferential models. Therefore, noisy measurements need to be filtered to enhance the prediction accuracy of these models. Multiscale filtering has been shown to be a powerful feature extraction tool. In this work, the advantages of multiscale filtering are utilized to enhance the prediction accuracy of LVR models by developing an integrated multiscale LVR (IMSLVR) modeling algorithm that integrates modeling and feature extraction. The idea behind the IMSLVR modeling algorithm is to filter the process data at different decomposition levels, model the filtered data from each level, and then select the LVR model that optimizes a model selection criterion. The performance of the developed IMSLVR algorithm is illustrated using three examples, one using synthetic data, one using simulated distillation column data, and one using experimental packed bed distillation column data. All examples clearly demonstrate the effectiveness of the IMSLVR algorithm over the conventional methods.
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
How a dependent's variable non-randomness affects taper equation ...
African Journals Online (AJOL)
In order to apply the least squares method in regression analysis, the values of the dependent variable Y should be random. In an example of regression analysis linear and nonlinear taper equations, which estimate the diameter of the tree dhi at any height of the tree hi, were compared. For each tree the diameter at the ...
Two-step variable selection in quantile regression models
Directory of Open Access Journals (Sweden)
FAN Yali
2015-06-01
We propose a two-step variable selection procedure for high-dimensional quantile regression, in which the dimension of the covariates, pn, is much larger than the sample size n. In the first step, we apply an ℓ1 penalty, and we demonstrate that the first-step penalized estimator with the LASSO penalty can reduce the model from an ultra-high-dimensional one to a model whose size has the same order as that of the true model, and that the selected model can cover the true model. The second step excludes the remaining irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
Directory of Open Access Journals (Sweden)
Maarten van Smeden
2016-11-01
Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
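The 'separation' problem mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the identification procedure used in the study: it assumes a single continuous covariate and a 0/1 outcome, and the function name `has_complete_separation` is hypothetical. With several covariates, detecting separation requires more than this one-dimensional check.

```python
def has_complete_separation(x, y):
    """Return True if a single threshold on x perfectly splits y == 0 from y == 1.

    Assumption for this sketch: one covariate only, outcome coded 0/1.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    # Complete separation: the two groups do not overlap on x at all.
    return max(x0) < min(x1) or max(x1) < min(x0)


# Separated: every event has a larger x than every non-event.
print(has_complete_separation([1, 2, 3, 4], [0, 0, 1, 1]))
# Overlapping groups: no perfect threshold exists.
print(has_complete_separation([1, 3, 2, 4], [0, 0, 1, 1]))
```

When such a check is true, the maximum likelihood estimate of the slope is not finite, which is one reason a few separated data sets can dominate simulation summaries.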
Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José
2017-03-17
Meta-analysis is very useful to summarize the effect of a treatment or a risk factor for a given disease. Often studies report results based on log-transformed variables in order to satisfy the principal assumptions of a linear regression model. If this is the case for some, but not all studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow including all results in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that significantly differ from the normal distribution. We illustrate the procedure with an application to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The procedure proposed has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.
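To make the idea of converting between absolute and relative changes concrete, here is a small Python sketch. It uses only the textbook identities for natural-log-transformed variables in a linear model, not the set of formulae derived by the authors, and the function names are hypothetical.

```python
import math


def pct_change_in_y_per_unit_x(beta):
    """Model ln(Y) = a + beta * X.

    A one-unit increase in X multiplies Y by exp(beta), so the relative
    change in Y is (exp(beta) - 1), returned here in percent.
    """
    return (math.exp(beta) - 1.0) * 100.0


def abs_change_in_y_per_pct_x(beta, pct):
    """Model Y = a + beta * ln(X).

    A pct% increase in X changes Y by beta * ln(1 + pct / 100)
    in absolute units.
    """
    return beta * math.log(1.0 + pct / 100.0)


# A slope of ln(1.10) on the log scale corresponds to a 10% increase in Y.
print(pct_change_in_y_per_unit_x(math.log(1.10)))
# Doubling X (a 100% increase) with beta = 1 shifts Y by ln(2).
print(abs_change_in_y_per_pct_x(1.0, 100.0))
```

Both functions assume natural logarithms; results reported with base-10 logs would need rescaling by ln(10) first.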
Reward-dependent modulation of movement variability.
Pekny, Sarah E; Izawa, Jun; Shadmehr, Reza
2015-03-04
Movement variability is often considered an unwanted byproduct of a noisy nervous system. However, variability can signal a form of implicit exploration, indicating that the nervous system is intentionally varying the motor commands in search of actions that yield the greatest success. Here, we investigated the role of the human basal ganglia in controlling reward-dependent motor variability as measured by trial-to-trial changes in performance during a reaching task. We designed an experiment in which the only performance feedback was success or failure and quantified how reach variability was modulated as a function of the probability of reward. In healthy controls, reach variability increased as the probability of reward decreased. Control of variability depended on the history of past rewards, with the largest trial-to-trial changes occurring immediately after an unrewarded trial. In contrast, in participants with Parkinson's disease, a known example of basal ganglia dysfunction, reward was a poor modulator of variability; that is, the patients showed an impaired ability to increase variability in response to decreases in the probability of reward. This was despite the fact that, after rewarded trials, reach variability in the patients was comparable to healthy controls. In summary, we found that movement variability is partially a form of exploration driven by the recent history of rewards. When the function of the human basal ganglia is compromised, the reward-dependent control of movement variability is impaired, particularly affecting the ability to increase variability after unsuccessful outcomes. Copyright © 2015 the authors 0270-6474/15/354015-10$15.00/0.
Orthogonal Regression of Three Linked Variables (Régression orthogonale de trois variables liées)
Directory of Open Access Journals (Sweden)
Phelizon J. -F.
2006-11-01
This article proposes an algorithm for determining the parameters of the equation for the orthogonal regression of three variables linked by a linear relation. The algorithm is remarkably simple in that it does not require the eigenvalues of the covariance matrix to be calculated. In addition, the equation obtained (that of a straight line in three-dimensional space) is shown to characterize a straight line in a triangular diagram as well, which makes the interpretation of the results immediate. The theoretical presentation is followed by two examples that were actually tested on a computer.
Joint Bayesian variable and graph selection for regression models with network-structured predictors
Peterson, C. B.; Stingo, F. C.; Vannucci, M.
2015-01-01
In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications since it allows the identification of pathways of functionally related genes or proteins which impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings, and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival. PMID:26514925
The number of subjects per variable required in linear regression analyses
P.C. Austin (Peter); E.W. Steyerberg (Ewout)
2015-01-01
Objectives: To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression
Problems Identifying Independent and Dependent Variables
Leatham, Keith R.
2012-01-01
This paper discusses one step from the scientific method--that of identifying independent and dependent variables--from both scientific and mathematical perspectives. It begins by analyzing an episode from a middle school mathematics classroom that illustrates the need for students and teachers alike to develop a robust understanding of…
Benford's law and continuous dependent random variables
Becker, Thealexa; Burt, David; Corcoran, Taylor C.; Greaves-Tunnell, Alec; Iafrate, Joseph R.; Jing, Joy; Miller, Steven J.; Porfilio, Jaclyn D.; Ronan, Ryan; Samranvedhya, Jirapat; Strauch, Frederick W.; Talbut, Blaine
2018-01-01
Many mathematical, man-made and natural systems exhibit a leading-digit bias, where a first digit (base 10) of 1 occurs not 11% of the time, as one would expect if all digits were equally likely, but rather 30%. This phenomenon is known as Benford's Law. Analyzing which datasets adhere to Benford's Law and how quickly Benford behavior sets in are the two most important problems in the field. Most previous work studied systems of independent random variables, and relied on the independence in their analyses. Inspired by natural processes such as particle decay, we study the dependent random variables that emerge from models of decomposition of conserved quantities. We prove that in many instances the distribution of lengths of the resulting pieces converges to Benford behavior as the number of divisions grows, and give several conjectures for other fragmentation processes. The main difficulty is that the resulting random variables are dependent. We handle this by using tools from Fourier analysis and irrationality exponents to obtain quantified convergence rates as well as introducing and developing techniques to measure and control the dependencies. The construction of these tools is one of the major motivations of this work, as our approach can be applied to many other dependent systems. As an example, we show that the n! entries in the determinant expansions of n × n matrices with entries independently drawn from nice random variables converge to Benford's Law.
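The 11% versus 30% contrast in this abstract follows from the Benford probability log10(1 + 1/d) for leading digit d. The Python sketch below evaluates that formula and compares it with the leading digits of the powers of 2, a classic sequence known to follow Benford's Law. It does not model the fragmentation processes studied in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def benford_probability(d):
    """Benford's Law probability that the leading base-10 digit equals d."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1.0 + 1.0 / d)


def leading_digit_frequencies(values):
    """Observed relative frequency of each leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Compare the first 1000 powers of 2 with the Benford prediction.
observed = leading_digit_frequencies(2 ** k for k in range(1000))
for d in range(1, 10):
    print(d, round(benford_probability(d), 4), round(observed[d], 4))
```

Exact integer arithmetic is used for the powers of 2, so the leading digits are not affected by floating-point rounding.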
Fouad, Geoffrey; Skupin, André; Hope, Allen
2016-04-01
The flow duration curve (FDC) is one of the most widely used tools to quantify streamflow. Its percentile flows are often required for water resource applications, but these values must be predicted for ungauged basins with insufficient or no streamflow data. Regional regression is a commonly used approach for predicting percentile flows that involves identifying hydrologic regions and calibrating regression models to each region. The independent variables used to describe the physiographic and climatic setting of the basins are a critical component of regional regression, yet few studies have investigated their effect on resulting predictions. In this study, the complexity of the independent variables needed for regional regression is investigated. Different levels of variable complexity are applied for a regional regression consisting of 918 basins in the US. Both the hydrologic regions and regression models are determined according to the different sets of variables, and the accuracy of resulting predictions is assessed. The different sets of variables include (1) a simple set of three variables strongly tied to the FDC (mean annual precipitation, potential evapotranspiration, and baseflow index), (2) a traditional set of variables describing the average physiographic and climatic conditions of the basins, and (3) a more complex set of variables extending the traditional variables to include statistics describing the distribution of physiographic data and temporal components of climatic data. The latter set of variables is not typically used in regional regression, and is evaluated for its potential to predict percentile flows. The simplest set of only three variables performed similarly to the other more complex sets of variables. Traditional variables used to describe climate, topography, and soil offered little more to the predictions, and the experimental set of variables describing the distribution of basin data in more detail did not improve predictions
Meaney, Christopher; Moineddin, Rahim
2014-01-24
In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data
Ulbrich, N.
2015-01-01
An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature-dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature-dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi-span balance is used to illustrate the application of the improved approach.
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection
Chen, Lisha; Huang, Jianhua Z.
2012-01-01
and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection
Chen, Lisha
2012-12-01
The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
Online Support Vector Regression with Varying Parameters for Time-Dependent Data
International Nuclear Information System (INIS)
Omitaomu, Olufemi A.; Jeong, Myong K.; Badiru, Adedeji B.
2011-01-01
Support vector regression (SVR) is a machine learning technique that continues to receive interest in several domains including manufacturing, engineering, and medicine. In order to extend its application to problems in which datasets arrive constantly and in which batch processing of the datasets is infeasible or expensive, an accurate online support vector regression (AOSVR) technique was proposed. The AOSVR technique efficiently updates a trained SVR function whenever a sample is added to or removed from the training set without retraining the entire training data. However, the AOSVR technique assumes that the new samples and the training samples are of the same characteristics; hence, the same value of SVR parameters is used for training and prediction. This assumption is not applicable to data samples that are inherently noisy and non-stationary such as sensor data. As a result, we propose Accurate On-line Support Vector Regression with Varying Parameters (AOSVR-VP) that uses varying SVR parameters rather than fixed SVR parameters, and hence accounts for the variability that may exist in the samples. To accomplish this objective, we also propose a generalized weight function to automatically update the weights of SVR parameters in on-line monitoring applications. The proposed function allows for lower and upper bounds for SVR parameters. We tested our proposed approach and compared results with the conventional AOSVR approach using two benchmark time series data and sensor data from nuclear power plant. The results show that using varying SVR parameters is more applicable to time dependent data.
Nobuoki, Eshima; Minoru, Tabata; Geng, Zhi; Department of Medical Information Analysis, Faculty of Medicine, Oita Medical University; Department of Applied Mathematics, Faculty of Engineering, Kobe University; Department of Probability and Statistics, Peking University
2001-01-01
This paper discusses path analysis of categorical variables with logistic regression models. The total, direct and indirect effects in fully recursive causal systems are considered by using model parameters. These effects can be explained in terms of log odds ratios, uncertainty differences, and an inner product of explanatory variables and a response variable. A study on food choice of alligators is reanalysed as a numerical example to illustrate the present approach.
Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.
Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander
2017-01-01
Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.
The number of subjects per variable required in linear regression analyses.
Austin, Peter C; Steyerberg, Ewout W
2015-06-01
To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R² of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R², although adjusted R² estimates behaved well. The bias in estimating the model R² statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
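The difference between R² and adjusted R² at low SPV can be seen from the standard adjustment formula. The Python sketch below uses the usual textbook definition, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1); the abstract does not state which formula the authors used, and the function names are hypothetical.

```python
def subjects_per_variable(n_subjects, n_predictors):
    """SPV: number of subjects divided by number of independent variables."""
    return n_subjects / n_predictors


def adjusted_r2(r2, n_subjects, n_predictors):
    """Textbook adjusted R^2 for a linear model with an intercept."""
    dof = n_subjects - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_subjects - 1) / dof


# With roughly two SPV, an apparent R^2 of 0.5 shrinks sharply after adjustment.
n, p = 21, 10
print(subjects_per_variable(n, p))
print(adjusted_r2(0.5, n, p))
# With many subjects per variable the adjustment is minor.
print(adjusted_r2(0.5, 1001, 10))
```

The penalty factor (n - 1)/(n - p - 1) approaches 1 as SPV grows, which is why the two statistics agree in large samples.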
A Spline-Based Lack-Of-Fit Test for Independent Variable Effect in Poisson Regression.
Li, Chin-Shang; Tu, Wanzhu
2007-05-01
In regression analysis of count data, independent variables are often modeled by their linear effects under the assumption of log-linearity. In reality, the validity of such an assumption is rarely tested, and its use is at times unjustifiable. A lack-of-fit test is proposed for the adequacy of a postulated functional form of an independent variable within the framework of semiparametric Poisson regression models based on penalized splines. It offers added flexibility in accommodating the potentially non-loglinear effect of the independent variable. A likelihood ratio test is constructed for the adequacy of the postulated parametric form, for example log-linearity, of the independent variable effect. Simulations indicate that the proposed model performs well, and that a misspecified parametric model has much reduced power. An example is given.
Robust best linear estimation for regression analysis using surrogate and instrumental variables.
Wang, C Y
2012-04-01
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.
Directory of Open Access Journals (Sweden)
Jonathan E. Leightner
2012-01-01
The omitted variables problem is one of regression analysis' most serious problems. The standard approach to the omitted variables problem is to find instruments, or proxies, for the omitted variables, but this approach makes strong assumptions that are rarely met in practice. This paper introduces best projection reiterative truncated projected least squares (BP-RTPLS), the third generation of a technique that solves the omitted variables problem without using proxies or instruments. This paper presents a theoretical argument that BP-RTPLS produces unbiased reduced form estimates when there are omitted variables. This paper also provides simulation evidence that shows OLS produces between 250% and 2450% more errors than BP-RTPLS when there are omitted variables and when measurement and round-off error is 1 percent or less. In an example, the government spending multiplier is estimated using annual data for the USA between 1929 and 2010.
Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H
2014-12-30
For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns. Copyright © 2014 John Wiley & Sons
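Fisher's method, which this abstract uses to combine evidence for a covariate across several Cox models, can be sketched in a few lines of Python. The sketch assumes the per-model evidence has already been converted to independent p-values, which is an assumption of this example and not a detail given in the abstract; the function name `fisher_combined_p` is hypothetical. The statistic X = -2 Σ ln(p_i) is referred to a chi-square distribution with 2k degrees of freedom, whose tail probability has a closed form for even degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine k independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the survival
    function is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


# Two cause-specific models each give weak evidence for the same covariate.
print(fisher_combined_p([0.05, 0.05]))
# A single p-value is returned unchanged.
print(fisher_combined_p([0.05]))
```

A covariate with moderate evidence in every model can therefore rank ahead of one with strong evidence in only one model, which is the behaviour a coupled selection step relies on.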
Directory of Open Access Journals (Sweden)
Horst Entorf
2015-07-01
Two alternative hypotheses, referred to as opportunity- and stigma-based behavior, suggest that the magnitude of the link between unemployment and crime also depends on preexisting local crime levels. In order to analyze conjectured nonlinearities between both variables, we use quantile regressions applied to German district panel data. While both conventional OLS and quantile regressions confirm the positive link between unemployment and crime for property crimes, results for assault differ with respect to the method of estimation. Whereas conventional mean regressions do not show any significant effect (which would confirm the usual result found for violent crimes in the literature), quantile regression reveals that size and importance of the relationship are conditional on the crime rate. The partial effect is significantly positive for moderately low and median quantiles of local assault rates.
The use of cognitive ability measures as explanatory variables in regression analysis.
Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J
2012-12-01
Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.
Penalized regression procedures for variable selection in the potential outcomes framework.
Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L
2015-05-10
A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple 'impute, then select' class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data, and imputation are drawn. A difference least absolute shrinkage and selection operator algorithm is defined, along with its multiple imputation analogs. The procedures are illustrated using a well-known right-heart catheterization dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Polychotomization of continuous variables in regression models based on the overall C index
Directory of Open Access Journals (Sweden)
Bax Leon
2006-12-01
Background: When developing multivariable regression models for diagnosis or prognosis, continuous independent variables can be categorized to make a prediction table instead of a prediction formula. Although many methods have been proposed to dichotomize prognostic variables, to date there has been no integrated method for polychotomization. The latter is necessary when dichotomization results in too much loss of information or when central values refer to normal states and more dispersed values refer to less preferable states, a situation that is not unusual in medical settings (e.g. body temperature, blood pressure). The goal of our study was to develop a theoretical and practical method for polychotomization. Methods: We used the overall discrimination index C, introduced by Harrell, as a measure of the predictive ability of an independent regressor variable and derived a method for polychotomization mathematically. Since the naïve application of our method, like some existing methods, gives rise to positive bias, we developed a parametric method that minimizes this bias and assessed its performance by the use of Monte Carlo simulation. Results: The overall C is closely related to the area under the ROC curve, and the predictive performance of the resulting di(poly)chotomized variable is comparable to that of the original continuous variable. The simulation shows that the parametric method is essentially unbiased for both the estimates of performance and the cutoff points. Application of our method to the predictor variables of a previous study on rhabdomyolysis shows that it can be used to make probability profile tables that are applicable to the diagnosis or prognosis of individual patient status. Conclusion: We propose a polychotomization (including dichotomization) method for independent continuous variables in regression models based on the overall discrimination index C and clarified its meaning mathematically. To avoid positive bias in
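The link between the C index and the ROC curve mentioned in this abstract can be made concrete for a binary outcome, where C is commonly computed as the fraction of case/non-case pairs ranked correctly and coincides with the area under the ROC curve. The Python sketch below illustrates that pairwise definition and a simple cutoff-based categorization. It is not the authors' parametric method for choosing cutoffs; the function names and the body-temperature cutoffs are hypothetical.

```python
def c_index_binary(scores, outcomes):
    """Concordance for a binary outcome: share of (case, non-case) pairs in
    which the case has the higher score, counting ties as one half."""
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def polychotomize(value, cutoffs):
    """Category index of value: the number of cutoffs it reaches or exceeds."""
    return sum(value >= c for c in sorted(cutoffs))


# Hypothetical cutoffs splitting body temperature into low / normal / high.
cutoffs = [36.0, 38.0]
temps = [35.2, 36.8, 37.1, 39.0, 38.5, 36.5]
fever_outcome = [0, 0, 0, 1, 1, 0]
categories = [polychotomize(t, cutoffs) for t in temps]
print(categories)
print(c_index_binary(temps, fever_outcome))
print(c_index_binary(categories, fever_outcome))
```

Comparing the C index of the categorized variable with that of the continuous one gives a direct measure of how much discrimination a given set of cutoffs preserves.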
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Directory of Open Access Journals (Sweden)
Miriam Andrejiová
2013-12-01
The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.
Value of Construction Company and its Dependence on Significant Variables
Vítková, E.; Hromádka, V.; Ondrušková, E.
2017-10-01
The paper deals with assessing the value of a construction company with respect to usable approaches and determinable variables. The reasons for assessing the value of a construction company vary, but the most important are the sale or purchase of the company, its liquidation, or its merger with another entity. Depending on the reason for the valuation, different approaches can in theory be used, chiefly the yield method of valuation and the proprietary method of valuation. Both approaches depend on detailed input variables, whose quality will influence the final assessment of the company's value. The main objective of the paper is to suggest, based on the analysis, possible ways of determining the input variables, mainly in the form of expected cash flows or profit. The paper focuses mainly on the use of time series analysis, regression analysis and mathematical simulation. As the output, the results of the analysis will be demonstrated on a case study.
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Application of Robust Regression and Bootstrap in Productivity Analysis of GERD Variable in EU27
Directory of Open Access Journals (Sweden)
Dagmar Blatná
2014-06-01
The GERD is one of the Europe 2020 headline indicators being tracked within the Europe 2020 strategy. The headline indicator is the 3% target for the GERD to be reached within the EU by 2020. Eurostat defines “GERD” as total gross domestic expenditure on research and experimental development as a percentage of GDP. GERD depends on numerous factors of a general economic background, namely employment, innovation and research, science and technology. The values of these indicators vary among the European countries, and consequently the occurrence of outliers can be anticipated in corresponding analyses. In such a case, a classical statistical approach (the least squares method) can be highly unreliable, and robust regression methods represent an acceptable and useful tool. The aim of the present paper is to demonstrate the advantages of robust regression and the applicability of the bootstrap approach in regression based on both classical and robust methods.
Age dependant somatometric and cephalometric variables among ...
African Journals Online (AJOL)
Background: The process of growth passes through stages of developmental processes. This stage is the age. Age is known to affect many parameters in the body and this includes somatometric and cephalometric variables. Methods: The study was conducted with a total number of 409 students of the University of Jos, ...
Maximal Inequalities for Dependent Random Variables
DEFF Research Database (Denmark)
Hoffmann-Jorgensen, Jorgen
2016-01-01
Maximal inequalities play a crucial role in many probabilistic limit theorems; for instance, the law of large numbers, the law of the iterated logarithm, the martingale limit theorem and the central limit theorem. Let X_1, X_2, ... be random variables with partial sums S_k = X_1 + ... + X_k. Then a maximal inequality gives conditions ensuring that the maximal partial sum M_n = max(1) (...
Lunt, Mark
2015-07-01
In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
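A minimal sketch of the dummy-variable and interaction coding described above, assuming a single continuous predictor and a two-level group. The data are invented so that the groups differ in both intercept and slope, and the small normal-equations solver is only there to keep the example free of external libraries; none of this comes from the article itself.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: group A follows y = 1 + 2x, group B follows y = 2 + 3.5x.
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
y = [3.0, 5.0, 7.0, 9.0, 5.5, 9.0, 12.5, 16.0]

# Design matrix columns: intercept, x, dummy for group B, x * dummy (interaction).
dummy = [1.0 if g == "B" else 0.0 for g in group]
design = [[1.0, xi, di, xi * di] for xi, di in zip(x, dummy)]
b0, b1, b_group, b_interaction = ols(design, y)
print(b0, b1, b_group, b_interaction)
```

In this coding the dummy coefficient is the difference in intercept between the groups and the interaction coefficient is the difference in slope, which is the quantity to test when asking whether a predictor's effect differs between groups.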
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
NetRaVE: constructing dependency networks using sparse linear regression
DEFF Research Database (Denmark)
Phatak, A.; Kiiveri, H.; Clemmensen, Line Katrine Harder
2010-01-01
NetRaVE is a small suite of R functions for generating dependency networks using sparse regression methods. Such networks provide an alternative to interpreting 'top n lists' of genes arising out of an analysis of microarray data, and they provide a means of organizing and visualizing the resulting...
Future-dependent Flow Policies with Prophetic Variables
DEFF Research Database (Denmark)
Li, Ximeng; Nielson, Flemming; Nielson, Hanne Riis
2016-01-01
future-dependent flow policies- policies that can depend on not only the current values of variables, but also their final values. The final values are referred to using what we call prophetic variables, just as the initial values can be referenced using logical variables in Hoare logic. We develop...... and enforce a notion of future-dependent security for open systems, in the spirit of "non-deducibility on strategies". We also illustrate our approach in scenarios where future-dependency has advantages over present-dependency and avoids mixtures of upgradings and downgradings....
Evans, Wiley; Mathis, Jeremy T.; Winsor, Peter; Statscewich, Hank; Whitledge, Terry E.
2013-01-01
The northern Gulf of Alaska (GOA) shelf experiences carbonate system variability on seasonal and annual time scales, but little information exists to resolve higher frequency variability in this region. To resolve this variability using platforms-of-opportunity, we present multiple linear regression (MLR) models constructed from hydrographic data collected along the Northeast Pacific Global Ocean Ecosystems Dynamics (GLOBEC) Seward Line. The empirical algorithms predict dissolved inorganic carbon (DIC) and total alkalinity (TA) using observations of nitrate (NO3-), temperature, salinity and pressure from the surface to 500 m, with R2 values > 0.97 and RMSE values of 11 µmol kg^-1 for DIC and 9 µmol kg^-1 for TA. We applied these relationships to high-resolution NO3- data sets collected during a novel 20 h glider flight and a GLOBEC mesoscale SeaSoar survey. Results from the glider flight demonstrated time/space along-isopycnal variability of aragonite saturations (Ωarag) associated with a dicothermal layer (a cold near-surface layer found in high latitude oceans) that rivaled changes seen vertically through the thermocline. The SeaSoar survey captured the uplift of the aragonite saturation horizon (depth where Ωarag = 1), which shoaled to a previously unseen depth in the northern GOA. This work is similar to recent studies aimed at predicting the carbonate system in continental margin settings, but demonstrates that a NO3--based approach can be applied to high-latitude data collected from platforms capable of high-frequency measurements.
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study examines indicators of the predictive power of the generalized linear model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
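The traditional coefficient mentioned above can be sketched as the sample correlation between the observed counts and the model's conditional means. In the sketch below the Poisson coefficients are simply assumed rather than fitted, the data are invented, and the modified coefficient proposed in the study is not shown because the abstract does not define it.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def poisson_mean(x, b0, b1):
    """Conditional mean E(Y|X = x) under a log-link Poisson model."""
    return math.exp(b0 + b1 * x)

# Hypothetical predictor values and observed counts.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [1, 2, 2, 5, 7, 13]

# Assumed coefficients; in practice these would come from a fitted Poisson regression.
b0, b1 = 0.0, 1.0
mu = [poisson_mean(xi, b0, b1) for xi in x]

# Regression correlation coefficient: corr(Y, E(Y|X)).
r = pearson(y, mu)
print(r)
```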
Nimon, Kim; Henson, Robin K.
2015-01-01
The authors empirically examined whether the validity of a residualized dependent variable after covariance adjustment is comparable to that of the original variable of interest. When variance of a dependent variable is removed as a result of one or more covariates, the residual variance may not reflect the same meaning. Using the pretest-posttest…
Modeling Source Water TOC Using Hydroclimate Variables and Local Polynomial Regression.
Samson, Carleigh C; Rajagopalan, Balaji; Summers, R Scott
2016-04-19
To control disinfection byproduct (DBP) formation in drinking water, an understanding of the source water total organic carbon (TOC) concentration variability can be critical. Previously, TOC concentrations in water treatment plant source waters have been modeled using streamflow data. However, the lack of streamflow data or unimpaired flow scenarios makes it difficult to model TOC. In addition, TOC variability under climate change further exacerbates the problem. Here we proposed a modeling approach based on local polynomial regression that uses climate, e.g. temperature, and land surface, e.g., soil moisture, variables as predictors of TOC concentration, obviating the need for streamflow. The local polynomial approach has the ability to capture non-Gaussian and nonlinear features that might be present in the relationships. The utility of the methodology is demonstrated using source water quality and climate data in three case study locations with surface source waters including river and reservoir sources. The models show good predictive skill in general at these locations, with lower skills at locations with the most anthropogenic influences in their streams. Source water TOC predictive models can provide water treatment utilities important information for making treatment decisions for DBP regulation compliance under future climate scenarios.
To, Minh-Son; Prakash, Shivesh; Poonnoose, Santosh I; Bihari, Shailesh
2018-05-01
The study uses meta-regression analysis to quantify the dose-dependent effects of statin pharmacotherapy on vasospasm, delayed ischemic neurologic deficits (DIND), and mortality in aneurysmal subarachnoid hemorrhage. Prospective, retrospective observational studies, and randomized controlled trials (RCTs) were retrieved by a systematic database search. Summary estimates were expressed as absolute risk (AR) for a given statin dose or control (placebo). Meta-regression using inverse variance weighting and robust variance estimation was performed to assess the effect of statin dose on transformed AR in a random effects model. Dose-dependence of predicted AR with 95% confidence interval (CI) was recovered by using Miller's Freeman-Tukey inverse. The database search and study selection criteria yielded 18 studies (2594 patients) for analysis. These included 12 RCTs, 4 retrospective observational studies, and 2 prospective observational studies. Twelve studies investigated simvastatin, whereas the remaining studies investigated atorvastatin, pravastatin, or pitavastatin, with simvastatin-equivalent doses ranging from 20 to 80 mg. Meta-regression revealed dose-dependent reductions in Freeman-Tukey-transformed AR of vasospasm (slope coefficient -0.00404, 95% CI -0.00720 to -0.00087; P = 0.0321), DIND (slope coefficient -0.00316, 95% CI -0.00586 to -0.00047; P = 0.0392), and mortality (slope coefficient -0.00345, 95% CI -0.00623 to -0.00067; P = 0.0352). The present meta-regression provides weak evidence for dose-dependent reductions in vasospasm, DIND and mortality associated with acute statin use after aneurysmal subarachnoid hemorrhage. However, the analysis was limited by substantial heterogeneity among individual studies. Greater dosing strategies are a potential consideration for future RCTs. Copyright © 2018 Elsevier Inc. All rights reserved.
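The transformation step can be sketched as follows, assuming the standard Freeman-Tukey double arcsine transform, its usual approximate variance 1/(n + 0.5), and Miller's back-transformation. The study counts and doses are invented, and the slope is a plain inverse-variance weighted least-squares fit, a simplification that omits the random effects and robust variance estimation used in the study.

```python
import math

def freeman_tukey(events, n):
    """Double arcsine transform of the proportion events / n."""
    return (math.asin(math.sqrt(events / (n + 1)))
            + math.asin(math.sqrt((events + 1) / (n + 1))))

def ft_variance(n):
    """Approximate sampling variance of the transformed value."""
    return 1.0 / (n + 0.5)

def miller_inverse(t, n):
    """Miller's back-transformation of t to a proportion for sample size n."""
    s = math.sin(t)
    inner = 1.0 - (s + (s - 1.0 / s) / n) ** 2
    sign = math.copysign(1.0, math.cos(t))
    return 0.5 * (1.0 - sign * math.sqrt(max(inner, 0.0)))

def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return num / den

# Hypothetical study arms: dose (mg), number of events, number of patients.
doses = [0.0, 20.0, 40.0, 80.0]
events = [30, 25, 22, 15]
totals = [100, 100, 100, 100]

transformed = [freeman_tukey(e, n) for e, n in zip(events, totals)]
weights = [1.0 / ft_variance(n) for n in totals]
slope = weighted_slope(doses, transformed, weights)
print(slope, miller_inverse(transformed[0], totals[0]))
```

A negative slope on the transformed scale corresponds to lower predicted absolute risk at higher doses once the fitted values are mapped back with the inverse.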
Regression Analysis for Multivariate Dependent Count Data Using Convolved Gaussian Processes
Sofro, A'yunin; Shi, Jian Qing; Cao, Chunzheng
2017-01-01
Research on Poisson regression analysis for dependent data has developed rapidly in the last decade. One of the difficult problems in the multivariate case is how to construct a cross-correlation structure while ensuring that the covariance matrix is positive definite. To address the issue, we propose to use a convolved Gaussian process (CGP) in this paper. The approach provides a semi-parametric model and offers a natural framework for modeling common mean structure and covarianc...
Dons, Evi; Van Poppel, Martine; Kochan, Bruno; Wets, Geert; Int Panis, Luc
2013-08-01
Land use regression (LUR) modeling is a statistical technique used to determine exposure to air pollutants in epidemiological studies. Time-activity diaries can be combined with LUR models, enabling detailed exposure estimation and limiting exposure misclassification, both in shorter and longer time lags. In this study, the traffic-related air pollutant black carbon was measured with μ-aethalometers on a 5-min time base at 63 locations in Flanders, Belgium. The measurements show that hourly concentrations vary between different locations, but also over the day. Furthermore, the diurnal pattern is different for street and background locations. This suggests that annual LUR models are not sufficient to capture all the variation. Hourly LUR models for black carbon are developed using different strategies: by means of dummy variables, with dynamic dependent variables and/or with dynamic and static independent variables. The LUR model with 48 dummies (weekday hours and weekend hours) does not perform as well as the annual model (explained variance of 0.44 compared to 0.77 in the annual model). The dataset with hourly concentrations of black carbon can be used to recalibrate the annual model, but this results in many of the original explanatory variables losing their statistical significance, and certain variables having the wrong direction of effect. Building new independent hourly models, with static or dynamic covariates, is proposed as the best solution to these issues. R2 values for hourly LUR models are mostly smaller than the R2 of the annual model, ranging from 0.07 to 0.8. Between 6 a.m. and 10 p.m. on weekdays the R2 approximates the annual model R2. Even though models of consecutive hours are developed independently, similar variables turn out to be significant. Using dynamic covariates instead of static covariates, i.e. hourly traffic intensities and hourly population densities, did not significantly improve the models' performance.
Abad, Cesar C C; Barros, Ronaldo V; Bertuzzi, Romulo; Gagliardi, João F L; Lima-Silva, Adriano E; Lambert, Mike I; Pires, Flavio O
2016-06-01
The aim of this study was to verify the power of VO2max, peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO2max and PTV; 2) a constant submaximal run at 12 km·h^-1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO2max, PTV and RE) and adjusted variables (VO2max^0.72, PTV^0.72 and RE^0.60) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO2max. Significant correlations (p 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV^0.72 and RE^0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PTV accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation.
Fixed transaction costs and modelling limited dependent variables
Hempenius, A.L.
1994-01-01
As an alternative to the Tobit model, for vectors of limited dependent variables, I suggest a model, which follows from explicitly using fixed costs, if appropriate of course, in the utility function of the decision-maker.
Hoeffding’s Inequality for Sums of Dependent Random Variables
Czech Academy of Sciences Publication Activity Database
Pelekis, Christos; Ramon, J.
2017-01-01
Vol. 14, No. 6 (2017), article No. 243. ISSN 1660-5446 Institutional support: RVO:67985807 Keywords: dependent random variables * Hoeffding’s inequality * k-wise independent random variables * martingale differences Subject RIV: BA - General Mathematics OECD field: Pure mathematics Impact factor: 0.868, year: 2016
On Direction of Dependence in Latent Variable Contexts
von Eye, Alexander; Wiedermann, Wolfgang
2014-01-01
Approaches to determining direction of dependence in nonexperimental data are based on the relation between higher-than second-order moments on one side and correlation and regression models on the other. These approaches have experienced rapid development and are being applied in contexts such as research on partner violence, attention deficit…
Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.
Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui
2017-07-15
New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier
Das, Siddhartha; Siopsis, George; Weedbrook, Christian
2018-02-01
With the significant advancement in quantum computation during the past couple of decades, the exploration of machine-learning subroutines using quantum strategies has become increasingly popular. Gaussian process regression is a widely used technique in supervised classical machine learning. Here we introduce an algorithm for Gaussian process regression using continuous-variable quantum systems that can be realized with technology based on photonic quantum computers under certain assumptions regarding distribution of data and availability of efficient quantum access. Our algorithm shows that by using a continuous-variable quantum computer a dramatic speedup in computing Gaussian process regression can be achieved, i.e., the possibility of exponentially reducing the time to compute. Furthermore, our results also include a continuous-variable quantum-assisted singular value decomposition method of nonsparse low rank matrices and forms an important subroutine in our Gaussian process regression algorithm.
Kim, T. W.; Park, G. H.
2014-12-01
Seasonal variation of aragonite saturation state (Ωarag) in the North Pacific Ocean (NPO) was investigated, using multiple linear regression (MLR) models produced from the PACIFICA (Pacific Ocean interior carbon) dataset. Data within depth ranges of 50-1200 m were used to derive MLR models, and three parameters (potential temperature, nitrate, and apparent oxygen utilization (AOU)) were chosen as predictor variables because these parameters are associated with vertical mixing and with DIC (dissolved inorganic carbon) removal and release, all of which affect Ωarag in the water column directly or indirectly. The PACIFICA dataset was divided into 5° × 5° grids, and an MLR model was produced in each grid, giving a total of 145 independent MLR models over the NPO. Mean RMSE (root mean square error) and r2 (coefficient of determination) of all derived MLR models were approximately 0.09 and 0.96, respectively. Then the obtained MLR coefficients for each of the predictor variables and an intercept were interpolated over the study area, thereby making it possible to allocate MLR coefficients to data-sparse ocean regions. Predictability from the interpolated coefficients was evaluated using Hawaiian time-series data, and as a result the mean residual between measured and predicted Ωarag values was approximately 0.08, which is less than the mean RMSE of our MLR models. The interpolated MLR coefficients were combined with the seasonal climatology of World Ocean Atlas 2013 (1° × 1°) to produce seasonal Ωarag distributions over various depths. Large seasonal variability in Ωarag was manifested in the mid-latitude Western NPO (24-40°N, 130-180°E) and low-latitude Eastern NPO (0-12°N, 115-150°W). In the Western NPO, seasonal fluctuations of water column stratification appeared to be responsible for the seasonal variation in Ωarag (~ 0.5 at 50 m) because it closely followed temperature variations in a layer of 0-75 m. In contrast, remineralization of organic matter was the main cause for the seasonal
Nowicki, M. A.; Hearne, M.; Thompson, E.; Wald, D. J.
2012-12-01
Seismically induced landslides present costly and often fatal threats in many mountainous regions. Substantial effort has been invested to understand where seismically induced landslides may occur in the future. Both slope-stability methods and, more recently, statistical approaches to the problem are described throughout the literature. Though some regional efforts have succeeded, no uniformly agreed-upon method is available for predicting the likelihood and spatial extent of seismically induced landslides. For use in the U.S. Geological Survey (USGS) Prompt Assessment of Global Earthquakes for Response (PAGER) system, we would like to routinely make such estimates, in near-real time, around the globe. Here we use the recently produced USGS ShakeMap Atlas of historic earthquakes to develop an empirical landslide probability model. We focus on recent events, yet include any digitally-mapped landslide inventories for which well-constrained ShakeMaps are also available. We combine these uniform estimates of the input shaking (e.g., peak acceleration and velocity) with broadly available susceptibility proxies, such as topographic slope and surface geology. The resulting database is used to build a predictive model of the probability of landslide occurrence with logistic regression. The landslide database includes observations from the Northridge, California (1994); Wenchuan, China (2008); ChiChi, Taiwan (1999); and Chuetsu, Japan (2004) earthquakes; we also provide ShakeMaps for moderate-sized events without landslides for proper model testing and training. The performance of the regression model is assessed with both statistical goodness-of-fit metrics and a qualitative review of whether or not the model is able to capture the spatial extent of landslides for each event. Part of our goal is to determine which variables can be employed based on globally-available data or proxies, and whether or not modeling results from one region are transferable to
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistics-based study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon draws the attention of analysts to an important point, which must be taken into account when drawing conclusions.
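The matching step itself can be sketched as below, assuming the propensity scores have already been estimated (for example by a logistic regression, which is not shown). The greedy 1:1 nearest-neighbor rule, the caliper value and the scores are illustrative assumptions; the abstract does not say which matching algorithm the authors used.

```python
def match_controls(case_scores, control_scores, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a dict mapping each matched case index to a control index.
    Cases with no unused control within the caliper stay unmatched.
    """
    available = set(range(len(control_scores)))
    matches = {}
    for i, score in enumerate(case_scores):
        if not available:
            break
        nearest = min(available, key=lambda c: abs(control_scores[c] - score))
        if abs(control_scores[nearest] - score) <= caliper:
            matches[i] = nearest
            available.remove(nearest)
    return matches

# Hypothetical propensity scores for cases and for the pool of candidate controls.
cases = [0.80, 0.55, 0.30]
controls = [0.10, 0.32, 0.50, 0.58, 0.79, 0.95]

pairs = match_controls(cases, controls)
print(pairs)
```

The matching only balances what the score model saw, so a latent variable omitted from that model leaves the matched control group unbalanced on it, which is the situation the abstract investigates.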
Analysis of extreme drinking in patients with alcohol dependence using Pareto regression.
Das, Sourish; Harel, Ofer; Dey, Dipak K; Covault, Jonathan; Kranzler, Henry R
2010-05-20
We developed a novel Pareto regression model with an unknown shape parameter to analyze extreme drinking in patients with Alcohol Dependence (AD). We used the generalized linear model (GLM) framework and the log-link to include the covariate information through the scale parameter of the generalized Pareto distribution. We proposed a Bayesian method based on Ridge prior and Zellner's g-prior for the regression coefficients. A simulation study indicated that the proposed Bayesian method performs better than the existing likelihood-based inference for the Pareto regression. We examined two issues of importance in the study of AD. First, we tested whether a single nucleotide polymorphism within the GABRA2 gene, which encodes a subunit of the GABA(A) receptor and has been associated with AD, influences 'extreme' alcohol intake, and second, the efficacy of three psychotherapies for alcoholism in treating extreme drinking behavior. We found an association between extreme drinking behavior and GABRA2. We also found that, at baseline, men with a high-risk GABRA2 allele had a significantly higher probability of extreme drinking than men with no high-risk allele. However, men with a high-risk allele responded to the therapy better than those with two copies of the low-risk allele. Women with high-risk alleles also responded to the therapy better than those with two copies of the low-risk allele, while women who received the cognitive behavioral therapy had better outcomes than those receiving either of the other two therapies. Among men, motivational enhancement therapy was the best for the treatment of the extreme drinking behavior. Copyright 2010 John Wiley & Sons, Ltd.
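The likelihood side of such a model can be sketched as follows, assuming the standard generalized Pareto density with excesses measured from a threshold of zero and the scale linked to covariates through exp(x·beta). The function and variable names are hypothetical, and the ridge and g-priors of the Bayesian method are not reproduced.

```python
import math

def gpd_neg_log_lik(beta, shape, rows, y):
    """Negative log-likelihood of a generalized Pareto regression.

    Each observation has scale sigma_i = exp(x_i . beta) (log link) and all
    observations share one shape parameter.
    """
    nll = 0.0
    for row, yi in zip(rows, y):
        sigma = math.exp(sum(b * v for b, v in zip(beta, row)))
        z = 1.0 + shape * yi / sigma
        if z <= 0.0:
            return math.inf  # observation outside the support for these parameters
        if abs(shape) < 1e-12:
            # Exponential limit of the generalized Pareto density.
            nll += math.log(sigma) + yi / sigma
        else:
            nll += math.log(sigma) + (1.0 / shape + 1.0) * math.log(z)
    return nll

# Hypothetical design rows (intercept, one covariate) and positive excesses.
rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.5, 2.0, 1.2, 3.5]

value = gpd_neg_log_lik([0.0, 0.5], 0.2, rows, y)
print(value)
```

A likelihood-based fit would minimize this function over beta and the shape; a Bayesian fit would add the log-prior terms to it before sampling.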
Regression analysis of mixed panel count data with dependent terminal events.
Yu, Guanglei; Zhu, Liang; Li, Yang; Sun, Jianguo; Robison, Leslie L
2017-05-10
Event history studies are commonly conducted in many fields, and a great deal of literature has been established for the analysis of the two types of data commonly arising from these studies: recurrent event data and panel count data. The former arises if all study subjects are followed continuously, while the latter means that each study subject is observed only at discrete time points. In reality, a third type of data, a mixture of the two types above, may occur; furthermore, as with the first two types of data, there may exist a dependent terminal event, which may preclude the occurrences of recurrent events of interest. This paper discusses regression analysis of mixed recurrent event and panel count data in the presence of a terminal event, and an estimating equation-based approach is proposed for estimation of regression parameters of interest. In addition, the asymptotic properties of the proposed estimator are established, and a simulation study conducted to assess the finite-sample performance of the proposed method suggests that it works well in practical situations. Finally, the methodology is applied to a childhood cancer study that motivated this study. Copyright © 2017 John Wiley & Sons, Ltd.
Chilenski, M. A.; Greenwald, M. J.; Hubbard, A. E.; Hughes, J. W.; Lee, J. P.; Marzouk, Y. M.; Rice, J. E.; White, A. E.
2017-12-01
It remains an open question to explain the dramatic change in intrinsic rotation induced by slight changes in electron density (White et al 2013 Phys. Plasmas 20 056106). One proposed explanation is that momentum transport is sensitive to the second derivatives of the temperature and density profiles (Lee et al 2015 Plasma Phys. Control. Fusion 57 125006), but it is widely considered to be impossible to measure these higher derivatives. In this paper, we show that it is possible to estimate second derivatives of electron density and temperature using a nonparametric regression technique known as Gaussian process regression. This technique avoids over-constraining the fit by not assuming an explicit functional form for the fitted curve. The uncertainties, obtained rigorously using Markov chain Monte Carlo sampling, are small enough that it is reasonable to explore hypotheses which depend on second derivatives. It is found that the differences in the second derivatives of n_e and T_e between the peaked and hollow rotation cases are rather small, suggesting that changes in the second derivatives are not likely to explain the experimental results.
Luo, Lei; Yang, Jian; Qian, Jianjun; Tai, Ying; Lu, Gui-Fu
2017-09-01
Dealing with partial occlusion or illumination is one of the most challenging problems in image representation and classification. In this problem, the characterization of the representation error plays a crucial role. In most current approaches, the error matrix needs to be stretched into a vector and each element is assumed to be independently corrupted. This ignores the dependence between the elements of the error. In this paper, it is assumed that the error image caused by partial occlusion or illumination changes is a random matrix variate that follows the extended matrix variate power exponential distribution. This distribution has heavy-tailed regions and can be used to describe a matrix pattern of l×m dimensional observations that are not independent. This paper reveals the essence of the proposed distribution: it actually alleviates the correlations between pixels in an error matrix E and makes E approximately Gaussian. On the basis of this distribution, we derive a Schatten p-norm-based matrix regression model with L_q regularization. The alternating direction method of multipliers is applied to solve this model. To get a closed-form solution in each step of the algorithm, two singular value function thresholding operators are introduced. In addition, the extended Schatten p-norm is utilized to characterize the distance between the test samples and classes in the design of the classifier. Extensive experimental results for image reconstruction and classification with structural noise demonstrate that the proposed algorithm works much more robustly than some existing regression-based methods.
Statistical Dependence of Pipe Breaks on Explanatory Variables
Directory of Open Access Journals (Sweden)
Patricia Gómez-Martínez
2017-02-01
Aging infrastructure is the main challenge currently faced by water suppliers. Estimating asset lifetime requires reliable criteria for planning asset repair and renewal strategies, and pipe break prediction is one of the most important inputs. This paper analyzes the statistical dependence of pipe breaks on explanatory variables, determining their optimal combination and quantifying their influence on failure prediction accuracy. A large set of registered data from the Madrid water supply network, managed by Canal de Isabel II, has been filtered, classified and studied. Several statistical Bayesian models have been built and validated from the available information with a technique that combines reference periods of time as well as geographical location. Statistical models of increasing complexity are built from zero up to five explanatory variables following two approaches: a set of independent variables or a combination of two joint variables plus an additional number of independent variables. With the aim of finding the variable combination that provides the most accurate prediction, models are compared following an objective validation procedure based on the model's skill in predicting the number of pipe breaks in a large set of geographical locations. As expected, model performance improves as the number of explanatory variables increases. However, the rate of improvement is not constant. Performance metrics improve significantly up to three variables, but the trend levels off for higher order models, especially in trunk mains, where performance is reduced. Slight differences are found between trunk mains and distribution lines when selecting the most influential variables and models.
Wesołowska, Karolina; Elovainio, Marko; Hintsa, Taina; Jokela, Markus; Pulkki-Råback, Laura; Pitkänen, Niina; Lipsanen, Jari; Tukiainen, Janne; Lyytikäinen, Leo-Pekka; Lehtimäki, Terho; Juonala, Markus; Raitakari, Olli; Keltikangas-Järvinen, Liisa
2017-12-01
Type 2 diabetes (T2D) has been associated with depressive symptoms, but the causal direction of this association and the underlying mechanisms, such as increased glucose levels, remain unclear. We used instrumental-variable regression with a genetic instrument (Mendelian randomization) to examine a causal role of increased glucose concentrations in the development of depressive symptoms. Data were from the population-based Cardiovascular Risk in Young Finns Study (n = 1217). Depressive symptoms were assessed in 2012 using a modified Beck Depression Inventory (BDI-I). Fasting glucose was measured concurrently with depressive symptoms. A genetic risk score for fasting glucose (with 35 single nucleotide polymorphisms) was used as an instrumental variable for glucose. Glucose was not associated with depressive symptoms in the standard linear regression (B = -0.04, 95% CI [-0.12, 0.04], p = .34), but the instrumental-variable regression showed an inverse association between glucose and depressive symptoms (B = -0.43, 95% CI [-0.79, -0.07], p = .020). The difference between the estimates of standard linear regression and instrumental-variable regression was significant (p = .026). CONCLUSION: Our results suggest that the association between T2D and depressive symptoms is unlikely to be caused by increased glucose concentrations. It seems possible that T2D might be linked to depressive symptoms due to low glucose levels.
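The gap between a standard regression estimate and an instrumental-variable estimate can be illustrated with a small simulation. What follows is a minimal sketch and not the authors' analysis: it uses one simulated instrument and the ratio (Wald) form of the IV estimator, cov(z, y) / cov(z, x). The function names, effect sizes, sample size, and the presence of a single unobserved confounder are all assumptions made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Ratio (Wald) instrumental-variable estimate with one instrument z."""
    return cov(z, y) / cov(z, x)


def simulate(n=20000, beta=-0.4, seed=1):
    """Hypothetical data: z is the instrument, u an unobserved confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                   # instrument, independent of u
        ui = rng.gauss(0, 1)                   # confounder of x and y
        xi = 0.5 * zi + ui + rng.gauss(0, 1)   # exposure
        yi = beta * xi + ui + rng.gauss(0, 1)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


TRUE_BETA = -0.4
z, x, y = simulate(beta=TRUE_BETA)
ols_estimate = ols_slope(x, y)
iv_estimate = iv_slope(z, x, y)
print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  truth: {TRUE_BETA}")
```

In this setup the OLS slope is pulled away from the true effect by the confounder, while the IV ratio recovers it up to sampling noise, provided the instrument affects the outcome only through the exposure.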
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several models, each with two components, parametric and nonparametric. The model used has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic conditions on the use of information technology. More specifically, the response variables are the percentage of households with access to the internet and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification and the percentage of economic growth. Based on identification of the relationship between the response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The result shows that multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination, 90 percent.
Variable selection methods in PLS regression - a comparison study on metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach
. The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using LC-MS based metabolomic approach. References 1. Lê Cao KA, Rossouw D, Robert-Granié C, Besse P: A Sparse PLS for Variable Selection when...... integrated approach. Due to the high number of variables in data sets (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need to be related. Variable selection (or removal of irrelevant...... different strategies for variable selection on PLSR method were considered and compared with respect to selected subset of variables and the possibility for biological validation. Sparse PLSR [1] as well as PLSR with Jack-knifing [2] was applied to data in order to achieve variable selection prior...
Christiansen, Bo
2015-04-01
Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first order approximations and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration that should be treated as the independent variable. These decisions are often not easy to make but they may have a considerable impact on the results. We seek to give a unified probabilistic - Bayesian with flat priors - treatment of univariate linear regression and prediction by taking, as starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference can not be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.
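The point that the choice of independent variable matters when both variables carry noise can be checked numerically. The sketch below is an illustration only, not the paper's Bayesian treatment: it simulates a latent value observed with noise in both x and y, then compares the direct regression slope (y on x) with the inverse regression slope (the reciprocal of the slope of x on y). The noise levels, sample size, and function names are assumptions.

```python
import random


def slope(x, y):
    """Least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=20000, true_slope=2.0, noise_sd=0.7, seed=2):
    """Latent value t is observed with independent noise in both variables."""
    rng = random.Random(seed)
    x_obs, y_obs = [], []
    for _ in range(n):
        t = rng.gauss(0, 1)
        x_obs.append(t + rng.gauss(0, noise_sd))
        y_obs.append(true_slope * t + rng.gauss(0, noise_sd))
    return x_obs, y_obs


TRUE_SLOPE = 2.0
x_obs, y_obs = simulate(true_slope=TRUE_SLOPE)
direct = slope(x_obs, y_obs)          # y on x: attenuated toward zero
inverse = 1.0 / slope(y_obs, x_obs)   # x on y, inverted: inflated
print(f"direct: {direct:.3f}  inverse: {inverse:.3f}  truth: {TRUE_SLOPE}")
```

With noise in both variables, the two estimates bracket the true slope, which is one way to see why neither ordinary regression direction is correct by default in an errors-in-variables setting.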
Directory of Open Access Journals (Sweden)
Hardt Jochen
2012-12-01
Background: Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods: A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r = .10) vs. moderate (r = .50) correlations with the X's and Y. Results: The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion: More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.
Choi, Kilchan
2011-01-01
This report explores a new latent variable regression 4-level hierarchical model for monitoring school performance over time using multisite multiple-cohorts longitudinal data. This kind of data set has a 4-level hierarchical structure: time-series observation nested within students who are nested within different cohorts of students. These…
Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A
2015-03-15
The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.
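The Type-I error inflation described above can be reproduced in miniature. The following sketch is an assumption-laden illustration, not the authors' simulation design: a standard normal confounder drives both exposure and outcome, the exposure has no true effect, and the test of the exposure coefficient is run once adjusting for the continuous confounder and once adjusting for its median split. The test uses a normal approximation (|t| > 1.96), and the adjusted coefficient is obtained by residualizing both variables on the covariate (the Frisch-Waugh-Lovell result).

```python
import random


def residualize(v, c):
    """Residuals of v after simple linear regression on c (with intercept)."""
    n = len(v)
    mean_v = sum(v) / n
    mean_c = sum(c) / n
    scc = sum((ci - mean_c) ** 2 for ci in c)
    scv = sum((ci - mean_c) * (vi - mean_v) for ci, vi in zip(c, v))
    b = scv / scc
    return [(vi - mean_v) - b * (ci - mean_c) for ci, vi in zip(c, v)]


def rejects_null(x, y, c, z_crit=1.96):
    """Test the x coefficient in y ~ 1 + x + c via residual regression."""
    rx = residualize(x, c)
    ry = residualize(y, c)
    sxx = sum(r * r for r in rx)
    slope = sum(a * b for a, b in zip(rx, ry)) / sxx
    rss = sum((b - slope * a) ** 2 for a, b in zip(rx, ry))
    se = (rss / (len(x) - 3) / sxx) ** 0.5  # three fitted parameters
    return abs(slope / se) > z_crit


def type1_error_rates(n_sims=200, n=200, seed=3):
    """Rejection rates when the exposure truly has no effect on the outcome."""
    rng = random.Random(seed)
    hits_continuous = 0
    hits_dichotomized = 0
    for _ in range(n_sims):
        c = [rng.gauss(0, 1) for _ in range(n)]
        x = [ci + rng.gauss(0, 1) for ci in c]   # exposure depends on c
        y = [ci + rng.gauss(0, 1) for ci in c]   # outcome depends on c only
        d = [1.0 if ci > 0 else 0.0 for ci in c]  # median split of c
        hits_continuous += rejects_null(x, y, c)
        hits_dichotomized += rejects_null(x, y, d)
    return hits_continuous / n_sims, hits_dichotomized / n_sims


rate_continuous, rate_dichotomized = type1_error_rates()
print(f"continuous: {rate_continuous:.2f}  dichotomized: {rate_dichotomized:.2f}")
```

Adjusting for the dichotomized confounder leaves residual confounding, so the null hypothesis is rejected far more often than the nominal 5% level, while adjustment for the continuous confounder stays near it.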
Semiparametric regression analysis of failure time data with dependent interval censoring.
Chen, Chyong-Mei; Shen, Pao-Sheng
2017-09-20
Interval-censored failure-time data arise when subjects are examined or observed periodically such that the failure time of interest is not observed exactly but only known to be bracketed between two adjacent observation times. The commonly used approaches assume that the examination times and the failure time are independent or conditionally independent given covariates. In many practical applications, patients who are already in poor health or have a weak immune system before treatment tend to visit physicians more often after treatment than those with better health or a stronger immune system. In this situation, the visiting rate is positively correlated with the risk of failure due to the health status, which results in dependent interval-censored data. While some measurable factors affecting health status such as age, gender, and physical symptoms can be included in the covariates, some health-related latent variables cannot be observed or measured. To deal with dependent interval censoring involving an unobserved latent variable, we characterize the visiting/examination process as a recurrent event process and propose a joint frailty model to account for the association between the failure time and the visiting process. A shared gamma frailty is incorporated into the Cox model and proportional intensity model for the failure time and visiting process, respectively, in a multiplicative way. We propose a semiparametric maximum likelihood approach for estimating model parameters and show the asymptotic properties, including consistency and weak convergence. Extensive simulation studies are conducted and a data set of bladder cancer is analyzed for illustrative purposes. Copyright © 2017 John Wiley & Sons, Ltd.
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection
DEFF Research Database (Denmark)
Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald
2013-01-01
The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PL...
The discovery of timescale-dependent color variability of quasars
Energy Technology Data Exchange (ETDEWEB)
Sun, Yu-Han; Wang, Jun-Xian; Chen, Xiao-Yang [CAS Key Laboratory for Research in Galaxies and Cosmology, Department of Astronomy, University of Science and Technology of China, Hefei, Anhui 230026 (China); Zheng, Zhen-Ya, E-mail: sunyh92@mail.ustc.edu.cn, E-mail: jxw@ustc.edu.cn [School of Earth and Space Exploration, Arizona State University, Tempe, AZ 85287 (United States)
2014-09-01
Quasars are variable on timescales from days to years in UV/optical and generally appear bluer while they brighten. The physics behind the variations in fluxes and colors remains unclear. Using Sloan Digital Sky Survey g- and r-band photometric monitoring data for quasars in Stripe 82, we find that although the flux variation amplitude increases with timescale, the color variability exhibits the opposite behavior. The color variability of quasars is prominent at timescales as short as ∼10 days, but gradually reduces toward timescales up to years. In other words, the variable emission at shorter timescales is bluer than that at longer timescales. This timescale dependence is clearly and consistently detected at all redshifts from z = 0 to 3.5; thus, it cannot be due to contamination to broadband photometry from emission lines that do not respond to fast continuum variations. The discovery directly rules out the possibility that simply attributes the color variability to contamination from a non-variable redder component such as the host galaxy. It cannot be interpreted as changes in global accretion rate either. The thermal accretion disk fluctuation model is favored in the sense that fluctuations in the inner, hotter region of the disk are responsible for short-term variations, while longer-term and stronger variations are expected from the larger and cooler disk region. An interesting implication is that one can use quasar variations at different timescales to probe disk emission at different radii.
International Nuclear Information System (INIS)
Heinzel, F.; Mueller-Duysing, W.; Blattman, H.; Bacesa, L.; Rao, K.R.; Mindek, G.
In order to test the therapeutic value of pions in comparison with conventional X-rays, analyses of animal experiments with induced tumors, transplantation tumors, and comparative cellular kinetic studies of tissue cultures will be performed. So that differences in radiation effect and a possible superiority of pion therapy can be objectively established, the reaction systems to be tested must be as homogeneous as possible. For this purpose, the dependence of the radiation-related regression on various parameters such as sex, age of hosts, environmental factors, radiation conditions (intensity, fractionation, and so on), tumor size, and so on, must be investigated on sterile animals in a sterile environment. The experiments should be conducted under conditions as close as possible to clinical ones. For comparison, the reaction of normal tissue (in vitro and in vivo) and of malignant cells in short-time tissue cultures will be analysed. Cellular kinetics, alteration of chromosomes and metabolic activity of the cells will be studied.
A Diagrammatic Exposition of Regression and Instrumental Variables for the Beginning Student
Foster, Gigi
2009-01-01
Some beginning students of statistics and econometrics have difficulty with traditional algebraic approaches to explaining regression and related techniques. For these students, a simple and intuitive diagrammatic introduction as advocated by Kennedy (2008) may prove a useful framework to support further study. The author presents a series of…
Weighted linear regression using D2H and D2 as the independent variables
Hans T. Schreuder; Michael S. Williams
1998-01-01
Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared ( D2H = diameter squared times height or D...
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Quadratic time dependent Hamiltonians and separation of variables
International Nuclear Information System (INIS)
Anzaldo-Meneses, A.
2017-01-01
Time dependent quantum problems defined by quadratic Hamiltonians are solved using canonical transformations. The Green’s function is obtained and a comparison with the classical Hamilton–Jacobi method leads to important geometrical insights like exterior differential systems, Monge cones and time dependent Gaussian metrics. The Wei–Norman approach is applied using unitary transformations defined in terms of generators of the associated Lie groups, here the semi-direct product of the Heisenberg group and the symplectic group. A new explicit relation for the unitary transformations is given in terms of a finite product of elementary transformations. The sequential application of adequate sets of unitary transformations leads naturally to a new separation of variables method for time dependent Hamiltonians, which is shown to be related to the Inönü–Wigner contraction of Lie groups. The new method allows also a better understanding of interacting particles or coupled modes and opens an alternative way to analyze topological phases in driven systems. - Highlights: • Exact unitary transformation reducing time dependent quadratic quantum Hamiltonian to zero. • New separation of variables method and simultaneous uncoupling of modes. • Explicit examples of transformations for one to four dimensional problems. • New general evolution equation for quadratic form in the action, respectively Green’s function.
Madonna, Erica; Ginsbourger, David; Martius, Olivia
2018-05-01
In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure; however, little is known about its long-term variability. To study the variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month), and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two-meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear, and the month. This model captures the intra-annual variability well and slightly underestimates the inter-annual variability. The regression model is applied to the reanalysis data back in time to 1980. The resulting hail day time series shows an increase in the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared to two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.
Quadratic time dependent Hamiltonians and separation of variables
Anzaldo-Meneses, A.
2017-06-01
Time dependent quantum problems defined by quadratic Hamiltonians are solved using canonical transformations. The Green's function is obtained and a comparison with the classical Hamilton-Jacobi method leads to important geometrical insights like exterior differential systems, Monge cones and time dependent Gaussian metrics. The Wei-Norman approach is applied using unitary transformations defined in terms of generators of the associated Lie groups, here the semi-direct product of the Heisenberg group and the symplectic group. A new explicit relation for the unitary transformations is given in terms of a finite product of elementary transformations. The sequential application of adequate sets of unitary transformations leads naturally to a new separation of variables method for time dependent Hamiltonians, which is shown to be related to the Inönü-Wigner contraction of Lie groups. The new method allows also a better understanding of interacting particles or coupled modes and opens an alternative way to analyze topological phases in driven systems.
DEFF Research Database (Denmark)
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...
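The idea that quantile regression targets conditional quantiles rather than the mean rests on the check (pinball) loss. As a minimal sketch under simplifying assumptions (no regressors, a toy data set, and hypothetical function names), the code below shows that minimizing the check loss over a constant picks out a sample quantile, so an outlier that would shift the mean leaves the median estimate untouched.

```python
def pinball_loss(q, ys, tau):
    """Check loss of the constant prediction q at quantile level tau."""
    total = 0.0
    for y in ys:
        u = y - q
        # Positive residuals are weighted by tau, negative ones by (1 - tau).
        total += tau * u if u >= 0 else (tau - 1) * u
    return total


def sample_quantile_by_loss(ys, tau):
    """Pick the observed value that minimizes the check loss."""
    return min(sorted(ys), key=lambda q: pinball_loss(q, ys, tau))


data = [1, 2, 3, 4, 100]  # toy sample with one large outlier
median_estimate = sample_quantile_by_loss(data, 0.5)
upper_estimate = sample_quantile_by_loss(data, 0.9)
mean_estimate = sum(data) / len(data)
print(median_estimate, upper_estimate, mean_estimate)
```

Here the loss-minimizing value at tau = 0.5 is the sample median, 3, while the mean is 22; at tau = 0.9 the minimizer moves to the largest observation. Linear quantile regression replaces the constant with a linear function of the regressors and minimizes the same loss.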
DEFF Research Database (Denmark)
Christensen, E; Altman, D G; Neuberger, J
1993-01-01
BACKGROUND: The precision of current prognostic models in primary biliary cirrhosis (PBC) is rather low, partly because they are based on data from just one time during the course of the disease. The aim of this study was to design a new, more precise prognostic model by incorporating follow......-up data in the development of the model. METHODS: We have performed Cox regression analyses with time-dependent variables in 237 PBC patients followed up regularly for up to 11 years. The validity of the obtained models was tested by comparing predicted and observed survival in 147 independent PBC...... patients followed for up to 6 years. RESULTS: In the obtained model the following time-dependent variables independently indicated a poor prognosis: high bilirubin, low albumin, ascites, gastrointestinal bleeding, and old age. When including histological variables, cirrhosis, central cholestasis, and low...
Multivariate linear regression of high-dimensional fMRI data with multiple target variables.
Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia
2014-05-01
Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.
Optimal Inference for Instrumental Variables Regression with non-Gaussian Errors
DEFF Research Database (Denmark)
Cattaneo, Matias D.; Crump, Richard K.; Jansson, Michael
This paper is concerned with inference on the coefficient on the endogenous regressor in a linear instrumental variables model with a single endogenous regressor, nonrandom exogenous regressors and instruments, and i.i.d. errors whose distribution is unknown. It is shown that under mild smoothness...
Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried
2018-03-01
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used were for a period of 36 years between 1980 and 2015. Similar to the observed decrease ( P 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth region-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations, as well as providing information that may help optimize planting dates for improved radiation use efficiency in the study area.
Oguntunde, Philip G; Lischeid, Gunnar; Dietrich, Ottfried
2018-03-01
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used were for a period of 36 years between 1980 and 2015. Similar to the observed decrease (P 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth region-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations, as well as providing information that may help optimize planting dates for improved radiation use efficiency in the study area.
Regression models for categorical, count, and related variables: an applied approach
Hoffmann, John P
2016-01-01
Social science and behavioral science students and researchers are often confronted with data that are categorical, count a phenomenon, or have been collected over time. Sociologists examining the likelihood of interracial marriage, political scientists studying voting behavior, criminologists counting the number of offenses people commit, health scientists studying the number of suicides across neighborhoods, and psychologists modeling mental health treatment success are all interested in outcomes that are not continuous. Instead, they must measure and analyze these events and phenomena in a discrete manner. This book provides an introduction and overview of several statistical models designed for these types of outcomes--all presented with the assumption that the reader has only a good working knowledge of elementary algebra and has taken introductory statistics and linear regression analysis. Numerous examples from the social sciences demonstrate the practical applications of these models. The chapte...
Pralle, R S; Weigel, K W; White, H M
2018-05-01
Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbot Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed
Directory of Open Access Journals (Sweden)
Mok Tik
2014-06-01
This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables) also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
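The idea of encoding planar vectors as complex numbers and estimating a vector-valued coefficient by least squares can be illustrated with a minimal sketch. It assumes a single predictor and no intercept, and it works directly in complex arithmetic rather than through the real-valued isomorphism the abstract describes. The function name `fit_complex_slope` and the data are hypothetical and are not taken from the paper.

```python
def fit_complex_slope(x, y):
    """Least-squares estimate of b in y ~ b * x for complex-valued x and y.

    Minimizing sum(|y_i - b * x_i|**2) over complex b gives
    b = sum(conj(x_i) * y_i) / sum(|x_i|**2).
    """
    numerator = sum(xi.conjugate() * yi for xi, yi in zip(x, y))
    denominator = sum(abs(xi) ** 2 for xi in x)
    return numerator / denominator


# Hypothetical 2-D vectors encoded as east + 1j * north.
x = [1 + 0j, 0 + 1j, 2 + 1j, -1 + 2j]
true_b = 2 + 1j  # a coefficient that both scales and rotates each vector
y = [true_b * xi for xi in x]

b_hat = fit_complex_slope(x, y)
print(b_hat)
```

Because the coefficient is itself complex, its modulus acts as a scaling and its argument as a rotation of the predictor vector, which is what distinguishes this setup from a regression with scalar coefficients.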
Exercise training improves heart rate variability after methamphetamine dependency.
Dolezal, Brett Andrew; Chudzynski, Joy; Dickerson, Daniel; Mooney, Larissa; Rawson, Richard A; Garfinkel, Alan; Cooper, Christopher B
2014-06-01
Heart rate variability (HRV) reflects a healthy autonomic nervous system and is increased with physical training. Methamphetamine dependence (MD) causes autonomic dysfunction and diminished HRV. We compared recently abstinent methamphetamine-dependent participants with age-matched, drug-free controls (DF) and also investigated whether HRV can be improved with exercise training in the methamphetamine-dependent participants. In 50 participants (MD = 28; DF = 22), resting heart rate (HR; R-R intervals) was recorded over 5 min while seated using a monitor affixed to a chest strap. Previously reported time domain (SDNN, RMSSD, pNN50) and frequency domain (LFnu, HFnu, LF/HF) parameters of HRV were calculated with customized software. MD were randomized to thrice-weekly exercise training (ME = 14) or equal attention without training (MC = 14) over 8 wk. Groups were compared using paired and unpaired t-tests. Statistical significance was set at P ≤ 0.05. Participant characteristics were matched between groups (mean ± SD): age = 33 ± 6 yr; body mass = 82.7 ± 12 kg, body mass index = 26.8 ± 4.1 kg·m⁻². Compared with DF, the MD group had significantly higher resting HR (P HRV indices were similar between ME and MC groups. However, after training, the ME group significantly (all P HRV, based on several conventional indices, was diminished in recently abstinent, methamphetamine-dependent individuals. Moreover, physical training yielded a marked increase in HRV, representing increased vagal modulation or improved autonomic balance.
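The time-domain HRV indices named in this abstract (SDNN, RMSSD, pNN50) have conventional definitions that can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' customized software: the function name `hrv_time_domain` and the sample R-R intervals are hypothetical, and SDNN is computed here as a sample standard deviation.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Return SDNN, RMSSD (both in ms) and pNN50 (in %) for R-R intervals in ms."""
    # Successive differences between adjacent R-R intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    sdnn = statistics.stdev(rr_ms)  # sample standard deviation of all intervals
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Share of successive differences whose magnitude exceeds 50 ms.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)
    return {"SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50}


# Hypothetical short recording; a real 5-min recording has several hundred beats.
indices = hrv_time_domain([800, 810, 790, 850, 800])
print(indices)
```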
Relationship between the curve of Spee and craniofacial variables: A regression analysis.
Halimi, Abdelali; Benyahia, Hicham; Azeroual, Mohamed-Faouzi; Bahije, Loubna; Zaoui, Fatima
2018-06-01
The aim of this regression analysis was to identify the determining factors, which impact the curve of Spee during its genesis, its therapeutic reconstruction, and its stability, within a continuously evolving craniofacial morphology throughout life. We selected a total of 107 patients, according to the inclusion criteria. A morphological and functional clinical examination was performed for each patient: plaster models, tracing of the curve of Spee, crowding, Angle's classification, overjet and overbite were thus recorded. Then, we made a cephalometric analysis based on the standardized lateral cephalograms. In the sagittal dimension, we measured the values of angles ANB, SNA, SNB, SND, I/i; and the following distances: AoBo, I/NA, i/NB, SE and SL. In the vertical dimension, we measured the values of angles FMA, GoGn/SN, the occlusal plane, and the following distances: SAr, ArD, Ar/Con, Con/Gn, GoPo, HFP, HFA and IF. The statistical analysis was performed using the SPSS software with a significance level of 0.05. Our sample including 107 subjects was composed of 77 female patients (71.3%) and 30 male patients (27.8%) 7 hypodivergent patients (6.5%), 56 hyperdivergent patients (52.3%) and 44 normodivergent patients (41.1%). Patients' mean age was 19.35±5.95 years. The hypodivergent patients presented more pronounced curves of Spee compared to the normodivergent and the hyperdivergent populations; patients in skeletal Class I presented less pronounced curves of Spee compared to patients in skeletal Class II and Class III. These differences were non significant (P>0.05). The curve of Spee was positively and moderately correlated with Angle's classification, overjet, overbite, sellion-articulare distance, and breathing type (P0.05). Seventy five percent (75%) of the hyperdivergent patients with an oral breathing presented an overbite of 3mm, which is quite excessive given the characteristics often admitted for this typology; this parameter could explain the overbite
Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales
Czech Academy of Sciences Publication Activity Database
Krištoufek, Ladislav
2015-01-01
Vol. 91, No. 1 (2015), 022802-1-022802-5. ISSN 1539-3755. R&D Projects: GA ČR(CZ) GP14-11402P. Grant - others: GA ČR(CZ) GAP402/11/0948. Program: GA. Institutional support: RVO:67985556. Keywords: Detrended cross-correlation analysis; Regression; Scales. Subject RIV: AH - Economics. Impact factor: 2.288 (2014). http://library.utia.cas.cz/separaty/2015/E/kristoufek-0452315.pdf
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
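The method of least squares for a single predictor reduces to two closed-form expressions, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). A minimal sketch follows; the function name and the small dataset are made up for illustration and do not come from the article.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying close to the line y = 2x.
slope, intercept = simple_linear_regression(
    [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
)
print(slope, intercept)
```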
Takami, Yoshiyuki; Tajima, Kazuyoshi
2015-07-01
In hemodialysis (HD)-dependent patients, secondary hyperparathyroidism induces cardiac hypertrophy. This study investigated whether parathyroid hormone (PTH) levels affect the degree of left ventricular (LV) mass regression in HD patients after aortic valve replacement (AVR) for aortic stenosis (AS). We retrospectively obtained preoperative and 2-year postoperative echocardiography and intact PTH measurements in 88 HD patients who underwent AVR, with bioprostheses (n = 35, 40%) and mechanical valves (n = 53, 60%) of effective orifice area >0.80 cm2/m2, between January 1997 and December 2010. The LV mass decreased significantly from 308 ± 88 to 217 ± 68 g at follow-up of 28 ± 4 months after AVR (p regression at follow-up was inversely related to preoperative PTH values (R = 0.44, p = 0.001). The LV mass regression at follow-up was significantly smaller in the patients (n = 47) with PTH ≥100 pg/mL than in those (n = 41) with PTH regression at 2-year follow-up (β = 0.23, r2 = 0.24, p = 0.02). In conclusion, the HD patients with high levels of PTH presented with less LV mass regression after AVR for AS without patient-prosthesis mismatch. Secondary hyperparathyroidism may impair regression of cardiac hypertrophy after AVR in HD patients with AS.
van der Zijden, A M; Groen, B E; Tanck, E; Nienhuis, B; Verdonschot, N; Weerdesteyn, V
2017-03-21
Many research groups have studied fall impact mechanics to understand how fall severity can be reduced to prevent hip fractures. Yet, direct impact force measurements with force plates are restricted to a very limited repertoire of experimental falls. The purpose of this study was to develop a generic model for estimating hip impact forces (i.e. fall severity) in in vivo sideways falls without the use of force plates. Twelve experienced judokas performed sideways Martial Arts (MA) and Block ('natural') falls on a force plate, both with and without a mat on top. Data were analyzed to determine the hip impact force and to derive 11 selected (subject-specific and kinematic) variables. Falls from kneeling height were used to perform a stepwise regression procedure to assess the effects of these input variables and build the model. The final model includes four input variables, involving one subject-specific measure and three kinematic variables: maximum upper body deceleration, body mass, shoulder angle at the instant of 'maximum impact' and maximum hip deceleration. The results showed that estimated and measured hip impact forces were linearly related (explained variances ranging from 46 to 63%). Hip impact forces of MA falls onto the mat from a standing position (3650±916N) estimated by the final model were comparable with measured values (3698±689N), even though these data were not used for training the model. In conclusion, a generic linear regression model was developed that enables the assessment of fall severity through kinematic measures of sideways falls, without using force plates. Copyright © 2017 Elsevier Ltd. All rights reserved.
Austin, Peter C; Steyerberg, Ewout W
2012-06-20
When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examine the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
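For the equal-variance binormal case, the dependence described above can be written as c = Φ(σβ/√2), where σ is the common standard deviation and β the log-odds ratio per unit of the explanatory variable. This follows from β = (μ1 - μ0)/σ² and c = P(X1 > X0). The sketch below compares that expression with an empirical c-statistic on simulated data; the parameter values, sample sizes, and function names are assumptions chosen for illustration only.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def c_statistic_binormal(sigma, log_odds_ratio):
    """Predicted c-statistic under binormality with equal variances."""
    return normal_cdf(sigma * log_odds_ratio / math.sqrt(2.0))


def empirical_c(cases, controls):
    """Proportion of case-control pairs in which the case has the higher value."""
    wins = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cases) * len(controls))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sigma, beta = 1.5, 0.8  # hypothetical values
delta = beta * sigma ** 2  # implied difference in means between the groups
controls = [rng.gauss(0.0, sigma) for _ in range(400)]
cases = [rng.gauss(delta, sigma) for _ in range(400)]

predicted = c_statistic_binormal(sigma, beta)
observed = empirical_c(cases, controls)
print(predicted, observed)
```

The same odds ratio yields a different c-statistic when σ changes, which is the point the abstract makes about judging discrimination in relation to population heterogeneity.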
Channel-dependent GMM and multi-class logistic: Regression models for language recognition
Leeuwen, D.A. van; Brümmer, Niko
2006-01-01
This paper describes two new approaches to spoken language recognition. These were both successfully applied in the NIST 2005 Language Recognition Evaluation. The first approach extends the Gaussian Mixture Model technique with channel dependency, which results in actual detection costs (CDET) of
International Nuclear Information System (INIS)
Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei
2007-01-01
Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age
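The two alternatives named here have standard closed-form slopes that can be set beside ordinary least squares. In the sketch below, geometric mean regression uses sign(Sxy)·sqrt(Syy/Sxx), and orthogonal regression uses the total-least-squares slope, which assumes equal error variance in both variables. The function names and data are hypothetical, and this is not the authors' implementation.

```python
import math


def _centered_sums(x, y):
    """Return Sxx, Syy, Sxy (sums of squares and cross-products about the means)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares: errors assumed only in y."""
    sxx, _, sxy = _centered_sums(x, y)
    return sxy / sxx


def geometric_mean_slope(x, y):
    """Geometric mean regression: sign of the covariance times sd(y) / sd(x)."""
    sxx, syy, sxy = _centered_sums(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)


def orthogonal_slope(x, y):
    """Orthogonal regression: minimizes perpendicular distances to the line."""
    sxx, syy, sxy = _centered_sums(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


# Hypothetical noisy measurements of two quantities.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slopes = (ols_slope(x, y), geometric_mean_slope(x, y), orthogonal_slope(x, y))
print(slopes)
```

For each method the intercept is mean(y) minus the slope times mean(x), so all three lines pass through the centroid of the data.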
Building the nodal nuclear data dependences in a many-dimensional state-variable space
International Nuclear Information System (INIS)
Dufek, Jan
2011-01-01
We present new methods for building the polynomial-regression based nodal nuclear data models. The data models can reflect dependences on a large number of state variables, and they can consider various history effects. Suitable multivariate polynomials that approximate the nodal data dependences are identified efficiently in an iterative manner. The history effects are analysed using a new sampling scheme for lattice calculations where the traditional base burnup and branch calculations are replaced by a large number of diverse burnup histories. The total number of lattice calculations is controlled so that the data models are built to a required accuracy.
Directory of Open Access Journals (Sweden)
Baxter Lisa K
2008-05-01
Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R2 = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R2 = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R2 = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R2 = 0.56). Conclusion: Each pollutant examined displayed somewhat different spatial patterns
Human phoneme recognition depending on speech-intrinsic variability.
Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger
2010-11-01
The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Bounded Gaussian process regression
DEFF Research Database (Denmark)
Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan
2013-01-01
We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...... with the proposed explicit noise-model extension....
Hierarchical regression analysis in structural equation modeling
de Jong, P.F.
1999-01-01
In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
International Nuclear Information System (INIS)
Iqbal, Z.M.; Khan, S.A.
2003-01-01
Partial regression coefficients, genotypic and phenotypic variabilities, heritability, co-heritability, and genetic advance were studied in 15 potato varieties of exotic and local origin. Both genotypic and phenotypic coefficients of variation were high for scab and rhizoctonia incidence percentage. A significant partial regression coefficient for emergence percentage indicated its relative importance in tuber yield. High heritability (broad-sense) estimates coupled with high genetic advance for plant height, number of stems per plant, and scab percentage revealed a substantial contribution of additive genetic variance in the expression of these traits. Hence, selection based on these characters could play a significant role in their improvement. The dominance and epistatic variance was more important for the expression of yield ha⁻¹, emergence, and rhizoctonia percentage. This phenomenon is mainly due to the cumulative effects of low heritability and low to moderate genetic advance. The high co-heritability coupled with negative genotypic and phenotypic covariance revealed that selecting varieties with low scab and rhizoctonia percentage resulted in higher potato yield. (author)
Zhang, Qingyu; Liu, Jie; Liu, Bin; Xia, Juan; Chen, Nianping; Chen, Xiaofeng; Cao, Yi; Zhang, Chen; Lu, Caijie; Li, Mingyi; Zhu, Runzhi
2014-04-01
The development of antitumor chemotherapy drugs remains a key goal for oncologists, and natural products provide a vast resource for anti-cancer drug discovery. In the current study, we found that the flavonoid dihydromyricetin (DHM) exhibited antitumor activity against liver cancer cells, including primary cells obtained from hepatocellular carcinoma (HCC) patients. In contrast, DHM was not cytotoxic to immortalized normal liver cells. Furthermore, DHM treatment resulted in the growth inhibition and remission of xenotransplanted tumors in nude mice. Our results further demonstrated that this antitumor activity was caused by the activation of the p53-dependent apoptosis pathway via p53 phosphorylation at serine (15Ser). Moreover, our results showed that DHM plays a dual role in the induction of cell death when administered in combination with cisplatin, a common clinical drug that kills primary hepatoma cells but not normal liver cells.
Hill, Emma M.; Ponte, Rui M.; Davis, James L.
2007-01-01
Comparison of monthly mean tide-gauge time series to corresponding model time series based on a static inverted barometer (IB) for pressure-driven fluctuations and an ocean general circulation model (OM) reveals that the combined model successfully reproduces seasonal and interannual changes in relative sea level at many stations. Removal of the OM and IB from the tide-gauge record produces residual time series with a mean global variance reduction of 53%. The OM is mis-scaled for certain regions, and 68% of the residual time series contain a significant seasonal variability after removal of the OM and IB from the tide-gauge data. Including OM admittance parameters and seasonal coefficients in a regression model for each station, with IB also removed, produces residual time series with mean global variance reduction of 71%. Examination of the regional improvement in variance caused by scaling the OM, including seasonal terms, or both, indicates weakness in the model at predicting sea-level variation for constricted ocean regions. The model is particularly effective at reproducing sea-level variation for stations in North America, Europe, and Japan. The RMS residual for many stations in these areas is 25-35 mm. The production of "cleaner" tide-gauge time series, with oceanographic variability removed, is important for future analysis of nonsecular and regionally differing sea-level variations. Understanding the ocean model's strengths and weaknesses will allow for future improvements of the model.
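The variance-reduction figures quoted above can be read as the share of a station's variance that disappears once the modeled series is subtracted. The sketch below assumes the common definition 100 × (1 − var(residual) / var(observed)); the abstract does not spell out the exact formula, so both the definition and the monthly values are assumptions for illustration.

```python
import statistics


def variance_reduction_percent(observed, modeled):
    """Percent of the observed variance removed by subtracting the modeled series."""
    residual = [o - m for o, m in zip(observed, modeled)]
    return 100.0 * (
        1.0 - statistics.pvariance(residual) / statistics.pvariance(observed)
    )


# Hypothetical monthly sea-level anomalies (mm) and a model estimate of them.
observed = [10, 14, 9, 15, 12]
modeled = [11, 13, 10, 14, 12]
reduction = variance_reduction_percent(observed, modeled)
print(round(reduction, 1))
```

A value near 100 means the model accounts for nearly all of the variability, while a negative value would mean that subtracting the model adds variance to the record.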
Einav, Sharon; Alon, Gady; Kaufman, Nechama; Braunstein, Rony; Carmel, Sara; Varon, Joseph; Hersch, Moshe
2012-09-01
To determine whether variables in physicians' backgrounds influenced their decision to forego resuscitating a patient they did not previously know. Questionnaire survey of a convenience sample of 204 physicians working in the departments of internal medicine, anaesthesiology and cardiology in 11 hospitals in Israel. Twenty per cent of the participants had elected to forego resuscitating a patient they did not previously know without additional consultation. Physicians who had more frequently elected to forego resuscitation had practised medicine for more than 5 years (p=0.013), estimated the number of resuscitations they had performed as being higher (p=0.009), and perceived their experience in resuscitation as sufficient (p=0.001). The variable that predicted the outcome of always performing resuscitation in the logistic regression model was less than 5 years of experience in medicine (OR 0.227, 95% CI 0.065 to 0.793; p=0.02). Physicians' level of experience may affect the probability of a patient's receiving resuscitation, whereas the physicians' personal beliefs and values did not seem to affect this outcome.
Mortensen, Eric; Wu, Shu; Notaro, Michael; Vavrus, Stephen; Montgomery, Rob; De Piérola, José; Sánchez, Carlos; Block, Paul
2018-01-01
Located at a complex topographic, climatic, and hydrologic crossroads, southern Peru is a semiarid region that exhibits high spatiotemporal variability in precipitation. The economic viability of the region hinges on this water, yet southern Peru is prone to water scarcity caused by seasonal meteorological drought. Meteorological droughts in this region are often triggered during El Niño episodes; however, other large-scale climate mechanisms also play a noteworthy role in controlling the region's hydrologic cycle. An extensive season-ahead precipitation prediction model is developed to help bolster the existing capacity of stakeholders to plan for and mitigate deleterious impacts of drought. In addition to existing climate indices, large-scale climatic variables, such as sea surface temperature, are investigated to identify potential drought predictors. A principal component regression framework is applied to 11 potential predictors to produce an ensemble forecast of regional January-March precipitation totals. Model hindcasts of 51 years, compared to climatology and another model conditioned solely on an El Niño-Southern Oscillation index, achieve notable skill and perform better for several metrics, including ranked probability skill score and a hit-miss statistic. The information provided by the developed model and ancillary modeling efforts, such as extending the lead time of and spatially disaggregating precipitation predictions to the local level as well as forecasting the number of wet-dry days per rainy season, may further assist regional stakeholders and policymakers in preparing for drought.
Directory of Open Access Journals (Sweden)
Hukharnsusatrue, A.
2005-11-01
The objective of this research is to compare methods for estimating multiple regression coefficients in the presence of multicollinearity among independent variables. The estimation methods are the Ordinary Least Squares method (OLS), Restricted Least Squares method (RLS), Restricted Ridge Regression method (RRR), and Restricted Liu method (RL), both when the restrictions are true and when they are not true. The study used the Monte Carlo simulation method. The experiment was repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlations is high, and also provides the smallest AMSE for all levels of correlations and all sample sizes when the standard deviation is equal to 5. However, the RL method provides the smallest AMSE when the level of correlations is low or medium, except that the RRR method provides the smallest AMSE when the standard deviation is equal to 3 and the sample size is small. The AMSE varies, from most to least, with the level of correlations, the standard deviation, and the number of independent variables, and varies inversely with sample size. CASE 2: The restrictions are not true. In all cases, the RRR method provides the smallest AMSE, except when the standard deviation is equal to 1 and the error of restrictions is equal to 5%: there, the OLS method provides the smallest AMSE when the level of correlations is low or medium and the sample size is large, whereas for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlations and all sample sizes, except when the level of correlations is high and the sample size is small. Moreover, the case OLS method provides the smallest AMSE, the most RLS method has a smaller AMSE than
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
St. Martin, Clara M.; Lundquist, Julie K.; Handschy, Mark A.
2015-04-01
The variability in wind-generated electricity complicates the integration of this electricity into the electrical grid. This challenge steepens as the percentage of renewably-generated electricity on the grid grows, but variability can be reduced by exploiting geographic diversity: correlations between wind farms decrease as the separation between wind farms increases. But how far is far enough to reduce variability? Grid management requires balancing production on various timescales, and so consideration of correlations reflective of those timescales can guide the appropriate spatial scales of geographic diversity grid integration. To answer ‘how far is far enough,’ we investigate the universal behavior of geographic diversity by exploring wind-speed correlations using three extensive datasets spanning continents, durations and time resolution. First, one year of five-minute wind power generation data from 29 wind farms span 1270 km across Southeastern Australia (Australian Energy Market Operator). Second, 45 years of hourly 10 m wind-speeds from 117 stations span 5000 km across Canada (National Climate Data Archive of Environment Canada). Finally, four years of five-minute wind-speeds from 14 meteorological towers span 350 km of the Northwestern US (Bonneville Power Administration). After removing diurnal cycles and seasonal trends from all datasets, we investigate dependence of correlation length on time scale by digitally high-pass filtering the data on 0.25-2000 h timescales and calculating correlations between sites for each high-pass filter cut-off. Correlations fall to zero with increasing station separation distance, but the characteristic correlation length varies with the high-pass filter applied: the higher the cut-off frequency, the smaller the station separation required to achieve de-correlation. Remarkable similarities between these three datasets reveal behavior that, if universal, could be particularly useful for grid management. For high
Directory of Open Access Journals (Sweden)
M Taki
2017-05-01
Introduction: Controlling the greenhouse microclimate not only influences the growth of plants but is also critical to the spread of diseases inside the greenhouse. The microclimate parameters were inside air, greenhouse roof and soil temperature, relative humidity and solar radiation intensity. Predicting the microclimate conditions inside a greenhouse and enabling the use of automatic control systems are the two main objectives of a greenhouse climate model. The microclimate inside a greenhouse can be predicted by conducting experiments or by using simulation. Static and dynamic models are used for this purpose as a function of the meteorological conditions and the parameters of the greenhouse components. Some work was done up to 2015 to simulate and predict the inside variables in different greenhouse structures. Simulation usually has many problems in predicting the inside climate of a greenhouse, and the simulation errors reported in the literature are high. The main objective of this paper is to compare heat transfer and regression models for predicting the inside air and roof temperature in a semi-solar greenhouse at Tabriz University. Materials and Methods: In this study, a semi-solar greenhouse was designed and constructed in the north-west of Iran in Azerbaijan Province (geographical location of 38°10′ N and 46°18′ E, with an elevation of 1364 m above sea level). The shape and orientation of the greenhouse were selected from among common greenhouse shapes so as to receive maximum solar radiation throughout the year. An internal thermal screen and a cement north wall were also used to store heat and prevent heat loss during the cold period of the year; hence this structure is called a ‘semi-solar’ greenhouse. It was covered with glass (4 mm thickness). It occupies a surface of approximately 15.36 m² and a volume of 26.4 m³. The orientation of this greenhouse was East–West and perpendicular to the direction of the wind prevailing
Solute transport modelling with the variable temporally dependent ...
Indian Academy of Sciences (India)
Pintu Das
2018-02-07
Feb 7, 2018 ... in a finite domain with time-dependent sources and distance-dependent dispersivities. Also, existing ... solute transport in multi-layered porous media using generalized integral transform technique with .... methods for solving the fractional reaction–sub-diffusion equation. To solve numerically the Eqs.
Directory of Open Access Journals (Sweden)
Aarón Salinas-Rodríguez
2006-10-01
the Public Health field. MATERIAL AND METHODS: From the National Reproductive Health Survey performed in 2003, the proportion of individual coverage in the family planning program, proposed in one study carried out at the National Institute of Public Health in Cuernavaca, Morelos, Mexico (2005), was modeled using the Normal, Gamma, Beta and quasi-likelihood regression models. The Akaike Information Criterion (AIC) proposed by McQuarrie and Tsai was used to define the best model. Then, using a simulation (Monte Carlo/Markov Chains) approach, a variable with a Beta distribution was generated to evaluate the behavior of the 4 models while varying the sample size from 100 to 18 000 observations. RESULTS: Results showed that the best statistical option for the analysis of continuous proportions was the Beta regression model, since its assumptions are easily met and because it had the lowest AIC value. Simulation showed that as the sample size increases, the Gamma model, and even more so the quasi-likelihood model, come significantly close to the Beta regression model. CONCLUSIONS: The use of parametric Beta regression is highly recommended to model continuous proportions and the normal model should be avoided. If the sample size is large enough, the use of the quasi-likelihood model represents a good alternative.
Yang, Tse-Chuan; Matthews, Stephen A; Chen, Vivian Y-J
2014-04-01
Obesity has become a problem in the USA and identifying modifiable factors at the individual level may help to address this public health concern. A burgeoning literature has suggested that sleep and stress may be associated with obesity; however, little is known about whether these two factors moderate each other and even less is known about whether their impacts on obesity differ by gender. This study investigates whether sleep and stress are each associated with body mass index (BMI), explores whether the combination of stress and sleep is also related to BMI, and demonstrates how these associations vary across the distribution of BMI values. We analyze the data from 3,318 men and 6,689 women in the Philadelphia area using quantile regression (QR) to evaluate the relationships between sleep, stress, and obesity by gender. Our substantive findings include: (1) high and/or extreme stress was related to an increase of roughly 1.2 in BMI after accounting for other covariates; (2) the pathways linking sleep and BMI differed by gender, with BMI for men increasing by 0.77-1 units with reduced sleep duration and BMI for women declining by 0.12 unit with a 1 unit increase in sleep quality; (3) stress- and sleep-related variables were confounded, but there was little evidence for moderation between these two; (4) the QR results demonstrate that the association between high and/or extreme stress and BMI varied stochastically across the distribution of BMI values, with an upward trend, suggesting that stress played a more important role among adults with higher BMI (i.e., BMI > 26 for both genders); and (5) the QR plots of sleep-related variables show similar patterns, with stronger effects on BMI at the upper end of the BMI distribution. Our findings suggest that sleep and stress were two seemingly independent predictors of BMI and that their relationships with BMI were not constant across the BMI distribution.
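As a rough illustration of why quantile regression can report different associations at different points of the BMI distribution, the sketch below shows the check (pinball) loss that quantile regression minimises, in the simplest intercept-only case. It is not the study's model: the function names and the BMI values are hypothetical, chosen only to show that the minimiser moves from the median to an upper quantile as the quantile level changes.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau in (0, 1)."""
    if residual >= 0:
        return tau * residual
    return (tau - 1.0) * residual


def best_constant(values, tau):
    """Return the observed value that minimises the total pinball loss."""
    def total_loss(candidate):
        return sum(pinball_loss(v - candidate, tau) for v in values)
    return min(values, key=total_loss)


# Hypothetical BMI values, for illustration only (not data from the study).
bmi = [21.0, 23.5, 25.0, 27.5, 31.0, 36.0, 40.0]
median_fit = best_constant(bmi, 0.5)    # minimiser at the median level
upper_fit = best_constant(bmi, 0.75)    # minimiser at an upper quantile level
```

A full quantile regression replaces the constant with a linear predictor and minimises the same loss, which is why its coefficients can differ between the middle and the upper end of the outcome distribution.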
Kim, Sun Mi; Han, Heon; Park, Jeong Mi; Choi, Yoon Jung; Yoon, Hoi Soo; Sohn, Jung Hee; Baek, Moon Hee; Kim, Yoon Nam; Chae, Young Moon; June, Jeon Jong; Lee, Jiwon; Jeon, Yong Hwan
2012-10-01
To determine which Breast Imaging Reporting and Data System (BI-RADS) descriptors for ultrasound are predictors for breast cancer using logistic regression (LR) analysis in conjunction with interobserver variability between breast radiologists, and to compare the performance of artificial neural network (ANN) and LR models in differentiation of benign and malignant breast masses. Five breast radiologists retrospectively reviewed 140 breast masses and described each lesion using BI-RADS lexicon and categorized final assessments. Interobserver agreements between the observers were measured by kappa statistics. The radiologists' responses for BI-RADS were pooled. The data were divided randomly into train (n = 70) and test sets (n = 70). Using train set, optimal independent variables were determined by using LR analysis with forward stepwise selection. The LR and ANN models were constructed with the optimal independent variables and the biopsy results as dependent variable. Performances of the models and radiologists were evaluated on the test set using receiver-operating characteristic (ROC) analysis. Among BI-RADS descriptors, margin and boundary were determined as the predictors according to stepwise LR showing moderate interobserver agreement. Area under the ROC curves (AUC) for both of LR and ANN were 0.87 (95% CI, 0.77-0.94). AUCs for the five radiologists ranged 0.79-0.91. There was no significant difference in AUC values among the LR, ANN, and radiologists (p > 0.05). Margin and boundary were found as statistically significant predictors with good interobserver agreement. Use of the LR and ANN showed similar performance to that of the radiologists for differentiation of benign and malignant breast masses.
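The area under the ROC curve used to compare the models can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. The sketch below computes it that way from two lists of scores; the function name and the scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores, for illustration only.
malignant = [0.9, 0.8, 0.4]
benign = [0.7, 0.3, 0.2]
auc = auc_from_scores(malignant, benign)
```

The pairwise form is quadratic in the number of cases, which is fine for a test set of 70 masses; rank-based formulas give the same value faster on large samples.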
Yet another look at MIDAS regression
Ph.H.B.F. Franses (Philip Hans)
2016-01-01
A MIDAS regression involves a dependent variable observed at a low frequency and independent variables observed at a higher frequency. This paper relates a true high frequency data generating process, where also the dependent variable is observed (hypothetically) at the high frequency,
International Nuclear Information System (INIS)
Doloff, Joshua C; Waxman, David J
2015-01-01
Cyclophosphamide treatment on a six-day repeating metronomic schedule induces a dramatic, innate immune cell-dependent regression of implanted gliomas. However, little is known about the underlying mechanisms whereby metronomic cyclophosphamide induces innate immune cell mobilization and recruitment, or about the role of DNA damage and cell stress response pathways in eliciting the immune responses linked to tumor regression. Untreated and metronomic cyclophosphamide-treated human U251 glioblastoma xenografts were analyzed on human microarrays at two treatment time points to identify responsive tumor cell-specific factors and their upstream regulators. Mouse microarray analysis across two glioma models (human U251, rat 9L) was used to identify host factors and gene networks that contribute to the observed immune and tumor regression responses. Metronomic cyclophosphamide increased expression of tumor cell-derived DNA damage, cell stress, and cell death genes, which may facilitate innate immune activation. Increased expression of many host (mouse) immune networks was also seen in both tumor models, including complement components, toll-like receptors, interferons, and cytolysis pathways. Key upstream regulators activated by metronomic cyclophosphamide include members of the interferon, toll-like receptor, inflammatory response, and PPAR signaling pathways, whose activation may contribute to anti-tumor immunity. Many upstream regulators inhibited by metronomic cyclophosphamide, including hypoxia-inducible factors and MAP kinases, have glioma-promoting activity; their inhibition may contribute to the therapeutic effectiveness of the six-day repeating metronomic cyclophosphamide schedule. Large numbers of responsive cytokines, chemokines and immune regulatory genes linked to innate immune cell recruitment and tumor regression were identified, as were several immunosuppressive factors that may contribute to the observed escape of some tumors from metronomic CPA
Multicollinearity and Regression Analysis
Daoud, Jamal I.
2017-12-01
In regression analysis, correlation between the response and the predictor(s) is expected, but correlation among the predictors is undesirable. The number of predictors included in the regression model depends on many factors, among them historical data and experience. In the end, selection of the most important predictors is a decision made by the researcher. Multicollinearity is a phenomenon in which two or more predictors are correlated; if this happens, the standard errors of the coefficients will increase [8]. Increased standard errors mean that the coefficients for some or all independent variables may be found not to be significantly different from 0. In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on multicollinearity, its causes and its consequences for the reliability of the regression model.
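The inflation of standard errors described here is commonly quantified with the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF reduces to 1 / (1 - r²) with r the correlation between the predictors; the function names and data are hypothetical, and the paper itself may use a different diagnostic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values: correlated (r = 0.8) and uncorrelated (r = 0).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_correlated = [2.0, 1.0, 4.0, 3.0, 5.0]
x2_uncorrelated = [1.0, -2.0, 2.0, -2.0, 1.0]
vif_high = vif_two_predictors(x1, x2_correlated)
vif_none = vif_two_predictors(x1, x2_uncorrelated)
```

The standard error of a coefficient grows by the square root of its VIF, so a correlation of 0.8 between two predictors inflates each standard error by about two thirds compared with uncorrelated predictors.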
Directory of Open Access Journals (Sweden)
Geoffrey Fouad
2018-06-01
New hydrological insights for the region: A set of three variables selected based on an expert assessment of factors that influence percentile flows performed similarly to larger sets of variables selected using a data-driven method. Expert assessment variables included mean annual precipitation, potential evapotranspiration, and baseflow index. Larger sets of up to 37 variables contributed little, if any, additional predictive information. Variables used to describe the distribution of basin data (e.g. standard deviation) were not useful, and average values were sufficient to characterize physical and climatic basin conditions. Effectiveness of the expert assessment variables may be due to the high degree of multicollinearity (i.e. cross-correlation) among additional variables. A tool is provided in the Supplementary material to predict percentile flows based on the three expert assessment variables. Future work should develop new variables with a strong understanding of the processes related to percentile flows.
Energy decay of a variable-coefficient wave equation with nonlinear time-dependent localized damping
Directory of Open Access Journals (Sweden)
Jieqiong Wu
2015-09-01
We study the energy decay for the Cauchy problem of the wave equation with nonlinear time-dependent and space-dependent damping. The damping is localized in a bounded domain and near infinity, and the principal part of the wave equation has variable coefficients. We apply the multiplier method for variable-coefficient equations, and obtain an energy decay that depends on the properties of the coefficient of the damping term.
Hoogerheide, L.F.; Kaashoek, J.F.; van Dijk, H.K.
2007-01-01
Likelihoods and posteriors of instrumental variable (IV) regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating posterior
L.F. Hoogerheide (Lennart); J.F. Kaashoek (Johan); H.K. van Dijk (Herman)
2005-01-01
Likelihoods and posteriors of instrumental variable regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating such contours
Lamadrid-Figueroa, Héctor; Téllez-Rojo, Martha M; Angeles, Gustavo; Hernández-Ávila, Mauricio; Hu, Howard
2011-01-01
In-vivo measurement of bone lead by means of K-X-ray fluorescence (KXRF) is the preferred biological marker of chronic exposure to lead. Unfortunately, considerable measurement error associated with KXRF estimations can introduce bias in estimates of the effect of bone lead when this variable is included as the exposure in a regression model. Estimates of uncertainty reported by the KXRF instrument reflect the variance of the measurement error and, although they can be used to correct the measurement error bias, they are seldom used in epidemiological statistical analyses. Errors-in-variables regression (EIV) allows for correction of bias caused by measurement error in predictor variables, based on the knowledge of the reliability of such variables. The authors propose a way to obtain reliability coefficients for bone lead measurements from uncertainty data reported by the KXRF instrument and compare, by the use of Monte Carlo simulations, results obtained using EIV regression models vs. those obtained by the standard procedures. Results of the simulations show that Ordinary Least Squares (OLS) regression models provide severely biased estimates of effect, and that EIV provides nearly unbiased estimates. Although EIV effect estimates are more imprecise, their mean squared error is much smaller than that of OLS estimates. In conclusion, EIV is a better alternative than OLS to estimate the effect of bone lead when measured by KXRF. Copyright © 2010 Elsevier Inc. All rights reserved.
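The idea of correcting an attenuated slope with a reliability coefficient can be sketched for a single error-prone predictor. The sketch assumes the textbook relation, in which reliability is the share of observed variance not due to measurement error and the corrected slope is the OLS slope divided by that reliability. This is an illustrative assumption and not necessarily the authors' procedure; all names and numbers are hypothetical.

```python
import statistics


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def reliability(observed, uncertainties):
    """Assumed reliability: (observed variance - mean error variance) / observed variance."""
    observed_var = statistics.variance(observed)
    error_var = statistics.mean(u * u for u in uncertainties)
    return (observed_var - error_var) / observed_var


def corrected_slope(observed, outcome, uncertainties):
    """Disattenuated slope: naive OLS slope divided by the reliability."""
    return ols_slope(observed, outcome) / reliability(observed, uncertainties)


# Hypothetical measurements with instrument-reported uncertainties (standard deviations).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = [0.5, 0.5, 0.5, 0.5, 0.5]
outcome = [2.0, 4.0, 6.0, 8.0, 10.0]
rel = reliability(measured, sigma)
slope_naive = ols_slope(measured, outcome)
slope_eiv = corrected_slope(measured, outcome, sigma)
```

Dividing by a reliability below one enlarges both the estimate and its variance, which matches the abstract's remark that EIV estimates are less precise but far less biased.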
Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties (sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen) in a 580 km² agricultural watershed in south-western Burkina Faso. Four statistical prediction models, namely multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB), were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat 8 as well as soil specific indices of redness
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Granato, Gregory E.
2006-01-01
data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
DEFF Research Database (Denmark)
Østergaard, Søren; Ettema, Jehan Frans; Hjortø, Line
Multiple regression and model building with mediator variables was addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent variable, while using the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases as mediator variables. The economic value of metritis was estimated to be €78 per 100 cow-years for each 1% increase of metritis in the period of 1-100 days in milk ... in multiparous cows. The merit of using this approach was demonstrated since the economic value of metritis was estimated to be 81% higher when no mediator variables were included in the multiple regression analysis.
Jović, Ozren; Smrečki, Neven; Popović, Zora
2016-04-01
A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is performed on six data sets of FTIR, two data sets of UV-vis and one data set of DSC. The obtained results show that models built with ridge regression on optimal variables selected with iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for p [...] oil, a well known health beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that using iRR on FTIR and UV-vis data, each particular oil can be very successfully quantified (in all 8 cases RMSEP [...] oil (R²>0.99). Copyright © 2015 Elsevier B.V. All rights reserved.
Understanding logistic regression analysis
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
Bartelet, D.; Haelermans, C.; Groot, W.; Maassen van den Brink, H.
2013-01-01
This paper explores the variability in the effect of an additional year of education on different basic mathematical skills, which are taught to children and explicitly repeated at different points in time during elementary school. In addition, the role of child specific characteristics and the role
Street, Nathan Lee
2017-01-01
Teacher value-added measures (VAM) are designed to provide information regarding teachers' causal impact on the academic growth of students while controlling for exogenous variables. While some researchers contend VAMs successfully and authentically measure teacher causality on learning, others suggest VAMs cannot adequately control for exogenous…
Bozpolat, Ebru
2017-01-01
The purpose of this study is to determine whether Cumhuriyet University Faculty of Education students' levels of speaking anxiety are predicted by the variables of gender, department, grade, such sub-dimensions of "Speaking Self-Efficacy Scale for Pre-Service Teachers" as "public speaking," "effective speaking,"…
Panayi, Efstathios; Kyriakides, George
2017-01-01
Quantifying the effects of environmental factors over the duration of the growing process on Agaricus Bisporus (button mushroom) yields has been difficult, as common functional data analysis approaches require fixed length functional data. The data available from commercial growers, however, is of variable duration, due to commercial considerations. We employ a recently proposed regression technique termed Variable-Domain Functional Regression in order to be able to accommodate these irregular-length datasets. In this way, we are able to quantify the contribution of covariates such as temperature, humidity and water spraying volumes across the growing process, and for different lengths of growing processes. Our results indicate that optimal oxygen and temperature levels vary across the growing cycle and we propose environmental schedules for these covariates to optimise overall yields. PMID:28961254
Non-uniform approximations for sums of discrete m-dependent random variables
Vellaisamy, P.; Cekanavicius, V.
2013-01-01
Non-uniform estimates are obtained for Poisson, compound Poisson, translated Poisson, negative binomial and binomial approximations to sums of m-dependent integer-valued random variables. Estimates for the Wasserstein metric also follow easily from our results. The results are then exemplified by the approximation of the Poisson binomial distribution, 2-runs and $m$-dependent $(k_1,k_2)$-events.
Francq, Bernard G; Govaerts, Bernadette
2016-06-30
Two main methodologies for assessing equivalence in method-comparison studies are presented separately in the literature. The first one is the well-known and widely applied Bland-Altman approach with its agreement intervals, where two methods are considered interchangeable if their differences are not clinically significant. The second approach is based on errors-in-variables regression in a classical (X,Y) plot and focuses on confidence intervals, whereby two methods are considered equivalent when providing similar measures notwithstanding the random measurement errors. This paper reconciles these two methodologies and shows their similarities and differences using both real data and simulations. A new consistent correlated-errors-in-variables regression is introduced as the errors are shown to be correlated in the Bland-Altman plot. Indeed, the coverage probabilities collapse and the biases soar when this correlation is ignored. Novel tolerance intervals are compared with agreement intervals with or without replicated data, and novel predictive intervals are introduced to predict a single measure in an (X,Y) plot or in a Bland-Altman plot with excellent coverage probabilities. We conclude that the (correlated)-errors-in-variables regressions should not be avoided in method comparison studies, although the Bland-Altman approach is usually applied to avert their complexity. We argue that tolerance or predictive intervals are better alternatives than agreement intervals, and we provide guidelines for practitioners regarding method comparison studies. Copyright © 2016 John Wiley & Sons, Ltd.
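For orientation, the classical Bland-Altman agreement interval that the paper takes as its starting point can be sketched as the mean difference plus or minus 1.96 standard deviations of the differences. The sketch shows only this baseline, not the tolerance or predictive intervals the authors recommend; the function name and the paired measurements are hypothetical.

```python
import statistics


def agreement_interval(method_a, method_b, z=1.96):
    """Classical Bland-Altman bias and limits of agreement for paired measurements.

    Returns (bias, lower_limit, upper_limit), where bias is the mean difference
    and the limits are bias -/+ z times the sample standard deviation of the
    differences.
    """
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)
    return bias, bias - z * spread, bias + z * spread


# Hypothetical paired measurements from two methods, for illustration only.
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.0, 13.0, 16.0]
bias, lower, upper = agreement_interval(method_a, method_b)
```

This interval treats the 1.96 multiplier as exact and ignores sampling uncertainty in the mean and standard deviation, which is one reason the abstract prefers tolerance or predictive intervals.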
Directory of Open Access Journals (Sweden)
LiMin Wang
Numerous data mining models have been proposed to construct computer-aided medical expert systems. Bayesian network classifiers (BNCs) are more distinct and understandable than other models. To graphically describe the dependency relationships among clinical variables for thyroid disease diagnosis and ensure the rationality of the diagnosis results, the proposed k-dependence causal forest (KCF) model generates a series of submodels in the framework of maximum spanning tree (MST) and demonstrates stronger dependence representation. Friedman test on 12 UCI datasets shows that KCF has a classification accuracy advantage over the other state-of-the-art BNCs, such as Naive Bayes, tree augmented Naive Bayes, and k-dependence Bayesian classifier. Our extensive experimental comparison on 4 medical datasets also proves the feasibility and effectiveness of KCF in terms of sensitivity and specificity.
Directory of Open Access Journals (Sweden)
Oh Seok Kim
2017-05-01
This paper introduces a mixed method approach for analyzing the determinants of natural latex yields and the associated spatial variations and identifying the most suitable regions for producing latex. Geographically Weighted Regressions (GWR) and Iterative Self-Organizing Data Analysis Technique (ISODATA) are jointly applied to the georeferenced data points collected from the rubber plantations in Xishuangbanna (in Yunnan province, south China) and other remotely-sensed spatial data. According to the GWR models, Age of rubber tree, Percent of clay in soil, Elevation, Solar radiation, Population, Distance from road, Distance from stream, Precipitation, and Mean temperature turn out statistically significant, indicating that these are the major determinants shaping latex yields at the prefecture level. However, the signs and magnitudes of the parameter estimates at the aggregate level are different from those at the lower spatial level, and the differences are due to diverse reasons. The ISODATA classifies the landscape into three categories: high, medium, and low potential yields. The map reveals that Mengla County has the majority of land with high potential yield, while Jinghong City and Menghai County show lower potential yield. In short, the mixed method can offer a means of providing greater insights in the prediction of agricultural production.
Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M
2018-03-01
Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). Diagnosis of HYK was at a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK of Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83
Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo
2016-11-01
The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. We also found that the effect of solar radiation would be enhanced during extreme ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.
Galea, Joseph M.; Ruge, Diane; Buijink, Arthur; Bestmann, Sven; Rothwell, John C.
2013-01-01
Action selection describes the high-level process which selects between competing movements. In animals, behavioural variability is critical for the motor exploration required to select the action which optimizes reward and minimizes cost/punishment, and is guided by dopamine (DA). The aim of this study was to test in humans whether low-level movement parameters are affected by punishment and reward in ways similar to high-level action selection. Moreover, we addressed the proposed dependence of behavioural and neurophysiological variability on DA, and whether this may underpin the exploration of kinematic parameters. Participants performed an out-and-back index finger movement and were instructed that monetary reward and punishment were based on its maximal acceleration (MA). In fact, the feedback was not contingent on the participant’s behaviour but pre-determined. Blocks highly-biased towards punishment were associated with increased MA variability relative to blocks with either reward or without feedback. This increase in behavioural variability was positively correlated with neurophysiological variability, as measured by changes in cortico-spinal excitability with transcranial magnetic stimulation over the primary motor cortex. Following the administration of a DA-antagonist, the variability associated with punishment diminished and the correlation between behavioural and neurophysiological variability no longer existed. Similar changes in variability were not observed when participants executed a pre-determined MA, nor did DA influence resting neurophysiological variability. Thus, under conditions of punishment, DA-dependent processes influence the selection of low-level movement parameters. We propose that the enhanced behavioural variability reflects the exploration of kinematic parameters for less punishing, or conversely more rewarding, outcomes. PMID:23447607
Directory of Open Access Journals (Sweden)
Qunying Wu
2017-05-01
In this paper, we study the equivalent conditions of complete moment convergence for sequences of identically distributed extended negatively dependent random variables. As a result, we extend and generalize some results of complete moment convergence obtained by Chow (Bull. Inst. Math. Acad. Sin. 16:177-201, 1988) and Li and Spătaru (J. Theor. Probab. 18:933-947, 2005) from the i.i.d. case to extended negatively dependent sequences.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Spady, Richard; Stouli, Sami
2012-01-01
We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...
Kusurkar, R A; Ten Cate, Th J; van Asperen, M; Croiset, G
2011-01-01
Motivation in learning behaviour and education is well-researched in general education, but less in medical education. This review aimed to answer two research questions in the light of the Self-determination Theory (SDT) of motivation: 'How has the literature studied motivation as either an independent or dependent variable? How is motivation useful in predicting and understanding processes and outcomes in medical education?' A literature search performed using the PubMed, PsycINFO and ERIC databases resulted in 460 articles. The inclusion criteria were empirical research, specific measurement of motivation and qualitative research studies which had well-designed methodology. Only studies related to medical students/school were included. Findings of 56 articles were included in the review. Motivation as an independent variable appears to affect learning and study behaviour, academic performance, choice of medicine and specialty within medicine and intention to continue medical study. Motivation as a dependent variable appears to be affected by age, gender, ethnicity, socioeconomic status, personality, year of medical curriculum and teacher and peer support, all of which cannot be manipulated by medical educators. Motivation is also affected by factors that can be influenced, among which are autonomy, competence and relatedness, which have been described as the basic psychological needs important for intrinsic motivation according to SDT. Motivation is an independent variable in medical education influencing important outcomes and is also a dependent variable influenced by autonomy, competence and relatedness. This review finds some evidence in support of the validity of SDT in medical education.
Construction of adjoint operators for coupled equations depending on different variables
International Nuclear Information System (INIS)
Hoogenboom, J.E.
1986-01-01
A procedure is described for the construction of the adjoint operator matrix in case of coupled equations defining quantities that depend on different sets of variables. This case is not properly treated in the literature. From this procedure a simple rule can be deduced for the construction of such adjoint operator matrices.
Panel data models extended to spatial error autocorrelation or a spatially lagged dependent variable
Elhorst, J. Paul
2001-01-01
This paper surveys panel data models extended to spatial error autocorrelation or a spatially lagged dependent variable. In particular, it focuses on the specification and estimation of four panel data models commonly used in applied research: the fixed effects model, the random effects model, the
Irlbeck, Sonja A.
2002-01-01
Provides a chronological perspective of human performance technology (HPT) definitions and an evaluation of them in terms of independent and dependent variables. Discusses human competence and performance technology and compares the definitions with the goals that have been articulated for HPT. (Author/LRW)
Central limit theorem for the Banach-valued weakly dependent random variables
International Nuclear Information System (INIS)
Dmitrovskij, V.A.; Ermakov, S.V.; Ostrovskij, E.I.
1983-01-01
The central limit theorem (CLT) for the Banach-valued weakly dependent random variables is proved. In proving the CLT, convergence of finite-measured (i.e. cylindrical) distributions is established. A weak compactness of the family of measures generated by a certain sequence is confirmed. The continuity of the limiting field is checked.
[From clinical judgment to linear regression model].
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as its first objective to determine the slope or inclination of the regression line: Y = a + bX, where "a" is the intercept or regression constant and is equivalent to the value of "Y" when "X" equals 0, and "b" (also called slope) indicates the increase or decrease in "Y" that occurs when the variable "X" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination ($R^2$) indicates the importance of independent variables in the outcome.
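The regression line described in this abstract can be illustrated with a minimal least-squares sketch. The helper `fit_line` and the four data points are hypothetical and chosen only to make the arithmetic easy to follow. It returns the intercept "a", the slope "b", and the coefficient of determination.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of Y = a + b*X; returns (a, b, r2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: change in Y per unit change in X
    a = y_bar - b * x_bar      # intercept: fitted Y when X equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot   # share of the variance in Y explained by X
    return a, b, r2

# Hypothetical observations
x = [0, 1, 2, 3]
y = [1, 3, 2, 4]

a, b, r2 = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f} X, R^2 = {r2:.2f}")
```

For these points the fitted line is Y = 1.30 + 0.80 X with an R² of 0.64, so each unit increase in X is associated with an increase of 0.8 in the predicted Y.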
Wong, Man Sing; Ho, Hung Chak; Yang, Lin; Shi, Wenzhong; Yang, Jinxin; Chan, Ta-Chien
2017-07-24
Dust events have long been recognized to be associated with a higher mortality risk. However, no study has investigated how prolonged dust events affect the spatial variability of mortality across districts in a downwind city. In this study, we applied a spatial regression approach to estimate the district-level mortality during two extreme dust events in Hong Kong. We compared spatial and non-spatial models to evaluate the ability of each regression to estimate mortality. We also compared prolonged dust events with non-dust events to determine the influences of community factors on mortality across the city. The density of a built environment (estimated by the sky view factor) had positive association with excess mortality in each district, while socioeconomic deprivation contributed by lower income and lower education induced higher mortality impact in each territory planning unit during a prolonged dust event. Based on the model comparison, spatial error modelling with the 1st order of queen contiguity consistently outperformed other models. The high-risk areas with higher increase in mortality were located in an urban high-density environment with higher socioeconomic deprivation. Our model design shows the ability to predict spatial variability of mortality risk during an extreme weather event that is not able to be estimated based on traditional time-series analysis or ecological studies. Our spatial protocol can be used for public health surveillance, sustainable planning and disaster preparation when relevant data are available.
SECOND ORDER LEAST SQUARE ESTIMATION ON ARCH(1) MODEL WITH BOX-COX TRANSFORMED DEPENDENT VARIABLE
Directory of Open Access Journals (Sweden)
Herni Utami
2014-03-01
Box-Cox transformation is often used to reduce heterogeneity and to achieve a symmetric distribution of response variable. In this paper, we estimate the parameters of Box-Cox transformed ARCH(1) model using second-order least square method and then we study the consistency and asymptotic normality for second-order least square (SLS) estimators. The SLS estimation was introduced by Wang (2003, 2004) to estimate the parameters of nonlinear regression models with independent and identically distributed errors.
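For readers unfamiliar with the transformation named in this abstract, the following is a minimal sketch of the standard one-parameter Box-Cox transform for a positive value. The function name `box_cox` is an illustrative choice. The ARCH(1) model and the second-order least squares estimator studied in the paper are not implemented here.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam.

    Uses (y**lam - 1) / lam for lam != 0 and log(y) for lam == 0,
    which is the limit of the first expression as lam tends to 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# lam = 1 only shifts the data, lam = 0.5 is a square-root type
# transform, and lam = 0 is the natural logarithm.
for lam in (1, 0.5, 0):
    print(lam, [round(box_cox(v, lam), 3) for v in (1, 4, 9)])
```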
Directory of Open Access Journals (Sweden)
Kheirbek Iyad
2012-07-01
Background: Hazardous air pollutant exposures are common in urban areas contributing to increased risk of cancer and other adverse health outcomes. While recent analyses indicate that New York City residents experience significantly higher cancer risks attributable to hazardous air pollutant exposures than the United States as a whole, limited data exist to assess intra-urban variability in air toxics exposures. Methods: To assess intra-urban spatial variability in exposures to common hazardous air pollutants, street-level air sampling for volatile organic compounds and aldehydes was conducted at 70 sites throughout New York City during the spring of 2011. Land-use regression models were developed using a subset of 59 sites and validated against the remaining 11 sites to describe the relationship between concentrations of benzene, total BTEX (benzene, toluene, ethylbenzene, xylenes) and formaldehyde to indicators of local sources, adjusting for temporal variation. Results: Total BTEX levels exhibited the most spatial variability, followed by benzene and formaldehyde (coefficient of variation of temporally adjusted measurements of 0.57, 0.35, 0.22, respectively). Total roadway length within 100 m, traffic signal density within 400 m of monitoring sites, and an indicator of temporal variation explained 65% of the total variability in benzene while 70% of the total variability in BTEX was accounted for by traffic signal density within 450 m, density of permitted solvent-use industries within 500 m, and an indicator of temporal variation. Measures of temporal variation, traffic signal density within 400 m, road length within 100 m, and interior building area within 100 m (indicator of heating fuel combustion) predicted 83% of the total variability of formaldehyde. The models built with the modeling subset were found to predict concentrations well, predicting 62% to 68% of monitored values at validation sites. Conclusions: Traffic and
Kheirbek, Iyad; Johnson, Sarah; Ross, Zev; Pezeshki, Grant; Ito, Kazuhiko; Eisl, Holger; Matte, Thomas
2012-07-31
Hazardous air pollutant exposures are common in urban areas contributing to increased risk of cancer and other adverse health outcomes. While recent analyses indicate that New York City residents experience significantly higher cancer risks attributable to hazardous air pollutant exposures than the United States as a whole, limited data exist to assess intra-urban variability in air toxics exposures. To assess intra-urban spatial variability in exposures to common hazardous air pollutants, street-level air sampling for volatile organic compounds and aldehydes was conducted at 70 sites throughout New York City during the spring of 2011. Land-use regression models were developed using a subset of 59 sites and validated against the remaining 11 sites to describe the relationship between concentrations of benzene, total BTEX (benzene, toluene, ethylbenzene, xylenes) and formaldehyde to indicators of local sources, adjusting for temporal variation. Total BTEX levels exhibited the most spatial variability, followed by benzene and formaldehyde (coefficient of variation of temporally adjusted measurements of 0.57, 0.35, 0.22, respectively). Total roadway length within 100 m, traffic signal density within 400 m of monitoring sites, and an indicator of temporal variation explained 65% of the total variability in benzene while 70% of the total variability in BTEX was accounted for by traffic signal density within 450 m, density of permitted solvent-use industries within 500 m, and an indicator of temporal variation. Measures of temporal variation, traffic signal density within 400 m, road length within 100 m, and interior building area within 100 m (indicator of heating fuel combustion) predicted 83% of the total variability of formaldehyde. The models built with the modeling subset were found to predict concentrations well, predicting 62% to 68% of monitored values at validation sites. Traffic and point source emissions cause substantial variation in street-level exposures
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we
Directory of Open Access Journals (Sweden)
Christophe Coupé
2018-04-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
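The link between a logistic regression coefficient and an odds ratio can be sketched as follows. This is a minimal illustration with a single binary explanatory variable, fitted by Newton-Raphson. The function `fit_logistic` and the counts are hypothetical and not taken from the article. With several explanatory variables the same idea applies coefficient by coefficient, but a statistics package would normally be used.

```python
import math

def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of logit(p) = b0 + b1*x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical 2x2 table: 20 events among 50 exposed subjects,
# 10 events among 50 unexposed subjects.
x = [1] * 50 + [0] * 50
y = [1] * 20 + [0] * 30 + [1] * 10 + [0] * 40

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)
print(f"odds ratio = {odds_ratio:.3f}")
```

Exponentiating the fitted slope reproduces the cross-product ratio of the table, (20 × 40) / (30 × 10) ≈ 2.667, which is why logistic coefficients are routinely reported as odds ratios.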
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
A Matlab program for stepwise regression
Directory of Open Access Journals (Sweden)
Yanhong Qi
2016-03-01
Stepwise linear regression is a multi-variable regression method for identifying statistically significant variables in the linear regression equation. In the present study, we present a Matlab program for stepwise regression.
Method of nuclear reactor control using a variable temperature load dependent set point
International Nuclear Information System (INIS)
Kelly, J.J.; Rambo, G.E.
1982-01-01
A method and apparatus for controlling a nuclear reactor in response to a variable average reactor coolant temperature set point is disclosed. The set point is dependent upon percent of full power load demand. A manually-actuated "droop mode" of control is provided whereby the reactor coolant temperature is allowed to drop below the set point temperature a predetermined amount wherein the control is switched from reactor control rods exclusively to feedwater flow.
TWO MEASURES OF THE DEPENDENCE OF PREFERENTIAL RANKINGS ON CATEGORICAL VARIABLES
Directory of Open Access Journals (Sweden)
Lissowski Grzegorz
2017-06-01
The aim of this paper is to apply a general methodology for constructing statistical methods, which is based on decision theory, to give a statistical description of preferential rankings, with a focus on the rankings’ dependence on categorical variables. In the paper, I use functions of description errors that are based on the Kemeny and Hamming distances between preferential orderings, but the proposed methodology can also be applied to other methods of estimating description errors.
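To give a feel for the two distances named in this abstract, here is a minimal sketch for strict rankings without ties. It assumes one common convention: the Kemeny distance counts item pairs that the two rankings order oppositely, and the Hamming distance counts positions holding different items. The paper's exact definitions (for example, its treatment of ties or its scaling) may differ. The function names and example rankings are hypothetical.

```python
from itertools import combinations

def kemeny_distance(r1, r2):
    """Count pairs of items that two strict rankings order oppositely."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def hamming_distance(r1, r2):
    """Count positions at which two rankings place different items."""
    return sum(1 for a, b in zip(r1, r2) if a != b)

# Hypothetical rankings of four alternatives, best first
r1 = ["A", "B", "C", "D"]
r2 = ["B", "A", "C", "D"]  # top two alternatives swapped
r3 = ["D", "C", "B", "A"]  # complete reversal of r1

print(kemeny_distance(r1, r2), hamming_distance(r1, r2))
print(kemeny_distance(r1, r3), hamming_distance(r1, r3))
```

A single adjacent swap gives a Kemeny distance of 1 but a Hamming distance of 2, which shows that the two measures penalize the same disagreement differently.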
An Edgeworth expansion for a sum of m-dependent random variables
Directory of Open Access Journals (Sweden)
Wan Soo Rhee
1985-01-01
Given a sequence $X_1, X_2, \ldots, X_n$ of $m$-dependent random variables with moments of order $3+\alpha$ ($0<\alpha\le 1$), we give an Edgeworth expansion of the distribution of $S\sigma^{-1}$ ($S = X_1 + X_2 + \cdots + X_n$, $\sigma^2 = ES^2$) under the assumption that $E[\exp(itS\sigma^{-1})]$ is small away from the origin. The result is of the best possible order.
ABCB1 genetic variability and methadone dosage requirements in opioid-dependent individuals.
Coller, Janet K; Barratt, Daniel T; Dahlen, Karianne; Loennechen, Morten H; Somogyi, Andrew A
2006-12-01
The most common treatment for opioid dependence is substitution therapy with another opioid such as methadone. The methadone dosage is individualized but highly variable, and program retention rates are low due in part to nonoptimal dosing resulting in withdrawal symptoms and further heroin craving and use. Methadone is a substrate for the P-glycoprotein transporter, encoded by the ABCB1 gene, which regulates central nervous system exposure. This retrospective study aimed to investigate the influence of ABCB1 genetic variability on methadone dose requirements. Genomic deoxyribonucleic acid was isolated from opioid-dependent subjects (n = 60) and non-opioid-dependent control subjects (n = 60), and polymerase chain reaction-restriction fragment length polymorphism and allele-specific polymerase chain reaction were used to determine the presence of single nucleotide polymorphisms at positions 61, 1199, 1236, 2677, and 3435. ABCB1 haplotypes were inferred with PHASE software (version 2.1). There were no significant differences in the allele or genotype frequencies of the individual single nucleotide polymorphisms or haplotypes between the 2 populations. ABCB1 genetic variability influenced daily methadone dose requirements, such that subjects carrying 2 copies of the wild-type haplotype required higher doses compared with those with 1 copy and those with no copies (98.3 +/- 10.4, 58.6 +/- 20.9, and 55.4 +/- 26.1 mg/d, respectively; P = .029). In addition, carriers of the AGCTT haplotype required significantly lower doses than noncarriers (38.0 +/- 16.8 and 61.3 +/- 24.6 mg/d, respectively; P = .04). Although ABCB1 genetic variability is not related to the development of opioid dependence, identification of variant haplotypes may, after larger prospective studies have been performed, provide clinicians with a tool for methadone dosage individualization.
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the dengue fever patient-count categories.
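A minimal sketch of the two ingredients such a model combines, under stated assumptions: cumulative-logit (proportional-odds) category probabilities, and a distance-based weight that lets nearby observation sites count more in a local fit. The Gaussian kernel, the bandwidth, the cutpoints, and all function names below are illustrative assumptions; the abstract does not specify them, and the study itself used R.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probabilities(linear_predictor, cutpoints):
    """Category probabilities under a cumulative-logit model.

    P(Y <= j) = logistic(cutpoint_j - linear_predictor), cutpoints ascending.
    """
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


def gaussian_weight(distance, bandwidth):
    """Assumed Gaussian kernel: weight decays with distance from the fit location."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


# Hypothetical values: three ordered categories (low, medium, high case counts).
probs = ordinal_probabilities(linear_predictor=0.4, cutpoints=[-1.0, 1.0])
# Hypothetical distances (in arbitrary units) from the village being fitted.
weights = [gaussian_weight(d, bandwidth=2.0) for d in (0.0, 1.0, 4.0)]
print([round(p, 3) for p in probs])
print([round(w, 3) for w in weights])
```

In a geographically weighted fit, weights of this kind would multiply each observation's log-likelihood contribution, so every location gets its own parameter estimates.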
Energy Technology Data Exchange (ETDEWEB)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios, E-mail: junhankim@email.arizona.edu [Department of Astronomy and Steward Observatory, University of Arizona, 933 N. Cherry Avenue, Tucson, AZ 85721 (United States)]
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
Distance and Azimuthal Dependence of Ground‐Motion Variability for Unilateral Strike‐Slip Ruptures
Vyas, Jagdish Chandra
2016-06-21
We investigate near‐field ground‐motion variability by computing the seismic wavefield for five kinematic unilateral‐rupture models of the 1992 Mw 7.3 Landers earthquake, eight simplified unilateral‐rupture models based on the Landers event, and a large Mw 7.8 ShakeOut scenario. We include the geometrical fault complexity and consider different 1D velocity–density profiles for the Landers simulations and a 3D heterogeneous Earth structure for the ShakeOut scenario. For the Landers earthquake, the computed waveforms are validated using strong‐motion recordings. We analyze the simulated ground‐motion data set in terms of distance and azimuth dependence of peak ground velocity (PGV). Our simulations reveal that intraevent ground‐motion variability σln(PGV) is higher at close distances to the fault (<20 km) and decreases with increasing distance following a power law. This finding is in stark contrast to constant sigma‐values used in empirical ground‐motion prediction equations. The physical explanation of a large near‐field σln(PGV) is the presence of strong directivity and rupture complexity. High values of σln(PGV) occur in the rupture‐propagation direction, but small values occur in the direction perpendicular to it. We observe that the power‐law decay of σln(PGV) is primarily controlled by slip heterogeneity. In addition, σln(PGV), as a function of azimuth, is sensitive to variations in both rupture speed and slip heterogeneity. The azimuth dependence of the ground‐motion mean μln(PGV) is well described by a Cauchy–Lorentz function that provides a novel empirical quantification to model the spatial dependency of ground motion. Online Material: Figures of slip distributions, residuals to ground‐motion prediction equations (GMPEs), distance and azimuthal dependence, and directivity predictor of ground‐motion variability for different source models.
Degree of multicollinearity and variables involved in linear dependence in additive-dominant models
Directory of Open Access Journals (Sweden)
Juliana Petrini
2012-12-01
The objective of this work was to assess the degree of multicollinearity and to identify the variables involved in linear dependence relations in additive-dominant models. Data of birth weight (n=141,567), yearling weight (n=58,124), and scrotal circumference (n=20,371) of Montana Tropical composite cattle were used. Diagnosis of multicollinearity was based on the variance inflation factor (VIF) and on the evaluation of the condition indexes and eigenvalues from the correlation matrix among explanatory variables. The first model studied (RM) included the fixed effect of dam age class at calving and the covariates associated with the direct and maternal additive and non-additive effects. The second model (R) included all the effects of the RM model except the maternal additive effects. Multicollinearity was detected in both models for all traits considered, with VIF values of 1.03 - 70.20 for RM and 1.03 - 60.70 for R. Collinearity increased with the increase of variables in the model and the decrease in the number of observations, and it was classified as weak, with condition index values between 10.00 and 26.77. In general, the variables associated with additive and non-additive effects were involved in multicollinearity, partially due to the natural connection between these covariables as fractions of the biological types in breed composition.
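As a small illustration of the VIF diagnostic mentioned above, the sketch below computes it for the simplest case of two explanatory variables, where VIF = 1/(1 - R²) and R² reduces to the squared Pearson correlation between the two. The covariate values and function names are hypothetical; the study's models have many more covariates, for which each R² would come from regressing one covariate on all the others.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with exactly two predictors R^2 = r^2."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical covariates: nearly collinear versus exactly uncorrelated.
vif_collinear = vif_two_predictors(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 1.9, 3.2, 3.9, 5.1, 5.8],
)
vif_orthogonal = vif_two_predictors([-1.0, 0.0, 1.0], [1.0, -2.0, 1.0])
print(round(vif_collinear, 1), vif_orthogonal)
```

A VIF of 1 means the covariate shares no linear information with the other; large values signal that its coefficient's variance is inflated by collinearity.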
The dependence of J/ψ-nucleon inelastic cross section on the Feynman variable
International Nuclear Information System (INIS)
Duan Chungui; Liu Na; Miao Wendan
2011-01-01
By means of two typical sets of nuclear parton distribution functions, and taking into account the energy loss of the beam proton and the nuclear absorption of the charmonium states traversing nuclear matter in the uniform framework of the Glauber model, a leading-order phenomenological analysis is given in the color evaporation model of the E866 experimental data on the J/ψ production differential cross section ratios $R_{\mathrm{Fe/Be}}(x_F)$. It is shown that the energy loss effect of the beam proton on $R_{\mathrm{Fe/Be}}(x_F)$ is more important than the nuclear effects on parton distribution functions in the high Feynman variable $x_F$ region. It is found that the J/ψ-nucleon inelastic cross section depends on the Feynman variable $x_F$ and increases linearly with $x_F$ in the region $x_F > 0.2$.
Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M
2013-10-01
Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) 15 and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP 32-55% over UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.
Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.
2017-01-01
Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...
Wu, Chih-Da; Chen, Yu-Cheng; Pan, Wen-Chi; Zeng, Yu-Ting; Chen, Mu-Jean; Guo, Yue Leon; Lung, Shih-Chun Candice
2017-05-01
This study utilized a long-term satellite-based vegetation index, and considered culture-specific emission sources (temples and Chinese restaurants) with Land-use Regression (LUR) modelling to estimate the spatial-temporal variability of PM2.5 using data from Taipei metropolis, which exhibits typical Asian city characteristics. Annual average PM2.5 concentrations from 2006 to 2012 of 17 air quality monitoring stations established by Environmental Protection Administration of Taiwan were used for model development. PM2.5 measurements from 2013 were used for external data verification. Monthly Normalized Difference Vegetation Index (NDVI) images coupled with buffer analysis were used to assess the spatial-temporal variations of greenness surrounding the monitoring sites. The distribution of temples and Chinese restaurants were included to represent the emission contributions from incense and joss money burning, and gas cooking, respectively. Spearman correlation coefficient and stepwise regression were used for LUR model development, and 10-fold cross-validation and external data verification were applied to verify the model reliability. The results showed a strongly negative correlation (r: -0.71 to -0.77) between NDVI and PM2.5 while temples (r: 0.52 to 0.66) and Chinese restaurants (r: 0.31 to 0.44) were positively correlated to PM2.5 concentrations. With the adjusted model R² of 0.89, a cross-validated adj-R² of 0.90, and external validated R² of 0.83, the high explanatory power of the resultant model was confirmed. Moreover, the averaged NDVI within a 1750 m circular buffer (p < 0.01), the number of Chinese restaurants within a 1750 m buffer (p < 0.01), and the number of temples within a 750 m buffer (p = 0.06) were selected as important predictors during the stepwise selection procedures. According to the partial R², NDVI explained 66% of PM2.5 variation and was the dominant variable in the developed model. We suggest future studies
Memory effects, two color percolation, and the temperature dependence of Mott variable-range hopping
Agam, Oded; Aleiner, Igor L.
2014-06-01
There are three basic processes that determine hopping transport: (a) hopping between normally empty sites (i.e., having exponentially small occupation numbers at equilibrium), (b) hopping between normally occupied sites, and (c) transitions between normally occupied and unoccupied sites. In conventional theories all these processes are considered Markovian and the correlations of occupation numbers of different sites are believed to be small (i.e., not exponential in temperature). We show that, contrary to this belief, memory effects suppress the processes of type (c) and manifest themselves in a subleading exponential temperature dependence of the variable-range hopping conductivity. This temperature dependence originates from the property that sites of type (a) and (b) form two independent resistor networks that are weakly coupled to each other by processes of type (c). This leads to a two-color percolation problem which we solve in the critical region.
Population and prehistory III: food-dependent demography in variable environments.
Lee, Charlotte T; Puleston, Cedric O; Tuljapurkar, Shripad
2009-11-01
The population dynamics of preindustrial societies depend intimately on their surroundings, and food is a primary means through which environment influences population size and individual well-being. Food production requires labor; thus, dependence of survival and fertility on food involves dependence of a population's future on its current state. We use a perturbation approach to analyze the effects of random environmental variation on this nonlinear, age-structured system. We show that in expanding populations, direct environmental effects dominate induced population fluctuations, so environmental variability has little effect on mean hunger levels, although it does decrease population growth. The growth rate determines the time until population is limited by space. This limitation introduces a tradeoff between population density and well-being, so population effects become more important than the direct effects of the environment: environmental fluctuation increases mortality, releasing density dependence and raising average well-being for survivors. We discuss the social implications of these findings for the long-term fate of populations as they transition from expansion into limitation, given that conditions leading to high well-being during growth depress well-being during limitation.
Principal component regression for crop yield estimation
Suryanarayana, T M V
2016-01-01
This book highlights the estimation of crop yield in Central Gujarat, especially with regard to the development of Multiple Regression Models and Principal Component Regression (PCR) models using climatological parameters as independent variables and crop yield as a dependent variable. It subsequently compares the multiple linear regression (MLR) and PCR results, and discusses the significance of PCR for crop yield estimation. In this context, the book also covers Principal Component Analysis (PCA), a statistical procedure used to reduce a number of correlated variables into a smaller number of uncorrelated variables called principal components (PC). This book will be helpful to the students and researchers, starting their works on climate and agriculture, mainly focussing on estimation models. The flow of chapters takes the readers in a smooth path, in understanding climate and weather and impact of climate change, and gradually proceeds towards downscaling techniques and then finally towards development of ...
Czech Academy of Sciences Publication Activity Database
Axelsson, Owe; Xin, H.; Neytcheva, M.
2015-01-01
Vol. 20, No. 2 (2015), pp. 232-260. ISSN 1392-6292. Institutional support: RVO:68145535. Keywords: variable density; phase-field model; Navier-Stokes equations; preconditioning; variable viscosity. Subject RIV: BA - General Mathematics. Impact factor: 0.468 (2015). http://www.tandfonline.com/doi/abs/10.3846/13926292.2015.1021395
ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL
Directory of Open Access Journals (Sweden)
Constantin Anghelache
2011-11-01
The information obtained through simple linear regression is not always enough to characterize the evolution of an economic phenomenon or, furthermore, to identify its possible future evolution. To remedy these drawbacks, the specialized literature includes multiple regression models, in which the evolution of the dependent variable is defined in terms of two or more factorial variables.
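A minimal sketch of a multiple regression fit, assuming ordinary least squares solved through the normal equations. The variable names (`sales`, `assets`, `profit`) and the data are hypothetical and are not taken from the study; the response is generated exactly from known coefficients so the fit can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Least-squares fit; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: profit = 2 + 0.5 * sales + 1.5 * assets, without noise.
predictors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
profit = [2 + 0.5 * sales + 1.5 * assets for sales, assets in predictors]
coefficients = fit_multiple_regression(predictors, profit)
print([round(c, 6) for c in coefficients])
```

Solving the normal equations directly is adequate for a small, well-conditioned example like this one; for real data a QR-based or library solver is usually the safer choice.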
International Nuclear Information System (INIS)
Verhulst, Simon; Aviv, Abraham; Benetos, Athanase; Berenson, Gerald S.; Kark, Jeremy D.
2013-01-01
Leukocyte telomere length (LTL) shortens with age. Longitudinal studies have reported accelerated LTL attrition when baseline LTL is longer. However, the dependency of LTL attrition on baseline LTL might stem from a statistical artifact known as regression to the mean (RTM). To our knowledge no published study of LTL dynamics (LTL and its attrition rate) has corrected for this phenomenon. We illustrate the RTM effect using replicate LTL measurements, and show, using simulated data, how the RTM effect increases with a rise in stochastic measurement variation (representing LTL measurement error), resulting in spurious increasingly elevated dependencies of attrition on baseline values. In addition, we re-analyzed longitudinal LTL data collected from four study populations to test the hypothesis that LTL attrition depends on baseline LTL. We observed that the rate of LTL attrition was proportional to baseline LTL, but correction for the RTM effect reduced the slope of the relationship by 57 % when measurement error was low (coefficient of variation ∼2 %). A modest but statistically significant effect remained however, indicating that high baseline LTL is associated with higher LTL attrition even when correcting for the RTM effect. Baseline LTL explained 1.3 % of the variation in LTL attrition, but this effect, which differed significantly between the study samples, appeared to be primarily attributable to the association in men (3.7 %)
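The regression-to-the-mean effect described above can be reproduced with a small simulation, sketched here under assumed values. True attrition is drawn independently of true baseline LTL, so any slope of measured attrition on measured baseline is spurious and comes only from measurement error. The means, standard deviations, sample size, and function names are hypothetical choices for illustration, not the study's data or its correction method.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def rtm_slope(error_sd, n=20000, seed=1):
    """Slope of measured attrition on measured baseline when true
    attrition is, by construction, independent of true baseline."""
    rng = random.Random(seed)
    baseline, attrition = [], []
    for _ in range(n):
        true_baseline = rng.gauss(7.0, 0.7)   # hypothetical LTL (kb)
        true_attrition = rng.gauss(0.3, 0.1)  # hypothetical loss (kb)
        measured_baseline = true_baseline + rng.gauss(0.0, error_sd)
        measured_followup = true_baseline - true_attrition + rng.gauss(0.0, error_sd)
        baseline.append(measured_baseline)
        attrition.append(measured_baseline - measured_followup)
    return ols_slope(baseline, attrition)


# Increasing measurement error produces an increasingly steep spurious slope.
slopes = [rtm_slope(sd) for sd in (0.0, 0.2, 0.4)]
print([round(s, 3) for s in slopes])
```

In this setup the expected spurious slope is σ_e² / (σ_T² + σ_e²), where σ_e is the measurement-error standard deviation and σ_T the standard deviation of true baseline LTL, which is why the slope grows with measurement error.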
Directory of Open Access Journals (Sweden)
Ulaş Yurtsever
2017-03-01
In this study, an experimental system entailing ciprofloxacin hydrochloride (CIP) removal from aqueous solution is modeled by using artificial neural networks (ANNs). For modeling of CIP removal from aqueous solution using bentonite and activated carbon, we utilized the combination of output-dependent data scaling (ODDS) with ANN, and the combination of ODDS with a multivariable linear regression model (MVLR). The ANN model normalized via ODDS performs better in comparison with the ANN model scaled via standard normalization. Four distinct hybrid models, ANN with standard normalization, ANN with ODDS, MVLR with standard normalization, and MVLR with ODDS, were also applied. We observed that the consistency, accuracy ratios, and model performance of the ANN and MVLR estimations increase as a result of pre-processing with ODDS.
Bowles, Tyler J.; Jones, Jason
2004-01-01
Single equation regression models have been used rather extensively to test the effectiveness of Supplemental Instruction (SI). This approach, however, fails to account for the possibility that SI attendance and the outcome of SI attendance are jointly determined endogenous variables. Moreover, the standard approach fails to account for the fact…
Directory of Open Access Journals (Sweden)
Guanghao Sun
2016-11-01
Background and Objectives: Heart rate variability (HRV) has been intensively studied as a promising biological marker of major depressive disorder (MDD). Our previous study confirmed that autonomic activity and reactivity in depression revealed by HRV during rest and mental task (MT) conditions can be used as diagnostic measures and in clinical evaluation. In this study, logistic regression analysis (LRA) was utilized for the classification and prediction of MDD based on HRV data obtained in an MT paradigm. Methods: Power spectral analysis of HRV on R-R intervals before, during, and after an MT (random number generation) was performed in 44 drug-naïve patients with MDD and 47 healthy control subjects at the Department of Psychiatry in Shizuoka Saiseikai General Hospital. Logit scores of LRA determined by HRV indices and heart rates discriminated patients with MDD from healthy subjects. The high frequency (HF) component of HRV and the ratio of the low frequency (LF) component to the HF component (LF/HF) correspond to parasympathetic and sympathovagal balance, respectively. Results: The LRA achieved a sensitivity and specificity of 80.0% and 79.0%, respectively, at an optimum cutoff logit score (0.28). Misclassifications occurred only when the logit score was close to the cutoff score. Logit scores also correlated significantly with subjective self-rating depression scale scores (p < 0.05). Conclusion: HRV indices recorded during a mental task may be an objective tool for screening patients with MDD in psychiatric practice. The proposed method appears promising for not only objective and rapid MDD screening, but also evaluation of its severity.
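To make the cutoff-based screening step concrete, the sketch below computes sensitivity and specificity for a set of logit scores at a given cutoff. Only the cutoff value 0.28 comes from the abstract; the scores, the labels, and the convention that a score at or above the cutoff is classified as MDD are assumptions made for illustration.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as positive (assumed convention) and
    return (sensitivity, specificity). Labels: 1 = patient, 0 = control."""
    true_pos = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
    false_neg = sum(1 for s, l in zip(scores, labels) if l == 1 and s < cutoff)
    true_neg = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
    false_pos = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= cutoff)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical logit scores: five patients followed by five controls.
scores = [1.2, 0.9, 0.35, 0.1, 2.0, -1.0, -0.5, 0.3, 0.2, -2.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, cutoff=0.28)
print(sens, spec)
```

In this made-up sample the only two errors (scores 0.1 and 0.3) lie close to the cutoff, which mirrors the pattern of misclassification the abstract reports.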
Directory of Open Access Journals (Sweden)
Komačka Jozef
2016-05-01
This paper presents a study of the variability of surface reflection amplitudes of a GPR horn antenna in relation to the distance between the antenna and a surface. An air-coupled antenna with a central frequency of 1 GHz was used in the investigation. Four types of surfaces (dry pavement, wet pavement, metal plate, and a composite layer of gypsum and wood) were tested. The distance of the antenna above the surfaces was varied from 37.5 cm to 53.5 cm. The amplitudes of negative and positive peaks and their variability were analysed in relation to the distance of the antenna above the surfaces. Moreover, the influence of changes in the negative and positive peak amplitudes on the total amplitudes was assessed. The amplitudes of negative peaks for all investigated surfaces were relatively consistent in the range from 40.5 cm to 48.5 cm, and a moderate decline was identified in the amplitudes of positive peaks in the range of distances from 37.5 cm to 51.5 cm. This decline influences the tendency of the total amplitudes. Based on the results of the analysis, it can be stated that the distance of an air-coupled antenna above the surface can influence the value of the total amplitude, and that the differences depend on the type of surface.
The Attentional Dependence of Emotion Cognition is Variable with the Competing Task
Directory of Open Access Journals (Sweden)
Cheng Chen
2016-11-01
The relationship between emotion and attention has fascinated researchers for decades. Many previous studies have used eye-tracking, ERP, MEG and fMRI to explore this issue but have reached different conclusions: some researchers hold that emotion cognition is an automatic process and independent of attention, while others hold that emotion cognition is modulated by attentional resources and is a type of controlled processing. The present research aimed to investigate this controversy, and we hypothesized that the attentional dependence of emotion cognition varies with the competing task. Eye-tracking technology and a dual-task paradigm were adopted, and subjects’ attention was manipulated to fixate at the central task to investigate whether subjects could detect the emotional faces presented in the peripheral area with a decrease or near-absence of attention. The results revealed that when the peripheral task was emotional face discrimination but the central attention-demanding task was different, subjects performed well in the peripheral task, which means that emotional information can be processed in parallel with other stimuli, and there may be a specific channel in the human brain for processing emotional information. However, when the central and peripheral tasks were both emotional face discrimination, subjects could not perform well in the peripheral task, indicating that the processing of emotional information required attentional resources and that it is a type of controlled processing. Therefore, we concluded that the attentional dependence of emotion cognition varied with the competing task.
Directory of Open Access Journals (Sweden)
Xing-Bin Hu
2009-01-01
It has been reported that blocking Notch signaling in tumor-bearing mice results in abortive angiogenesis and tumor regression. However, given that Notch signaling influences numerous cellular processes in vivo, a comprehensive evaluation of the effect of Notch inactivation on tumor growth would be favorable. In this study, we inoculated four cancer cell lines in mice with the conditional inactivation of recombination signal-binding protein-Jκ (RBP-J), which mediates signaling from all four mammalian Notch receptors. We found that whereas three tumors including hepatocarcinoma, lung cancer, and osteogenic sarcoma grew slower in the RBP-J-deficient mice, at least one melanoma, B16, grew significantly faster in the RBP-J-deficient mice than in the controls, suggesting that the RBP-J-deficient hosts could provide permissive cues for tumor growth. All these tumors showed increased microvessels and up-regulated hypoxia-inducible factor 1α, suggesting that whereas defective angiogenesis resulted in hypoxia, different tumors might grow differentially in the RBP-J-deleted mice. Similarly, increased infiltration of Gr1+/Mac1+ cells was noticed in tumors grown in the RBP-J-inactivated mice. Moreover, we found that when inoculated in the RBP-J knockout hosts, the H22 hepatoma cells had a high frequency of metastasis and lethality, suggesting that at least for H22, deficiency of environmental Notch signaling favored tumor metastasis. Our findings suggested that the general blockade of Notch signaling in tumor-bearing mice could lead to defective angiogenesis in tumors, but depending on tumor cell types, general inhibition of Notch signaling might result in tumor regression, progression, or metastasis.
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
(Non) linear regression modelling
Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.
2012-01-01
We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or
The Dependence of Cloud Particle Size on Non-Aerosol-Loading Related Variables
Energy Technology Data Exchange (ETDEWEB)
Shao, H.; Liu, G.
2005-03-18
An enhanced concentration of aerosol may increase the number of cloud drops by providing more cloud condensation nuclei (CCN), which in turn results in a higher cloud albedo at a constant cloud liquid water path. This process is often referred to as the aerosol indirect effect (AIE). Many in situ and remote sensing observations support this hypothesis (Ramanathan et al. 2001). However, satellite observed relations between aerosol concentration and cloud drop size are not always in agreement with the AIE. Based on global analysis of cloud effective radius ($r_e$) and aerosol number concentration ($N_a$) derived from satellite data, Sekiguchi et al. (2003) found that the correlations between the two variables can be either negative, or positive, or none, depending on the location of the clouds. They discovered that significantly negative $r_e$ - $N_a$ correlation can only be identified along coastal regions of the continents where abundant continental aerosols inflow from land, whereas Feingold et al. (2001) found that the response of $r_e$ to aerosol loading is the greatest in the region where aerosol optical depth ($\tau_a$) is the smallest. The reason for the discrepancy is likely due to the variations in cloud macroscopic properties such as geometrical thickness (Brenguier et al. 2003). Since $r_e$ is modified not only by aerosol but also by cloud geometrical thickness ($H$), the correlation between $r_e$ and $\tau_a$ actually reflects both the aerosol indirect effect and dependence of $H$. Therefore, discussing AIE based on the $r_e$-$\tau_a$ correlation without taking into account variations in cloud geometrical thickness may be misleading. This paper is motivated to extract aerosols' effect from overall effects using the independent measurements of cloud geometrical thickness, $\tau_a$ and $r_e$.
Time-dependence in relativistic collisionless shocks: theory of the variable
Energy Technology Data Exchange (ETDEWEB)
Spitkovsky, A
2004-02-05
We describe results from time-dependent numerical modeling of the collisionless reverse shock terminating the pulsar wind in the Crab Nebula. We treat the upstream relativistic wind as composed of ions and electron-positron plasma embedded in a toroidal magnetic field, flowing radially outward from the pulsar in a sector around the rotational equator. The relativistic cyclotron instability of the ion gyrational orbit downstream of the leading shock in the electron-positron pairs launches outward propagating magnetosonic waves. Because of the fresh supply of ions crossing the shock, this time-dependent process achieves a limit-cycle, in which the waves are launched with periodicity on the order of the ion Larmor time. Compressions in the magnetic field and pair density associated with these waves, as well as their propagation speed, semi-quantitatively reproduce the behavior of the wisp and ring features described in recent observations obtained using the Hubble Space Telescope and the Chandra X-Ray Observatory. By selecting the parameters of the ion orbits to fit the spatial separation of the wisps, we predict the period of time variability of the wisps that is consistent with the data. When coupled with a mechanism for non-thermal acceleration of the pairs, the compressions in the magnetic field and plasma density associated with the optical wisp structure naturally account for the location of X-ray features in the Crab. We also discuss the origin of the high energy ions and their acceleration in the equatorial current sheet of the pulsar wind.
Akbar, Noreen Sher; Abid, Syed Ali; Tripathi, Dharmendra; Mir, Nazir Ahmed
2017-03-01
The transport of single-wall carbon nanotube (CNT) nanofluids with temperature-dependent variable viscosity is analyzed by peristaltically driven flow. The main flow problem has been modeled using cylindrical coordinates and flow equations are simplified to ordinary differential equations using long wavelength and low Reynolds' number approximation. Analytical solutions have been obtained for axial velocity, pressure gradient and temperature. Results acquired are discussed graphically for better understanding. It is observed that with an increment in the Grashof number the velocity of the governing fluids starts to decrease significantly and the pressure gradient is higher for pure water as compared to single-walled carbon nanotubes due to low density. As the specific heat is very high for pure water as compared to the multi-wall carbon nanotubes, it raises temperature of the muscles, in the case of pure water, as compared to the multi-walled carbon nanotubes. Furthermore, it is noticed that the trapped bolus starts decreasing in size as the buoyancy forces are dominant as compared to viscous forces. This model may be applicable in biomedical engineering and nanotechnology to design the biomedical devices.
Directory of Open Access Journals (Sweden)
Matthias Schmid
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
Post-processing through linear regression
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
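To make the contrast between two of the listed schemes concrete, here is a minimal single-predictor sketch of OLS and geometric-mean (GM) regression used to correct a forecast. The data values are invented, and the multi-predictor and ensemble versions studied in the paper are not reproduced here.

```python
import math


def fit_line(forecasts, observations, method="ols"):
    """Fit observation = intercept + slope * forecast with OLS or GM regression."""
    n = len(forecasts)
    f_mean = sum(forecasts) / n
    o_mean = sum(observations) / n
    sxx = sum((f - f_mean) ** 2 for f in forecasts)
    syy = sum((o - o_mean) ** 2 for o in observations)
    sxy = sum((f - f_mean) * (o - o_mean) for f, o in zip(forecasts, observations))
    if method == "ols":
        slope = sxy / sxx
    elif method == "gm":
        # GM slope: ratio of standard deviations, signed like the covariance.
        slope = math.copysign(math.sqrt(syy / sxx), sxy)
    else:
        raise ValueError("method must be 'ols' or 'gm'")
    return slope, o_mean - slope * f_mean


def sum_of_squares(values):
    """Sum of squared deviations from the mean (variance up to a constant)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


# Invented toy data, for illustration only.
forecasts = [1.0, 2.0, 3.0, 4.0]
observations = [1.5, 1.0, 3.5, 3.0]

ols_slope, ols_intercept = fit_line(forecasts, observations, "ols")
gm_slope, gm_intercept = fit_line(forecasts, observations, "gm")
ols_corrected = [ols_intercept + ols_slope * f for f in forecasts]
gm_corrected = [gm_intercept + gm_slope * f for f in forecasts]
```

In this single-predictor setting the GM-corrected forecast has the same variability as the observations, whereas the OLS-corrected forecast is damped; this is the kind of trade-off the variability criterion in the abstract is concerned with.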
Post-processing through linear regression
Directory of Open Access Journals (Sweden)
B. Van Schaeybroeck
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.
These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Multinomial logistic regression in workers' health
Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana
2017-11-01
In European countries, namely in Portugal, it is common to hear some people mentioning that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as the Services sector. A representative sample was collected from a Portuguese Services' organization by applying an internationally validated survey whose variables were measured in five ordered categories on a Likert-type scale. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable, general health perception, where, among other independent variables, burnout appears as statistically significant.
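The category probabilities of a multinomial logistic model can be sketched as follows. This is a generic illustration, not the fitted model from the study: the predictor (a burnout score), the coefficients and the choice of reference category are all hypothetical.

```python
import math


def category_probabilities(predictors, coefficients):
    """Return one probability per category of a multinomial logit model.

    coefficients holds one (intercept, slopes) pair per non-reference
    category; the reference category has its linear predictor fixed at 0.
    """
    scores = [0.0]
    for intercept, slopes in coefficients:
        scores.append(intercept + sum(s * x for s, x in zip(slopes, predictors)))
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients for five health-perception categories
# (one reference category plus four others) and a single burnout score.
hypothetical_coefficients = [
    (0.2, [0.5]),
    (0.1, [0.9]),
    (-0.3, [1.2]),
    (-0.8, [1.6]),
]
probabilities = category_probabilities([1.5], hypothetical_coefficients)
uniform = category_probabilities([1.5], [(0.0, [0.0])] * 4)
```

With all coefficients set to zero every category is equally likely, which gives a quick sanity check on the implementation.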
Alaskan soil carbon stocks: spatial variability and dependence on environmental factors
Directory of Open Access Journals (Sweden)
U. Mishra
2012-09-01
The direction and magnitude of soil organic carbon (SOC) changes in response to climate change depend on the spatial and vertical distributions of SOC. We estimated spatially resolved SOC stocks from surface to C horizon, distinguishing active-layer and permafrost-layer stocks, based on geospatial analysis of 472 soil profiles and spatially referenced environmental variables for Alaska. Total Alaska state-wide SOC stock was estimated to be 77 Pg, with 61% in the active-layer, 27% in permafrost, and 12% in non-permafrost soils. Prediction accuracy was highest for the active-layer as demonstrated by highest ratio of performance to deviation (1.5). Large spatial variability was predicted, with whole-profile, active-layer, and permafrost-layer stocks ranging from 1–296 kg C m^{−2}, 2–166 kg m^{−2}, and 0–232 kg m^{−2}, respectively. Temperature and soil wetness were found to be primary controllers of whole-profile, active-layer, and permafrost-layer SOC stocks. Secondary controllers, in order of importance, were found to be land cover type, topographic attributes, and bedrock geology. The observed importance of soil wetness rather than precipitation on SOC stocks implies that the poor representation of high-latitude soil wetness in Earth system models may lead to large uncertainty in predicted SOC stocks under future climate change scenarios. Under strict caveats described in the text and assuming temperature changes from the A1B Intergovernmental Panel on Climate Change emissions scenario, our geospatial model indicates that the equilibrium average 2100 Alaska active-layer depth could deepen by 11 cm, resulting in a thawing of 13 Pg C currently in permafrost. The equilibrium SOC loss associated with this warming would be highest under continuous permafrost (31%), followed by discontinuous (28%), isolated (24.3%), and sporadic (23.6%) permafrost areas. Our high-resolution mapping of soil carbon stock reveals the
An improved multiple linear regression and data analysis computer program package
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
On Weighted Support Vector Regression
DEFF Research Database (Denmark)
Han, Xixuan; Clemmensen, Line Katrine Harder
2014-01-01
We propose a new type of weighted support vector regression (SVR), motivated by modeling local dependencies in time and space in prediction of house prices. The classic weights of the weighted SVR are added to the slack variables in the objective function (OF‐weights). This procedure directly...... shrinks the coefficient of each observation in the estimated functions; thus, it is widely used for minimizing influence of outliers. We propose to additionally add weights to the slack variables in the constraints (CF‐weights) and call the combination of weights the doubly weighted SVR. We illustrate...... the differences and similarities of the two types of weights by demonstrating the connection between the Least Absolute Shrinkage and Selection Operator (LASSO) and the SVR. We show that an SVR problem can be transformed to a LASSO problem plus a linear constraint and a box constraint. We demonstrate...
MIRU-VNTR allelic variability depends on Mycobacterium bovis clonal group identity.
Hauer, Amandine; Michelet, Lorraine; De Cruz, Krystel; Cochard, Thierry; Branger, Maxime; Karoui, Claudine; Henault, Sylvie; Biet, Franck; Boschiroli, María Laura
2016-11-01
The description of the population of M. bovis strains circulating in France from 1978 to 2013 has highlighted the discriminating power of the MLVA among predominant spoligotype groups. In the present study we aimed to characterize clonal groups via MLVA and to better understand the strain's population structure. MLVA was performed with eight MIRU-VNTR loci, most of them defined by the Venomyc European consortium. The discriminatory index (DI) of each MLVA locus was calculated for SB0120, SB0134, SB0121 and the "F4-family", the main spoligotype groups in France. Differences in global DI per spoligotype, but also by locus within each spoligotype, were observed, which strongly suggest the clonal complex nature of these major groups. These MLVA results were compared to those of other European countries where strain collections had been characterized (Spain, Portugal, Italy, Northern Ireland and Belgium). Overall, QUB 3232 and ETR D are respectively the most and the least discriminative loci, regardless of the strains' geographical origin. However, marked DI differences are observed in the rest of the MIRU-VNTR loci, again highlighting that strain genetic variability in a country depends on the dominant existing clonal complexes. A web application for M. bovis, including spoligotyping and MIRU-VNTR typing data, was developed to allow inter-laboratory comparison of field isolates. In conclusion, combination of typing methods is required for M. bovis optimum discrimination and differentiation of groups of strains. Thus, the loci employed for MLVA in a country should be those which are the most discriminative for the clonal complexes which characterize their M. bovis population.
Alexeeff, Stacey E; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A
2015-01-01
Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km × 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R(2) yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with >0.9 out-of-sample R(2) yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the SEs. Land use regression models performed better in chronic effect simulations. These results can help researchers when interpreting health effect estimates in these types of studies.
Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun
2016-07-01
In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study.
Ohlsson, Henrik; Merlo, Juan
2009-08-01
Therapeutic traditions at health care practices (HCPs) influence physicians' adherence to prescription guidelines for specific drugs; however, it is not known whether such traditions affect all kinds of prescriptions or only specific types of drug. Our goal was to determine whether adherence to prescription guidelines is a common trait of HCPs or dependent on drug type. We fitted separate multi-level logistic regression models to all patients in the Skåne region who received a prescription for a statin drug (ATC: C10AA, n = 6232), an agent acting on the renin-angiotensin system (ATC: C09, n = 7222) or a proton pump inhibitor (ATC: A02BC, n = 11 563) at 198 HCPs from July 2006 to December 2006. There was a high clustering of adherence to prescription guidelines at HCPs for the different drug types, expressed as median odds ratios (MOR): MOR(agents acting on the renin-angiotensin system) = 4.72 [95% CI: 3.90-5.92], MOR(statins) = 2.71 [95% CI: 2.23-3.39] and MOR(proton pump inhibitors) = 2.16 [95% CI: 1.95-2.45]. Compared with HCPs with low adherence to guidelines in two drug types, those HCPs with the highest level of adherence for these two drug types also showed a higher probability of adherence for the third drug type. Physicians' decisions to follow prescription guidelines seem to be influenced by therapeutic traditions at the HCP. Moreover, these therapeutic traditions seem to affect all kinds of prescriptions. This information can be used as a basis for interventions to support rational and cost-effective medication use.
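For readers unfamiliar with the MOR, the sketch below uses its usual definition in a random-intercept logistic model, MOR = exp(sqrt(2 * V) * z), where V is the between-cluster variance on the logit scale and z is the 75th percentile of the standard normal distribution. The abstract does not report V, so the back-calculation from the published MOR values is only an illustration, and the function names are assumptions.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745).
Z75 = NormalDist().inv_cdf(0.75)


def median_odds_ratio(cluster_variance):
    """Median odds ratio implied by a between-cluster variance on the logit scale."""
    return math.exp(math.sqrt(2.0 * cluster_variance) * Z75)


def cluster_variance_from_mor(mor):
    """Invert the MOR formula to recover the implied between-cluster variance."""
    return (math.log(mor) / (math.sqrt(2.0) * Z75)) ** 2


# MOR point estimates quoted in the abstract above.
reported = {"renin-angiotensin agents": 4.72, "statins": 2.71, "proton pump inhibitors": 2.16}
implied_variance = {drug: cluster_variance_from_mor(m) for drug, m in reported.items()}
```

An MOR of 1 corresponds to no variation between practices, so larger values indicate stronger clustering of adherence within HCPs.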
International Nuclear Information System (INIS)
Cheah, C Y; Jaurigue, L C; Kaiser, A B; Gómez-Navarro, C
2013-01-01
We report an analysis of low-temperature measurements of the conductance of partially disordered reduced graphene oxide, finding that the data follow a simple crossover scenario. At room temperature, the conductance is dominated by two-dimensional (2D) electric field-assisted, thermally driven (Pollak–Riess) variable-range hopping (VRH) through highly disordered regions. However, at lower temperatures $T$, we find a smooth crossover to the $\exp[-(E_0/E)^{1/3}]$ field-driven (Shklovskii) 2D VRH conductance behaviour when the electric field $E$ exceeds a specific crossover value $E_C(T)_{2D} = (E_a E_0^{1/3}/3)^{3/4}$ determined by the scale factors $E_0$ and $E_a$ for the high-field and intermediate-field regimes respectively. Our crossover scenario also accounts well for experimental data reported by other authors for three-dimensional disordered carbon networks, suggesting wide applicability. (paper)
Atang, Christopher I.
The effects of black and white and color illustrations on student achievement were studied to investigate the relationships between cognitive styles and instructional design. Field dependence (FD) and field independence (FI) were chosen as the cognitive style variables. Subjects were 85 freshman students in the Iowa State University Psychology…
Time-Dependent Drug Administration in Hypertension and its Effect on Blood Pressure Variability
Directory of Open Access Journals (Sweden)
Magdás Annamária
2017-06-01
Background: Optimizing blood pressure variability seems to represent a new therapeutic target in the management of hypertension. It is emphasized that scheduling at least one antihypertensive agent at bedtime has the ability to reduce blood pressure.
Time-Dependent Drug Administration in Hypertension and its Effect on Blood Pressure Variability
Magdás Annamária; Podoleanu Cristian; Tusa Anna-Boróka; Găburoi Adina; Incze Alexandru
2017-01-01
Background: Optimizing blood pressure variability seems to represent a new therapeutic target in the management of hypertension. It is emphasized that scheduling at least one antihypertensive agent at bedtime has the ability to reduce blood pressure.
From Rasch scores to regression
DEFF Research Database (Denmark)
Christensen, Karl Bang
2006-01-01
Rasch models provide a framework for measurement and modelling latent variables. Having measured a latent variable in a population a comparison of groups will often be of interest. For this purpose the use of observed raw scores will often be inadequate because these lack interval scale propertie....... This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables and latent regression models based on the distribution of the score....
Xianyu, J.; Rasouli, S.; Timmermans, H.J.P.
The use of GPS devices and smartphones has made feasible the collection of multi-day activity-travel diaries. In turn, the availability of multi-day travel diary data opens up new avenues for analyzing dynamics of individual travel behavior. This paper addresses the issue of day-to-day variability
Palacios, C.; Abecia, J. A.
2015-05-01
A total number of 48,088 artificial inseminations (AIs) have been controlled during seven consecutive years in 79 dairy sheep Spanish farms (41° N). Mean, maximum and minimum ambient temperatures ( Ts), temperature amplitude (TA), mean relative humidity (RH), mean solar radiation (SR) and total rainfall of each insemination day and 15 days later were recorded. Temperature-humidity index (THI) and effective temperature (ET) have been calculated. A binary logistic regression model to estimate the risk of not getting pregnant compared to getting pregnant, through the odds ratio (OR), was performed. Successful winter inseminations were carried out under higher SR ( P 1 (maximum T, ET and rainfall on AI day, and ET and rainfall on day 15), and two variables presented OR AI day and maximum T on day 15). However, the effect of meteorological factors affected fertility in opposite ways, so T becomes a protective or risk factor on fertility depending on season. In conclusion, the percentage of pregnancy after AI in sheep is significantly affected by meteorological variables in a seasonal-dependent manner, so the parameters such as temperature reverse their effects in the hot or cold seasons. A forecast of the meteorological conditions could be a useful tool when AI dates are being scheduled.
Gross, Samuel M; Tibshirani, Robert
2015-04-01
We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with these types of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation analysis (sparse sCCA), a specific case of sparse mCCA when one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem that can be implemented with off-the-shelf solvers, and illustrate their use on simulated and real data.
Directory of Open Access Journals (Sweden)
Cengiz AKTAŞ
2005-06-01
In this study, we investigate the importance of tourism for the Turkish economy and define the optimum variables that affect tourism revenues. In this type of econometric study, which requires multiple regression models, one of the problems in parameter estimation is stationarity in time series. Therefore, the suitability of the series for a long-run relationship is analyzed. Finally, autocorrelation, multicollinearity and heteroscedasticity are investigated.
The Variability of Atmospheric Deuterium Brightness at Mars: Evidence for Seasonal Dependence
Mayyasi, Majd; Clarke, John; Bhattacharyya, Dolon; Deighan, Justin; Jain, Sonal; Chaffin, Michael; Thiemann, Edward; Schneider, Nick; Jakosky, Bruce
2017-10-01
The enhanced ratio of deuterium to hydrogen on Mars has been widely interpreted as indicating the loss of a large column of water into space, and the hydrogen content of the upper atmosphere is now known to be highly variable. The variation in the properties of both deuterium and hydrogen in the upper atmosphere of Mars is indicative of the dynamical processes that produce these species and propagate them to altitudes where they can escape the planet. Understanding the seasonal variability of D is key to understanding the variability of the escape rate of water from Mars. Data from a 15 month observing campaign, made by the Mars Atmosphere and Volatile Evolution Imaging Ultraviolet Spectrograph high-resolution echelle channel, are used to determine the brightness of deuterium as observed at the limb of Mars. The D emission is highly variable, with a peak in brightness just after southern summer solstice. The trends of D brightness are examined against extrinsic as well as intrinsic sources. It is found that the fluctuations in deuterium brightness in the upper atmosphere of Mars (up to 400 km), corrected for periodic solar variations, vary on timescales that are similar to those of water vapor fluctuations lower in the atmosphere (20-80 km). The observed variability in deuterium may be attributed to seasonal factors such as regional dust storm activity and subsequent circulation lower in the atmosphere.
bayesQR: A Bayesian Approach to Quantile Regression
Directory of Open Access Journals (Sweden)
Dries F. Benoit
2017-01-01
After its introduction by Koenker and Bassett (1978), quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for the typical quantile regression with continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso approach is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots, posterior histograms and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of this paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.
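The link between quantile regression and the asymmetric Laplace distribution rests on the check (pinball) loss: the asymmetric Laplace log-density is, up to constants, the negative check loss, so maximizing that likelihood is equivalent to minimizing the loss. The sketch below illustrates this for the simplest case of an intercept-only model in Python; it does not use or reproduce the bayesQR API, and the data are invented.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for quantile level tau in (0, 1)."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Return the data point that minimizes the total check loss.

    For an intercept-only model this minimizer is a sample quantile.
    """
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))


# Invented toy data with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(data, 0.5)
upper_fit = best_constant(data, 0.9)
```

At tau = 0.5 the minimizer is the sample median, which is unaffected by the outlier; at tau = 0.9 positive residuals are penalized nine times more heavily than negative ones, which pulls the fit toward the upper tail.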
Ghavami, Raoof; Najafi, Amir; Sajadi, Mohammad; Djannaty, Farhad
2008-09-01
In order to accurately simulate (13)C NMR spectra of hydroxy, polyhydroxy and methoxy substituted flavonoid a quantitative structure-property relationship (QSPR) model, relating atom-based calculated descriptors to (13)C NMR chemical shifts (ppm, TMS=0), is developed. A dataset consisting of 50 flavonoid derivatives was employed for the present analysis. A set of 417 topological, geometrical, and electronic descriptors representing various structural characteristics was calculated and separate multilinear QSPR models were developed between each carbon atom of flavonoid and the calculated descriptors. Genetic algorithm (GA) and multiple linear regression analysis (MLRA) were used to select the descriptors and to generate the correlation models. Analysis of the results revealed a correlation coefficient and root mean square error (RMSE) of 0.994 and 2.53ppm, respectively, for the prediction set.
Liao, Jiaqiang; Yu, Shicheng; Yang, Fang; Yang, Min; Hu, Yuehua; Zhang, Juying
2016-01-01
Hand, Foot, and Mouth Disease (HFMD) is a worldwide infectious disease. In China, many provinces have reported HFMD cases, especially the south and southwest provinces. Many studies have found a strong association between the incidence of HFMD and climatic factors such as temperature, rainfall, and relative humidity. However, few studies have analyzed cluster effects between various geographical units. The nonlinear relationships and lag effects between weekly HFMD cases and climatic variables were estimated for the period of 2008-2013 using a polynomial distributed lag model. The extra-Poisson multilevel spatial polynomial model was used to model the exact relationship between weekly HFMD incidence and climatic variables after considering cluster effects, provincial correlated structure of HFMD incidence and overdispersion. The smoothing spline methods were used to detect threshold effects between climatic factors and HFMD incidence. The HFMD incidence spatial heterogeneity distributed among provinces, and the scale measurement of overdispersion was 548.077. After controlling for long-term trends, spatial heterogeneity and overdispersion, temperature was highly associated with HFMD incidence. Weekly average temperature and weekly temperature difference approximate inverse "V" shape and "V" shape relationships associated with HFMD incidence. The lag effects for weekly average temperature and weekly temperature difference were 3 weeks and 2 weeks. High spatial correlated HFMD incidence were detected in northern, central and southern province. Temperature can be used to explain most of variation of HFMD incidence in southern and northeastern provinces. After adjustment for temperature, eastern and Northern provinces still had high variation HFMD incidence. We found a relatively strong association between weekly HFMD incidence and weekly average temperature. The association between the HFMD incidence and climatic variables spatial heterogeneity distributed across
Yang, Fang; Yang, Min; Hu, Yuehua; Zhang, Juying
2016-01-01
Background Hand, Foot, and Mouth Disease (HFMD) is a worldwide infectious disease. In China, many provinces have reported HFMD cases, especially the south and southwest provinces. Many studies have found a strong association between the incidence of HFMD and climatic factors such as temperature, rainfall, and relative humidity. However, few studies have analyzed cluster effects between various geographical units. Methods The nonlinear relationships and lag effects between weekly HFMD cases and climatic variables were estimated for the period of 2008–2013 using a polynomial distributed lag model. The extra-Poisson multilevel spatial polynomial model was used to model the exact relationship between weekly HFMD incidence and climatic variables after considering cluster effects, provincial correlated structure of HFMD incidence and overdispersion. The smoothing spline methods were used to detect threshold effects between climatic factors and HFMD incidence. Results The HFMD incidence spatial heterogeneity distributed among provinces, and the scale measurement of overdispersion was 548.077. After controlling for long-term trends, spatial heterogeneity and overdispersion, temperature was highly associated with HFMD incidence. Weekly average temperature and weekly temperature difference approximate inverse “V” shape and “V” shape relationships associated with HFMD incidence. The lag effects for weekly average temperature and weekly temperature difference were 3 weeks and 2 weeks. High spatial correlated HFMD incidence were detected in northern, central and southern province. Temperature can be used to explain most of variation of HFMD incidence in southern and northeastern provinces. After adjustment for temperature, eastern and Northern provinces still had high variation HFMD incidence. Conclusion We found a relatively strong association between weekly HFMD incidence and weekly average temperature. The association between the HFMD incidence and climatic
On history dependence of stress-strain diagrams and creep curves under variable repeated loading
International Nuclear Information System (INIS)
Gokhfeld, D.A.; Sadakov, O.S.; Martynenko, M.E.
1979-01-01
The ability of structural alloys to 'keep in memory' the loading prehistory becomes of special importance when inelastic variable repeated loading is considered. There are two main approaches to the development of the mathematical description of this phenomenon: the inclusion of hidden state variables in the incremental theory constitutive equations (a) and construction of proper hereditary functionals (b). In this respect the assumption that the 'memory' regarding the previous deformation history is due to structural nonhomogeneity of actual materials proves to be fruitful. (orig.)
Assessment of deforestation using regression; Hodnotenie odlesnenia s vyuzitim regresie
Energy Technology Data Exchange (ETDEWEB)
Juristova, J. [Univerzita Komenskeho, Prirodovedecka fakulta, Katedra kartografie, geoinformatiky a DPZ, 84215 Bratislava (Slovakia)
2013-04-16
This work is devoted to the evaluation of deforestation using regression methods in the Idrisi Taiga software. Deforestation is evaluated by logistic regression. The dependent variable takes the discrete values '0' and '1', indicating whether or not deforestation occurred. The independent variables have continuous values, expressing the distance from the edge of the deforested forest areas to urban areas, rivers and the road network. The results were also used in predicting the probability of deforestation in subsequent periods. The result is a map showing the output probability of deforestation for the periods 1990/2000 and 2000/2006 in accordance with predetermined coefficients (values of independent variables). (authors)
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Distance and Azimuthal Dependence of Ground‐Motion Variability for Unilateral Strike‐Slip Ruptures
Vyas, Jagdish Chandra; Mai, Paul Martin; Galis, Martin
2016-01-01
We investigate near‐field ground‐motion variability by computing the seismic wavefield for five kinematic unilateral‐rupture models of the 1992 Mw 7.3 Landers earthquake, eight simplified unilateral‐rupture models based on the Landers event, and a
Rand, Miya K; Shimansky, Y P; Hossain, Abul B M I; Stelmach, George E
2010-11-01
Based on an assumption of movement control optimality in reach-to-grasp movements, we have recently developed a mathematical model of transport-aperture coordination (TAC), according to which the hand-target distance is a function of hand velocity and acceleration, aperture magnitude, and aperture velocity and acceleration (Rand et al. in Exp Brain Res 188:263-274, 2008). Reach-to-grasp movements were performed by young adults under four different reaching speeds and two different transport distances. The residual error magnitude of fitting the above model to data across different trials and subjects was minimal for the aperture-closure phase, but relatively much greater for the aperture-opening phase, indicating considerable difference in TAC variability between those phases. This study's goal is to identify the main reasons for that difference and obtain insights into the control strategy of reach-to-grasp movements. TAC variability within the aperture-opening phase of a single trial was found minimal, indicating that TAC variability between trials was not due to execution noise, but rather a result of inter-trial and inter-subject variability of motor plan. At the same time, the dependence of the extent of trial-to-trial variability of TAC in that phase on the speed of hand transport was sharply inconsistent with the concept of speed-accuracy trade-off: the lower the speed, the larger the variability. Conversely, the dependence of the extent of TAC variability in the aperture-closure phase on hand transport speed was consistent with that concept. Taking into account recent evidence that the cost of neural information processing is substantial for movement planning, the dependence of TAC variability in the aperture-opening phase on task performance conditions suggests that it is not the movement time that the CNS saves in that phase, but the cost of neuro-computational resources and metabolic energy required for TAC regulation in that phase. Thus, the CNS
Directory of Open Access Journals (Sweden)
Feng Zhong-xiang
2014-01-01
In order to build a combined model which can meet the variation rule of death toll data for road traffic accidents and can reflect the influence of multiple factors on traffic accidents and improve prediction accuracy for accidents, the Verhulst model was built based on the number of death tolls for road traffic accidents in China from 2002 to 2011; and car ownership, population, GDP, highway freight volume, highway passenger transportation volume, and highway mileage were chosen as the factors to build the death toll multivariate linear regression model. Then the two models were combined to be a combined prediction model which has weight coefficient. Shapley value method was applied to calculate the weight coefficient by assessing contributions. Finally, the combined model was used to recalculate the number of death tolls from 2002 to 2011, and the combined model was compared with the Verhulst and multivariate linear regression models. The results showed that the new model could not only characterize the death toll data characteristics but also quantify the degree of influence to the death toll by each influencing factor and had high accuracy as well as strong practicability.
Feng, Zhong-xiang; Lu, Shi-sheng; Zhang, Wei-hua; Zhang, Nan-nan
2014-01-01
In order to build a combined model which can meet the variation rule of death toll data for road traffic accidents and can reflect the influence of multiple factors on traffic accidents and improve prediction accuracy for accidents, the Verhulst model was built based on the number of death tolls for road traffic accidents in China from 2002 to 2011; and car ownership, population, GDP, highway freight volume, highway passenger transportation volume, and highway mileage were chosen as the factors to build the death toll multivariate linear regression model. Then the two models were combined to be a combined prediction model which has weight coefficient. Shapley value method was applied to calculate the weight coefficient by assessing contributions. Finally, the combined model was used to recalculate the number of death tolls from 2002 to 2011, and the combined model was compared with the Verhulst and multivariate linear regression models. The results showed that the new model could not only characterize the death toll data characteristics but also quantify the degree of influence to the death toll by each influencing factor and had high accuracy as well as strong practicability.
Tripepi, Giovanni; Jager, Kitty J.; Stel, Vianda S.; Dekker, Friedo W.; Zoccali, Carmine
2011-01-01
Because of some limitations of stratification methods, epidemiologists frequently use multiple linear and logistic regression analyses to address specific epidemiological questions. If the dependent variable is a continuous one (for example, systolic pressure and serum creatinine), the researcher
Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M
2018-05-11
The justified, continuous emergence of new β-lactam antibiotics creates a need to develop suitable analytical methods that accelerate and facilitate their analysis. A face-centered central composite experimental design was adopted using different levels of phosphate buffer pH, acetonitrile percentage at zero time and after 15 min in a gradient program to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models utilizing the conventional forward selection and the advanced nature-inspired firefly algorithm for descriptor selection, coupled with multiple linear regression. The obtained models showed high performance in both internal and external validation, indicating their robustness and predictive ability. The Williams-Hotelling test and Student's t-test showed no statistically significant difference between the models' results. Y-randomization validation showed that the obtained models are due to significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models show comparable quality on both the training and validation levels. They also gave comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We can conclude that in some cases a simple conventional feature selection algorithm can be used to generate robust and predictive models comparable to those generated using advanced ones. Copyright © 2018 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
Olga Yu. Aleshkina
2017-05-01
Results and Conclusion ― The highest altitude was marked at levels of incisors and 3rd molar, the smallest one – at level of 1st and 2nd molars; maximum mandible thickness was defined at level of 2nd molar, minimum – at levels of canine and 1st – 2nd premolars on both sides of mandible; average thickness was revealed at levels of incisors, 1st and 2nd molars and had the same statistical values. Bilateral variability of thickness was significantly dominating on the right side and only at levels of 1st – 2nd premolars and 1st molar. Average values of altitude and thickness from both sides of mandible and at all levels had medium degree of variability.
BOX-COX REGRESSION METHOD IN TIME SCALING
Directory of Open Access Journals (Sweden)
ATİLLA GÖKTAŞ
2013-06-01
The Box-Cox regression method with power transformation λj, for j = 1, 2, ..., k, can be used when the dependent variable and error term of the linear regression model do not satisfy the continuity and normality assumptions. The case in which the optimum power transformation λj, for j = 1, 2, ..., k, of Y yields the smallest mean square error is discussed. The Box-Cox regression method is especially appropriate for adjusting for skewness or heteroscedasticity of the error terms in a nonlinear functional relationship between the dependent and explanatory variables. In this study, the advantages and disadvantages of the Box-Cox regression method are discussed in the differentiation and differential analysis of the time scale concept.
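To make the power transformation concrete, here is a minimal sketch of choosing a single Box-Cox parameter λ for the response of a simple linear regression by grid search over the profile log-likelihood. This is a simplification and an assumption on our part: the abstract refers to several λj and to a mean-square-error criterion, whereas the sketch uses one λ, one predictor, and the standard likelihood criterion. The function names and the synthetic data are hypothetical.

```python
import math
import random


def boxcox(y, lam):
    """Box-Cox power transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]  # limiting case lambda -> 0
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(x, y, lam):
    """Profile log-likelihood of lambda for the model z(lambda) = a + b*x + error."""
    n = len(y)
    z = boxcox(y, lam)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    rss = sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))
    # The Jacobian term (lam - 1) * sum(log y) makes values comparable across lambdas.
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss / n) + jacobian


def best_lambda(x, y, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))


# Synthetic data (assumption): log(y) is linear in x, so lambda near 0 is expected.
random.seed(1)
x = [random.uniform(0.0, 3.0) for _ in range(200)]
y = [math.exp(0.5 + 0.8 * xi + random.gauss(0.0, 0.3)) for xi in x]

grid = [round(-2.0 + 0.1 * i, 1) for i in range(41)]  # -2.0, -1.9, ..., 2.0
lam_hat = best_lambda(x, y, grid)
print("selected lambda:", lam_hat)
```

The transform requires strictly positive responses; data containing zeros or negative values would need a shift or a different transformation family.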
Scale-dependent spatial variability in peatland lead pollution in the southern Pennines, UK
International Nuclear Information System (INIS)
Rothwell, James J.; Evans, Martin G.; Lindsay, John B.; Allott, Timothy E.H.
2007-01-01
Increasingly, within-site and regional comparisons of peatland lead pollution have been undertaken using the inventory approach. The peatlands of the Peak District, southern Pennines, UK, have received significant atmospheric inputs of lead over the last few hundred years. A multi-core study at three peatland sites in the Peak District demonstrates significant within-site spatial variability in industrial lead pollution. Stochastic simulations reveal that 15 peat cores are required to calculate reliable lead inventories at the within-site and within-region scale for this highly polluted area of the southern Pennines. Within-site variability in lead pollution is dominant at the within-region scale. The study demonstrates that significant errors may be associated with peatland lead inventories at sites where only a single peat core has been used to calculate an inventory. Meaningful comparisons of lead inventories at the regional or global scale can only be made if the within-site variability of lead pollution has been quantified reliably. - Multiple peat cores are required for accurate peatland Pb inventories
Scale-dependent spatial variability in peatland lead pollution in the southern Pennines, UK.
Rothwell, James J; Evans, Martin G; Lindsay, John B; Allott, Timothy E H
2007-01-01
Increasingly, within-site and regional comparisons of peatland lead pollution have been undertaken using the inventory approach. The peatlands of the Peak District, southern Pennines, UK, have received significant atmospheric inputs of lead over the last few hundred years. A multi-core study at three peatland sites in the Peak District demonstrates significant within-site spatial variability in industrial lead pollution. Stochastic simulations reveal that 15 peat cores are required to calculate reliable lead inventories at the within-site and within-region scale for this highly polluted area of the southern Pennines. Within-site variability in lead pollution is dominant at the within-region scale. The study demonstrates that significant errors may be associated with peatland lead inventories at sites where only a single peat core has been used to calculate an inventory. Meaningful comparisons of lead inventories at the regional or global scale can only be made if the within-site variability of lead pollution has been quantified reliably.
McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron
2017-02-01
Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.
Identifying Midshipmen for Academic Assistance Using Entry Variables
National Research Council Canada - National Science Library
Watson, Arthur
2001-01-01
.... Categorical values from the Learning and Study Strategies Inventory (LASSI), SAT scores and high school rank were incorporated as independent variables in a linear regression model with dependent variable Cumulative Quality Point Rating (CQPR...
Scarduelli, Lucia; Giacchini, Roberto; Parenti, Paolo; Migliorati, Sonia; Di Brisco, Agnese Maria; Vighi, Marco
2017-11-01
Biomarkers are widely used in ecotoxicology as indicators of exposure to toxicants. However, their ability to provide ecologically relevant information remains controversial. One of the major problems is understanding whether the measured responses are determined by stress factors or lie within the natural variability range. In a previous work, the natural variability of enzymatic levels in invertebrates sampled in pristine rivers was proven to be relevant across both space and time. In the present study, the experimental design was improved by considering different life stages of the selected taxa and by measuring more environmental parameters. The experimental design considered sampling sites in 2 different rivers, 8 sampling dates covering the whole seasonal cycle, 4 species from 3 different taxonomic groups (Plecoptera, Perla grandis; Ephemeroptera, Baetis alpinus and Epeorus alpicula; Trichoptera, Hydropsyche pellucidula), different life stages for each species, and 4 enzymes (acetylcholinesterase, glutathione S-transferase, alkaline phosphatase, and catalase). Biomarker levels were related to environmental (physicochemical) parameters to verify any kind of dependence. Data were statistically analyzed using hierarchical multilevel Bayesian models. Natural variability was found to be relevant across both space and time. The results of the present study proved that care should be taken when interpreting biomarker results. Further research is needed to better understand the dependence of the natural variability on environmental parameters. Environ Toxicol Chem 2017;36:3158-3167. © 2017 SETAC.
Guo, A.; Wang, Y.
2017-12-01
Investigating variability in dependence structures of hydrological processes is of critical importance for developing an understanding of mechanisms of hydrological cycles in changing environments. In focusing on this topic, the present work involves the following: (1) identifying and eliminating serial correlation and conditional heteroscedasticity in monthly streamflow (Q), precipitation (P) and potential evapotranspiration (PE) series using the ARMA-GARCH model (ARMA: autoregressive moving average; GARCH: generalized autoregressive conditional heteroscedasticity); (2) describing dependence structures of hydrological processes using partial copula coupled with the ARMA-GARCH model and identifying their variability via a copula-based likelihood-ratio test method; and (3) determining the conditional probability of annual Q under different climate scenarios on account of the above results. This framework enables us to depict hydrological variables in the presence of conditional heteroscedasticity and to examine dependence structures of hydrological processes while excluding the influence of covariates by using the partial copula-based ARMA-GARCH model. Eight major catchments across the Loess Plateau (LP) are used as study regions. Results indicate that (1) the occurrence of change points in dependence structures of Q and P (PE) varies across the LP. (2) Change points of P-PE dependence structures in all regions almost fully correspond to the initiation of global warming, i.e., the early 1980s. (3) Conditional probabilities of annual Q under various P and PE scenarios are estimated from the 3-dimensional joint distribution of (Q, P and PE) based on the above change points. These findings shed light on mechanisms of the hydrological cycle and can guide water supply planning and management, particularly in changing environments.
A Predictive Logistic Regression Model of World Conflict Using Open Source Data
2015-03-26
No correlation between the error terms and the independent variables 9. Absence of perfect multicollinearity (Menard, 2001) When assumptions are...some of the variables before initial model building. Multicollinearity, or near-linear dependence among the variables will cause problems in the...model. High multicollinearity tends to produce unreasonably high logistic regression coefficients and can result in coefficients that are not
Energy Technology Data Exchange (ETDEWEB)
Kumar, V; Mukherjee, S [Cornell Univ., Ithaca, N.Y. (USA)
1977-03-01
A computational technique in terms of stress, strain and displacement rates is presented for the solution of boundary value problems for metallic structural elements at uniform elevated temperatures subjected to time varying loads. This method can accommodate any number of constitutive relations with state variables recently proposed by other researchers to model the inelastic deformation of metallic media at elevated temperatures. Numerical solutions are obtained for several structural elements subjected to steady loads. The constitutive relations used for these numerical solutions are due to Hart. The solutions are discussed in the context of the computational scheme and Hart's theory.
THE DIFFERENCES IN MORAL, GROUP IDENTITY AND THE PERSON’S VARIABILITY DEPENDING ON THE EDUCATION
Directory of Open Access Journals (Sweden)
Irina Aleksandrobna Kolinichenko
2017-06-01
Results. The study revealed that the group of test subjects with incomplete higher education dominated on all specified assessment parameters, showing a higher level of moral development in all dilemmas: the opposition of life values (compassion) and following the law; self-interest versus the interests of the city (law); business (benefit) and law; and personal interests (career) and the freedom of another person. The exception was the dilemma opposing the interests of a majority and those of a single person. Differences were also found between the two groups of test subjects in group identity, group variability, and the desirability of the common categories of identity.
Crown, William H
2014-02-01
This paper examines the use of propensity score matching in economic analyses of observational data. Several excellent papers have previously reviewed practical aspects of propensity score estimation and other aspects of the propensity score literature. The purpose of this paper is to compare the conceptual foundation of propensity score models with alternative estimators of treatment effects. References are provided to empirical comparisons among methods that have appeared in the literature. These comparisons are available for a subset of the methods considered in this paper. However, in some cases, no pairwise comparisons of particular methods are yet available, and there are no examples of comparisons across all of the methods surveyed here. Irrespective of the availability of empirical comparisons, the goal of this paper is to provide some intuition about the relative merits of alternative estimators in health economic evaluations where nonlinearity, sample size, availability of pre/post data, heterogeneity, and missing variables can have important implications for choice of methodology. Also considered is the potential combination of propensity score matching with alternative methods such as differences-in-differences and decomposition methods that have not yet appeared in the empirical literature.
Dependence of conductivity on thickness within the variable-range hopping regime for Coulomb glasses
Directory of Open Access Journals (Sweden)
M. Caravaca
In this paper, we provide some computational evidence concerning the dependence of conductivity on the system thickness for Coulomb glasses. We also verify the Efros–Shklovskii law and deal with the calculation of its characteristic parameter as a function of the thickness. Our results strengthen the link between theoretical and experimental fields. Keywords: Coulomb glass, Conductivity, Density of states, Efros–Shklovskii law
Temperature-dependent behaviours are genetically variable in the nematode Caenorhabditis briggsae.
Stegeman, Gregory W; de Mesquita, Matthew Bueno; Ryu, William S; Cutter, Asher D
2013-03-01
Temperature-dependent behaviours in Caenorhabditis elegans, such as thermotaxis and isothermal tracking, are complex behavioural responses that integrate sensation, foraging and learning, and have driven investigations to discover many essential genetic and neural pathways. The ease of manipulation of the Caenorhabditis model system also has encouraged its application to comparative analyses of phenotypic evolution, particularly contrasts of the classic model C. elegans with C. briggsae. And yet few studies have investigated natural genetic variation in behaviour in any nematode. Here we measure thermotaxis and isothermal tracking behaviour in genetically distinct strains of C. briggsae, further motivated by the latitudinal differentiation in C. briggsae that is associated with temperature-dependent fitness differences in this species. We demonstrate that C. briggsae performs thermotaxis and isothermal tracking largely similar to that of C. elegans, with a tendency to prefer its rearing temperature. Comparisons of these behaviours among strains reveal substantial heritable natural variation within each species that corresponds to three general patterns of behavioural response. However, intraspecific genetic differences in thermal behaviour often exceed interspecific differences. These patterns of temperature-dependent behaviour motivate further development of C. briggsae as a model system for dissecting the genetic underpinnings of complex behavioural traits.
Lu, Zeqin; Jhoja, Jaspreet; Klein, Jackson; Wang, Xu; Liu, Amy; Flueckiger, Jonas; Pond, James; Chrostowski, Lukas
2017-05-01
This work develops an enhanced Monte Carlo (MC) simulation methodology to predict the impacts of layout-dependent correlated manufacturing variations on the performance of photonics integrated circuits (PICs). First, to enable such performance prediction, we demonstrate a simple method with sub-nanometer accuracy to characterize photonics manufacturing variations, where the width and height for a fabricated waveguide can be extracted from the spectral response of a racetrack resonator. By measuring the spectral responses for a large number of identical resonators spread over a wafer, statistical results for the variations of waveguide width and height can be obtained. Second, we develop models for the layout-dependent enhanced MC simulation. Our models use netlist extraction to transfer physical layouts into circuit simulators. Spatially correlated physical variations across the PICs are simulated on a discrete grid and are mapped to each circuit component, so that the performance for each component can be updated according to its obtained variations, and therefore, circuit simulations take the correlated variations between components into account. The simulation flow and theoretical models for our layout-dependent enhanced MC simulation are detailed in this paper. As examples, several ring-resonator filter circuits are studied using the developed enhanced MC simulation, and statistical results from the simulations can predict both common-mode and differential-mode variations of the circuit performance.
Logistic Regression: Concept and Application
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Nonparametric instrumental regression with non-convex constraints
International Nuclear Information System (INIS)
Grasmair, M; Scherzer, O; Vanhems, A
2013-01-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition. (paper)
Nonparametric instrumental regression with non-convex constraints
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Saddlepoint expansions for sums of Markov dependent variables on a continuous state space
DEFF Research Database (Denmark)
Jensen, J.L.
1991-01-01
Based on the conjugate kernel studied in Iscoe et al. (1985) we derive saddlepoint expansions for either the density or distribution function of a sum f(X1)+...+f(Xn), where the Xi's constitute a Markov chain. The chain is assumed to satisfy a strong recurrence condition which makes the results...... here very similar to the classical results for i.i.d. variables. In particular we establish also conditions under which the expansions hold uniformly over the range of the saddlepoint. Expansions are also derived for sums of the form f(X1, X0)+f(X2, X1)+...+f(Xn, Xn-1) although the uniformity result...
Linear regression in astronomy. I
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
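The five slope estimators named in this abstract can be written in terms of the two OLS slopes. The sketch below uses the closed-form expressions as they are commonly quoted for these estimators; it omits the uncertainty formulas, and the function name `five_slopes` and the sample data are assumptions made for illustration.

```python
import math


def five_slopes(x, y):
    """Return {method: (slope, intercept)} for five bivariate regression lines.

    Assumes len(x) == len(y) >= 2, non-constant x, and non-zero covariance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = math.copysign(1.0, sxy)

    b1 = sxy / sxx  # OLS regression of Y on X
    b2 = syy / sxy  # OLS regression of X on Y, expressed as dy/dx
    # Line bisecting the angle between the two OLS lines.
    b3 = (b1 * b2 - 1.0 + math.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    # Orthogonal regression: minimises perpendicular distances to the line.
    d = b2 - 1.0 / b1
    b4 = 0.5 * (d + sign * math.sqrt(4.0 + d * d))
    # Reduced major axis: geometric mean of the two OLS slopes.
    b5 = sign * math.sqrt(b1 * b2)

    slopes = {
        "OLS(Y|X)": b1,
        "OLS(X|Y)": b2,
        "OLS bisector": b3,
        "orthogonal": b4,
        "reduced major axis": b5,
    }
    # Every line passes through the centroid (mx, my).
    return {name: (b, my - b * mx) for name, b in slopes.items()}


# Made-up sample data with scatter in both coordinates.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
fits = five_slopes(xs, ys)
for name, (slope, intercept) in fits.items():
    print(f"{name:20s} slope={slope:.4f} intercept={intercept:.4f}")
```

For perfectly collinear data all five estimators coincide; with scatter, the bisector and reduced major-axis slopes lie between the two OLS slopes, which is one way to sanity-check an implementation.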
Directory of Open Access Journals (Sweden)
Mitali Sarkar
2017-01-01
Recently, a major trend has been to redesign production systems by controlling the production rate, or letting it vary within a fixed interval, to maintain the optimal level. This strategy is more effective when the holding cost is time-dependent, as it is interrelated with the holding duration of products and the rate of production. A supply chain model (SCM) is developed to show the joint effect of a variable production rate and a time-varying holding cost for a specific type of complementary products, where the products are made by two different manufacturers and a common retailer bundles them and sells the bundles to end customers. Demand for each product is specified by stochastic reservation prices with a known potential market size. The players of the SCM are considered to have unequal power. A Stackelberg game approach is employed to obtain the global optimum solution of the model. An illustrative numerical example, graphical representation, and managerial insights are given to illustrate the model. The results indicate that a variable production rate and a time-dependent holding cost yield greater savings than models in the existing literature.
Logic regression and its extensions.
Schwender, Holger; Ruczinski, Ingo
2010-01-01
Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
ENERGY-DEPENDENT POWER SPECTRAL STATES AND ORIGIN OF APERIODIC VARIABILITY IN BLACK HOLE BINARIES
International Nuclear Information System (INIS)
Yu Wenfei; Zhang Wenda
2013-01-01
We found that the black hole candidate MAXI J1659–152 showed distinct power spectra, i.e., power-law noise (PLN) versus band-limited noise (BLN) plus quasi-periodic oscillations (QPOs) below and above about 2 keV, respectively, in observations with Swift and the Rossi X-ray Timing Explorer during the 2010 outburst, indicating a high energy cutoff of the PLN and a low energy cutoff of the BLN and QPOs around 2 keV. The emergence of the PLN and the fading of the BLN and QPOs initially took place below 2 keV when the source entered the hard intermediate state and settled in the soft state three weeks later. The evolution was accompanied by the emergence of the disk spectral component and decreases in the amplitudes of variability in the soft and hard X-ray bands. Our results indicate that the PLN is associated with an optically thick disk in both hard and intermediate states, and the power spectral state is independent of the X-ray energy spectral state in a broadband view. We suggest that in the hard or intermediate state, the BLN and QPOs emerge from the innermost hot flow subjected to Comptonization, while the PLN originates from the optically thick disk farther out. The energy cutoffs of the PLN and the BLN or QPOs then follow the temperature of the seed photons from the inner edge of the optically thick disk, while the high frequency cutoff of the PLN follows the orbital frequency of the inner edge of the optically thick disk as well.
Better Autologistic Regression
Directory of Open Access Journals (Sweden)
Mark A. Wolters
2017-11-01
Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine) to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding (the two numbers used to represent the two possible states of the variables) might differ. Common coding choices are (zero, one) and (minus one, plus one). Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.
Dark focus of accommodation as dependent and independent variables in visual display technology
Jones, Sherrie; Kennedy, Robert; Harm, Deborah
1992-01-01
When independent stimuli are available for accommodation, as in the dark or under low contrast conditions, the lens seeks its resting position. Individual differences in resting positions are reliable, under autonomic control, and can change with visual task demands. We hypothesized that motion sickness in a flight simulator might result in dark focus changes. Method: Subjects received training flights in three different Navy flight simulators. Two were helicopter simulators that entailed CRT presentation using infinity optics; one involved a dome presentation of a computer graphic visual projection system. Results: In all three experiments there were significant differences between dark focus activity before and after simulator exposure when comparisons were made between sick and not-sick pilot subjects. In two of these experiments, the average shift in dark focus for the sick subjects was toward increased myopia when each subject was compared to his own baseline. In the third experiment, the group showed a small average outward shift, and the subjects who were sick showed significantly less outward movement than those who were symptom free. Conclusions: Although the relationship is not a simple one, dark focus changes in simulator sickness imply parasympathetic activity. Because changes can occur in relation to endogenous and exogenous events, such measurement may have useful applications as dependent measures in studies of visually coupled systems, virtual reality systems, and space adaptation syndrome.
Alnaggar, Mohammed; Di Luzio, Giovanni; Cusatis, Gianluca
2017-04-28
Alkali Silica Reaction (ASR) is known to be a serious problem for concrete worldwide, especially in high humidity and high temperature regions. ASR is a slow process that develops over years to decades, and it is influenced by changes in environmental and loading conditions of the structure. The problem becomes even more complicated if one recognizes that other phenomena like creep and shrinkage are coupled with ASR. This results in synergistic mechanisms that cannot be easily understood without a comprehensive computational model. In this paper, coupling between creep, shrinkage and ASR is modeled within the Lattice Discrete Particle Model (LDPM) framework. In order to achieve this, a multi-physics formulation is used to compute the evolution of temperature, humidity, cement hydration, and ASR in both space and time, which is then used within physics-based formulations of cracking, creep and shrinkage. The overall model is calibrated and validated on the basis of experimental data available in the literature. Results show that even during free expansions (zero macroscopic stress), a significant degree of coupling exists because ASR-induced expansions are relaxed by meso-scale creep driven by self-equilibrated stresses at the meso-scale. This explains and highlights the importance of considering ASR and other time-dependent aging and deterioration phenomena at an appropriate length scale in coupled modeling approaches.
Single genome retrieval of context-dependent variability in mutation rates for human germline.
Sahakyan, Aleksandr B; Balasubramanian, Shankar
2017-01-13
Accurate knowledge of the core components of substitution rates is of vital importance to understand genome evolution and dynamics. By performing a single-genome and direct analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. The rates are characterised through rate constants in a time-domain, and are made available through a dedicated program (Trek) and a stand-alone database. Due to the nature of the method design and the imposed stringency criteria, we expect our rate constants to be good estimates for the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution prone and resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and, produce a Trek-based estimate of the overall mutation rate in human. The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and neutral evolution of proteomes.
Medyńska-Gulij, Beata; Cybulski, Paweł
2016-06-01
This paper analyses the use of table visual variables for statistical data on hospital beds as an important tool for revealing spatio-temporal dependencies. It is argued that some of the conclusions drawn from data about public health and public expenditure on health have a spatio-temporal reference. Unlike previous studies, this article combines cartographic pragmatics and spatial visualization with earlier conclusions from the public health literature. While significant conclusions about health care and economic factors have been highlighted in research papers, this article is the first to apply visual analysis to a statistical table together with maps, an approach called previsualisation.
Biostatistics Series Module 6: Correlation and Linear Regression.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a small P value. A confidence interval for the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
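The quantities named in this abstract can be computed directly from their textbook definitions. The sketch below is only an illustration, with made-up data and hypothetical helper names (`pearson_r`, `spearman_rho`, `least_squares_line`). It computes Pearson's r, Spearman's rho as the Pearson correlation of ranks, and the least-squares line y = a + bx.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient r between paired samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def ranks(v):
    """1-based ranks; tied values share the mean of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        mask = v == val
        r[mask] = r[mask].mean()
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

def least_squares_line(x, y):
    """Intercept a and slope b minimising sum((y - a - b*x)**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    b = float((xc * (y - y.mean())).sum() / (xc ** 2).sum())
    a = float(y.mean() - b * x.mean())
    return a, b

# Illustrative (invented) paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
a, b = least_squares_line(x, y)
print("r =", r, " r^2 =", r ** 2, " rho =", spearman_rho(x, y))
print("fitted line: y = %.3f + %.3f x" % (a, b))
```

As the abstract stresses, a scatter plot should still be inspected first; none of these numbers reveal nonlinearity or outliers on their own.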
Flanagan, S.; Hurtt, G. C.; Fisk, J. P.; Rourke, O.
2012-12-01
A robust understanding of the sensitivity of the pattern, structure, and dynamics of ecosystems to climate, climate variability, and climate change is needed to predict ecosystem responses to current and projected climate change. We present results of a study designed to first quantify the sensitivity of ecosystems to climate through the use of climate and ecosystem data, and then use the results to test the sensitivity of the climate data in a state-of-the-art ecosystem model. A database of available ecosystem characteristics such as mean canopy height, above ground biomass, and basal area was constructed from sources like the National Biomass and Carbon Dataset (NBCD). The ecosystem characteristics were then paired by latitude and longitude with the corresponding climate characteristics temperature, precipitation, photosynthetically active radiation (PAR) and dew point that were retrieved from the North American Regional Reanalysis (NARR). The average yearly and seasonal means of the climate data, and their associated maximum and minimum values, over the 1979-2010 time frame provided by NARR were constructed and paired with the ecosystem data. The compiled results provide natural patterns of vegetation structure and distribution with regard to climate data. An advanced ecosystem model, the Ecosystem Demography model (ED), was then modified to allow yearly alterations to its mechanistic climate lookup table and used to predict the sensitivities of ecosystem pattern, structure, and dynamics to climate data. The combined ecosystem structure and climate data results were compared to ED's output to check the validity of the model. After verification, climate change scenarios such as those used in the last IPCC were run and future forest structure changes due to climate sensitivities were identified. The results of this study can be used to both quantify and test key relationships for next generation models. The sensitivity of ecosystem characteristics to climate data
Extracellular vesicles have variable dose-dependent effects on cultured draining cells in the eye.
Tabak, Saray; Schreiber-Avissar, Sofia; Beit-Yannai, Elie
2018-03-01
The role of extracellular vesicles (EVs) as signal mediators has been described in many biological fields. How many EVs are needed to deliver the desired physiological signal is as yet unclear. Using a normal trabecular meshwork (NTM) cell culture exposed to non-pigmented ciliary epithelium (NPCE)-derived EVs, a relevant model for studying the human ocular drainage system, we addressed the EV dose-response effects on Wnt signaling. The objective of the study was to investigate the dosing effects of NPCE-derived EVs on TM Wnt signaling. EVs were isolated by the PEG 8000 method from the conditioned media of NPCE and RPE cells (used as controls). Concentrations were determined by the Tunable Resistive Pulse Sensing method. Various exosome concentrations were incubated with TM cells for the determination of mRNA (β-Catenin, Axin2 and LEF1) and protein (β-Catenin, GSK-3β) expression using real-time quantitative PCR and Western blot, respectively. Exposure of NTM cells for 8 hrs to low EV concentrations was associated with significantly decreased expression of β-Catenin and GSK-3β, as opposed to exposure to high exosomal concentrations. Pro-MMP9 and MMP9 activities were significantly enhanced in NTM cells treated with high EV concentrations (X10) as compared to low EV concentrations of either NPCE- or RPE-derived EVs and to untreated control. Our data support the concept that the biological effects of EVs are concentration-dependent at their target site. Specifically, in the present study we described a general dose-response at the gene and MMP activity level and a different dose-response regarding the expression of key canonical Wnt proteins. © 2018 The Authors. Journal of Cellular and Molecular Medicine published by John Wiley & Sons Ltd and Foundation for Cellular and Molecular Medicine.
SEPARATION PHENOMENA LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
Ikaro Daniel de Carvalho Barreto
2014-03-01
This paper applies concepts from maximum likelihood estimation of the binomial logistic regression model to the separation phenomenon. Separation generates bias in the estimation, leads to different interpretations of the estimates under the different statistical tests (Wald, Likelihood Ratio and Score), and yields different estimates under the different iterative methods (Newton-Raphson and Fisher Score). The paper also presents an example that demonstrates the direct implications for the validation of the model and of the variables, and the implications for estimates of odds ratios and confidence intervals generated from the Wald statistics. Furthermore, we briefly present the Firth correction to circumvent the phenomenon of separation.
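The effect of separation on an iterative maximum likelihood fit can be seen in a small numerical sketch. The example below is an illustration on invented data, not the paper's example: it runs Newton-Raphson for a one-parameter logistic model, logit(p) = beta * x with no intercept, once on perfectly separated data and once on overlapping data. The function name `newton_logistic_slope` is hypothetical, and the Firth correction is not implemented here.

```python
import numpy as np

def newton_logistic_slope(x, y, n_iter=10):
    """Newton-Raphson for logit(p) = beta * x (no intercept).

    Returns the value of beta after each iteration so that convergence,
    or the lack of it, can be inspected.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, path = 0.0, []
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-beta * x))
        score = np.sum(x * (y - p))            # first derivative of the log-likelihood
        info = np.sum(x ** 2 * p * (1.0 - p))  # observed (= Fisher) information
        beta += score / info
        path.append(beta)
    return path

x = [-2.0, -1.0, 1.0, 2.0]
separated = newton_logistic_slope(x, [0, 0, 1, 1])              # x < 0 always gives y = 0
overlapping = newton_logistic_slope(x, [0, 1, 0, 1], n_iter=25)  # classes overlap

print("separated  :", np.round(separated, 3))
print("overlapping:", np.round(overlapping[-3:], 6))
```

With separated data every Newton step is positive, so the estimate keeps growing and no finite maximum likelihood estimate exists. With overlapping data the iterations settle on a finite value.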
Adaptive metric kernel regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
2000-01-01
Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...
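A rough sketch of the idea follows: a Gaussian-kernel (Nadaraya-Watson) smoother whose input metric is a set of per-dimension scales, scored by leave-one-out cross-validation. The abstract does not say how the cross-validation estimate is minimised, so the coarse grid search here is a stand-in assumption, and the names `loo_error` and `grid_search_metric` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def loo_error(X, y, scales):
    """Leave-one-out mean squared error of a Gaussian-kernel smoother.

    `scales` multiplies each input dimension before distances are computed;
    a scale of 0 removes that dimension from the metric entirely.
    """
    Z = np.asarray(X, dtype=float) * np.asarray(scales, dtype=float)
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-0.5 * d2)
    np.fill_diagonal(W, 0.0)                   # leave each point out of its own prediction
    denom = np.maximum(W.sum(axis=1), 1e-300)  # guard against isolated points
    pred = (W @ y) / denom
    return float(np.mean((y - pred) ** 2))

def grid_search_metric(X, y, candidates=(0.0, 1.0, 2.0, 4.0, 8.0)):
    """Pick the per-dimension scales with the lowest leave-one-out error on a coarse grid."""
    best = None
    for s in itertools.product(candidates, repeat=X.shape[1]):
        err = loo_error(X, y, s)
        if best is None or err < best[1]:
            best = (s, err)
    return best

# Synthetic task: the response depends on the first input only; the second is irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)

scales, err = grid_search_metric(X, y)
print("selected scales:", scales, " leave-one-out MSE:", err)
```

On data like this one would expect the search to favour a small scale for the irrelevant dimension, which is the sense in which an adapted metric performs variable selection. A gradient-based optimiser over continuous scales would be the natural refinement of the grid.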
Adaptive Metric Kernel Regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
1998-01-01
Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
Insulin-dependent glucose metabolism in dairy cows with variable fat mobilization around calving.
Weber, C; Schäff, C T; Kautzsch, U; Börner, S; Erdmann, S; Görs, S; Röntgen, M; Sauerwein, H; Bruckmaier, R M; Metges, C C; Kuhla, B; Hammon, H M
2016-08-01
clamps, pp nonesterified fatty acid concentrations did not reach the ap levels. The study demonstrated a minor influence of different degrees of body fat mobilization on insulin metabolism in cows during the transition period. The distinct decrease in the glucose-dependent release of insulin pp is the most striking finding that explains the impaired insulin action after calving, but does not explain differences in body fat mobilization between HLFC and LLFC cows. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Crime Modeling using Spatial Regression Approach
Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.
2018-01-01
Criminal acts in Indonesia increase in both variety and quantity every year. They include murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. The risk of society's exposure to crime is measured by the number of cases reported to the police: the higher the number of reports, the higher the level of crime in the region. This research models criminality in South Sulawesi, Indonesia, with society's exposure to the risk of crime as the dependent variable. Modelling follows an area approach using the Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variables are population density, the number of poor people, GDP per capita, unemployment, and the human development index (HDI). The spatial regression analysis shows no spatial dependence, in either lag or error, in South Sulawesi.
Differentiating regressed melanoma from regressed lichenoid keratosis.
Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A
2017-04-01
Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Adaptive regression for modeling nonlinear relationships
Knafl, George J
2016-01-01
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
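The dependence of explained variance on the order of entry can be illustrated with a short sketch of hierarchical (blockwise) regression. The data are simulated and the names `r_squared` and `hierarchical_r2` are hypothetical; nothing here reproduces the study's variables or results.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return float(1.0 - (resid @ resid) / (centred @ centred))

def hierarchical_r2(blocks, y):
    """Enter predictor blocks in order; return (cumulative R^2, increment) per step."""
    out, prev, cols = [], 0.0, []
    for block in blocks:
        cols.append(block)
        r2 = r_squared(np.column_stack(cols), y)
        out.append((r2, r2 - prev))
        prev = r2
    return out

# Two correlated predictors (simulated), both related to the criterion.
rng = np.random.default_rng(0)
n = 300
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)
y = x1 + x2 + rng.standard_normal(n)

print("x1 first, then x2:", hierarchical_r2([x1, x2], y))
print("x2 first, then x1:", hierarchical_r2([x2, x1], y))
```

The final R^2 is the same for both orders, but the increment credited to each predictor changes: whichever correlated predictor enters first absorbs the shared variance.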
Linear regression in astronomy. II
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Salloum, M S; Guzzo, M C; Velazquez, M S; Sagadin, M B; Luna, C M
2016-12-01
Breeding selection of germplasm under fertilized conditions may reduce the frequency of genes that promote mycorrhizal associations. This study was developed to compare variability in mycorrhizal colonization and its effect on mycorrhizal dependency (MD) in improved soybean genotypes (I-1 and I-2) with differential tolerance to drought stress, and in unimproved soybean genotypes (UI-3 and UI-4). As inoculum, a mixture of native arbuscular mycorrhizal fungi (AMF) was isolated from soybean roots, showing spores mostly of the species Funneliformis mosseae. At 20 days, the unimproved genotypes, followed by I-2, showed an increase in arbuscule formation, but I-1 did not. At 40 days, mycorrhizal plants showed an increase in nodulation, this effect being more evident in unimproved genotypes. Mycorrhizal dependency, evaluated through growth and biochemical parameters of oxidative stress, increased in the unimproved genotypes and I-2 from 20 days onward, whereas in I-1, MD increased at 40 days. We could not detect significant differences in AMF colonization and MD between the unimproved genotypes and I-2. However, variability among improved genotypes was observed. Our results suggest that selection for improved soybean genotypes with good and rapid AMF colonization, particularly a high arbuscule/hyphae ratio, could be a useful strategy for the development of genotypes that optimize AMF contribution to cropping systems.
Saka, Boualem; Djouahri, Abderrahmane; Djerrad, Zineb; Terfi, Souhila; Aberrane, Sihem; Sabaou, Nasserdine; Baaliouamer, Aoumeur; Boudarene, Lynda
2017-06-01
In the present work, the essential oils of Brassica rapa var. rapifera parts and their antioxidant and antimicrobial activities were investigated for the first time as a function of geographic origin and extraction technique. Gas-chromatography (GC) and GC/mass spectrometry (MS) analyses showed several constituents, including alcohols, aldehydes, esters, ketones, norisoprenoids, terpenic, nitrogen and sulphur compounds, totaling 38 and 41 compounds in leaf and root essential oils, respectively. Nitrogen compounds were the main volatiles in leaf essential oils and sulphur compounds were the main volatiles in root essential oils. Qualitative and quantitative differences were found among B. rapa var. rapifera parts essential oils collected from different locations and extracted by hydrodistillation and microwave-assisted hydrodistillation techniques. Furthermore, our findings showed a high variability for both antioxidant and antimicrobial activities. The highlighted variability reflects the high impact of plant part, geographic variation and extraction technique on chemical composition and biological activities, which leads us to conclude that essential oils to be investigated should be selected carefully with these factors in mind, in order to isolate the bioactive components or to obtain the best quality of essential oil in terms of biological activities and preventive effects in food. © 2017 Wiley-VHCA AG, Zurich, Switzerland.
Robust Regression Procedures for Predictor Variable Outliers.
1982-03-01
space of probability distributions. Then the influence function of the estimator is defined to be the derivative of the functional evaluated at the ... measure of the impact of an outlier x0 on the estimator T(F) is the "influence function", which is defined to be lim [T(F_ε) − T(F)] ... positive and negative directions. An empirical influence function can be defined in a similar fashion simply by replacing F with F_n in eqn. (3.4).
Energy Technology Data Exchange (ETDEWEB)
Ma, L; Braunstein, S; Chiu, J [University of California San Francisco, San Francisco, CA (United States); Sahgal, A [Sunnybrook Health Sciences Center, University of Toronto, Toronto, Ontario (Canada)
2016-06-15
Purpose: Spinal cord tolerance for SBRT has been recommended for the maximum point dose level or at irradiated volumes such as 0.35 mL or 10% of contoured volumes. In this study, we investigated an inherent functional relationship that associates these dose surrogates for irradiated spinal cord volumes of up to 3.0 mL. Methods: A hidden variable termed Effective Dose Radius (EDR) was formulated based on a dose fall-off model to correlate dose at irradiated spinal cord volumes ranging from 0 mL (point maximum) to 3.0 mL. A cohort of 15 spine SBRT cases was randomly selected to derive an EDR-parameterized formula. The mean prescription dose for the studied cases was 21.0±8.0 Gy (range, 10–40 Gy) delivered in 3±1 fractions with target volumes of 39.1 ± 70.6 mL. Linear regression and variance analysis were performed for the fitting parameters of variable EDR values. Results: No direct correlation was found between the dose at maximum point and doses at variable spinal cord volumes. For example, Pearson R² = 0.643 and R² = 0.491 were obtained when correlating the point maximum dose with the spinal cord dose at 1 mL and 3 mL, respectively. However, near perfect correlation (R² ≥ 0.99) was obtained when correlating the corresponding parameterized EDRs. Specifically, Pearson R² = 0.996 and R² = 0.990 were obtained when correlating EDR (maximum point dose) with EDR (dose at 1 mL) and EDR (dose at 3 mL), respectively. As a result, high confidence level look-up tables were established to correlate spinal cord doses at the maximum point to any finite irradiated volumes. Conclusion: An inherent functional relationship was demonstrated for spine SBRT. Such a relationship unifies dose surrogates at variable cord volumes and proves that a single dose surrogate (e.g. point maximum dose) is mathematically sufficient in constraining the overall spinal cord dose tolerance for SBRT.
Gaussian process regression analysis for functional data
Shi, Jian Qing
2011-01-01
Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Munoz-Price, L Silvia; Rosa, Rossana; Castro, Jose G; Laowansiri, Panthipa; Latibeaudiere, Rachel; Namias, Nicholas; Tarima, Sergey
2016-10-01
To determine the time-dependent effect of antibiotics on the initial acquisition of carbapenem-resistant Acinetobacter baumannii. Retrospective cohort study. Forty-bed trauma ICU in Miami, FL. All consecutive patients admitted to the unit from November 1, 2010, to November 30, 2011. None. Patients underwent surveillance cultures at admission to the unit and weekly thereafter. The primary outcome was the acquisition of carbapenem-resistant A. baumannii on surveillance cultures. Daily antibiotic exposures during the time of observation were used to construct time-dependent variables, including cumulative exposures (in grams and daily observed doses [defined daily doses]). Among 360 patients, 45 (12.5%) became colonized with carbapenem-resistant A. baumannii. Adjusted Cox models showed that each additional point in the Acute Physiologic and Chronic Health Evaluation score increased the hazard by 4.8% (hazard ratio, 1.048; 95% CI, 1.010-1.087; p = 0.0124) and time-dependent exposure to carbapenems quadrupled the hazard (hazard ratio, 4.087; 95% CI, 1.873-8.920; p = 0.0004) of acquiring carbapenem-resistant A. baumannii. Additionally, adjusted Cox models determined that every additional carbapenem defined daily dose increased the hazard of acquiring carbapenem-resistant A. baumannii by 5.1% (hazard ratio, 1.051; 95% CI, 1.007-1.093; p = 0.0243). Carbapenem exposure quadrupled the hazards of acquiring A. baumannii even after controlling for severity of illness.
Miller, Tom E X
2007-07-01
1. It is widely accepted that density-dependent processes play an important role in most natural populations. However, persistent challenges in our understanding of density-dependent population dynamics include evaluating the shape of the relationship between density and demographic rates (linear, concave, convex), and identifying extrinsic factors that can mediate this relationship. 2. I studied the population dynamics of the cactus bug Narnia pallidicornis on host plants (Opuntia imbricata) that varied naturally in relative reproductive effort (RRE, the proportion of meristems allocated to reproduction), an important plant quality trait. I manipulated per-plant cactus bug densities, quantified subsequent dynamics, and fit stage-structured models to the experimental data to ask if and how density influences demographic parameters. 3. In the field experiment, I found that populations with variable starting densities quickly converged upon similar growth trajectories. In the model-fitting analyses, the data strongly supported a model that defined the juvenile cactus bug retention parameter (joint probability of surviving and not dispersing) as a nonlinear decreasing function of density. The estimated shape of this relationship shifted from concave to convex with increasing host-plant RRE. 4. The results demonstrate that host-plant traits are critical sources of variation in the strength and shape of density dependence in insects, and highlight the utility of integrated experimental-theoretical approaches for identifying processes underlying patterns of change in natural populations.
Logistic regression applied to natural hazards: rare event logistic regression with replications
Directory of Open Access Journals (Sweden)
M. Guns
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Logistic regression applied to natural hazards: rare event logistic regression with replications
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Penzlin, Ana Isabel; Barlinn, Kristian; Illigens, Ben Min-Woo; Weidner, Kerstin; Siepmann, Martin; Siepmann, Timo
2017-09-06
A randomized controlled study (RCT) recently showed that short-term heart rate variability (HRV) biofeedback in addition to standard rehabilitation care for alcohol dependence can reduce craving and anxiety and improve cardiovascular autonomic function. In this one-year follow-up study we aimed to explore whether completion of 2-week HRV-biofeedback training is associated with long-term abstinence. Furthermore, we sought to identify potential predictors of post-treatment abstinence. We conducted a survey on abstinence in patients with alcohol dependence 1 year after completion of an RCT comparing HRV-biofeedback in addition to inpatient rehabilitation treatment with inpatient rehabilitation treatment alone (controls). Abstinence rates were compared and analysed for association with demographic data as well as psychometric and autonomic cardiac assessment before and after completion of the biofeedback training using bivariate and multivariate regression analyses. Out of 48 patients who participated in the RCT, 27 patients (9 females, ages 42.9 ± 8.6, mean ± SD) completed our one-year follow-up. When including in the analysis only patients who completed follow-up, the rate of abstinence tended to be higher in patients who underwent HRV-biofeedback 1 year earlier compared to those who received rehabilitative treatment alone (66.7% vs 50%, p = ns). This non-significant trend was also observed in the intention-to-treat analysis where patients who did not participate in the follow-up were assumed to have relapsed (46.7% biofeedback vs. 33.3% controls, p = ns). Neither cardiac autonomic function nor psychometric variables were associated with abstinence 1 year after HRV-biofeedback. Our follow-up study provides a first indication of a possible increase in long-term abstinence after HRV-biofeedback for alcohol dependence in addition to rehabilitation. The original randomized controlled trial was registered in the German Clinical Trials Register (DRKS00004618). This one-year follow-up survey has not been
Energy Technology Data Exchange (ETDEWEB)
Burns, S.P. [Texas Univ., Austin, TX (United States)]; Gianoulakis, S.E. [Sandia National Labs., Albuquerque, NM (United States)]
1995-07-01
A numerical solution for buoyant natural convection within a square enclosure containing a fluid with highly temperature dependent viscosity is presented. Although the fluid properties employed do not represent any real fluid, the large variation in the fluid viscosity with temperature is characteristic of turbulent flow modeling with eddy-viscosity concepts. Results are obtained using a primitive variable formulation and the resistor method. The results presented include velocity, temperature and pressure distributions within the enclosure as well as shear stress and heat flux distributions along the enclosure walls. Three mesh refinements were employed and uncertainty values are suggested for the final mesh refinement. These solutions are part of a contributed benchmark solution set for the subject problem.
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
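The contrast drawn above can be made concrete with a small sketch. It fits ordinary least squares and logistic regression to the same binary outcome and shows that the straight-line fit can return "probabilities" below 0 or above 1, while the logistic fit cannot. The data are invented for illustration, and the plain gradient-ascent fit is a simplification chosen to keep the example dependency-free; it is not a method taken from the paper.

```python
import math

# Hypothetical data: one predictor and a binary outcome.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the mean log-likelihood; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    b0 = b1 = 0.0  # coefficients on the centered predictor
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * (x - x_bar))
            g0 += resid
            g1 += resid * (x - x_bar)
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0 - b1 * x_bar, b1  # undo the centering


ols_a, ols_b = fit_ols(xs, ys)
log_a, log_b = fit_logistic(xs, ys)
ols_pred = [ols_a + ols_b * x for x in xs]
log_pred = [sigmoid(log_a + log_b * x) for x in xs]
print("OLS range:", min(ols_pred), max(ols_pred))
print("logistic range:", min(log_pred), max(log_pred))
```

The bounded predictions are a property of the logistic link itself, so they hold regardless of how well the simple optimizer has converged.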
Directional Dependence in Developmental Research
von Eye, Alexander; DeShon, Richard P.
2012-01-01
In this article, we discuss and propose methods that may be of use to determine direction of dependence in non-normally distributed variables. First, it is shown that standard regression analysis is unable to distinguish between explanatory and response variables. Then, skewness and kurtosis are discussed as tools to assess deviation from…
Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method
Prahutama, Alan; Sudarno
2018-05-01
The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country’s economic development. A high infant mortality rate disrupts the stability of a country because it relates to the sustainability of the country's population. One regression model that can be used to analyze the relationship between a discrete dependent variable Y and an independent variable X is the Poisson regression model. Regression models recently used for discrete dependent variables include Poisson regression, negative binomial regression and generalized Poisson regression. In this research, generalized Poisson regression modeling gives a better AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable with the greatest influence on the infant mortality rate is average breastfeeding (X9).
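A minimal sketch of the kind of comparison described above: it fits a Poisson regression with a log link by Newton-Raphson and compares its AIC with that of an intercept-only model. The counts are invented, only ordinary Poisson regression is shown (the generalized Poisson model favoured in the abstract is not implemented here), and all function names are hypothetical.

```python
import math

# Hypothetical count data with a single predictor.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 2, 2, 4, 6, 9]


def poisson_loglik(ys, mus):
    """Poisson log-likelihood for counts ys with means mus."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(ys, mus))


def aic(loglik, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * loglik


def fit_poisson(xs, ys, iterations=25):
    """Newton-Raphson for log(mu) = b0 + b1 * x; returns (b0, b1)."""
    b0 = math.log(sum(ys) / len(ys))
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Information matrix (negative Hessian)
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_poisson(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]
mean_y = sum(ys) / len(ys)
aic_slope = aic(poisson_loglik(ys, fitted), 2)
aic_null = aic(poisson_loglik(ys, [mean_y] * len(ys)), 1)
print("AIC with predictor:", aic_slope, "AIC intercept-only:", aic_null)
```

The same `aic` helper would apply to any competing count model, provided its log-likelihood and parameter count are supplied.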
Hoang, Hanh H; Nickerson, Nicholas N; Lee, Vincent T; Kazimirova, Anastasia; Chami, Mohamed; Pugsley, Anthony P; Lory, Stephen
2011-01-01
In Gram-negative bacteria, the Lol and Bam machineries direct the targeting of lipidated and nonlipidated proteins, respectively, to the outer membrane (OM). Using Pseudomonas aeruginosa strains with depleted levels of specific Bam and Lol proteins, we demonstrated a variable dependence of different OM proteins on these targeting pathways. Reduction in the level of BamA significantly affected the ability of the β-barrel membrane protein OprF to localize to the OM, while the targeting of three secretins that are functionally related OM proteins was less affected (PilQ and PscC) or not at all affected (XcpQ). Depletion of LolB affected all lipoproteins examined and had a variable effect on the nonlipidated proteins. While the levels of OprF, PilQ, and PscC were significantly reduced by LolB depletion, XcpQ was unaffected and was correctly localized to the OM. These results suggest that certain β-barrel proteins such as OprF primarily utilize the complete Bam machinery. The Lol machinery participates in the OM targeting of secretins to variable degrees, likely through its involvement in the assembly of lipidated Bam components. XcpQ, but not PilQ or PscC, was shown to assemble spontaneously into liposomes as multimers. This work raises the possibility that there is a gradient of utilization of Bam and Lol insertion and targeting machineries. Structural features of individual proteins, including their β-barrel content, may determine the propensity of these proteins for folding (or misfolding) during periplasmic transit and OM insertion, thereby influencing the extent of utilization of the Bam targeting machinery, respectively. Targeting of lipidated and nonlipidated proteins to the outer membrane (OM) compartment in Gram-negative bacteria involves the transfer across the periplasm utilizing the Lol and Bam machineries, respectively. We show that depletion of Bam and Lol components in Pseudomonas aeruginosa does not lead to a general OM protein translocation defect
Bind, Marie-Abele; Peters, Annette; Koutrakis, Petros; Coull, Brent; Vokonas, Pantel; Schwartz, Joel
2016-08-01
Previous studies have observed associations between air pollution and heart disease. Susceptibility to air pollution effects has been examined mostly with a test of effect modification, but little evidence is available whether air pollution distorts cardiovascular risk factor distribution. This paper aims to examine distributional and heterogeneous effects of air pollution on known cardiovascular biomarkers. A total of 1,112 men from the Normative Aging Study and residents of the greater Boston, Massachusetts, area with mean age of 69 years at baseline were included in this study during the period 1995-2013. We used quantile regression and random slope models to investigate distributional effects and heterogeneity in the traffic-related responses on blood pressure, heart rate variability, repolarization, lipids, and inflammation. We considered 28-day averaged exposure to particle number, PM2.5 black carbon, and PM2.5 mass concentrations (measured at a single monitor near the site of the study visits). We observed some evidence suggesting distributional effects of traffic-related pollutants on systolic blood pressure, heart rate variability, corrected QT interval, low density lipoprotein (LDL) cholesterol, triglyceride, and intercellular adhesion molecule-1 (ICAM-1). For example, among participants with LDL cholesterol below 80 mg/dL, an interquartile range increase in PM2.5 black carbon exposure was associated with a 7-mg/dL (95% CI: 5, 10) increase in LDL cholesterol, while among subjects with LDL cholesterol levels close to 160 mg/dL, the same exposure was related to a 16-mg/dL (95% CI: 13, 20) increase in LDL cholesterol. We observed similar heterogeneous associations across low versus high percentiles of the LDL distribution for PM2.5 mass and particle number. These results suggest that air pollution distorts the distribution of cardiovascular risk factors, and that, for several outcomes, effects may be greatest among individuals who are already at high risk
Application of range-test in multiple linear regression analysis in ...
African Journals Online (AJOL)
Application of range-test in multiple linear regression analysis in the presence of outliers is studied in this paper. First, the explanatory variables (i.e. Administration, Social/Commercial, Economic services and Transfer) were plotted against the dependent variable (i.e. GDP) to identify the statistical trend over the years.
Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)
Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi
2017-06-01
Hurdle negative binomial regression is a method for discrete dependent variables with excess zeros and under- or overdispersion. It uses a two-part approach. The first part, the zero hurdle model, estimates the zero elements of the dependent variable; the second part, the truncated negative binomial model, estimates the nonzero elements (positive integers). The discrete dependent variable in such cases is censored for some values. The type of censoring studied in this research is right censoring. This study aims to obtain the parameter estimator of hurdle negative binomial regression for a right-censored dependent variable. Parameters are estimated by maximum likelihood estimation (MLE). Hurdle negative binomial regression for a right-censored dependent variable is applied to the number of neonatorum tetanus cases in Indonesia. The data are counts that contain zeros in some observations and a variety of other values. This study also aims to obtain the parameter estimator and test statistic of the censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus cases in Indonesia are the percentage of baby health care coverage and neonatal visits.
International Nuclear Information System (INIS)
Levine, R.D.
1979-01-01
The reaction rate constant is expressed as $k(T) = Z \exp(-G_a/RT)$, where $Z$ is the binary collision frequency. $G_a$, the free energy of activation, is shown to be the difference between the free energy of the reactive reactants and the free energy of all reactants. The results are derived from both a statistical mechanical and a collision theoretic point of view. While the latter is more suitable for an ab-initio computation of the reaction rate, it is the former that lends itself to the search of systematics and of correlations and to compaction of data. Different thermodynamic-like routes to the characterization of $G_a$ are thus explored. The two most promising ones appear to be the use of thermodynamic type cycles and the changes of dependent variables using the Legendre transform technique. The dependence of $G_a$ on $\Delta G^0$, the standard free energy change in the reaction, is examined from the latter point of view. It is shown that one can rigorously express this dependence as $G_a = \alpha \Delta G^0 + G_a^0 M(\alpha)$. Here $\alpha$ is the Bronsted slope, $\alpha = -\partial \ln k(T)/\partial(\Delta G^0/RT)$, $G_a^0$ is independent of $\Delta G^0$, and $M(\alpha)$, the Legendre transform of $G_a$, is a function only of $\alpha$. For small changes in $\Delta G^0$, the general result reduces to the familiar "linear" free energy relation $\delta G_a = \alpha\,\delta\Delta G^0$. It is concluded from general considerations that $M(\alpha)$ is a symmetric, convex function of $\alpha$ and hence that $\alpha$ is a monotonically increasing function of $\Delta G^0$. Experimental data appear to conform well to the form $\alpha = 1/[1 + \exp(-\Delta G^0/G_s^0)]$. A simple interpretation of the $\Delta G^0$ dependence of $G_a$, based on an interpolation of the free energy from that of the reagents to that of the products, is offered. 4 figures, 69 references
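A short worked step, offered as a sketch rather than as the author's own derivation, shows why the Bronsted slope equals the derivative of the activation free energy with respect to the standard free energy change. It assumes that the temperature is held fixed and that the collision frequency Z does not vary with the standard free energy change; neither assumption is stated in the abstract.

```latex
\begin{align}
k(T) &= Z \exp\!\left(-\frac{G_a}{RT}\right)
  \quad\Longrightarrow\quad
  \ln k(T) = \ln Z - \frac{G_a}{RT}, \\
\alpha &\equiv -\frac{\partial \ln k(T)}{\partial\,(\Delta G^0/RT)}
  = \frac{\partial G_a}{\partial \Delta G^0}, \\
\delta G_a &= \alpha\,\delta\Delta G^0
  \quad \text{(small changes in } \Delta G^0\text{)}, \\
G_a^0\,M(\alpha) &= G_a - \alpha\,\Delta G^0,
  \qquad
  \frac{d\,[\,G_a^0 M(\alpha)\,]}{d\alpha} = -\Delta G^0 .
\end{align}
```

The last line is the standard Legendre-transform property: exchanging the independent variable from the standard free energy change to the slope makes the derivative of the transform return the original variable with opposite sign.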
Djerrad, Zineb; Djouahri, Abderrahmane; Kadik, Leila
2017-04-01
The impact of growth stages during the vegetative cycle (B0-B5) on the chemical composition and antioxidant activities of Pinus halepensis Mill. needle essential oils was investigated for the first time. GC and GC/MS analyses pointed to a quantitative variability of components; terpene hydrocarbon derivatives, represented by α-pinene (8.5 - 12.9%), myrcene (17.5 - 21.6%), p-cymene (7.9 - 11.9%) and (Z)-β-caryophyllene (17.3 - 21.2%) as major components, decreased from 88.9% at the B0 growth stage to 66.9% at the B5 growth stage, whereas oxygenated derivatives, represented by caryophyllene oxide (5.4 - 12.6%) and terpinen-4-ol (0.4 - 3.3%) as major components, increased from 7% at the B0 growth stage to 28.4% at the B5 growth stage. Furthermore, our findings showed that the essential oil of P. halepensis needles collected at the B5 growth stage possesses higher antioxidant activity in four different testing systems than those collected at the B0-B4 growth stages. This variability leads us to conclude that essential oils to be investigated should be selected carefully according to growth stage, in order to obtain the highest effectiveness of the essential oil in terms of biological activities for human health purposes. © 2017 Wiley-VHCA AG, Zurich, Switzerland.
Regression Analysis by Example. 5th Edition
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
A Seemingly Unrelated Poisson Regression Model
King, Gary
1989-01-01
This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Regression analysis with categorized regression calibrated exposure: some interesting findings
Directory of Open Access Journals (Sweden)
Hjartåker Anette
2006-07-01
Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a
RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,
This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)
A Simulation Investigation of Principal Component Regression.
Allen, David E.
Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…
Survival analysis II: Cox regression
Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.
2011-01-01
In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the
Kernel regression with functional response
Ferraty, Frédéric; Laksaci, Ali; Tadj, Amel; Vieu, Philippe
2011-01-01
We consider the kernel regression estimate when both the response variable and the explanatory one are functional. The rates of uniform almost complete convergence are stated as a function of the small ball probability of the predictor and as a function of the entropy of the set on which uniformity is obtained.
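A minimal sketch of a kernel regression estimate with curve-valued predictor and response may help fix ideas. Curves are represented here as lists of values on a common grid, the distance is a discretized L2 distance, and the kernel is Gaussian; all three are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math


def l2_distance(u, v):
    """Discretized L2 distance between two curves sampled on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def kernel_regress(x_new, xs, ys, bandwidth, distance=l2_distance):
    """Nadaraya-Watson style estimate: a weighted average of response curves.

    Each training response curve is weighted by a Gaussian kernel applied to
    the distance between its predictor curve and x_new.
    """
    weights = [math.exp(-0.5 * (distance(x_new, x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    grid_size = len(ys[0])
    return [
        sum(w * y[j] for w, y in zip(weights, ys)) / total
        for j in range(grid_size)
    ]


# Hypothetical toy sample: three predictor curves and three response curves.
train_x = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
train_y = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
prediction = kernel_regress([1.0, 1.0], train_x, train_y, bandwidth=1.0)
print(prediction)
```

Passing a different `distance` function is the natural place to plug in a semi-metric suited to the curves at hand, which is where the small ball probability mentioned above enters the theory.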
Wang, J.; Nathan, R.; Horne, A.
2017-12-01
Traditional approaches to characterizing water-dependent ecosystem outcomes in response to flow have been based on time-averaged hydrological indicators; however, there is increasing recognition of the need to characterize ecological processes that are highly dependent on the sequencing of flow conditions (i.e. floods and droughts). This study considers the representation of flow regimes when assessing ecological outcomes, and in particular the need to account for sequencing and variability of flow. We conducted two case studies - one in the largely unregulated Ovens River catchment and one in the highly regulated Murray River catchment (both located in south-eastern Australia) - to explore the importance of flow sequencing to the condition of a typical long-lived ecological asset in Australia, the River Red Gum forests. In the first, the Ovens River case study, the implications of representing climate change using different downscaling methods (annual scaling, monthly scaling, quantile mapping, and weather generator method) on the sequencing of flows and resulting ecological outcomes were considered. In the second, the Murray River case study, sequencing within a historic drought period was considered by systematically making modest adjustments on an annual basis to the hydrological records. In both cases, the condition of River Red Gum forests was assessed using an ecological model that incorporates transitions between ecological conditions in response to sequences of required flow components. The results of both studies show the importance of considering how hydrological alterations are represented when assessing ecological outcomes. The Ovens case study showed that there is significant variation in the predicted ecological outcomes when different downscaling techniques are applied. Similarly, the analysis in the Murray case study showed that the drought as it historically occurred provided one of the best possible outcomes for River Red Gum
DEFF Research Database (Denmark)
Johansen, Søren
2008-01-01
The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...
Bakbergenuly, Ilyas; Morgenthaler, Stephan
2016-01-01
We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group‐level studies or in meta‐analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log‐odds and arcsine transformations of the estimated probability p^, both for single‐group studies and in combining results from several groups or studies in meta‐analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta‐analysis and result in abysmal coverage of the combined effect for large K. We also propose bias‐correction for the arcsine transformation. Our simulations demonstrate that this bias‐correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta‐analyses of prevalence. PMID:27192062
Directory of Open Access Journals (Sweden)
S. G. B. K. Gorantla
2017-06-01
Impaired functioning of the autonomic nervous system is an important factor in the development and progression of arterial hypertension (AH). An abnormal nocturnal blood pressure (BP) reduction is regarded as an independent prognostic factor for cardiovascular complications in patients with AH. One possible determinant of a disturbed BP circadian rhythm is an imbalance between the divisions of the autonomic nervous system. The aim of our study was to examine heart rate variability (HRV) in patients with AH according to BP profile. We examined 72 patients with AH; the average age was 57 ± 11 years. All patients underwent ambulatory BP monitoring (ABPM) and ECG monitoring. To define the daily profile, the nocturnal BP dip was quantified, and HRV was evaluated by frequency analysis. HRV changes in patients with AH comprised reduced total power and a disturbed ratio of the powers of very low, low and high frequencies, with enhanced sympathicotonia and increased influence of humoral factors. Disturbances of the systolic BP (SBP) daily profile were mainly characterized by an increase in the power of low-frequency waves, which indicates intensified sympathetic and decreased parasympathetic influences. Disturbances of the diastolic BP (DBP) daily profile were mainly characterized by a relative increase in the power of very-low-frequency waves. The results show that in the management of patients with AH it is important not only to control the circadian SBP and DBP profiles but also to evaluate HRV.
Bakbergenuly, Ilyas; Kulinskaya, Elena; Morgenthaler, Stephan
2016-07-01
We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group-level studies or in meta-analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log-odds and arcsine transformations of the estimated probability p̂, both for single-group studies and in combining results from several groups or studies in meta-analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta-analysis and result in abysmal coverage of the combined effect for large K. We also propose bias-correction for the arcsine transformation. Our simulations demonstrate that this bias-correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta-analyses of prevalence. © 2016 The Authors. Biometrical Journal Published by Wiley-VCH Verlag GmbH & Co. KGaA.
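The claim that the transformation bias does not shrink with sample size under overdispersion can be checked numerically with a small sketch. It computes the exact bias of the arcsine-transformed estimated probability under a beta-binomial model, used here as one common choice of overdispersed binomial; that choice, the parameter values and the function names are assumptions for illustration, not details taken from the paper.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_pmf(k, n, p, rho):
    """Log-probability of k successes out of n.

    rho is the intracluster correlation; rho == 0 gives the plain binomial,
    rho > 0 gives a beta-binomial with mean p and a + b + 1 = 1 / rho.
    """
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    if rho == 0.0:
        return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)
    scale = (1.0 - rho) / rho
    a, b = p * scale, (1.0 - p) * scale
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def arcsine_bias(n, p, rho):
    """Exact bias of asin(sqrt(k / n)) as an estimate of asin(sqrt(p))."""
    expected = sum(
        math.exp(log_pmf(k, n, p, rho)) * math.asin(math.sqrt(k / n))
        for k in range(n + 1)
    )
    return expected - math.asin(math.sqrt(p))


for n in (20, 200, 2000):
    print(n, arcsine_bias(n, 0.1, 0.0), arcsine_bias(n, 0.1, 0.1))
```

With no overdispersion the bias fades as n grows, whereas with a positive intracluster correlation it settles at a nonzero value, which is why pooling many such estimates can give poor coverage.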
Boufadel, Michel C.; Suidan, Makram T.; Venosa, Albert D.
1999-04-01
We present a formulation for water flow and solute transport in two-dimensional variably saturated media that accounts for the effects of the solute on water density and viscosity. The governing equations are cast in a dimensionless form that depends on six dimensionless groups of parameters. These equations are discretized in space using the Galerkin finite element formulation and integrated in time using the backward Euler scheme with mass lumping. The modified Picard method is used to linearize the water flow equation. The resulting numerical model, the MARUN model, is verified by comparison to published numerical results. It is then used to investigate beach hydraulics at seawater concentration (about 30 g l -1) in the context of nutrients delivery for bioremediation of oil spills on beaches. Numerical simulations that we conducted in a rectangular section of a hypothetical beach revealed that buoyancy in the unsaturated zone is significant in soils that are fine textured, with low anisotropy ratio, and/or exhibiting low physical dispersion. In such situations, application of dissolved nutrients to a contaminated beach in a freshwater solution is superior to their application in a seawater solution. Concentration-engendered viscosity effects were negligible with respect to concentration-engendered density effects for the cases that we considered.
Al-Khatib, Issam A; Abu Fkhidah, Ismail; Khatib, Jumana I; Kontogianni, Stamatia
2016-03-01
Forecasting hospital solid waste generation is a critical challenge for future planning. The proposed methodology was applied to the composition and generation rate of solid waste in hospital units in order to validate the results and secure the outcomes of the management plan in national hospitals. A set of three multiple-variable regression models has been derived for estimating the daily total hospital waste, general hospital waste, and total hazardous waste as a function of number of inpatients, number of total patients, and number of beds. The application of several key indicators and validation procedures indicates the high significance and reliability of the developed models in predicting the hospital solid waste of any hospital. Methodology data were drawn from the existing scientific literature. Useful raw data were also retrieved from international organisations and from the personnel of the investigated hospitals. The primary generation outcomes are compared with those of other local hospitals and of hospitals in other countries. The main outcome, the results of the developed models, is presented and analysed thoroughly. The goal is for this model to act as leverage in discussions among governmental authorities on the implementation of a national plan for safe hospital waste management in Palestine. © The Author(s) 2016.
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and a method for determining the 'best' equation. It uses an example to show how to carry out principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable, and bivariate correlations procedures in SPSS 10.0. Principal component regression can be used to overcome the disturbance caused by multicollinearity. Carrying out the analysis with SPSS makes it simpler and faster while keeping the results accurate.
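As a rough illustration of the principle described above (not the SPSS procedure from the paper), the following sketch standardizes two collinear predictors, extracts the leading principal component by power iteration, and regresses the response on the component score. The function names and the toy data are assumptions made for this example only.

```python
import math

def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def leading_component(columns, iters=200):
    """Leading eigenvector of the correlation matrix via power iteration.

    Assumes the leading eigenvector is not orthogonal to the uniform
    start vector (true for positively correlated predictors).
    """
    p, n = len(columns), len(columns[0])
    corr = [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pcr_fit(xs, y):
    """Regress y on the first principal component score of the predictors."""
    z = [standardize(col) for col in xs]
    v = leading_component(z)
    n = len(y)
    scores = [sum(v[j] * z[j][i] for j in range(len(z))) for i in range(n)]
    y_mean = sum(y) / n
    # Scores have mean zero, so the slope needs no extra centering.
    slope = sum(s * (yi - y_mean) for s, yi in zip(scores, y)) / sum(s * s for s in scores)
    return v, slope, y_mean

# Hypothetical, strongly collinear predictors and a response.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
loadings, slope, intercept = pcr_fit([x1, x2], y)
```

Because only one component enters the regression, the two nearly redundant predictors share a single coefficient instead of competing for it, which is the sense in which the method sidesteps multicollinearity.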
Analysis of Relationship Between Personality and Favorite Places with Poisson Regression Analysis
Directory of Open Access Journals (Sweden)
Yoon Song Ha
2018-01-01
A relationship between human personality and preferred locations has long been conjectured in human mobility research. In this paper, we analyze the relationship between personality and visited places with Poisson regression, which models the relationship between a count-valued dependent variable and independent variables. For this analysis, 33 volunteers provided their personality data, and data on 49 location categories are used. Raw location data are preprocessed to be normalized into rates of visit, and outliers are pruned. In the regression analysis, the independent variables are the personality data and the dependent variables are the preprocessed location data. Several meaningful results are found. For example, people who tend to visit a university laboratory frequently have high conscientiousness and low openness. Other meaningful location categories are also presented in this paper.
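A minimal sketch of Poisson regression with a log link and a single predictor, fitted by Newton-Raphson, may help make the approach concrete. The trait scores and visit counts below are invented for illustration and are not taken from the study; `poisson_fit` is a hypothetical helper name.

```python
import math

def poisson_fit(x, y, iters=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: a personality trait score and visit counts to one place.
trait = [0.0, 1.0, 2.0, 3.0, 4.0]
visits = [1, 2, 2, 4, 6]
intercept, slope = poisson_fit(trait, visits)
rate_ratio = math.exp(slope)  # multiplicative change in expected visits per trait unit
```

The exponentiated slope is read as a rate ratio, which is the usual way a coefficient of this kind is interpreted for count outcomes.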
Oxygen-Dependent Cell-to-Cell Variability in the Output of the Escherichia coli Tor Phosphorelay.
Roggiani, Manuela; Goulian, Mark
2015-06-15
Escherichia coli senses and responds to trimethylamine-N-oxide (TMAO) in the environment through the TorT-TorS-TorR signal transduction system. The periplasmic protein TorT binds TMAO and stimulates the hybrid kinase TorS to phosphorylate the response regulator TorR through a phosphorelay. Phosphorylated TorR, in turn, activates transcription of the torCAD operon, which encodes the proteins required for anaerobic respiration via reduction of TMAO to trimethylamine. Interestingly, E. coli respires TMAO in both the presence and absence of oxygen, a behavior that is markedly different from the utilization of other alternative electron acceptors by this bacterium. Here we describe an unusual form of regulation by oxygen for this system. While the average level of torCAD transcription is the same for aerobic and anaerobic cultures containing TMAO, the behavior across the population of cells is strikingly different under the two growth conditions. Cellular levels of torCAD transcription in aerobic cultures are highly heterogeneous, in contrast to the relatively homogeneous distribution in anaerobic cultures. Thus, oxygen regulates the variance of the output but not the mean for the Tor system. We further show that this oxygen-dependent variability stems from the phosphorelay. Trimethylamine-N-oxide (TMAO) is utilized by numerous bacteria as an electron acceptor for anaerobic respiration. In E. coli, expression of the proteins required for TMAO respiration is tightly regulated by a signal transduction system that is activated by TMAO. Curiously, although oxygen is the energetically preferred electron acceptor, TMAO is respired even in the presence of oxygen. Here we describe an interesting and unexpected form of regulation for this system in which oxygen produces highly variable expression of the TMAO utilization proteins across a population of cells without affecting the mean expression of these proteins. To our knowledge, this is the first reported example of a stimulus
Ganz, Michael L; Li, Qian; Wintfeld, Neil S; Lee, Yuan-Chi; Sorli, Christopher; Huang, Joanna C
2015-01-01
Past studies have found episodes of severe hypoglycemia (SH) to be serially dependent. Those studies, however, only considered the impact of a single (index) event on future risk; few have analyzed SH risk as it evolves over time in the presence (or absence) of continuing events. The objective of this study was to determine the dynamic risks of SH events conditional on preceding SH events among patients with type 2 diabetes (T2D) who have initiated basal insulin. We used an electronic health records database from the United States that included encounter and laboratory data and clinical notes on T2D patients who initiated basal insulin therapy between 2008 and 2011 and to identify SH events. We used a repeated-measures lagged dependent variable logistic regression model to estimate the impact of SH in one quarter on the risk of SH in the next quarter. We identified 7235 patients with T2D who initiated basal insulin. Patients who experienced ≥1 SH event during any quarter were more likely to have ≥1 SH event during the subsequent quarter than those who did not (predicted probabilities of 7.4% and 1.0%, respectively; p history of SH before starting basal insulin (predicted probabilities of 1.0% and 3.2%, respectively; p history of SH during the titration period (predicted probabilities of 1.1% and 2.8%, respectively; p history of SH events and therefore the value of preventing one SH event may be substantial. These results can inform patient care by providing clinicians with dynamic data on a patient's risk of SH, which in turn can facilitate appropriate adjustment of the risk-benefit ratio for individualized patient care. These results should, however, be interpreted in light of the key limitations of our study: not all SH events may have been captured or coded in the database, data on filled prescriptions were not available, we were unable to adjust for basal insulin dose, and the post-titration follow-up period could have divided into time units other
Zhang, Kai; Zhang, Bao-Zhong; Li, Shao-Meng; Zhang, Lei-Ming; Staebler, Ralf; Zeng, Eddy Y.
2012-09-01
Atmospheric gaseous and size-segregated particle samples were collected from urban Guangzhou at the heights of 100 and 150 m above the ground in daytime and at night in August and December 2010, and were analyzed for polycyclic aromatic hydrocarbons (PAHs). Particulate PAHs were more abundant at night than in daytime, and significantly higher in winter than in summer. The observed vertical, diurnal, and seasonal variability in the occurrences of PAHs was attributed to varying meteorological conditions and atmospheric boundary layers. More than 60% of the particulate PAHs were contained in particles in the accumulation mode with an aerodynamic diameter (Dp) in the range of 0.1-1.8 μm. Different mass transfer velocities by volatilization and condensation are considered the main causes for the different particle size distributions among individual PAHs, while combustion at different temperatures and atmospheric transport were probable causes of the observed seasonal variation in the size distribution of PAHs. Based on the modeled size-dependent dry deposition velocities, daily mean dry deposition fluxes of particulate PAHs ranged from 604 to 1190 ng m⁻² d⁻¹, with PAHs in coarse particles (Dp > 1.8 μm) accounting for 55-95% of the total fluxes. In addition, gaseous PAHs were estimated to contribute 0.6-3.1% to the total dry deposition fluxes if a conservative dry deposition velocity for gaseous species (2 × 10⁻⁴ m s⁻¹) was used. Finally, disequilibrium phase partitioning, meteorological conditions and atmospheric transport were regarded as the main reasons for the variances in dry deposition velocities of individual PAHs.
Ozawa, Rika; Nishimura, Osamu; Yazawa, Shigenobu; Muroi, Atsushi; Takabayashi, Junji; Arimura, Gen-ichiro
2012-11-01
Different organisms compensate for, and adapt to, environmental changes in different ways. In this way, environmental changes affect animal-plant interactions. In this study, we assessed the effect of temperature on a tritrophic system of the lima bean, the herbivorous spider mite Tetranychus urticae and the predatory mite Phytoseiulus persimilis. In this system, the plant defends itself against T. urticae by emitting volatiles that attract P. persimilis. Over 20-40 °C, the emission of volatiles by infested plants and the subsequent attraction of P. persimilis peaked at 30 °C, but the number of eggs laid by T. urticae adults and the number of eggs consumed by P. persimilis peaked at 35 °C. This indicates that the spider mites and predatory mites performed best at a higher temperature than that at which most volatile attractants were produced. Our data from transcriptome pyrosequencing of the mites found that P. persimilis up-regulated gene families for heat shock proteins (HSPs) and ubiquitin-associated proteins, whereas T. urticae did not. RNA interference-mediated gene suppression in P. persimilis revealed differences in temperature responses. Predation on T. urticae eggs by P. persimilis that had been fed PpHsp70-1 dsRNA was low at 35 °C but not at 25 °C when PpHsp70-1 expression was very high. Overall, our molecular and behavioural approaches revealed that the mode and tolerance of lima bean, T. urticae and P. persimilis are distinctly affected by temperature variability, thereby making their tritrophic interactions temperature dependent. © 2012 Blackwell Publishing Ltd.
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
Energy Technology Data Exchange (ETDEWEB)
Verdoolaege, G., E-mail: geert.verdoolaege@ugent.be [Department of Applied Physics, Ghent University, B-9000 Ghent (Belgium); Laboratory for Plasma Physics, Royal Military Academy, B-1000 Brussels (Belgium); Shabbir, A. [Department of Applied Physics, Ghent University, B-9000 Ghent (Belgium); Max Planck Institute for Plasma Physics, Boltzmannstr. 2, 85748 Garching (Germany); Hornung, G. [Department of Applied Physics, Ghent University, B-9000 Ghent (Belgium)
2016-11-15
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
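The core idea can be sketched for the simpler bivariate case (the paper itself extends it to multiple regression and formal tests). Assuming a skewed predictor and a symmetric error, the residuals of the correctly specified model should be closer to symmetric than those of the reversed model. The simulated data, the seed, and the helper names below are assumptions for illustration only.

```python
import random

def ols_residuals(pred, resp):
    """Residuals from a simple least-squares regression of resp on pred."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    slope = (sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
             / sum((p - mp) ** 2 for p in pred))
    return [(r - mr) - slope * (p - mp) for p, r in zip(pred, resp)]

def skewness(values):
    """Standardized third central moment."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated example: x is skewed (exponential), the error is normal, x causes y.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]

skew_x_to_y = skewness(ols_residuals(x, y))  # residuals of the true model
skew_y_to_x = skewness(ols_residuals(y, x))  # residuals of the reversed model
more_plausible = "x -> y" if abs(skew_x_to_y) < abs(skew_y_to_x) else "y -> x"
```

This comparison is only a descriptive heuristic; the inferential procedures named in the abstract (D'Agostino, skewness difference, and bootstrap tests) are what would decide whether the difference is statistically meaningful.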
Rosero-Vlasova, O.; Borini Alves, D.; Vlassova, L.; Perez-Cabello, F.; Montorio Lloveria, R.
2017-10-01
Deforestation in the Amazon basin due, among other factors, to frequent wildfires demands continuous post-fire monitoring of soil and vegetation. Thus, the study posed two objectives: (1) evaluate the capacity of Visible - Near InfraRed - ShortWave InfraRed (VIS-NIR-SWIR) spectroscopy to estimate soil organic matter (SOM) in fire-affected soils, and (2) assess the feasibility of SOM mapping from satellite images. For this purpose, 30 soil samples (surface layer) were collected in 2016 in areas of grass and riparian vegetation of Campos Amazonicos National Park, Brazil, repeatedly affected by wildfires. Standard laboratory procedures were applied to determine SOM. Reflectance spectra of soils were obtained in controlled laboratory conditions using a Fieldspec4 spectroradiometer (spectral range 350-2500 nm). Measured spectra were resampled to simulate reflectances for Landsat-8, Sentinel-2 and EnMap spectral bands, used as predictors in SOM models developed using Partial Least Squares regression and a step-down variable selection algorithm (PLSR-SD). The best fit was achieved with models based on reflectances simulated for EnMap bands (R2=0.93; R2cv=0.82 and NMSE=0.07; NMSEcv=0.19). The model uses only 8 out of 244 predictors (bands) chosen by the step-down variable selection algorithm. The least reliable estimates (R2=0.55 and R2cv=0.40 and NMSE=0.43; NMSEcv=0.60) resulted from the Landsat model, while the Sentinel-2 model showed R2=0.68 and R2cv=0.63; NMSE=0.31 and NMSEcv=0.38. The results confirm the high potential of VIS-NIR-SWIR spectroscopy for SOM estimation. Application of step-down produces sparser and better-fit models. Finally, SOM can be estimated with an acceptable accuracy (NMSE 0.35) from EnMap and Sentinel-2 data, enabling mapping and analysis of impacts of repeated wildfires on soils in the study area.
Kayri, Murat; Gunuc, Selim
2010-01-01
Internet dependency is spreading widely into social life, while it has come to be accepted as a pathological and psychological disease. Knowing the basic effects of internet dependency is essential for using internet technology in a healthy way. In this study, internet dependency levels of 754 students were examined with the Internet…
ON REGRESSION REPRESENTATIONS OF STOCHASTIC-PROCESSES
RUSCHENDORF, L; DEVALK
We construct a.s. nonlinear regression representations of general stochastic processes (X_n), n ∈ N. As a consequence we obtain in particular special regression representations of Markov chains and of certain m-dependent sequences. For m-dependent sequences we obtain a constructive
The intermediate endpoint effect in logistic and probit regression
MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM
2010-01-01
Background An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted
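To make the two estimators concrete, the sketch below computes the product of coefficients (a·b) and the difference in coefficients (c − c′) for a purely linear model with continuous variables. This is a deliberate simplification: in ordinary least squares the two quantities coincide exactly, whereas the abstract's point is that they can diverge under logistic or probit regression. The data and function names are hypothetical.

```python
def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def mediation_paths(x, m, y):
    """Return (a, b, c, c_prime) from three least-squares regressions.

    a:       slope of M on X
    c:       slope of Y on X (total effect)
    b, c':   slopes of M and X in the regression of Y on X and M
    """
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx
    c = sxy / sxx
    det = sxx * smm - sxm ** 2          # determinant of the 2x2 normal equations
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return a, b, c, c_prime

# Hypothetical continuous data: treatment dose, intermediate endpoint, outcome.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
m = [0.2, 0.9, 2.3, 2.8, 4.1, 5.2]
y = [1.0, 1.7, 3.9, 4.2, 6.3, 7.1]
a, b, c, c_prime = mediation_paths(x, m, y)
product_estimate = a * b
difference_estimate = c - c_prime
```

With a binary outcome the coefficients from the two logistic (or probit) fits are on different scales, so the difference c − c′ no longer equals a·b; a full treatment would need a logistic fitting routine, which is beyond this sketch.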
Jonell, T. N.; Li, Y.; Blusztajn, J.; Giosan, L.; Clift, P. D.
2017-12-01
Rare earth element (REE) radioisotope systems, such as neodymium (Nd), have been traditionally used as powerful tracers of source provenance, chemical weathering intensity, and sedimentary processes over geologic timescales. More recently, the effects of physical fractionation (hydraulic sorting) of sediments during transport have called into question the utility of Nd isotopes as a provenance tool. Is source terrane Nd provenance resolvable if sediment transport strongly induces noise? Can grain-size sorting effects be quantified? This study works to address such questions by utilizing grain size analysis, trace element geochemistry, and Nd isotope geochemistry of bulk and grain-size fractions (Pakistan. Here we evaluate how grain size effects drive Nd isotope variability and further resolve the total uncertainties associated with Nd isotope compositions of bulk sediments. Results from the Indus delta indicate bulk sediment ɛNd compositions are most similar to the <63 µm fraction as a result of strong mineralogical control on bulk compositions by silt- to clay-sized monazite and/or allanite. Replicate analyses determine that the best reproducibility (± 0.15 ɛNd points) is observed in the 125-250 µm fraction. The bulk and finest fractions display the worst reproducibility (±0.3 ɛNd points). Standard deviations (2σ) indicate that bulk sediment uncertainties are no more than ±1.0 ɛNd points. This argues that excursions of ≥1.0 ɛNd points in any bulk Indus delta sediments must in part reflect an external shift in provenance irrespective of sample composition, grain size, and grain size distribution. Sample standard deviations (2s) estimate that any terrigenous bulk sediment composition should vary no greater than ±1.1 ɛNd points if provenance remains constant. Findings from this study indicate that although there are grain-size dependent Nd isotope effects, they are minimal in the Indus delta such that resolvable provenance-driven trends can be
Gender effects in gaming research: a case for regression residuals?
Pfister, Roland
2011-10-01
Numerous recent studies have examined the impact of video gaming on various dependent variables, including the players' affective reactions, positive as well as detrimental cognitive effects, and real-world aggression. These target variables are typically analyzed as a function of game characteristics and player attributes-especially gender. However, findings on the uneven distribution of gaming experience between males and females, on the one hand, and the effect of gaming experience on several target variables, on the other hand, point at a possible confound when gaming experiments are analyzed with a standard analysis of variance. This study uses simulated data to exemplify analysis of regression residuals as a potentially beneficial data analysis strategy for such datasets. As the actual impact of gaming experience on each of the various dependent variables differs, the ultimate benefits of analysis of regression residuals entirely depend on the research question, but it offers a powerful statistical approach to video game research whenever gaming experience is a confounding factor.
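The strategy can be sketched as follows: first regress the dependent variable on gaming experience, then compare the residuals (rather than the raw scores) between genders. The toy data are constructed, as an assumption for this example, so that the outcome depends only on experience while experience is unevenly distributed across the two groups.

```python
def residualize(outcome, covariate):
    """Residuals of outcome after a simple least-squares fit on covariate."""
    n = len(outcome)
    mc, mo = sum(covariate) / n, sum(outcome) / n
    slope = (sum((c - mc) * (o - mo) for c, o in zip(covariate, outcome))
             / sum((c - mc) ** 2 for c in covariate))
    return [(o - mo) - slope * (c - mc) for c, o in zip(covariate, outcome)]

def group_mean(values, labels, label):
    """Mean of the values whose label matches."""
    picked = [v for v, lab in zip(values, labels) if lab == label]
    return sum(picked) / len(picked)

# Hypothetical data: males have more gaming experience; score depends on experience only.
experience = [1, 2, 3, 4, 6, 7, 8, 9]
gender = ["f", "f", "f", "f", "m", "m", "m", "m"]
score = [2 * e + 1 for e in experience]

raw_gap = group_mean(score, gender, "m") - group_mean(score, gender, "f")
residuals = residualize(score, experience)
adjusted_gap = group_mean(residuals, gender, "m") - group_mean(residuals, gender, "f")
```

Here the raw gender gap is large while the gap in residuals vanishes, which is the pattern one would expect when experience alone drives the apparent gender effect.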
Directory of Open Access Journals (Sweden)
Qinghui Du
2014-01-01
We consider semi-implicit Euler methods for a stochastic age-dependent capital system with variable delays and random jump magnitudes, and investigate the convergence of the numerical approximation. It is proved that the numerical approximate solutions converge to the analytical solutions in the mean-square sense under given conditions.
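For orientation only, a semi-implicit (drift-implicit) Euler step can be sketched on a much simpler problem than the one in the paper: the scalar linear SDE dX = −aX dt + σX dW, with no age structure, delays, or jumps. The drift is treated implicitly and the diffusion explicitly, giving X_{n+1} = (X_n + σ X_n ΔW_n) / (1 + a Δt). The test equation, parameter values, and function name are assumptions for this example.

```python
import math
import random

def semi_implicit_euler(x0, a, sigma, t_end, steps, rng):
    """Simulate dX = -a X dt + sigma X dW with an implicit drift term.

    Solving X_new = X + (-a * X_new) * dt + sigma * X * dW for X_new gives
    the closed-form update used in the loop.
    """
    dt = t_end / steps
    x = x0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over one step
        x = (x + sigma * x * dw) / (1.0 + a * dt)
        path.append(x)
    return path

# One sample path with hypothetical parameters.
path = semi_implicit_euler(x0=1.0, a=2.0, sigma=0.3, t_end=1.0, steps=200,
                           rng=random.Random(42))
```

Mean-square convergence, the property proved in the paper for its own system, could be probed numerically here by comparing many simulated paths at decreasing step sizes against a fine-grid reference.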
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Introduction to regression graphics
Cook, R Dennis
2009-01-01
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Alternative Methods of Regression
Birkes, David
2011-01-01
Of related interest. Nonlinear Regression Analysis and its Applications Douglas M. Bates and Donald G. Watts "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s
Winiwarter, Susanne; Middleton, Brian; Jones, Barry; Courtney, Paul; Lindmark, Bo; Page, Ken M.; Clark, Alan; Landqvist, Claire
2015-09-01
We demonstrate here a novel use of statistical tools to study intra- and inter-site assay variability of five early drug metabolism and pharmacokinetics in vitro assays over time. Firstly, a tool for process control is presented. It shows the overall assay variability but allows also the following of changes due to assay adjustments and can additionally highlight other, potentially unexpected variations. Secondly, we define the minimum discriminatory difference/ratio to support projects to understand how experimental values measured at different sites at a given time can be compared. Such discriminatory values are calculated for 3 month periods and followed over time for each assay. Again assay modifications, especially assay harmonization efforts, can be noted. Both the process control tool and the variability estimates are based on the results of control compounds tested every time an assay is run. Variability estimates for a limited set of project compounds were computed as well and found to be comparable. This analysis reinforces the need to consider assay variability in decision making, compound ranking and in silico modeling.
Time dependence of the UV resonance lines in the cataclysmic variables SU UMa, RX And and 0623+71
International Nuclear Information System (INIS)
Woods, J.A.; Drew, J.E.; Verbunt, Frank
1990-01-01
We present IUE observations of the dwarf novae SU UMa and RX And, and of the nova-like variable 0623 + 71. At the time of observation, SU UMa and RX And were in outburst. All three systems show variability in the wind-formed UV resonance lines of N V λ 1240, Si IV λ 1397 and C IV λ 1549 on timescales of hours. The amplitude of variation is smallest in RX And and largest in 0623 + 71. There is evidence that the variations observed in SU UMa's UV spectrum repeat on the orbital period. Our observations of SU UMa also reveal variability in the continuum flux during the decline from outburst maximum that is much more marked in the UV than at optical wavelengths.
Predicting company growth using logistic regression and neural networks
Directory of Open Access Journals (Sweden)
Marijana Zekić-Sušac
2016-12-01
The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used, describing the relevant industry sector, financial ratios, income, and assets in the input space, with a binomial dependent variable indicating whether a company was high-growth, defined as annualized growth in assets of more than 20% a year over a three-year period. Due to the large number of input variables, factor analysis was performed in the pre-processing stage in order to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required application of two data mining methods: logistic regression as a parametric method and neural networks as a non-parametric method. The methods were tested on models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce a significantly higher classification accuracy in the model incorporating all available variables. The paper further discusses the advantages and disadvantages of both approaches, i.e. logistic regression and neural networks, in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers as it provides support for recognizing companies with growth potential, especially during times of economic downturn.
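Comparing classifiers by ROC curves usually comes down to the area under the curve (AUC). A small sketch, assuming each model outputs a predicted probability of high growth and that labels are coded 1 (high growth) and 0 (otherwise), computes the AUC as the share of positive/negative pairs that the model ranks correctly. The scores below are invented, not results from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities from two models on the same companies.
labels = [0, 0, 1, 1]
logit_scores = [0.10, 0.40, 0.35, 0.80]
network_scores = [0.05, 0.30, 0.60, 0.90]

auc_logit = roc_auc(logit_scores, labels)
auc_network = roc_auc(network_scores, labels)
```

The pairwise form is quadratic in the sample size, which is fine for a sketch; a rank-based formula would be preferable for large datasets.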
Illicit drug use and abuse/dependence: modeling of two-stage variables using the CCC approach.
Agrawal, A; Neale, M C; Jacobson, K C; Prescott, C A; Kendler, K S
2005-06-01
Drug use and abuse/dependence are stages of a complex drug habit. Most genetically informative models that are fit to twin data examine drug use and abuse/dependence independent of each other. This poses an interesting question: for a multistage process, how can we partition the factors influencing each stage specifically from the factors that are common to both stages? We used a causal-common-contingent (CCC) model to partition the common and specific influences on drug use and abuse/dependence. Data on use and abuse/dependence of cannabis, cocaine, sedatives, stimulants and any illicit drug was obtained from male and female twin pairs. CCC models were tested individually for each sex and in a sex-equal model. Our results suggest that there is evidence for additive genetic, shared environmental and unique environmental influences that are common to illicit drug use and abuse/dependence. Furthermore, we found substantial evidence for factors that were specific to abuse/dependence. Finally, sexes could be equated for all illicit drugs. The findings of this study emphasize the need for models that can partition the sources of individual differences into common and stage-specific influences.
DEFF Research Database (Denmark)
Oosterhoff, Peter; Thomsen, Morten Bækgaard; Maas, Joep N
2010-01-01
-term variability of repolarization (STV) as a feedback parameter of arrhythmic risk. Methods and Results: The minimal signal sampling frequency required for measuring STV was determined through computer simulation. Arrhythmogenic response to dofetilide (25 µg/kg/5minutes) was tested at two different paced heart...
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Weisberg, Sanford
2013-01-01
Praise for the Third Edition "...this is an excellent book which could easily be used as a course text..." -International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus
Hosmer, David W; Sturdivant, Rodney X
2013-01-01
A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
Directory of Open Access Journals (Sweden)
S. Djebali
2011-02-01
This paper is concerned with a second-order nonlinear boundary value problem with a derivative-dependent nonlinearity, posed on the positive half-line. The derivative operator is time dependent. Using a priori estimates and under a Nagumo growth condition, Schauder's fixed point theorem is combined with the method of upper and lower solutions on unbounded domains to prove existence of solutions. A uniqueness theorem is also obtained, and some examples of application illustrate the results.
Directory of Open Access Journals (Sweden)
M. Idrees
2018-03-01
An analysis is performed for the fluid dynamics incorporating the variation of viscosity and thermal conductivity on an unsteady two-dimensional free surface flow of a viscous incompressible conducting fluid taking into account the effect of a magnetic field. Surface tension varies quadratically with temperature while fluid viscosity and thermal conductivity are assumed to vary as a linear function of temperature. The boundary layer partial differential equations in Cartesian coordinates are transformed into a system of nonlinear ordinary differential equations (ODEs) by similarity transformation. The developed nonlinear equations are solved analytically by the Homotopy Analysis Method (HAM) and numerically by the shooting method. The effects of natural parameters such as the variable viscosity parameter A, variable thermal conductivity parameter N, Hartmann number Ma, film thickness, unsteadiness parameter S, thermocapillary number M and Prandtl number Pr on the velocity and temperature profiles are investigated. The results for the surface skin friction coefficient f″(0), Nusselt number (heat flux) -θ′(0) and free surface temperature θ(1) are presented graphically and in tabular form. Keywords: Variable viscosity and thermal conductivity, Thermocapillary number, Magnetic field, Thin film, Unsteady stretching surface
International Nuclear Information System (INIS)
Shafiq, H.; Rashid, A.; Majeed, A.; Razah, S.; Asghar, I.
2016-01-01
Objective: To examine the inflammatory effect of warfarin and compare it with IL-6 levels along with different demographic and clinical variables. Study Design: Quasi-experimental study. Place and Duration of Study: Center of Research in Experimental and Applied Medicine (CREAM), Army Medical College/National University of Sciences and Technology, Islamabad, from Oct 2013 to Oct 2015. Material and Methods: Samples were collected by non-probability convenience sampling. A total of 76 patients on warfarin therapy were included according to warfarin dose response, of whom 32 (42 percent) were taking 10 mg/day of warfarin. Patients' demographic and clinical variables were noted, i.e. age, gender, BMI, duration of therapy, INR history, and hepatic, gastrointestinal and diabetic complications. A human IL-6 ELISA assay was performed. Results: A statistically significant difference was found between age groups (in years) and different levels of warfarin dose (p=0.046) along with IL-6 production. There is a negative correlation between warfarin dose and age group, i.e. as age increases, the dose of warfarin decreases. Among the sources of inter- and intra-patient variability, age and serum IL-6 levels were found to be statistically significant with warfarin dose response. BMI and warfarin dose were found to be weakly positively correlated. Conclusion: A marked immunomodulatory response of warfarin was noted by measuring IL-6 levels. IL-6 levels retained a significant association with warfarin dose. (author)
Understanding poisson regression.
Hayat, Matthew J; Higgins, Melinda
2014-04-01
Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
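As a rough illustration of the overdispersion issue raised in the abstract above, the following Python sketch computes the Pearson dispersion statistic for an intercept-only Poisson model. The counts are invented for illustration and are not taken from the ENSPIRE study; the function name `pearson_dispersion` is likewise hypothetical. A value well above 1 would suggest that an overdispersion parameter or a negative binomial model deserves consideration.

```python
def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom
    for an intercept-only Poisson model (a minimal sketch)."""
    n = len(counts)
    rate = sum(counts) / n  # maximum-likelihood estimate of the Poisson mean
    # Under a Poisson model the variance equals the mean, so each squared
    # residual is scaled by the fitted mean.
    chi_square = sum((y - rate) ** 2 / rate for y in counts)
    return chi_square / (n - 1)


# Hypothetical counts (assumption, for illustration only).
counts = [0, 1, 0, 2, 9, 0, 1, 12, 0, 3]
dispersion = pearson_dispersion(counts)
print(f"dispersion = {dispersion:.3f}")
```

With a covariate in the model, the same ratio would be computed from the fitted means of the regression rather than from a single overall mean.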
Fritz, Robert L.
A study examined the association between field-dependence and its related information processing characteristics, and educational cognitive style as a model of conative influence. Data were collected from 145 secondary marketing education students in northern Georgia during spring 1991. Descriptive statistics, Pearson product moment correlations,…
The M Word: Multicollinearity in Multiple Regression.
Morrow-Howell, Nancy
1994-01-01
Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…
Predicting Social Trust with Binary Logistic Regression
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Directory of Open Access Journals (Sweden)
Majdak Marek
2017-01-01
The objective of this paper was to determine the relationship between the efficiency of photovoltaic panels and the value of the angle of their inclination relative to the horizon. For the experimental research, tests were carried out on photovoltaic modules made of monocrystalline, polycrystalline and amorphous silicon. The experiment consisted of measuring the voltage and current generated by the photovoltaic panels at a known value of solar radiation, a specified resistance value set using a variable resistor, and a known angle of inclination relative to the horizon.
Mapping geogenic radon potential by regression kriging
Energy Technology Data Exchange (ETDEWEB)
Pásztor, László [Institute for Soil Sciences and Agricultural Chemistry, Centre for Agricultural Research, Hungarian Academy of Sciences, Department of Environmental Informatics, Herman Ottó út 15, 1022 Budapest (Hungary); Szabó, Katalin Zsuzsanna, E-mail: sz_k_zs@yahoo.de [Department of Chemistry, Institute of Environmental Science, Szent István University, Páter Károly u. 1, Gödöllő 2100 (Hungary); Szatmári, Gábor; Laborczi, Annamária [Institute for Soil Sciences and Agricultural Chemistry, Centre for Agricultural Research, Hungarian Academy of Sciences, Department of Environmental Informatics, Herman Ottó út 15, 1022 Budapest (Hungary); Horváth, Ákos [Department of Atomic Physics, Eötvös University, Pázmány Péter sétány 1/A, 1117 Budapest (Hungary)
2016-02-15
Radon (²²²Rn) gas is produced in the radioactive decay chain of uranium (²³⁸U), which is an element that is naturally present in soils. Radon is transported mainly by diffusion and convection mechanisms through the soil depending mainly on the physical and meteorological parameters of the soil and can enter and accumulate in buildings. Health risks originating from indoor radon concentration can be attributed to natural factors and are characterized by geogenic radon potential (GRP). Identification of areas with high health risks requires spatial modeling, that is, mapping of radon risk. In addition to geology and meteorology, physical soil properties play a significant role in the determination of GRP. In order to compile a reliable GRP map for a model area in Central-Hungary, spatial auxiliary information representing GRP forming environmental factors was taken into account to support the spatial inference of the locally measured GRP values. Since the number of measured sites was limited, efficient spatial prediction methodologies were searched for to construct a reliable map for a larger area. Regression kriging (RK) was applied for the interpolation using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. Firstly, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which are interpolated by kriging. The final map is the sum of the two component predictions. Overall accuracy of the map was tested by Leave-One-Out Cross-Validation. Furthermore, the spatial reliability of the resultant map is also estimated by the calculation of the 90% prediction interval of the local prediction values. The applicability of the applied method as well as that of the map is discussed briefly. - Highlights: • A new method
Spatial Quantile Regression In Analysis Of Healthy Life Years In The European Union Countries
Directory of Open Access Journals (Sweden)
Trzpiot Grażyna
2016-12-01
The paper investigates the impact of selected factors on the healthy life years of men and women in the EU countries. Multiple quantile spatial autoregression models are used in order to account for substantial differences in the healthy life years and life quality across the EU members. Quantile regression allows studying dependencies between variables in different quantiles of the response distribution. Moreover, this statistical tool is robust against violations of the classical regression assumption about the distribution of the error term. Parameters of the models were estimated using the instrumental variable method (Kim, Muller 2004), whereas the confidence intervals and p-values were bootstrapped.
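To make the idea of "different quantiles of the response distribution" concrete, here is a minimal Python sketch of the check (pinball) loss that underlies quantile regression. It is not the spatial instrumental-variable estimator used in the paper; it only shows, for an intercept-only model with made-up data, that minimizing the check loss recovers a sample quantile. The names `check_loss` and `best_constant` are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Constant that minimizes the total check loss.

    A minimizer always exists among the observed values, so searching
    over the data points is sufficient for this illustration.
    """
    return min(values, key=lambda c: sum(check_loss(y - c, tau) for y in values))


# Hypothetical response values (assumption, for illustration only).
values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 12, 35]
median_fit = best_constant(values, 0.5)
upper_fit = best_constant(values, 0.9)
print(median_fit, upper_fit)
```

In a full quantile regression the constant is replaced by a linear predictor and the same loss is minimized over its coefficients, one fit per chosen quantile.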
Idrees, M.; Rehman, Sajid; Shah, Rehan Ali; Ullah, M.; Abbas, Tariq
2018-03-01
An analysis is performed for the fluid dynamics incorporating the variation of viscosity and thermal conductivity on an unsteady two-dimensional free surface flow of a viscous incompressible conducting fluid taking into account the effect of a magnetic field. Surface tension varies quadratically with temperature while fluid viscosity and thermal conductivity are assumed to vary as a linear function of temperature. The boundary layer partial differential equations in Cartesian coordinates are transformed into a system of nonlinear ordinary differential equations (ODEs) by similarity transformation. The developed nonlinear equations are solved analytically by the Homotopy Analysis Method (HAM) and numerically by the shooting method. The effects of natural parameters such as the variable viscosity parameter A, variable thermal conductivity parameter N, Hartmann number Ma, film thickness, unsteadiness parameter S, thermocapillary number M and Prandtl number Pr on the velocity and temperature profiles are investigated. The results for the surface skin friction coefficient f″(0), Nusselt number (heat flux) -θ′(0) and free surface temperature θ(1) are presented graphically and in tabular form.
Regression dilution bias: tools for correction methods and sample size calculation.
Berglund, Lars
2012-08-01
Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
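The correction described in the abstract above can be sketched in a few lines of Python. This is a simplified illustration under the classical additive measurement error model, not the software tools supplied with the article: the reliability ratio is approximated by the Pearson correlation between two replicate measurements of the risk factor, and the observed slope is divided by it. All numbers and function names are hypothetical.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def corrected_slope(observed_slope, first, second):
    """Divide the observed slope by the estimated reliability ratio.

    Assumes replicate measurements with independent errors of equal
    variance, so that their correlation estimates the reliability ratio.
    """
    reliability = pearson_r(first, second)
    return observed_slope / reliability, reliability


# Hypothetical reliability study: two measurements per individual.
first = [4.0, 6.0, 5.0, 9.0, 7.0]
second = [5.0, 5.0, 6.0, 8.0, 8.0]
slope, reliability = corrected_slope(0.5, first, second)
print(f"reliability = {reliability:.3f}, corrected slope = {slope:.3f}")
```

Because the reliability ratio lies below 1 whenever the risk factor is measured with random error, the corrected slope is larger in magnitude than the observed one.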
Ulbrich, N.; Volden, T.
2018-01-01
Analysis and use of temperature-dependent wind tunnel strain-gage balance calibration data are discussed in the paper. First, three different methods are presented and compared that may be used to process temperature-dependent strain-gage balance data. The first method uses an extended set of independent variables in order to process the data and predict balance loads. The second method applies an extended load iteration equation during the analysis of balance calibration data. The third method uses temperature-dependent sensitivities for the data analysis. Physical interpretations of the most important temperature-dependent regression model terms are provided that relate temperature compensation imperfections and the temperature-dependent nature of the gage factor to sets of regression model terms. Finally, balance calibration recommendations are listed so that temperature-dependent calibration data can be obtained and successfully processed using the reviewed analysis methods.
A consistent framework for Horton regression statistics that leads to a modified Hack's law
Furey, P.R.; Troutman, B.M.
2008-01-01
A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
DEFF Research Database (Denmark)
Bache, Stefan Holst
A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....
DEFF Research Database (Denmark)
Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas
2017-01-01
In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....
Overcoming multicollinearity in multiple regression using correlation coefficient
Zainodin, H. J.; Yap, S. J.
2013-09-01
Multicollinearity happens when there are high correlations among independent variables. In this case, it would be difficult to distinguish between the contributions of these independent variables to that of the dependent variable as they may compete to explain much of the similar variance. Besides, the problem of multicollinearity also violates the assumption of multiple regression: that there is no collinearity among the possible independent variables. Thus, an alternative approach is introduced in overcoming the multicollinearity problem in achieving a well represented model eventually. This approach is accomplished by removing the multicollinearity source variables on the basis of the correlation coefficient values based on full correlation matrix. Using the full correlation matrix can facilitate the implementation of Excel function in removing the multicollinearity source variables. It is found that this procedure is easier and time-saving especially when dealing with greater number of independent variables in a model and a large number of all possible models. Hence, in this paper detailed insight of the procedure is shown, compared and implemented.
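The general idea of screening predictors with a full correlation matrix can be sketched as follows. This Python fragment is an assumption-laden illustration rather than the authors' Excel-based procedure: the 0.95 threshold, the rule of dropping whichever member of a highly correlated pair is less correlated with the dependent variable, and the names `drop_collinear`, `x1`, `x2`, `x3` are all hypothetical choices.

```python
from itertools import combinations
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def drop_collinear(predictors, response, threshold=0.95):
    """Repeatedly remove one variable from the most correlated pair."""
    kept = dict(predictors)
    while len(kept) > 1:
        # Largest absolute pairwise correlation among remaining predictors.
        r, a, b = max(
            (abs(pearson_r(kept[a], kept[b])), a, b)
            for a, b in combinations(kept, 2)
        )
        if r <= threshold:
            break
        # Assumed rule: drop the member less correlated with the response.
        weaker = min((a, b), key=lambda name: abs(pearson_r(kept[name], response)))
        del kept[weaker]
    return list(kept)


# Hypothetical data: x2 is almost exactly twice x1.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "x3": [5, 3, 6, 2, 7, 4],
}
response = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0]
kept = drop_collinear(predictors, response)
print(kept)
```

Pairwise correlations only detect collinearity between two variables at a time; a near-linear relation among three or more predictors would need a different diagnostic.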
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Park, Seong-Beom; Lee, Inah
2016-08-01
Place cells in the hippocampus fire at specific positions in space, and distal cues in the environment play critical roles in determining the spatial firing patterns of place cells. Many studies have shown that place fields are influenced by distal cues in foraging animals. However, it is largely unknown whether distal-cue-dependent changes in place fields appear in different ways in a memory task if distal cues bear direct significance to achieving goals. We investigated this possibility in this study. Rats were trained to choose different spatial positions in a radial arm in association with distal cue configurations formed by visual cue sets attached to movable curtains around the apparatus. The animals were initially trained to associate readily discernible distal cue configurations (0° vs. 80° angular separation between distal cue sets) with different food-well positions and then later experienced ambiguous cue configurations (14° and 66°) intermixed with the original cue configurations. Rats showed no difficulty in transferring the associated memory formed for the original cue configurations when similar cue configurations were presented. Place field positions remained at the same locations across different cue configurations, whereas stability and coherence of spatial firing patterns were significantly disrupted when ambiguous cue configurations were introduced. Furthermore, the spatial representation was extended backward and skewed more negatively at the population level when processing ambiguous cue configurations, compared with when processing the original cue configurations only. This effect was more salient for large cue-separation conditions than for small cue-separation conditions. No significant rate remapping was observed across distal cue configurations. These findings suggest that place cells in the hippocampus dynamically change their detailed firing characteristics in response to a modified cue environment and that some of the firing
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
Seber, George A F
2012-01-01
Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.
Ritz, Christian; Parmigiani, Giovanni
2009-01-01
R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.
Bayesian ARTMAP for regression.
Sasu, L M; Andonie, R
2013-10-01
Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.
and Multinomial Logistic Regression
African Journals Online (AJOL)
This work presented the results of an experimental comparison of two models: Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) for classifying students based on their academic performance. The predictive accuracy for each model was measured by their average Classification Correct Rate (CCR).
Mechanisms of neuroblastoma regression
Brodeur, Garrett M.; Bagatell, Rochelle
2014-01-01
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Multivariate Frequency-Severity Regression Models in Insurance
Directory of Open Access Journals (Sweden)
Edward W. Frees
2016-02-01
In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i) property; (ii) motor vehicle; and (iii) contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.
KELEŞ, Taliha; ALTUN, Murat
2016-01-01
Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was the trivial presentation of the equation for orthogonal regression (OR) and the comparison of classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were shown by an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...
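The comparison described in this abstract can be illustrated with a minimal sketch. It assumes a single predictor, uses the standard closed-form slope for orthogonal regression (equal error variances in x and y), and runs on made-up data; the function names are hypothetical and the code is not taken from the study.

```python
import math

def _moments(xs, ys):
    """Return the means and the centered sums of squares and cross-products."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return xbar, ybar, sxx, syy, sxy

def clr_line(xs, ys):
    """Classical linear regression: minimizes vertical squared distances."""
    xbar, ybar, sxx, _, sxy = _moments(xs, ys)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def or_line(xs, ys):
    """Orthogonal regression: minimizes perpendicular squared distances.

    Assumes the cross-product sxy is non-zero.
    """
    xbar, ybar, sxx, syy, sxy = _moments(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return ybar - slope * xbar, slope

def perpendicular_ss(xs, ys, intercept, slope):
    """Sum of squared perpendicular distances from the points to the line."""
    vertical_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return vertical_ss / (1 + slope ** 2)

# Illustrative data only (not from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]

clr_ss = perpendicular_ss(xs, ys, *clr_line(xs, ys))
or_ss = perpendicular_ss(xs, ys, *or_line(xs, ys))
print("CLR perpendicular SS:", clr_ss)
print("OR perpendicular SS:", or_ss)
```

Because the orthogonal line minimizes the perpendicular criterion by construction, its value can never exceed that of the classical line on the same data.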
Directory of Open Access Journals (Sweden)
L. L. Davtian
2018-03-01
The influence of pharmaceutical factors on the technological processes of drug manufacturing is highly important. Thus, in developing a new drug in the form of medicinal films, it was necessary to determine how the method of adding the active substances affects the effectiveness of the drug. The aim is to rationalize the method of adding the active pharmaceutical ingredients to the composition of the developed drug. Materials and methods. As experimental samples we used medicinal films made using various methods of adding the active ingredients. The quality of the samples was evaluated by their antimicrobial activity against Clostridium sporogenes and Staphylococcus aureus, determined by the agar diffusion method. Results. The study of the antimicrobial activity of medicinal films made with various methods of adding the active ingredients showed that adding metronidazole as an aqueous solution increases the antimicrobial activity of the films by 21.23%, 16.89%, 28.59%, respectively, compared with films of similar composition in which metronidazole was added as a suspension and the remaining ingredients were added in the same way. Introducing chlorhexidine bigluconate and glucosamine hydrochloride into the film-forming solution last, together with the metronidazole solution, increases the antimicrobial activity by 24.67%, probably because the thermolabile ingredients do not come into contact with solutions of film-forming substances that have a high dissolution temperature. Conclusions. The most rational approach is to add metronidazole to the medicinal films as a 0.01% aqueous solution, mixed with the chlorhexidine bigluconate and glucosamine hydrochloride solution, to the final film-forming solution.
Barbosa, Graziela Murta; Colombo, Andrea Vieira; Rodrigues, Paulo Henrique; Simionato, Maria Regina Lorenzetti
2015-01-01
It is well known that strain and virulence diversity exist within the population structure of Porphyromonas gingivalis. In the present study we investigate intra- and inter-species variability in biofilm formation of Porphyromonas gingivalis and partners Prevotella intermedia and Prevotella nigrescens. All strains tested showed similar hydrophobicity, except for P. gingivalis W83 which has roughly half of the hydrophobicity of P. gingivalis ATCC33277. An intraspecies variability in coaggregation of P. gingivalis with P. intermedia was also found. The association P. gingivalis W83/P. intermedia 17 produced the thickest biofilm and strain 17 was prevalent. In a two-compartment system P. gingivalis W83 stimulates an increase in biomass of strain 17 and the latter did not stimulate the growth of P. gingivalis W83. In addition, P. gingivalis W83 also stimulates the growth of P. intermedia ATCC25611 although strain W83 was prevalent in the association with P. intermedia ATCC25611. P. gingivalis ATCC33277 was prevalent in both associations with P. intermedia and both strains of P. intermedia stimulate the growth of P. gingivalis ATCC33277. FISH images also showed variability in biofilm structure. Thus, the outcome of the association P. gingivalis/P. intermedia seems to be strain-dependent, and both soluble factors and physical contact are relevant. The association P. gingivalis-P. nigrescens ATCC33563 produced larger biomass than each monotypic biofilm, and P. gingivalis was favored in consortia, while no differences were found in the two-compartment system. Therefore, in consortia P. gingivalis-P. nigrescens physical contact seems to favor P. gingivalis growth. The intraspecies variability found in our study suggests strain-dependence in ability of microorganisms to recognize molecules in other bacteria which may further elucidate the dysbiosis event during periodontitis development giving additional explanation for periodontal bacteria, such as P. gingivalis and P
Steger, Doris; Berry, David; Haider, Susanne; Horn, Matthias; Wagner, Michael; Stocker, Roman; Loy, Alexander
2011-01-01
The hybridization of nucleic acid targets with surface-immobilized probes is a widely used assay for the parallel detection of multiple targets in medical and biological research. Despite its widespread application, DNA microarray technology still suffers from several biases and lack of reproducibility, stemming in part from an incomplete understanding of the processes governing surface hybridization. In particular, non-random spatial variations within individual microarray hybridizations are often observed, but the mechanisms underpinning this positional bias remain incompletely explained. This study identifies and rationalizes a systematic spatial bias in the intensity of surface hybridization, characterized by markedly increased signal intensity of spots located at the boundaries of the spotted areas of the microarray slide. Combining observations from a simplified single-probe block array format with predictions from a mathematical model, the mechanism responsible for this bias is found to be a position-dependent variation in lateral diffusion of target molecules. Numerical simulations reveal a strong influence of microarray well geometry on the spatial bias. Reciprocal adjustment of the size of the microarray hybridization chamber to the area of surface-bound probes is a simple and effective measure to minimize or eliminate the diffusion-based bias, resulting in increased uniformity and accuracy of quantitative DNA microarray hybridization.
Johanson, I. A.; Miklius, A.; Okubo, P.; Montgomery-Brown, E. K.
2017-12-01
Mauna Loa volcano is the largest active volcano on Earth and in the 20th century produced roughly one eruption every seven years. The 33-year quiescence since its last eruption in 1984 has been punctuated by three inflation episodes where magma likely entered the shallow plumbing system, but was not erupted. The most recent began in 2014 and is ongoing. Unlike prior inflation episodes, the current one is accompanied by a significant increase in shallow seismicity, a pattern that is similar to earlier pre-eruptive periods. We apply the Kalman filter based Network Inversion Filter (NIF) to the 2014-present inflation episode using data from a 27-station continuous GPS network on Mauna Loa. The model geometry consists of a point volume source and a tabular, dike-like body, which have previously been shown to provide a good fit to deformation data from a 2004-2009 inflation episode. The tabular body is discretized into 1 km x 1 km segments. For each day, the NIF solves for the rates of opening on the tabular body segments (subject to smoothing and positivity constraints), volume change rate in the point source, and slip rate on a deep décollement fault surface, which is constrained to a constant (no transient slip allowed). The Kalman filter in the NIF provides for smoothing both forwards and backwards in time. The model shows that the 2014-present inflation episode occurred as several sub-events, rather than steady inflation. It shows some spatial variability in the location of the inflation sub-events. In the model, opening in the tabular body is initially concentrated below the volcano's summit, in an area roughly outlined by shallow seismicity. In October 2015, opening in the tabular body shifts to be centered beneath the southwest portion of the summit and seismicity becomes concentrated in this area. By late 2016, the opening rate on the tabular body decreases and is once again under the central part of the summit. This modeling approach has allowed us to track these
Fuzzy multiple linear regression: A computational approach
Juang, C. H.; Huang, X. H.; Fleming, J. W.
1992-01-01
This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.
Ridge Regression Signal Processing
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
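The basic ridge principle mentioned in this abstract can be sketched as follows. This is only the batch estimator (X'X + λI)⁻¹X'y for two predictors, not the linearized recursive ridge estimator or the EKF derived in the work; the nearly collinear data stand in loosely for "poor geometry" and are invented for illustration, as are the function names.

```python
import math

def ridge_two_predictors(rows, targets, lam):
    """Solve (X'X + lam*I) beta = X'y for two predictors by Cramer's rule.

    No intercept is fitted; lam = 0 gives ordinary least squares.
    """
    a = sum(r[0] * r[0] for r in rows) + lam
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows) + lam
    g0 = sum(r[0] * t for r, t in zip(rows, targets))
    g1 = sum(r[1] * t for r, t in zip(rows, targets))
    det = a * d - b * b
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)

def norm(beta):
    """Euclidean length of a coefficient vector."""
    return math.sqrt(sum(c * c for c in beta))

# Illustrative, nearly collinear design (assumed data).
rows = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
targets = [2.1, 3.9, 6.2, 7.8]

beta_ols = ridge_two_predictors(rows, targets, 0.0)
beta_ridge = ridge_two_predictors(rows, targets, 0.5)
print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

With λ > 0 the coefficient vector is shrunk towards zero, which is the stabilizing effect ridge regression trades against a small bias when the design matrix is close to singular.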
Subset selection in regression
Miller, Alan
2002-01-01
Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition: a separate chapter on Bayesian methods; complete revision of the chapter on estimation; a major example from the field of near infrared spectroscopy; more emphasis on cross-validation; greater focus on bootstrapping; stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible; software available on the Internet for implementing many of the algorithms presented; more examples. Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...
Regression in organizational leadership.
Kernberg, O F
1979-02-01
The choice of good leaders is a major task for all organizations. Information regarding the prospective administrator's personality should complement questions regarding his previous experience, his general conceptual skills, his technical knowledge, and the specific skills in the area for which he is being selected. The growing psychoanalytic knowledge about the crucial importance of internal, in contrast to external, object relations, and about the mutual relationships of regression in individuals and in groups, constitutes an important practical tool for the selection of leaders.
Classification and regression trees
Breiman, Leo; Olshen, Richard A; Stone, Charles J
1984-01-01
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Hilbe, Joseph M
2009-01-01
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … . -Annette J. Dobson, Biometric...
Directory of Open Access Journals (Sweden)
Danilo F Pereira
2011-02-01
The increasing demand of consumer markets for the welfare of birds in poultry houses has motivated much scientific research to monitor and classify welfare according to the production environment. Given the complexity of the interaction between the birds and the aviary environment, the correct interpretation of behavior becomes an important way to estimate the welfare of these birds. This study obtained multiple logistic regression models capable of estimating the welfare of broiler breeders in relation to the aviary environment and the behaviors expressed by the birds. In the experiment, several behaviors expressed by breeders housed in a climatic chamber were observed under controlled temperatures and three different ammonia concentrations in the air, monitored daily. From the analysis of the data, two logistic regression models were obtained: the first uses a measured value of ammonia concentration, and the second uses a binary value, assigned by a person through olfactory perception, to classify the ammonia concentration. The analysis showed that both models classified the broiler breeders' welfare successfully.
International Nuclear Information System (INIS)
Hoffman, A. J.; Lee, J. C.
2013-01-01
A new time-dependent neutron transport method based on the method of characteristics (MOC) has been developed. Whereas most spatial kinetics methods treat time dependence through temporal discretization, this new method treats time dependence by defining the characteristics to span space and time. In this implementation regions are defined in space-time where the thickness of the region in time fulfills an analogous role to the time step in discretized methods. The time dependence of the local source is approximated using a truncated Taylor series expansion with high order derivatives approximated using backward differences, permitting the solution of the resulting space-time characteristic equation. To avoid a drastic increase in computational expense and memory requirements due to solving many discrete characteristics in the space-time planes, the temporal variation of the boundary source is similarly approximated. This allows the characteristics in the space-time plane to be represented analytically rather than discretely, resulting in an algorithm comparable in implementation and expense to one that arises from conventional time integration techniques. Furthermore, by defining the boundary flux time derivative in terms of the preceding local source time derivative and boundary flux time derivative, the need to store angularly-dependent data is avoided without approximating the angular dependence of the angular flux time derivative. The accuracy of this method is assessed through implementation in the neutron transport code DeCART. The method is employed with variable-order local source representation to model a TWIGL transient. The results demonstrate that this method is accurate and more efficient than the discretized method. (authors)
Directory of Open Access Journals (Sweden)
Nina L. Timofeeva
2014-01-01
The article presents the methodological and technical bases for the creation of regression models that adequately reflect reality. The focus is on methods of removing residual autocorrelation in models. Algorithms for eliminating heteroscedasticity and autocorrelation of the regression model residuals are given: the reweighted least squares method and the Cochrane-Orcutt method. A model of "pure" regression is built, as well as a standardized form of the regression equation, which allows the effects of different explanatory variables on the dependent variable to be compared when those variables are expressed in different units. A scheme of techniques for mitigating heteroscedasticity and autocorrelation is developed for the creation of regression models specific to the social and cultural sphere.
Significance tests to determine the direction of effects in linear regression models.
Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander
2015-02-01
Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. © 2014 The British Psychological Society.
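The core idea of this abstract, that residuals from the correctly specified model are more symmetric than those from the reversed model when the predictor is non-normal, can be sketched with a small simulation. The data-generating setup (an exponential predictor, normal errors, a slope of 1) and all function names are assumptions made for illustration; the sketch only compares sample skewness and does not implement the significance tests proposed in the paper.

```python
import random

def skewness(values):
    """Sample skewness: third central moment over variance to the power 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def ols_residuals(predictor, response):
    """Residuals from a simple least-squares regression of response on predictor."""
    n = len(predictor)
    pbar = sum(predictor) / n
    rbar = sum(response) / n
    sxy = sum((p - pbar) * (r - rbar) for p, r in zip(predictor, response))
    sxx = sum((p - pbar) ** 2 for p in predictor)
    slope = sxy / sxx
    return [(r - rbar) - slope * (p - pbar) for p, r in zip(predictor, response)]

# Assumed true model: X is skewed, Y = X + normal error, so Y depends on X.
random.seed(0)
x = [random.expovariate(1.0) for _ in range(5000)]
y = [xi + random.gauss(0.0, 1.0) for xi in x]

skew_y_on_x = skewness(ols_residuals(x, y))  # correctly specified direction
skew_x_on_y = skewness(ols_residuals(y, x))  # reversed direction
print("Residual skewness, Y regressed on X:", skew_y_on_x)
print("Residual skewness, X regressed on Y:", skew_x_on_y)
```

In this setup the reversed-model residuals are a mixture of the skewed predictor and the symmetric error, so they inherit part of the predictor's skewness, whereas the residuals of the correct model approximate the symmetric error alone.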
Directory of Open Access Journals (Sweden)
Volosovets A.O.
2017-12-01
Arterial hypertension can have a pronounced negative influence on the state of the cerebral vascular system and lead to significant microtraumatization of the vessel walls and disruption of vascular autoregulation. This predictor has the greatest influence on the onset of ischemic stroke of the atherothrombotic and lacunar subtypes; however, hypertension occurs in almost all patients with acute cerebral ischemia. The relationship between blood pressure oscillation and the time of onset of the ischemic focus is of interest and is not covered at all in the modern scientific literature, which determined the purpose of our work. The purpose of our study was to determine how the profile of blood pressure fluctuations is deformed in patients in the acute period of ischemic stroke, depending on the time of occurrence of the cerebrovascular accident. We examined 300 patients who had suffered acute ischemic stroke (men - 196, women - 104), aged 42 to 84 years (average age - 65.2 ± 8.7 years). All patients were divided into 3 groups according to the period of the day when the ischemic stroke occurred: group 1 (n=146), patients who suffered cerebral ischemia during the day (8.00-14.59); group 2 (n=107), patients whose stroke occurred in the evening (15.00-21.59); group 3 (n=47), patients who had an ischemic stroke at night (22.00-7.59). Group 1, patients who had an ischemic stroke during the day, as a rule with an increase in blood pressure, typically showed a marked increase in blood pressure at 12.00 and 15.00 and a tendency towards a compensatory parasympathetic effect in the form of a blood pressure decrease at night (over-dipper). At the same time, in group 2, patients with stroke in the evening, elevated blood pressure at 18.00 and 21.00 and parasympathetic activity disorders, with a prevalence of insufficient reduction of blood pressure in the evening and during sleep (non-dipper), were observed
Directory of Open Access Journals (Sweden)
Danang Ariyanto
2017-11-01
Regression is a method that relates independent variables to a dependent variable, with estimated parameters as its output. The principal problem with this method is its application to spatial data. The Geographically Weighted Regression (GWR) method is used to solve this problem. GWR is a regression technique that extends the traditional regression framework by allowing the estimation of local rather than global parameters. In other words, GWR runs a regression for each location, instead of a single regression for the entire study area. The purpose of this research is to analyze the factors influencing wetland paddy productivity in Tulungagung Regency. The method used in this research is GWR with a cross-validation bandwidth and weighting by an adaptive Gaussian kernel function. This research uses 4 variables presumed to affect wetland paddy productivity: the rate of rainfall (X1), the average cost of fertilizer per hectare (X2), the average cost of pesticides per hectare (X3), and the allocation of subsidized NPK fertilizer for the food crops sub-sector (X4). Based on the results, X1, X2, X3 and X4 have a different effect in each district. So, improving the productivity of wetland paddy in Tulungagung Regency requires a special policy based on the GWR model in each district.
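The statement that GWR "runs a regression for each location" can be made concrete with a minimal sketch. It assumes one predictor, locations on a one-dimensional transect, and a fixed-bandwidth Gaussian kernel; the study itself uses an adaptive kernel with a cross-validated bandwidth and four predictors, none of which is reproduced here. The data and function names are hypothetical.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight; nearer observations receive larger weights."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(target, locations, xs, ys, bandwidth):
    """Weighted least-squares line fitted for one target location.

    Returns (intercept, slope) estimated with kernel weights centered
    on the target location.
    """
    ws = [gaussian_weight(abs(u - target), bandwidth) for u in locations]
    total = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / total
    ybar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Assumed data: the effect of x on y strengthens along the transect.
locations = [float(u) for u in range(10)]
xs = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0, 5.0, 8.0]
ys = [(1.0 + 0.2 * u) * x for u, x in zip(locations, xs)]

local_slopes = [local_fit(u, locations, xs, ys, bandwidth=2.0)[1] for u in locations]
for u, slope in zip(locations, local_slopes):
    print("location", u, "local slope", round(slope, 3))
```

A single global regression would report one average slope for these data, while the local fits recover a coefficient that changes from one end of the transect to the other, which is the kind of district-specific effect the abstract describes.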
Steganalysis using logistic regression
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, in three image sets.
DEFF Research Database (Denmark)
Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas
2017-01-01
In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...
On concurvity in nonlinear and nonparametric regression models
Directory of Open Access Journals (Sweden)
Sonia Amodio
2014-12-01
When data are affected by multicollinearity in the linear regression framework, concurvity will be present in fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. Just as collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even though the backfitting algorithm will always converge to a solution, in the case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing a type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and nonparametric regression models.
Hayes, Andrew F; Matthes, Jörg
2009-08-01
Researchers often hypothesize moderated effects, in which the effect of an independent variable on an outcome variable depends on the value of a moderator variable. Such an effect reveals itself statistically as an interaction between the independent and moderator variables in a model of the outcome variable. When an interaction is found, it is important to probe the interaction, for theories and hypotheses often predict not just interaction but a specific pattern of effects of the focal independent variable as a function of the moderator. This article describes the familiar pick-a-point approach and the much less familiar Johnson-Neyman technique for probing interactions in linear models and introduces macros for SPSS and SAS to simplify the computations and facilitate the probing of interactions in ordinary least squares and logistic regression. A script version of the SPSS macro is also available for users who prefer a point-and-click user interface rather than command syntax.
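The two probing techniques named in this abstract can be sketched for a linear model y = b0 + b1·x + b2·m + b3·x·m, where the conditional effect of x at moderator value m is b1 + b3·m. The coefficient estimates, their variances and covariance, and the critical value 1.96 below are made-up assumptions, and the function names are hypothetical; this is not the SPSS or SAS macro the article introduces.

```python
import math

def conditional_effect(b1, b3, m):
    """Effect of the focal predictor at moderator value m (pick-a-point)."""
    return b1 + b3 * m

def conditional_se(v11, v13, v33, m):
    """Standard error of b1 + b3*m from the coefficient (co)variances."""
    return math.sqrt(v11 + 2 * m * v13 + m * m * v33)

def johnson_neyman(b1, b3, v11, v13, v33, t_crit):
    """Moderator values where the conditional effect is exactly at the
    significance boundary, i.e. (b1 + b3*m)^2 = t_crit^2 * se(m)^2.

    Returns a sorted list of real roots (empty if none exist).
    """
    a = b3 ** 2 - t_crit ** 2 * v33
    b = 2 * (b1 * b3 - t_crit ** 2 * v13)
    c = b1 ** 2 - t_crit ** 2 * v11
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

# Hypothetical estimates from a fitted interaction model.
b1, b3 = 0.2, 0.3
v11, v13, v33 = 0.04, -0.005, 0.01
t_crit = 1.96  # large-sample approximation, assumed for illustration

# Pick-a-point: probe the effect at a few chosen moderator values.
for m in (-1.0, 0.0, 1.0):
    effect = conditional_effect(b1, b3, m)
    se = conditional_se(v11, v13, v33, m)
    print("m =", m, "effect =", round(effect, 3), "t =", round(effect / se, 3))

# Johnson-Neyman: boundaries of the regions of significance.
bounds = johnson_neyman(b1, b3, v11, v13, v33, t_crit)
print("Johnson-Neyman boundaries:", bounds)
```

Pick-a-point requires the analyst to choose moderator values in advance, whereas the Johnson-Neyman boundaries are solved for directly, which removes that arbitrary choice.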
Sparse Regression by Projection and Sparse Discriminant Analysis
Qi, Xin
2015-04-03
© 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
Hadayeghi, Alireza; Shalaby, Amer S; Persaud, Bhagwant N
2010-03-01
A common technique used for the calibration of collision prediction models is the Generalized Linear Modeling (GLM) procedure with the assumption of Negative Binomial or Poisson error distribution. In this technique, fixed coefficients that represent the average relationship between the dependent variable and each explanatory variable are estimated. However, the stationary relationship assumed may hide some important spatial factors of the number of collisions at a particular traffic analysis zone. Consequently, the accuracy of such models for explaining the relationship between the dependent variable and the explanatory variables may be suspected since collision frequency is likely influenced by many spatially defined factors such as land use, demographic characteristics, and traffic volume patterns. The primary objective of this study is to investigate the spatial variations in the relationship between the number of zonal collisions and potential transportation planning predictors, using the Geographically Weighted Poisson Regression modeling technique. The secondary objective is to build on knowledge comparing the accuracy of Geographically Weighted Poisson Regression models to that of Generalized Linear Models. The results show that the Geographically Weighted Poisson Regression models are useful for capturing spatially dependent relationships and generally perform better than the conventional Generalized Linear Models. Copyright 2009 Elsevier Ltd. All rights reserved.
DEFF Research Database (Denmark)
Hansen, Henrik; Tarp, Finn
2001-01-01
This paper examines the relationship between foreign aid and growth in real GDP per capita as it emerges from simple augmentations of popular cross country growth specifications. It is shown that aid in all likelihood increases the growth rate, and this result is not conditional on ‘good’ policy... investment. We conclude by stressing the need for more theoretical work before cross-country regressions of this kind are used for policy purposes.
Prediction accuracy and stability of regression with optimal scaling transformations
Kooij, van der Anita J.
2007-01-01
The central topic of this thesis is the CATREG approach to nonlinear regression. This approach finds optimal quantifications for categorical variables and/or nonlinear transformations for numerical variables in regression analysis. (CATREG is implemented in SPSS Categories by the author of the
Nonlinear Forecasting With Many Predictors Using Kernel Ridge Regression
DEFF Research Database (Denmark)
Exterkate, Peter; Groenen, Patrick J.F.; Heij, Christiaan
This paper puts forward kernel ridge regression as an approach for forecasting with many predictors that are related nonlinearly to the target variable. In kernel ridge regression, the observed predictor variables are mapped nonlinearly into a high-dimensional space, where estimation of the predi...
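As a rough illustration of the idea in this abstract, the sketch below fits kernel ridge regression with plain NumPy. The Gaussian kernel, the values of `lam` and `gamma`, and the synthetic sine data are all assumptions made for the example; they are not taken from the paper, which may use other kernels and tuning procedures.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict as a kernel-weighted combination of the dual coefficients."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Synthetic nonlinear target (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

alpha = krr_fit(X, y, lam=1e-2, gamma=0.5)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
y_hat = krr_predict(X, alpha, X_test, gamma=0.5)
max_err = np.max(np.abs(y_hat - np.sin(X_test[:, 0])))
```

The nonlinear map into the high-dimensional space never has to be computed explicitly; only the n-by-n kernel matrix is needed, which is what keeps the approach tractable with many predictors.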
Implicit collinearity effect in linear regression: Application to basal ...
African Journals Online (AJOL)
Collinearity of predictor variables is a severe problem in least squares regression analysis. It contributes to the instability of regression coefficients and leads to poor prediction accuracy. Despite these problems, studies are conducted with a large number of observed and derived variables linked with a response ...
Few crystal balls are crystal clear : eyeballing regression
International Nuclear Information System (INIS)
Wittebrood, R.T.
1998-01-01
The theory of regression and statistical analysis as it applies to reservoir analysis was discussed. It was argued that regression lines are not always the final truth. It was suggested that regression lines and eyeballed lines are often equally accurate. The many conditions that must be fulfilled to calculate a proper regression were discussed. Mentioned among these conditions were the distribution of the data, hidden variables, knowledge of how the data was obtained, the need for causal correlation of the variables, and knowledge of the manner in which the regression results are going to be used. 1 tab., 13 figs
Group-wise partial least square regression
Camacho, José; Saccenti, Edoardo
2018-01-01
This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are
Regression Equations for Birth Weight Estimation using ...
African Journals Online (AJOL)
In this study, Birth Weight has been estimated from anthropometric measurements of hand and foot. Linear regression equations were formed from each of the measured variables. These simple equations can be used to estimate Birth Weight of new born babies, in order to identify those with low birth weight and referred to ...
Quantum algorithm for linear regression
Wang, Guoming
2017-07-01
We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Unlike previous algorithms, which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in classical form. So by running it once, one completely determines the fitted model and can then use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log2(N), d, κ, 1/ɛ), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ɛ is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.
DEFF Research Database (Denmark)
Gencay, Yilmaz Emre; Sørensen, Martine C.H.; Wenzel, Cory Q.
2018-01-01
Campylobacter jejuni NCTC12662 is sensitive to infection by many Campylobacter bacteriophages. Here we used this strain to investigate the molecular mechanism behind phage resistance development when exposed to a single phage and demonstrate how phase variable expression of one surface component...... influences phage sensitivity against many diverse C. jejuni phages. When C. jejuni NCTC12662 was exposed to phage F207 overnight, 25% of the bacterial cells were able to grow on a lawn of phage F207, suggesting that resistance develops at a high frequency. One resistant variant, 12662R, was further...... characterized and shown to be an adsorption mutant. Plaque assays using our large phage collection showed that seven out of 36 diverse capsular polysaccharide (CPS)-dependent phages could not infect 12662R, whereas the remaining phages formed plaques on 12662R with reduced efficiencies. Analysis of the CPS...
International Nuclear Information System (INIS)
Menouar, Salah; Maamache, Mustapha; Choi, Jeong Ryeol
2010-01-01
The quantum states of time-dependent coupled oscillator model for charged particles subjected to variable magnetic field are investigated using the invariant operator methods. To do this, we have taken advantage of an alternative method, so-called unitary transformation approach, available in the framework of quantum mechanics, as well as a generalized canonical transformation method in the classical regime. The transformed quantum Hamiltonian is obtained using suitable unitary operators and is represented in terms of two independent harmonic oscillators which have the same frequencies as that of the classically transformed one. Starting from the wave functions in the transformed system, we have derived the full wave functions in the original system with the help of the unitary operators. One can easily take a complete description of how the charged particle behaves under the given Hamiltonian by taking advantage of these analytical wave functions.
Gaxiola-Valdez, Ismael; Goodyear, Bradley G
2012-12-01
Accurate localization of brain activity using blood oxygenation level-dependent (BOLD) functional magnetic resonance imaging (fMRI) has been challenged because of the large BOLD signal within distal veins. Arterial spin labeling (ASL) techniques offer greater sensitivity to the microvasculature but possess low temporal resolution and limited brain coverage. In this study, we show that the physiological origins of BOLD and ASL depend on whether percent change or statistical significance is being considered. For BOLD and ASL fMRI data collected during a simple unilateral hand movement task, we found that in the area of the contralateral motor cortex the centre of gravity (CoG) of the intersubject coefficient of variation (CV) of BOLD fMRI was near the brain surface for percent change in signal, whereas the CoG of the intersubject CV for Z-score was in close proximity of sites of brain activity for both BOLD and ASL. These findings suggest that intersubject variability of BOLD percent change is vascular in origin, whereas the origin of inter-subject variability of Z-score is neuronal for both BOLD and ASL. For longer duration tasks (12 s or greater), however, there was a significant correlation between BOLD and ASL percent change, which was not evident for short duration tasks (6 s). These findings suggest that analyses directly comparing percent change in BOLD signal between pre-defined regions of interest using short duration stimuli, as for example in event-related designs, may be heavily weighted by large-vessel responses rather than neuronal responses. Copyright © 2012 Elsevier Inc. All rights reserved.
Osborn, B.; Chapple, W.; Ewers, B. E.; Williams, D. G.
2014-12-01
The interaction between soil conditions and climate variability plays a central role in the ecohydrological functions of montane conifer forests. Although soil moisture availability to trees is largely dependent on climate, the depth and texture of soil exerts a key secondary influence. Multiple Pleistocene glacial events have shaped the landscape of the central Rocky Mountains creating a patchwork of soils differing in age and textural classification. This mosaic of soil conditions impacts hydrological properties, and montane conifer forests potentially respond to climate variability quite differently depending on the age of glacial till and soil development. We hypothesized that the age of glacial till and associated soil textural changes exert strong control on growth and photosynthetic gas exchange of lodgepole pine. We examined physiological and growth responses of lodgepole pine to interannual variation in maximum annual snow water equivalence (SWEmax) of montane snowpack and growing season air temperature (Tair) and vapor pressure deficit (VPD) across a chronosequence of Pleistocene glacial tills ranging in age from 700k to 12k years. Soil textural differences across the glacial tills illustrate the varying degrees of weathering with the most well developed soils with highest clay content on the oldest till surfaces. We show that sensitivity of growth and carbon isotope discrimination, an integrated measure of canopy gas exchange properties, to interannual variation SWEmax , Tair and VPD is greatest on young till surfaces, whereas trees on old glacial tills with well-developed soils are mostly insensitive to these interannual climate fluctuations. Tree-ring widths were most sensitive to changes in SWEmax on young glacial tills (p < 0.01), and less sensitive on the oldest till (p < 0.05). Tair correlates strongly with δ13C values on the oldest and youngest tills sites, but shows no significant relationship on the middle aged glacial till. It is clear that
Wurster, Sebastian; Weis, Philipp; Page, Lukas; Helm, Johanna; Lazariotou, Maria; Einsele, Hermann; Ullmann, Andrew J
2017-10-01
Invasive aspergillosis remains a deadly disease in immunocompromised patients, whereas the combination of an exaggerated immune response and continuous exposure lead to various hyperinflammatory diseases. This pilot study aimed to gain an overview of the intra- and inter-individual variability in Aspergillus fumigatus reactive T-helper cells in healthy adults and the correlation with environmental mould exposure. In this flow cytometric study, the frequencies of CD154 + A. fumigatus reactive T cells were evaluated in 70 healthy volunteers. All subjects completed a standardised questionnaire addressing their mould exposure. Subjects with intensive mould exposure in their professional or residential surrounding demonstrated considerably higher mean frequencies of A. fumigatus reactive T-helper and T-memory cells. Comparative evaluation of multiple measurements over time demonstrated relatively conserved reactive T-cell frequencies in the absence of major changes to the exposure profile, whereas those frequently exposed in professional environment or with changes to their risk score demonstrated a marked dependency of antigen reactive T-cell frequencies on recent mould exposure. This pilot study was the first to provide data on the intra-individual variability in A. fumigatus reactive T-cell frequencies and its linkage to mould encounter. Fungus reactive T cells are to be considered a valued tool for the assessment of environmental mould exposure. © 2017 Blackwell Verlag GmbH.
Directory of Open Access Journals (Sweden)
Keita Mitani
2016-06-01
The processing of time intervals is fundamental for sensorimotor and cognitive functions. Perceptual and motor timing are often performed concurrently (e.g., playing a musical instrument). Although previous studies have shown the influence of body movements on time perception, how we perceive self-produced time intervals has remained unclear. Furthermore, it has been suggested that the timing mechanisms are distinct for the sub- and suprasecond ranges. Here, we compared perceptual performances for self-produced and passively presented time intervals in random contexts (i.e., multiple target intervals presented in a session) across the sub- and suprasecond ranges (Experiment 1) and within the sub- (Experiment 2) and suprasecond (Experiment 3) ranges, and in a constant context (i.e., a single target interval presented in a session) in the sub- and suprasecond ranges (Experiment 4). We show that self-produced time intervals were perceived as shorter and more variable across the sub- and suprasecond ranges and within the suprasecond range but not within the subsecond range in a random context. In a constant context, the self-produced time intervals were perceived as more variable in the suprasecond range but not in the subsecond range. The impairing effects indicate that motor timing interferes with perceptual timing. The dependence of impairment on temporal contexts suggests multiple timing mechanisms for the subsecond and suprasecond ranges. In addition, violation of the scalar property (i.e., a constant variability to target interval ratio) was observed between the sub- and suprasecond ranges. The violation was clearer for motor timing than for perceptual timing. This suggests that the multiple timing mechanisms for the sub- and suprasecond ranges overlap more for perception than for motor. Moreover, the central tendency effect (i.e., where shorter base intervals are overestimated and longer base intervals are underestimated) disappeared with subsecond
Energy Technology Data Exchange (ETDEWEB)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam [Pusat Pengajian Sains Matematik, Universiti Sains Malaysia, 11800 USM, Pulau Pinang (Malaysia)]. E-mail: amirul@unisel.edu.my, zalila@cs.usm.my, norlida@usm.my, adam@usm.my
2015-10-22
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
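The statement that a response with q categories leads to q-1 logit equations can be made concrete with a short sketch. The function below turns q-1 linear predictors into q category probabilities, using the first category as the reference. The coefficient values and the two-row design matrix are hypothetical and are not estimates from the study.

```python
import numpy as np

def multinomial_probs(X, B):
    """Category probabilities from q-1 logit equations.

    X: (n, p) design matrix including an intercept column.
    B: (p, q-1) coefficients, one column per non-reference category.
    Returns (n, q) probabilities; column 0 is the reference category.
    """
    eta = X @ B                                         # log-odds versus the reference
    eta = np.hstack([np.zeros((X.shape[0], 1)), eta])   # reference logit fixed at 0
    eta = eta - eta.max(axis=1, keepdims=True)          # stabilise the exponentials
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)

# Hypothetical example with q = 3 categories and one binary predictor.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
B = np.array([[0.2, -0.5], [0.8, 0.3]])
P = multinomial_probs(X, B)

# Each non-reference column recovers its own logit equation.
log_odds = np.log(P[:, 1:] / P[:, [0]])
```

Here `log_odds` equals `X @ B`, which is the sense in which each logit equation makes a specific comparison of one response category against the reference.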
Spatial vulnerability assessments by regression kriging
Pásztor, László; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor
2016-04-01
information representing IEW or GRP forming environmental factors were taken into account to support the spatial inference of the locally experienced IEW frequency and measured GRP values, respectively. An efficient spatial prediction methodology, regression kriging (RK), was applied to construct reliable maps using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. First, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. Application of RK also provides the possibility of inherent accuracy assessment. The resulting maps are characterized by global and local measures of their accuracy. Additionally, the method enables interval estimation for the spatial extension of the areas of predefined risk categories. All of these outputs provide a useful contribution to spatial planning, action planning and decision making. Acknowledgement: Our work was partly supported by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).
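The two-part structure of regression kriging described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under strong assumptions: the exponential covariance model and its `sill` and `corr_range` values are fixed by hand rather than fitted to a residual variogram, there is no nugget, and the coordinates, covariate and target are synthetic.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=2.0):
    """Exponential covariance model (assumed here, not fitted to a variogram)."""
    return sill * np.exp(-h / corr_range)

def regression_kriging(coords, Z, y, coords_new, Z_new):
    """Predict y at new sites as regression trend plus kriged residual."""
    # 1) Deterministic part: multiple linear regression on auxiliary covariates.
    A = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    trend_new = np.column_stack([np.ones(len(Z_new)), Z_new]) @ beta

    # 2) Stochastic part: ordinary kriging of the regression residuals.
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = exp_cov(d)
    K[n, n] = 0.0                              # Lagrange-multiplier row and column
    d0 = np.linalg.norm(coords_new[:, None, :] - coords[None, :, :], axis=2)
    rhs = np.vstack([exp_cov(d0).T, np.ones((1, len(coords_new)))])
    weights = np.linalg.solve(K, rhs)[:n]      # (n, m) kriging weights
    resid_new = weights.T @ resid

    # 3) Final prediction: sum of the two component predictions.
    return trend_new + resid_new

# Synthetic sites, one auxiliary covariate, and a spatially structured target.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(30, 2))
Z = rng.normal(size=(30, 1))
y = 2.0 + 1.5 * Z[:, 0] + np.sin(coords[:, 0] / 2.0)

pred_at_obs = regression_kriging(coords, Z, y, coords, Z)
```

Without a nugget, kriging is an exact interpolator, so predicting back at the sampled sites reproduces the observations. In practice the covariance parameters would be estimated from the residuals, and the kriging variance would supply the local accuracy measure mentioned above.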
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved through efficient disease prediction and diagnosis. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. The analysis and application of Poisson mixture regression models are addressed under two classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model, owing to its low Bayesian Information Criterion value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction overall, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. It is deduced that heart disease prediction can be done effectively by identifying the major risks componentwise using a Poisson mixture regression model.
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
2016-08-01
The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data
Rodó, Xavier; Rodríguez-Arias, Miquel-Àngel
2006-10-01
The study of transitory signals and local variability structures in time and/or space, and of their role as sources of climatic memory, is an important but often neglected topic in climate research despite its obvious importance and extensive coverage in the literature. Transitory signals arise from non-linearities in the climate system, transitory atmosphere-ocean couplings, and other processes in the climate system that evolve after a critical threshold is crossed. These temporary interactions, though intense, may not last long; they can be responsible for a large amount of unexplained variability but are normally considered of limited relevance and are often discarded. With most of the current techniques at hand, these types of signatures are difficult to isolate because of the low signal-to-noise ratio in midlatitudes and the limited recurrence of the transitory signals during a customary interval of data. Also, a serious problem often arises from the smoothing of local or transitory processes when the statistical techniques applied consider the full length of data available rather than the size of the specific variability structure under investigation. Scale-dependent correlation (SDC) analysis is a new statistical method capable of highlighting the presence of transitory processes, understood as temporary significant lag-dependent autocovariance in a single series or covariance structures between two series. This approach therefore complements other approaches such as those from the families of wavelet analysis, singular-spectrum analysis and recurrence plots. The main features of SDC are its high performance for short time series and its ability to characterize phase relationships and thresholds in the bivariate domain. Ultimately, SDC helps track short-lagged relationships among processes that locally or temporarily couple and uncouple. The use of SDC is illustrated in the present
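The core idea of correlating short fragments rather than whole series can be sketched as follows. This is a simplified reading of SDC, not the authors' implementation: it computes Pearson correlations between all pairs of fragments of a chosen size `s` and omits the significance testing that a full analysis would need. The two noise series and the shared pulse are synthetic.

```python
import numpy as np

def sdc_matrix(x, y, s):
    """Pearson correlation between every length-s fragment of x and of y.

    Entry [i, j] compares x[i:i+s] with y[j:j+s], so j - i is the lag.
    """
    wx = np.lib.stride_tricks.sliding_window_view(x, s)
    wy = np.lib.stride_tricks.sliding_window_view(y, s)
    wx = wx - wx.mean(axis=1, keepdims=True)
    wy = wy - wy.mean(axis=1, keepdims=True)
    num = wx @ wy.T
    den = np.outer(np.linalg.norm(wx, axis=1), np.linalg.norm(wy, axis=1))
    return num / den

# Two independent noise series sharing one transitory pulse at a lag of 20.
rng = np.random.default_rng(1)
n, s = 100, 10
x = rng.standard_normal(n)
y = rng.standard_normal(n)
pulse = 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, s))
x[30:40] += pulse
y[50:60] += pulse

R = sdc_matrix(x, y, s)
local_corr = R[30, 50]   # fragment pair that contains the shared pulse
```

The fragment pair containing the pulse shows a strong local correlation even though the coupling occupies only a tenth of each series. With fragments this short, high correlations also occur by chance, so a randomization test would be needed before interpreting any single entry of `R`.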
Stochastic development regression using method of moments
DEFF Research Database (Denmark)
Kühnel, Line; Sommer, Stefan Horst
2017-01-01
This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using...... the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using...... the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite dimensional landmark manifolds....
Generalized allometric regression to estimate biomass of Populus in short-rotation coppice
Energy Technology Data Exchange (ETDEWEB)
Ben Brahim, Mohammed; Gavaland, Andre; Cabanettes, Alain [INRA Centre de Toulouse, Castanet-Tolosane Cedex (France). Unite Agroforesterie et Foret Paysanne
2000-07-01
Data from four different stands were combined to establish a single generalized allometric equation to estimate above-ground biomass of individual Populus trees grown on short-rotation coppice. The generalized model was fitted using diameter at breast height, the mean diameter and the mean height of each site as independent variables and then compared with the stand-specific regressions using an F-test. Results showed that this single regression estimates tree biomass well at each stand and does not introduce bias with increasing diameter.
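A generalized allometric equation of this kind is often fitted on the log scale. The sketch below assumes a power-law form, log B = b0 + b1 log D + b2 log Dm + b3 log Hm, where D is diameter at breast height and Dm, Hm are the site mean diameter and height. The functional form, the coefficient values and the four synthetic stands are assumptions for illustration; the abstract does not give the actual equation.

```python
import numpy as np

def fit_generalized_allometry(dbh, site_mean_d, site_mean_h, biomass):
    """Least-squares fit of log(B) = b0 + b1 log(D) + b2 log(Dm) + b3 log(Hm)."""
    A = np.column_stack([
        np.ones(len(dbh)),
        np.log(dbh),
        np.log(site_mean_d),
        np.log(site_mean_h),
    ])
    coef, *_ = np.linalg.lstsq(A, np.log(biomass), rcond=None)
    return coef

def predict_biomass(coef, dbh, site_mean_d, site_mean_h):
    """Back-transform the log-scale prediction to the biomass scale."""
    log_b = (coef[0] + coef[1] * np.log(dbh)
             + coef[2] * np.log(site_mean_d) + coef[3] * np.log(site_mean_h))
    return np.exp(log_b)

# Synthetic, noise-free data for four hypothetical stands with five trees each.
stand_d = np.array([4.0, 6.0, 8.0, 10.0])     # site mean diameters
stand_h = np.array([5.0, 9.0, 10.0, 16.0])    # site mean heights
rel = np.array([0.6, 0.8, 1.0, 1.2, 1.4])     # tree size relative to site mean
dbh = np.concatenate([m * rel for m in stand_d])
site_d = np.repeat(stand_d, len(rel))
site_h = np.repeat(stand_h, len(rel))

true_coef = np.array([-2.5, 2.3, -0.4, 0.6])  # hypothetical coefficients
biomass = predict_biomass(true_coef, dbh, site_d, site_h)
coef = fit_generalized_allometry(dbh, site_d, site_h, biomass)
```

With real, noisy data the back-transformation from the log scale is biased low, so a correction factor is usually applied; that step is left out here for brevity.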
Arizono, H; Morita, N; Iizuka, S; Satoh, S; Nakatani, Y
2000-12-01
This research was based on the hypothesis that when alcohol-dependent patients describe themselves, awakening of emotion by affirmative odor stimulation may facilitate memory reframing focusing on more affirmative emotion and memories. To prove the hypothesis, physiological changes accompanied by emotional awakening were evaluated by measuring the autonomic activity. In addition, subjective evaluation by a self-report manner was examined to investigate the effectiveness of Reminiscence Therapy (RT) using odor in alcohol-dependent patients. Thirty-four patients who met the DSM-IV criteria of alcohol-related disorders and were hospitalized in a ward specialized to alcohol dependence therapy. Each patient underwent a one-to-one interview twice. For counterbalance, one interview was performed with odor stimulation using an odor with a relaxing effect that recall pleasant emotion, and the other was without odor stimulation. As the evaluation indices of physiological changes accompanied by emotional awakening, index of autonomic function (HRV; Heart rate variability) for objective evaluation and psychological indices (STAI; State-Trait Anxiety Inventory VAS; Visual Analog Scale) for subjective evaluation were measured. 1) Objective evaluation: Regarding the evaluation index of the autonomic function, the sympathetic nervous system activity (LF/HF; low frequency component/high frequency component ratio) was significantly inhibited by odor stimulation (p Subjective evaluation: Compared to the state prior to interview, state anxiety judged by STAI was significantly decreased after interview (p subjective evaluation, but the objective evaluation suggested that the odor inhibited the sympathetic nervous system. Thus, it was suggested that odor can be used in RT, that is, emotional changes due to stimulation of odor may be applicable in RT.
International Nuclear Information System (INIS)
Moselewski, Fabian; Ferencik, Maros; Achenbach, Stephan; Abbara, Suhny; Cury, Ricardo C.; Booth, Sarah L.; Jang, Ik-Kyung; Brady, Thomas J.; Hoffmann, Udo
2006-01-01
Introduction: The present study investigated the threshold-dependent variability of coronary artery calcification (CAC) measurements and the potential to quantify CAC in contrast-enhanced multi-detector row-computed tomography (MDCT). Methods: We compared the mean CT attenuation of CAC to luminal contrast enhancement of the coronary arteries in 30 patients undergoing standard coronary contrast-enhanced spiral MDCT. The modified Agatston score [AS], calcified plaque volume [CV], and mineral mass [MM] at four different thresholds (130, 200, 300, and 400 HU) were measured in 50 patients who underwent non-contrast-enhanced MDCT. Results: Mean CT attenuation of CAC was similar to the attenuation of the contrast-enhanced coronary lumen (CAC 297.1 ± 68.7 HU versus 295 ± 65 HU (p < 0.0001), respectively). Above a threshold of 300 HU, CAC measurements differed significantly from standard measurements obtained at a threshold of 130 HU (p < 0.0001). The threshold-dependent variation of MM measurements was significantly smaller than for AS and CV (130 HU versus 400 HU: 63, 75, and 81, respectively; p < 0.001). These differences resulted in a change of age and gender based percentile category for AS in 78% of subjects. Discussion: We demonstrated that CAC measurements are threshold dependent, with MM measurements having significantly less variation than AS or CV. Due to the similarity of mean CT attenuation of CAC and the contrast-enhanced coronary lumen, accurate quantification of CAC may be difficult in standard coronary contrast-enhanced spiral MDCT.
Energy Technology Data Exchange (ETDEWEB)
Moselewski, Fabian [Division of Cardiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Ferencik, Maros [Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Achenbach, Stephan [Division of Cardiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Department of Internal Medicine II (Cardiology), University of Erlangen (Germany); Abbara, Suhny [Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Cury, Ricardo C. [Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Booth, Sarah L. [Jean Mayer USDA Human Nutrition Research Center on Aging, 711 Washington St., Boston, MA 02114 (United States); Jang, Ik-Kyung [Division of Cardiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Brady, Thomas J. [Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Hoffmann, Udo [Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States)]. E-mail: uhoffman@partners.org
2006-03-15
Introduction: The present study investigated the threshold-dependent variability of coronary artery calcification (CAC) measurements and the potential to quantify CAC in contrast-enhanced multi-detector row-computed tomography (MDCT). Methods: We compared the mean CT attenuation of CAC to luminal contrast enhancement of the coronary arteries in 30 patients (n = 30) undergoing standard coronary contrast-enhanced spiral MDCT. The modified Agatston score [AS], calcified plaque volume [CV], and mineral mass [MM] at four different thresholds (130, 200, 300, and 400 HU) were measured in 50 patients who underwent non-contrast-enhanced MDCT. Results: Mean CT attenuation of CAC was similar to the attenuation of the contrast-enhanced coronary lumen (CAC 297.1 ± 68.7 HU versus 295 ± 65 HU (p < 0.0001), respectively). Above a threshold of 300 HU, CAC measurements differed significantly from standard measurements obtained at a threshold of 130 HU (p < 0.0001). The threshold-dependent variation of MM measurements was significantly smaller than for AS and CV (130 HU versus 400 HU: 63, 75, and 81, respectively; p < 0.001). These differences resulted in a change of age- and gender-based percentile category for AS in 78% of subjects. Discussion: We demonstrated that CAC measurements are threshold dependent, with MM measurements having significantly less variation than AS or CV. Due to the similarity of mean CT attenuation of CAC and the contrast-enhanced coronary lumen, accurate quantification of CAC may be difficult in standard coronary contrast-enhanced spiral MDCT.
Ordinal regression models to describe tourist satisfaction with Sintra's world heritage
Mouriño, Helena
2013-10-01
In tourism research, ordinal regression models are becoming a very powerful tool for modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering tourist survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focuses on two main points: tourists' perception of the entrance fees, and their overall level of satisfaction with this heritage site. To attain these goals, ordinal regression models were developed. We concluded that tourists' nationality was the only significant variable to describe the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge about Sintra's World Heritage status.
Targeting: Logistic Regression, Special Cases and Extensions
Directory of Open Access Journals (Sweden)
Helmut Schaeben
2014-12-01
Logistic regression is a classical linear model for logit-transformed conditional probabilities of a binary target variable. It recovers the true conditional probabilities if the joint distribution of predictors and the target is of log-linear form. Weights-of-evidence is an ordinary logistic regression with parameters equal to the differences of the weights of evidence if all predictor variables are discrete and conditionally independent given the target variable. The hypothesis of conditional independence can be tested in terms of log-linear models. If the assumption of conditional independence is violated, the application of weights-of-evidence corrupts not only the predicted conditional probabilities, but also their rank transform. Logistic regression models that include interaction terms can account for the lack of conditional independence; appropriate interaction terms compensate exactly for violations of conditional independence. Multilayer artificial neural nets may be seen as nested regression-like models with some sigmoidal activation function. Most often, the logistic function is used as the activation function. If the net topology, i.e., its control, is sufficiently versatile to mimic interaction terms, artificial neural nets are able to account for violations of conditional independence and yield very similar results. Weights-of-evidence cannot reasonably include interaction terms; subsequent modifications of the weights, as often suggested, cannot emulate the effect of interaction terms.
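The link between weights-of-evidence and logistic regression stated in this abstract can be made explicit with a short derivation. The notation below (binary predictors B_i, target T, weights W_i^+ and W_i^-) is assumed here for illustration and is not taken from the paper; the sketch covers only the binary-predictor case under conditional independence given T.

```latex
% Assumed notation: T \in \{0,1\} target, B_i \in \{0,1\} predictors.
\begin{align*}
W_i^{+} &= \ln\frac{P(B_i = 1 \mid T = 1)}{P(B_i = 1 \mid T = 0)}, \qquad
W_i^{-} = \ln\frac{P(B_i = 0 \mid T = 1)}{P(B_i = 0 \mid T = 0)} \\[4pt]
% Bayes' rule plus conditional independence of the B_i given T:
\operatorname{logit} P(T = 1 \mid B_1,\dots,B_m)
  &= \operatorname{logit} P(T = 1)
   + \sum_{i=1}^{m} \ln\frac{P(B_i \mid T = 1)}{P(B_i \mid T = 0)} \\
  &= \underbrace{\operatorname{logit} P(T = 1) + \sum_{i=1}^{m} W_i^{-}}_{\beta_0}
   + \sum_{i=1}^{m} \underbrace{\left(W_i^{+} - W_i^{-}\right)}_{\beta_i} B_i
\end{align*}
```

Under these assumptions the logistic regression slopes are exactly the differences of the weights of evidence; once conditional independence fails, the second line no longer holds and interaction terms are needed.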
Directory of Open Access Journals (Sweden)
Juliana Petrini
2012-12-01
The objective of this work was to assess the degree of multicollinearity and to identify the variables involved in linear dependence relations in additive-dominant models. Data of birth weight (n=141,567), yearling weight (n=58,124), and scrotal circumference (n=20,371) of Montana Tropical composite cattle were used. Diagnosis of multicollinearity was based on the variance inflation factor (VIF) and on the evaluation of the condition indexes and eigenvalues from the correlation matrix among explanatory variables. The first model studied (RM) included the fixed effect of dam age class at calving and the covariates associated with the direct and maternal additive and non-additive effects. The second model (R) included all the effects of the RM model except the maternal additive effects. Multicollinearity was detected in both models for all traits considered, with VIF values of 1.03 - 70.20 for RM and 1.03 - 60.70 for R. Collinearity increased with the increase of variables in the model and the decrease in the number of observations, and it was classified as weak, with condition index values between 10.00 and 26.77. In general, the variables associated with additive and non-additive effects were involved in multicollinearity, partially due to the natural connection between these covariables as fractions of the biological types in breed composition.
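As an illustration of the VIF diagnostic mentioned in this abstract, the following minimal sketch computes variance inflation factors as the diagonal of the inverse correlation matrix of the predictors. The function names and the toy data are hypothetical and have nothing to do with the cattle data set; the small Gauss-Jordan inverse is only meant for a handful of predictors.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical predictors: x2 is nearly collinear with x1, x3 is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2]
x3 = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]
print(vif([x1, x2, x3]))
```

For two predictors this reduces to 1 / (1 - r^2), where r is their correlation, which is a convenient sanity check.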
Polynomial regression analysis and significance test of the regression function
International Nuclear Information System (INIS)
Gao Zhengming; Zhao Juan; He Shengping
2012-01-01
To analyze the decay heating power per kilogram of a certain radioactive isotope by polynomial regression, the paper first demonstrates the broad applicability of polynomial functions and derives their parameters by ordinary least squares estimation. A significance test for the polynomial regression function is then derived from the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and the significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram, in accord with the authors' real work. (authors)
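A minimal sketch of the two steps named in this abstract: an ordinary least squares polynomial fit through the normal equations, followed by the overall F statistic used for multiple linear regression. The helper names and the data are hypothetical, and the paper's own derivation may differ in detail.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def polyfit_ols(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ..., b_degree]."""
    k = degree + 1
    # Normal equations (X'X) b = X'y for the design matrix X[i][j] = xs[i] ** j.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def f_statistic(xs, ys, coeffs):
    """Overall F statistic for H0: all non-constant coefficients are zero."""
    n, p = len(xs), len(coeffs) - 1
    fitted = [sum(c * x ** j for j, c in enumerate(coeffs)) for x in xs]
    mean_y = sum(ys) / n
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return (ss_reg / p) / (ss_res / (n - p - 1))

# Hypothetical noisy measurements of a roughly quadratic trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 5.9, 17.2, 33.8, 57.1, 85.9]
coeffs = polyfit_ols(xs, ys, 2)
print(coeffs, f_statistic(xs, ys, coeffs))
```

The F statistic would be compared with an F distribution on (p, n - p - 1) degrees of freedom. For high degrees or widely spread x values, the normal equations become ill-conditioned and an orthogonal-polynomial or QR-based fit is usually preferred.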
Regression Discontinuity Designs Based on Population Thresholds
DEFF Research Database (Denmark)
Eggers, Andrew C.; Freier, Ronny; Grembi, Veronica
In many countries, important features of municipal government (such as the electoral system, mayors' salaries, and the number of councillors) depend on whether the municipality is above or below arbitrary population thresholds. Several papers have used a regression discontinuity design (RDD...
Panel data specifications in nonparametric kernel regression
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
Tutorial on Using Regression Models with Count Outcomes Using R
Directory of Open Access Journals (Sweden)
A. Alexander Beaujean
2016-02-01
Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers who study these variables use typical regression methods (i.e., ordinary least-squares), either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used to run the example analyses is included in the Appendix.
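To make the simplest of the models listed in this abstract concrete, here is a minimal Poisson regression sketch with a log link and a single predictor, fitted by Newton-Raphson. It is written in Python rather than the R syntax the tutorial supplies, and the function name and the toy counts are hypothetical; it does not cover the negative binomial or zero-inflated variants.

```python
import math

def fit_poisson(xs, ys, iterations=25):
    """Newton-Raphson fit of log(mu) = b0 + b1 * x for count outcomes ys."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector (gradient of the Poisson log-likelihood).
        u0 = sum(y - m for y, m in zip(ys, mu))
        u1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mu))
        # Entries of the 2x2 Fisher information matrix.
        i00 = sum(mu)
        i01 = sum(x * m for x, m in zip(xs, mu))
        i11 = sum(x * x * m for x, m in zip(xs, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta += information^{-1} * score.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical data: x = 1 marks one group of students, y = classes skipped.
xs = [0, 0, 0, 1, 1, 1]
ys = [1, 2, 3, 4, 6, 8]
b0, b1 = fit_poisson(xs, ys)
print(math.exp(b0), math.exp(b1))  # baseline rate and rate ratio
```

With a single binary predictor the fitted rates equal the two group means, so exp(b1) is the ratio of mean counts between the groups, which is the usual rate-ratio interpretation of a Poisson coefficient.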
Recursive Algorithm For Linear Regression
Varanasi, S. V.
1988-01-01
Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations and facilitates search for minimum order of linear-regression model that fits set of data satisfactorily.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form except that it is monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
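The pool-adjacent-violators algorithm referred to in this abstract can be sketched in a few lines. This is the textbook complete-data version for a non-decreasing fit, with an optional weight per observation; the function name is hypothetical, and the empirical-likelihood weighting for missing data that the paper proposes is not reproduced here.

```python
def pava(values, weights=None):
    """Weighted least-squares non-decreasing fit via pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Responses ordered by the covariate; violating neighbours get pooled.
print(pava([1, 3, 2, 4, 3, 5]))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

A non-increasing fit can be obtained by negating the responses, fitting, and negating the result.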
Optimized support vector regression for drilling rate of penetration estimation
Bodaghi, Asadollah; Ansari, Hamid Reza; Gholami, Mahsa
2015-12-01
In the petroleum industry, drilling optimization involves the selection of operating conditions for achieving the desired depth with the minimum expenditure while requirements of personal safety, environment protection, adequate information of penetrated formations and productivity are fulfilled. Since drilling optimization is highly dependent on the rate of penetration (ROP), estimation of this parameter is of great importance during well planning. In this research, a novel approach called `optimized support vector regression' is employed for making a formulation between input variables and ROP. Algorithms used for optimizing the support vector regression are the genetic algorithm (GA) and the cuckoo search algorithm (CS). Optimization implementation improved the support vector regression performance by virtue of selecting proper values for its parameters. In order to evaluate the ability of optimization algorithms in enhancing SVR performance, their results were compared to the hybrid of pattern search and grid search (HPG) which is conventionally employed for optimizing SVR. The results demonstrated that the CS algorithm achieved further improvement on prediction accuracy of SVR compared to the GA and HPG as well. Moreover, the predictive model derived from back propagation neural network (BPNN), which is the traditional approach for estimating ROP, is selected for comparisons with CSSVR. The comparative results revealed the superiority of CSSVR. This study inferred that CSSVR is a viable option for precise estimation of ROP.
Denli, H. H.; Koc, Z.
2015-12-01
Valuation of real properties according to standards is difficult to apply across time and location. Regression analysis constructs mathematical models that describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. When regression analysis is applied to real estate valuation, using properties presented in the real marketing process with their current characteristics and quantifiers, the method helps to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors of the building, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Locations of equal price have been found and plotted on the map, and equivalence curves have been drawn identifying the same valued zones as lines.
Regularized Label Relaxation Linear Regression.
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu
2018-04-01
Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper, we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR, which provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs the class compactness graph based on manifold learning and uses it as the regularization item to avoid the problem of overfitting. The class compactness graph is used to ensure that the samples sharing the same labels can be kept close after they are transformed. Two different algorithms, which are, respectively, based on -norm and -norm loss functions are devised. These two algorithms have compact closed-form solutions in each iteration so that they are easily implemented. Extensive experiments show that these two algorithms outperform the state-of-the-art algorithms in terms of the classification accuracy and running time.
Sparse reduced-rank regression with covariance estimation
Chen, Lisha
2014-12-08
Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
Sparse reduced-rank regression with covariance estimation
Chen, Lisha; Huang, Jianhua Z.
2014-01-01
Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
A gentle introduction to quantile regression for ecologists
Cade, B.S.; Noon, B.R.
2003-01-01
Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, not all the factors that affect ecological processes are measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
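The estimating principle behind quantile regression can be illustrated in its simplest, intercept-only form: the value that minimizes the total check (pinball) loss is a sample quantile. The sketch below uses hypothetical function names and data; a full quantile regression minimizes the same loss over the coefficients of a linear model, typically by linear programming, which is not shown.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for quantile level tau, with 0 < tau < 1."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(ys, tau):
    """Observed value minimizing total check loss, i.e. a tau-th sample quantile."""
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))

# Hypothetical right-skewed responses.
ys = [2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
print(best_constant(ys, 0.5))  # the median
print(best_constant(ys, 0.9))  # an upper quantile
```

Because the loss weights positive and negative residuals asymmetrically, changing tau moves the fit toward different parts of the response distribution, which is why upper or lower quantiles can show relationships that the mean does not.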
Directory of Open Access Journals (Sweden)
Filip Kokotovic
2016-06-01
The study of the relevance of human capital to economic growth is becoming increasingly important, taking into account its relevance in many of the Sustainable Development Goals proposed by the UN. This paper conducted a panel regression analysis of selected SE European countries and Scandinavian countries using the Granger causality test and pooled panel regression. In order to test the relevance of human capital to economic growth, several human capital proxy variables were identified. Aside from the human capital proxy variables, other explanatory variables were selected using stepwise regression, while the dependent variable was GDP. This paper concludes that there are significant structural differences in the economies of the two observed panels. Of the human capital proxy variables observed, for the panel of SE European countries only life expectancy was statistically significant and it had a negative impact on economic growth, while in the panel of Scandinavian countries total public expenditure on education had a statistically significant positive effect on economic growth. Based upon these results and existing studies, this paper concludes that human capital has a far more significant impact on economic growth in more developed economies.
Discriminative Elastic-Net Regularized Linear Regression.
Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen
2017-03-01
In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to make the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html.
Directory of Open Access Journals (Sweden)
Huixin Tian
2016-01-01
Unlike most research, which focuses on the single-objective hybrid flowshop scheduling (HFS) problem, this paper investigates a biobjective HFS problem with sequence-dependent setup time. The two objectives are the minimization of total weighted tardiness and of the total setup time. To efficiently solve this problem, a Pareto-based adaptive biobjective variable neighborhood search (PABOVNS) is developed. In the proposed PABOVNS, a solution is denoted as a sequence of all jobs and a decoding procedure is presented to obtain the corresponding complete schedule. In addition, the proposed PABOVNS has three major features that can guarantee a good balance of exploration and exploitation. First, an adaptive selection strategy of neighborhoods is proposed to automatically select the most promising neighborhood instead of the sequential selection strategy of canonical VNS. Second, a two-phase multiobjective local search based on neighborhood search and path relinking is designed for each selected neighborhood. Third, an external archive with diversity maintenance is adopted to store the nondominated solutions and at the same time provide initial solutions for the local search. Computational results based on randomly generated instances show that the PABOVNS is efficient and even superior to some other powerful multiobjective algorithms in the literature.
International Nuclear Information System (INIS)
Asvany, O.; Savic, I.; Schlemmer, S.; Gerlich, D.
2004-01-01
Reactions of methane cations, CH₄⁺, with H₂, HD and D₂ have been studied in a variable temperature 22-pole ion trap from room temperature down to 15 K. The formation of CH₅⁺ in collisions with H₂ is slow at 300 K, but it becomes faster by at least one order of magnitude when the temperature is lowered to 15 K. This behavior is tentatively explained with a longer complex lifetime at low temperatures. However, since tunneling is most probably not responsible for product formation, other dynamical or statistical restrictions must be responsible for the negative temperature dependence. In collisions of CH₄⁺ with HD, the CH₅⁺ product ion (68% at 15 K) prevails over CH₄D⁺ (32%). Reaction of CH₄⁺ with D₂ is found to be much slower than with H₂ or HD. The rate coefficient for converting CH₄⁺ into CH₃D⁺ by H-D exchange has been determined to be smaller than 10⁻¹² cm³/s, indicating that scrambling in the CH₆⁺ complex is very unlikely.
Energy Technology Data Exchange (ETDEWEB)
Asvany, O.; Savic, I.; Schlemmer, S.; Gerlich, D
2004-03-08
Reactions of methane cations, CH₄⁺, with H₂, HD and D₂ have been studied in a variable temperature 22-pole ion trap from room temperature down to 15 K. The formation of CH₅⁺ in collisions with H₂ is slow at 300 K, but it becomes faster by at least one order of magnitude when the temperature is lowered to 15 K. This behavior is tentatively explained with a longer complex lifetime at low temperatures. However, since tunneling is most probably not responsible for product formation, other dynamical or statistical restrictions must be responsible for the negative temperature dependence. In collisions of CH₄⁺ with HD, the CH₅⁺ product ion (68% at 15 K) prevails over CH₄D⁺ (32%). Reaction of CH₄⁺ with D₂ is found to be much slower than with H₂ or HD. The rate coefficient for converting CH₄⁺ into CH₃D⁺ by H-D exchange has been determined to be smaller than 10⁻¹² cm³/s, indicating that scrambling in the CH₆⁺ complex is very unlikely.
Bialuk, Izabela; Whitney, Stephen; Andresen, Vibeke; Florese, Ruth H; Nacsa, Janos; Cecchinato, Valentina; Valeri, Valerio W; Heraud, Jean-Michel; Gordon, Shari; Parks, Robyn Washington; Montefiori, David C; Venzon, David; Demberg, Thorsten; Guroff, Marjorie Robert-; Landucci, Gary; Forthal, Donald N; Franchini, Genoveffa
2011-12-09
The role of antibodies directed against the hypervariable envelope region V1 of human immunodeficiency virus type 1 (HIV-1) has not been thoroughly studied. We show that a vaccine able to elicit strain-specific non-neutralizing antibodies to this region of gp120 is associated with control of highly pathogenic chimeric SHIV(89.6P) replication in rhesus macaques. The vaccinated animal that had the highest titers of antibodies to the amino terminus portion of V1 prior to challenge had secondary antibody responses that mediated cell killing by antibody-dependent cellular cytotoxicity (ADCC) as early as 2 weeks after infection and inhibited viral replication by antibody-dependent cell-mediated virus inhibition (ADCVI) by 4 weeks after infection. There was a significant inverse correlation between virus level and binding antibody titers to the envelope protein (R=-0.83, p=0.015) and ADCVI (R=-0.84, p=0.044). Genotyping of plasma virus demonstrated in vivo selection of three SHIV(89.6P) variants with changes in potential N-linked glycosylation sites in V1. We found a significant inverse correlation between virus levels and titers of antibodies that mediated ADCVI against all the identified V1 virus variants. A significant inverse correlation was also found between neutralizing antibody titers to SHIV(89.6) and virus levels (R=-0.72, p=0.0050). However, passive inoculation of purified immunoglobulin from animal M316, the macaque that best controlled virus, to a naïve macaque resulted in low serum neutralizing antibody levels and low ADCVI activity that failed to protect from SHIV(89.6P) challenge. Collectively, while our data suggest that anti-envelope antibodies with neutralizing and non-neutralizing FcγR-dependent activities may be important in the control of SHIV replication, they also demonstrate that low levels of these antibodies alone are not sufficient to protect from infection. Published by Elsevier Ltd.
On a Robust MaxEnt Process Regression Model with Sample-Selection
Directory of Open Access Journals (Sweden)
Hea-Jung Kim
2018-04-01
In a regression analysis, a sample-selection bias arises when a dependent variable is partially observed as a result of the sample selection. This study introduces a Maximum Entropy (MaxEnt) process regression model that assumes a MaxEnt prior distribution for its nonparametric regression function and finds that the MaxEnt process regression model includes the well-known Gaussian process regression (GPR) model as a special case. Then, this special MaxEnt process regression model, i.e., the GPR model, is generalized to obtain a robust sample-selection Gaussian process regression (RSGPR) model that deals with non-normal data in the sample selection. Various properties of the RSGPR model are established, including the stochastic representation, distributional hierarchy, and magnitude of the sample-selection bias. These properties are used in the paper to develop a hierarchical Bayesian methodology to estimate the model. This involves a simple and computationally feasible Markov chain Monte Carlo algorithm that avoids analytical or numerical derivatives of the log-likelihood function of the model. The performance of the RSGPR model in terms of sample-selection bias correction, robustness to non-normality, and prediction is demonstrated through simulation results that attest to its good finite-sample performance.
Combining Alphas via Bounded Regression
Directory of Open Access Journals (Sweden)
Zura Kakushadze
2015-11-01
We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, typically, there is insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.
Regression in autistic spectrum disorders.
Stefanatos, Gerry A
2008-12-01
A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously acquired skills. This may involve a loss of speech or social responsiveness, but often entails both. This paper critically reviews the phenomenon of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.
Estimating integrated variance in the presence of microstructure noise using linear regression
Holý, Vladimír
2017-07-01
Using financial high-frequency data to estimate the integrated variance of asset prices is beneficial, but as the number of observations increases, so-called microstructure noise becomes apparent. This noise can significantly bias the realized variance estimator. We propose a method for estimating the integrated variance that is robust to microstructure noise, as well as a test for the presence of the noise. Our method utilizes linear regression in which realized variances estimated from different data subsamples act as the dependent variable while the number of observations acts as the explanatory variable. We compare the proposed estimator with other methods on simulated data for several microstructure noise structures.
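The reason a regression of realized variance on the number of observations can separate the integrated variance from the noise can be sketched under the standard assumption of independent, identically distributed additive noise. The symbols below (efficient log-price X, noise variance \omega^2, subsample sizes n_k) are assumed notation for illustration; the paper's exact specification, in particular for other noise structures, may differ.

```latex
% Assumed setup: observed log-price Y_{t_i} = X_{t_i} + \varepsilon_{t_i},
% with \varepsilon i.i.d., mean zero, variance \omega^2, independent of X.
\begin{align*}
RV_n &= \sum_{i=1}^{n} \left(Y_{t_i} - Y_{t_{i-1}}\right)^2, \qquad
\mathbb{E}\left[RV_n\right] \approx IV + 2 n \omega^2 \\[4pt]
% Regression over subsamples with n_1, \dots, n_K observations:
RV_{n_k} &= \alpha + \beta\, n_k + e_k, \qquad k = 1, \dots, K \\
\widehat{IV} &= \hat{\alpha}, \qquad \hat{\omega}^2 = \hat{\beta} / 2
\end{align*}
```

Under these assumptions the intercept estimates the integrated variance, and a test of \beta = 0 serves as a test for the presence of noise.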
LENUS (Irish Health Repository)
Hooi, Paul
2013-01-01
To determine how the variability in biaxial flexure strength of a soda-lime glass analogue for a PLV and DBC material was influenced by precementation operative variables and following resin-cement coating.
Testing homogeneity in Weibull-regression models.
Bolfarine, Heleno; Valença, Dione M
2005-10-01
In survival studies with families or geographical units, it may be of interest to test whether such groups are homogeneous for given explanatory variables. In this paper we consider score-type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that, conditioned on the random effect, have an accelerated failure time representation. The test statistic requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model, and in the uncensored situation a closed form is obtained for the test statistic. A simulation study is used to compare the power of the tests. The proposed tests are applied to real data sets with censored data.
Selecting a Regression Saturated by Indicators
DEFF Research Database (Denmark)
Hendry, David F.; Johansen, Søren; Santos, Carlos
We consider selecting a regression model, using a variant of Gets, when there are more variables than observations, in the special case that the variables are impulse dummies (indicators) for every observation. We show that the setting is unproblematic if tackled appropriately, and obtain the finite-sample distribution of estimators of the mean and variance in a simple location-scale model under the null that no impulses matter. A Monte Carlo simulation confirms the null distribution, and shows power against an alternative of interest.
Variable selection in multiple linear regression: The influence of ...
African Journals Online (AJOL)
provide an indication of whether the fit of the selected model improves or ... and calculate M(−i); quantify the influence of case i in terms of a function, f(•), of M and ..... [21] Venter JH & Snyman JLJ, 1997, Linear model selection based on risk ...
Assessing risk factors for periodontitis using regression
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable for assessing the associations and interactions between different factors and the risk of periodontitis. Regression analysis, among other techniques, is widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking Status and Plaque Index. The multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL); it yields the regression coefficients along with the respective p-values from the significance tests. In the logistic model, a case (individual) was classified by the extent of the destruction of periodontal tissues, defined as an Attachment Loss greater than or equal to 4 mm in 25% of sites surveyed (AL≥4mm/≥25%). The association measures include the odds ratios together with the corresponding 95% confidence intervals.
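As an illustration of the association measures mentioned above, the following sketch computes an odds ratio and its Wald 95% confidence interval from a 2×2 table. For a logistic model with a single binary predictor, the exponentiated coefficient equals this odds ratio. The counts and the function name are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    z=1.96 gives an approximate 95% interval."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: smokers vs non-smokers, case vs non-case.
estimate, lower, upper = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to a significant association at roughly the 5% level under the Wald approximation.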
Determination of regression laws: Linear and nonlinear
International Nuclear Information System (INIS)
Onishchenko, A.M.
1994-01-01
A detailed mathematical determination of regression laws is presented in the article. Particular emphasis is placed on determining the laws of X_j on X_l to account for source nuclei decay and detector errors in nuclear physics instrumentation. Both linear and nonlinear relations are presented. Linearization of 19 functions is tabulated, including graph, relation, variable substitution, obtained linear function, and remarks. 6 refs., 1 tab
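A short sketch of what linearization by variable substitution looks like in practice. The exponential law y = a·exp(b·x) is used here as a common textbook case (an assumption: the abstract does not say which 19 functions are tabulated). Substituting z = ln y gives the linear function z = ln a + b·x, which can be fitted by ordinary least squares.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on the linearized form
    ln(y) = ln(a) + b * x. Requires all y > 0."""
    zs = [math.log(y) for y in ys]  # variable substitution z = ln y
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    b = sxz / sxx
    a = math.exp(mean_z - b * mean_x)  # undo the substitution for the intercept
    return a, b


# Noise-free illustrative data generated from a = 2, b = 0.5.
xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
print(a_hat, b_hat)
```

Note that least squares on the transformed scale weights errors differently from least squares on the original scale, which is one reason such tables usually carry remarks alongside each substitution.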
Logistic regression applied to natural hazards: rare event logistic regression with replications
Guns, M.; Vanacker, Veerle
2012-01-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logisti...
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
For many years Malaysia has faced drug addiction issues. The most serious is the relapse phenomenon among treated drug addicts (those who have undergone the rehabilitation programme at the Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factors that contribute to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the status of the drug addict, either relapse (Yes, coded as 1) or not (No, coded as 0). The predictors are age, age at first drug use, family history, education level, family crisis, community support and self-motivation. The sample comprises 200 individuals, with data provided by AADK (National Antidrug Agency). The study found that age and self-motivation are statistically significant predictors of relapse.
Detecting overdispersion in count data: A zero-inflated Poisson regression analysis
Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah
2017-09-01
This study focuses on analysing count data of butterfly communities in Jasin, Melaka. For a count dependent variable, the Poisson regression model is known as the benchmark model for regression analysis. Continuing from previous literature that used Poisson regression analysis, this study uses zero-inflated Poisson (ZIP) regression analysis to gain greater precision in analysing the count data of butterfly communities in Jasin, Melaka. When excess zeros are present, Poisson regression should be abandoned in favour of count data models that are capable of taking the extra zeros into account explicitly; one of the most popular such models is the ZIP regression model. The butterfly community data, referred to as the number of subjects in this study, were collected in Jasin, Melaka and consist of 131 subject visits. The data set comprises five families of butterfly, which represent the five variables (types of subjects) involved in the analysis. The ZIP analysis used the SAS procedure for overdispersion in analysing zero values, and the main purpose of continuing the previous study is to compare which model performs better when zero values exist in the observed count data. The analysis used AIC, BIC and the Vuong test at the 5% significance level to achieve the objectives. The findings indicate that overdispersion is present in analysing zero values. The ZIP regression model is better than the Poisson regression model when zero values exist.
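To make the Poisson versus ZIP comparison concrete, here is a minimal sketch without covariates. It uses the standard ZIP mass function, P(Y=0) = π + (1−π)·exp(−λ) and P(Y=k) = (1−π)·Poisson(k; λ) for k > 0, and compares the two models by AIC. The counts are hypothetical, and the crude grid search stands in for a proper maximum likelihood routine such as the SAS procedure used in the study.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: extra zeros occur with probability pi."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def zip_log_likelihood(counts, lam, pi):
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik


# Hypothetical counts with many zeros (not the butterfly data).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 3, 0, 0, 1, 4, 0]

# Poisson: the maximum likelihood estimate of lambda is the sample mean.
lam_pois = sum(counts) / len(counts)
ll_pois = zip_log_likelihood(counts, lam_pois, 0.0)

# ZIP: coarse grid search over (lambda, pi), for illustration only.
grid = [(l / 20, p / 20) for l in range(1, 101) for p in range(0, 20)]
lam_zip, pi_zip = max(grid, key=lambda lp: zip_log_likelihood(counts, *lp))
ll_zip = zip_log_likelihood(counts, lam_zip, pi_zip)

print("Poisson AIC:", aic(ll_pois, 1))
print("ZIP AIC:    ", aic(ll_zip, 2), "lambda =", lam_zip, "pi =", pi_zip)
```

The model with the lower AIC is preferred; the ZIP model pays a penalty for its extra parameter, so it wins only when the excess zeros are pronounced enough.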
LOGISTIC REGRESSION AS A TOOL FOR DETERMINATION OF THE PROBABILITY OF DEFAULT FOR ENTERPRISES
Directory of Open Access Journals (Sweden)
Erika SPUCHLAKOVA
2017-12-01
In a rapidly changing world it is necessary to adapt to new conditions, and approaches can vary from day to day. For the proper management of a company it is essential to know its financial situation. A company's financial health can be assessed by financial analysis, which provides a number of methods for evaluating it. Analysis indicators are often included in company assessments, for example when obtaining bank loans and other financial resources to ensure the functioning of the company. As a company focuses on the future and its planning, it is essential to forecast its future financial situation. Based on the results of the financial health prediction, the company decides on the extension or limitation of its business. How the information obtained from financial analysis is used in practice depends mainly on the capabilities of the company's management. The findings of logistic regression methods were first published in the 1960s, as an alternative to the least squares method. The essence of logistic regression is to determine the relationship between the explained (dependent) variable and the explanatory (independent) variables. The basic principle of this statistical method is based on regression analysis, but unlike linear regression, it can predict the probability of whether a phenomenon has occurred or not. The aim of this paper is to determine the probability of bankruptcy of enterprises.
Time-adaptive quantile regression
DEFF Research Database (Denmark)
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
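The objective behind the linear optimization formulation can be sketched briefly. Quantile regression minimizes the check (pinball) loss, which is piecewise linear in the residuals and therefore expressible as a linear program. The example below is not the paper's simplex-based, time-adaptive algorithm: it only shows the loss, and that for a constant-only model its minimizer is a sample quantile. Function names are hypothetical.

```python
def pinball_loss(ys, q, tau):
    """Total check loss of predicting the constant q for the tau-quantile."""
    total = 0.0
    for y in ys:
        u = y - q  # residual
        # Positive residuals are weighted by tau, negative ones by (1 - tau).
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total


def best_constant_quantile(ys, tau):
    """Minimize the pinball loss over the observed values.

    For a constant-only model a minimizer can always be found among
    the data points, so this brute-force search is sufficient."""
    return min(sorted(set(ys)), key=lambda q: pinball_loss(ys, q, tau))


data = list(range(1, 11))  # 1, 2, ..., 10
print(best_constant_quantile(data, tau=0.75))
print(pinball_loss(data, 8, tau=0.75))
```

Replacing the constant q with a linear (or spline) function of covariates gives the regression version of the same problem.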
DEFF Research Database (Denmark)
Krogsgaard, K; Christensen, E; Gluud, C
1987-01-01
In 46 alcoholic patients the association of wedged-to-free hepatic-vein pressure with other variables (clinical, histologic, hemodynamic, and liver function data) was studied by means of multiple regression analysis, taking the wedged-to-free hepatic-vein pressure as the dependent variable. Four ...
Retro-regression--another important multivariate regression improvement.
Randić, M
2001-01-01
We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.
International Nuclear Information System (INIS)
Janssen, I.; Stebbings, J.H.
1990-01-01
In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs
Quantile regression theory and applications
Davino, Cristina; Vistocco, Domenico
2013-01-01
A guide to the implementation and interpretation of Quantile Regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model and diagnostic tools. Each methodological aspect is explored and
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
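For the standard linear regression case, which the abstract describes as easily identified, the equivalence can be checked numerically. The sketch below assumes a known error standard deviation and a normal (z-test) approximation, and compares the power for testing a slope β with n observations against a two-sample test with n/2 observations per group and a mean difference of 2·β·SD(x). Function names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_sided_z(noncentrality, alpha=0.05):
    """Power of a two-sided z-test with the given noncentrality."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - _STD_NORMAL.cdf(z_crit - noncentrality)
    lower = _STD_NORMAL.cdf(-z_crit - noncentrality)
    return upper + lower


def power_regression_slope(beta, sd_x, sd_error, n, alpha=0.05):
    # Var(beta_hat) is approximately sd_error^2 / (n * sd_x^2).
    return power_two_sided_z(beta * sd_x * math.sqrt(n) / sd_error, alpha)


def power_two_sample(delta, sd_error, n_per_group, alpha=0.05):
    # Var(difference of means) = 2 * sd_error^2 / n_per_group.
    return power_two_sided_z(delta / (sd_error * math.sqrt(2.0 / n_per_group)), alpha)


beta, sd_x, sd_error, n = 0.3, 1.0, 1.0, 100
p_reg = power_regression_slope(beta, sd_x, sd_error, n)
p_two = power_two_sample(2 * beta * sd_x, sd_error, n // 2)
print(p_reg, p_two)  # the two powers agree under these assumptions
```

For logistic and Cox models the abstract adds the requirement that the overall expected number of events is unchanged, which this linear sketch does not cover.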
Distributed Monitoring of the R2 Statistic for Linear Regression
National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...
INVESTIGATION OF E-MAIL TRAFFIC BY USING ZERO-INFLATED REGRESSION MODELS
Directory of Open Access Journals (Sweden)
Yılmaz KAYA
2012-06-01
Count data may contain more zero values than anticipated. Such data sets should be analyzed with regression methods that take the zero values into account. Zero-Inflated Poisson (ZIP), Zero-Inflated negative binomial (ZINB), Poisson Hurdle (PH), and negative binomial Hurdle (NBH) are the more common approaches for modeling dependent variables with more zero values than expected. In the present study, the e-mail traffic of Yüzüncü Yıl University in the 2009 spring semester was investigated. ZIP, ZINB, PH and NBH regression methods were applied to the data set because it contained more zero counts (78.9%) than expected. ZINB and NBH regression, which account for zero inflation and overdispersion, were found to give more accurate results because both were present in the e-mail sending data. ZINB was determined to be the best model according to Vuong statistics and information criteria.
General Nature of Multicollinearity in Multiple Regression Analysis.
Liu, Richard
1981-01-01
Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
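One common way to quantify how strongly predictors are correlated is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where VIF = 1 / (1 − r²) with r the Pearson correlation between the predictors. It is offered as an illustration of the problem, not as one of the solutions the article discusses; the data and function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Uncorrelated predictors: no inflation of coefficient variance.
print(vif_two_predictors([-1, 1, -1, 1], [-1, -1, 1, 1]))

# Nearly collinear predictors: coefficient variances are strongly inflated.
print(vif_two_predictors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1]))
```

A frequently quoted rule of thumb treats VIF values above about 10 as a sign of problematic multicollinearity, although the threshold is a convention rather than a strict rule.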
Moderation analysis using a two-level regression model.
Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott
2014-10-01
Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
ANYOLS, Least Square Fit by Stepwise Regression
International Nuclear Information System (INIS)
Atwoods, C.L.; Mathews, S.
1986-01-01
Description of program or function: ANYOLS is a stepwise program which fits data using ordinary or weighted least squares. Variables are selected for the model in a stepwise way based on a user-specified input criterion or a user-written subroutine. The order in which variables are entered can be influenced by user-defined forcing priorities. Instead of stepwise selection, ANYOLS can try all possible combinations of any desired subset of the variables. Automatic output for the final model in a stepwise search includes plots of the residuals, 'studentized' residuals, and leverages; if the model is not too large, the output also includes partial regression and partial leverage plots. A data set may be re-used so that several selection criteria can be tried. Flexibility is increased by allowing the substitution of user-written subroutines for several default subroutines.
Panel Smooth Transition Regression Models
DEFF Research Database (Denmark)
González, Andrés; Terasvirta, Timo; Dijk, Dick van
We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100) ...
Testing discontinuities in nonparametric regression
Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun
2017-01-01
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100) ...
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Stochastic search, optimization and regression with energy applications
Hannah, Lauren A.
Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one-stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage problem. The one-stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values, which depend on the selected portfolio, to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Tumor regression patterns in retinoblastoma
International Nuclear Information System (INIS)
Zafar, S.N.; Siddique, S.N.; Zaheer, N.
2016-01-01
To observe the types of tumor regression after treatment, and identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, dilated fundus examination under anesthesia was performed to record the response to treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Out of 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 tumors belonged to ICRB group C, and the remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of the focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, varies with the ICRB grouping of the tumor. (author)
The Initial Regression Statistical Characteristics of Intervals Between Zeros of Random Processes
Directory of Open Access Journals (Sweden)
V. K. Hohlov
2014-01-01
The article substantiates the initial regression statistical characteristics of intervals between zeros of realizations of random processes and studies their properties, which allow the use of these features in autonomous information systems (AIS) of near location (NL). Coefficients of the initial regression (CIR) that minimize the residual sum of squares of multiple initial regression are justified on the basis of vector representations associated with the notion of a random vector of analyzed signal parameters. It is shown that even with no covariance-based partial CIR it is possible to predict one random variable through another with respect to the deterministic components. The paper studies the dependence of the CIR of interval sizes between zeros of a narrowband wide-sense stationary random process on its energy spectrum. Partial CIRs for random processes with Gaussian and rectangular energy spectra are obtained. It is shown that the considered CIRs do not depend on the average frequency of the spectra, are determined by the relative bandwidth of the energy spectra, and depend weakly on the type of spectrum. These CIR properties enable its use as an informative parameter when implementing time-domain regression methods of signal processing that are invariant to the average rate and variance of the input realizations. We consider estimates of the average energy spectrum frequency of a stationary random process obtained by calculating the length of the time interval corresponding to a specified number of intervals between zeros. It is shown that the relative variance of the estimate of the average energy spectrum frequency of a stationary random process with increasing relative bandwidth ceases to depend on the last process realization when more than ten intervals between zeros are processed. The obtained results can be used in the AIS NL to solve the tasks of detection and signal recognition, when a decision is made in conditions of unknown mathematical expectations on a limited observation
Regression to Causality : Regression-style presentation influences causal attribution
DEFF Research Database (Denmark)
Bordacconi, Mats Joe; Larsen, Martin Vinæs
2014-01-01
... models (one of the primary vehicles for analyzing statistical results in political science) encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression ...
Directory of Open Access Journals (Sweden)
Xiaowei Ma
2018-01-01
Background. Glucose-dependent insulinotropic polypeptide (GIP) is closely related to diabetes and obesity, both of which are confirmed to increase the risk of coronary artery disease (CAD). Our study aimed to investigate whether the polymorphisms in GIP genes could affect the risk of cardiovascular disease in type 2 diabetic patients in the Chinese Han population. Methods. We selected and genotyped two haplotype-tagging single nucleotide polymorphisms (tag-SNPs) (rs2291725 C>T, rs8078510 G>A) of the GIP gene based on CHB data in the HapMap Phase II database (r² < 0.8). The case-control study of the Chinese Han population involved 390 diabetic patients with CAD as the positive group and 276 diabetic patients without CAD as the control group. Allele and genotype frequencies were compared between the two groups. Results. In the dominant inheritance model, the carriers of T/T or T/C had a lower risk of CAD (OR = 0.635, 95% CI = 0.463–0.872, p=0.005), even after adjustment for other CAD risk factors (gender, age, BMI, smoking status, dyslipidemia, hypertension history, and diabetic duration) (OR′ = 0.769, 95% CI′ = 0.626–0.945, p′=0.013). The allele A at rs8078510 was associated with decreased risk of CAD (OR = 0.732, p=0.039, p=0.018). In subgroup analysis, individuals with higher BMI (≥24 kg/m²) had increased risk for CAD when carrying C/C at rs2291725 (OR′ = 1.291, 95% CI′ = 1.017–1.639, p′=0.036). In age < 55 men and age < 65 women, the carriers of allele C at rs2291725 had a higher risk of CAD than noncarriers (OR = 1.627, p=0.015). Carriers of allele G in rs8078510 had higher susceptibility to CAD (OR = 2.049, 95% CI = 1.213–3.463, p=0.007, p=0.004); in addition, allele G in rs8078510 would bring higher CAD risk to the carriers who ever smoked (OR = 1.695, 95% CI = 1.080–2.660, p=0.021). Conclusion. The genetic variability of GIP gene is associated with CAD and it may play a role in the premature CAD in the
Taghizadeh, Mohammad; Goliaei, Bahram; Madadkar-Sobhani, Armin
2016-06-01
Protein flexibility, which has been referred to as a dynamic behavior, has various roles in protein function. Furthermore, for some bioinformatics tools, such as protein-protein docking software, taking protein flexibility into account yields a higher degree of accuracy. In the present work, we quantified and analyzed the variations in the flexibility of the human Cyclin Dependent Kinase 2 (hCDK2) protein without introducing a significant change in its initial environment or in the protein per se. The main goals of the present research were to calculate the variations in flexibility for each residue of hCDK2, to analyze these flexibility variations through clustering, and to investigate the functional aspects of the residues with high flexibility variations. Using the Gromacs package (version 4.5.4), three independent molecular dynamics (MD) simulations of the hCDK2 protein (PDB ID: 1HCL) were performed with no significant changes in their initial environments, structures, or conformations, followed by Root Mean Square Fluctuation (RMSF) calculations on these MD trajectories. The amount of variation among these three RMSF curves was calculated using two formulas. More than 50% of the variation in the flexibility (the distance between the maximum and the minimum RMSF) was found in the region of Val-154. There were also major flexibility fluctuations in other residues. These residues were mostly positioned in the vicinity of the functional residues. Subsequently, all hCDK2 residues were clustered into four groups according to the variability of their flexibility and their position in the RMSF curves. This work has introduced a new class of flexibility aspect of protein residues. It could also help in designing and engineering proteins by introducing a new dynamic aspect of hCDK2 and, accordingly, of other similar globular proteins. In
On logistic regression analysis of dichotomized responses.
Lu, Kaifeng
2017-01-01
We study the properties of treatment effect estimate in terms of odds ratio at the study end point from logistic regression model adjusting for the baseline value when the underlying continuous repeated measurements follow a multivariate normal distribution. Compared with the analysis that does not adjust for the baseline value, the adjusted analysis produces a larger treatment effect as well as a larger standard error. However, the increase in standard error is more than offset by the increase in treatment effect so that the adjusted analysis is more powerful than the unadjusted analysis for detecting the treatment effect. On the other hand, the true adjusted odds ratio implied by the normal distribution of the underlying continuous variable is a function of the baseline value and hence is unlikely to be able to be adequately represented by a single value of adjusted odds ratio from the logistic regression model. In contrast, the risk difference function derived from the logistic regression model provides a reasonable approximation to the true risk difference function implied by the normal distribution of the underlying continuous variable over the range of the baseline distribution. We show that different metrics of treatment effect have similar statistical power when evaluated at the baseline mean. Copyright © 2016 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
C. Wu
2018-03-01
Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
Wu, Cheng; Zhen Yu, Jian
2018-03-01
Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
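To make the role of the weighting parameter in Deming regression concrete, the sketch below implements the textbook closed-form Deming estimator. It is not the authors' Igor Pro program. The parameter `delta` is defined here as the ratio of the Y-error variance to the X-error variance; that naming and convention are assumptions of this sketch and may be the reciprocal of the λ used in the paper.

```python
import math


def deming_fit(xs, ys, delta=1.0):
    """Closed-form Deming regression; returns (slope, intercept).

    delta = var(Y measurement error) / var(X measurement error).
    delta = 1 gives orthogonal regression; a very large delta approaches OLS.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    if s_xy == 0:
        raise ValueError("zero covariance: slope is undefined")
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative noise-free data on the line y = 2x + 1 (not from the paper).
print(deming_fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], delta=1.0))
```

With noisy data the fitted slope changes with `delta`, which is the sensitivity to an improperly chosen weighting parameter that the abstract reports.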
Iterative Strain-Gage Balance Calibration Data Analysis for Extended Independent Variable Sets
Ulbrich, Norbert Manfred
2011-01-01
A new method was developed that makes it possible to use an extended set of independent calibration variables for an iterative analysis of wind tunnel strain gage balance calibration data. The new method permits the application of the iterative analysis method whenever the total number of balance loads and other independent calibration variables is greater than the total number of measured strain gage outputs. Iteration equations used by the iterative analysis method have the limitation that the number of independent and dependent variables must match. The new method circumvents this limitation. It simply adds a missing dependent variable to the original data set by using an additional independent variable also as an additional dependent variable. Then, the desired solution of the regression analysis problem can be obtained that fits each gage output as a function of both the original and additional independent calibration variables. The final regression coefficients can be converted to data reduction matrix coefficients because the missing dependent variables were added to the data set without changing the regression analysis result for each gage output. Therefore, the new method still supports the application of the two load iteration equation choices that the iterative method traditionally uses for the prediction of balance loads during a wind tunnel test. An example is discussed in the paper that illustrates the application of the new method to a realistic simulation of temperature dependent calibration data set of a six component balance.
Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana
2015-05-01
Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model relates the dependent variable (i.e. amount of waste generated) to independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R² value of 0.694, which means that it explains approximately 69% of the variation in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.
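The adjusted coefficient of determination quoted in this abstract penalises the ordinary R² for the number of predictors. A minimal sketch of the standard formula follows. The sample size of 18 comes from the abstract, whereas the raw R² and the number of predictors in the usage line are hypothetical, because the abstract does not state them.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    degrees_of_freedom = n_samples - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / degrees_of_freedom


# Hypothetical inputs: raw R^2 of 0.80 with 3 predictors on 18 buildings.
print(adjusted_r_squared(0.80, 18, 3))
```

With only eighteen observations the penalty grows quickly as predictors are added, which is one reason to report the adjusted value for small samples.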
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Market segmentation is one of the key concepts of modern marketing. The main goal of market segmentation is to create groups (segments) of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of a concrete product or service. Companies can create a specific marketing plan for each of these segments and thereby gain a short- or long-term competitive advantage in the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of a logistic regression model and CHAID analysis. The logistic regression model was used to select, from an initial pool of eleven variables, those that are statistically significant for explaining the dependent variable. The selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on a concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
Functional data analysis of generalized regression quantiles
Guo, Mengmeng
2013-11-05
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
Functional data analysis of generalized regression quantiles
Guo, Mengmeng; Zhou, Lan; Huang, Jianhua Z.; Härdle, Wolfgang Karl
2013-01-01
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
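The iterative least asymmetrically weighted squares idea mentioned in this abstract can be illustrated in its simplest form: computing a single expectile of a plain sample. The sketch below is only that scalar building block, with an illustrative function name and data. The spline-based principal component functions and the penalty of the paper's joint estimator are not reproduced.

```python
def expectile(values, tau, tol=1e-12, max_iter=1000):
    """tau-expectile of a sample by iteratively reweighted least squares.

    Observations above the current estimate get weight tau, the rest get
    weight 1 - tau; the estimate is updated to the weighted mean until it
    stops changing. tau = 0.5 reproduces the ordinary mean.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    estimate = sum(values) / len(values)
    for _ in range(max_iter):
        weights = [tau if v > estimate else 1.0 - tau for v in values]
        updated = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate


# Illustrative sample (not from the paper).
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
print(expectile(sample, 0.5), expectile(sample, 0.9))
```

Replacing the weighted mean by a weighted regression on covariates turns the same loop into expectile regression, which is the setting in which the abstract's borrowing of strength across data sets applies.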
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
Gorgees, Hazim Mansoor; Mahdi, Fatimah Assim
2018-05-01
This article compares the performance of different types of ordinary ridge regression estimators that have already been proposed for estimating the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE) than the other stated methods.
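As a reminder of what a ridge estimator does under near-collinearity, the sketch below solves the ridge normal equations (X'X + kI)b = X'y for two centred predictors. The data and the function name are hypothetical. The rule for choosing k, including the condition-number rule the article favours, is not shown and k is simply passed in.

```python
def ridge_two_predictors(x1, x2, y, k):
    """Ridge coefficients for two centred predictors (no intercept term)."""
    a11 = sum(u * u for u in x1) + k
    a22 = sum(v * v for v in x2) + k
    a12 = sum(u * v for u, v in zip(x1, x2))
    b1 = sum(u * t for u, t in zip(x1, y))
    b2 = sum(v * t for v, t in zip(x2, y))
    determinant = a11 * a22 - a12 * a12
    if determinant == 0:
        raise ValueError("singular system: increase k")
    return ((a22 * b1 - a12 * b2) / determinant,
            (a11 * b2 - a12 * b1) / determinant)


# Hypothetical, nearly collinear predictors with y = x1 + x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [a + b for a, b in zip(x1, x2)]
print(ridge_two_predictors(x1, x2, y, k=0.0))  # ordinary least squares
print(ridge_two_predictors(x1, x2, y, k=1.0))  # shrunken coefficients
```

Adding k to the diagonal moves the determinant away from zero, which stabilises the solution at the cost of shrinking the coefficients toward zero.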
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
Mapping urban environmental noise: a land use regression method.
Xie, Dan; Liu, Yi; Chen, Jining
2011-09-01
Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise for a given development layout. This paper, for the first time, introduces a land use regression method, which has been applied to simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northeast China. The LUNOS model describes noise as a dependent variable of the areas of various surrounding land uses via a regression function. The results suggest that a linear model performs better in fitting monitoring data, and there is no significant difference in the LUNOS outputs when applied to different spatial scales. As LUNOS facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and as an aid to smart decision-making.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
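The area under the ROC curve used to compare the three classifiers equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way. The function name and the four-case example are illustrative and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Illustrative labels and predicted probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the measure depends only on the ranking of the scores, logistic and probit models, whose fitted probabilities are usually ordered almost identically, tend to give nearly equal AUC values.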
Regression Models For Multivariate Count Data.
Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei
2017-01-01
Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.
Model selection in kernel ridge regression
DEFF Research Database (Denmark)
Exterkate, Peter
2013-01-01
Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...
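A minimal sketch of kernel ridge regression with a Gaussian kernel may help to show where the two tuning parameters enter. Everything here is an assumption made for illustration: the one-dimensional inputs, the helper names, the fixed values of `sigma` and `penalty`, and the small Gaussian-elimination solver. The paper's guidelines for choosing the parameters by cross-validation are not implemented.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_kernel_ridge(xs, ys, sigma, penalty):
    """Solve (K + penalty * I) alpha = y for the dual coefficients alpha."""
    n = len(xs)
    gram = [[gaussian_kernel(xs[i], xs[j], sigma) + (penalty if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve_linear_system(gram, list(ys))


def predict_kernel_ridge(x_new, xs, alphas, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(x_new, x, sigma) for a, x in zip(alphas, xs))


# Hypothetical training data: noise-free samples of sin(x) on [0, 3].
train_x = [0.5 * i for i in range(7)]
train_y = [math.sin(x) for x in train_x]
alphas = fit_kernel_ridge(train_x, train_y, sigma=1.0, penalty=1e-3)
print(predict_kernel_ridge(1.25, train_x, alphas, sigma=1.0), math.sin(1.25))
```

The kernel width `sigma` controls how smooth the prediction function is, and `penalty` trades fit against smoothness, which corresponds to the signal-to-noise interpretation described in the abstract.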