Flexible survival regression modelling
DEFF Research Database (Denmark)
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
(Non) linear regression modelling
Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.
2012-01-01
We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or
Hilbe, Joseph M
2009-01-01
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...
Ridge Regression for Interactive Models.
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.
Nonparametric Mixture of Regression Models.
Huang, Mian; Li, Runze; Wang, Shaoli
2013-07-01
Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.
Regression Models for Repairable Systems
Czech Academy of Sciences Publication Activity Database
Novák, Petr
2015-01-01
Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics , Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
Panel Smooth Transition Regression Models
DEFF Research Database (Denmark)
González, Andrés; Terasvirta, Timo; Dijk, Dick van
models to the panel context. The strategy consists of model specification based on homogeneity tests, parameter estimation, and model evaluation, including tests of parameter constancy and no remaining heterogeneity. The model is applied to describing firms' investment decisions in the presence...
A Seemingly Unrelated Poisson Regression Model
King, Gary
1989-01-01
This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Regression Models for Market-Shares
DEFF Research Database (Denmark)
Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue
2005-01-01
On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretat......On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put...... on the interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Variable importance in latent variable regression models
Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.
2014-01-01
The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Gaussian Process Regression Model in Spatial Logistic Regression
Sofro, A.; Oktaviarina, A.
2018-01-01
Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.
Structured Dimensionality Reduction for Additive Model Regression
Fawzi, Alhussein; Fiot, Jean-Baptiste; Chen, Bei; Sinn, Mathieu; Frossard, Pascal
2016-01-01
Additive models are regression methods which model the response variable as the sum of univariate transfer functions of the input variables. Key benefits of additive models are their accuracy and interpretability on many real-world tasks. Additive models are however not adapted to problems involving a large number (e.g., hundreds) of input variables, as they are prone to overfitting in addition to losing interpretability. In this paper, we introduce a novel framework for applying additive ...
Nonparametric and semiparametric dynamic additive regression models
DEFF Research Database (Denmark)
Scheike, Thomas Harder; Martinussen, Torben
Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can...... in special cases. We investigate the finite sample properties of the estimators and conclude that the asymptotic results are valid for even samll samples....
Mixed-effects regression models in linguistics
Heylen, Kris; Geeraerts, Dirk
2018-01-01
When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed. In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...
Flexible regression models with cubic splines.
Durrleman, S; Simon, R
1989-05-01
We describe the use of cubic splines in regression models to represent the relationship between the response variable and a vector of covariates. This simple method can help prevent the problems that result from inappropriate linearity assumptions. We compare restricted cubic spline regression to non-parametric procedures for characterizing the relationship between age and survival in the Stanford Heart Transplant data. We also provide an illustrative example in cancer therapeutics.
AIRLINE ACTIVITY FORECASTING BY REGRESSION MODELS
Directory of Open Access Journals (Sweden)
Н. Білак
2012-04-01
Full Text Available Proposed linear and nonlinear regression models, which take into account the equation of trend and seasonality indices for the analysis and restore the volume of passenger traffic over the past period of time and its prediction for future years, as well as the algorithm of formation of these models based on statistical analysis over the years. The desired model is the first step for the synthesis of more complex models, which will enable forecasting of passenger (income level airline with the highest accuracy and time urgency.
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
Model building in nonproportional hazard regression.
Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad
2013-12-30
Recent developments of statistical methods allow for a very flexible modeling of covariates affecting survival times via the hazard rate, including also the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, for building the most suitable model for given data. This is particularly true when potentially time-varying associations are given. We propose to conduct a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on of the Poisson likelihood to make recent advances for model building in exponential family regression accessible also in the nonproportional hazard regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses. Copyright © 2013 John Wiley & Sons, Ltd.
Model performance analysis and model validation in logistic regression
Directory of Open Access Journals (Sweden)
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Modeling maximum daily temperature using a varying coefficient regression model
Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith
2014-01-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...
A Skew-Normal Mixture Regression Model
Liu, Min; Lin, Tsung-I
2014-01-01
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
Linear Regression Models for Estimating True Subsurface ...
Indian Academy of Sciences (India)
47
The objective is to minimize the processing time and computer memory required .... Survey. 65 time to acquire extra GPR or seismic data for large sites and picking the first arrival time. 66 to provide the needed datasets for the joint inversion are also .... The data utilized for the regression modelling was acquired from ground.
Adaptive regression for modeling nonlinear relationships
Knafl, George J
2016-01-01
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
Geographically weighted regression model on poverty indicator
Slamet, I.; Nugroho, N. F. T. A.; Muslich
2017-12-01
In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.
Influence diagnostics in meta-regression model.
Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua
2017-09-01
This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
Crime Modeling using Spatial Regression Approach
Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.
2018-01-01
Act of criminality in Indonesia increased both variety and quantity every year. As murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. Risk of society exposed to crime is the number of reported cases in the police institution. The higher of the number of reporter to the police institution then the number of crime in the region is increasing. In this research, modeling criminality in South Sulawesi, Indonesia with the dependent variable used is the society exposed to the risk of crime. Modelling done by area approach is the using Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variable used is the population density, the number of poor population, GDP per capita, unemployment and the human development index (HDI). Based on the analysis using spatial regression can be shown that there are no dependencies spatial both lag or errors in South Sulawesi.
Bayesian Inference of a Multivariate Regression Model
Directory of Open Access Journals (Sweden)
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
General regression and representation model for classification.
Directory of Open Access Journals (Sweden)
Jianjun Qian
Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.
An Additive-Multiplicative Cox-Aalen Regression Model
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...
Confidence bands for inverse regression models
International Nuclear Information System (INIS)
Birke, Melanie; Bissantz, Nicolai; Holzmann, Hajo
2010-01-01
We construct uniform confidence bands for the regression function in inverse, homoscedastic regression models with convolution-type operators. Here, the convolution is between two non-periodic functions on the whole real line rather than between two periodic functions on a compact interval, since the former situation arguably arises more often in applications. First, following Bickel and Rosenblatt (1973 Ann. Stat. 1 1071–95) we construct asymptotic confidence bands which are based on strong approximations and on a limit theorem for the supremum of a stationary Gaussian process. Further, we propose bootstrap confidence bands based on the residual bootstrap and prove consistency of the bootstrap procedure. A simulation study shows that the bootstrap confidence bands perform reasonably well for moderate sample sizes. Finally, we apply our method to data from a gel electrophoresis experiment with genetically engineered neuronal receptor subunits incubated with rat brain extract
Directory of Open Access Journals (Sweden)
Jaime Araujo Cobuci
2012-09-01
Full Text Available Milk yield test-day records on the first three lactations of 25,500 Holstein cows were used to estimate genetic parameters and predict breeding values for nine measures of persistency and 305-d milk yield in a random regression animal model using two criteria to define the fixed regression. Legendre polynomials of fourth and fifth orders were used to model the fixed and random regressions of lactation curves. The fixed regressions were adjusted for average milk yield on populations (single or subpopulations (multiple formed by cows that calved at the same age and in the same season. Akaike Information (AIC and Bayesian Information (BIC criteria indicated that models with multiple regression lactation curves had the best fit to test-day milk records of first lactations, while models with a single regression curve had the best fit for the second and third lactations. Heritability and genetic correlation estimates between persistency and milk yield differed significantly depending on the lactation order and the measures of persistency used. These parameters did not differ significantly depending on the criteria used for defining the fixed regressions for lactation curves. In general, the heritability estimates were higher for first (0.07 to 0.43, followed by the second (0.08 to 0.21 and third (0.04 to 0.10 lactation. The rank of sires resulting from the processes of genetic evaluation for milk yield or persistency using random regression models differed according to the criteria used for determining the fixed regression of lactation curve.
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Probabilistic Solar Forecasting Using Quantile Regression Models
Directory of Open Access Journals (Sweden)
Philippe Lauret
2017-10-01
Full Text Available In this work, we assess the performance of three probabilistic models for intra-day solar forecasting. More precisely, a linear quantile regression method is used to build three models for generating 1 h–6 h-ahead probabilistic forecasts. Our approach is applied to forecasting solar irradiance at a site experiencing highly variable sky conditions using the historical ground observations of solar irradiance as endogenous inputs and day-ahead forecasts as exogenous inputs. Day-ahead irradiance forecasts are obtained from the Integrated Forecast System (IFS, a Numerical Weather Prediction (NWP model maintained by the European Center for Medium-Range Weather Forecast (ECMWF. Several metrics, mainly originated from the weather forecasting community, are used to evaluate the performance of the probabilistic forecasts. The results demonstrated that the NWP exogenous inputs improve the quality of the intra-day probabilistic forecasts. The analysis considered two locations with very dissimilar solar variability. Comparison between the two locations highlighted that the statistical performance of the probabilistic models depends on the local sky conditions.
Multitask Quantile Regression under the Transnormal Model.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2016-01-01
We consider estimating multi-task quantile regression under the transnormal model, with focus on high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ 1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of alternating direction method of multipliers, nearest correlation matrix projection is introduced that inherits sampling properties of the unprojected one. Our work combines strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality for high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides the provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to analyze the protein mass spectroscopy data.
Inferring gene regression networks with model trees
Directory of Open Access Journals (Sweden)
Aguilar-Ruiz Jesus S
2010-10-01
Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
Bias-corrected quantile regression estimation of censored regression models
Cizek, Pavel; Sadikoglu, Serhan
2018-01-01
In this paper, an extension of the indirect inference methodology to semiparametric estimation is explored in the context of censored regression. Motivated by weak small-sample performance of the censored regression quantile estimator proposed by Powell (J Econom 32:143–155, 1986a), two- and
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits
National Research Council Canada - National Science Library
Gravier, Michael
1999-01-01
.... The research identified logistic regression as a powerful tool for analysis of DMSMS and further developed twenty models attempting to identify the "best" way to model and predict DMSMS using logistic regression...
Autoregressive Model Using Fuzzy C-Regression Model Clustering for Traffic Modeling
Tanaka, Fumiaki; Suzuki, Yukinori; Maeda, Junji
A robust traffic modeling is required to perform an effective congestion control for the broad band digital network. An autoregressive model using a fuzzy c-regression model (FCRM) clustering is proposed for a traffic modeling. This is a simpler modeling method than previous methods. The experiments show that the proposed method is more robust for traffic modeling than the previous method.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
STREAMFLOW AND WATER QUALITY REGRESSION MODELING ...
African Journals Online (AJOL)
Journal of Modeling, Design and Management of Engineering Systems ... Consistency tests, trend analyses and mathematical modeling of water quality constituents and riverflow characteristics at upstream Nekede station and downstream Obigbo station show: consistent time-trends in degree of contamination; linear and ...
Mixture of Regression Models with Single-Index
Xiang, Sijia; Yao, Weixin
2016-01-01
In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new pro- posed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...
Logistic Regression Model on Antenna Control Unit Autotracking Mode
2015-10-20
412TW-PA-15240 Logistic Regression Model on Antenna Control Unit Autotracking Mode DANIEL T. LAIRD AIR FORCE TEST CENTER EDWARDS AFB, CA...OCT 15 4. TITLE AND SUBTITLE Logistic Regression Model on Antenna Control Unit Autotracking Mode 5a. CONTRACT NUMBER 5b. GRANT...alternative-hypothesis. This paper will present an Antenna Auto- tracking model using Logistic Regression modeling. This paper presents an example of
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.
Linear Regression Models for Estimating True Subsurface ...
Indian Academy of Sciences (India)
47
of the processing time and memory space required to carry out the inversion with the. 29. SCLS algorithm. ... consumption of time and memory space for the iterative computations to converge at. 54 minimum data ..... colour scale and blanking as the observed true resistivity models, for visual assessment. 163. The accuracy ...
Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace
Culpepper, Steven Andrew; Park, Trevor
2017-01-01
A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…
Hierarchical regression analysis in structural Equation Modeling
de Jong, P.F.
1999-01-01
In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
Stochastic Approximation Methods for Latent Regression Item Response Models
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Linear regression models for quantitative assessment of left ...
African Journals Online (AJOL)
STORAGESEVER
2008-07-04
Jul 4, 2008 ... computed. Linear regression models for the prediction of left ventricular structures were established. Prediction models for ... study aimed at establishing linear regression models that could be used in the prediction ..... Is white cat hypertension associated with artenal disease or left ventricular hypertrophy?
Linear regression crash prediction models : issues and proposed solutions.
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Poisson Mixture Regression Models for Heart Disease Prediction
Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
Alternative regression models to assess increase in childhood BMI
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
2008-01-01
Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...
Modelling Issues in Kernel Ridge Regression
P. Exterkate (Peter)
2011-01-01
textabstractKernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular
A test for the parameters of multiple linear regression models ...
African Journals Online (AJOL)
A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...
Optimal experimental designs for inverse quadratic regression models
Dette, Holger; Kiss, Christine
2007-01-01
In this paper optimal experimental designs for inverse quadratic regression models are determined. We consider two different parameterizations of the model and investigate local optimal designs with respect to the $c$-, $D$- and $E$-criteria, which reflect various aspects of the precision of the maximum likelihood estimator for the parameters in inverse quadratic regression models. In particular it is demonstrated that for a sufficiently large design space geometric allocation rules are optim...
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
Application of random regression models to the genetic evaluation ...
African Journals Online (AJOL)
The model included fixed regression on AM (range from 30 to 138 mo) and the effect of herd-measurement date concatenation. Random parts of the model were RRM coefficients for additive and permanent environmental effects, while residual effects were modelled to account for heterogeneity of variance by AY. Estimates ...
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
An approach for quantifying small effects in regression models.
Bedrick, Edward J; Hund, Lauren
2018-04-01
We develop a novel approach for quantifying small effects in regression models. Our method is based on variation in the mean function, in contrast to methods that focus on regression coefficients. Our idea applies in diverse settings such as testing for a negligible trend and quantifying differences in regression functions across strata. Straightforward Bayesian methods are proposed for inference. Four examples are used to illustrate the ideas.
Methods of Detecting Outliers in A Regression Analysis Model ...
African Journals Online (AJOL)
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... This is the type of linear regression that involves only two variables one independent and one dependent plus the random error term. The simple linear regression model assumes that there is a straight line (linear) relationship between the dependent variable Y and the independent variable X. This can be.
Directory of Open Access Journals (Sweden)
Hong-Juan Li
2013-04-01
Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
Alternative regression models to assess increase in childhood BMI
Directory of Open Access Journals (Sweden)
Mansmann Ulrich
2008-09-01
Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Alternative regression models to assess increase in childhood BMI.
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
2008-09-08
Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Linear regression models for quantitative assessment of left ...
African Journals Online (AJOL)
Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...
Uncertainties in spatially aggregated predictions from a logistic regression model
Horssen, P.W. van; Pebesma, E.J.; Schot, P.P.
2002-01-01
This paper presents a method to assess the uncertainty of an ecological spatial prediction model which is based on logistic regression models, using data from the interpolation of explanatory predictor variables. The spatial predictions are presented as approximate 95% prediction intervals. The
Regression models for estimating charcoal yield in a Eucalyptus ...
African Journals Online (AJOL)
... dbh2H, and the product of dbh and merchantable height [(dbh)MH] as independent variables. Results of residual analysis showed that the models satisfied all the assumptions of regression analysis. Keywords: Models, charcoal production, biomass, Eucalyptus, arid, anergy, allometric. Bowen Journal of Agriculture Vol.
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
DEFF Research Database (Denmark)
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares...
Default Bayes Factors for Model Selection in Regression
Rouder, Jeffrey N.; Morey, Richard D.
2012-01-01
In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…
Geographically Weighted Logistic Regression Applied to Credit Scoring Models
Directory of Open Access Journals (Sweden)
Pedro Henrique Melo Albuquerque
Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.
Real estate value prediction using multivariate regression models
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.
Tutorial on Using Regression Models with Count Outcomes Using R
Directory of Open Access Journals (Sweden)
A. Alexander Beaujean
2016-02-01
Full Text Available Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used run the example analyses are included in the Appendix.
Model building strategy for logistic regression: purposeful selection.
Zhang, Zhongheng
2016-03-01
Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform purposeful selection model building strategy with R. I stress on the use of likelihood ratio test to see whether deleting a variable will have significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of remaining covariates. Interaction should be checked to disentangle complex relationship between covariates and their synergistic effect on response variable. Model should be checked for the goodness-of-fit (GOF). In other words, how the fitted model reflects the real data. Hosmer-Lemeshow GOF test is the most widely used for logistic regression model.
The art of regression modeling in road safety
Hauer, Ezra
2015-01-01
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
Visualisation and interpretation of Support Vector Regression models.
Ustün, B; Melssen, W J; Buydens, L M C
2007-07-09
This paper introduces a technique to visualise the information content of the kernel matrix and a way to interpret the ingredients of the Support Vector Regression (SVR) model. Recently, the use of Support Vector Machines (SVM) for solving classification (SVC) and regression (SVR) problems has increased substantially in the field of chemistry and chemometrics. This is mainly due to its high generalisation performance and its ability to model non-linear relationships in a unique and global manner. Modeling of non-linear relationships will be enabled by applying a kernel function. The kernel function transforms the input data, usually non-linearly related to the associated output property, into a high dimensional feature space where the non-linear relationship can be represented in a linear form. Usually, SVMs are applied as a black box technique. Hence, the model cannot be interpreted like, e.g., Partial Least Squares (PLS). For example, the PLS scores and loadings make it possible to visualise and understand the driving force behind the optimal PLS machinery. In this study, we have investigated the possibilities to visualise and interpret the SVM model. Here, we exclusively have focused on Support Vector Regression to demonstrate these visualisation and interpretation techniques. Our observations show that we are now able to turn a SVR black box model into a transparent and interpretable regression modeling technique.
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Buffalos milk yield analysis using random regression models
Directory of Open Access Journals (Sweden)
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.
International Nuclear Information System (INIS)
Jafri, Y.Z.; Kamal, L.
2007-01-01
Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
Applications of some discrete regression models for count data
Directory of Open Access Journals (Sweden)
B. M. Golam Kibria
2006-01-01
Full Text Available In this paper we have considered several regression models to fit the count data that encounter in the field of Biometrical, Environmental, Social Sciences and Transportation Engineering. We have fitted Poisson (PO, Negative Binomial (NB, Zero-Inflated Poisson (ZIP and Zero-Inflated Negative Binomial (ZINB regression models to run-off-road (ROR crash data which collected on arterial roads in south region (rural of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentage of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero inflated and over dispersed count data.
Bayesian approach to errors-in-variables in regression models
Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad
2017-05-01
In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, would cause misleading statistical inferences and analysis. Therefore, our goal is to examine the relationship of the outcome variable and the unobserved exposure variable given the observed mismeasured surrogate by applying the Bayesian formulation to the EIV model. We shall extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model which is the Poisson regression model. We shall then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.
Thermal Efficiency Degradation Diagnosis Method Using Regression Model
International Nuclear Information System (INIS)
Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol
2011-01-01
This paper proposes an idea for thermal efficiency degradation diagnosis in turbine cycles, which is based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs for representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can inversely response an associated intrinsic state for a superficial state observed from a power plant. The diagnosis method proposed herein is classified into three processes, 1) simulations for degradation conditions to get measured states (referred as what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of current plant and the linear regression model (referred as inverse what-if method). The what-if method is to generate the outputs for the inputs including various root causes and/or boundary conditions whereas the inverse what-if method is the process of calculating the inverse matrix with the given superficial states, that is, component degradation modes. The method suggested in this paper was validated using the turbine cycle model for an operating power plant
Multiple Response Regression for Gaussian Mixture Models with Known Labels.
Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng
2012-12-01
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, Richard M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
Flexible competing risks regression modeling and goodness-of-fit
DEFF Research Database (Denmark)
Scheike, Thomas; Zhang, Mei-Jie
2008-01-01
In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause......-specific hazards. Another recent approach is to directly model the cumulative incidence by a proportional model (Fine and Gray, J Am Stat Assoc 94:496-509, 1999), and then obtain direct estimates of how covariates influences the cumulative incidence curve. We consider a simple and flexible class of regression...
Model-based Quantile Regression for Discrete Data
Padellini, Tullia
2018-04-10
Quantile regression is a class of methods voted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite the fact that this leads to a proper posterior for the regression coefficients, the resulting posterior variance is however affected by an unidentifiable parameter, hence any inferential procedure beside point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We extend it also in the case of discrete responses, where there is no 1-to-1 relationship between quantiles and distribution\\'s parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.
Yang, Xue; Lauzon, Carolyn B; Crainiceanu, Ciprian; Caffo, Brian; Resnick, Susan M; Landman, Bennett A
2012-09-01
Massively univariate regression and inference in the form of statistical parametric mapping have transformed the way in which multi-dimensional imaging data are studied. In functional and structural neuroimaging, the de facto standard "design matrix"-based general linear regression model and its multi-level cousins have enabled investigation of the biological basis of the human brain. With modern study designs, it is possible to acquire multi-modal three-dimensional assessments of the same individuals--e.g., structural, functional and quantitative magnetic resonance imaging, alongside functional and ligand binding maps with positron emission tomography. Largely, current statistical methods in the imaging community assume that the regressors are non-random. For more realistic multi-parametric assessment (e.g., voxel-wise modeling), distributional consideration of all observations is appropriate. Herein, we discuss two unified regression and inference approaches, model II regression and regression calibration, for use in massively univariate inference with imaging data. These methods use the design matrix paradigm and account for both random and non-random imaging regressors. We characterize these methods in simulation and illustrate their use on an empirical dataset. Both methods have been made readily available as a toolbox plug-in for the SPM software. Copyright © 2012 Elsevier Inc. All rights reserved.
Regression Model to Predict Global Solar Irradiance in Malaysia
Directory of Open Access Journals (Sweden)
Hairuniza Ahmed Kutty
2015-01-01
Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.
Approximating prediction uncertainty for random forest regression models
John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne
2016-01-01
Machine learning approaches such as random forest haveÂ increased for the spatial modeling and mapping of continuousÂ variables. Random forest is a non-parametric ensembleÂ approach, and unlike traditional regression approaches thereÂ is no direct quantification of prediction error. UnderstandingÂ prediction uncertainty is important when using model-basedÂ continuous maps as...
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Ogutu, Joseph O; Schulz-Streeck, Torben; Piepho, Hans-Peter
2012-05-21
Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods-- ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net-- for predicting GEBV using dense SNP markers. We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all the six methods that use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error, the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). The elastic net, lasso, adaptive lasso and the adaptive elastic net all had
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2009-01-01
In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2010-01-01
In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2011-01-01
In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator
Linearity and Misspecification Tests for Vector Smooth Transition Regression Models
DEFF Research Database (Denmark)
Teräsvirta, Timo; Yang, Yukai
The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...
Application of multilinear regression analysis in modeling of soil ...
African Journals Online (AJOL)
The application of Multi-Linear Regression Analysis (MLRA) model for predicting soil properties in Calabar South offers a technical guide and solution in foundation designs problems in the area. Forty-five soil samples were collected from fifteen different boreholes at a different depth and 270 tests were carried out for CBR, ...
Regression Models and Experimental Designs : A Tutorial for Simulation Analaysts
Kleijnen, J.P.C.
2006-01-01
This tutorial explains the basics of linear regression models. especially low-order polynomials. and the corresponding statistical designs. namely, designs of resolution III, IV, V, and Central Composite Designs (CCDs).This tutorial assumes 'white noise', which means that the residuals of the fitted
application of multilinear regression analysis in modeling of soil
African Journals Online (AJOL)
Windows User
APPLICATION OF MULTILINEAR REGRESSION ANALYSIS IN MODELING OF. SOIL PROPERTIES FOR GEOTECHNICAL CIVIL ENGINEERING WORKS. IN CALABAR SOUTH. J. G. Egbe1, D. E. Ewa2, S. E. Ubi3, G. B. Ikwa4 and O. O. Tumenayo5. 1, 2, 3, 4, DEPT. OF CIVIL ENGINEERING, CROSS RIVER UNIV.
A binary logistic regression model with complex sampling design of ...
African Journals Online (AJOL)
A binary logistic regression model with complex sampling design of unmet need for family planning among all women aged (15-49) in Ethiopia. ... Conclusion: The key determinants of unmet need family planning in Ethiopia were residence, age, marital-status, education, household members, birth-events and number of ...
Transpiration of glasshouse rose crops: evaluation of regression models
Baas, R.; Rijssel, van E.
2006-01-01
Regression models of transpiration (T) based on global radiation inside the greenhouse (G), with or without energy input from heating pipes (Eh) and/or vapor pressure deficit (VPD) were parameterized. Therefore, data on T, G, temperatures from air, canopy and heating pipes, and VPD from both a
Resampling procedures to validate dendro-auxometric regression models
Directory of Open Access Journals (Sweden)
2009-03-01
Full Text Available Regression analysis has a large use in several sectors of forest research. The validation of a dendro-auxometric model is a basic step in the building of the model itself. The more a model resists to attempts of demonstrating its groundlessness, the more its reliability increases. In the last decades many new theories, that quite utilizes the calculation speed of the calculators, have been formulated. Here we show the results obtained by the application of a bootsprap resampling procedure as a validation tool.
A mixed-effects multinomial logistic regression model.
Hedeker, Donald
2003-05-15
A mixed-effects multinomial logistic regression model is described for analysis of clustered or longitudinal nominal or ordinal response data. The model is parameterized to allow flexibility in the choice of contrasts used to represent comparisons across the response categories. Estimation is achieved using a maximum marginal likelihood (MML) solution that uses quadrature to numerically integrate over the distribution of random effects. An analysis of a psychiatric data set, in which homeless adults with serious mental illness are repeatedly classified in terms of their living arrangement, is used to illustrate features of the model. Copyright 2003 by John Wiley & Sons, Ltd.
Electricity consumption forecasting in Italy using linear regression models
International Nuclear Information System (INIS)
Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio
2009-01-01
The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)
Dynamic logistic regression and dynamic model averaging for binary classification.
McCormick, Tyler H; Raftery, Adrian E; Madigan, David; Burd, Randall S
2012-03-01
We propose an online binary classification procedure for cases when there is uncertainty about the model to use and parameters within a model change over time. We account for model uncertainty through dynamic model averaging, a dynamic extension of Bayesian model averaging in which posterior model probabilities may also change with time. We apply a state-space model to the parameters of each model and we allow the data-generating model to change over time according to a Markov chain. Calibrating a "forgetting" factor accommodates different levels of change in the data-generating mechanism. We propose an algorithm that adjusts the level of forgetting in an online fashion using the posterior predictive distribution, and so accommodates various levels of change at different times. We apply our method to data from children with appendicitis who receive either a traditional (open) appendectomy or a laparoscopic procedure. Factors associated with which children receive a particular type of procedure changed substantially over the 7 years of data collection, a feature that is not captured using standard regression modeling. Because our procedure can be implemented completely online, future data collection for similar studies would require storing sensitive patient information only temporarily, reducing the risk of a breach of confidentiality. © 2011, The International Biometric Society.
Model and Variable Selection Procedures for Semiparametric Time Series Regression
Directory of Open Access Journals (Sweden)
Risa Kato
2009-01-01
Full Text Available Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedure is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.
Development and Application of Nonlinear Land-Use Regression Models
Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel
2014-05-01
The problem of air pollution modelling in urban zones is of great importance both from scientific and applied points of view. At present there are several fundamental approaches either based on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. family of kriging models or conditional stochastic simulations). Recently, there were important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of (10-100). It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most of LUR models currently used are linear models. In the present research the nonlinear LUR models are developed and applied for Geneva city. Mainly two nonlinear data-driven models were elaborated: multilayer perceptron and random forest. An important part of the research deals also with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). Nitrogen dioxide (NO2) has caught our attention. It has effects on human health and on plants; NO2 contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxides on plants are the reduction of the growth, production and pesticide resistance. And finally, the effects on materials: nitrogen dioxide increases the corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted
Transductive Ridge Regression in Structure-activity Modeling.
Marcou, Gilles; Delouis, Grace; Mokshyna, Olena; Horvath, Dragos; Lachiche, Nicolas; Varnek, Alexandre
2018-01-01
In this article we consider the application of the Transductive Ridge Regression (TRR) approach to structure-activity modeling. An original procedure of the TRR parameters optimization is suggested. Calculations performed on 3 different datasets involving two types of descriptors demonstrated that TRR outperforms its non-transductive analogue (Ridge Regression) in more than 90 % of cases. The most significant transductive effect was observed for small datasets. This suggests that transduction may be particularly useful when the data are expensive or difficult to collect. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
DEFF Research Database (Denmark)
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its...
Modeling energy expenditure in children and adolescents using quantile regression.
Yang, Yunwen; Adolph, Anne L; Puyau, Maurice R; Vohra, Firoz A; Butte, Nancy F; Zakeri, Issa F
2013-07-15
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children. First, QR models will be developed to predict minute-by-minute awake EE at different quantile levels based on heart rate (HR) and physical activity (PA) accelerometry counts, and child characteristics of age, sex, weight, and height. Second, the QR models will be used to evaluate the covariate effects of weight, PA, and HR across the conditional EE distribution. QR and ordinary least squares (OLS) regressions are estimated in 109 children, aged 5-18 yr. QR modeling of EE outperformed OLS regression for both nonobese and obese populations. Average prediction errors for QR compared with OLS were not only smaller at the median τ = 0.5 (18.6 vs. 21.4%), but also substantially smaller at the tails of the distribution (10.2 vs. 39.2% at τ = 0.1 and 8.7 vs. 19.8% at τ = 0.9). Covariate effects of weight, PA, and HR on EE for the nonobese and obese children differed across quantiles (P effects of weight, PA, and HR on EE in nonobese and obese children.
A mathematical model of tumour angiogenesis: growth, regression and regrowth.
Vilanova, Guillermo; Colominas, Ignasi; Gomez, Hector
2017-01-01
Cancerous tumours have the ability to recruit new blood vessels through a process called angiogenesis. By stimulating vascular growth, tumours get connected to the circulatory system, receive nutrients and open a way to colonize distant organs. Tumour-induced vascular networks become unstable in the absence of tumour angiogenic factors (TAFs). They may undergo alternating stages of growth, regression and regrowth. Following a phase-field methodology, we propose a model of tumour angiogenesis that reproduces the aforementioned features and highlights the importance of vascular regression and regrowth. In contrast with previous theories which focus on vessel remodelling due to the absence of flow, we model an alternative regression mechanism based on the dependency of tumour-induced vascular networks on TAFs. The model captures capillaries at full scale, the plastic dynamics of tumour-induced vessel networks at long time scales, and shows the key role played by filopodia during angiogenesis. The predictions of our model are in agreement with in vivo experiments and may prove useful for the design of antiangiogenic therapies. © 2017 The Author(s).
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
A generalized additive regression model for survival times
DEFF Research Database (Denmark)
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...
Dynamic Regression Intervention Modeling for the Malaysian Daily Load
Directory of Open Access Journals (Sweden)
Fadhilah Abdrazak
2014-05-01
Full Text Available Malaysia is a unique country due to having both fixed and moving holidays. These moving holidays may overlap with other fixed holidays and therefore, increase the complexity of the load forecasting activities. The errors due to holidays’ effects in the load forecasting are known to be higher than other factors. If these effects can be estimated and removed, the behavior of the series could be better viewed. Thus, the aim of this paper is to improve the forecasting errors by using a dynamic regression model with intervention analysis. Based on the linear transfer function method, a daily load model consists of either peak or average is developed. The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays’ effects and achieved a smaller Mean Absolute Percentage Error (MAPE in load forecast.
Multivariate Frequency-Severity Regression Models in Insurance
Directory of Open Access Journals (Sweden)
Edward W. Frees
2016-02-01
Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i property; (ii motor vehicle; and (iii contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.
Augmented Beta rectangular regression models: A Bayesian perspective.
Wang, Jue; Luo, Sheng
2016-01-01
Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not flexible to extreme outliers or excessive events around tail areas, and they do not account for the presence of the boundary values zeros and ones because these values are not in the support of the Beta distributions. To address these issues, we propose a mixed effects model using Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Extended cox regression model: The choice of timefunction
Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu
2017-07-01
Cox regression model (CRM), which takes into account the effect of censored observations, is one the most applicative and usedmodels in survival analysis to evaluate the effects of covariates. Proportional hazard (PH), requires a constant hazard ratio over time, is the assumptionofCRM. Using extended CRM provides the test of including a time dependent covariate to assess the PH assumption or an alternative model in case of nonproportional hazards. In this study, the different types of real data sets are used to choose the time function and the differences between time functions are analyzed and discussed.
Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap
Emmanuel Flachaire
2005-01-01
In regression models, appropriate bootstrap methods for inference robust to heteroskedasticity of unknown form are the wild bootstrap and the pairs bootstrap. The finite sample performance of a heteroskedastic-robust test is investigated with Monte Carlo experiments. The simulation results suggest that one specific version of the wild bootstrap outperforms the other versions of the wild bootstrap and of the pairs bootstrap. It is the only one for which the bootstrap test gives always better r...
Correlation-regression model for physico-chemical quality of ...
African Journals Online (AJOL)
abusaad
3Department of Zoology, Gulbarga University Gulbarga, India. Accepted 2 July, 2012 ... multiple R2 value of 0.999 indicated that 99.9% variability in observed EC could be ascribed to Clˉ (76%),. HCO3. ˉ. (12.5%), NO3. - (10.3%) and SO4. 2- (1.1%). Multiple regression models can predict EC at 5% level of significance.
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Modeling of the Monthly Rainfall-Runoff Process Through Regressions
Directory of Open Access Journals (Sweden)
Campos-Aranda Daniel Francisco
2014-10-01
Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.
Genetic evaluation of European quails by random regression models
Directory of Open Access Journals (Sweden)
Flaviana Miranda Gonçalves
2012-09-01
Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
Learning Supervised Topic Models for Classification and Regression from Crowds
DEFF Research Database (Denmark)
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete
2017-01-01
annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression......The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most...... problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages...
A new inverse regression model applied to radiation biodosimetry
Higueras, Manuel; Puig, Pedro; Ainsbury, Elizabeth A.; Rothkamm, Kai
2015-01-01
Biological dosimetry based on chromosome aberration scoring in peripheral blood lymphocytes enables timely assessment of the ionizing radiation dose absorbed by an individual. Here, new Bayesian-type count data inverse regression methods are introduced for situations where responses are Poisson or two-parameter compound Poisson distributed. Our Poisson models are calculated in a closed form, by means of Hermite and negative binomial (NB) distributions. For compound Poisson responses, complete and simplified models are provided. The simplified models are also expressible in a closed form and involve the use of compound Hermite and compound NB distributions. Three examples of applications are given that demonstrate the usefulness of these methodologies in cytogenetic radiation biodosimetry and in radiotherapy. We provide R and SAS codes which reproduce these examples. PMID:25663804
Diagnostic Measures for the Cox Regression Model with Missing Covariates.
Zhu, Hongtu; Ibrahim, Joseph G; Chen, Ming-Hui
2015-12-01
This paper investigates diagnostic measures for assessing the influence of observations and model misspecification in the presence of missing covariate data for the Cox regression model. Our diagnostics include case-deletion measures, conditional martingale residuals, and score residuals. The Q-distance is proposed to examine the effects of deleting individual observations on the estimates of finite-dimensional and infinite-dimensional parameters. Conditional martingale residuals are used to construct goodness of fit statistics for testing possible misspecification of the model assumptions. A resampling method is developed to approximate the p -values of the goodness of fit statistics. Simulation studies are conducted to evaluate our methods, and a real data set is analyzed to illustrate their use.
Collision prediction models using multivariate Poisson-lognormal regression.
El-Basyouny, Karim; Sayed, Tarek
2009-07-01
This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.
Directory of Open Access Journals (Sweden)
Nataša Šarlija
2017-01-01
Full Text Available This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is 1 to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, 2 to discuss common pitfalls and methodological errors in developing a model, and 3 to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the question of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves, for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data is presented in the form of financial ratios divided into nine main groups depicting following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold. First, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and secondly, theoretical, in terms of identifying factors responsible for high growth of small and medium-sized companies.
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu
2014-06-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.
[Interaction between continuous variables in logistic regression model].
Qiu, Hong; Yu, Ignatius Tak-Sun; Tse, Lap Ah; Wang, Xiao-rong; Fu, Zhen-ming
2010-07-01
Rothman argued that interaction estimated as departure from additivity better reflected the biological interaction. In a logistic regression model, the product term reflects the interaction as departure from multiplicativity. So far, literature on estimating interaction regarding an additive scale using logistic regression was only focusing on two dichotomous factors. The objective of the present report was to provide a method to examine the interaction as departure from additivity between two continuous variables or between one continuous variable and one categorical variable. We used data from a lung cancer case-control study among males in Hong Kong as an example to illustrate the bootstrap re-sampling method for calculating the corresponding confidence intervals. Free software R (Version 2.8.1) was used to estimate interaction on the additive scale.
Modeling the number of car theft using Poisson regression
Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura
2016-10-01
Regression analysis is the most popular statistical methods used to express the relationship between the variables of response with the covariates. The aim of this paper is to evaluate the factors that influence the number of car theft using Poisson regression model. This paper will focus on the number of car thefts that occurred in districts in Peninsular Malaysia. There are two groups of factor that have been considered, namely district descriptive factors and socio and demographic factors. The result of the study showed that Bumiputera composition, Chinese composition, Other ethnic composition, foreign migration, number of residence with the age between 25 to 64, number of employed person and number of unemployed person are the most influence factors that affect the car theft cases. These information are very useful for the law enforcement department, insurance company and car owners in order to reduce and limiting the car theft cases in Peninsular Malaysia.
Global Land Use Regression Model for Nitrogen Dioxide Air Pollution.
Larkin, Andrew; Geddes, Jeffrey A; Martin, Randall V; Xiao, Qingyang; Liu, Yang; Marshall, Julian D; Brauer, Michael; Hystad, Perry
2017-06-20
Nitrogen dioxide is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the worldwide distribution of NO 2 exposure and associated impacts on health is still largely uncertain. To advance global exposure estimates we created a global nitrogen dioxide (NO 2 ) land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO 2 variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R 2 = 0.42 (Africa) to 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n = 10,000) demonstrated a robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R 2 within 2%) but not for Africa and Oceania (adjusted R 2 within 11%) where NO 2 monitoring data are sparse. The final model included 10 variables that captured both between and within-city spatial gradients in NO 2 concentrations. Variable contributions differed between continental regions, but major roads within 100 m and satellite-derived NO 2 were consistently the strongest predictors. The resulting model can be used for global risk assessments and health studies, particularly in countries without existing NO 2 monitoring data or models.
Generalized constraint neural network regression model subject to linear priors.
Qu, Ya-Jun; Hu, Bao-Gang
2011-12-01
This paper is reports an extension of our previous investigations on adding transparency to neural networks. We focus on a class of linear priors (LPs), such as symmetry, ranking list, boundary, monotonicity, etc., which represent either linear-equality or linear-inequality priors. A generalized constraint neural network-LPs (GCNN-LPs) model is studied. Unlike other existing modeling approaches, the GCNN-LP model exhibits its advantages. First, any LP is embedded by an explicitly structural mode, which may add a higher degree of transparency than using a pure algorithm mode. Second, a direct elimination and least squares approach is adopted to study the model, which produces better performances in both accuracy and computational cost over the Lagrange multiplier techniques in experiments. Specific attention is paid to both "hard (strictly satisfied)" and "soft (weakly satisfied)" constraints for regression problems. Numerical investigations are made on synthetic examples as well as on the real-world datasets. Simulation results demonstrate the effectiveness of the proposed modeling approach in comparison with other existing approaches.
Conditional Monte Carlo randomization tests for regression models.
Parhat, Parwen; Rosenberger, William F; Diao, Guoqing
2014-08-15
We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
Preference learning with evolutionary Multivariate Adaptive Regression Spline model
DEFF Research Database (Denmark)
Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll
2015-01-01
This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for function approximation as well as being relatively easy to interpret. MARS models are evolved based on their efficiency in learning pairwise data. The method is tested on two datasets that collectively provide pairwise preference data of five cognitive states expressed by users. The method is analysed...
Bayesian Regression of Thermodynamic Models of Redox Active Materials
Energy Technology Data Exchange (ETDEWEB)
Johnston, Katherine [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-09-01
Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating e ectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes parameter tting the model to the data, comparing the model using di erent numbers of param- eters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by wa- ter splitting and the other of a material for thermal storage. Using Bayesian Inference and Markov Chain Monte Carlo (MCMC), parameter estimation was preformed on the three data sets. Good results were achieved, except some there was some deviations on the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with di erent number of parameters. It was believed that at least one of the parameters was unnecessary and comparing evidence values demonstrated that the parameter was need on one data set and not signi cantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another. and its uncertainty was also calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quanti cation Toolkit (UQTk).
A Gompertz regression model for fern spores germination
Directory of Open Access Journals (Sweden)
Gabriel y Galán, Jose María
2015-06-01
Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados
DEFF Research Database (Denmark)
Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo
2017-01-01
In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness–death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score...
Characteristics and Properties of a Simple Linear Regression Model
Directory of Open Access Journals (Sweden)
Kowal Robert
2016-12-01
Full Text Available A simple linear regression model is one of the pillars of classic econometrics. Despite the passage of time, it continues to raise interest both from the theoretical side as well as from the application side. One of the many fundamental questions in the model concerns determining derivative characteristics and studying the properties existing in their scope, referring to the first of these aspects. The literature of the subject provides several classic solutions in that regard. In the paper, a completely new design is proposed, based on the direct application of variance and its properties, resulting from the non-correlation of certain estimators with the mean, within the scope of which some fundamental dependencies of the model characteristics are obtained in a much more compact manner. The apparatus allows for a simple and uniform demonstration of multiple dependencies and fundamental properties in the model, and it does it in an intuitive manner. The results were obtained in a classic, traditional area, where everything, as it might seem, has already been thoroughly studied and discovered.
Variable selection in Logistic regression model with genetic algorithm.
Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi
2018-02-01
Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical-decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. In this way, the variable selection represents the method of choosing the most relevant attributes from the database in order to build a robust learning models and, thus, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the stepwise approach, which is highly used, adds the best variable in each cycle generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in nowadays clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
Epsilon-insensitive fuzzy c-regression models: introduction to epsilon-insensitive fuzzy modeling.
Leski, Jacek M
2004-02-01
This paper introduces a new epsilon-insensitive fuzzy c-regression models (epsilonFCRM), that can be used in fuzzy modeling. To fit these regression models to real data, a weighted epsilon-insensitive loss function is used. The proposed method make it possible to exclude an intrinsic inconsistency of fuzzy modeling, where crisp loss function (usually quadratic) is used to match real data and the fuzzy model. The epsilon-insensitive fuzzy modeling is based on human thinking and learning. This method allows easy control of generalization ability and outliers robustness. This approach leads to c simultaneous quadratic programming problems with bound constraints and one linear equality constraint. To solve this problem, computationally efficient numerical method, called incremental learning, is proposed. Finally, examples are given to demonstrate the validity of introduced approach to fuzzy modeling.
Regression Models for Predicting Force Coefficients of Aerofoils
Directory of Open Access Journals (Sweden)
Mohammed ABDUL AKBAR
2015-09-01
Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
Lamont, A.E.; Vermunt, J.K.; Van Horn, M.L.
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the
Ultracentrifuge separative power modeling with multivariate regression using covariance matrix
International Nuclear Information System (INIS)
Migliavacca, Elder
2004-01-01
In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting to obtain a performance function for the separative power δU of a ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties related with these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables, which significantly influence the δU values are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure P p . After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any explained regression model variable. The surface curves are made relating the separative power with the control variables F, θ and P p to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)
A Bayesian semiparametric Markov regression model for juvenile dermatomyositis.
De Iorio, Maria; Gallot, Natacha; Valcarcel, Beatriz; Wedderburn, Lucy
2018-02-20
Juvenile dermatomyositis (JDM) is a rare autoimmune disease that may lead to serious complications, even to death. We develop a 2-state Markov regression model in a Bayesian framework to characterise disease progression in JDM over time and gain a better understanding of the factors influencing disease risk. The transition probabilities between disease and remission state (and vice versa) are a function of time-homogeneous and time-varying covariates. These latter types of covariates are introduced in the model through a latent health state function, which describes patient-specific health over time and accounts for variability among patients. We assume a nonparametric prior based on the Dirichlet process to model the health state function and the baseline transition intensities between disease and remission state and vice versa. The Dirichlet process induces a clustering of the patients in homogeneous risk groups. To highlight clinical variables that most affect the transition probabilities, we perform variable selection using spike and slab prior distributions. Posterior inference is performed through Markov chain Monte Carlo methods. Data were made available from the UK JDM Cohort and Biomarker Study and Repository, hosted at the UCL Institute of Child Health. Copyright © 2018 John Wiley & Sons, Ltd.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984
Model checks for Cox-type regression models based on optimally weighted martingale residuals.
Gandy, Axel; Jensen, Uwe
2009-12-01
We introduce directed goodness-of-fit tests for Cox-type regression models in survival analysis. "Directed" means that one may choose against which alternatives the tests are particularly powerful. The tests are based on sums of weighted martingale residuals and their asymptotic distributions.We derive optimal tests against certain competing models which include Cox-type regression models with different covariates and/or a different link function. We report results from several simulation studies and apply our test to a real dataset.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Convergence diagnostics for Eigenvalue problems with linear regression model
International Nuclear Information System (INIS)
Shi, Bo; Petrovic, Bojan
2011-01-01
Although the Monte Carlo method has been extensively used for criticality/Eigenvalue problems, a reliable, robust, and efficient convergence diagnostics method is still desired. Most methods are based on integral parameters (multiplication factor, entropy) and either condense the local distribution information into a single value (e.g., entropy) or even disregard it. We propose to employ the detailed cycle-by-cycle local flux evolution obtained by using mesh tally mechanism to assess the source and flux convergence. By applying a linear regression model to each individual mesh in a mesh tally for convergence diagnostics, a global convergence criterion can be obtained. We exemplify this method on two problems and obtain promising diagnostics results. (author)
The R Package threg to Implement Threshold Regression Models
Directory of Open Access Journals (Sweden)
Tao Xiao
2015-08-01
This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-10-01
The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran's universities. This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran's public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran's libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries.
An Ordered Regression Model to Predict Transit Passengers’ Behavioural Intentions
Energy Technology Data Exchange (ETDEWEB)
Oña, J. de; Oña, R. de; Eboli, L.; Forciniti, C.; Mazzulla, G.
2016-07-01
Passengers’ behavioural intentions after experiencing transit services can be viewed as signals that show if a customer continues to utilise a company’s service. Users’ behavioural intentions can depend on a series of aspects that are difficult to measure directly. More recently, transit passengers’ behavioural intentions have been just considered together with the concepts of service quality and customer satisfaction. Due to the characteristics of the ways for evaluating passengers’ behavioural intentions, service quality and customer satisfaction, we retain that this kind of issue could be analysed also by applying ordered regression models. This work aims to propose just an ordered probit model for analysing service quality factors that can influence passengers’ behavioural intentions towards the use of transit services. The case study is the LRT of Seville (Spain), where a survey was conducted in order to collect the opinions of the passengers about the existing transit service, and to have a measure of the aspects that can influence the intentions of the users to continue using the transit service in the future. (Author)
Beta Regression Finite Mixture Models of Polarization and Priming
Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay
2011-01-01
This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…
Directory of Open Access Journals (Sweden)
Laura M. Grajeda
2016-01-01
Full Text Available Abstract Background Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. Methods We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Results Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001 when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001 and slopes (p < 0.001 of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001, which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and
Dynamic modeling of predictive uncertainty by regression on absolute errors
Pianosi, F.; Raso, L.
2012-01-01
Uncertainty of hydrological forecasts represents valuable information for water managers and hydrologists. This explains the popularity of probabilistic models, which provide the entire distribution of the hydrological forecast. Nevertheless, many existing hydrological models are deterministic and
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2017-07-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM ( R 2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
Color Image Segmentation Using Fuzzy C-Regression Model
Directory of Open Access Journals (Sweden)
Min Chen
2017-01-01
Full Text Available Image segmentation is one important process in image analysis and computer vision and is a valuable tool that can be applied in fields of image processing, health care, remote sensing, and traffic image detection. Given the lack of prior knowledge of the ground truth, unsupervised learning techniques like clustering have been largely adopted. Fuzzy clustering has been widely studied and successfully applied in image segmentation. In situations such as limited spatial resolution, poor contrast, overlapping intensities, and noise and intensity inhomogeneities, fuzzy clustering can retain much more information than the hard clustering technique. Most fuzzy clustering algorithms have originated from fuzzy c-means (FCM and have been successfully applied in image segmentation. However, the cluster prototype of the FCM method is hyperspherical or hyperellipsoidal. FCM may not provide the accurate partition in situations where data consists of arbitrary shapes. Therefore, a Fuzzy C-Regression Model (FCRM using spatial information has been proposed whose prototype is hyperplaned and can be either linear or nonlinear allowing for better cluster partitioning. Thus, this paper implements FCRM and applies the algorithm to color segmentation using Berkeley’s segmentation database. The results show that FCRM obtains more accurate results compared to other fuzzy clustering algorithms.
Application of regression model on stream water quality parameters
International Nuclear Information System (INIS)
Suleman, M.; Maqbool, F.; Malik, A.H.; Bhatti, Z.A.
2012-01-01
Statistical analysis was conducted to evaluate the effect of solid waste leachate from the open solid waste dumping site of Salhad on the stream water quality. Five sites were selected along the stream. Two sites were selected prior to mixing of leachate with the surface water. One was of leachate and other two sites were affected with leachate. Samples were analyzed for pH, water temperature, electrical conductivity (EC), total dissolved solids (TDS), Biological oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO) and total bacterial load (TBL). In this study correlation coefficient r among different water quality parameters of various sites were calculated by using Pearson model and then average of each correlation between two parameters were also calculated, which shows TDS and EC and pH and BOD have significantly increasing r value, while temperature and TDS, temp and EC, DO and BL, DO and COD have decreasing r value. Single factor ANOVA at 5% level of significance was used which shows EC, TDS, TCL and COD were significantly differ among various sites. By the application of these two statistical approaches TDS and EC shows strongly positive correlation because the ions from the dissolved solids in water influence the ability of that water to conduct an electrical current. These two parameters significantly vary among 5 sites which are further confirmed by using linear regression. (author)
A generalized exponential time series regression model for electricity prices
DEFF Research Database (Denmark)
Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso
on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...
Ng, Kar Yong; Awang, Norhashidah
2018-01-06
Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.
Semiparametric Mixtures of Regressions with Single-index for Model Based Clustering
Xiang, Sijia; Yao, Weixin
2017-01-01
In this article, we propose two classes of semiparametric mixture regression models with single-index for model based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low dimensional predictors, the new semiparametric models can easily incorporate high dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models a...
Genome-wide selection by mixed model ridge regression and extensions based on geostatistical models.
Schulz-Streeck, Torben; Piepho, Hans-Peter
2010-03-31
The success of genome-wide selection (GS) approaches will depend crucially on the availability of efficient and easy-to-use computational tools. Therefore, approaches that can be implemented using mixed models hold particular promise and deserve detailed study. A particular class of mixed models suitable for GS is given by geostatistical mixed models, when genetic distance is treated analogously to spatial distance in geostatistics. We consider various spatial mixed models for use in GS. The analyses presented for the QTL-MAS 2009 dataset pay particular attention to the modelling of residual errors as well as of polygenetic effects. It is shown that geostatistical models are viable alternatives to ridge regression, one of the common approaches to GS. Correlations between genome-wide estimated breeding values and true breeding values were between 0.879 and 0.889. In the example considered, we did not find a large effect of the residual error variance modelling, largely because error variances were very small. A variance components model reflecting the pedigree of the crosses did not provide an improved fit. We conclude that geostatistical models deserve further study as a tool to GS that is easily implemented in a mixed model package.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
A logistic regression model for Ghana National Health Insurance claims
Directory of Open Access Journals (Sweden)
Samuel Antwi
2013-07-01
Full Text Available In August 2003, the Ghanaian Government made history by implementing the first National Health Insurance System (NHIS in Sub-Saharan Africa. Within three years, over half of the country’s population had voluntarily enrolled into the National Health Insurance Scheme. This study had three objectives: 1 To estimate the risk factors that influences the Ghana national health insurance claims. 2 To estimate the magnitude of each of the risk factors in relation to the Ghana national health insurance claims. In this work, data was collected from the policyholders of the Ghana National Health Insurance Scheme with the help of the National Health Insurance database and the patients’ attendance register of the Koforidua Regional Hospital, from 1st January to 31st December 2011. Quantitative analysis was done using the generalized linear regression (GLR models. The results indicate that risk factors such as sex, age, marital status, distance and length of stay at the hospital were important predictors of health insurance claims. However, it was found that the risk factors; health status, billed charges and income level are not good predictors of national health insurance claim. The outcome of the study shows that sex, age, marital status, distance and length of stay at the hospital are statistically significant in the determination of the Ghana National health insurance premiums since they considerably influence claims. We recommended, among other things that, the National Health Insurance Authority should facilitate the institutionalization of the collection of appropriate data on a continuous basis to help in the determination of future premiums.
Additive Intensity Regression Models in Corporate Default Analysis
DEFF Research Database (Denmark)
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety...... of model checking techniques to identify misspecifications. In our final model, we find evidence of time-variation in the effects of distance-to-default and short-to-long term debt. Also we identify interactions between distance-to-default and other covariates, and the quick ratio covariate is significant....... None of our macroeconomic covariates are significant....
By, Kunthel; Qaqish, Bahjat F; Preisser, John S; Perin, Jamie; Zink, Richard C
2014-02-01
This article describes a new software for modeling correlated binary data based on orthogonalized residuals, a recently developed estimating equations approach that includes, as a special case, alternating logistic regressions. The software is flexible with respect to fitting in that the user can choose estimating equations for association models based on alternating logistic regressions or orthogonalized residuals, the latter choice providing a non-diagonal working covariance matrix for second moment parameters providing potentially greater efficiency. Regression diagnostics based on this method are also implemented in the software. The mathematical background is briefly reviewed and the software is applied to medical data sets. Published by Elsevier Ireland Ltd.
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
Misspecified poisson regression models for large-scale registry data
DEFF Research Database (Denmark)
Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.
2016-01-01
working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...
Approximate Tests of Hypotheses in Regression Models with Grouped Data
1979-02-01
in terms of Kolmogoroff -Smirnov statistic in the next section. I 1 1 I t A 4. Simulations Two models have been considered for simulations. Model I. Yuk...Fort Meade, MD 20755 2 Commanding Officer Navy LibraryrnhOffice o Naval Research National Space Technology LaboratoryBranch Office *Attn: Navy
Effects of Employing Ridge Regression in Structural Equation Models.
McQuitty, Shaun
1997-01-01
LISREL 8 invokes a ridge option when maximum likelihood or generalized least squares are used to estimate a structural equation model with a nonpositive definite covariance or correlation matrix. Implications of the ridge option for model fit, parameter estimates, and standard errors are explored through two examples. (SLD)
Interpreting parameters in the logistic regression model with random effects
DEFF Research Database (Denmark)
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
Reflexion on linear regression trip production modelling method for ensuring good model quality
Suprayitno, Hitapriya; Ratnasari, Vita
2017-11-01
Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.
Segmented relationships to model erosion of regression effect in Cox regression.
Muggeo, Vito M R; Attanasio, Massimo
2011-08-01
In this article we propose a parsimonious parameterisation to model the so-called erosion of the covariate effect in the Cox model, namely a covariate effect approaching to zero as the follow-up time increases. The proposed parameterisation is based on the segmented relationship where proper constraints are set to accomodate for the erosion. Relevant hypothesis testing is discussed. The approach is illustrated on two historical datasets in the survival analysis literature, and some simulation studies are presented to show how the proposed framework leads to a test for a global effect with good power as compared with alternative procedures. Finally, possible generalisations are also presented for future research.
Learning Supervised Topic Models for Classification and Regression from Crowds
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco
2017-01-01
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this...
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval
Directory of Open Access Journals (Sweden)
Simone Becker Lopes
2014-04-01
Full Text Available Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trips generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included. This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
Cox's regression model for dynamics of grouped unemployment data
Czech Academy of Sciences Publication Activity Database
Volf, Petr
2003-01-01
Roč. 10, č. 19 (2003), s. 151-162 ISSN 1212-074X R&D Projects: GA ČR GA402/01/0539 Institutional research plan: CEZ:AV0Z1075907 Keywords : mathematical statistics * survival analysis * Cox's model Subject RIV: BB - Applied Statistics, Operational Research
Inflation, Forecast Intervals and Long Memory Regression Models
C.S. Bos (Charles); Ph.H.B.F. Franses (Philip Hans); M. Ooms (Marius)
2001-01-01
textabstractWe examine recursive out-of-sample forecasting of monthly postwar U.S. core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading
Inflation, Forecast Intervals and Long Memory Regression Models
Ooms, M.; Bos, C.S.; Franses, P.H.
2003-01-01
We examine recursive out-of-sample forecasting of monthly postwar US core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading indicators
Multiple Linear Regression Model for Estimating the Price of a ...
African Journals Online (AJOL)
Ghana Mining Journal ... In the modeling, the Ordinary Least Squares (OLS) normality assumption which could introduce errors in the statistical analyses was dealt with by log transformation of the data, ensuring the data is normally ... The resultant MLRM is: Ŷi MLRM = (X'X)-1X'Y(xi') where X is the sample data matrix.
Parametric modelling of thresholds across scales in wavelet regression
Anestis Antoniadis; Piotr Fryzlewicz
2006-01-01
We propose a parametric wavelet thresholding procedure for estimation in the ‘function plus independent, identically distributed Gaussian noise’ model. To reflect the decreasing sparsity of wavelet coefficients from finer to coarser scales, our thresholds also decrease. They retain the noise-free reconstruction property while being lower than the universal threshold, and are jointly parameterised by a single scalar parameter. We show that our estimator achieves near-optimal risk rates for the...
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
A computational approach to compare regression modelling strategies in prediction research
Pajouheshnia, R.; Pestman, W.R.; Teerenstra, S.; Groenwold, R.H.
2016-01-01
BACKGROUND: It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Directory of Open Access Journals (Sweden)
Drzewiecki Wojciech
2016-12-01
Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.
Shaofu Zhuyu Decoction Regresses Endometriotic Lesions in a Rat Model
Directory of Open Access Journals (Sweden)
Guanghui Zhu
2018-01-01
Full Text Available The current therapies for endometriosis are restricted by various side effects and treatment outcome has been less than satisfactory. Shaofu Zhuyu Decoction (SZD, a classic traditional Chinese medicinal (TCM prescription for dysmenorrhea, has been widely used in clinical practice by TCM doctors to relieve symptoms of endometriosis. The present study aimed to investigate the effects of SZD on a rat model of endometriosis. Forty-eight female Sprague-Dawley rats with regular estrous cycles went through autotransplantation operation to establish endometriosis model. Then 38 rats with successful ectopic implants were randomized into two groups: vehicle- and SZD-treated groups. The latter were administered SZD through oral gavage for 4 weeks. By the end of the treatment period, the volume of the endometriotic lesions was measured, the histopathological properties of the ectopic endometrium were evaluated, and levels of proliferating cell nuclear antigen (PCNA, CD34, and hypoxia inducible factor- (HIF- 1α in the ectopic endometrium were detected with immunohistochemistry. Furthermore, apoptosis was assessed using the terminal deoxynucleotidyl transferase (TdT deoxyuridine 5′-triphosphate (dUTP nick-end labeling (TUNEL assay. In this study, SZD significantly reduced the size of ectopic lesions in rats with endometriosis, inhibited cell proliferation, increased cell apoptosis, and reduced microvessel density and HIF-1α expression. It suggested that SZD could be an effective therapy for the treatment and prevention of endometriosis recurrence.
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
Parametric vs. Nonparametric Regression Modelling within Clinical Decision Support
Czech Academy of Sciences Publication Activity Database
Kalina, Jan; Zvárová, Jana
2017-01-01
Roč. 5, č. 1 (2017), s. 21-27 ISSN 1805-8698 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : decision support systems * decision rules * statistical analysis * nonparametric regression Subject RIV: IN - Informatics, Computer Science OBOR OECD: Statistics and probability
Bilinear regression model with Kronecker and linear structures for ...
African Journals Online (AJOL)
On the basis of n independent observations from a matrix normal distribution, estimating equations in a flip-flop relation are established and the consistency of estimators is studied. Keywords: Bilinear regression; Estimating equations; Flip- flop algorithm; Kronecker product structure; Linear structured covariance matrix; ...
Covariance Functions and Random Regression Models in the ...
African Journals Online (AJOL)
ARC-IRENE
CFs were on age of the cow expressed in months (AM) using quadratic (order three) regressions based on orthogonal (Legendre) polynomials, initially proposed by Kirkpatrick & Heckman (1989). The matrices of coefficients KG and KC (corresponding to the additive genetic and permanent environmental functions, G.
A binary logistic regression model with complex sampling design of ...
African Journals Online (AJOL)
2017-09-03
Sep 3, 2017 ... SPSS-21. Binary logistic regression with complex sam- pling design was fitted for the unmet need outcomes. Married women are disaggregated by various background characteristics to have an insight of their characteristics. All background characteristics of women used in this study were categorical ...
Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S
2014-07-01
In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield. Copyright © 2014
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
2017-04-08
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Kala, Abhishek K; Tiwari, Chetan; Mikler, Armin R; Atkinson, Samuel F
2017-01-01
The primary aim of the study reported here was to determine the effectiveness of utilizing local spatial variations in environmental data to uncover the statistical relationships between West Nile Virus (WNV) risk and environmental factors. Because least squares regression methods do not account for spatial autocorrelation and non-stationarity of the type of spatial data analyzed for studies that explore the relationship between WNV and environmental determinants, we hypothesized that a geographically weighted regression model would help us better understand how environmental factors are related to WNV risk patterns without the confounding effects of spatial non-stationarity. We examined commonly mapped environmental factors using both ordinary least squares regression (LSR) and geographically weighted regression (GWR). Both types of models were applied to examine the relationship between WNV-infected dead bird counts and various environmental factors for those locations. The goal was to determine which approach yielded a better predictive model. LSR efforts lead to identifying three environmental variables that were statistically significantly related to WNV infected dead birds (adjusted R 2 = 0.61): stream density, road density, and land surface temperature. GWR efforts increased the explanatory value of these three environmental variables with better spatial precision (adjusted R 2 = 0.71). The spatial granularity resulting from the geographically weighted approach provides a better understanding of how environmental spatial heterogeneity is related to WNV risk as implied by WNV infected dead birds, which should allow improved planning of public health management strategies.
Directory of Open Access Journals (Sweden)
Chao Song
2017-04-01
Full Text Available An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR and geographically and temporally weighted regression (GTWR, which integrates spatial and temporal effects and global linear regression models (LM for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Semiparametric nonlinear quantile regression model for financial returns
Czech Academy of Sciences Publication Activity Database
Avdulaj, Krenar; Baruník, Jozef
2017-01-01
Roč. 21, č. 1 (2017), s. 81-97 ISSN 1081-1826 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional support: RVO:67985556 Keywords : copula quantile regression * realized volatility * value-at-risk Subject RIV: AH - Economics OBOR OECD: Applied Economics, Econometrics Impact factor: 0.649, year: 2016 http://library.utia.cas.cz/separaty/2017/E/avdulaj-0472346.pdf
Directory of Open Access Journals (Sweden)
Soyoung Park
2017-07-01
Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.
Bonfatti, V; Tiezzi, F; Miglior, F; Carnier, P
2017-09-01
The objective of this study was to compare the prediction accuracy of 92 infrared prediction equations obtained by different statistical approaches. The predicted traits included fatty acid composition (n = 1,040); detailed protein composition (n = 1,137); lactoferrin (n = 558); pH and coagulation properties (n = 1,296); curd yield and composition obtained by a micro-cheese making procedure (n = 1,177); and Ca, P, Mg, and K contents (n = 689). The statistical methods used to develop the prediction equations were partial least squares regression (PLSR), Bayesian ridge regression, Bayes A, Bayes B, Bayes C, and Bayesian least absolute shrinkage and selection operator. Model performances were assessed, for each trait and model, in training and validation sets over 10 replicates. In validation sets, Bayesian regression models performed significantly better than PLSR for the prediction of 33 out of 92 traits, especially fatty acids, whereas they yielded a significantly lower prediction accuracy than PLSR in the prediction of 8 traits: the percentage of C18:1n-7 trans-9 in fat; the content of unglycosylated κ-casein and its percentage in protein; the content of α-lactalbumin; the percentage of α S2 -casein in protein; and the contents of Ca, P, and Mg. Even though Bayesian methods produced a significant enhancement of model accuracy in many traits compared with PLSR, most variations in the coefficient of determination in validation sets were smaller than 1 percentage point. Over traits, the highest predictive ability was obtained by Bayes C even though most of the significant differences in accuracy between Bayesian regression models were negligible. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
[Application of detecting and taking overdispersion into account in Poisson regression model].
Bouche, G; Lepage, B; Migeot, V; Ingrand, P
2009-08-01
Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact with an illustration based on the data of a study investigating the relationship between use of the Internet to seek health information and number of primary care consultations. Three methods, overdispersed Poisson, a robust estimator, and negative binomial regression, were performed to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested overdispersion in the Poisson regression model using the ratio of the sum of Pearson residuals over the number of degrees of freedom (chi(2)/df). We then fitted the three models and compared parameter estimation to the estimations given by Poisson regression model. Variance of the number of primary care consultations (Var[Y]=21.03) was greater than the mean (E[Y]=5.93) and the chi(2)/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of estimates from two variables (using the Internet to seek health information and single parent family) would have changed according to the model retained, with significant levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (use of a robust estimator) and 0.45 and 0.13 (negative binomial) respectively. Different methods exist to solve the problem of underestimating variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems to be particularly accurate because of its theorical distribution ; in addition this regression is easy to perform with ordinary statistical software packages.
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.
Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.
Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina
2017-07-01
This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history ( P < .01) and vaccine decision made before the visit ( P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history ( P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate choose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.
Forecast Model of Urban Stagnant Water Based on Logistic Regression
Directory of Open Access Journals (Sweden)
Liu Pan
2017-01-01
Full Text Available With the development of information technology, the construction of water resource system has been gradually carried out. In the background of big data, the work of water information needs to carry out the process of quantitative to qualitative change. Analyzing the correlation of data and exploring the deep value of data which are the key of water information’s research. On the basis of the research on the water big data and the traditional data warehouse architecture, we try to find out the connection of different data source. According to the temporal and spatial correlation of stagnant water and rainfall, we use spatial interpolation to integrate data of stagnant water and rainfall which are from different data source and different sensors, then use logistic regression to find out the relationship between them.
Validation of regression models for nitrate concentrations in the upper groundwater in sandy soils
Sonneveld, M.P.W.; Brus, D.J.; Roelsma, J.
2010-01-01
For Dutch sandy regions, linear regression models have been developed that predict nitrate concentrations in the upper groundwater on the basis of residual nitrate contents in the soil in autumn. The objective of our study was to validate these regression models for one particular sandy region
Use of a Regression Model to Study Host-Genomic Determinants of Phage Susceptibility in MRSA
DEFF Research Database (Denmark)
Zschach, Henrike; Larsen, Mette Voldby; Hasman, Henrik
2018-01-01
strains to 12 (nine monovalent) different therapeutic phage preparations and subsequently employed linear regression models to estimate the influence of individual host gene families on resistance to phages. Specifically, we used a two-step regression model setup with a preselection step based on gene...
Bayesian networks with a logistic regression model for the conditional probabilities
Rijmen, F.P.J.
2008-01-01
Logistic regression techniques can be used to restrict the conditional probabilities of a Bayesian network for discrete variables. More specifically, each variable of the network can be modeled through a logistic regression model, in which the parents of the variable define the covariates. When all
Technology diffusion in hospitals : A log odds random effects regression model
Blank, J.L.T.; Valdmanis, V.G.
2013-01-01
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the
Technology diffusion in hospitals: A log odds random effects regression model
J.L.T. Blank (Jos); V.G. Valdmanis (Vivian G.)
2015-01-01
textabstractThis study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to
Logistic regression model for detecting radon prone areas in Ireland.
Elío, J; Crowley, Q; Scanlon, R; Hodgson, J; Long, S
2017-12-01
A new high spatial resolution radon risk map of Ireland has been developed, based on a combination of indoor radon measurements (n=31,910) and relevant geological information (i.e. Bedrock Geology, Quaternary Geology, soil permeability and aquifer type). Logistic regression was used to predict the probability of having an indoor radon concentration above the national reference level of 200Bqm -3 in Ireland. The four geological datasets evaluated were found to be statistically significant, and, based on combinations of these four variables, the predicted probabilities ranged from 0.57% to 75.5%. Results show that the Republic of Ireland may be divided in three main radon risk categories: High (HR), Medium (MR) and Low (LR). The probability of having an indoor radon concentration above 200Bqm -3 in each area was found to be 19%, 8% and 3%; respectively. In the Republic of Ireland, the population affected by radon concentrations above 200Bqm -3 is estimated at ca. 460k (about 10% of the total population). Of these, 57% (265k), 35% (160k) and 8% (35k) are in High, Medium and Low Risk Areas, respectively. Our results provide a high spatial resolution utility which permit customised radon-awareness information to be targeted at specific geographic areas. Copyright © 2017 Elsevier B.V. All rights reserved.
A Regression Algorithm for Model Reduction of Large-Scale Multi-Dimensional Problems
Rasekh, Ehsan
2011-11-01
Model reduction is an approach for fast and cost-efficient modelling of large-scale systems governed by Ordinary Differential Equations (ODEs). Multi-dimensional model reduction has been suggested for reduction of the linear systems simultaneously with respect to frequency and any other parameter of interest. Multi-dimensional model reduction is also used to reduce the weakly nonlinear systems based on Volterra theory. Multiple dimensions degrade the efficiency of reduction by increasing the size of the projection matrix. In this paper a new methodology is proposed to efficiently build the reduced model based on regression analysis. A numerical example confirms the validity of the proposed regression algorithm for model reduction.
DEFF Research Database (Denmark)
Tan, Qihua; Bathum, L; Christiansen, L
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...
Amalia, Junita; Purhadi, Otok, Bambang Widjanarko
2017-11-01
Poisson distribution is a discrete distribution with count data as the random variables and it has one parameter defines both mean and variance. Poisson regression assumes mean and variance should be same (equidispersion). Nonetheless, some case of the count data unsatisfied this assumption because variance exceeds mean (over-dispersion). The ignorance of over-dispersion causes underestimates in standard error. Furthermore, it causes incorrect decision in the statistical test. Previously, paired count data has a correlation and it has bivariate Poisson distribution. If there is over-dispersion, modeling paired count data is not sufficient with simple bivariate Poisson regression. Bivariate Poisson Inverse Gaussian Regression (BPIGR) model is mix Poisson regression for modeling paired count data within over-dispersion. BPIGR model produces a global model for all locations. In another hand, each location has different geographic conditions, social, cultural and economic so that Geographically Weighted Regression (GWR) is needed. The weighting function of each location in GWR generates a different local model. Geographically Weighted Bivariate Poisson Inverse Gaussian Regression (GWBPIGR) model is used to solve over-dispersion and to generate local models. Parameter estimation of GWBPIGR model obtained by Maximum Likelihood Estimation (MLE) method. Meanwhile, hypothesis testing of GWBPIGR model acquired by Maximum Likelihood Ratio Test (MLRT) method.
Regression models for analyzing radiological visual grading studies--an empirical comparison.
Saffari, S Ehsan; Löve, Áskell; Fredrikson, Mats; Smedby, Örjan
2015-10-30
For optimizing and evaluating image quality in medical imaging, one can use visual grading experiments, where observers rate some aspect of image quality on an ordinal scale. To analyze the grading data, several regression methods are available, and this study aimed at empirically comparing such techniques, in particular when including random effects in the models, which is appropriate for observers and patients. Data were taken from a previous study where 6 observers graded or ranked in 40 patients the image quality of four imaging protocols, differing in radiation dose and image reconstruction method. The models tested included linear regression, the proportional odds model for ordinal logistic regression, the partial proportional odds model, the stereotype logistic regression model and rank-order logistic regression (for ranking data). In the first two models, random effects as well as fixed effects could be included; in the remaining three, only fixed effects. In general, the goodness of fit (AIC and McFadden's Pseudo R (2)) showed small differences between the models with fixed effects only. For the mixed-effects models, higher AIC and lower Pseudo R (2) was obtained, which may be related to the different number of parameters in these models. The estimated potential for dose reduction by new image reconstruction methods varied only slightly between models. The authors suggest that the most suitable approach may be to use ordinal logistic regression, which can handle ordinal data and random effects appropriately.
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?
Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.
2016-12-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 106 km2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water scarce regions and further research should focus on improving such an aspect for regression-based global hydrological models.
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of prevalence ratio ( PR ) by using bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using bayesian log-binomial regression model in Openbugs software. The results showed that caregivers' recognition of infant' s risk signs of diarrhea was associated significantly with a 13% increase of medical care-seeking. Meanwhile, we compared the differences in PR 's point estimation and its interval estimation of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea and convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education, model 3: adjusting for distance between village and township and child month-age based on model 2) between bayesian log-binomial regression model and conventional log-binomial regression model. The results showed that all three bayesian log-binomial regression models were convergence and the estimated PRs were 1.130(95 %CI : 1.005-1.265), 1.128(95 %CI : 1.001-1.264) and 1.132(95 %CI : 1.004-1.267), respectively. Conventional log-binomial regression model 1 and model 2 were convergence and their PRs were 1.130(95 % CI : 1.055-1.206) and 1.126(95 % CI : 1.051-1.203), respectively, but the model 3 was misconvergence, so COPY method was used to estimate PR , which was 1.125 (95 %CI : 1.051-1.200). In addition, the point estimation and interval estimation of PRs from three bayesian log-binomial regression models differed slightly from those of PRs from conventional log-binomial regression model, but they had a good consistency in estimating PR . Therefore, bayesian log-binomial regression model can effectively estimate PR with less misconvergence and have more advantages in application compared with conventional log-binomial regression model.
Clinical Predictors of Regression of Choroidal Melanomas after Brachytherapy: A Growth Curve Model.
Rashid, Mamunur; Heikkonen, Jorma; Singh, Arun D; Kivelä, Tero T
2018-02-27
To build multivariate models to assess correctly and efficiently the contribution of tumor characteristics on the rate of regression of choroidal melanomas after brachytherapy in a way that adjusts for confounding and takes into account variation in tumor regression patterns. Modeling of longitudinal observational data. Ultrasound images from 330 of 388 consecutive choroidal melanomas (87%) irradiated from 2000 through 2008 at the Helsinki University Hospital, Helsinki, Finland, a national referral center. Images were obtained with a 10-MHz B-scan during 3 years of follow-up. Change in tumor thickness and cross-sectional area were modeled using a polynomial growth-curve function in a nested mixed linear regression model considering regression pattern and tumor levels. Initial tumor dimensions, tumor-node-metastasis (TNM) stage, shape, ciliary body involvement, pigmentation, isotope, plaque size, detached muscles, and radiation parameters were considered as covariates. Covariates that independently predict tumor regression. Initial tumor thickness, largest basal diameter, ciliary body involvement, TNM stage, tumor shape group, break in Bruch's membrane, having muscles detached, and radiation dose to tumor base predicted faster regression, whether considering all tumors or those that regressed in a pattern compatible with exponential decay. Dark brown pigmentation was associated with slower regression. In multivariate modeling, initial tumor thickness remained the predominant and robust predictor of tumor regression (P future analyses efficiently without matching. Copyright © 2018 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Formulating state space models in R with focus on longitudinal regression models
DEFF Research Database (Denmark)
Dethlefsen, Claus; Lundbye-Christensen, Søren
We provide a language for formulating a range of state space models. The described methodology is implemented in the R -package sspir available from cran.r-project.org . A state space model is specified similarly to a generalized linear model in R , by marking the time-varying terms in the form...... We provide a language for formulating a range of state space models. The described methodology is implemented in the R -package sspir available from cran.r-project.org . A state space model is specified similarly to a generalized linear model in R , by marking the time-varying terms...
International Nuclear Information System (INIS)
Fang, Xiande; Xu, Yu
2011-01-01
The empirical model of turbine efficiency is necessary for the control- and/or diagnosis-oriented simulation and useful for the simulation and analysis of dynamic performances of the turbine equipment and systems, such as air cycle refrigeration systems, power plants, turbine engines, and turbochargers. Existing empirical models of turbine efficiency are insufficient because there is no suitable form available for air cycle refrigeration turbines. This work performs a critical review of empirical models (called mean value models in some literature) of turbine efficiency and develops an empirical model in the desired form for air cycle refrigeration, the dominant cooling approach in aircraft environmental control systems. The Taylor series and regression analysis are used to build the model, with the Taylor series being used to expand functions with the polytropic exponent and the regression analysis to finalize the model. The measured data of a turbocharger turbine and two air cycle refrigeration turbines are used for the regression analysis. The proposed model is compact and able to present the turbine efficiency map. Its predictions agree with the measured data very well, with the corrected coefficient of determination R c 2 ≥ 0.96 and the mean absolute percentage deviation = 1.19% for the three turbines. -- Highlights: → Performed a critical review of empirical models of turbine efficiency. → Developed an empirical model in the desired form for air cycle refrigeration, using the Taylor expansion and regression analysis. → Verified the method for developing the empirical model. → Verified the model.
Random regression test-day model for the analysis of dairy cattle ...
African Journals Online (AJOL)
Genetic evaluation of dairy cattle using test-day models is now common internationally. In South Africa a fixed regression test-day model is used to generate breeding values for dairy animals on a routine basis. The model is, however, often criticized for erroneously assuming a standard lactation curve for cows in similar ...
Structured Additive Regression Models: An R Interface to BayesX
Directory of Open Access Journals (Sweden)
Nikolaus Umlauf
2015-02-01
Full Text Available Structured additive regression (STAR models provide a flexible framework for model- ing possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is standalone software package providing software for fitting general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using Rs formula language (with some extended terms, fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
Bonellie, Sandra R
2012-10-01
To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there are still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 Standard deviations lower (95% confidence interval. 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived, (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother. Identifying mothers at greater risk of having a low birthweight baby as important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.
Formulating state space models in R with focus on longitudinal regression models
DEFF Research Database (Denmark)
Dethlefsen, Claus; Lundbye-Christensen, Søren
2006-01-01
We provide a language for formulating a range of state space models with response densities within the exponential family. The described methodology is implemented in the R-package sspir. A state space model is specified similarly to a generalized linear model in R, and then the time-varying terms...
Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei
2014-01-01
The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binominal distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
Developing and testing a global-scale regression model to quantify mean annual streamflow
Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.
2017-01-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 106 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
Generic global regression models for growth prediction of Salmonella in ground pork and pork cuts
DEFF Research Database (Denmark)
Buschhardt, Tasja; Hansen, Tina Beck; Bahl, Martin Iain
2017-01-01
Introduction and Objectives Models for the prediction of bacterial growth in fresh pork are primarily developed using two-step regression (i.e. primary models followed by secondary models). These models are also generally based on experiments in liquids or ground meat and neglect surface growth....... It has been shown that one-step global regressions can result in more accurate models and that bacterial growth on intact surfaces can substantially differ from growth in liquid culture. Material and Methods We used a global-regression approach to develop predictive models for the growth of Salmonella...... for three pork matrices: on the surface of shoulder (neck) and hind part (ham), and in ground pork. We conducted five experimental trials and inoculated essentially sterile pork pieces with a Salmonella cocktail (n = 192). Inoculated meat was aerobically incubated at 4 °C, 7 °C, 12 °C, and 16 °C for 96 h...
Directory of Open Access Journals (Sweden)
Lukianenko Iryna H.
2014-01-01
Full Text Available The article considers possibilities and specific features of modelling economic phenomena with the help of the category of models that unite elements of econometric regressions and artificial neural networks. This category of models contains auto-regression neural networks (AR-NN, regressions of smooth transition (STR/STAR, multi-mode regressions of smooth transition (MRSTR/MRSTAR and smooth transition regressions with neural coefficients (NCSTR/NCSTAR. Availability of the neural network component allows models of this category achievement of a high empirical authenticity, including reproduction of complex non-linear interrelations. On the other hand, the regression mechanism expands possibilities of interpretation of the obtained results. An example of multi-mode monetary rule is used to show one of the cases of specification and interpretation of this model. In particular, the article models and interprets principles of management of the UAH exchange rate that come into force when economy passes from a relatively stable into a crisis state.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Modeling and prediction of Turkey's electricity consumption using Support Vector Regression
International Nuclear Information System (INIS)
Kavaklioglu, Kadir
2011-01-01
Support Vector Regression (SVR) methodology is used to model and predict Turkey's electricity consumption. Among various SVR formalisms, ε-SVR method was used since the training pattern set was relatively small. Electricity consumption is modeled as a function of socio-economic indicators such as population, Gross National Product, imports and exports. In order to facilitate future predictions of electricity consumption, a separate SVR model was created for each of the input variables using their current and past values; and these models were combined to yield consumption prediction values. A grid search for the model parameters was performed to find the best ε-SVR model for each variable based on Root Mean Square Error. Electricity consumption of Turkey is predicted until 2026 using data from 1975 to 2006. The results show that electricity consumption can be modeled using Support Vector Regression and the models can be used to predict future electricity consumption. (author)
MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES
Directory of Open Access Journals (Sweden)
Parameshwar V. Pandit
2012-06-01
Full Text Available Purpose: To analysis the dependence of oral health diseases i.e. dental caries and periodontal disease on considering the number of risk factors through the applications of logistic regression model. Method: The cross sectional study involves a systematic random sample of 1760 permanent dentition aged between 18-40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by multiple logistic regression model using SPSS statistical software. Results: The factors like frequency of brushing, timings of cleaning teeth and type of toothpastes are significant persistent predictors of dental caries and periodontal disease. The log likelihood value of full model is –1013.1364 and Akaike’s Information Criterion (AIC is 1.1752 as compared to reduced regression model are -1019.8106 and 1.1748 respectively for dental caries. But, the log likelihood value of full model is –1085.7876 and AIC is 1.2577 followed by reduced regression model are -1019.8106 and 1.1748 respectively for periodontal disease. The area under Receiver Operating Characteristic (ROC curve for the dental caries is 0.7509 (full model and 0.7447 (reduced model; the ROC for the periodontal disease is 0.6128 (full model and 0.5821 (reduced model. Conclusions: The frequency of brushing, timings of cleaning teeth and type of toothpastes are main signifi cant risk factors of dental caries and periodontal disease. The fitting performance of reduced logistic regression model is slightly a better fit as compared to full logistic regression model in identifying the these risk factors for both dichotomous dental caries and periodontal disease.
Bias and Uncertainty in Regression-Calibrated Models of Groundwater Flow in Heterogeneous Media
DEFF Research Database (Denmark)
Cooley, R.L.; Christensen, Steen
2006-01-01
small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear...... are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is large to test robustness of the methodology. Numerical results conform with the theoretical analysis....
Directory of Open Access Journals (Sweden)
Hossein Fallahzadeh
2017-05-01
Full Text Available Introduction: Different statistical methods can be used to analyze fertility data. When the response variable is discrete, Poisson model is applied. If the condition does not hold for the Poisson model, its generalized model will be applied. The goal of this study was to compare the efficiency of generalized Poisson regression model with the standard Poisson regression model in estimating the coefficient of effective factors onthe current number of children. Methods: This is a cross-sectional study carried out on a populationof married women within the age range of15-49 years in Kashan, Iran. The cluster sampling method was used for data collection. Clusters consisted ofthe urbanblocksdeterminedby the municipality.Atotal number of10clusters each containing30households was selected according to the health center's framework. The necessary data were then collected through a self-madequestionnaireanddirectinterviewswith women under study. Further, the data analysiswas performed by usingthe standard and generalizedPoisson regression models through theRsoftware. Results: The average number of children for each woman was 1.45 with a variance of 1.073.A significant relationship was observed between the husband's age, number of unwanted pregnancies, and the average durationof breastfeeding with the present number of children in the two standard and generalized Poisson regression models (p < 0.05.The mean ageof women participating in thisstudy was33.1± 7.57 years (from 25.53 years to 40.67, themean age of marriage was 20.09 ± 3.82 (from16.27 years to23.91, and themean age of their husbands was 37.9 ± 8.4years (from 29.5 years to 46.3. In the current study, the majority of women werein the age range of 30-35years old with the medianof 32years, however, most ofmen were in the age range of 35-40yearswith the median of37years. While 236of women did not have unwanted pregnancies, most participants of the present study had one unwanted pregnancy
The limiting behavior of the estimated parameters in a misspecified random field regression model
DEFF Research Database (Denmark)
Dahl, Christian Møller; Qin, Yu
This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection...... convenient new uniform convergence results that we propose. This theory may have applications beyond those presented here. Our results indicate that classical statistical inference techniques, in general, works very well for random field regression models in finite samples and that these models succesfully...
DEFF Research Database (Denmark)
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.......This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Directory of Open Access Journals (Sweden)
Daniel Rece
2009-09-01
Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Rank Set Sampling in Improving the Estimates of Simple Regression Model
Directory of Open Access Journals (Sweden)
M Iqbal Jeelani
2015-04-01
Full Text Available In this paper Rank set sampling (RSS is introduced with a view of increasing the efficiency of estimates of Simple regression model. Regression model is considered with respect to samples taken from sampling techniques like Simple random sampling (SRS, Systematic sampling (SYS and Rank set sampling (RSS. It is found that R2 and Adj R2 obtained from regression model based on Rank set sample is higher than rest of two sampling schemes. Similarly Root mean square error, p-values, coefficient of variation are much lower in Rank set based regression model, also under validation technique (Jackknifing there is consistency in the measure of R2, Adj R2 and RMSE in case of RSS as compared to SRS and SYS. Results are supported with an empirical study involving a real data set generated of Pinus Wallichiana taken from block Langate of district Kupwara.
International Nuclear Information System (INIS)
Fang, Tingting; Lahdelma, Risto
2016-01-01
Highlights: • Social factor is considered for the linear regression models besides weather file. • Simultaneously optimize all the coefficients for linear regression models. • SARIMA combined with linear regression is used to forecast the heat demand. • The accuracy for both linear regression and time series models are evaluated. - Abstract: Forecasting heat demand is necessary for production and operation planning of district heating (DH) systems. In this study we first propose a simple regression model where the hourly outdoor temperature and wind speed forecast the heat demand. Weekly rhythm of heat consumption as a social component is added to the model to significantly improve the accuracy. The other type of model is the seasonal autoregressive integrated moving average (SARIMA) model with exogenous variables as a combination to take weather factors, and the historical heat consumption data as depending variables. One outstanding advantage of the model is that it peruses the high accuracy for both long-term and short-term forecast by considering both exogenous factors and time series. The forecasting performance of both linear regression models and time series model are evaluated based on real-life heat demand data for the city of Espoo in Finland by out-of-sample tests for the last 20 full weeks of the year. The results indicate that the proposed linear regression model (T168h) using 168-h demand pattern with midweek holidays classified as Saturdays or Sundays gives the highest accuracy and strong robustness among all the tested models based on the tested forecasting horizon and corresponding data. Considering the parsimony of the input, the ease of use and the high accuracy, the proposed T168h model is the best in practice. The heat demand forecasting model can also be developed for individual buildings if automated meter reading customer measurements are available. This would allow forecasting the heat demand based on more accurate heat consumption
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
DEFF Research Database (Denmark)
Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne
2014-01-01
Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI...
A connection between item/subtest regression and the Rasch model
Engelen, Ronald J.H.; Jannarone, Robert J.
1989-01-01
The purpose of this paper is to link empirical Bayes methods with two specific topics in item response theory--item/subtest regression, and testing the goodness of fit of the Rasch model--under the assumptions of local independence and sufficiency. It is shown that item/subtest regression results in
Bulcock, J. W.; And Others
Advantages of normalization regression estimation over ridge regression estimation are demonstrated by reference to Bloom's model of school learning. Theoretical concern centered on the structure of scholastic achievement at grade 10 in Canadian high schools. Data on 886 students were randomly sampled from the Carnegie Human Resources Data Bank.…
Auto-correlograms and auto-regressive models of trace metal distributions in Cochin backwaters
Digital Repository Service at National Institute of Oceanography (India)
Jayalakshmy, K.V.; Sankaranarayanan, V.N.
,2 and 3 and for Zn at stations 1 and 4. The stability in time for the concentration profiles increases as Fe Mn Ni Cu Co. The fraction of variability in the variables obtained by the auto-regressive model of order 1 ranges from 20 to 50%. Auto-regressive...
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.
FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R
Directory of Open Access Journals (Sweden)
Friedrich Leisch
2004-10-01
Full Text Available FlexMix implements a general framework for fitting discrete mixtures of regression models in the R statistical computing environment: three variants of the EM algorithm can be used for parameter estimation, regressors and responses may be multivariate with arbitrary dimension, data may be grouped, e.g., to account for multiple observations per individual, the usual formula interface of the S language is used for convenient model specification, and a modular concept of driver functions allows to interface many different types of regression models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering. FlexMix provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models.
Chiogna, Gabriele; Marcolini, Giorgia; Liu, Wanying; Pérez Ciria, Teresa; Tuo, Ye
2018-03-21
Water management in the alpine region has an important impact on streamflow. In particular, hydropower production is known to cause hydropeaking i.e., sudden fluctuations in river stage caused by the release or storage of water in artificial reservoirs. Modeling hydropeaking with hydrological models, such as the Soil Water Assessment Tool (SWAT), requires knowledge of reservoir management rules. These data are often not available since they are sensitive information belonging to hydropower production companies. In this short communication, we propose to couple the results of a calibrated hydrological model with a machine learning method to reproduce hydropeaking without requiring the knowledge of the actual reservoir management operation. We trained a support vector machine (SVM) with SWAT model outputs, the day of the week and the energy price. We tested the model for the Upper Adige river basin in North-East Italy. A wavelet analysis showed that energy price has a significant influence on river discharge, and a wavelet coherence analysis demonstrated the improved performance of the SVM model in comparison to the SWAT model alone. The SVM model was also able to capture the fluctuations in streamflow caused by hydropeaking when both energy price and river discharge displayed a complex temporal dynamic. Copyright © 2018 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
Anke Hüls
2017-05-01
Full Text Available Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model and (ii to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
2014-09-01
Logistic regression model is widely used in health research for description and predictive purposes. Unfortunately, most researchers are sometimes not aware that the underlying principles of the techniques have failed when the algorithm for maximum likelihood does not converge. Young researchers particularly postgraduate students may not know why separation problem whether quasi or complete occurs, how to identify it and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles was reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of logistic regression model in the methodology while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers since very few described the process in their reports and may be totally unaware of the problem of convergence or how to deal with it.
Longitudinal beta regression models for analyzing health-related quality of life scores over time
Directory of Open Access Journals (Sweden)
Hunger Matthias
2012-09-01
Full Text Available Abstract Background Health-related quality of life (HRQL has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors. Conclusions Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Genomic prediction based on data from three layer lines using non-linear regression models
Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.
2014-01-01
Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
A national fine spatial scale land-use regression model for ozone
Kerckhoffs, Jules|info:eu-repo/dai/nl/411260502; Wang, Meng|info:eu-repo/dai/nl/345480279; Meliefste, Kees; Malmqvist, Ebba; Fischer, Paul; Janssen, Nicole A H; Beelen, Rob|info:eu-repo/dai/nl/30483100X; Hoek, Gerard|info:eu-repo/dai/nl/069553475
Uncertainty about health effects of long-term ozone exposure remains. Land use regression (LUR) models have been used successfully for modeling fine scale spatial variation of primary pollutants but very limited for ozone. Our objective was to assess the feasibility of developing a national LUR
A LATENT CLASS POISSON REGRESSION-MODEL FOR HETEROGENEOUS COUNT DATA
WEDEL, M; DESARBO, WS; BULT, [No Value; RAMASWAMY, [No Value
1993-01-01
In this paper an approach is developed that accommodates heterogeneity in Poisson regression models for count data. The model developed assumes that heterogeneity arises from a distribution of both the intercept and the coefficients of the explanatory variables. We assume that the mixing
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
Fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important factors of actual predictive survival factors of breast cancer's patients. We used breast cancer data which collected by cancer registry of Kerman University of Medical Sciences during the period of 2000-2007. The variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in neoplasm and malignant group and more than two-third of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criteria show that the fuzzy logistic regression have a good fit on the data (MDM = 0.86). Fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which applied the fuzzy logistic models. Furthermore, we recommend using this model in various research areas.
de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
Profile-driven regression for modeling and runtime optimization of mobile networks
DEFF Research Database (Denmark)
McClary, Dan; Syrotiuk, Violet; Kulahci, Murat
2010-01-01
of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike...
The use of logistic regression in modelling the distributions of bird ...
African Journals Online (AJOL)
The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from bird atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...
The use of logistic regression in modelling the distributions of bird ...
African Journals Online (AJOL)
The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from brrd atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...
Random regression models in the evaluation of the growth curve of Simbrasil beef cattle
Mota, M.; Marques, F.A.; Lopes, P.S.; Hidalgo, A.M.
2013-01-01
Random regression models were used to estimate the types and orders of random effects of (co)variance functions in the description of the growth trajectory of the Simbrasil cattle breed. Records for 7049 animals totaling 18,677 individual weighings were submitted to 15 models from the third to the
DEFF Research Database (Denmark)
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....
Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali
2014-05-01
Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed the data with nonlinear structure. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods used here were comparable both in terms of predictive and generalization abilities. Values of the performance criteria parameters suggested for the adequacy of the constructed models to fit the nonlinear data and their good predictive capabilities.
Poisson regression for modeling count and frequency outcomes in trauma research.
Gagnon, David R; Doron-LaMarca, Susan; Bell, Margret; O'Farrell, Timothy J; Taft, Casey T
2008-10-01
The authors describe how the Poisson regression method for analyzing count or frequency outcome variables can be applied in trauma studies. The outcome of interest in trauma research may represent a count of the number of incidents of behavior occurring in a given time interval, such as acts of physical aggression or substance abuse. Traditional linear regression approaches assume a normally distributed outcome variable with equal variances over the range of predictor variables, and may not be optimal for modeling count outcomes. An application of Poisson regression is presented using data from a study of intimate partner aggression among male patients in an alcohol treatment program and their female partners. Results of Poisson regression and linear regression models are compared.
Amaliana, Luthfatul; Sa'adah, Umu; Wayan Surya Wardhani, Ni
2017-12-01
Tetanus Neonatorum is an infectious disease that can be prevented by immunization. The number of Tetanus Neonatorum cases in East Java Province is the highest in Indonesia until 2015. Tetanus Neonatorum data contain over dispersion and big enough proportion of zero-inflation. Negative Binomial (NB) regression is an alternative method when over dispersion happens in Poisson regression. However, the data containing over dispersion and zero-inflation are more appropriately analyzed by using Zero-Inflated Negative Binomial (ZINB) regression. The purpose of this study are: (1) to model Tetanus Neonatorum cases in East Java Province with 71.05 percent proportion of zero-inflation by using NB and ZINB regression, (2) to obtain the best model. The result of this study indicates that ZINB is better than NB regression with smaller AIC.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe
2018-01-01
In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.
Nonlinear regression modeling of nutrient loads in streams: A Bayesian approach
Qian, S.S.; Reckhow, K.H.; Zhai, J.; McMahon, G.
2005-01-01
A Bayesian nonlinear regression modeling method is introduced and compared with the least squares method for modeling nutrient loads in stream networks. The objective of the study is to better model spatial correlation in river basin hydrology and land use for improving the model as a forecasting tool. The Bayesian modeling approach is introduced in three steps, each with a more complicated model and data error structure. The approach is illustrated using a data set from three large river basins in eastern North Carolina. Results indicate that the Bayesian model better accounts for model and data uncertainties than does the conventional least squares approach. Applications of the Bayesian models for ambient water quality standards compliance and TMDL assessment are discussed. Copyright 2005 by the American Geophysical Union.
DEFF Research Database (Denmark)
Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit
2014-01-01
to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...... conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models...
Preisser, J. S.; Phillips, C.; Perin, J.; Schwartz, T. A.
2011-01-01
Objectives The article reviews proportional and partial proportional odds regression for ordered categorical outcomes, such as patient-reported measures, that are frequently used in clinical research in dentistry. Methods The proportional odds regression model for ordinal data is a generalization of ordinary logistic regression for dichotomous responses. When the proportional odds assumption holds for some but not all of the covariates, the lesser known partial proportional odds model is shown to provide a useful extension. Results The ordinal data models are illustrated for the analysis of repeated ordinal outcomes to determine whether the burden associated with sensory alteration following a bilateral sagittal split osteotomy procedure differed for those patients who were given opening exercises only following surgery and those who received sensory retraining exercises in conjunction with standard opening exercises. Conclusions Proportional and partial proportional odds models are broadly applicable to the analysis of cross-sectional and longitudinal ordinal data in dental research. PMID:21070317
truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models
Directory of Open Access Journals (Sweden)
Maria Karlsson
2014-05-01
Full Text Available Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and ?nite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.
Observer-Based and Regression Model-Based Detection of Emerging Faults in Coal Mills
DEFF Research Database (Denmark)
Odgaard, Peter Fogh; Lin, Bao; Jørgensen, Sten Bay
2006-01-01
In order to improve the reliability of power plants it is important to detect fault as fast as possible. Doing this it is interesting to find the most efficient method. Since modeling of large scale systems is time consuming it is interesting to compare a model-based method with data driven ones....... In this paper three different fault detection approaches are compared using a example of a coal mill, where a fault emerges. The compared methods are based on: an optimal unknown input observer, static and dynamic regression model-based detections. The conclusion on the comparison is that observer-based scheme...... detects the fault 13 samples earlier than the dynamic regression model-based method, and that the static regression based method is not usable due to generation of far too many false detections....
A componential model of human interaction with graphs: 1. Linear regression modeling
Gillan, Douglas J.; Lewis, Robert
1994-01-01
Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes-searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and repsonding; and (2) that the type of graph and user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications in the MA-P model, alternative models, and design implications from the MA-P model.
Accounting for Zero Inflation of Mussel Parasite Counts Using Discrete Regression Models
Directory of Open Access Journals (Sweden)
Emel Çankaya
2017-06-01
Full Text Available In many ecological applications, the absences of species are inevitable due to either detection faults in samples or uninhabitable conditions for their existence, resulting in high number of zero counts or abundance. Usual practice for modelling such data is regression modelling of log(abundance+1 and it is well know that resulting model is inadequate for prediction purposes. New discrete models accounting for zero abundances, namely zero-inflated regression (ZIP and ZINB, Hurdle-Poisson (HP and Hurdle-Negative Binomial (HNB amongst others are widely preferred to the classical regression models. Due to the fact that mussels are one of the economically most important aquatic products of Turkey, the purpose of this study is therefore to examine the performances of these four models in determination of the significant biotic and abiotic factors on the occurrences of Nematopsis legeri parasite harming the existence of Mediterranean mussels (Mytilus galloprovincialis L.. The data collected from the three coastal regions of Sinop city in Turkey showed more than 50% of parasite counts on the average are zero-valued and model comparisons were based on information criterion. The results showed that the probability of the occurrence of this parasite is here best formulated by ZINB or HNB models and influential factors of models were found to be correspondent with ecological differences of the regions.
Buonaccorsi, John P; Romeo, Giovanni; Thoresen, Magne
2018-03-01
When fitting regression models, measurement error in any of the predictors typically leads to biased coefficients and incorrect inferences. A plethora of methods have been proposed to correct for this. Obtaining standard errors and confidence intervals using the corrected estimators can be challenging and, in addition, there is concern about remaining bias in the corrected estimators. The bootstrap, which is one option to address these problems, has received limited attention in this context. It has usually been employed by simply resampling observations, which, while suitable in some situations, is not always formally justified. In addition, the simple bootstrap does not allow for estimating bias in non-linear models, including logistic regression. Model-based bootstrapping, which can potentially estimate bias in addition to being robust to the original sampling or whether the measurement error variance is constant or not, has received limited attention. However, it faces challenges that are not present in handling regression models with no measurement error. This article develops new methods for model-based bootstrapping when correcting for measurement error in logistic regression with replicate measures. The methodology is illustrated using two examples, and a series of simulations are carried out to assess and compare the simple and model-based bootstrap methods, as well as other standard methods. While not always perfect, the model-based approaches offer some distinct improvements over the other methods. © 2017, The International Biometric Society.
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression
Directory of Open Access Journals (Sweden)
Li Jian
2017-01-01
Full Text Available Objective: to construct multi factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: using logistic regression techniques to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: Male’s risk prediction model logistic regression equation: logit(P=BMI × 0.735+ vegetables × (−0.671 + age × 0.838+ diastolic pressure × 0.296+ physical activity× (−2.287 + sleep ×(−0.009 +smoking ×0.214; Female’s risk prediction model logistic regression equation: logit(P=BMI ×1.979+ vegetables× (−0.292 + age × 1.355+ diastolic pressure× 0.522+ physical activity × (−2.287 + sleep × (−0.010.The area under the ROC curve of male was 0.83, the sensitivity was 0.72, the specificity was 0.86, the area under the ROC curve of female was 0.84, the sensitivity was 0.75, the specificity was 0.90. Conclusion: This study model data is from a compared study of nested case, the risk prediction model has been established by using the more mature logistic regression techniques, and the model is higher predictive sensitivity, specificity and stability.
Directory of Open Access Journals (Sweden)
Gang WU
2016-01-01
Full Text Available Objective To analyze the risk factors for prognosis in intracerebral hemorrhage using decision tree (classification and regression tree, CART model and logistic regression model. Methods CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results Logistic regression analyses showed that hematoma volume (OR-value 0.953, initial Glasgow Coma Scale (GCS score (OR-value 1.210, pulmonary infection (OR-value 0.295, and basal ganglia hemorrhage (OR-value 0.336 were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688. Conclusions CART model has a similar value to that of logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13
Attribute Selection Impact on Linear and Nonlinear Regression Models for Crop Yield Prediction
Directory of Open Access Journals (Sweden)
Alberto Gonzalez-Sanchez
2014-01-01
Full Text Available Efficient cropping requires yield estimation for each involved crop, where data-driven models are commonly applied. In recent years, some data-driven modeling technique comparisons have been made, looking for the best model to yield prediction. However, attributes are usually selected based on expertise assessment or in dimensionality reduction algorithms. A fairer comparison should include the best subset of features for each regression technique; an evaluation including several crops is preferred. This paper evaluates the most common data-driven modeling techniques applied to yield prediction, using a complete method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5′ regression trees, and artificial neural networks (ANN were ranked. The models were built using real data of eight crops sowed in an irrigation module of Mexico. To validate the models, three accuracy metrics were used: the root relative square error (RRSE, relative mean absolute error (RMAE, and correlation factor (R. The results show that ANNs are more consistent in the best attribute subset composition between the learning and the training stages, obtaining the lowest average RRSE (86.04%, lowest average RMAE (8.75%, and the highest average correlation factor (0.63.
Regional regression models of watershed suspended-sediment discharge for the eastern United States
Roman, David C.; Vogel, Richard M.; Schwarz, Gregory E.
2012-01-01
Estimates of mean annual watershed sediment discharge, derived from long-term measurements of suspended-sediment concentration and streamflow, often are not available at locations of interest. The goal of this study was to develop multivariate regression models to enable prediction of mean annual suspended-sediment discharge from available basin characteristics useful for most ungaged river locations in the eastern United States. The models are based on long-term mean sediment discharge estimates and explanatory variables obtained from a combined dataset of 1201 US Geological Survey (USGS) stations derived from a SPAtially Referenced Regression on Watershed attributes (SPARROW) study and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) database. The resulting regional regression models summarized for major US water resources regions 1–8, exhibited prediction R2 values ranging from 76.9% to 92.7% and corresponding average model prediction errors ranging from 56.5% to 124.3%. Results from cross-validation experiments suggest that a majority of the models will perform similarly to calibration runs. The 36-parameter regional regression models also outperformed a 16-parameter national SPARROW model of suspended-sediment discharge and indicate that mean annual sediment loads in the eastern United States generally correlates with a combination of basin area, land use patterns, seasonal precipitation, soil composition, hydrologic modification, and to a lesser extent, topography.
Directory of Open Access Journals (Sweden)
Menon Carlo
2011-09-01
Full Text Available Abstract Background Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results It was shown that mean adjusted coefficient of determination (Ra2 values decrease between 20%-35% for different models after one hour while altering arm posture decreased mean Ra2 values between 64% to 74% for different models. Conclusions Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, ordinary least squares linear regression model (OLS was shown to have high isometric torque estimation accuracy combined with very short training times.
CANFIS: A non-linear regression procedure to produce statistical air-quality forecast models
Energy Technology Data Exchange (ETDEWEB)
Burrows, W.R.; Montpetit, J. [Environment Canada, Downsview, Ontario (Canada). Meteorological Research Branch; Pudykiewicz, J. [Environment Canada, Dorval, Quebec (Canada)
1997-12-31
Statistical models for forecasts of environmental variables can provide a good trade-off between significance and precision in return for substantial saving of computer execution time. Recent non-linear regression techniques give significantly increased accuracy compared to traditional linear regression methods. Two are Classification and Regression Trees (CART) and the Neuro-Fuzzy Inference System (NFIS). Both can model predict and distributions, including the tails, with much better accuracy than linear regression. Given a learning data set of matched predict and predictors, CART regression produces a non-linear, tree-based, piecewise-continuous model of the predict and data. Its variance-minimizing procedure optimizes the task of predictor selection, often greatly reducing initial data dimensionality. NFIS reduces dimensionality by a procedure known as subtractive clustering but it does not of itself eliminate predictors. Over-lapping coverage in predictor space is enhanced by NFIS with a Gaussian membership function for each cluster component. Coefficients for a continuous response model based on the fuzzified cluster centers are obtained by a least-squares estimation procedure. CANFIS is a two-stage data-modeling technique that combines the strength of CART to optimize the process of selecting predictors from a large pool of potential predictors with the modeling strength of NFIS. A CANFIS model requires negligible computer time to run. CANFIS models for ground-level O{sub 3}, particulates, and other pollutants will be produced for each of about 100 Canadian sites. The air-quality models will run twice daily using a small number of predictors isolated from a large pool of upstream and local Lagrangian potential predictors.
Improved model of the retardance in citric acid coated ferrofluids using stepwise regression
Lin, J. F.; Qiu, X. R.
2017-06-01
Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been conducted for biomedical application. The magneto-optical retardance of CA coated FFs was measured by a Stokes polarimeter. Optimization and multiple regression of retardance in FFs were executed by Taguchi method and Microsoft Excel previously, and the F value of regression model was large enough. However, the model executed by Excel was not systematic. Instead we adopted the stepwise regression to model the retardance of CA coated FFs. From the results of stepwise regression by MATLAB, the developed model had highly predictable ability owing to F of 2.55897e+7 and correlation coefficient of one. The average absolute error of predicted retardances to measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimized parametric combination was determined as [4.709 0.12 39.998 70.006] corresponding to the pH of suspension, molar ratio of CA to Fe3O4, CA volume, and coating temperature. The maximum retardance was found as 31.712°, close to that obtained by evolutionary solver in Excel and a relative error of -0.013%. Above all, the stepwise regression method was successfully used to model the retardance of CA coated FFs, and the maximum global retardance was determined by the use of GA.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias, and coverage still deviated from nominal.
Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils
Directory of Open Access Journals (Sweden)
Fatimah Khaleel Ibrahim
2017-08-01
Full Text Available The techniques of soft computing technique such as Artificial Neutral Network (ANN have improved the predicting capability and have actually discovered application in Geotechnical engineering. The aim of this research is to utilize the soft computing technique and Multiple Regression Models (MLR for forecasting the California bearing ratio CBR( of soil from its index properties. The indicator of CBR for soil could be predicted from various soils characterizing parameters with the assist of MLR and ANN methods. The data base that collected from the laboratory by conducting tests on 86 soil samples that gathered from different projects in Basrah districts. Data gained from the experimental result were used in the regression models and soft computing techniques by using artificial neural network. The liquid limit, plastic index , modified compaction test and the CBR test have been determined. In this work, different ANN and MLR models were formulated with the different collection of inputs to be able to recognize their significance in the prediction of CBR. The strengths of the models that were developed been examined in terms of regression coefficient (R2, relative error (RE% and mean square error (MSE values. From the results of this paper, it absolutely was noticed that all the proposed ANN models perform better than that of MLR model. In a specific ANN model with all input parameters reveals better outcomes than other ANN models.
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.
Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H
2014-12-30
For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns. Copyright © 2014 John Wiley & Sons
Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS
Directory of Open Access Journals (Sweden)
Ade Widyaningsih
2015-04-01
Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can used to obtain a good estimation in the regression analysis is ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression but relationships among the errors in the response of other estimators are not allowed. One way to overcome this problem is Seemingly Unrelated Regression model (SUR in which parameters are estimated using Generalized Least Square (GLS. In this study, the author applies SUR model using GLS method on world gasoline demand data. The author obtains that SUR using GLS is better than OLS because SUR produce smaller errors than the OLS.
Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS
Directory of Open Access Journals (Sweden)
Ade Widyaningsih
2014-06-01
Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can used to obtain a good estimation in the regression analysis is ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression but relationships among the errors in the response of other estimators are not allowed. One way to overcome this problem is Seemingly Unrelated Regression model (SUR in which parameters are estimated using Generalized Least Square (GLS. In this study, the author applies SUR model using GLS method on world gasoline demand data. The author obtains that SUR using GLS is better than OLS because SUR produce smaller errors than the OLS.
Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.
2018-03-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.
Kawashima, Issaku; Kumano, Hiroaki
2017-01-01
Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling
Directory of Open Access Journals (Sweden)
Issaku Kawashima
2017-07-01
Full Text Available Mind-wandering (MW, task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
LINEAR REGRESSION MODEL ESTİMATİON FOR RIGHT CENSORED DATA
Directory of Open Access Journals (Sweden)
Ersin Yılmaz
2016-05-01
Full Text Available In this study, firstly we will define a right censored data. If we say shortly right-censored data is censoring values that above the exact line. This may be related with scaling device. And then we will use response variable acquainted from right-censored explanatory variables. Then the linear regression model will be estimated. For censored data’s existence, Kaplan-Meier weights will be used for the estimation of the model. With the weights regression model will be consistent and unbiased with that. And also there is a method for the censored data that is a semi parametric regression and this method also give useful results for censored data too. This study also might be useful for the health studies because of the censored data used in medical issues generally.
Arora, Amarpreet S; Reddy, Akepati S
2014-01-01
Stormwater management at urban sub-watershed level has been envisioned to include stormwater collection, treatment, and disposal of treated stormwater through groundwater recharging. Sizing, operation and control of the stormwater management systems require information on the quantities and characteristics of the stormwater generated. Stormwater characteristics depend upon dry spell between two successive rainfall events, intensity of rainfall and watershed characteristics. However, sampling and analysis of stormwater, spanning only few rainfall events, provides insufficient information on the characteristics. An attempt has been made in the present study to assess the stormwater characteristics through regression modeling. Stormwater of five sub-watersheds of Patiala city were sampled and analyzed. The results obtained were related with the antecedent dry periods and with the intensity of the rainfall event through regression modeling. Obtained regression models were used to assess the stormwater quality for various antecedent dry periods and rainfall event intensities.
Deep ensemble learning of sparse regression models for brain disease diagnosis.
Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang
2017-04-01
Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
Bayes Wavelet Regression Approach to Solve Problems in Multivariable Calibration Modeling
Directory of Open Access Journals (Sweden)
Setiawan Setiawan
2010-05-01
Full Text Available In the multiple regression modeling, a serious problems would arise if the independent variables are correlated among each other (the problem of ill conditioned and the number of observations is much smaller than the number of independent variables (the problem of singularity. Bayes Regression (BR is an approach that can be used to solve the problem of ill conditioned, but computing constraints will be experienced, so pre-processing methods will be necessary in the form of dimensional reduction of independent variables. The results of empirical studies and literature shows that the discrete wavelet transform (WT gives estimation results of regression model which is better than the other preprocessing methods. This experiment will study a combination of BR with WT as pre-processing method to solve the problems ill conditioned and singularities. One application of calibration in the field of chemistry is relationship modeling between the concentration of active substance as measured by High Performance Liquid Chromatography (HPLC with Fourier Transform Infrared (FTIR absorbance spectrum. Spectrum pattern is expected to predict the value of the concentration of active substance. The exploration of Continuum Regression Wavelet Transform (CR-WT, and Partial Least Squares Regression Wavelet Transform (PLS-WT, and Bayes Regression Wavelet Transform (BR-WT shows that the BR-WT has a good performance. BR-WT is superior than PLS-WT method, and relatively is as good as CR-WT method.
SPSS macros to compare any two fitted values from a regression model.
Weaver, Bruce; Dubois, Sacha
2012-12-01
In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models.
Zeng, Ping; Zhou, Xiang
2017-09-06
Using genotype data to perform accurate genetic prediction of complex traits can facilitate genomic selection in animal and plant breeding programs, and can aid in the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling all genetic variants together via polygenic methods. Here, we develop such a polygenic method, which we refer to as the latent Dirichlet process regression model. Dirichlet process regression is non-parametric in nature, relies on the Dirichlet process to flexibly and adaptively model the effect size distribution, and thus enjoys robust prediction performance across a broad spectrum of genetic architectures. We compare Dirichlet process regression with several commonly used prediction methods with simulations. We further apply Dirichlet process regression to predict gene expressions, to conduct PrediXcan based gene set test, to perform genomic selection of four traits in two species, and to predict eight complex traits in a human cohort.Genetic prediction of complex traits with polygenic architecture has wide application from animal breeding to disease prevention. Here, Zeng and Zhou develop a non-parametric genetic prediction method based on latent Dirichlet Process regression models.
Multiple regression models for energy use in air-conditioned office buildings in different climates
International Nuclear Information System (INIS)
Lam, Joseph C.; Wan, Kevin K.W.; Liu Dalong; Tsang, C.L.
2010-01-01
An attempt was made to develop multiple regression models for office buildings in the five major climates in China - severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R 2 varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The difference between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.
Quantile regression for censored mixed-effects models with applications to HIV studies.
Lachos, Victor H; Chen, Ming-Hui; Abanto-Valle, Carlos A; Azevedo, Caio L N
HIV RNA viral load measures are often subjected to some upper and lower detection limits depending on the quantification assays. Hence, the responses are either left or right censored. Linear/nonlinear mixed-effects models, with slight modifications to accommodate censoring, are routinely used to analyze this type of data. Usually, the inference procedures are based on normality (or elliptical distribution) assumptions for the random terms. However, those analyses might not provide robust inference when the distribution assumptions are questionable. In this paper, we discuss a fully Bayesian quantile regression inference using Markov Chain Monte Carlo (MCMC) methods for longitudinal data models with random effects and censored responses. Compared to the conventional mean regression approach, quantile regression can characterize the entire conditional distribution of the outcome variable, and is more robust to outliers and misspecification of the error distribution. Under the assumption that the error term follows an asymmetric Laplace distribution, we develop a hierarchical Bayesian model and obtain the posterior distribution of unknown parameters at the p th level, with the median regression ( p = 0.5) as a special case. The proposed procedures are illustrated with two HIV AIDS studies on viral loads that were initially analyzed using the typical normal (censored) mean regression mixed-effects models, as well as a simulation study.
Davies, Patrick Laurie
2014-01-01
Introduction IntroductionApproximate Models Notation Two Modes of Statistical AnalysisTowards One Mode of Analysis Approximation, Randomness, Chaos, Determinism ApproximationA Concept of Approximation Approximation Approximating a Data Set by a Model Approximation Regions Functionals and EquivarianceRegularization and Optimality Metrics and DiscrepanciesStrong and Weak Topologies On Being (almost) Honest Simulations and Tables Degree of Approximation and p-values ScalesStability of Analysis The Choice of En(α, P) Independence Procedures, Approximation and VaguenessDiscrete Models The Empirical Density Metrics and Discrepancies The Total Variation Metric The Kullback-Leibler and Chi-Squared Discrepancies The Po(λ) ModelThe b(k, p) and nb(k, p) Models The Flying Bomb Data The Student Study Times Data OutliersOutliers, Data Analysis and Models Breakdown Points and Equivariance Identifying Outliers and Breakdown Outliers in Multivariate Data Outliers in Linear Regression Outliers in Structured Data The Location...
A Multilevel Regression Model for Geographical Studies in Sets of Non-Adjacent Cities.
Marí-Dell'Olmo, Marc; Martínez-Beneito, Miguel Ángel
2015-01-01
In recent years, small-area-based ecological regression analyses have been published that study the association between a health outcome and a covariate in several cities. These analyses have usually been performed independently for each city and have therefore yielded unrelated estimates for the cities considered, even though the same process has been studied in all of them. In this study, we propose a joint ecological regression model for multiple cities that accounts for spatial structure both within and between cities and explore the advantages of this model. The proposed model merges both disease mapping and geostatistical ideas. Our proposal is compared with two alternatives, one that models the association for each city as fixed effects and another that treats them as independent and identically distributed random effects. The proposed model allows us to estimate the association (and assess its significance) at locations with no available data. Our proposal is illustrated by an example of the association between unemployment (as a deprivation surrogate) and lung cancer mortality among men in 31 Spanish cities. In this example, the associations found were far more accurate for the proposed model than those from the fixed effects model. Our main conclusion is that ecological regression analyses can be markedly improved by performing joint analyses at several locations that share information among them. This finding should be taken into consideration in the design of future epidemiological studies.
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Combination of supervised and semi-supervised regression models for improved unbiased estimation
DEFF Research Database (Denmark)
Arenas-Garía, Jeronimo; Moriana-Varo, Carlos; Larsen, Jan
2010-01-01
In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and semisupervi......In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised...
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition"An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Replica analysis of overfitting in regression models for time-to-event data
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
International Nuclear Information System (INIS)
Alvarez R, J.T.; Morales P, R.
1992-06-01
The absorbed dose for equivalent soft tissue is determined,it is imparted by ophthalmologic applicators, ( 90 Sr/ 90 Y, 1850 MBq) using an extrapolation chamber of variable electrodes; when estimating the slope of the extrapolation curve using a simple lineal regression model is observed that the dose values are underestimated from 17.7 percent up to a 20.4 percent in relation to the estimate of this dose by means of a regression model polynomial two grade, at the same time are observed an improvement in the standard error for the quadratic model until in 50%. Finally the global uncertainty of the dose is presented, taking into account the reproducibility of the experimental arrangement. As conclusion it can infers that in experimental arrangements where the source is to contact with the extrapolation chamber, it was recommended to substitute the lineal regression model by the quadratic regression model, in the determination of the slope of the extrapolation curve, for more exact and accurate measurements of the absorbed dose. (Author)
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Directory of Open Access Journals (Sweden)
S. G. Gocheva-Ilieva
2010-01-01
Full Text Available In order to model the output laser power of a copper bromide laser with wavelengths of 510.6 and 578.2 nm we have applied two regression techniques—multiple linear regression and multivariate adaptive regression splines. The models have been constructed on the basis of PCA factors for historical data. The influence of first- and second-order interactions between predictors has been taken into account. The models are easily interpreted and have good prediction power, which is established from the results of their validation. The comparison of the derived models shows that these based on multivariate adaptive regression splines have an advantage over the others. The obtained results allow for the clarification of relationships between laser generation and the observed laser input variables, for better determining their influence on laser generation, in order to improve the experimental setup and laser production technology. They can be useful for evaluation of known experiments as well as for prediction of future experiments. The developed modeling methodology is also applicable for a wide range of similar laser devices—metal vapor lasers and gas lasers.
Directory of Open Access Journals (Sweden)
Wen-Cheng Wang
2014-01-01
Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Measurement error in epidemiologic studies of air pollution based on land-use regression models.
Basagaña, Xavier; Aguilera, Inmaculada; Rivera, Marcela; Agis, David; Foraster, Maria; Marrugat, Jaume; Elosua, Roberto; Künzli, Nino
2013-10-15
Land-use regression (LUR) models are increasingly used to estimate air pollution exposure in epidemiologic studies. These models use air pollution measurements taken at a small set of locations and modeling based on geographical covariates for which data are available at all study participant locations. The process of LUR model development commonly includes a variable selection procedure. When LUR model predictions are used as explanatory variables in a model for a health outcome, measurement error can lead to bias of the regression coefficients and to inflation of their variance. In previous studies dealing with spatial predictions of air pollution, bias was shown to be small while most of the effect of measurement error was on the variance. In this study, we show that in realistic cases where LUR models are applied to health data, bias in health-effect estimates can be substantial. This bias depends on the number of air pollution measurement sites, the number of available predictors for model selection, and the amount of explainable variability in the true exposure. These results should be taken into account when interpreting health effects from studies that used LUR models.
Use of a Regression Model to Study Host-Genomic Determinants of Phage Susceptibility in MRSA
DEFF Research Database (Denmark)
Zschach, Henrike; Larsen, Mette V; Hasman, Henrik
2018-01-01
strains to 12 (nine monovalent) different therapeutic phage preparations and subsequently employed linear regression models to estimate the influence of individual host gene families on resistance to phages. Specifically, we used a two-step regression model setup with a preselection step based on gene...... family enrichment. We show that our models are robust and capture the data's underlying signal by comparing their performance to that of models build on randomized data. In doing so, we have identified 167 gene families that govern phage resistance in our strain set and performed functional analysis...... on them. This revealed genes of possible prophage or mobile genetic element origin, along with genes involved in restriction-modification and transcription regulators, though the majority were genes of unknown function. This study is a step in the direction of understanding the intricate host...
James W. Hardin; Henrik Schmeidiche; Raymond J. Carroll
2003-01-01
This paper discusses and illustrates the method of regression calibration. This is a straightforward technique for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). Discussion will include specified measurement error, measurement error estimated by replicate error-prone proxies, and measurement error estimated by instrumental variables. The discussion focuses on s...
Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei
2014-10-01
Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, which also has the difficulty in capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse model were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. PROSPECT model) was employed to simulate the leaf reflectance where wavelength varies from 400 to 780 nm at 1 nm interval, and then these values were treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were taken as target to build the regression model respectively. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures as 70% of these samples were applied for training while the last 30% for model validation. Reflectance (r) and its mathematic transformations (1/r and log (1/r)) were all employed to build regression model respectively. Results showed fair agreements between pigments and simulated reflectance with all adjusted coefficients of determination (R2) larger than 0.8 as 6 wavebands were selected to build the SMLR model. The largest value of R2 for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematic transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate the chlorophyll and carotenoids and their ratio based on statistical model with leaf reflectance data.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
Global regression model for moisture content determination using near-infrared spectroscopy.
Clavaud, Matthieu; Roggo, Yves; Dégardin, Klara; Sacré, Pierre-Yves; Hubert, Philippe; Ziemons, Eric
2017-10-01
Near-infrared (NIR) global quantitative models were evaluated for the moisture content (MC) determination of three different freeze-dried drug products. The quantitative models were based on 3822 spectra measured on two identical spectrometers to include variability. The MC, measured with the reference Karl Fischer (KF) method, were ranged from 0.05% to 4.96%. Linear and non-linear regression models using Partial Least Square (PLS), Decision Tree (DT), Bayesian Ridge Regression (Bayes-RR), K-Nearest Neighbors (KNN), and Support Vector Regression (SVR) algorithms were created and evaluated. Among them, the SVR model was retained for a global application. The Standard Error of Calibration (SEC) and the Standard Error of Prediction (SEP) were respectively 0.12% and 0.15%. This model was then evaluated in terms of total error and risk-based assessment, linearity, and accuracy. It was observed that MC can be fastly and simultaneously determined in freeze-dried pharmaceutical products thanks to a global NIR model created with different medicines. This innovative approach allows to speed up the validation time and the in-lab release analyses. Copyright © 2017 Elsevier B.V. All rights reserved.
Reduction of the number of parameters needed for a polynomial random regression test-day model
Pool, M.H.; Meuwissen, T.H.E.
2000-01-01
Legendre polynomials were used to describe the (co)variance matrix within a random regression test day model. The goodness of fit depended on the polynomial order of fit, i.e., number of parameters to be estimated per animal but is limited by computing capacity. Two aspects: incomplete lactation
FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS
Directory of Open Access Journals (Sweden)
Hirpa G. Lemu
2017-03-01
Full Text Available This article reports a proposed approach to a frictional resistance description in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is the fact that there exists a considerable amount of factors affect the friction coefficient value and as a result building analytical friction model for specified process conditions is practically impossible. In this proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code and STATISTICA Neural Networks was utilized to build an artificial neural networks model. The effect of different training strategies on the quality of neural networks was studied. As input variables for regression model and training of radial basis function networks, generalized regression neural networks and multilayer networks the results of strip drawing friction test were utilized. Four kinds of Al-Mg alloy sheets were used as a test material.
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
Clinical trials: odds ratios and multiple regression models--why and how to assess them
Sobh, Mohamad; Cleophas, Ton J.; Hadj-Chaib, Amel; Zwinderman, Aeilko H.
2008-01-01
Odds ratios (ORs), unlike chi2 tests, provide direct insight into the strength of the relationship between treatment modalities and treatment effects. Multiple regression models can reduce the data spread due to certain patient characteristics and thus improve the precision of the treatment
On the Latent Regression Model of Item Response Theory. Research Report. ETS RR-07-12
Antal, Tamás
2007-01-01
Full account of the latent regression model for the National Assessment of Educational Progress is given. The treatment includes derivation of the EM algorithm, Newton-Raphson method, and the asymptotic standard errors. The paper also features the use of the adaptive Gauss-Hermite numerical integration method as a basic tool to evaluate…
Truong Ngoc Phuong, Phuong; Stein, A.
2017-01-01
Health data and environmental data are commonly collected at different levels of aggregation. A persistent challenge of using a spatial regression model to link these data is that their associations can vary as a function of aggregation. This results into ecological fallacy if association at one
Regression model for the study of sole and cumulative effect of ...
African Journals Online (AJOL)
The effect of variability in temperature, solar radiation and photothermal quotient were studied under varying planting windows in three wheat genotypes to cope environmental vulnerability. Regression models are regarded as valuable tools for the evaluation of temperature, solar radiation and photothermal quotient effects ...
Assessing the performance of variational methods for mixed logistic regression models
Rijmen, F.P.J.; Vomlel, J.
2008-01-01
We present a variational estimation method for the mixed logistic regression model. The method is based on a lower bound approximation of the logistic function [Jaakkola, J.S. and Jordan, M.I., 2000, Bayesian parameter estimation via variational methods. Statistics Computing, 10, 25-37.]. Based on
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Cason, Gerald J.; Cason, Carolyn L.
A more familiar and efficient method for estimating the parameters of Cason and Cason's model was examined. Using a two-step analysis based on linear regression, rather than the direct search interative procedure, gave about equally good results while providing a 33 to 1 computer processing time advantage, across 14 cohorts of junior medical…
Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett
2009-01-01
Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
L.F. Hoogerheide (Lennart); F.R. Kleibergen (Frank); H.K. van Dijk (Herman)
2006-01-01
textabstractWe propose a natural conjugate prior for the instrumental variables regression model. The prior is a natural conjugate one since the marginal prior and posterior of the structural parameter have the same functional expressions which directly reveal the update from prior to posterior. The
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Kleijnen, J.P.C.
2006-01-01
Classic linear regression models and their concomitant statistical designs assume a univariate response and white noise.By definition, white noise is normally, independently, and identically distributed with zero mean.This survey tries to answer the following questions: (i) How realistic are these
DEFF Research Database (Denmark)
Petersen, Jørgen Holm
2016-01-01
This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied...
Climate Impacts on Chinese Corn Yields: A Fractional Polynomial Regression Model
Kooten, van G.C.; Sun, Baojing
2012-01-01
In this study, we examine the effect of climate on corn yields in northern China using data from ten districts in Inner Mongolia and two in Shaanxi province. A regression model with a flexible functional form is specified, with explanatory variables that include seasonal growing degree days,
Muttoo, Sheena; Ramsay, Lisa; Brunekreef, Bert; Beelen, Rob; Meliefste, Kees; Naidoo, Rajen N
2018-01-01
The South Durban (SD) area of Durban, South Africa, has a history of air pollution issues due to the juxtaposition of low-income communities with industrial areas. This study used measurements of oxides of nitrogen (NOx) to develop a land use regression (LUR) model to explain the spatial variation
The efficiency of OLS estimator in the linear-regression model with ...
African Journals Online (AJOL)
Bounds for the efficiency of ordinary least squares estimator relative to generalized least squares estimator in the linear regression model with first-order spatial error process are given. SINET: Ethiopian Journal of Science Vol. 24, No. 1 (June 2001), pp. 17-33. Key words/phrases: Efficiency, generalized least squares, ...
Stone, Wesley W.; Gilliom, Robert J.
2012-01-01
Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km2 of watershed area or greater.
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling
Directory of Open Access Journals (Sweden)
Eric R. Edelman
2017-06-01
Full Text Available For efficient utilization of operating rooms (ORs, accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT. We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT. TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.
Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A
2017-01-01
For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.
INVESTIGATION OF E-MAIL TRAFFIC BY USING ZERO-INFLATED REGRESSION MODELS
Directory of Open Access Journals (Sweden)
Yılmaz KAYA
2012-06-01
Full Text Available Based on count data obtained with a value of zero may be greater than anticipated. These types of data sets should be used to analyze by regression methods taking into account zero values. Zero- Inflated Poisson (ZIP, Zero-Inflated negative binomial (ZINB, Poisson Hurdle (PH, negative binomial Hurdle (NBH are more common approaches in modeling more zero value possessing dependent variables than expected. In the present study, the e-mail traffic of Yüzüncü Yıl University in 2009 spring semester was investigated. ZIP and ZINB, PH and NBH regression methods were applied on the data set because more zeros counting (78.9% were found in data set than expected. ZINB and NBH regression considered zero dispersion and overdispersion were found to be more accurate results due to overdispersion and zero dispersion in sending e-mail. ZINB is determined to be best model accordingto Vuong statistics and information criteria.
Directory of Open Access Journals (Sweden)
Wun Wong
2003-01-01
Full Text Available The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well, if not better, than the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression and machine learning (i.e., neural network technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Directory of Open Access Journals (Sweden)
Xibin Zhang
2016-04-01
Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP growth rates among the organisation for economic co-operation and development (OECD and non-OECD countries.
Regression based modeling of vegetation and climate variables for the Amazon rainforests
Kodali, A.; Khandelwal, A.; Ganguly, S.; Bongard, J.; Das, K.
2015-12-01
Both short-term (weather) and long-term (climate) variations in the atmosphere directly impact various ecosystems on earth. Forest ecosystems, especially tropical forests, are crucial as they are the largest reserves of terrestrial carbon sink. For example, the Amazon forests are a critical component of global carbon cycle storing about 100 billion tons of carbon in its woody biomass. There is a growing concern that these forests could succumb to precipitation reduction in a progressively warming climate, leading to release of significant amount of carbon in the atmosphere. Therefore, there is a need to accurately quantify the dependence of vegetation growth on different climate variables and obtain better estimates of drought-induced changes to atmospheric CO2. The availability of globally consistent climate and earth observation datasets have allowed global scale monitoring of various climate and vegetation variables such as precipitation, radiation, surface greenness, etc. Using these diverse datasets, we aim to quantify the magnitude and extent of ecosystem exposure, sensitivity and resilience to droughts in forests. The Amazon rainforests have undergone severe droughts twice in last decade (2005 and 2010), which makes them an ideal candidate for the regional scale analysis. Current studies on vegetation and climate relationships have mostly explored linear dependence due to computational and domain knowledge constraints. We explore a modeling technique called symbolic regression based on evolutionary computation that allows discovery of the dependency structure without any prior assumptions. In symbolic regression the population of possible solutions is defined via trees structures. Each tree represents a mathematical expression that includes pre-defined functions (mathematical operators) and terminal sets (independent variables from data). Selection of these sets is critical to computational efficiency and model accuracy. In this work we investigate
Bertrand, Julie; Balding, David J
2013-03-01
Studies on the influence of single nucleotide polymorphisms (SNPs) on drug pharmacokinetics (PK) have usually been limited to the analysis of observed drug concentration or area under the concentration versus time curve. Nonlinear mixed effects models enable analysis of the entire curve, even for sparse data, but until recently, there has been no systematic method to examine the effects of multiple SNPs on the model parameters. The aim of this study was to assess different penalized regression methods for including SNPs in PK analyses. A total of 200 data sets were simulated under both the null and an alternative hypothesis. In each data set for each of the 300 participants, a PK profile at six sampling times was simulated and 1227 genotypes were generated through haplotypes. After modelling the PK profiles using an expectation maximization algorithm, genetic association with individual parameters was investigated using the following approaches: (i) a classical stepwise approach, (ii) ridge regression modified to include a test, (iii) Lasso and (iv) a generalization of Lasso, the HyperLasso. Penalized regression approaches are often much faster than the stepwise approach. There are significantly fewer true positives for ridge regression than for the stepwise procedure and HyperLasso. The higher number of true positives in the stepwise procedure was accompanied by a higher count of false positives (not significant). We find that all approaches except ridge regression show similar power, but penalized regression can be much less computationally demanding. We conclude that penalized regression should be preferred over stepwise procedures for PK analyses with a large panel of genetic covariates.
Zihajehzadeh, Shaghayegh; Park, Edward J
2016-08-01
This study provides a concurrent comparison of regression model-based walking speed estimation accuracy using lower body mounted inertial sensors. The comparison is based on different sets of variables, features, mounting locations and regression methods. An experimental evaluation was performed on 15 healthy subjects during free walking trials. Our results show better accuracy of Gaussian process regression compared to least square regression using Lasso. Among the variables, external acceleration tends to provide improved accuracy. By using both time-domain and frequency-domain features, waist and ankle-mounted sensors result in similar accuracies: 4.5% for the waist and 4.9% for the ankle. When using only frequency-domain features, estimation accuracy based on a waist-mounted sensor suffers more compared to the one from ankle.
Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media
Cooley, R.L.; Christensen, S.
2006-01-01
Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector ?? that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation ????*, where ?? is an interpolation matrix and ??* is a stochastic vector of parameters. Vector ??* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that model function f(????*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of ??, ??,f(??), which is assumed to be nearly exact. The difference f(??) - f(????*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate ??* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(??) and f(????*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is
Directory of Open Access Journals (Sweden)
Yoojeong Seo
2018-01-01
Full Text Available The issue of detecting objects bottoming on the sea floor is significant in various fields including civilian and military areas. The objective of this study is to investigate the logistic regression model to discriminate the target from the clutter and to verify the possibility of applying the model trained by the simulated data generated by the mathematical model to the real experimental data because it is not easy to obtain sufficient data in the underwater field. In the first stage of this study, when the clutter signal energy is so strong that the detection of a target is difficult, the logistic regression model is employed to distinguish the strong clutter signal and the target signal. Previous studies have found that if the clutter energy is larger, false detection occurs even for the various existing detection schemes. For this reason, the discrete Fourier transform (DFT magnitude spectrum of acoustic signals received by active sonar is applied to train the model to distinguish whether the received signal contains a target signal or not. The goodness of fit of the model is verified in terms of receiver operation characteristic (ROC, area under ROC curve (AUC, and classification table. The detection performance of the proposed model is evaluated in terms of detection rate according to target to clutter ratio (TCR. Furthermore, the real experimental data are employed to test the proposed approach. When using the experimental data to test the model, the logistic regression model is trained by the simulated data that are generated based on the mathematical model for the backscattering of the cylindrical object. The mathematical model is developed according to the size of the cylinder used in the experiment. Since the information on the experimental environment including the sound speed, the sediment type and such is not available, once simulated data are generated under various conditions, valid simulated data are selected using 70% of the
A Linear Regression Model for Global Solar Radiation on Horizontal Surfaces at Warri, Nigeria
Directory of Open Access Journals (Sweden)
Michael S. Okundamiya
2013-10-01
Full Text Available The growing anxiety on the negative effects of fossil fuels on the environment and the global emission reduction targets call for a more extensive use of renewable energy alternatives. Efficient solar energy utilization is an essential solution to the high atmospheric pollution caused by fossil fuel combustion. Global solar radiation (GSR data, which are useful for the design and evaluation of solar energy conversion system, are not measured at the forty-five meteorological stations in Nigeria. The dearth of the measured solar radiation data calls for accurate estimation. This study proposed a temperature-based linear regression, for predicting the monthly average daily GSR on horizontal surfaces, at Warri (latitude 5.020N and longitude 7.880E an oil city located in the south-south geopolitical zone, in Nigeria. The proposed model is analyzed based on five statistical indicators (coefficient of correlation, coefficient of determination, mean bias error, root mean square error, and t-statistic, and compared with the existing sunshine-based model for the same study. The results indicate that the proposed temperature-based linear regression model could replace the existing sunshine-based model for generating global solar radiation data. Keywords: air temperature; empirical model; global solar radiation; regression analysis; renewable energy; Warri
Directory of Open Access Journals (Sweden)
Yoonsu Shin
2016-01-01
Full Text Available In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs and related network parameters (NPs using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we proposed multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.
A simulation study on Bayesian Ridge regression models for several collinearity levels
Efendi, Achmad; Effrihan
2017-12-01
When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
Gray-Davies, Tristan; Holmes, Chris C.; Caron, François
2018-01-01
We present a novel Bayesian nonparametric regression model for covariates X and continuous response variable Y ∈ ℝ. The model is parametrized in terms of marginal distributions for Y and X and a regression function which tunes the stochastic ordering of the conditional distributions F (y|x). By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled for the separate components of the model. This procedure can scale to very large datasets and allows for the use of standard, existing, software from Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we show an application of our approach to a US Census dataset, with over 1,300,000 data points and more than 100 covariates. PMID:29623150
Random regression models for daily feed intake in Danish Duroc pigs
DEFF Research Database (Denmark)
Strathe, Anders Bjerring; Mark, Thomas; Jensen, Just
The objective of this study was to develop random regression models and estimate covariance functions for daily feed intake (DFI) in Danish Duroc pigs. A total of 476201 DFI records were available on 6542 Duroc boars between 70 to 160 days of age. The data originated from the National test station......-year-season, permanent, and animal genetic effects. The functional form was based on Legendre polynomials. A total of 64 models for random regressions were initially ranked by BIC to identify the approximate order for the Legendre polynomials using AI-REML. The parsimonious model included Legendre polynomials of 2nd....... Eigenvalues of the genetic covariance function showed that 33% of genetic variability was explained by the individual genetic curve of the pigs. This proportion was covered by linear (27%) and quadratic (6%) coefficients. Genetic eigenfunctions revealed that altering the shape of the feed intake curve...
Zheng, Shimin; Rao, Uma; Bartolucci, Alfred A.; Singh, Karan P.
2011-01-01
Bartolucci et al.(2003) extended the distribution assumption from the normal (Lyles et al., 2000) to the elliptical contoured distribution (ECD) for random regression models used in analysis of longitudinal data accounting for both undetectable values and informative drop-outs. In this paper, the random regression models are constructed on the multivariate skew ECD. A real data set is used to illustrate that the skew ECDs can fit some unimodal continuous data better than the Gaussian distributions or more general continuous symmetric distributions when the symmetric distribution assumption is violated. Also, a simulation study is done for illustrating the model fitness from a variety of skew ECDs. The software we used is SAS/STAT, V. 9.13. PMID:21637734
International Nuclear Information System (INIS)
Shin, Ho Cheol; Park, Moon Ghu; You, Skin
2006-01-01
Recently, many on-line approaches to instrument channel surveillance (drift monitoring and fault detection) have been reported worldwide. On-line monitoring (OLM) method evaluates instrument channel performance by assessing its consistency with other plant indications through parametric or non-parametric models. The heart of an OLM system is the model giving an estimate of the true process parameter value against individual measurements. This model gives process parameter estimate calculated as a function of other plant measurements which can be used to identify small sensor drifts that would require the sensor to be manually calibrated or replaced. This paper describes an improvement of auto associative kernel regression (AAKR) by introducing a correlation coefficient weighting on kernel distances. The prediction performance of the developed method is compared with conventional auto-associative kernel regression
Obtaining adjusted prevalence ratios from logistic regression models in cross-sectional studies.
Bastos, Leonardo Soares; Oliveira, Raquel de Vasconcellos Carvalhaes de; Velasque, Luciane de Souza
2015-03-01
In the last decades, the use of the epidemiological prevalence ratio (PR) instead of the odds ratio has been debated as a measure of association in cross-sectional studies. This article addresses the main difficulties in the use of statistical models for the calculation of PR: convergence problems, availability of tools and inappropriate assumptions. We implement the direct approach to estimate the PR from binary regression models based on two methods proposed by Wilcosky & Chambless and compare with different methods. We used three examples and compared the crude and adjusted estimate of PR, with the estimates obtained by use of log-binomial, Poisson regression and the prevalence odds ratio (POR). PRs obtained from the direct approach resulted in values close enough to those obtained by log-binomial and Poisson, while the POR overestimated the PR. The model implemented here showed the following advantages: no numerical instability; assumes adequate probability distribution and, is available through the R statistical package.
Ordinal regression models to describe tourist satisfaction with Sintra's world heritage
Mouriño, Helena
2013-10-01
In Tourism Research, ordinal regression models are becoming a very powerful tool in modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering Tourist Survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focus on two main points: tourists' perception of the entrance fees; overall level of satisfaction with this heritage site. For attaining these goals, ordinal regression models were developed. We concluded that tourist's nationality was the only significant variable to describe the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge about Sintra's World Heritage status.
International Nuclear Information System (INIS)
Urrutia, J D; Bautista, L A; Baccay, E B
2014-01-01
The aim of this study was to develop mathematical models for estimating earthquake casualties such as death, number of injured persons, affected families and total cost of damage. To quantify the direct damages from earthquakes to human beings and properties given the magnitude, intensity, depth of focus, location of epicentre and time duration, the regression models were made. The researchers formulated models through regression analysis using matrices and used α = 0.01. The study considered thirty destructive earthquakes that hit the Philippines from the inclusive years 1968 to 2012. Relevant data about these said earthquakes were obtained from Philippine Institute of Volcanology and Seismology. Data on damages and casualties were gathered from the records of National Disaster Risk Reduction and Management Council. This study will be of great value in emergency planning, initiating and updating programs for earthquake hazard reduction in the Philippines, which is an earthquake-prone country.
International Nuclear Information System (INIS)
Mavromatidis, Lazaros Elias; Bykalyuk, Anna; Lequay, Hervé
2013-01-01
Highlights: ► Original software for composite dynamic envelope’s thermal performance forecasting. ► Construction of two hypothetical composite dynamic wall’s prototypes. ► Different simulation scenarios based on fractional factorial simulation design. ► Development of polynomial regression models. ► Validation and evaluation of polynomial regression models. - Abstract: The building envelope’s insulating efficiency is always a key element regarding the energy consumption control of the whole building. This article aims to propose a simple method based on classic and fractional factorial simulation plans to obtain regression models in the form of polynomial functions that link the angle, the thermal conductivity and the thickness of each envelope’s component to the overall wall’s thermal resistance. Original software that combines classic and novel modeling techniques has been used in order to have a precise and validated numerical investigation that focuses in a variety of possible composite dynamic wall’s configurations. For the purposes of this study, the combined radiation/conduction heat transfer finite volume numerical model was updated complex enough to predict the temperature distribution and heat transfer in composite envelopes for a variety of inclination angles. The model takes into account the coupling between the solid conduction of both solid and fibrous systems and the gaseous conduction and radiation. The radiation heat transfer through each insulating layer has been modeled via the two flux approximation in order to take into account both optically thick and optically thin materials, as well as potential reflective surfaces currently used on composite wall’s applications. Different simulation scenarios have been conceived according to basic fractional factorial simulation plans in order to obtain valid empirical polynomial functions. To validate this statistical forecast system, many simulation scenarios were carried out and
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Mohebbi, Mohammadreza; Wolfe, Rory; Jolley, Damien
2011-10-03
Analytic methods commonly used in epidemiology do not account for spatial correlation between observations. In regression analyses, omission of that autocorrelation can bias parameter estimates and yield incorrect standard error estimates. We used age standardised incidence ratios (SIRs) of esophageal cancer (EC) from the Babol cancer registry from 2001 to 2005, and extracted socioeconomic indices from the Statistical Centre of Iran. The following models for SIR were used: (1) Poisson regression with agglomeration-specific nonspatial random effects; (2) Poisson regression with agglomeration-specific spatial random effects. Distance-based and neighbourhood-based autocorrelation structures were used for defining the spatial random effects and a pseudolikelihood approach was applied to estimate model parameters. The Bayesian information criterion (BIC), Akaike's information criterion (AIC) and adjusted pseudo R2, were used for model comparison. A Gaussian semivariogram with an effective range of 225 km best fit spatial autocorrelation in agglomeration-level EC incidence. The Moran's I index was greater than its expected value indicating systematic geographical clustering of EC. The distance-based and neighbourhood-based Poisson regression estimates were generally similar. When residual spatial dependence was modelled, point and interval estimates of covariate effects were different to those obtained from the nonspatial Poisson model. The spatial pattern evident in the EC SIR and the observation that point estimates and standard errors differed depending on the modelling approach indicate the importance of accounting for residual spatial correlation in analyses of EC incidence in the Caspian region of Iran. Our results also illustrate that spatial smoothing must be applied with care.
Directory of Open Access Journals (Sweden)
Jolley Damien
2011-10-01
Full Text Available Abstract Background Analytic methods commonly used in epidemiology do not account for spatial correlation between observations. In regression analyses, omission of that autocorrelation can bias parameter estimates and yield incorrect standard error estimates. Methods We used age standardised incidence ratios (SIRs of esophageal cancer (EC from the Babol cancer registry from 2001 to 2005, and extracted socioeconomic indices from the Statistical Centre of Iran. The following models for SIR were used: (1 Poisson regression with agglomeration-specific nonspatial random effects; (2 Poisson regression with agglomeration-specific spatial random effects. Distance-based and neighbourhood-based autocorrelation structures were used for defining the spatial random effects and a pseudolikelihood approach was applied to estimate model parameters. The Bayesian information criterion (BIC, Akaike's information criterion (AIC and adjusted pseudo R2, were used for model comparison. Results A Gaussian semivariogram with an effective range of 225 km best fit spatial autocorrelation in agglomeration-level EC incidence. The Moran's I index was greater than its expected value indicating systematic geographical clustering of EC. The distance-based and neighbourhood-based Poisson regression estimates were generally similar. When residual spatial dependence was modelled, point and interval estimates of covariate effects were different to those obtained from the nonspatial Poisson model. Conclusions The spatial pattern evident in the EC SIR and the observation that point estimates and standard errors differed depending on the modelling approach indicate the importance of accounting for residual spatial correlation in analyses of EC incidence in the Caspian region of Iran. Our results also illustrate that spatial smoothing must be applied with care.
Directory of Open Access Journals (Sweden)
You Hailong
2014-01-01
Full Text Available Plasma etching process plays a critical role in semiconductor manufacturing. Because physical and chemical mechanisms involved in plasma etching are extremely complicated, models supporting process control are difficult to construct. This paper uses a 35-run D-optimal design to efficiently collect data under well planned conditions for important controllable variables such as power, pressure, electrode gap and gas flows of Cl2 and He and the response, etching rate, for building an empirical underlying model. Since the relationship between the control and response variables could be highly nonlinear, a generalized regression neural network is used to select important model variables and their combination effects and to fit the model. Compared with the response surface methodology, the proposed method has better prediction performance in training and testing samples. A success application of the model to control the plasma etching process demonstrates the effectiveness of the methods.
Significance tests to determine the direction of effects in linear regression models.
Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander
2015-02-01
Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. © 2014 The British Psychological Society.
A review of a priori regression models for warfarin maintenance dose prediction.
Directory of Open Access Journals (Sweden)
Ben Francis
Full Text Available A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.
A review of a priori regression models for warfarin maintenance dose prediction.
Francis, Ben; Lane, Steven; Pirmohamed, Munir; Jorgensen, Andrea
2014-01-01
A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.
Directory of Open Access Journals (Sweden)
Changhao Fan
2017-01-01
Full Text Available In modeling, only information from the deviation between the output of the support vector regression (SVR model and the training sample is considered, whereas the other prior information of the training sample, such as probability distribution information, is ignored. Probabilistic distribution information describes the overall distribution of sample data in a training sample that contains different degrees of noise and potential outliers, as well as helping develop a high-accuracy model. To mine and use the probability distribution information of a training sample, a new support vector regression model that incorporates probability distribution information weight SVR (PDISVR is proposed. In the PDISVR model, the probability distribution of each sample is considered as the weight and is then introduced into the error coefficient and slack variables of SVR. Thus, the deviation and probability distribution information of the training sample are both used in the PDISVR model to eliminate the influence of noise and outliers in the training sample and to improve predictive performance. Furthermore, examples with different degrees of noise were employed to demonstrate the performance of PDISVR, which was then compared with those of three SVR-based methods. The results showed that PDISVR performs better than the three other methods.
Gomes, Marcos José Timbó Lima; Cunto, Flávio; da Silva, Alan Ricardo
2017-09-01
Generalized Linear Models (GLM) with negative binomial distribution for errors, have been widely used to estimate safety at the level of transportation planning. The limited ability of this technique to take spatial effects into account can be overcome through the use of local models from spatial regression techniques, such as Geographically Weighted Poisson Regression (GWPR). Although GWPR is a system that deals with spatial dependency and heterogeneity and has already been used in some road safety studies at the planning level, it fails to account for the possible overdispersion that can be found in the observations on road-traffic crashes. Two approaches were adopted for the Geographically Weighted Negative Binomial Regression (GWNBR) model to allow discrete data to be modeled in a non-stationary form and to take note of the overdispersion of the data: the first examines the constant overdispersion for all the traffic zones and the second includes the variable for each spatial unit. This research conducts a comparative analysis between non-spatial global crash prediction models and spatial local GWPR and GWNBR at the level of traffic zones in Fortaleza/Brazil. A geographic database of 126 traffic zones was compiled from the available data on exposure, network characteristics, socioeconomic factors and land use. The models were calibrated by using the frequency of injury crashes as a dependent variable and the results showed that GWPR and GWNBR achieved a better performance than GLM for the average residuals and likelihood as well as reducing the spatial autocorrelation of the residuals, and the GWNBR model was more able to capture the spatial heterogeneity of the crash frequency. Copyright © 2017 Elsevier Ltd. All rights reserved.
A brief introduction to regression designs and mixed-effects modelling by a recent convert
DEFF Research Database (Denmark)
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word...
Knight, Rodney R.; Gain, W. Scott; Wolfe, William J.
2011-01-01
Predictive equations were developed using stepbackward regression for 19 ecologically relevant streamflow characteristics grouped in five major classes (magnitude, ratio, frequency, variability, and date) for use in the Tennessee and Cumberland River watersheds. Basin characteristics explain 50 percent or more of the variation for 10 of the 19 equations. Independent variables identified through stepbackward regression were statistically significant in 81 of 304 coefficients tested across 19 models (⬚ Ridge streams, similar hydrologic behavior for watersheds with widely varying degrees of forest cover, and distinct hydrologic profiles for streams in different geographic regions.
Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong
2017-12-28
Confounders can produce spurious associations between exposure and outcome in observational studies. For majority of epidemiologists, adjusting for confounders using logistic regression model is their habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and search the alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which logistic regression model and inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfied G-admissibility, adjusting for the set including all the confounders reduced the equivalent bias to the one containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal strategy was to adjust for the parent nodes of outcome, which
Directory of Open Access Journals (Sweden)
Yuanyuan Yu
2017-12-01
Full Text Available Abstract Background Confounders can produce spurious associations between exposure and outcome in observational studies. For majority of epidemiologists, adjusting for confounders using logistic regression model is their habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and search the alternative method. Methods Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which logistic regression model and inverse probability weighting based marginal structural model (IPW-based-MSM were compared. The “do-calculus” was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Results Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfied G-admissibility, adjusting for the set including all the confounders reduced the equivalent bias to the one containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal
Tan, Chao; Chen, Hui; Zhu, Wanping
2010-05-01
The study on the relationship between trace elements and diseases often need to build a classification/regression model. Furthermore, the accuracy of such a model is of particular importance and directly decides its applicability. The goal of this study is to explore the feasibility of applying boosting, i.e., a new strategy from machine learning, to model the relationship between trace elements and diseases. Two examples are employed to illustrate the technique in the applications of classification and regression, respectively. The first example involves the diagnosis of anorexia according to the concentrations of six elements (i.e. classification task). Decision stump and support vector machine are used as the weak/base algorithm and reference algorithm, respectively. The second example involves the prediction of breast cancer mortality based on the intake of trace elements (i.e. a regression task). In this regard, partial least squares is not only used as the weak/base algorithm, but also the reference algorithm. The results from both examples confirm the potential of boosting in modeling the relationship between trace elements and diseases.
Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro
2016-01-01
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
Cluster regression model and level fluctuation features of Van Lake, Turkey
Directory of Open Access Journals (Sweden)
Z. Şen
Full Text Available Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are the climate change, greenhouse effects and ozone layer depletions which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value, apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities that are indistinguishable from the original data.
Key words. Hydrology (hydrologic budget; stochastic processes · Meteorology and atmospheric dynamics (ocean-atmosphere interactions
Directory of Open Access Journals (Sweden)
Carlos Fernandez-Lozano
2016-12-01
Full Text Available The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
Heddam, Salim; Kisi, Ozgur
2018-04-01
In the present study, three types of artificial intelligence techniques, least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5T) are applied for modeling daily dissolved oxygen (DO) concentration using several water quality variables as inputs. The DO concentration and water quality variables data from three stations operated by the United States Geological Survey (USGS) were used for developing the three models. The water quality data selected consisted of daily measured of water temperature (TE, °C), pH (std. unit), specific conductance (SC, μS/cm) and discharge (DI cfs), are used as inputs to the LSSVM, MARS and M5T models. The three models were applied for each station separately and compared to each other. According to the results obtained, it was found that: (i) the DO concentration could be successfully estimated using the three models and (ii) the best model among all others differs from one station to another.
The use of regression for assessing a seasonal forecast model experiment
Benestad, Rasmus E.; Senan, Retish; Orsolini, Yvan
2016-11-01
We show how factorial regression can be used to analyse numerical model experiments, testing the effect of different model settings. We analysed results from a coupled atmosphere-ocean model to explore how the different choices in the experimental set-up influence the seasonal predictions. These choices included a representation of the sea ice and the height of top of the atmosphere, and the results suggested that the simulated monthly mean air temperatures poleward of the mid-latitudes were highly sensitivity to the specification of the top of the atmosphere, interpreted as the presence or absence of a stratosphere. The seasonal forecasts for the mid-latitudes to high latitudes were also sensitive to whether the model set-up included a dynamic or non-dynamic sea-ice representation, although this effect was somewhat less important than the role of the stratosphere. The air temperature in the tropics was insensitive to these choices.
Computational neural network regression model for Host based Intrusion Detection System
Directory of Open Access Journals (Sweden)
Sunil Kumar Gautam
2016-09-01
Full Text Available The current scenario of information gathering and storing in secure system is a challenging task due to increasing cyber-attacks. There exists computational neural network techniques designed for intrusion detection system, which provide security to single machine and entire network's machine. In this paper, we have used two types of computational neural network models, namely, Generalized Regression Neural Network (GRNN model and Multilayer Perceptron Neural Network (MPNN model for Host based Intrusion Detection System using log files that are generated by a single personal computer. The simulation results show correctly classified percentage of normal and abnormal (intrusion class using confusion matrix. On the basis of results and discussion, we found that the Host based Intrusion Systems Model (HISM significantly improved the detection accuracy while retaining minimum false alarm rate.
Multiple Linear Regression Model Based on Neural Network and Its Application in the MBR Simulation
Directory of Open Access Journals (Sweden)
Chunqing Li
2012-01-01
Full Text Available The computer simulation of the membrane bioreactor MBR has become the research focus of the MBR simulation. In order to compensate for the defects, for example, long test period, high cost, invisible equipment seal, and so forth, on the basis of conducting in-depth study of the mathematical model of the MBR, combining with neural network theory, this paper proposed a three-dimensional simulation system for MBR wastewater treatment, with fast speed, high efficiency, and good visualization. The system is researched and developed with the hybrid programming of VC++ programming language and OpenGL, with a multifactor linear regression model of affecting MBR membrane fluxes based on neural network, applying modeling method of integer instead of float and quad tree recursion. The experiments show that the three-dimensional simulation system, using the above models and methods, has the inspiration and reference for the future research and application of the MBR simulation technology.
Focused information criterion and model averaging based on weighted composite quantile regression
Xu, Ganggang
2013-08-13
We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non-parametric functions approximated by polynomial splines, we show that, under certain conditions, the asymptotic distribution of the frequentist model averaging WCQR-estimator of a focused parameter is a non-linear mixture of normal distributions. This asymptotic distribution is used to construct confidence intervals that achieve the nominal coverage probability. With properly chosen weights, the focused information criterion based WCQR estimators are not only robust to outliers and non-normal residuals but also can achieve efficiency close to the maximum likelihood estimator, without assuming the true error distribution. Simulation studies and a real data analysis are used to illustrate the effectiveness of the proposed procedure. © 2013 Board of the Foundation of the Scandinavian Journal of Statistics..
Joint Bayesian variable and graph selection for regression models with network-structured predictors
Peterson, C. B.; Stingo, F. C.; Vannucci, M.
2015-01-01
In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications since it allows the identification of pathways of functionally related genes or proteins which impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings, and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival. PMID:26514925
Estimating Compressive Strength of High Performance Concrete with Gaussian Process Regression Model
Directory of Open Access Journals (Sweden)
Nhat-Duc Hoang
2016-01-01
Full Text Available This research carries out a comparative study to investigate a machine learning solution that employs the Gaussian Process Regression (GPR for modeling compressive strength of high-performance concrete (HPC. This machine learning approach is utilized to establish the nonlinear functional mapping between the compressive strength and HPC ingredients. To train and verify the aforementioned prediction model, a data set containing 239 HPC experimental tests, recorded from an overpass construction project in Danang City (Vietnam, has been collected for this study. Based on experimental outcomes, prediction results of the GPR model are superior to those of the Least Squares Support Vector Machine and the Artificial Neural Network. Furthermore, GPR model is strongly recommended for estimating HPC strength because this method demonstrates good learning performance and can inherently express prediction outputs coupled with prediction intervals.
Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.
Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander
2017-01-01
Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.
Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.
2016-02-01
Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of food stuff, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development, for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm-1 were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares regression methods to develop bruise severity prediction models, and compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs helped improve prediction models by over 10%, in comparison with full spectrum-based models, as evaluated in terms of error of prediction (Root Mean Square Error of Cross-validation). PLS models to predict internal quality, such as sugar content and acidity were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential use of GA method to improve speed and accuracy of fruit quality prediction.
[Application of Land-use Regression Models in Spatial-temporal Differentiation of Air Pollution].
Wu, Jian-sheng; Xie, Wu-dan; Li, Jia-cheng
2016-02-15
With the rapid development of urbanization, industrialization and motorization, air pollution has become one of the most serious environmental problems in our country, which has negative impacts on public health and ecological environment. LUR model is one of the common methods simulating spatial-temporal differentiation of air pollution at city scale. It has broad application in Europe and North America, but not really in China. Based on many studies at home and abroad, this study started with the main steps to develop LUR model, including obtaining the monitoring data, generating variables, developing models, model validation and regression mapping. Then a conclusion was drawn on the progress of LUR models in spatial-temporal differentiation of air pollution. Furthermore, the research focus and orientation in the future were prospected, including highlighting spatial-temporal differentiation, increasing classes of model variables and improving the methods of model development. This paper was aimed to popularize the application of LUR model in China, and provide a methodological basis for human exposure, epidemiologic study and health risk assessment.
Belfatto, Antonella; Riboldi, Marco; Ciardo, Delia; Cattani, Federica; Cecconi, Agnese; Lazzari, Roberta; Jereczek-Fossa, Barbara Alicja; Orecchia, Roberto; Baroni, Guido; Cerveri, Pietro
2016-03-01
This paper describes a patient-specific mathematical model to predict the evolution of uterine cervical tumors at a macroscopic scale, during fractionated external radiotherapy. The model provides estimates of tumor regrowth and dead-cell reabsorption, incorporating the interplay between tumor regression rate and radiosensitivity, as a function of the tumor oxygenation level. Model parameters were estimated by minimizing the difference between predicted and measured tumor volumes, these latter being obtained from a set of 154 serial cone-beam computed tomography scans acquired on 16 patients along the course of the therapy. The model stratified patients according to two different estimated dynamics of dead-cell removal and to the predicted initial value of the tumor oxygenation. The comparison with a simpler model demonstrated an improvement in fitting properties of this approach (fitting error average value <5%, p < 0.01), especially in case of tumor late responses, which can hardly be handled by models entailing a constant radiosensitivity, failing to model changes from initial severe hypoxia to aerobic conditions during the treatment course. The model predictive capabilities suggest the need of clustering patients accounting for cancer cell line, tumor staging, as well as microenvironment conditions (e.g., oxygenation level).
A Logistic Regression Based Auto Insurance Rate-Making Model Designed for the Insurance Rate Reform
Directory of Open Access Journals (Sweden)
Zhengmin Duan
2018-02-01
Full Text Available Using a generalized linear model to determine the claim frequency of auto insurance is a key ingredient in non-life insurance research. Among auto insurance rate-making models, there are very few considering auto types. Therefore, in this paper we are proposing a model that takes auto types into account by making an innovative use of the auto burden index. Based on this model and data from a Chinese insurance company, we built a clustering model that classifies auto insurance rates into three risk levels. The claim frequency and the claim costs are fitted to select a better loss distribution. Then the Logistic Regression model is employed to fit the claim frequency, with the auto burden index considered. Three key findings can be concluded from our study. First, more than 80% of the autos with an auto burden index of 20 or higher belong to the highest risk level. Secondly, the claim frequency is better fitted using the Poisson distribution, however the claim cost is better fitted using the Gamma distribution. Lastly, based on the AIC criterion, the claim frequency is more adequately represented by models that consider the auto burden index than those do not. It is believed that insurance policy recommendations that are based on Generalized linear models (GLM can benefit from our findings.
International Nuclear Information System (INIS)
Nickerson, David M.; Madsen, Brooks C.
2005-01-01
Continuous monitoring of precipitation in East Central Florida has occurred since 1978 at a sampling site located on the University of Central Florida (UCF) campus. Monthly volume-weighted average (VWA) concentration for several major analytes that are present in precipitation samples was calculated from samples collected daily. Monthly VWA concentration and wet deposition of H + , NH 4 + , Ca 2+ , Mg 2+ , NO 3 - , Cl - and SO 4 2- were evaluated by a nonlinear regression (NLR) model that considered 10-year data (from 1978 to 1987) and 20-year data (from 1978 to 1997). Little change in the NLR parameter estimates was indicated among the 10-year and 20-year evaluations except for general decreases in the predicted trends from the 10-year to the 20-year fits. Box-Jenkins autoregressive integrated moving average (ARIMA) models with linear trend were considered as an alternative to the NLR models for these data. The NLR and ARIMA model forecasts for 1998 were compared to the actual 1998 data. For monthly VWA concentration values, the two models gave similar results. For the wet deposition values, the ARIMA models performed considerably better. - Autoregressive integrated moving average models of precipitation data are an improvement over nonlinear models for the prediction of precipitation chemistry composition
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Describing Growth Pattern of Bali Cows Using Non-linear Regression Models
Directory of Open Access Journals (Sweden)
Mohd. Hafiz A.W
2016-12-01
Full Text Available The objective of this study was to evaluate the best fit non-linear regression model to describe the growth pattern of Bali cows. Estimates of asymptotic mature weight, rate of maturing and constant of integration were derived from Brody, von Bertalanffy, Gompertz and Logistic models which were fitted to cross-sectional data of body weight taken from 74 Bali cows raised in MARDI Research Station Muadzam Shah Pahang. Coefficient of determination (R2 and residual mean squares (MSE were used to determine the best fit model in describing the growth pattern of Bali cows. Von Bertalanffy model was the best model among the four growth functions evaluated to determine the mature weight of Bali cattle as shown by the highest R2 and lowest MSE values (0.973 and 601.9, respectively, followed by Gompertz (0.972 and 621.2, respectively, Logistic (0.971 and 648.4, respectively and Brody (0.932 and 660.5, respectively models. The correlation between rate of maturing and mature weight was found to be negative in the range of -0.170 to -0.929 for all models, indicating that animals of heavier mature weight had lower rate of maturing. The use of non-linear model could summarize the weight-age relationship into several biologically interpreted parameters compared to the entire lifespan weight-age data points that are difficult and time consuming to interpret.
Adamkiewicz, Gary; Hsu, Hsiao-Hsien; Vallarino, Jose; Melly, Steven J; Spengler, John D; Levy, Jonathan I
2010-11-17
There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p GIS variables, and the regression model structure was robust to various model-building approaches. Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except the smallest roads that predominate in residential areas, as well as proximity to the airport terminal.
Regression and artificial neural network modeling for the prediction of gray leaf spot of maize.
Paul, P A; Munkvold, G P
2005-04-01
ABSTRACT Regression and artificial neural network (ANN) modeling approaches were combined to develop models to predict the severity of gray leaf spot of maize, caused by Cercospora zeae-maydis. In all, 329 cases consisting of environmental, cultural, and location-specific variables were collected for field plots in Iowa between 1998 and 2002. Disease severity on the ear leaf at the dough to dent plant growth stage was used as the response variable. Correlation and regression analyses were performed to select potentially useful predictor variables. Predictors from the best 9 of 80 regression models were used to develop ANN models. A random sample of 60% of the cases was used to train the networks, and 20% each for testing and validation. Model performance was evaluated based on coefficient of determination (R(2)) and mean square error (MSE) for the validation data set. The best models had R(2) ranging from 0.70 to 0.75 and MSE ranging from 174.7 to 202.8. The most useful predictor variables were hours of daily temperatures between 22 and 30 degrees C (85.50 to 230.50 h) and hours of nightly relative humidity >/=90% (122 to 330 h) for the period between growth stages V4 and V12, mean nightly temperature (65.26 to 76.56 degrees C) for the period between growth stages V12 and R2, longitude (90.08 to 95.14 degrees W), maize residue on the soil surface (0 to 100%), planting date (in day of the year; 112 to 182), and gray leaf spot resistance rating (2 to 7; based on a 1-to-9 scale, where 1 = most susceptible to 9 = most resistant).
International Nuclear Information System (INIS)
Wu, Jie; Wang, Jianzhou; Lu, Haiyan; Dong, Yao; Lu, Xiaoxiao
2013-01-01
Highlights: ► The seasonal and trend items of the data series are forecasted separately. ► Seasonal item in the data series is verified by the Kendall τ correlation testing. ► Different regression models are applied to the trend item forecasting. ► We examine the superiority of the combined models by the quartile value comparison. ► Paired-sample T test is utilized to confirm the superiority of the combined models. - Abstract: For an energy-limited economy system, it is crucial to forecast load demand accurately. This paper devotes to 1-week-ahead daily load forecasting approach in which load demand series are predicted by employing the information of days before being similar to that of the forecast day. As well as in many nonlinear systems, seasonal item and trend item are coexisting in load demand datasets. In this paper, the existing of the seasonal item in the load demand data series is firstly verified according to the Kendall τ correlation testing method. Then in the belief of the separate forecasting to the seasonal item and the trend item would improve the forecasting accuracy, hybrid models by combining seasonal exponential adjustment method (SEAM) with the regression methods are proposed in this paper, where SEAM and the regression models are employed to seasonal and trend items forecasting respectively. Comparisons of the quartile values as well as the mean absolute percentage error values demonstrate this forecasting technique can significantly improve the accuracy though models applied to the trend item forecasting are eleven different ones. This superior performance of this separate forecasting technique is further confirmed by the paired-sample T tests
Karami, K; Zerehdaran, S; Barzanooni, B; Lotfi, E
2017-12-01
1. The aim of the present study was to estimate genetic parameters for average egg weight (EW) and egg number (EN) at different ages in Japanese quail using multi-trait random regression (MTRR) models. 2. A total of 8534 records from 900 quail, hatched between 2014 and 2015, were used in the study. Average weekly egg weights and egg numbers were measured from second until sixth week of egg production. 3. Nine random regression models were compared to identify the best order of the Legendre polynomials (LP). The most optimal model was identified by the Bayesian Information Criterion. A model with second order of LP for fixed effects, second order of LP for additive genetic effects and third order of LP for permanent environmental effects (MTRR23) was found to be the best. 4. According to the MTRR23 model, direct heritability for EW increased from 0.26 in the second week to 0.53 in the sixth week of egg production, whereas the ratio of permanent environment to phenotypic variance decreased from 0.48 to 0.1. Direct heritability for EN was low, whereas the ratio of permanent environment to phenotypic variance decreased from 0.57 to 0.15 during the production period. 5. For each trait, estimated genetic correlations among weeks of egg production were high (from 0.85 to 0.98). Genetic correlations between EW and EN were low and negative for the first two weeks, but they were low and positive for the rest of the egg production period. 6. In conclusion, random regression models can be used effectively for analysing egg production traits in Japanese quail. Response to selection for increased egg weight would be higher at older ages because of its higher heritability and such a breeding program would have no negative genetic impact on egg production.
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1) . The program investigates all 2(p) - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
van Veen, S H C M; van Kleef, R C; van de Ven, W P M M; van Vliet, R C J A
2018-02-01
This study explores the predictive power of interaction terms between the risk adjusters in the Dutch risk equalization (RE) model of 2014. Due to the sophistication of this RE-model and the complexity of the associations in the dataset (N = ~16.7 million), there are theoretically more than a million interaction terms. We used regression tree modelling, which has been applied rarely within the field of RE, to identify interaction terms that statistically significantly explain variation in observed expenses that is not already explained by the risk adjusters in this RE-model. The interaction terms identified were used as additional risk adjusters in the RE-model. We found evidence that interaction terms can improve the prediction of expenses overall and for specific groups in the population. However, the prediction of expenses for some other selective groups may deteriorate. Thus, interactions can reduce financial incentives for risk selection for some groups but may increase them for others. Furthermore, because regression trees are not robust, additional criteria are needed to decide which interaction terms should be used in practice. These criteria could be the right incentive structure for risk selection and efficiency or the opinion of medical experts. Copyright © 2017 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
M. Saki
2013-03-01
Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With rising power of statistical techniques, geo-statistics and geographic information systems (GIS, the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of Logistic Regression Tree model to create potential habitat map of Astragalus verus. This species produces Tragacanth and has economic value. A stratified- random sampling was applied to 100 sites (50 presence- 50 absence of given species, and produced environmental and edaphic factors maps by using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by Logistic Regression Tree model and extended to the whole study area. The results indicated species occurrence has strong correlation with environmental factors such as mean daily temperature and clay, EC and organic carbon content of the soil. Species occurrence showed direct relationship with mean daily temperature and clay and organic carbon, and inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistics (κ and by area under Receiver Operating Characteristics curve based on independent test data set. Their values (kappa=0.9, Auc of ROC=0.96 indicated the high power of LRT to create potential habitat map on local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.
Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models
Directory of Open Access Journals (Sweden)
Chuanglin Fang
2015-11-01
Full Text Available Urban air pollution is one of the most visible environmental problems to have accompanied China’s rapid urbanization. Based on emission inventory data from 2014, gathered from 289 cities, we used Global and Local Moran’s I to measure the spatial autorrelation of Air Quality Index (AQI values at the city level, and employed Ordinary Least Squares (OLS, Spatial Lag Model (SAR, and Geographically Weighted Regression (GWR to quantitatively estimate the comprehensive impact and spatial variations of China’s urbanization process on air quality. The results show that a significant spatial dependence and heterogeneity existed in AQI values. Regression models revealed urbanization has played an important negative role in determining air quality in Chinese cities. The population, urbanization rate, automobile density, and the proportion of secondary industry were all found to have had a significant influence over air quality. Per capita Gross Domestic Product (GDP and the scale of urban land use, however, failed the significance test at 10% level. The GWR model performed better than global models and the results of GWR modeling show that the relationship between urbanization and air quality was not constant in space. Further, the local parameter estimates suggest significant spatial variation in the impacts of various urbanization factors on air quality.
Use of Poisson spatiotemporal regression models for the Brazilian Amazon Forest: malaria count data.
Achcar, Jorge Alberto; Martinez, Edson Zangiacomi; Souza, Aparecida Doniseti Pires de; Tachibana, Vilma Mayumi; Flores, Edilson Ferreira
2011-01-01
Malaria is a serious problem in the Brazilian Amazon region, and the detection of possible risk factors could be of great interest for public health authorities. The objective of this article was to investigate the association between environmental variables and the yearly registers of malaria in the Amazon region using bayesian spatiotemporal methods. We used Poisson spatiotemporal regression models to analyze the Brazilian Amazon forest malaria count for the period from 1999 to 2008. In this study, we included some covariates that could be important in the yearly prediction of malaria, such as deforestation rate. We obtained the inferences using a bayesian approach and Markov Chain Monte Carlo (MCMC) methods to simulate samples for the joint posterior distribution of interest. The discrimination of different models was also discussed. The model proposed here suggests that deforestation rate, the number of inhabitants per km², and the human development index (HDI) are important in the prediction of malaria cases. It is possible to conclude that human development, population growth, deforestation, and their associated ecological alterations are conducive to increasing malaria risk. We conclude that the use of Poisson regression models that capture the spatial and temporal effects under the bayesian paradigm is a good strategy for modeling malaria counts.
Regression Modeling of EDM Process for AISI D2 Tool Steel with RSM
Directory of Open Access Journals (Sweden)
Shakir M. Mousa
2018-01-01
Full Text Available In this paper, Response Surface Method (RSM is utilized to carry out an investigation of the impact of input parameters: electrode type (E.T. [Gr, Cu and CuW], pulse duration of current (Ip, pulse duration on time (Ton, and pulse duration off time (Toff on the surface finish in EDM operation. To approximate and concentrate the suggested second- order regression model is generally accepted for Surface Roughness Ra, a Central Composite Design (CCD is utilized for evaluating the model constant coefficients of the input parameters on Surface Roughness (Ra. Examinations were performed on AISI D2 tool steel. The important coefficients are gotten by achieving successfully an Analysis of Variance (ANOVA at the 5 % confidence interval. The outcomes discover that Surface Roughness (Ra is much more impacted by E.T., Ton, Toff, Ip and little of their interactions action or influence. To predict the average Surface Roughness (Ra, a mathematical regression model was developed. Furthermore, for saving in time, the created model could be utilized for the choice of the high levels in the EDM procedure. The model adequacy was extremely agreeable as the constant Coefficient of Determination (R2 is observed to be 99.72% and adjusted R2-measurement (R2adj 99.60%.
Pseudo-partial likelihood estimators for the Cox regression model with missing covariates.
Luo, Xiaodong; Tsai, Wei Yann; Xu, Qiang
2009-09-01
By embedding the missing covariate data into a left-truncated and right-censored survival model, we propose a new class of weighted estimating functions for the Cox regression model with missing covariates. The resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal. A simulation study demonstrates that, compared with the popular inverse-probability weighted estimators, the new estimators perform better when the observation probability is small and improve efficiency of estimating the missing covariate effects. Application to a practical example is reported.
On pseudo-values for regression analysis in competing risks models
DEFF Research Database (Denmark)
Graw, F; Gerds, Thomas Alexander; Schumacher, M
2009-01-01
For regression on state and transition probabilities in multi-state models Andersen et al. (Biometrika 90:15-27, 2003) propose a technique based on jackknife pseudo-values. In this article we analyze the pseudo-values suggested for competing risks models and prove some conjectures regarding...... their asymptotics (Klein and Andersen, Biometrics 61:223-229, 2005). The key is a second order von Mises expansion of the Aalen-Johansen estimator which yields an appropriate representation of the pseudo-values. The method is illustrated with data from a clinical study on total joint replacement. In the application...
Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza
2014-10-01
The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-12-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.
On pseudo-values for regression analysis in competing risks models
DEFF Research Database (Denmark)
Graw, F; Gerds, Thomas Alexander; Schumacher, M
2009-01-01
For regression on state and transition probabilities in multi-state models Andersen et al. (Biometrika 90:15-27, 2003) propose a technique based on jackknife pseudo-values. In this article we analyze the pseudo-values suggested for competing risks models and prove some conjectures regarding...... their asymptotics (Klein and Andersen, Biometrics 61:223-229, 2005). The key is a second order von Mises expansion of the Aalen-Johansen estimator which yields an appropriate representation of the pseudo-values. The method is illustrated with data from a clinical study on total joint replacement. In the application...... we consider for comparison the estimates obtained with the Fine and Gray approach (J Am Stat Assoc 94:496-509, 1999) and also time-dependent solutions of pseudo-value regression equations....
Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models
Fang, Chuanglin; Liu, Haimeng; Li, Guangdong; Sun, Dongqi; Miao, Zhuang
2015-01-01
Urban air pollution is one of the most visible environmental problems to have accompanied China’s rapid urbanization. Based on emission inventory data from 2014, gathered from 289 cities, we used Global and Local Moran’s I to measure the spatial autorrelation of Air Quality Index (AQI) values at the city level, and employed Ordinary Least Squares (OLS), Spatial Lag Model (SAR), and Geographically Weighted Regression (GWR) to quantitatively estimate the comprehensive impact and spatial variati...
Flexible regression models for ROC and risk analysis, with or without a gold standard
Branscum, AJ; Johnson, WO; Hanson, TE; Baron, AT
2015-01-01
A novel semiparametric regression model is developed for evaluating the covariate-specific accuracy of a continuous medical test or biomarker. Ideally, studies designed to estimate or compare medical test accuracy will use a separate, flawless gold-standard procedure to determine the true disease status of sampled individuals. We treat this as a special case of the more complicated and increasingly common scenario in which disease status is unknown because a gold-standard procedure does not e...
Forecasting Model for IPTV Service in Korea Using Bootstrap Ridge Regression Analysis
Lee, Byoung Chul; Kee, Seho; Kim, Jae Bum; Kim, Yun Bae
The telecom firms in Korea are taking new step to prepare for the next generation of convergence services, IPTV. In this paper we described our analysis on the effective method for demand forecasting about IPTV broadcasting. We have tried according to 3 types of scenarios based on some aspects of IPTV potential market and made a comparison among the results. The forecasting method used in this paper is the multi generation substitution model with bootstrap ridge regression analysis.
USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES
Directory of Open Access Journals (Sweden)
Constantin ANGHELACHE
2011-10-01
Full Text Available The article presents the fundamental aspects of the linear regression, as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, the homoscesasticity and heteroskedasticity. The use of econometrics instrument in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretation that can be drawn at this level.
Assessing the performance of variational methods for mixed logistic regression models
Czech Academy of Sciences Publication Activity Database
Rijmen, F.; Vomlel, Jiří
2008-01-01
Roč. 78, č. 8 (2008), s. 765-779 ISSN 0094-9655 R&D Projects: GA MŠk 1M0572 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Mixed models * Logistic regression * Variational methods * Lower bound approximation Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.353, year: 2008
An application of nonparametric Cox regression model in reliability analysis: A case study
Czech Academy of Sciences Publication Activity Database
Volf, Petr
2004-01-01
Roč. 40, č. 5 (2004), s. 639-648 ISSN 0023-5954 R&D Projects: GA ČR GA201/02/0049; GA ČR GA402/01/0539 Institutional research plan: CEZ:AV0Z1075907 Keywords : hazard rate * nonparametric regression * Cox model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004
Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.
2013-01-01
Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38% As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.
Poisson regression approach for modeling fatal injury rates amongst Malaysian workers
International Nuclear Information System (INIS)
Kamarulzaman Ibrahim; Heng Khai Theng
2005-01-01
Many safety studies are based on the analysis carried out on injury surveillance data. The injury surveillance data gathered for the analysis include information on number of employees at risk of injury in each of several strata where the strata are defined in terms of a series of important predictor variables. Further insight into the relationship between fatal injury rates and predictor variables may be obtained by the poisson regression approach. Poisson regression is widely used in analyzing count data. In this study, poisson regression is used to model the relationship between fatal injury rates and predictor variables which are year (1995-2002), gender, recording system and industry type. Data for the analysis were obtained from PERKESO and Jabatan Perangkaan Malaysia. It is found that the assumption that the data follow poisson distribution has been violated. After correction for the problem of over dispersion, the predictor variables that are found to be significant in the model are gender, system of recording, industry type, two interaction effects (interaction between recording system and industry type and between year and industry type). Introduction Regression analysis is one of the most popular
Jackman, Patrick; Sun, Da-Wen; Elmasry, Gamal
2012-08-01
A new algorithm for the conversion of device dependent RGB colour data into device independent L*a*b* colour data without introducing noticeable error has been developed. By combining a linear colour space transform and advanced multiple regression methodologies it was possible to predict L*a*b* colour data with less than 2.2 colour units of error (CIE 1976). By transforming the red, green and blue colour components into new variables that better reflect the structure of the L*a*b* colour space, a low colour calibration error was immediately achieved (ΔE(CAL) = 14.1). Application of a range of regression models on the data further reduced the colour calibration error substantially (multilinear regression ΔE(CAL) = 5.4; response surface ΔE(CAL) = 2.9; PLSR ΔE(CAL) = 2.6; LASSO regression ΔE(CAL) = 2.1). Only the PLSR models deteriorated substantially under cross validation. The algorithm is adaptable and can be easily recalibrated to any working computer vision system. The algorithm was tested on a typical working laboratory computer vision system and delivered only a very marginal loss of colour information ΔE(CAL) = 2.35. Colour features derived on this system were able to safely discriminate between three classes of ham with 100% correct classification whereas colour features measured on a conventional colourimeter were not. Copyright © 2012 Elsevier Ltd. All rights reserved.
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). From hallucination framework, many color face images are explained in YCbCr space. To reduce the time complexity of color face hallucination, we can be naturally described the color face imaged as tensors or multi-linear arrays. In addition, the error regression analysis is used to find the error estimation which can be obtained from the existing LR in tensor space. In learning process is from the mistakes in reconstruct face images of the training dataset by MPCA, then finding the relationship between input and error by regression analysis. In hallucinating process uses normal method by backprojection of MPCA, after that the result is corrected with the error estimation. In this contribution we show that our hallucination technique can be suitable for color face images both in RGB and YCbCr space. By using the MPCA subspace with error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.
Unconditional or Conditional Logistic Regression Model for Age-Matched Case-Control Data?
Kuo, Chia-Ling; Duan, Yinghui; Grady, James
2018-01-01
Matching on demographic variables is commonly used in case-control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case-control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case-control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls.
Nieto, P J García; Antón, J C Álvarez; Vilán, J A Vilán; García-Gonzalo, E
2015-05-01
The aim of this research work is to build a regression model of air quality by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (northern Spain) at a local scale. To accomplish the objective of this study, the experimental data set made up of nitrogen oxides (NO x ), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and dust (PM10) was collected over 3 years (2006-2008). The US National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the MARS technique, conclusions of this research work are exposed.
Model-free prediction and regression a transformation-based approach to inference
Politis, Dimitris N
2015-01-01
The Model-Free Prediction Principle expounded upon in this monograph is based on the simple notion of transforming a complex dataset to one that is easier to work with, e.g., i.i.d. or Gaussian. As such, it restores the emphasis on observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof, and yields optimal predictors in diverse settings such as regression and time series. Furthermore, the Model-Free Bootstrap takes us beyond point prediction in order to construct frequentist prediction intervals without resort to unrealistic assumptions such as normality. Prediction has been traditionally approached via a model-based paradigm, i.e., (a) fit a model to the data at hand, and (b) use the fitted model to extrapolate/predict future data. Due to both mathematical and computational constraints, 20th century statistical practice focused mostly on parametric models. Fortunately, with the advent of widely accessible powerful computing in the late 1970s, co...
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML).
Park, J; Lechevalier, D; Ak, R; Ferguson, M; Law, K H; Lee, Y-T T; Rachuri, S
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain.
A joint logistic regression and covariate-adjusted continuous-time Markov chain model.
Rubin, Maria Laura; Chan, Wenyaw; Yamal, Jose-Miguel; Robertson, Claudia Sue
2017-12-10
The use of longitudinal measurements to predict a categorical outcome is an increasingly common goal in research studies. Joint models are commonly used to describe two or more models simultaneously by considering the correlated nature of their outcomes and the random error present in the longitudinal measurements. However, there is limited research on joint models with longitudinal predictors and categorical cross-sectional outcomes. Perhaps the most challenging task is how to model the longitudinal predictor process such that it represents the true biological mechanism that dictates the association with the categorical response. We propose a joint logistic regression and Markov chain model to describe a binary cross-sectional response, where the unobserved transition rates of a two-state continuous-time Markov chain are included as covariates. We use the method of maximum likelihood to estimate the parameters of our model. In a simulation study, coverage probabilities of about 95%, standard deviations close to standard errors, and low biases for the parameter values show that our estimation method is adequate. We apply the proposed joint model to a dataset of patients with traumatic brain injury to describe and predict a 6-month outcome based on physiological data collected post-injury and admission characteristics. Our analysis indicates that the information provided by physiological changes over time may help improve prediction of long-term functional status of these severely ill subjects. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Zhang, Y J; Xue, F X; Bai, Z P
2017-03-06
The impact of maternal air pollution exposure on offspring health has received much attention. Precise and feasible exposure estimation is particularly important for clarifying exposure-response relationships and reducing heterogeneity among studies. Temporally-adjusted land use regression (LUR) models are exposure assessment methods developed in recent years that have the advantage of having high spatial-temporal resolution. Studies on the health effects of outdoor air pollution exposure during pregnancy have been increasingly carried out using this model. In China, research applying LUR models was done mostly at the model construction stage, and findings from related epidemiological studies were rarely reported. In this paper, the sources of heterogeneity and research progress of meta-analysis research on the associations between air pollution and adverse pregnancy outcomes were analyzed. The methods of the characteristics of temporally-adjusted LUR models were introduced. The current epidemiological studies on adverse pregnancy outcomes that applied this model were systematically summarized. Recommendations for the development and application of LUR models in China are presented. This will encourage the implementation of more valid exposure predictions during pregnancy in large-scale epidemiological studies on the health effects of air pollution in China.
Directory of Open Access Journals (Sweden)
Svetlana O. Musienko
2017-03-01
Full Text Available Objective to develop the economicmathematical model of the dependence of revenue on other balance sheet items taking into account the sectoral affiliation of the companies. Methods using comparative analysis the article studies the existing approaches to the construction of the company management models. Applying the regression analysis and the least squares method which is widely used for financial management of enterprises in Russia and abroad the author builds a model of the dependence of revenue on other balance sheet items taking into account the sectoral affiliation of the companies which can be used in the financial analysis and prediction of small enterprisesrsquo performance. Results the article states the need to identify factors affecting the financial management efficiency. The author analyzed scientific research and revealed the lack of comprehensive studies on the methodology for assessing the small enterprisesrsquo management while the methods used for large companies are not always suitable for the task. The systematized approaches of various authors to the formation of regression models describe the influence of certain factors on the company activity. It is revealed that the resulting indicators in the studies were revenue profit or the company relative profitability. The main drawback of most models is the mathematical not economic approach to the definition of the dependent and independent variables. Basing on the analysis it was determined that the most correct is the model of dependence between revenues and total assets of the company using the decimal logarithm. The model was built using data on the activities of the 507 small businesses operating in three spheres of economic activity. Using the presented model it was proved that there is direct dependence between the sales proceeds and the main items of the asset balance as well as differences in the degree of this effect depending on the economic activity of small
Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI
Energy Technology Data Exchange (ETDEWEB)
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim; Emberton, Mark [University College London, Research Department of Urology, London (United Kingdom); Kirkham, Alex; Allen, Clare [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)
2014-09-17
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. (orig.)
Evaluation of Regression and Neuro_Fuzzy Models in Estimating Saturated Hydraulic Conductivity
Directory of Open Access Journals (Sweden)
J. Behmanesh
2015-06-01
Full Text Available Study of soil hydraulic properties such as saturated and unsaturated hydraulic conductivity is required in the environmental investigations. Despite numerous research, measuring saturated hydraulic conductivity using by direct methods are still costly, time consuming and professional. Therefore estimating saturated hydraulic conductivity using rapid and low cost methods such as pedo-transfer functions with acceptable accuracy was developed. The purpose of this research was to compare and evaluate 11 pedo-transfer functions and Adaptive Neuro-Fuzzy Inference System (ANFIS to estimate saturated hydraulic conductivity of soil. In this direct, saturated hydraulic conductivity and physical properties in 40 points of Urmia were calculated. The soil excavated was used in the lab to determine its easily accessible parameters. The results showed that among existing models, Aimrun et al model had the best estimation for soil saturated hydraulic conductivity. For mentioned model, the Root Mean Square Error and Mean Absolute Error parameters were 0.174 and 0.028 m/day respectively. The results of the present research, emphasises the importance of effective porosity application as an important accessible parameter in accuracy of pedo-transfer functions. sand and silt percent, bulk density and soil particle density were selected to apply in 561 ANFIS models. In training phase of best ANFIS model, the R2 and RMSE were calculated 1 and 1.2×10-7 respectively. These amounts in the test phase were 0.98 and 0.0006 respectively. Comparison of regression and ANFIS models showed that the ANFIS model had better results than regression functions. Also Nuro-Fuzzy Inference System had capability to estimatae with high accuracy in various soil textures.
Energy Technology Data Exchange (ETDEWEB)
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim U.; Emberton, Mark [University College London, Research Department of Urology, Division of Surgery and Interventional Science, London (United Kingdom); Kirkham, Alex [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)
2015-09-15
To assess the interchangeability of zone-specific (peripheral-zone (PZ) and transition-zone (TZ)) multiparametric-MRI (mp-MRI) logistic-regression (LR) models for classification of prostate cancer. Two hundred and thirty-one patients (70 TZ training-cohort; 76 PZ training-cohort; 85 TZ temporal validation-cohort) underwent mp-MRI and transperineal-template-prostate-mapping biopsy. PZ and TZ uni/multi-variate mp-MRI LR-models for classification of significant cancer (any cancer-core-length (CCL) with Gleason > 3 + 3 or any grade with CCL ≥ 4 mm) were derived from the respective cohorts and validated within the same zone by leave-one-out analysis. Inter-zonal performance was tested by applying TZ models to the PZ training-cohort and vice-versa. Classification performance of TZ models for TZ cancer was further assessed in the TZ validation-cohort. ROC area-under-curve (ROC-AUC) analysis was used to compare models. The univariate parameters with the best classification performance were the normalised T2 signal (T2nSI) within the TZ (ROC-AUC = 0.77) and normalized early contrast-enhanced T1 signal (DCE-nSI) within the PZ (ROC-AUC = 0.79). Performance was not significantly improved by bi-variate/tri-variate modelling. PZ models that contained DCE-nSI performed poorly in classification of TZ cancer. The TZ model based solely on maximum-enhancement poorly classified PZ cancer. LR-models dependent on DCE-MRI parameters alone are not interchangeable between prostatic zones; however, models based exclusively on T2 and/or ADC are more robust for inter-zonal application. (orig.)
Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071
Warton, David I; Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
Gale, Ella M; Johns, Marcus A; Wirawan, Remigius H; Scott, Janet L
2017-07-21
Polysaccharides, such as cellulose, are often processed by dissolution in solvent mixtures, e.g. an ionic liquid (IL) combined with a dipolar aprotic co-solvent (CS) that the polymer does not dissolve in. A multi-walker, discrete-time, discrete-space 1-dimensional random walk can be applied to model solvation of a polymer in a multi-component solvent mixture. The number of IL pairs in a solvent mixture and the number of solvent shells formable, x, is associated with n, the model time-step, and N, the number of random walkers. The mean number of distinct sites visited is proportional to the amount of polymer soluble in a solution. By also fitting a polynomial regression model to the data, we can associate the random walk terms with chemical interactions between components and probe where the system deviates from a 1-D random walk. The 'frustration' between solvents shells is given as ln x in the random walk model and as a negative IL:IL interaction term in the regression model. This frustration appears in regime II of the random walk model (high volume fractions of IL) where walkers interfere with each other, and the system tends to its limiting behaviour. In the low concentration regime, (regime I) the solvent shells do not interact, and the system depends only on IL and CS terms. In both models (and both regimes), the system is almost entirely controlled by the volume available to solvation shells, and thus is a counting/space-filling problem, where the molar volume of the CS is important. Small deviations are observed when there is an IL-CS interaction. The use of two models, built on separate approaches, confirm these findings, demonstrating that this is a real effect and offering a route to identifying such systems. Specifically, the majority of CSs - such as dimethylformide - follow the random walk model, whilst 1-methylimidazole, dimethyl sulfoxide, 1,3-dimethyl-2-imidazolidinone and tetramethylurea offer a CS-mediated improvement and propylene carbonate
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can actually influence in the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01.The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB) resulting to a total of 96 observations for each variable. The data have undergone a logarithmic transformation particularly the Dependent Variable (y) to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable namely: Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), hence, can actually predict Real GDP (y). The regression analysis displays that 98.7% (coefficient of determination) of the Independent Variables can actually predict the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
Abad, Cesar C C; Barros, Ronaldo V; Bertuzzi, Romulo; Gagliardi, João F L; Lima-Silva, Adriano E; Lambert, Mike I; Pires, Flavio O
2016-06-01
The aim of this study was to verify the power of VO 2max , peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO 2max and PTV; 2) a constant submaximal run at 12 km·h -1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO 2max , PTV and RE) and adjusted variables (VO 2max 0.72 , PTV 0.72 and RE 0.60 ) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO 2max . Significant correlations (p 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV 0.72 and RE 0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PVT accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation.
Random regression analyses using B-splines to model growth of Australian Angus cattle
Directory of Open Access Journals (Sweden)
Meyer Karin
2005-09-01
Full Text Available Abstract Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days
Energy Technology Data Exchange (ETDEWEB)
Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.
2017-11-01
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions is examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales were examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Sun, Yanqing; Sun, Liuquan; Zhou, Jie
2013-07-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically model such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criteria for selecting the link function is proposed to provide better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
Brakebill, John W; Preston, Stephen D
2003-01-01
The U.S. Geological Survey has developed a methodology for statistically relating nutrient sources and land-surface characteristics to nutrient loads of streams. The methodology is referred to as SPAtially Referenced Regressions On Watershed attributes (SPARROW), and relates measured stream nutrient loads to nutrient sources using nonlinear statistical regression models. A spatially detailed digital hydrologic network of stream reaches, stream-reach characteristics such as mean streamflow, water velocity, reach length, and travel time, and their associated watersheds supports the regression models. This network serves as the primary framework for spatially referencing potential nutrient source information such as atmospheric deposition, septic systems, point-sources, land use, land cover, and agricultural sources and land-surface characteristics such as land use, land cover, average-annual precipitation and temperature, slope, and soil permeability. In the Chesapeake Bay watershed that covers parts of Delaware, Maryland, Pennsylvania, New York, Virginia, West Virginia, and Washington D.C., SPARROW was used to generate models estimating loads of total nitrogen and total phosphorus representing 1987 and 1992 land-surface conditions. The 1987 models used a hydrologic network derived from an enhanced version of the U.S. Environmental Protection Agency's digital River Reach File, and course resolution Digital Elevation Models (DEMs). A new hydrologic network was created to support the 1992 models by generating stream reaches representing surface-water pathways defined by flow direction and flow accumulation algorithms from higher resolution DEMs. On a reach-by-reach basis, stream reach characteristics essential to the modeling were transferred to the newly generated pathways or reaches from the enhanced River Reach File used to support the 1987 models. To complete the new network, watersheds for each reach were generated using the direction of surface-water flow derived
Cattani, Giorgio; Gaeta, Alessandra; di Menno di Bucchianico, Alessandro; de Santis, Antonella; Gaddi, Raffaela; Cusano, Mariacarmela; Ancona, Carla; Badaloni, Chiara; Forastiere, Francesco; Gariazzo, Claudio; Sozzi, Roberto; Inglessis, Marco; Silibello, Camillo; Salvatori, Elisabetta; Manes, Fausto; Cesaroni, Giulia; The Viias Study Group
2017-05-01
The health effects of long-term exposure to ultrafine particles (UFPs) are poorly understood. Data on spatial contrasts in ambient ultrafine particles (UFPs) concentrations are needed with fine resolution. This study aimed to assess the spatial variability of total particle number concentrations (PNC, a proxy for UFPs) in the city of Rome, Italy, using land use regression (LUR) models, and the correspondent exposure of population here living. PNC were measured using condensation particle counters at the building facade of 28 homes throughout the city. Three 7-day monitoring periods were carried out during cold, warm and intermediate seasons. Geographic Information System predictor variables, with buffers of varying size, were evaluated to model spatial variations of PNC. A stepwise forward selection procedure was used to develop a "base" linear regression model according to the European Study of Cohorts for Air Pollution Effects project methodology. Other variables were then included in more enhanced models and their capability of improving model performance was evaluated. Four LUR models were developed. Local variation in UFPs in the study area can be largely explained by the ratio of traffic intensity and distance to the nearest major road. The best model (adjusted R2 = 0.71; root mean square error = ±1,572 particles/cm³, leave one out cross validated R2 = 0.68) was achieved by regressing building and street configuration variables against residual from the "base" model, which added 3% more to the total variance explained. Urban green and population density in a 5,000 m buffer around each home were also relevant predictors. The spatial contrast in ambient PNC across the large conurbation of Rome, was successfully assessed. The average exposure of subjects living in the study area was 16,006 particles/cm³ (SD 2165 particles/cm³, range: 11,075-28,632 particles/cm³). A total of 203,886 subjects (16%) lives in Rome within 50 m from a high traffic road and they
Comparison of universal kriging and regression tree modelling for soil property mapping
Kempen, Bas
2013-04-01
Geostatistical modelling approaches have been dominating the field of digital soil mapping (DSM) since its inception in the early 1980s. In recent years, however, machine learning methods such as classification and regression trees, random forests, and neural networks have quickly gained popularity among researchers in the DSM community. The increased use of these methods has largely gone at the cost of geostatistical approaches. Despite the apparent shift in the application of DSM methods from geostatistics to machine learning, quantitative comparisons of the prediction performance of these methods are largely lacking. The aims of this research, therefore, are: i) to map two soil properties (topsoil organic matter content and thickness of the peat layer in the soil profile) using regression tree (RT) modelling and universal kriging (UK), and ii) to compare the prediction performance of these methods with independent data obtained by probability sampling. Using such data for validation does not only yield a statistically valid and unbiased estimates of the map accuracy, but it also allows a statistical comparison of the accuracies of the maps generated by the two methods. The topsoil organic matter content and the thickness of the peat layer were mapped for a 14,000 ha area in the province of Drenthe, The Netherlands. The calibration dataset contained soil property observations at 1,715 sites. The covariates used include layers derived from soil and paleogeography maps, land cover, relative elevation, drainage class, land reclamation period, elevation change, and historic land use. The validation dataset contained 125 observations selected by stratified simple random sampling of the study area. The root mean squared error (RMSE) of the soil organic matter map obtained by RT modelling was 0.603 log(%), that of the map obtained by UK 0.595 log(%). The difference in map accuracy was not significant (p = 0.377). The RMSE of the peat thickness map obtained by RT
DEFF Research Database (Denmark)
Xu, Man; Pinson, Pierre; Lu, Zongxiang
2016-01-01
Wind farm power curve modeling, which characterizes the relationship between meteorological variables and power production, is a crucial procedure for wind power forecasting. In many cases, power curve modeling is more impacted by the limited quality of input data rather than the stochastic nature...... of the energy conversion process. Such nature may be due the varying wind conditions, aging and state of the turbines, etc. And, an equivalent steady-state power curve, estimated under normal operating conditions with the intention to filter abnormal data, is not sufficient to solve the problem because...... of the lack of time adaptivity. In this paper, a refined local polynomial regression algorithm is proposed to yield an adaptive robust model of the time-varying scattered power curve for forecasting applications. The time adaptivity of the algorithm is considered with a new data-driven bandwidth selection...
A Spatial Disaster Assessment Model of Social Resilience Based on Geographically Weighted Regression
Directory of Open Access Journals (Sweden)
Hwikyung Chun
2017-12-01
Full Text Available Since avoiding the occurrence of natural disasters is difficult, building ‘resilient cities’ is gaining more attention as a common objective within urban communities. By enhancing community resilience, it is possible to minimize the direct and indirect losses from disasters. However, current studies have focused more on physical aspects, despite the fact that social aspects may have a closer relation to the inhabitants. The objective of this paper is to develop an assessment model for social resilience by measuring the heterogeneity of local indicators that are related to disaster risk. Firstly, variables were selected by investigating previous assessment models with statistical verification. Secondly, spatial heterogeneity was analyzed using the Geographically Weighted Regression (GWR method. A case study was then undertaken on a flood-prone area in the metropolitan city, Seoul, South Korea. Based on the findings, the paper proposes a new spatial disaster assessment model that can be used for disaster management at the local levels.
Linear Regression Model of the Ash Mass Fraction and Electrical Conductivity for Slovenian Honey
Directory of Open Access Journals (Sweden)
Mojca Jamnik
2008-01-01
Full Text Available Mass fraction of ash is a quality criterion for determining the botanical origin of honey. At present, this parameter is generally being replaced by the measurement of electrical conductivity (κ. The value κ depends on the ash and acid content of honey; the higher their content, the higher the resulting conductivity. A linear regression model for the relationship between ash and electrical conductivity has been established for Slovenian honey by analysing 290 samples of Slovenian honey (including acacia, lime, chestnut, spruce, fir, multifloral and mixed forest honeydew honey. The obtained model differs from the one proposed by the International Honey Commission (IHC in the slope, but not in the section part of the relation formula. Therefore, the Slovenian model is recommended when calculating the ash mass fraction from the results of electrical conductivity in samples of Slovenian honey.
Photovoltaic Array Condition Monitoring Based on Online Regression of Performance Model
DEFF Research Database (Denmark)
Spataru, Sergiu; Sera, Dezso; Kerekes, Tamas
2013-01-01
Photovoltaic (PV) system performance can be degraded by a series of factors affecting the PV generator, such as partial shadows, soiling, increased series resistance and shunting of the cells. This concern has led to greater interest in improving PV system operation and availability through...... regression modeling, from PV array production, plane-of-array irradiance, and module temperature measurements, acquired during an initial learning phase of the system. After the model has been parameterized automatically, the condition monitoring system enters the normal operation phase, where...... the performance model is used to predict the power output of the PV array. Utilizing the predicted and measured PV array output power values, the condition monitoring system is able to detect power losses above 5%, occurring in the PV array....
Austin, Peter C
2018-01-01
The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.
Groothuis-Oudshoorn, Catharina Gerarda Maria; Miedema, Henk M.E.
2006-01-01
A method for modeling the relationship of polychotomous health ratings with predictors such as area characteristics, the distance to a source of environmental contamination, or exposure to environmental pollutants is presented. The model combines elements of grouped regression and multilevel
Modeling group size and scalar stress by logistic regression from an archaeological perspective.
Directory of Open Access Journals (Sweden)
Gianmarco Alberti
Full Text Available Johnson's scalar stress theory, describing the mechanics of (and the remedies to the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout. Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132, while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170. The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
Estimating Phred scores of Illumina base calls by logistic regression and sparse modeling.
Zhang, Sheng; Wang, Bo; Wan, Lin; Li, Lei M
2017-07-11
Phred quality scores are essential for downstream DNA analysis such as SNP detection and DNA assembly. Thus a valid model to define them is indispensable for any base-calling software. Recently, we developed the base-caller 3Dec for Illumina sequencing platforms, which reduces base-calling errors by 44-69% compared to the existing ones. However, the model to predict its quality scores has not been fully investigated yet. In this study, we used logistic regression models to evaluate quality scores from predictive features, which include different aspects of the sequencing signals as well as local DNA contents. Sparse models were further obtained by three methods: the backward deletion with either AIC or BIC and the L 1 regularization learning method. The L 1 -regularized one was then compared with the Illumina scoring method. The L 1 -regularized logistic regression improves the empirical discrimination power by as large as 14 and 25% respectively for two kinds of preprocessed sequencing signals, compared to the Illumina scoring method. Namely, the L 1 method identifies more base calls of high fidelity. Computationally, the L 1 method can handle large dataset and is efficient enough for daily sequencing. Meanwhile, the logistic model resulted from BIC is more interpretable. The modeling suggested that the most prominent quenching pattern in the current chemistry of Illumina occurred at the dinucleotide "GT". Besides, nucleotides were more likely to be miscalled as the previous bases if the preceding ones were not "G". It suggested that the phasing effect of bases after "G" was somewhat different from those after other nucleotide types.
Kolasa-Wiecek, Alicja
2015-04-01
The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere, through a gradual decrease of the share of coal in the fuel mix and development of renewable energy sources. All evidence which completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling of GHG emissions which are generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emission, multiple stepwise regression model was applied. The modeling results of CO2 emissions demonstrate a high relationship (0.97) with the hard coal consumption variable. Adjustment coefficient of the model to actual data is high and equal to 95%. The backward step regression model, in the case of CH4 emission, indicated the presence of hard coal (0.66), peat and fuel wood (0.34), solid waste fuels, as well as other sources (-0.64) as the most important variables. The adjusted coefficient is suitable and equals R2=0.90. For N2O emission modeling the obtained coefficient of determination is low and equal to 43%. A significant variable influencing the amount of N2O emission is the peat and wood fuel consumption. Copyright © 2015. Published by Elsevier B.V.
Shirk, Andrew J; Landguth, Erin L; Cushman, Samuel A
2018-01-01
Anthropogenic migration barriers fragment many populations and limit the ability of species to respond to climate-induced biome shifts. Conservation actions designed to conserve habitat connectivity and mitigate barriers are needed to unite fragmented populations into larger, more viable metapopulations, and to allow species to track their climate envelope over time. Landscape genetic analysis provides an empirical means to infer landscape factors influencing gene flow and thereby inform such conservation actions. However, there are currently many methods available for model selection in landscape genetics, and considerable uncertainty as to which provide the greatest accuracy in identifying the true landscape model influencing gene flow among competing alternative hypotheses. In this study, we used population genetic simulations to evaluate the performance of seven regression-based model selection methods on a broad array of landscapes that varied by the number and type of variables contributing to resistance, the magnitude and cohesion of resistance, as well as the functional relationship between variables and resistance. We also assessed the effect of transformations designed to linearize the relationship between genetic and landscape distances. We found that linear mixed effects models had the highest accuracy in every way we evaluated model performance; however, other methods also performed well in many circumstances, particularly when landscape resistance was high and the correlation among competing hypotheses was limited. Our results provide guidance for which regression-based model selection methods provide the most accurate inferences in landscape genetic analysis and thereby best inform connectivity conservation actions. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Mapping soil organic carbon stocks by robust geostatistical and boosted regression models
Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz
2013-04-01
Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change in time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1'033 plots we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. For the statistical modeling a broad range of covariates were available: Climate data (e. g. precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse scale geological map were used to coarsely represent the parent material of the soils. The selection of important covariates for SOC stocks modeling out of a large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit linear regression model with spatially correlated errors. The large number of covariates was first reduced by LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates interaction terms of the latter were included in model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the
Mölter, A; Lindley, S; de Vocht, F; Simpson, A; Agius, R
2010-12-01
Over recent years land use regression (LUR) has become a frequently used method in air pollution exposure studies, as it can model intra-urban variation in pollutant concentrations at a fine spatial scale. However, very few studies have used the LUR methodology to also model the temporal variation in air pollution exposure. The aim of this study is to estimate annual mean NO(2) and PM(10) concentrations from 1996 to 2008 for Greater Manchester using land use regression models. The results from these models will be used in the Manchester Asthma and Allergy Study (MAAS) birth cohort to determine health effects of air pollution exposure. The Greater Manchester LUR model for 2005 was recalibrated using interpolated and adjusted NO(2) and PM(10) concentrations as dependent variables for 1996-2008. In addition, temporally resolved variables were available for traffic intensity and PM(10) emissions. To validate the resulting LUR models, they were applied to the locations of automatic monitoring stations and the estimated concentrations were compared against measured concentrations. The 2005 LUR models were successfully recalibrated, providing individual models for each year from 1996 to 2008. When applied to the monitoring stations the mean prediction error (MPE) for NO(2) concentrations for all stations and years was -0.8μg/m³ and the root mean squared error (RMSE) was 6.7μg/m³. For PM(10) concentrations the MPE was 0.8μg/m³ and the RMSE was 3.4μg/m³. These results indicate that it is possible to model temporal variation in air pollution through LUR with relatively small prediction errors. It is likely that most previous LUR studies did not include temporal variation, because they were based on short term monitoring campaigns and did not have historic pollution data. The advantage of this study is that it uses data from an air dispersion model, which provided concentrations for 2005 and 2010, and therefore allowed extrapolation over a longer time period
Wei, Jiawei
2011-07-01
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt
Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir
2018-04-01
Credit card is a convenient alternative replaced cash or cheque, and it is essential component for electronic and internet commerce. In this study, the researchers attempt to determine the relationship and significance variables between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt. The provided data covers 850 customers information. There are three methods that applied to the credit card debt data which are multiple linear regression (MLR) models, MLR models with least quartile difference (LQD) method and MLR models with mean absolute deviation method. After comparing among three methods, it is found that MLR model with LQD method became the best model with the lowest value of mean square error (MSE). According to the final model, it shows that the years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit debt. Meanwhile variables for age, level of education and other debt are negatively associated with amount of credit debt. This study may serve as a reference for the bank company by using robust methods, so that they could better understand their options and choice that is best aligned with their goals for inference regarding to the credit card debt.
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.
2015-11-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water.
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
International Nuclear Information System (INIS)
Rosli, A D; Baharudin, R; Hashim, H; Khairuzzaman, N A; Mohd Sampian, A F; Abdullah, N E; Kamaru'zzaman, M; Sulaiman, M S
2015-01-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water. (paper)
Using regression heteroscedasticity to model trends in the mean and variance of floods
Hecht, Jory; Vogel, Richard
2015-04-01
Changes in the frequency of extreme floods have been observed and anticipated in many hydrological settings in response to numerous drivers of environmental change, including climate, land cover, and infrastructure. To help decision-makers design flood control infrastructure in settings with non-stationary hydrological regimes, a parsimonious approach for detecting and modeling trends in extreme floods is needed. An approach using ordinary least squares (OLS) to fit a heteroscedastic regression model can accommodate nonstationarity in both the mean and variance of flood series while simultaneously offering a means of (i) analytically evaluating type I and type II trend detection errors, (ii) analytically generating expressions of uncertainty, such as confidence and prediction intervals, (iii) providing updated estimates of the frequency of floods exceeding the flood of record, (iv) accommodating a wide range of non-linear functions through ladder of powers transformations, and (v) communicating hydrological changes in a single graphical image. Previous research has shown that the two-parameter lognormal distribution can adequately model the annual maximum flood distribution of both stationary and non-stationary hydrological regimes in many regions of the United States. A simple logarithmic transformation of annual maximum flood series enables an OLS heteroscedastic regression modeling approach to be especially suitable for creating a non-stationary flood frequency distribution with parameters that are conditional upon time or physically meaningful covariates. While heteroscedasticity is often viewed as an impediment, we document how detecting and modeling heteroscedasticity presents an opportunity for characterizing both the conditional mean and variance of annual maximum floods. We introduce an approach through which variance trend models can be analytically derived from the behavior of residuals of the conditional mean flood model. Through case studies of
Directory of Open Access Journals (Sweden)
Samsuri Abdullah
2016-07-01
Full Text Available Air pollution in Peninsular Malaysia is dominated by particulate matter which is demonstrated by having the highest Air Pollution Index (API value compared to the other pollutants at most part of the country. Particulate Matter (PM10 forecasting models development is crucial because it allows the authority and citizens of a community to take necessary actions to limit their exposure to harmful levels of particulates pollution and implement protection measures to significantly improve air quality on designated locations. This study aims in improving the ability of MLR using PCs inputs for PM10 concentrations forecasting. Daily observations for PM10 in Kuala Terengganu, Malaysia from January 2003 till December 2011 were utilized to forecast PM10 concentration levels. MLR and PCR (using PCs input models were developed and the performance was evaluated using RMSE, NAE and IA. Results revealed that PCR performed better than MLR due to the implementation of PCA which reduce intricacy and eliminate data multi-collinearity.
International Nuclear Information System (INIS)
Halepoto, I.A.; Uqaili, M.A.
2014-01-01
Nowadays, due to power crisis, electricity demand forecasting is deemed an important area for socioeconomic development and proper anticipation of the load forecasting is considered essential step towards efficient power system operation, scheduling and planning. In this paper, we present STLF (Short Term Load Forecasting) using multiple regression techniques (i.e. linear, multiple linear, quadratic and exponential) by considering hour by hour load model based on specific targeted day approach with temperature variant parameter. The proposed work forecasts the future load demand correlation with linear and non-linear parameters (i.e. considering temperature in our case) through different regression approaches. The overall load forecasting error is 2.98% which is very much acceptable. From proposed regression techniques, Quadratic Regression technique performs better compared to than other techniques because it can optimally fit broad range of functions and data sets. The work proposed in this paper, will pave a path to effectively forecast the specific day load with multiple variance factors in a way that optimal accuracy can be maintained. (author)
Directory of Open Access Journals (Sweden)
Giuliano de Oliveira Freitas
2013-10-01
Full Text Available PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV, assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assorted among two phacoemulsification groups: one assigned to receive AcrySof®Toric intraocular lens (IOL in both eyes and another assigned to have AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: Significant negative correlation between the ratio of post- and preoperative Thibos APVratio and Alpins percentage of success (%Success was found (Spearman's ρ=-0.93; linear regression is given by the following equation: %Success = (-APVratio + 1.00x100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L
2012-08-07
Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with clinical suspicion of TB in tertiary health facilities in
Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana
2015-05-01
Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R(2) value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.
Prediction of hourly PM2.5 using a space-time support vector regression model
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
Directory of Open Access Journals (Sweden)
Katharina Galmbacher
Full Text Available A tumor promoting role of macrophages has been described for a transgenic murine breast cancer model. In this model tumor-associated macrophages (TAMs represent a major component of the leukocytic infiltrate and are associated with tumor progression. Shigella flexneri is a bacterial pathogen known to specificly induce apotosis in macrophages. To evaluate whether Shigella-induced removal of macrophages may be sufficient for achieving tumor regression we have developed an attenuated strain of S. flexneri (M90TDeltaaroA and infected tumor bearing mice. Two mouse models were employed, xenotransplantation of a murine breast cancer cell line and spontanous breast cancer development in MMTV-HER2 transgenic mice. Quantitative analysis of bacterial tumor targeting demonstrated that attenuated, invasive Shigella flexneri primarily infected TAMs after systemic administration. A single i.v. injection of invasive M90TDeltaaroA resulted in caspase-1 dependent apoptosis of TAMs followed by a 74% reduction in tumors of transgenic MMTV-HER-2 mice 7 days post infection. TAM depletion was sustained and associated with complete tumor regression.These data support TAMs as useful targets for antitumor therapy and highlight attenuated bacterial pathogens as potential tools.
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model
Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz
2016-01-01
When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
Roşu, M. M.; Tarbă, C. I.; Neagu, C.
2016-11-01
The current models for inventory management are complementary, but together they offer a large pallet of elements for solving complex problems of companies when wanting to establish the optimum economic order quantity for unfinished products, row of materials, goods etc. The main objective of this paper is to elaborate an automated decisional model for the calculus of the economic order quantity taking into account the price regressive rates for the total order quantity. This model has two main objectives: first, to determine the periodicity when to be done the order n or the quantity order q; second, to determine the levels of stock: lighting control, security stock etc. In this way we can provide the answer to two fundamental questions: How much must be ordered? When to Order? In the current practice, the business relationships with its suppliers are based on regressive rates for price. This means that suppliers may grant discounts, from a certain level of quantities ordered. Thus, the unit price of the products is a variable which depends on the order size. So, the most important element for choosing the optimum for the economic order quantity is the total cost for ordering and this cost depends on the following elements: the medium price per units, the stock cost, the ordering cost etc.
Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization
Directory of Open Access Journals (Sweden)
MUHIB ALI RAJPER
2016-07-01
Full Text Available In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature isnominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimizedby using Direct-searchmethod. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37 and 30.41%, respectively under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models have appeared to be a good estimator of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4 and 36.4, respectively, which have been calculated along with optimal values of selected operating parameters.
Zhao, Lue Ping; Bolouri, Hamid
2016-04-01
Maturing omics technologies enable researchers to generate high dimension omics data (HDOD) routinely in translational clinical studies. In the field of oncology, The Cancer Genome Atlas (TCGA) provided funding support to researchers to generate different types of omics data on a common set of biospecimens with accompanying clinical data and has made the data available for the research community to mine. One important application, and the focus of this manuscript, is to build predictive models for prognostic outcomes based on HDOD. To complement prevailing regression-based approaches, we propose to use an object-oriented regression (OOR) methodology to identify exemplars specified by HDOD patterns and to assess their associations with prognostic outcome. Through computing patient's similarities to these exemplars, the OOR-based predictive model produces a risk estimate using a patient's HDOD. The primary advantages of OOR are twofold: reducing the penalty of high dimensionality and retaining the interpretability to clinical practitioners. To illustrate its utility, we apply OOR to gene expression data from non-small cell lung cancer patients in TCGA and build a predictive model for prognostic survivorship among stage I patients, i.e., we stratify these patients by their prognostic survival risks beyond histological classifications. Identification of these high-risk patients helps oncologists to develop effective treatment protocols and post-treatment disease management plans. Using the TCGA data, the total sample is divided into training and validation data sets. After building up a predictive model in the training set, we compute risk scores from the predictive model, and validate associations of risk scores with prognostic outcome in the validation data (P-value=0.015). Copyright © 2016 Elsevier Inc. All rights reserved.
Robust inference in the negative binomial regression model with an application to falls data.
Aeberhard, William H; Cantoni, Eva; Heritier, Stephane
2014-12-01
A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well-known to be sensitive to model misspecifications, taking the form of patients falling much more than expected in such intervention studies where the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinsons disease to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.
A land use regression model incorporating data on industrial point source pollution.
Chen, Li; Wang, Yuming; Li, Peiwu; Ji, Yaqin; Kong, Shaofei; Li, Zhiyong; Bai, Zhipeng
2012-01-01
Advancing the understanding of the spatial aspects of air pollution in the city regional environment is an area where improved methods can be of great benefit to exposure assessment and policy support. We created land use regression (LUR) models for SO2, NO2 and PM10 for Tianjin, China. Traffic volumes, road networks, land use data, population density, meteorological conditions, physical conditions and satellite-derived greenness, brightness and wetness were used for predicting SO2, NO2 and PM10 concentrations. We incorporated data on industrial point sources to improve LUR model performance. In order to consider the impact of different sources, we calculated the PSIndex, LSIndex and area of different land use types (agricultural land, industrial land, commercial land, residential land, green space and water area) within different buffer radii (1 to 20 km). This method makes up for the lack of consideration of source impact based on the LUR model. Remote sensing-derived variables were significantly correlated with gaseous pollutant concentrations such as SO2 and NO2. R2 values of the multiple linear regression equations for SO2, NO2 and PM10 were 0.78, 0.89 and 0.84, respectively, and the RMSE values were 0.32, 0.18 and 0.21, respectively. Model predictions at validation monitoring sites went well with predictions generally within 15% of measured values. Compared to the relationship between dependent variables and simple variables (such as traffic variables or meteorological condition variables), the relationship between dependent variables and integrated variables was more consistent with a linear relationship. Such integration has a discernable influence on both the overall model prediction and health effects assessment on the spatial distribution of air pollution in the city region.
Directory of Open Access Journals (Sweden)
P. Arockia Jansi Rani
2010-08-01
Full Text Available Image compression is very important in reducing the costs of data storage and transmission in relatively slow channels. In this paper, a still image compression scheme driven by Self-Organizing Map with polynomial regression modeling and entropy coding, employed within the wavelet framework is presented. The image compressibility and interpretability are improved by incorporating noise reduction into the compression scheme. The implementation begins with the classical wavelet decomposition, quantization followed by Huffman encoder. The codebook for the quantization process is designed using an unsupervised learning algorithm and further modified using polynomial regression to control the amount of noise reduction. Simulation results show that the proposed method reduces bit rate significantly and provides better perceptual quality than earlier methods.
Differential measurement errors in zero-truncated regression models for count data.
Huang, Yih-Huei; Hwang, Wen-Han; Chen, Fei-Yin
2011-12-01
Measurement errors in covariates may result in biased estimates in regression analysis. Most methods to correct this bias assume nondifferential measurement errors-i.e., that measurement errors are independent of the response variable. However, in regression models for zero-truncated count data, the number of error-prone covariate measurements for a given observational unit can equal its response count, implying a situation of differential measurement errors. To address this challenge, we develop a modified conditional score approach to achieve consistent estimation. The proposed method represents a novel technique, with efficiency gains achieved by augmenting random errors, and performs well in a simulation study. The method is demonstrated in an ecology application. © 2011, The International Biometric Society.
Lim, Changwon
2015-03-30
Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.
Extending Participatory Sensing to Personal Exposure Using Microscopic Land Use Regression Models.
Dekoninck, Luc; Botteldooren, Dick; Int Panis, Luc
2017-05-31
Personal exposure is sensitive to the personal features and behavior of the individual, and including interpersonal variability will improve the health and quality of life evaluations. Participatory sensing assesses the spatial and temporal variability of environmental indicators and is used to quantify this interpersonal variability. Transferring the participatory sensing information to a specific study population is a basic requirement for epidemiological studies in the near future. We propose a methodology to reduce the void between participatory sensing and health research. Instantaneous microscopic land-use regression modeling (µLUR) is an innovative approach. Data science techniques extract the activity-specific and route-sensitive spatiotemporal variability from the data. A data workflow to prepare and apply µLUR models to any mobile population is presented. The µLUR technique and data workflow are illustrated with models for exposure to traffic related Black Carbon. The example µLURs are available for three micro-environments; bicycle, in-vehicle, and indoor. Instantaneous noise assessments supply instantaneous traffic information to the µLURs. The activity specific models are combined into an instantaneous personal exposure model for Black Carbon. An independent external validation reached a correlation of 0.65. The µLURs can be applied to simulated behavioral patterns of individuals in epidemiological cohorts for advanced health and policy research.
Land-use regression panel models of NO2 concentrations in Seoul, Korea
Kim, Youngkook; Guldmann, Jean-Michel
2015-04-01
Transportation and land-use activities are major air pollution contributors. Since their shares of emissions vary across space and time, so do air pollution concentrations. Despite these variations, panel data have rarely been used in land-use regression (LUR) modeling of air pollution. In addition, the complex interactions between traffic flows, land uses, and meteorological variables, have not been satisfactorily investigated in LUR models. The purpose of this research is to develop and estimate nitrogen dioxide (NO2) panel models based on the LUR framework with data for Seoul, Korea, accounting for the impacts of these variables, and their interactions with spatial and temporal dummy variables. The panel data vary over several scales: daily (24 h), seasonally (4), and spatially (34 intra-urban measurement locations). To enhance model explanatory power, wind direction and distance decay effects are accounted for. The results show that vehicle-kilometers-traveled (VKT) and solar radiation have statistically strong positive and negative impacts on NO2 concentrations across the four seasonal models. In addition, there are significant interactions with the dummy variables, pointing to VKT and solar radiation effects on NO2 concentrations that vary with time and intra-urban location. The results also show that residential, commercial, and industrial land uses, and wind speed, temperature, and humidity, all impact NO2 concentrations. The R2 vary between 0.95 and 0.98.
Ribeiro, Manuel Castro; Sousa, António Jorge; Pereira, Maria João
2016-05-01
The geographical distribution of health outcomes is influenced by socio-economic and environmental factors operating on different spatial scales. Geographical variations in relationships can be revealed with semi-parametric Geographically Weighted Poisson Regression (sGWPR), a model that can combine both geographically varying and geographically constant parameters. To decide whether a parameter should vary geographically, two models are compared: one in which all parameters are allowed to vary geographically and one in which all except the parameter being evaluated are allowed to vary geographically. The model with the lower corrected Akaike Information Criterion (AICc) is selected. Delivering model selection exclusively according to the AICc might hide important details in spatial variations of associations. We propose assisting the decision by using a Linear Model of Coregionalization (LMC). Here we show how LMC can refine sGWPR on ecological associations between socio-economic and environmental variables and low birth weight outcomes in the west-north-central region of Portugal. Copyright © 2016 Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
C. Makendran
2015-01-01
Full Text Available Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India is investing huge quantum of money on road construction every year, poor control over the quality of road construction and its subsequent maintenance is leading to the faster road deterioration. In this regard, it is essential that scientific maintenance procedures are to be evolved on the basis of performance of low volume flexible pavements. Considering the above, an attempt has been made in this research endeavor to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements exposed to least or nil routine maintenance. Distress data were collected from the low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the above collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for the practicing engineers maintaining flexible pavements on low volume roads.
Genetic evaluation of weekly body weight in Japanese quail using random regression models.
Karami, K; Zerehdaran, S; Tahmoorespur, M; Barzanooni, B; Lotfi, E
2017-02-01
1. A total of 11 826 records from 2489 quails, hatched between 2012 and 2013, were used to estimate genetic parameters for BW (body weight) of Japanese quail using random regression models. Weekly BW was measured from hatch until 49 d of age. WOMBAT software (University of New England, Australia) was used for estimating genetic and phenotypic parameters. 2. Nineteen models were evaluated to identify the best orders of Legendre polynomials. A model with Legendre polynomial of order 3 for additive genetic effect, order 3 for permanent environmental effects and order 1 for maternal permanent environmental effects was chosen as the best model. 3. According to the best model, phenotypic and genetic variances were higher at the end of the rearing period. Although direct heritability for BW reduced from 0.18 at hatch to 0.12 at 7 d of age, it gradually increased to 0.42 at 49 d of age. It indicates that BW at older ages is more controlled by genetic components in Japanese quail. 4. Phenotypic and genetic correlations between adjacent periods except hatching weight were more closely correlated than remote periods. The present results suggested that BW at earlier ages, especially at hatch, are different traits compared to BW at older ages. Therefore, BW at earlier ages could not be used as a selection criterion for improving BW at slaughter age.
A soft-sensing model for feedwater flow rate using fuzzy support vector regression
Energy Technology Data Exchange (ETDEWEB)
Na, Man Gyun; Yang, Heon Young; Lim, Dong Hyuk [Chosun University, Gwangju (Korea, Republic of)
2008-02-15
Most pressurized water reactors use Venturi flow meters to measure the feedwater flow rate. However, fouling phenomena, which allow corrosion products to accumulate and increase the differential pressure across the Venturi flow meter, can result in an overestimation of the flow rate. In this study, a soft-sensing model based on fuzzy support vector regression was developed to enable accurate on-line prediction of the feedwater flow rate. The available data was divided into two groups by fuzzy c-means clustering in order to reduce the training time. The data for training the soft-sensing model was selected from each data group with the aid of a subtractive clustering scheme because informative data increases the learning effect. The proposed soft-sensing model was confirmed with the real plant data of Yonggwang Nuclear Power Plant Unit 3. The root mean square error and relative maximum error of the model were quite small. Hence, this model can be used to validate and monitor existing hardware feedwater flow meters.
Modeling Source Water TOC Using Hydroclimate Variables and Local Polynomial Regression.
Samson, Carleigh C; Rajagopalan, Balaji; Summers, R Scott
2016-04-19
To control disinfection byproduct (DBP) formation in drinking water, an understanding of the source water total organic carbon (TOC) concentration variability can be critical. Previously, TOC concentrations in water treatment plant source waters have been modeled using streamflow data. However, the lack of streamflow data or unimpaired flow scenarios makes it difficult to model TOC. In addition, TOC variability under climate change further exacerbates the problem. Here we proposed a modeling approach based on local polynomial regression that uses climate, e.g. temperature, and land surface, e.g., soil moisture, variables as predictors of TOC concentration, obviating the need for streamflow. The local polynomial approach has the ability to capture non-Gaussian and nonlinear features that might be present in the relationships. The utility of the methodology is demonstrated using source water quality and climate data in three case study locations with surface source waters including river and reservoir sources. The models show good predictive skill in general at these locations, with lower skills at locations with the most anthropogenic influences in their streams. Source water TOC predictive models can provide water treatment utilities important information for making treatment decisions for DBP regulation compliance under future climate scenarios.
Extending Participatory Sensing to Personal Exposure Using Microscopic Land Use Regression Models
Directory of Open Access Journals (Sweden)
Luc Dekoninck
2017-05-01
Full Text Available Personal exposure is sensitive to the personal features and behavior of the individual, and including interpersonal variability will improve the health and quality of life evaluations. Participatory sensing assesses the spatial and temporal variability of environmental indicators and is used to quantify this interpersonal variability. Transferring the participatory sensing information to a specific study population is a basic requirement for epidemiological studies in the near future. We propose a methodology to reduce the void between participatory sensing and health research. Instantaneous microscopic land-use regression modeling (µLUR is an innovative approach. Data science techniques extract the activity-specific and route-sensitive spatiotemporal variability from the data. A data workflow to prepare and apply µLUR models to any mobile population is presented. The µLUR technique and data workflow are illustrated with models for exposure to traffic related Black Carbon. The example µLURs are available for three micro-environments; bicycle, in-vehicle, and indoor. Instantaneous noise assessments supply instantaneous traffic information to the µLURs. The activity specific models are combined into an instantaneous personal exposure model for Black Carbon. An independent external validation reached a correlation of 0.65. The µLURs can be applied to simulated behavioral patterns of individuals in epidemiological cohorts for advanced health and policy research.
Braak, ter C.J.F.
2009-01-01
This paper proposes a regression method, ROSCAS, which regularizes smart contrasts and sums of regression coefficients by an L1 penalty. The contrasts and sums are based on the sample correlation matrix of the predictors and are suggested by a latent variable regression model. The contrasts express
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen. Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed bums and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that consists of parametric and nonparametric regression. There are various approach techniques in nonparametric regression. One of the approach techniques is spline. Central Java is one of the most densely populated province in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric component. Therefore, the purpose of this paper is to determine the factors that in uence population density in Central Java using the semiparametric spline regression model. The result shows that the factors which in uence population density in Central Java is Family Planning (FP) active participants and district minimum wage.
Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Ali Asghar Besalatpour
2016-02-01
by 20-m digital elevation model (DEM. The data set was divided into two subsets of training and testing. The training subset was randomly chosen from 70% of the total set of the data and the remaining samples (30% of the data were used as the testing set. The correlation coefficient (r, mean square error (MSE, and error percentage (ERROR% between the measured and the predicted GMD values were used to evaluate the performance of the models. Results and Discussion: The description statistics showed that there was little variability in the sample distributions of the variables used in this study to develop the GMD prediction models, indicating that their values were all normally distributed. The constructed SVM model had better performance in predicting GMD compared to the traditional multiple linear regression model. The obtained MSE and r values for the developed SVM model for soil aggregate stability prediction were 0.005 and 0.86, respectively. The obtained ERROR% value for soil aggregate stability prediction using the SVM model was 10.7% while it was 15.7% for the regression model. The scatter plot figures also showed that the SVM model was more accurate in GMD estimation than the MLR model, since the predicted GMD values were closer in agreement with the measured values for most of the samples. The worse performance of the MLR model might be due to the larger amount of data that is required for developing a sustainable regression model compared to intelligent systems. Furthermore, only the linear effects of the predictors on the dependent variable can be extracted by linear models while in many cases the effects may not be linear in nature. Meanwhile, the SVM model is suitable for modelling nonlinear relationships and its major advantage is that the method can be developed without knowing the exact form of the analytical function on which the model should be built. All these indicate that the SVM approach would be a better choice for predicting soil aggregate
Canaza-Cayo, Ali William; Lopes, Paulo Sávio; da Silva, Marcos Vinicius Gualberto Barbosa; de Almeida Torres, Robledo; Martins, Marta Fonseca; Arbex, Wagner Antonio; Cobuci, Jaime Araujo
2015-01-01
A total of 32,817 test-day milk yield (TDMY) records of the first lactation of 4,056 Girolando cows daughters of 276 sires, collected from 118 herds between 2000 and 2011 were utilized to estimate the genetic parameters for TDMY via random regression models (RRM) using Legendre’s polynomial functions whose orders varied from 3 to 5. In addition, nine measures of persistency in milk yield (PSi) and the genetic trend of 305-day milk yield (305MY) were evaluated. The fit quality criteria used in...
A logistic regression model of Coronary Artery Disease among Male Patients in Punjab
Directory of Open Access Journals (Sweden)
Sohail Chand
2005-07-01
Full Text Available This is a cross-sectional retrospective study of 308 male patients, who were presented first time for coronary angiography at the Punjab Institute of Cardiology. The mean age was 50.97 + 9.9 among male patients. As the response variable coronary artery disease (CAD was a binary variable, logistic regression model was fitted to predict the Coronary Artery Disease with the help of significant risk factors. Age, Chest pain, Diabetes Mellitus, Smoking and Lipids are resulted as significant risk factors associated with CAD among male population.
A weakly informative default prior distribution for logistic and other regression models
Gelman, Andrew; Jakulin, Aleks; Pittau, Maria Grazia; Su, Yu-Sung
2008-01-01
We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-$t$ prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-ha...
Dikaios, Nikolaos; Alkalbani, Jokha; Abd-Alazeez, Mohamed; Sidhu, Harbir Singh; Kirkham, Alex; Ahmed, Hashim U; Emberton, Mark; Freeman, Alex; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit
2015-09-01
To assess the interchangeability of zone-specific (peripheral-zone (PZ) and transition-zone (TZ)) multiparametric-MRI (mp-MRI) logistic-regression (LR) models for classification of prostate cancer. Two hundred and thirty-one patients (70 TZ training-cohort; 76 PZ training-cohort; 85 TZ temporal validation-cohort) underwent mp-MRI and transperineal-template-prostate-mapping biopsy. PZ and TZ uni/multi-variate mp-MRI LR-models for classification of significant cancer (any cancer-core-length (CCL) with Gleason > 3 + 3 or any grade with CCL ≥ 4 mm) were derived from the respective cohorts and validated within the same zone by leave-one-out analysis. Inter-zonal performance was tested by applying TZ models to the PZ training-cohort and vice-versa. Classification performance of TZ models for TZ cancer was further assessed in the TZ validation-cohort. ROC area-under-curve (ROC-AUC) analysis was used to compare models. The univariate parameters with the best classification performance were the normalised T2 signal (T2nSI) within the TZ (ROC-AUC = 0.77) and normalized early contrast-enhanced T1 signal (DCE-nSI) within the PZ (ROC-AUC = 0.79). Performance was not significantly improved by bi-variate/tri-variate modelling. PZ models that contained DCE-nSI performed poorly in classification of TZ cancer. The TZ model based solely on maximum-enhancement poorly classified PZ cancer. LR-models dependent on DCE-MRI parameters alone are not interchangable between prostatic zones; however, models based exclusively on T2 and/or ADC are more robust for inter-zonal application. • The ADC and T2-nSI of benign/cancer PZ are higher than benign/cancer TZ. • DCE parameters are significantly different between benign PZ and TZ, but not between cancerous PZ and TZ. • Diagnostic models containing contrast enhancement parameters have reduced performance when applied across zones.
Optimization of end-members used in multiple linear regression geochemical mixing models
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
Robust brain ROI segmentation by deformation regression and deformable shape model.
Wu, Zhengwang; Guo, Yanrong; Park, Sang Hyun; Gao, Yaozong; Dong, Pei; Lee, Seong-Whan; Shen, Dinggang
2018-01-01
We propose a robust and efficient learning-based deformable model for segmenting regions of interest (ROIs) from structural MR brain images. Different from the conventional deformable-model-based methods that deform a shape model locally around the initialization location, we learn an image-based regressor to guide the deformable model to fit for the target ROI. Specifically, given any voxel in a new image, the image-based regressor can predict the displacement vector from this voxel towards the boundary of target ROI, which can be used to guide the deformable segmentation. By predicting the displacement vector maps for the whole image, our deformable model is able to use multiple non-boundary predictions to jointly determine and iteratively converge the initial shape model to the target ROI boundary, which is more robust to the local prediction error and initialization. In addition, by introducing the prior shape model, our segmentation avoids the isolated segmentations as often occurred in the previous multi-atlas-based methods. In order to learn an image-based regressor for displacement vector prediction, we adopt the following novel strategies in the learning procedure: (1) a joint classification and regression random forest is proposed to learn an image-based regressor together with an ROI classifier in a multi-task manner; (2) high-level context features are extracted from intermediate (estimated) displacement vector and classification maps to enforce the relationship between predicted displacement vectors at neighboring voxels. To validate our method, we compare it with the state-of-the-art multi-atlas-based methods and other learning-based methods on three public brain MR datasets. The results consistently show that our method is better in terms of both segmentation accuracy and computational efficiency. Copyright © 2017 Elsevier B.V. All rights reserved.