WorldWideScience

Sample records for linear regression relationship

  1. Return-Volatility Relationship: Insights from Linear and Non-Linear Quantile Regression

    NARCIS (Netherlands)

    D.E. Allen (David); A.K. Singh (Abhay); R.J. Powell (Robert); M.J. McAleer (Michael); J. Taylor (James); L. Thomas (Lyn)

    2013-01-01

    The purpose of this paper is to examine the asymmetric relationship between price and implied volatility and the associated extreme quantile dependence using a linear and non-linear quantile regression approach. Our goal in this paper is to demonstrate that the relationship between the

  2. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
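
    As an illustration of the least-squares method this record describes, the following Python sketch fits a line y = a + bx to a small dataset. The function name `fit_line` and the sample values are assumptions made for this example; they are not taken from the article.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: one predictor and one outcome.
predictor = [1, 2, 3, 4, 5]
outcome = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(predictor, outcome)
print(f"y = {intercept:.2f} + {slope:.2f} x")
```

    The sketch assumes the predictor values are not all identical; otherwise `sxx` is zero and the slope is undefined.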

  3. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
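
    Multicollinearity, one of the concepts named in this record, is commonly quantified with the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where VIF = 1 / (1 - r²) and r is the Pearson correlation between the two predictors. The function names and data are hypothetical and do not come from the article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values; their correlation is 0.6.
predictor_a = [1, 2, 3, 4]
predictor_b = [2, 1, 4, 3]
vif = vif_two_predictors(predictor_a, predictor_b)
print(f"VIF = {vif:.4f}")
```

    A VIF of 1 means the predictors are uncorrelated; the value grows without bound as the predictors approach an exact linear relationship.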

  4. Correlation and simple linear regression.

    Science.gov (United States)

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
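
    The contrast between the two coefficients can be sketched in Python. Spearman's rho is computed here as the Pearson correlation of the ranks, with tied values given their average rank. The helper names and the sample data are assumptions for illustration only, not the data set used in the article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (strength of linear association)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical monotone but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

    Because the relationship is perfectly monotone, Spearman's rho equals 1, while the Pearson coefficient stays below 1 because the relationship is not linear.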

  5. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  6. The Relationship between Economic Growth and Money Laundering – a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Daniel Rece

    2009-09-01

    This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report statistically analyzes data collected from the USA, Russia, Romania and eleven other European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering) is "explained" by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.

  7. Transmission of linear regression patterns between time series: from relationship in time series to complex networks.

    Science.gov (United States)

    Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui

    2014-07-01

    The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
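
    The idea of turning sliding-window regression results into a directed, weighted network can be sketched as follows. This is a simplified illustration, not the authors' algorithm: a pattern is defined here only by the interval into which the window's slope falls (the significance-test component is omitted), and the names `pattern_network`, `window`, and `bin_width` are assumptions.

```python
import math
from collections import Counter


def window_slope(xs, ys):
    """Least-squares slope of ys on xs within one window."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def pattern_network(xs, ys, window, bin_width):
    """Return (patterns, edges) for two series.

    patterns: slope-interval index for each sliding window (the nodes).
    edges: Counter mapping (pattern, next_pattern) to how often that
           transition occurs between adjacent windows (the edge weights).
    """
    patterns = []
    for start in range(len(xs) - window + 1):
        slope = window_slope(xs[start:start + window], ys[start:start + window])
        patterns.append(math.floor(slope / bin_width))
    edges = Counter(zip(patterns, patterns[1:]))
    return patterns, edges


# Hypothetical pair of series: y tracks x at first, then flattens.
series_x = [0, 1, 2, 3, 4, 5]
series_y = [0, 1, 2, 3, 3, 3]
patterns, edges = pattern_network(series_x, series_y, window=3, bin_width=0.4)
for (src, dst), weight in edges.items():
    print(f"pattern {src} -> pattern {dst}: weight {weight}")
```

    Weighted out-degree or betweenness centrality, as mentioned in the abstract, could then be computed from `edges` with a graph library of the reader's choice.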

  8. On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.

    Science.gov (United States)

    Pek, Jolynn; Chalmers, R Philip; Monette, Georges

    2016-01-01

    When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.

  9. Comparing Machine Learning Classifiers and Linear/Logistic Regression to Explore the Relationship between Hand Dimensions and Demographic Characteristics.

    Science.gov (United States)

    Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue

    2016-01-01

    Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.

  10. Biostatistics Series Module 6: Correlation and Linear Regression.

    Science.gov (United States)

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P [...] correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
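
    The coefficient of determination described in this record can be computed directly from a fitted line as one minus the ratio of residual to total variability. The following Python sketch uses hypothetical data and a hypothetical function name, `r_squared`; it is an illustration rather than code from the article.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left unexplained by the line, and total variability of y.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical paired measurements.
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared(x_values, y_values)
print(f"r squared = {r2:.4f}")
```

    For simple linear regression this value equals the square of Pearson's r between x and y.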

  11. [From clinical judgment to linear regression model].

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as its first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R²) indicates the importance of independent variables in the outcome.

  12. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2013-01-01

    Praise for the Third Edition "...this is an excellent book which could easily be used as a course text..." -International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus

  13. Reliability of the Load-Velocity Relationship Obtained Through Linear and Polynomial Regression Models to Predict the One-Repetition Maximum Load.

    Science.gov (United States)

    Pestaña-Melero, Francisco Luis; Haff, G Gregory; Rojas, Francisco Javier; Pérez-Castilla, Alejandro; García-Ramos, Amador

    2017-12-18

    This study aimed to compare the between-session reliability of the load-velocity relationship between (1) linear vs. polynomial regression models, (2) concentric-only vs. eccentric-concentric bench press variants, as well as (3) the within-participants vs. the between-participants variability of the velocity attained at each percentage of the one-repetition maximum (%1RM). The load-velocity relationship of 30 men (age: 21.2±3.8 y; height: 1.78±0.07 m, body mass: 72.3±7.3 kg; bench press 1RM: 78.8±13.2 kg) were evaluated by means of linear and polynomial regression models in the concentric-only and eccentric-concentric bench press variants in a Smith Machine. Two sessions were performed with each bench press variant. The main findings were: (1) first-order-polynomials (CV: 4.39%-4.70%) provided the load-velocity relationship with higher reliability than second-order-polynomials (CV: 4.68%-5.04%); (2) the reliability of the load-velocity relationship did not differ between the concentric-only and eccentric-concentric bench press variants; (3) the within-participants variability of the velocity attained at each %1RM was markedly lower than the between-participants variability. Taken together, these results highlight that, regardless of the bench press variant considered, the individual determination of the load-velocity relationship by a linear regression model could be recommended to monitor and prescribe the relative load in the Smith machine bench press exercise.

  14. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    Science.gov (United States)

    Gorgees, HazimMansoor; Mahdi, FatimahAssim

    2018-05-01

    This article is concerned with comparing the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than the other methods since it has a smaller mean square error (MSE) than the other stated methods.

  15. Recursive Algorithm For Linear Regression

    Science.gov (United States)

    Varanasi, S. V.

    1988-01-01

    Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.

  16. Relationships between the structure of wheat gluten and ACE inhibitory activity of hydrolysate: stepwise multiple linear regression analysis.

    Science.gov (United States)

    Zhang, Yanyan; Ma, Haile; Wang, Bei; Qu, Wenjuan; Wali, Asif; Zhou, Cunshan

    2016-08-01

    Ultrasound pretreatment of wheat gluten (WG) before enzymolysis can improve the angiotensin converting enzyme (ACE) inhibitory activity of the hydrolysates by altering the structure of substrate proteins. Establishment of a relationship between the structure of WG and ACE inhibitory activity of the hydrolysates to judge the end point of the ultrasonic pretreatment is vital. The results of stepwise multiple linear regression (MLR) showed that the contents of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil were significantly correlated to ACE inhibitory activity of the hydrolysate, with standard partial regression coefficients of 3.729, -0.676, -0.252, 0.022 and 0.156, respectively. The R² of this model was 0.970. External validation showed that the stepwise MLR model could well predict the ACE inhibitory activity of hydrolysate based on the content of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil of WG before hydrolysis. A stepwise multiple linear regression model describing the quantitative relationships between the structure of WG and the ACE inhibitory activity of the hydrolysates was established. This model can be used to predict the endpoint of the ultrasonic pretreatment. © 2015 Society of Chemical Industry.

  17. Distributed Monitoring of the R2 Statistic for Linear Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...

  18. Comparison of Classical Linear Regression and Orthogonal Regression According to the Sum of Squares Perpendicular Distances

    OpenAIRE

    KELEŞ, Taliha; ALTUN, Murat

    2016-01-01

    Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was the trivial presentation of the equation for orthogonal regression (OR) and the comparison of classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were shown by an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...
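
    The comparison this record describes can be reproduced on toy data. The Python sketch below fits both a classical least-squares line and an orthogonal regression line, then evaluates the sum of squared perpendicular distances for each. The orthogonal slope uses the standard closed form for equal error variances, which requires a nonzero cross-product sum; all function names and data are assumptions for illustration and are not taken from the study.

```python
import math


def centered_sums(xs, ys):
    """Return means and centered sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def classical_line(xs, ys):
    """Classical regression: minimizes squared vertical distances."""
    mean_x, mean_y, sxx, _, sxy = centered_sums(xs, ys)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def orthogonal_line(xs, ys):
    """Orthogonal regression: minimizes squared perpendicular distances."""
    mean_x, mean_y, sxx, syy, sxy = centered_sums(xs, ys)
    diff = syy - sxx
    # Closed-form slope; assumes sxy is not zero.
    slope = (diff + math.sqrt(diff * diff + 4 * sxy * sxy)) / (2 * sxy)
    return mean_y - slope * mean_x, slope


def perpendicular_ss(xs, ys, intercept, slope):
    """Sum of squared perpendicular distances from the points to the line."""
    vertical_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return vertical_ss / (1 + slope * slope)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
clr = classical_line(xs, ys)
orth = orthogonal_line(xs, ys)
clr_ss = perpendicular_ss(xs, ys, *clr)
orth_ss = perpendicular_ss(xs, ys, *orth)
print(f"CLR: slope {clr[1]:.4f}, perpendicular SS {clr_ss:.6f}")
print(f"OR:  slope {orth[1]:.4f}, perpendicular SS {orth_ss:.6f}")
```

    By construction the orthogonal line can never have a larger sum of squared perpendicular distances than the classical line, which is consistent with the finding reported in the record.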

  19. Common pitfalls in statistical analysis: Linear regression analysis

    Directory of Open Access Journals (Sweden)

    Rakesh Aggarwal

    2017-01-01

    In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.

  20. Discriminative Elastic-Net Regularized Linear Regression.

    Science.gov (United States)

    Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen

    2017-03-01

    In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to perform the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets well demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB code for our methods is available at http://www.yongxu.org/lunwen.html.

  1. Non-Linear Relationship between Economic Growth and CO₂ Emissions in China: An Empirical Study Based on Panel Smooth Transition Regression Models.

    Science.gov (United States)

    Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi

    2017-12-13

    The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO₂ emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO₂ emissions is significantly higher than those of GDPpc and Es on per capita CO₂ emissions.

  2. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  3. On macroeconomic values investigation using fuzzy linear regression analysis

    Directory of Open Access Journals (Sweden)

    Richard Pospíšil

    2017-06-01

    The theoretical background for abstract formalization of the vague phenomenon of complex systems is the fuzzy set theory. In the paper, vague data is defined as specialized fuzzy sets - fuzzy numbers, and a fuzzy linear regression model is described as a fuzzy function with fuzzy numbers as vague parameters. To identify the fuzzy coefficients of the model, the genetic algorithm is used. The linear approximation of the vague function together with its possibility area is analytically and graphically expressed. A suitable application is performed in the tasks of the time series fuzzy regression analysis. The time-trend and seasonal cycles including their possibility areas are calculated and expressed. The examples are presented from the economy field, namely the time-development of unemployment, agricultural production and construction respectively between 2009 and 2011 in the Czech Republic. The results are shown in the form of the fuzzy regression models of variables of time series. For the period 2009-2011, the analysis assumptions about seasonal behaviour of variables and the relationship between them were confirmed; in 2010, the system behaved fuzzier and the relationships between the variables were vaguer, which has many causes, from the different elasticity of demand, through state interventions, to globalization and transnational impacts.

  4. Adaptive regression for modeling nonlinear relationships

    CERN Document Server

    Knafl, George J

    2016-01-01

    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  5. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    Science.gov (United States)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of a multiple linear regression model and the fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analysis of normality and multicollinearity indicates that the data are normally scattered without multicollinearity among independent variables. Fuzzy c-means analysis clusters the yield of paddy into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model, with a lower mean square error.

  6. Piecewise linear regression splines with hyperbolic covariates

    International Nuclear Information System (INIS)

    Cologne, John B.; Sposto, Richard

    1992-09-01

    Consider the problem of fitting a curve to data that exhibit a multiphase linear response with smooth transitions between phases. We propose substituting hyperbolas as covariates in piecewise linear regression splines to obtain curves that are smoothly joined. The method provides an intuitive and easy way to extend the two-phase linear hyperbolic response model of Griffiths and Miller and Watts and Bacon to accommodate more than two linear segments. The resulting regression spline with hyperbolic covariates may be fit by nonlinear regression methods to estimate the degree of curvature between adjoining linear segments. The added complexity of fitting nonlinear, as opposed to linear, regression models is not great. The extra effort is particularly worthwhile when investigators are unwilling to assume that the slope of the response changes abruptly at the join points. We can also estimate the join points (the values of the abscissas where the linear segments would intersect if extrapolated) if their number and approximate locations may be presumed known. An example using data on changing age at menarche in a cohort of Japanese women illustrates the use of the method for exploratory data analysis. (author)

  7. Post-processing through linear regression

    Science.gov (United States)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.

  8. Post-processing through linear regression

    Directory of Open Access Journals (Sweden)

    B. Van Schaeybroeck

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

    These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread should be preferred.

  9. Learning a Nonnegative Sparse Graph for Linear Regression.

    Science.gov (United States)

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung

    2015-09-01

    Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning a NNSG for linear regression was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perceptiveness for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.

  10. Removing Malmquist bias from linear regressions

    Science.gov (United States)

    Verter, Frances

    1993-01-01

    Malmquist bias is present in all astronomical surveys where sources are observed above an apparent brightness threshold. Those sources which can be detected at progressively larger distances are progressively more limited to the intrinsically luminous portion of the true distribution. This bias does not distort any of the measurements, but distorts the sample composition. We have developed the first treatment to correct for Malmquist bias in linear regressions of astronomical data. A demonstration of the corrected linear regression that is computed in four steps is presented.

  11. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
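
    The five slope estimators named in this abstract can be sketched in a few lines. The following is a minimal illustration, assuming the commonly quoted closed-form expressions for the bisector, orthogonal, and reduced major-axis slopes; the function name `five_slopes` and the dictionary keys are hypothetical, and the uncertainty formulas discussed in the paper are not reproduced here.

```python
import numpy as np

def five_slopes(x, y):
    """Slope estimates for five symmetric/asymmetric line fits (illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    sxx, syy, sxy = np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy)

    b1 = sxy / sxx  # OLS regression of Y on X
    b2 = syy / sxy  # OLS regression of X on Y, expressed as dy/dx
    # Line bisecting the angle between the two OLS lines
    bisector = (b1 * b2 - 1.0 + np.sqrt((1.0 + b1**2) * (1.0 + b2**2))) / (b1 + b2)
    # Orthogonal regression (minimizes perpendicular distances)
    d = b2 - 1.0 / b1
    orthogonal = 0.5 * (d + np.sign(sxy) * np.sqrt(4.0 + d**2))
    # Reduced major axis: geometric mean of the two OLS slopes
    rma = np.sign(sxy) * np.sqrt(b1 * b2)

    return {
        "ols_y_on_x": b1,
        "ols_x_on_y": b2,
        "bisector": bisector,
        "orthogonal": orthogonal,
        "reduced_major_axis": rma,
    }

# Each intercept follows from the slope: intercept = mean(y) - slope * mean(x).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=0.5, size=200)
slopes_demo = five_slopes(x_demo, y_demo)
```

    On noise-free data lying exactly on a line, all five estimates coincide; with scatter they diverge, which is why the choice of method matters.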

  12. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis

    Science.gov (United States)

    Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried

    2018-03-01

    This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease ( P 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimize planting dates for improved radiation use efficiency in the study area.

  13. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  14. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    Science.gov (United States)

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.

  15. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis.

    Science.gov (United States)

    Oguntunde, Philip G; Lischeid, Gunnar; Dietrich, Ottfried

    2018-03-01

    This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease (P  1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimize planting dates for improved radiation use efficiency in the study area.

  16. Use of probabilistic weights to enhance linear regression myoelectric control.

    Science.gov (United States)

    Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J

    2015-12-01

    Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts' law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

  17. Linear regression crash prediction models : issues and proposed solutions.

    Science.gov (United States)

    2010-05-01

    The paper develops a linear regression model approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novel data aggregation to satisfy linear regression assumptions; namely error structure normality ...

  18. Linear regression and the normality assumption.

    Science.gov (United States)

    Schmidt, Amand F; Finan, Chris

    2017-12-16

    Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings, such transformations are often unnecessary and, worse, may bias model estimates. Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage, i.e., the number of times the 95% confidence interval included the true slope coefficient. Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to unbiasedly estimate standard errors, and hence confidence intervals and P-values. However, in large sample sizes (e.g., where the number of observations per variable is >10) violations of this normality assumption often do not noticeably impact results. In contrast, assumptions on the parametric model, absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large sample size settings. Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and, worse, may bias estimates due to the practice of outcome transformations. Copyright © 2017 Elsevier Inc. All rights reserved.
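
    The coverage criterion described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the sample size, the skewed (centered exponential) error distribution, the true slope, and the function name `slope_coverage` are all assumptions chosen for illustration, and the interval uses the normal quantile 1.96 as a large-sample approximation to the t quantile.

```python
import numpy as np

def slope_coverage(n=500, n_sims=1000, true_slope=0.5, seed=0):
    """Fraction of simulated 95% intervals that contain the true slope."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        # Strongly skewed, mean-zero errors: the normality assumption is violated
        errors = rng.exponential(scale=1.0, size=n) - 1.0
        y = 1.0 + true_slope * x + errors

        dx = x - x.mean()
        sxx = np.sum(dx**2)
        slope = np.sum(dx * (y - y.mean())) / sxx
        intercept = y.mean() - slope * x.mean()
        resid = y - intercept - slope * x
        # Classical OLS standard error of the slope
        se = np.sqrt(np.sum(resid**2) / (n - 2) / sxx)

        lower, upper = slope - 1.96 * se, slope + 1.96 * se
        hits += int(lower <= true_slope <= upper)
    return hits / n_sims

coverage = slope_coverage()
```

    Under these assumed settings the estimated coverage should sit close to the nominal 95% despite the non-normal errors, in line with the abstract's argument for large samples.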

  19. Controlling attribute effect in linear regression

    KAUST Repository

    Calders, Toon; Karim, Asim A.; Kamiran, Faisal; Ali, Wasif Mohammad; Zhang, Xiangliang

    2013-01-01

    In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models. © 2013 IEEE.

  20. Controlling attribute effect in linear regression

    KAUST Repository

    Calders, Toon

    2013-12-01

    In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models. © 2013 IEEE.

  1. Determination of regression laws: Linear and nonlinear

    International Nuclear Information System (INIS)

    Onishchenko, A.M.

    1994-01-01

    A detailed mathematical determination of regression laws is presented in the article. Particular emphasis is placed on determining the laws of X_j on X_l to account for source nuclei decay and detector errors in nuclear physics instrumentation. Both linear and nonlinear relations are presented. Linearization of 19 functions is tabulated, including graph, relation, variable substitution, obtained linear function, and remarks. 6 refs., 1 tab

  2. Evaluation of Linear Regression Simultaneous Myoelectric Control Using Intramuscular EMG.

    Science.gov (United States)

    Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J

    2016-04-01

    The objective of this study was to evaluate the ability of linear regression models to decode patterns of muscle coactivation from intramuscular electromyogram (EMG) and provide simultaneous myoelectric control of a virtual 3-DOF wrist/hand system. Performance was compared to the simultaneous control of conventional myoelectric prosthesis methods using intramuscular EMG (parallel dual-site control)-an approach that requires users to independently modulate individual muscles in the residual limb, which can be challenging for amputees. Linear regression control was evaluated in eight able-bodied subjects during a virtual Fitts' law task and was compared to performance of eight subjects using parallel dual-site control. An offline analysis also evaluated how different types of training data affected prediction accuracy of linear regression control. The two control systems demonstrated similar overall performance; however, the linear regression method demonstrated improved performance for targets requiring use of all three DOFs, whereas parallel dual-site control demonstrated improved performance for targets that required use of only one DOF. Subjects using linear regression control could more easily activate multiple DOFs simultaneously, but often experienced unintended movements when trying to isolate individual DOFs. Offline analyses also suggested that the method used to train linear regression systems may influence controllability. Linear regression myoelectric control using intramuscular EMG provided an alternative to parallel dual-site control for 3-DOF simultaneous control at the wrist and hand. The two methods demonstrated different strengths in controllability, highlighting the tradeoff between providing simultaneous control and the ability to isolate individual DOFs when desired.

  3. Augmenting Data with Published Results in Bayesian Linear Regression

    Science.gov (United States)

    de Leeuw, Christiaan; Klugkist, Irene

    2012-01-01

    In most research, linear regression analyses are performed without taking into account published results (i.e., reported summary statistics) of similar previous studies. Although the prior density in Bayesian linear regression could accommodate such prior knowledge, formal models for doing so are absent from the literature. The goal of this…

  4. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  5. Suppression Situations in Multiple Linear Regression

    Science.gov (United States)

    Shieh, Gwowen

    2006-01-01

    This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…

  6. Linear regression metamodeling as a tool to summarize and present simulation model results.

    Science.gov (United States)

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
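
    The metamodeling step described in this abstract (regressing simulated outcomes on standardized inputs) can be sketched as follows. The decision model here is a hypothetical two-parameter stand-in, not the cancer cure model from the paper; `toy_model`, `fit_metamodel`, the input distributions, and the 50,000 value-per-cure figure are all assumptions made for illustration.

```python
import numpy as np

def toy_model(p_cure, cost_treat):
    """Hypothetical stand-in for a decision model: net benefit per patient."""
    return 50_000.0 * p_cure - cost_treat

def fit_metamodel(inputs, outcomes):
    """Regress outcomes on standardized inputs; returns [intercept, coefs...]."""
    z = (inputs - inputs.mean(axis=0)) / inputs.std(axis=0)
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, outcomes, rcond=None)
    return coef

# Probabilistic sensitivity analysis: draw inputs, run the model once per draw
rng = np.random.default_rng(1)
n_draws = 10_000
p_cure = rng.beta(20, 30, size=n_draws)
cost_treat = rng.gamma(shape=100.0, scale=100.0, size=n_draws)
inputs = np.column_stack([p_cure, cost_treat])
outcomes = toy_model(p_cure, cost_treat)

coef = fit_metamodel(inputs, outcomes)
intercept, effect_p_cure, effect_cost = coef
```

    Because the inputs are standardized, the intercept equals the mean simulated outcome, and each remaining coefficient is the change in outcome per one standard deviation of that input, which makes the coefficients directly comparable as sensitivity measures.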

  7. Relationships between each part of the spinal curves and upright posture using Multiple stepwise linear regression analysis.

    Science.gov (United States)

    Boulet, Sebastien; Boudot, Elsa; Houel, Nicolas

    2016-05-03

    Back pain is a common reason for consultation in primary healthcare clinical practice, and has effects on daily activities and posture. Relationships between the whole spine and upright posture, however, remain unknown. The aim of this study was to identify the relationship between each spinal curve and centre of pressure position as well as velocity for healthy subjects. Twenty-one male subjects performed quiet stance in natural position. Each upright posture was then recorded using an optoelectronics system (Vicon Nexus) synchronized with two force plates. At each moment, polynomial interpolations of markers attached on the spine segment were used to compute cervical lordosis, thoracic kyphosis and lumbar lordosis angle curves. Mean of centre of pressure position and velocity was then computed. Multiple stepwise linear regression analysis showed that the position and velocity of centre of pressure associated with each part of the spinal curves were defined as best predictors of the lumbar lordosis angle (R² = 0.45; p = 1.65 × 10^-10) and the thoracic kyphosis angle (R² = 0.54; p = 4.89 × 10^-13) of healthy subjects in quiet stance. This study showed the relationships between each of cervical, thoracic, lumbar curvatures, and centre of pressure's fluctuation during free quiet standing using non-invasive full spinal curve exploration. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Use of probabilistic weights to enhance linear regression myoelectric control

    Science.gov (United States)

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

    2015-12-01

    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

  9. Distributed Monitoring of the R² Statistic for Linear Regression

    Science.gov (United States)

    Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.

    2011-01-01

    The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
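
    The monitor-then-recompute logic described in this abstract can be illustrated on a single machine. The sketch below is a centralized stand-in, not the DReMo algorithm itself: it has no distributed communication, and the function names `r_squared` and `monitor`, the threshold of 0.8, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def r_squared(X, y, coef):
    """Coefficient of determination of a fixed linear model on data (X, y)."""
    resid = y - X @ coef
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def monitor(X, y, coef, threshold=0.8):
    """Return (coef, refit_flag); refit by least squares if R^2 < threshold."""
    if r_squared(X, y, coef) < threshold:
        new_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return new_coef, True
    return coef, False

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
model = np.array([1.0, 2.0])

# Data consistent with the current model: no recomputation is triggered
y_stable = X @ model + rng.normal(scale=0.1, size=300)
model_after_stable, refit_stable = monitor(X, y_stable, model)

# The underlying relationship drifts: R^2 drops and the model is refit
y_drift = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=300)
model_after_drift, refit_drift = monitor(X, y_drift, model)
```

    In the distributed setting the abstract describes, the threshold test would be evaluated collectively by the nodes rather than on pooled data as done here.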

  10. Finite Algorithms for Robust Linear Regression

    DEFF Research Database (Denmark)

    Madsen, Kaj; Nielsen, Hans Bruun

    1990-01-01

    The Huber M-estimator for robust linear regression is analyzed. Newton type methods for solution of the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate efficiency and indicate that this kind of approach may...
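
    For readers unfamiliar with the Huber M-estimator, a small numerical sketch may help. Note that this uses iteratively reweighted least squares (IRLS) with a MAD-based scale estimate, which is a common textbook approach and not the Newton-type finite algorithm analyzed in the paper; the function name `huber_irls`, the tuning constant 1.345, and the test data are assumptions.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via IRLS (illustrative)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from ordinary least squares
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale from the median absolute deviation
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold, delta/|u| outside it
        weights = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(weights)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)
y[-10:] += 30.0  # gross outliers at the largest x values
X = np.column_stack([np.ones_like(x), x])

ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
huber_beta = huber_irls(X, y)
```

    The outliers pull the ordinary least-squares slope away from the true value of 2, while the down-weighting in the Huber fit limits their influence.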

  11. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
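
    The idea of using third moments of residuals to judge direction of effects can be illustrated in the simplest bivariate case. The sketch below is a simplification of the multiple-regression setting in the abstract and omits the formal tests it mentions; the data-generating model (a skewed predictor with normal errors), the function name `residual_skewness`, and all parameter values are assumptions.

```python
import numpy as np

def residual_skewness(predictor, response):
    """Skewness of the residuals from a simple OLS fit of response on predictor."""
    dp = predictor - predictor.mean()
    dr = response - response.mean()
    slope = np.sum(dp * dr) / np.sum(dp**2)
    resid = dr - slope * dp
    return np.mean(resid**3) / np.mean(resid**2) ** 1.5

rng = np.random.default_rng(42)
n = 5000
x = rng.exponential(size=n)        # skewed variable on the predictor side
y = 0.8 * x + rng.normal(size=n)   # outcome generated with normal errors

skew_y_on_x = residual_skewness(x, y)  # correctly specified direction
skew_x_on_y = residual_skewness(y, x)  # reversed direction
```

    In this assumed setup, residuals from the correctly specified model are close to symmetric, whereas residuals from the reversed model inherit skewness from the true predictor, so the model with the less skewed residuals is the more plausible direction.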

  12. Simple and multiple linear regression: sample size considerations.

    Science.gov (United States)

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Who Will Win?: Predicting the Presidential Election Using Linear Regression

    Science.gov (United States)

    Lamb, John H.

    2007-01-01

    This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…

  14. Quantum algorithm for linear regression

    Science.gov (United States)

    Wang, Guoming

    2017-07-01

    We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Unlike previous algorithms, which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in classical form. So by running it once, one completely determines the fitted model and can then use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log2(N), d, κ, 1/ɛ), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ɛ is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.

  15. Comparison of Linear and Non-linear Regression Analysis to Determine Pulmonary Pressure in Hyperthyroidism.

    Science.gov (United States)

    Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan

    2017-01-01

    This study aimed at assessing the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension, the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method described by a first-degree equation, the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of the AIC criterion (criterion 3) and use of the F-test (criterion 4). In the H-group, 47% have pulmonary hypertension that is completely reversible once euthyroidism is obtained. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxin, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola-type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second

  16. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Science.gov (United States)

    Meaney, Christopher; Moineddin, Rahim

    2014-01-24

    In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
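
    The two-sample Monte Carlo design described above can be sketched for the linear regression estimator alone. The sketch assumes hypothetical Beta(2, 6) and Beta(3, 5) populations (equal dispersion, means 0.25 and 0.375), 25 observations per group, and 500 replications; these values and the function names are illustrative choices, not the settings of the study, and the beta and fractional logit models are not fitted here.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def simulate_once(rng, n_per_group=25):
    """Draw one two-sample data set with a 0/1 group indicator."""
    group = [0.0] * n_per_group + [1.0] * n_per_group
    y = [rng.betavariate(2.0, 6.0) for _ in range(n_per_group)]
    y += [rng.betavariate(3.0, 5.0) for _ in range(n_per_group)]
    return group, y

rng = random.Random(42)
true_difference = 3.0 / 8.0 - 2.0 / 8.0  # difference of the two beta means

# With a binary indicator, the OLS slope equals the difference in group means.
estimates = []
for _ in range(500):
    group, y = simulate_once(rng)
    estimates.append(ols_slope(group, y))

bias = fmean(estimates) - true_difference
print(f"Monte Carlo bias of the linear regression estimator: {bias:.4f}")
```

    Type-1 error and power would be estimated the same way, by recording a test decision in each replication instead of the point estimate.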

  17. A test for the parameters of multiple linear regression models ...

    African Journals Online (AJOL)

    A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...

  18. Testing hypotheses for differences between linear regression lines

    Science.gov (United States)

    Stanley J. Zarnoch

    2009-01-01

    Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
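
    A full-versus-reduced model comparison of this kind can be sketched for one hypothesis only, namely that two simple regression lines coincide (equal intercepts and equal slopes). The data and the function names below are hypothetical, and the example does not reproduce the paper's five hypotheses or its contrast approach; it uses the standard extra-sum-of-squares F statistic.

```python
from statistics import fmean

def sse_of_line_fit(x, y):
    """Residual sum of squares of the least-squares line through (x, y)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))

def f_coincident_lines(x1, y1, x2, y2):
    """F statistic for H0: both groups share one regression line."""
    n = len(x1) + len(x2)
    # Full model: separate intercept and slope per group (4 parameters).
    sse_full = sse_of_line_fit(x1, y1) + sse_of_line_fit(x2, y2)
    # Reduced model: one pooled line (2 parameters).
    sse_reduced = sse_of_line_fit(x1 + x2, y1 + y2)
    df_full = n - 4
    return ((sse_reduced - sse_full) / 2) / (sse_full / df_full)

# Hypothetical data: two groups with clearly different lines plus small noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y_a = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y_b = [3.0 + 0.5 * xi + e for xi, e in zip(x, noise)]

f_stat = f_coincident_lines(x, y_a, x, y_b)
print(f_stat)  # compare with an F distribution on (2, n - 4) degrees of freedom
```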

  19. Multiple Linear Regression: A Realistic Reflector.

    Science.gov (United States)

    Nutt, A. T.; Batsell, R. R.

    Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…

  20. Identification of Influential Points in a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Jan Grosz

    2011-03-01

    The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independent) datasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.

  1. Linear regression and sensitivity analysis in nuclear reactor design

    International Nuclear Information System (INIS)

    Kumar, Akansha; Tsvetkov, Pavel V.; McClarren, Ryan G.

    2015-01-01

    Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton’s cycle for the design of a GCFBR. • Performed detailed sensitivity analysis to a set of parameters in a nuclear reactor power system. • Modeled and developed reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable for sensitivity analysis (SA) and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which a LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general strategy of SA to determine and identify the influential parameters that affect the operation of the reactor is presented. Identification of design parameters and validation of the linearity assumption for the application of LR to reactor design is performed based on a set of tests. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA and SA of nuclear reactor models and thermal hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics and energy transfer, has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design and perform the necessary criticality calculations. Java is used to build and run input samples and to extract data from the output files of MCNP6, and R is used to perform regression analysis and other multivariate variance, and analysis of the collinearity of data

  2. SPLINE LINEAR REGRESSION USED FOR EVALUATING FINANCIAL ASSETS

    Directory of Open Access Journals (Sweden)

    Liviu GEAMBAŞU

    2010-12-01

    One of the most important preoccupations of financial market participants was, and still is, determining the trend of financial asset prices more precisely. To solve this problem, many scientific papers have been written and many mathematical and statistical models developed. While simple linear models were until recently widely used because they are easy to apply, the financial crises that have affected the world economy since 2008 highlight the need to adapt mathematical models to variation in the economy. A model that is simple to use yet adapted to the realities of economic life is spline linear regression. This type of regression keeps the regression function continuous but splits the studied data into intervals with homogeneous characteristics. The characteristics of each interval are highlighted, as is the evolution of the market over all the intervals, resulting in reduced standard errors. The first objective of the article is the theoretical presentation of spline linear regression, with reference to national and international scientific papers on the subject. The second objective is to apply the theoretical model to data from the Bucharest Stock Exchange
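
    A continuous piecewise-linear (linear spline) fit of the kind described can be sketched with a hinge term max(0, x - knot), which lets the slope change at the knot without breaking continuity. The single known knot, the noiseless data, and the function names are assumptions made for illustration; the article's own model and data are not reproduced.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def least_squares(rows, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def fit_linear_spline(x, y, knot):
    """Fit y = b0 + b1*x + b2*max(0, x - knot); b2 is the slope change."""
    rows = [[1.0, xi, max(0.0, xi - knot)] for xi in x]
    return least_squares(rows, y)

# Hypothetical series: slope 2 up to x = 5, slope -1 afterwards.
x = [i * 0.5 for i in range(21)]
y = [1.0 + 2.0 * xi - 3.0 * max(0.0, xi - 5.0) for xi in x]

intercept, slope, slope_change = fit_linear_spline(x, y, knot=5.0)
print(intercept, slope, slope_change)
```

    More knots add one hinge column each; choosing the knot locations from the data is a separate problem that this sketch does not address.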

  3. Multiple regression technique for Pth degree polynomials with and without linear cross products

    Science.gov (United States)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
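
    The two model forms can be illustrated by the regressor terms they generate for one observation. The sketch assumes that "linear cross products" means pairwise products of the first-degree variables; that reading, the column ordering, and the function name `design_row` are assumptions, not details taken from the report.

```python
from itertools import combinations

def design_row(xs, degree, cross_products=False):
    """Regressor terms for one observation of N independent variables.

    Columns: intercept, then x_i**1 .. x_i**degree for each variable,
    then (optionally) the pairwise products x_i * x_j for i < j.
    """
    row = [1.0]
    for x in xs:
        row.extend(x ** k for k in range(1, degree + 1))
    if cross_products:
        row.extend(a * b for a, b in combinations(xs, 2))
    return row

# Two variables, second-degree polynomial.
print(design_row([2.0, 3.0], degree=2))
print(design_row([2.0, 3.0], degree=2, cross_products=True))
```

    Under this reading, a model with N variables and degree P has 1 + N*P coefficients without cross products and N*(N-1)/2 more with them.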

  4. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  5. 2D Quantitative Structure-Property Relationship Study of Mycotoxins by Multiple Linear Regression and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Fereshteh Shiri

    2010-08-01

    In the present work, support vector machines (SVMs) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and a genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build QSPR models. The robustness of the QSPR models was characterized by statistical validation and the applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measured by r2 and q2 are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of different descriptors on the retention times are described.

  6. QSAR models for prediction study of HIV protease inhibitors using support vector machines, neural networks and multiple linear regression

    Directory of Open Access Journals (Sweden)

    Rachid Darnag

    2017-02-01

    Support vector machines (SVM) represent one of the most promising Machine Learning (ML) tools that can be applied to develop predictive quantitative structure–activity relationship (QSAR) models using molecular descriptors. Multiple linear regression (MLR) and artificial neural networks (ANNs) were also utilized to construct quantitative linear and non-linear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental value of HIV activity; also, the results reveal the superiority of the SVM over the MLR and ANN models. The contribution of each descriptor to the structure–activity relationships was evaluated.

  7. Stochastic development regression on non-linear manifolds

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    We introduce a regression model for data on non-linear manifolds. The model describes the relation between a set of manifold valued observations, such as shapes of anatomical objects, and Euclidean explanatory variables. The approach is based on stochastic development of Euclidean diffusion processes to the manifold. Defining the data distribution as the transition distribution of the mapped stochastic process, parameters of the model, the non-linear analogue of design matrix and intercept, are found via maximum likelihood. The model is intrinsically related to the geometry encoded...

  8. Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

    Science.gov (United States)

    Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

    2013-01-01

    In order to interpret laboratory experimental data, undergraduate students are used to performing linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…

  9. Linear regression methods according to objective functions

    OpenAIRE

    Yasemin Sisman; Sebahattin Bektas

    2012-01-01

    The aim of the study is to explain the parameter estimation methods and the regression analysis. The simple linear regression methods grouped according to the objective function are introduced. The numerical solution is achieved for the simple linear regression methods according to the objective functions of the Least Squares and the Least Absolute Value adjustment methods. The success of the applied methods is analyzed using their objective function values.
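
    The contrast between the two objective functions can be sketched on a small data set with one outlier. The data and function names are hypothetical. The least absolute value fit below relies on the known property that some optimal L1 line passes through at least two data points, so it simply tries every pair; this brute-force search is an illustrative choice for small samples and is not the authors' numerical method.

```python
from itertools import combinations
from statistics import fmean

def fit_least_squares(x, y):
    """Intercept and slope minimizing the sum of squared residuals."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def sum_abs_residuals(x, y, intercept, slope):
    """Objective function of the least absolute value method."""
    return sum(abs(b - (intercept + slope * a)) for a, b in zip(x, y))

def fit_least_absolute(x, y):
    """Intercept and slope minimizing the sum of absolute residuals.

    Tries every line through two data points with distinct x values.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = sum_abs_residuals(x, y, intercept, slope)
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    return best[1], best[2]

# Hypothetical data on the line y = 1 + 2x, with the last point an outlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 40.0]

ls_intercept, ls_slope = fit_least_squares(x, y)
lad_intercept, lad_slope = fit_least_absolute(x, y)
print("least squares:       ", ls_intercept, ls_slope)
print("least absolute value:", lad_intercept, lad_slope)
```

    On these data the least squares line is pulled toward the outlier, while the least absolute value line stays on the six collinear points.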

  10. Estimating monotonic rates from biological data using local linear regression.

    Science.gov (United States)

    Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R

    2017-03-01

    Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.

  11. Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

    Science.gov (United States)

    Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

    2008-02-18

    The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structurally and pharmacologically diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.

  12. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  13. A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

    Science.gov (United States)

    Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

    2006-08-01

    A quantitative structure-activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R²(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.

  14. Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Gascón Adrià

    2017-10-01

    We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.

  15. A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

    Science.gov (United States)

    Smith, Paul F; Ganesh, Siva; Liu, Ping

    2013-10-30

    Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. [Multiple linear regression analysis of X-ray measurement and WOMAC scores of knee osteoarthritis].

    Science.gov (United States)

    Ma, Yu-Feng; Wang, Qing-Fu; Chen, Zhao-Jun; Du, Chun-Lin; Li, Jun-Hai; Huang, Hu; Shi, Zong-Ting; Yin, Yue-Shan; Zhang, Lei; A-Di, Li-Jiang; Dong, Shi-Yu; Wu, Ji

    2012-05-01

    To perform multiple linear regression analysis of X-ray measurements and WOMAC scores of knee osteoarthritis, and to analyze their relationship with clinical and biomechanical concepts. From March 2011 to July 2011, 140 patients (250 knees) were reviewed, including 132 left knees and 118 right knees; the patients ranged in age from 40 to 71 years, with an average of 54.68 years. The MB-RULER measurement software was applied to measure the femoral angle, tibial angle, femorotibial angle, and joint gap angle on antero-posterior and lateral X-rays. The WOMAC scores were also collected. Multiple regression equations were then applied for linear regression analysis of the correlation between the X-ray measurements and WOMAC scores. There was statistical significance in the regression equation of AP X-ray values and WOMAC scores (P<0.05), but not in the regression equation of lateral X-ray values and WOMAC scores (P>0.05). 1) X-ray measurement of the knee joint can reflect the WOMAC scores to a certain extent. 2) It is necessary to measure the X-ray mechanical axis of the knee, which is important for diagnosis and treatment of osteoarthritis. 3) The correlation between the tibial angle and joint gap angle on antero-posterior X-ray and WOMAC scores is significant, and can be used to assess the functional recovery of patients before and after treatment.

  17. A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

    Science.gov (United States)

    Anderson, Carl A; McRae, Allan F; Visscher, Peter M

    2006-07-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.

  18. Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions

    International Nuclear Information System (INIS)

    Bao Min; Shi Quanlin; Zhang Jiamei

    2004-01-01

    This paper describes the calculation of the net peak counts of the nuclide 137Cs at 662 keV in γ spectra from airborne radioactivity measurements using multiple linear regression. A mathematical model is built by analyzing every factor that contributes to the Cs peak counts in the spectra, and a multiple linear regression function is established. The calculation uses stepwise regression, and insignificant factors are eliminated by an F test. The regression results and their uncertainty are calculated using least squares estimation, from which the Cs net peak counts and their uncertainty are obtained. The analysis results for an experimental spectrum are presented. The influence of energy shift and energy resolution on the analysis result is discussed. In comparison with the spectrum stripping method, the multiple linear regression method does not need stripping ratios, the calculated result depends only on the counts in the Cs peak, and the calculation uncertainty is reduced. (authors)

  19. Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

    Directory of Open Access Journals (Sweden)

    Avval Zhila Mohajeri

    2015-01-01

    This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique combined with the stepwise (SW) and the genetic algorithm (GA) methods as variable selection tools was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.

  20. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    Science.gov (United States)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  1. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of the orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  2. Generalised Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

    DEFF Research Database (Denmark)

    Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf

    We consider the semiparametric generalised linear regression model which has mainstream empirical models such as the (partially) linear mean regression, logistic and multinomial regression as special cases. As an extension to related literature we allow a misclassified covariate to be interacted...

  3. Neutrosophic Correlation and Simple Linear Regression

    Directory of Open Access Journals (Sweden)

    A. A. Salama

    2014-09-01

    Since the world is full of indeterminacy, neutrosophics have found their place in contemporary research. The fundamental concepts of the neutrosophic set were introduced by Smarandache. Recently, Salama et al. introduced the concept of the correlation coefficient of neutrosophic data. In this paper, we introduce and study the concepts of correlation and correlation coefficient of neutrosophic data in probability spaces and study some of their properties. We also introduce and study the neutrosophic simple linear regression model. Possible applications to data processing are touched upon.

  4. Single image super-resolution using locally adaptive multiple linear regression.

    Science.gov (United States)

    Yu, Soohwan; Kang, Wonseok; Ko, Seungyong; Paik, Joonki

    2015-12-01

    This paper presents a regularized superresolution (SR) reconstruction method using locally adaptive multiple linear regression to overcome the limitation of spatial resolution of digital images. In order to make the SR problem better-posed, the proposed method incorporates the locally adaptive multiple linear regression into the regularization process as a local prior. The local regularization prior assumes that the target high-resolution (HR) pixel is generated by a linear combination of similar pixels in differently scaled patches and optimum weight parameters. In addition, we adapt a modified version of the nonlocal means filter as a smoothness prior to utilize the patch redundancy. Experimental results show that the proposed algorithm better restores HR images than existing state-of-the-art methods in the sense of the most objective measures in the literature.

  5. Teaching the Concept of Breakdown Point in Simple Linear Regression.

    Science.gov (United States)

    Chan, Wai-Sum

    2001-01-01

    Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…
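
    The influence of extreme points described above can be made concrete with a small numerical sketch. It is an illustration only, not material from the article: the toy data and the names `ols_slope`, `clean_slope` and `corrupted_slopes` are hypothetical. It shows that corrupting a single response out of five is enough to push the ordinary least-squares slope as far as one likes, which is the sense in which the estimator has a finite-sample breakdown point of 1/n.

```python
# Sketch: one corrupted observation can move the OLS slope arbitrarily far.

def ols_slope(x, y):
    """Ordinary least-squares slope for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical toy data lying exactly on the line y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi + 1.0 for xi in x]

clean_slope = ols_slope(x, y)

# Replace only the last response with ever larger values and refit.
corrupted_slopes = []
for bad_value in (1e2, 1e4, 1e6):
    y_bad = y[:-1] + [bad_value]
    corrupted_slopes.append(ols_slope(x, y_bad))

print("clean slope:", clean_slope)
print("slopes with one corrupted point:", corrupted_slopes)
```

    The fitted slope grows without bound as the single bad value grows, while the other four points stay exactly on the original line.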

  6. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    Science.gov (United States)

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  7. Establishment of regression dependences. Linear and nonlinear dependences

    International Nuclear Information System (INIS)

    Onishchenko, A.M.

    1994-01-01

    The main problems in determining linear regression dependences and 19 types of nonlinear regression dependences are discussed in full. It is taken into account that the total dispersions are the sum of the measurement dispersions and the dispersions of the parameter variations themselves. Approaches to determining all the dispersions are described. It is shown that the least-squares fit gives inconsistent estimates for industrial objects and processes. Correction methods that take into account comparable measurement errors in both variables make it possible to obtain consistent estimates of the regression equation parameters. The condition under which applying the correction technique is expedient is given. A technique for determining nonlinear regression dependences that takes into account the form of the dependence and comparable errors in both variables is described. 6 refs., 1 tab.
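
    The inconsistency of least squares when both variables carry comparable errors can be illustrated with a short simulation. The abstract does not say which correction the author uses, so the sketch below substitutes Deming regression, a standard errors-in-variables estimator, and assumes the ratio `delta` of the two error variances is known. The simulated data and all function names are hypothetical.

```python
import random

def moments(x, y):
    """Sample variances of x and y and their sample covariance."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return sxx, syy, sxy

def ols_slope(x, y):
    """Ordinary least squares: treats x as error-free."""
    sxx, _, sxy = moments(x, y)
    return sxy / sxx

def deming_slope(x, y, delta):
    """Deming slope; delta = var(error in y) / var(error in x), assumed known."""
    sxx, syy, sxy = moments(x, y)
    diff = syy - delta * sxx
    return (diff + (diff * diff + 4.0 * delta * sxy * sxy) ** 0.5) / (2.0 * sxy)

# Simulated (not real) data: a latent variable observed with error on both axes.
rng = random.Random(0)
true_slope = 2.0
latent = [rng.gauss(0.0, 1.0) for _ in range(5000)]
x_obs = [t + rng.gauss(0.0, 1.0) for t in latent]               # error in x
y_obs = [true_slope * t + rng.gauss(0.0, 1.0) for t in latent]  # error in y

slope_ols = ols_slope(x_obs, y_obs)
slope_deming = deming_slope(x_obs, y_obs, delta=1.0)
print("OLS slope:", slope_ols)
print("Deming slope:", slope_deming)
```

    With equal error variances on both axes, the least-squares slope is pulled toward zero however large the sample, whereas the corrected estimate stays close to the true slope used in the simulation.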

  8. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani

    2016-09-14

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and are collectively referred to as D/CN herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. Particular emphasis was placed on the effect of branching (i.e., methyl substitution) on the D/CN, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. Multiple linear regression (MLR) modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R² = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.

  9. The number of subjects per variable required in linear regression analyses.

    Science.gov (United States)

    Austin, Peter C; Steyerberg, Ewout W

    2015-06-01

    To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R² of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R², although adjusted R² estimates behaved well. The bias in estimating the model R² statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
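
    The contrast between the raw and adjusted R² at low SPV can be sketched with the usual degrees-of-freedom adjustment. The abstract does not state which adjustment the authors applied, so the formula below is an assumption, and the function name and the example figures are purely illustrative.

```python
def adjusted_r2(r2, n_subjects, n_predictors):
    """Common adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_subjects - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_subjects - 1) / dof

# Hypothetical example: the same apparent R^2 of 0.60 with 10 predictors,
# once at 2 subjects per variable and once at 20 subjects per variable.
few = adjusted_r2(0.60, n_subjects=20, n_predictors=10)
many = adjusted_r2(0.60, n_subjects=200, n_predictors=10)
print(round(few, 3), round(many, 3))
```

    At two SPV the adjustment shrinks the apparent 0.60 to roughly 0.16, while at twenty SPV it barely changes it (about 0.58), which is one way to see why the unadjusted statistic needs many more subjects per variable.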

  10. Reconciling the Log-Linear and Non-Log-Linear Nature of the TSH-Free T4 Relationship: Intra-Individual Analysis of a Large Population.

    Science.gov (United States)

    Rothacker, Karen M; Brown, Suzanne J; Hadlow, Narelle C; Wardrop, Robert; Walsh, John P

    2016-03-01

    The TSH-T4 relationship was thought to be inverse log-linear, but recent cross-sectional studies report a complex, nonlinear relationship; large, intra-individual studies are lacking. Our objective was to analyze the TSH-free T4 relationship within individuals. We analyzed data from 13 379 patients, each with six or more TSH/free T4 measurements and at least a 5-fold difference between individual median TSH and minimum or maximum TSH. Linear and nonlinear regression models of log TSH on free T4 were fitted to data from individuals and goodness of fit compared by likelihood ratio testing. Comparing all models, the linear model achieved best fit in 31% of individuals, followed by quartic (27%), cubic (15%), null (12%), and quadratic (11%) models. After eliminating least favored models (with individuals reassigned to best fitting, available models), the linear model fit best in 42% of participants, quartic in 43%, and null model in 15%. As the number of observations per individual increased, so did the proportion of individuals in whom the linear model achieved best fit, to 66% in those with more than 20 observations. When linear models were applied to all individuals and averaged according to individual median free T4 values, variations in slope and intercept indicated a nonlinear log TSH-free T4 relationship across the population. The log TSH-free T4 relationship appears linear in some individuals and nonlinear in others, but is predominantly linear in those with the largest number of observations. A log-linear relationship within individuals can be reconciled with a non-log-linear relationship in a population.

  11. A STATISTICAL ANALYSIS OF GDP AND FINAL CONSUMPTION USING SIMPLE LINEAR REGRESSION. THE CASE OF ROMANIA 1990–2010

    OpenAIRE

    Aniela Balacescu; Marian Zaharia

    2011-01-01

    This paper examines the causal relationship between GDP and final consumption. The authors used a linear regression model in which GDP is treated as the result (dependent) variable and final consumption as the factor (explanatory) variable. The analysis was carried out in Excel, a modern application for computation and statistical data analysis.

  12. How Robust Is Linear Regression with Dummy Variables?

    Science.gov (United States)

    Blankmeyer, Eric

    2006-01-01

    Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…

  13. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    Science.gov (United States)

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most common models for analyzing such complex longitudinal data are based on mean regression, which fails to provide efficient estimates in the presence of outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying the benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish Bayesian joint models that account for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  14. Tightness of M-estimators for multiple linear regression in time series

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...

  15. BRGLM, Interactive Linear Regression Analysis by Least Square Fit

    International Nuclear Information System (INIS)

    Ringland, J.T.; Bohrer, R.E.; Sherman, M.E.

    1985-01-01

    1 - Description of program or function: BRGLM is an interactive program written to fit general linear regression models by least squares and to provide a variety of statistical diagnostic information about the fit. Stepwise and all-subsets regression can also be carried out. There are facilities for interactive data management (e.g. setting missing value flags, data transformations) and tools for constructing design matrices for the more commonly used models such as factorials, cubic splines, and auto-regressions. 2 - Method of solution: The least squares computations are based on the orthogonal (QR) decomposition of the design matrix obtained using the modified Gram-Schmidt algorithm. 3 - Restrictions on the complexity of the problem: The current release of BRGLM allows maxima of 1000 observations, 99 variables, and 3000 words of main memory workspace. For a problem with N observations and P variables, the number of words of main memory storage required is MAX(N*(P+6), N*P+P*P+3*N, 3*P*P+6*N). Any linear model may be fit, although the in-memory workspace will have to be increased for larger problems.
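
    The method of solution named above, least squares through a QR decomposition computed by modified Gram-Schmidt, can be sketched in a few lines. This is a generic illustration of the technique and not BRGLM's code: the function names, the column-wise storage of the design matrix and the toy data are all assumptions.

```python
def mgs_qr(columns):
    """Modified Gram-Schmidt QR of a design matrix given as a list of columns.

    Returns (q, r): q holds the orthonormal columns, r is upper triangular
    (stored as a list of rows), so that design = Q R.
    """
    p = len(columns)
    v = [list(col) for col in columns]
    q = []
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):
        r[j][j] = sum(a * a for a in v[j]) ** 0.5
        qj = [a / r[j][j] for a in v[j]]
        q.append(qj)
        # "Modified" step: strip the q_j component from every remaining
        # column right away instead of projecting the original columns later.
        for k in range(j + 1, p):
            r[j][k] = sum(a * b for a, b in zip(qj, v[k]))
            v[k] = [b - r[j][k] * a for a, b in zip(qj, v[k])]
    return q, r

def least_squares(columns, y):
    """Solve min ||X beta - y|| via R beta = Q^T y and back substitution."""
    q, r = mgs_qr(columns)
    p = len(columns)
    qty = [sum(a * b for a, b in zip(qj, y)) for qj in q]
    beta = [0.0] * p
    for j in range(p - 1, -1, -1):
        partial = sum(r[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (qty[j] - partial) / r[j][j]
    return beta

# Hypothetical toy problem: intercept column plus one predictor, y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
design = [[1.0] * len(x), x]
y = [1.0, 3.0, 5.0, 7.0]
beta = least_squares(design, y)  # [intercept, slope]
print(beta)
```

    Working with the triangular factor avoids forming the normal equations, which is the usual reason for preferring an orthogonal decomposition in least-squares software. The sketch assumes the design columns are linearly independent.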

  16. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.

    Science.gov (United States)

    Kawashima, Issaku; Kumano, Hiroaki

    2017-01-01

    Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  17. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Issaku Kawashima

    2017-07-01

    Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  18. Stochastic development regression on non-linear manifolds

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    We introduce a regression model for data on non-linear manifolds. The model describes the relation between a set of manifold valued observations, such as shapes of anatomical objects, and Euclidean explanatory variables. The approach is based on stochastic development of Euclidean diffusion processes to the manifold. Defining the data distribution as the transition distribution of the mapped stochastic process, parameters of the model, the non-linear analogue of design matrix and intercept, are found via maximum likelihood. The model is intrinsically related to the geometry encoded in the connection of the manifold. We propose an estimation procedure which applies the Laplace approximation of the likelihood function. A simulation study of the performance of the model is performed and the model is applied to a real dataset of Corpus Callosum shapes.

  19. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

  20. The number of subjects per variable required in linear regression analyses

    NARCIS (Netherlands)

    P.C. Austin (Peter); E.W. Steyerberg (Ewout)

    2015-01-01

    Objectives: To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression

  1. Weibull and lognormal Taguchi analysis using multiple linear regression

    International Nuclear Information System (INIS)

    Piña-Monarrez, Manuel R.; Ortiz-Yañez, Jesús F.

    2015-01-01

    The paper provides reliability practitioners with a method (1) to estimate the robust Weibull family when the Taguchi method (TM) is applied, (2) to estimate the normal operational Weibull family in an accelerated life testing (ALT) analysis to give confidence to the extrapolation and (3) to perform the ANOVA analysis on both the robust and the normal operational Weibull family. Because the Weibull distribution neither has the normal additive property nor has a direct relationship with the normal parameters (µ, σ), the issues of estimating a Weibull family by using a design of experiment (DOE) are first addressed by using an L_9(3^4) orthogonal array (OA) in both the TM and the Weibull proportional hazard model (WPHM) approach. Then, by using the Weibull/Gumbel and the lognormal/normal relationships and multiple linear regression, the direct relationships between the Weibull and the lifetime parameters are derived and used to formulate the proposed method. Moreover, since the derived direct relationships always hold, the method is generalized to the lognormal and ALT analysis. Finally, the method’s efficiency is shown through its application to the used OA and to a set of ALT data. - Highlights: • It gives the statistical relations and steps to use the Taguchi Method (TM) to analyze Weibull data. • It gives the steps to determine the unknown Weibull family for both the robust TM setting and the normal ALT level. • It gives a method to determine the expected lifetimes and to perform their ANOVA analysis in TM and ALT analysis. • It gives a method to give confidence to the extrapolation in an ALT analysis by using the Weibull family of the normal level.

  2. Treating experimental data of inverse kinetic method by unitary linear regression analysis

    International Nuclear Information System (INIS)

    Zhao Yusen; Chen Xiaoliang

    2009-01-01

    The theory of treating experimental data of the inverse kinetic method by unitary linear regression analysis is described. Not only the reactivity but also the effective neutron source intensity can be calculated by this method. A computer code was written based on the inverse kinetic method and unitary linear regression analysis. The data of the zero power facility BFS-1 in Russia were processed and the results were compared. The results show that the reactivity and the effective neutron source intensity can be obtained correctly by treating experimental data of the inverse kinetic method using unitary linear regression analysis, and that the precision of reactivity measurement is improved. The central element efficiency can be calculated by using the reactivity. The results also show that the effect of the external neutron source on the reactivity measurement should be considered when the reactor power is low and the intensity of the external neutron source is strong. (authors)

  3. An improved multiple linear regression and data analysis computer program package

    Science.gov (United States)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  4. Fuzzy Linear Regression for the Time Series Data which is Fuzzified with SMRGT Method

    Directory of Open Access Journals (Sweden)

    Seçil YALAZ

    2016-10-01

    Our work on regression and classification makes a new contribution to the analysis of time series, which have been used in many areas for years. Because the methods used to correct autocorrelation in time series regression applications may fail to converge, the analysis either does not succeed or forces a change in the model’s degree, which may not be desirable in every situation. In our study, which is recommended for these situations, time series data were fuzzified using the simple membership function and fuzzy rule generation technique (SMRGT), and an equation for estimating future values was created by applying the fuzzy least squares regression (FLSR) method, a simple linear regression method, to these data. Although SMRGT has been successful in determining flow discharge in open channels and can be used confidently for flow discharge modeling in open canals, as well as in pipe flow with some modifications, there is no evidence on whether this technique is successful in fuzzy linear regression modeling. Therefore, to address the lack of such modeling, a new hybrid model is described in this study. Finally, to demonstrate the efficiency of our methods, classical linear regression for time series data and linear regression for fuzzy time series data were applied to two different data sets, and the performance of the two approaches was compared using different measures.

  5. pKa prediction for acidic phosphorus-containing compounds using multiple linear regression with computational descriptors.

    Science.gov (United States)

    Yu, Donghai; Du, Ruobing; Xiao, Ji-Chang

    2016-07-05

    Ninety-six acidic phosphorus-containing molecules with pKa 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculation of the molecules. The relationship between the experimental pKa values and structural parameters was obtained by multiple linear regression fitting for the training set, and tested with the test set; the R² values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation, which quantitatively describes the influence of structural parameters on pKa and can be used to predict pKa values of similar structures, is significant for the design of new acidic phosphorus-containing extractants. © 2016 Wiley Periodicals, Inc.

  6. The consistency of ordinary least-squares and generalized least-squares polynomial regression on characterizing the mechanomyographic amplitude versus torque relationship

    International Nuclear Information System (INIS)

    Herda, Trent J; Ryan, Eric D; Costa, Pablo B; DeFreitas, Jason M; Walter, Ashley A; Stout, Jeffrey R; Beck, Travis W; Cramer, Joel T; Housh, Terry J; Weir, Joseph P

    2009-01-01

    The primary purpose of this study was to examine the consistency of ordinary least-squares (OLS) and generalized least-squares (GLS) polynomial regression analyses utilizing linear, quadratic and cubic models on either five or ten data points that characterize the mechanomyographic amplitude (MMG RMS ) versus isometric torque relationship. The secondary purpose was to examine the consistency of OLS and GLS polynomial regression utilizing only linear and quadratic models (excluding cubic responses) on either ten or five data points. Eighteen participants (mean ± SD age = 24 ± 4 yr) completed ten randomly ordered isometric step muscle actions from 5% to 95% of the maximal voluntary contraction (MVC) of the right leg extensors during three separate trials. MMG RMS was recorded from the vastus lateralis during the MVCs and each submaximal muscle action. MMG RMS versus torque relationships were analyzed on a subject-by-subject basis using OLS and GLS polynomial regression. When using ten data points, only 33% and 27% of the subjects were fitted with the same model (utilizing linear, quadratic and cubic models) across all three trials for OLS and GLS, respectively. After eliminating the cubic model, there was an increase to 55% of the subjects being fitted with the same model across all trials for both OLS and GLS regression. Using only five data points (instead of ten data points), 55% of the subjects were fitted with the same model across all trials for OLS and GLS regression. Overall, OLS and GLS polynomial regression models were only able to consistently describe the torque-related patterns of response for MMG RMS in 27–55% of the subjects across three trials. Future studies should examine alternative methods for improving the consistency and reliability of the patterns of response for the MMG RMS versus isometric torque relationship

  7. Alzheimer's Disease Detection by Pseudo Zernike Moment and Linear Regression Classification.

    Science.gov (United States)

    Wang, Shui-Hua; Du, Sidan; Zhang, Yin; Phillips, Preetha; Wu, Le-Nan; Chen, Xian-Qing; Zhang, Yu-Dong

    2017-01-01

    This study presents an improved method based on "Gorji et al. Neuroscience. 2015" by introducing a relatively new classifier: linear regression classification. Our method selects one axial slice from a 3D brain image and employs pseudo Zernike moments with a maximum order of 15 to extract 256 features from each image. Finally, linear regression classification is used as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  8. Influences of spatial and temporal variation on fish-habitat relationships defined by regression quantiles

    Science.gov (United States)

    Dunham, J.B.; Cade, B.S.; Terrell, J.W.

    2002-01-01

    We used regression quantiles to model potentially limiting relationships between the standing crop of cutthroat trout Oncorhynchus clarki and measures of stream channel morphology. Regression quantile models indicated that variation in fish density was inversely related to the width:depth ratio of streams but not to stream width or depth alone. The spatial and temporal stability of model predictions were examined across years and streams, respectively. Variation in fish density with width:depth ratio (10th-90th regression quantiles) modeled for streams sampled in 1993-1997 predicted the variation observed in 1998-1999, indicating similar habitat relationships across years. Both linear and nonlinear models described the limiting relationships well, the latter performing slightly better. Although estimated relationships were transferable in time, results were strongly dependent on the influence of spatial variation in fish density among streams. Density changes with width:depth ratio in a single stream were responsible for the significant (P 80th). This suggests that stream-scale factors other than width:depth ratio play a more direct role in determining population density. Much of the variation in densities of cutthroat trout among streams was attributed to the occurrence of nonnative brook trout Salvelinus fontinalis (a possible competitor) or connectivity to migratory habitats. Regression quantiles can be useful for estimating the effects of limiting factors when ecological responses are highly variable, but our results indicate that spatiotemporal variability in the data should be explicitly considered. In this study, data from individual streams and stream-specific characteristics (e.g., the occurrence of nonnative species and habitat connectivity) strongly affected our interpretation of the relationship between width:depth ratio and fish density.

  9. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Science.gov (United States)

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19

  10. Linear Regression between CIE-Lab Color Parameters and Organic Matter in Soils of Tea Plantations

    Science.gov (United States)

    Chen, Yonggen; Zhang, Min; Fan, Dongmei; Fan, Kai; Wang, Xiaochang

    2018-02-01

    To quantify the relationship between the soil organic matter and color parameters using the CIE-Lab system, 62 soil samples (0-10 cm, Ferralic Acrisols) from tea plantations were collected from southern China. After air-drying and sieving, numerical color information and reflectance spectra of soil samples were measured under laboratory conditions using an UltraScan VIS (HunterLab) spectrophotometer equipped with CIE-Lab color models. We found that soil total organic carbon (TOC) and nitrogen (TN) contents were negatively correlated with the L* value (lightness) ( r = -0.84 and -0.80, respectively), a* value (correlation coefficient r = -0.51 and -0.46, respectively) and b* value ( r = -0.76 and -0.70, respectively). There were also linear regressions between TOC and TN contents with the L* value and b* value. Results showed that color parameters from a spectrophotometer equipped with CIE-Lab color models can predict TOC contents well for soils in tea plantations. The linear regression model between color values and soil organic carbon contents showed it can be used as a rapid, cost-effective method to evaluate content of soil organic matter in Chinese tea plantations.

  11. Analysis of dental caries using generalized linear and count regression models

    Directory of Open Access Journals (Sweden)

    Javali M. Phil

    2013-11-01

    Generalized linear models (GLM) are a generalization of linear regression models that allow regression models to be fitted to response data following a general exponential family, in all the sciences and especially the medical and dental sciences. They are a flexible and widely used class of models that can accommodate a variety of response variables. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero-inflated count regression models such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models have been used to handle dental caries count data with many zeros. We present an evaluation framework for the suitability of applying the GLM, Poisson, NB, ZIP and ZINB models to a dental caries data set where the count data may exhibit evidence of many zeros and over-dispersion. Estimation of the model parameters using the method of maximum likelihood is provided. Based on the Vuong test statistic and the goodness of fit measure for dental caries data, the NB and ZINB regression models perform better than other count regression models.
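    The two-process mixture described in this abstract can be written down directly. The following is a minimal sketch of the zero-inflated Poisson probability mass function, not code from the cited article; the function names and the parameter names `lam` (Poisson rate) and `pi` (probability of the zeros-only process) are illustrative assumptions.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the zeros-only process is used; otherwise the
    count comes from a Poisson distribution with rate lam.
    (Parameter names are illustrative, not taken from the article.)
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from either process.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean_var(lam, pi):
    """Mean and variance of the zero-inflated Poisson."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


mean, var = zip_mean_var(lam=2.0, pi=0.3)
print(mean, var)  # the variance exceeds the mean whenever pi > 0
```

    Because the variance is larger than the mean for any pi > 0, excess zeros alone are enough to produce the overdispersion that a plain Poisson model cannot capture.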

  12. Regression of non-linear coupling of noise in LIGO detectors

    Science.gov (United States)

    Da Silva Costa, C. F.; Billman, C.; Effler, A.; Klimenko, S.; Cheng, H.-P.

    2018-03-01

    In 2015, after their upgrade, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors started acquiring data. The effort to improve their sensitivity has never stopped since then. The goal to achieve design sensitivity is challenging. Environmental and instrumental noise couple to the detector output with different, linear and non-linear, coupling mechanisms. The noise regression method we use is based on the Wiener–Kolmogorov filter, which uses witness channels to make noise predictions. We present here how this method helped to determine complex non-linear noise couplings in the output mode cleaner and in the mirror suspension system of the LIGO detector.

  13. MODELLING THE RELATIONSHIP BETWEEN LAND SURFACE TEMPERATURE AND LANDSCAPE PATTERNS OF LAND USE LAND COVER CLASSIFICATION USING MULTI LINEAR REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    A. M. Bernales

    2016-06-01

    The threat of urbanization-related ailments such as heat stress is very prevalent. Much can be done to lessen the effect of urbanization on the surface temperature of an area, such as using green roofs or planting trees. Land use therefore matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important so as to provide a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC as well as to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and LULC classification was derived from LiDAR and Orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs and these metrics were analysed using a statistical framework. Multi linear regression was done to create models that would predict LST for each class and it was found that the spatial metric “Effective mesh size” was a top predictor for LST in 6 out of 7 classes. The model created can still be refined by adding a temporal aspect by analysing the LST of another farming period (for rural areas) and looking for common predictors between LSTs of these two different farming periods.

  14. Genome-scale regression analysis reveals a linear relationship for promoters and enhancers after combinatorial drug treatment

    KAUST Repository

    Rapakoulia, Trisevgeni

    2017-08-09

    Motivation: Drug combination therapy for treatment of cancers and other multifactorial diseases has the potential of increasing the therapeutic effect, while reducing the likelihood of drug resistance. In order to reduce time and cost spent in comprehensive screens, methods are needed which can model additive effects of possible drug combinations. Results: We here show that the transcriptional response to combinatorial drug treatment at promoters, as measured by single molecule CAGE technology, is accurately described by a linear combination of the responses of the individual drugs at a genome wide scale. We also find that the same linear relationship holds for transcription at enhancer elements. We conclude that the described approach is promising for eliciting the transcriptional response to multidrug treatment at promoters and enhancers in an unbiased genome wide way, which may minimize the need for exhaustive combinatorial screens.

  15. Fitting program for linear regressions according to Mahon (1996)

    Energy Technology Data Exchange (ETDEWEB)

    2018-01-09

    This program takes the user's input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple interface.

  16. Development of statistical linear regression model for metals from transportation land uses.

    Science.gov (United States)

    Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung

    2009-01-01

    Transportation land uses possessing impervious surfaces, such as highways, parking lots, roads, and bridges, were recognized as highly polluted non-point sources (NPSs) in urban areas. Many pollutants from urban transportation accumulate on paved surfaces during dry periods and are washed off during storms. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has been engaged in several research and monitoring efforts to develop stormwater management policies and treatment systems for future implementation. Data from 131 storm events between May 2004 and September 2008 at eleven sites were analyzed to identify correlations between particulates and metals, and to develop a simple linear regression (SLR) model to estimate event mean concentration (EMC). Results indicate that there was no significant relationship between metals and TSS EMC. However, the SLR estimation models, although not providing useful results, are valuable indicators of the high uncertainties that NPS pollution possesses. Therefore, long-term monitoring employing proper methods and precise statistical analysis of the data should be undertaken to eliminate these uncertainties.

  17. Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.

    Science.gov (United States)

    Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo

    2015-08-01

    Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.

  18. A SOCIOLOGICAL ANALYSIS OF THE CHILDBEARING COEFFICIENT IN THE ALTAI REGION BASED ON METHOD OF FUZZY LINEAR REGRESSION

    Directory of Open Access Journals (Sweden)

    Sergei Vladimirovich Varaksin

    2017-06-01

    Purpose. Construction of a mathematical model of the dynamics of childbearing change in the Altai region in 2000–2016, and analysis of the dynamics of changes in birth rates for multiple age categories of women of childbearing age. Methodology. An auxiliary element of the analysis is the construction of linear mathematical models of the dynamics of childbearing using the fuzzy linear regression method based on fuzzy numbers. Fuzzy linear regression is considered as an alternative to standard statistical linear regression for short time series and an unknown distribution law. The parameters of the fuzzy linear and standard statistical regressions for the childbearing time series were determined using an algorithm implemented in the MATLAB language. The method of fuzzy linear regression has not yet been used in sociological research. Results. Conclusions are drawn about the socio-demographic changes in society, the high efficiency of the demographic policy of the leadership of the region and the country, and the applicability of the fuzzy linear regression method for sociological analysis.

  19. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  20. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
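    As an illustration of point 4 above, the ordinary least products (OLP) slope and intercept need only the means and the sums of squares and cross-products. This is a minimal sketch under the assumption that OLP is the geometric mean (reduced major axis) regression, with slope equal to sign(r) times the ratio of the standard deviations; the function name `olp_fit` is hypothetical, and the sketch deliberately does not attempt the confidence intervals, which the abstract notes are inaccurate when computed this way.

```python
import math


def olp_fit(x, y):
    """Ordinary least products (geometric mean) regression of y on x.

    Returns (slope, intercept). The slope is sign(r) * sd(y) / sd(x),
    so both variables are treated as subject to error.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("OLP slope is undefined for constant or uncorrelated data")
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative values only, not data from the article.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
print(olp_fit(x, y))
print(olp_fit(y, x))  # the slope is the reciprocal of the one above
```

    Unlike Model I least squares, swapping the roles of x and y gives the reciprocal slope, which is the symmetry one expects when neither variable is fixed by the experimenter.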

  1. Positivity and job burnout in emergency personnel: examining linear and curvilinear relationship

    Directory of Open Access Journals (Sweden)

    Basińska Beata Aleksandra

    2017-06-01

    The aim of this study was to examine whether the relationship between the ratio of job-related positive to negative emotions (positivity ratio) and job burnout is best described as linear or curvilinear. Participants were 89 police officers (12% women) and 86 firefighters. The positivity ratio was evaluated using the Job-related Affective Wellbeing Scale (Van Katwyk, Fox, Spector, & Kelloway, 2000). Exhaustion and disengagement, two components of job burnout, were measured using the Oldenburg Burnout Inventory (Demerouti, Mostert, & Bakker, 2010). The results of regression analysis revealed that curvilinear relationships between the positivity ratio and two components of job burnout appeared to better fit the data than linear relationships. The relationship between the positivity ratio and exhaustion was curvilinear with a curve point at around 2.1. A similar curvilinear relationship, but with a lower curve point, i.e., around 1.8, was observed for disengagement. It seems that beyond certain values there may be hidden costs of maintaining positive emotions at work. Also, the unequal curve points for subscales suggest that different dimensions of work-related functioning are variously prone to such costs.

  2. Effective Surfactants Blend Concentration Determination for O/W Emulsion Stabilization by Two Nonionic Surfactants by Simple Linear Regression.

    Science.gov (United States)

    Hassan, A K

    2015-01-01

    In this work, O/W emulsion sets were prepared using different concentrations of two nonionic surfactants. The two surfactants, Tween 80 (HLB = 15.0) and Span 80 (HLB = 4.3), were used in a fixed proportion of 0.55:0.45, respectively, so that the HLB value of the surfactant blend was fixed at 10.185. The surfactant blend concentration ranged from 3% to 19%. For each O/W emulsion set, the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying simple linear regression by the least squares method to the temperature-conductivity data determines the effective surfactant blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by physical stability centrifugation testing and phase inversion temperature range measurements. The results indicated that the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that the most stable O/W emulsion can be identified by finding the maximum R² value when simple linear regression by least squares is applied to the temperature-conductivity data up to 80°; in addition, the true maximum slope is given by the equation that has the maximum R² value. Because conditions change in a more complex formulation, the method for determining the effective surfactant blend concentration was verified by applying it to more complex formulations of 2% O/W miconazole nitrate cream, and the results indicate its reproducibility.
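    The arithmetic behind this abstract can be sketched in a few lines: the blend HLB is the fraction-weighted average of the two surfactant HLB values, and the selection rule picks the concentration whose temperature-conductivity line has the highest R². The helper names below are hypothetical and the conductivity values are synthetic placeholders chosen only to exercise the code; they are not measurements from the study.

```python
def blend_hlb(components):
    """Weighted-average HLB; components is a list of (fraction, hlb) pairs."""
    return sum(fraction * hlb for fraction, hlb in components)


def r_squared(x, y):
    """R² of the simple least-squares line of y on x (squared Pearson r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def most_linear(temperatures, conductivity_by_concentration):
    """Return the concentration whose conductivity series has the highest R²."""
    return max(
        conductivity_by_concentration,
        key=lambda c: r_squared(temperatures, conductivity_by_concentration[c]),
    )


# Tween 80 and Span 80 in the 0.55:0.45 proportion quoted in the abstract.
print(round(blend_hlb([(0.55, 15.0), (0.45, 4.3)]), 3))  # 10.185

# Temperatures from the abstract; conductivities are SYNTHETIC, for illustration.
temperatures = [25, 40, 50, 60, 70, 80]
synthetic_conductivity = {
    3: [10, 14, 15, 21, 22, 30],
    11: [10, 13, 15, 17, 19, 21],
    19: [10, 18, 19, 20, 28, 29],
}
print(most_linear(temperatures, synthetic_conductivity))
```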

  3. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study focuses on measures of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).

  4. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg

    2007-01-01

    This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying...... and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...... the clock error) and to obtain estimates of the uncertainty with which the position is determined. Regression analysis is used in many other fields of application both in the natural, the technical and the social sciences. Examples may be curve fitting, calibration, establishing relationships between...

  5. Investigating the complex relationship between in situ Southern Ocean pCO2 and its ocean physics and biogeochemical drivers using a nonparametric regression approach

    CSIR Research Space (South Africa)

    Pretorius, W

    2014-01-01

    Full Text Available the relationship more accurately in terms of MSE, RMSE and MAE, than a standard parametric approach (multiple linear regression). These results provide a platform for using the developed nonparametric regression model based on in situ measurements to predict p...

  6. Linear Regression on Sparse Features for Single-Channel Speech Separation

    DEFF Research Database (Denmark)

    Schmidt, Mikkel N.; Olsson, Rasmus Kongsgaard

    2007-01-01

    In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech...... mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both...

  7. Inverse estimation of multiple muscle activations based on linear logistic regression.

    Science.gov (United States)

    Sekiya, Masashi; Tsuji, Toshiaki

    2017-07-01

    This study deals with a technology to estimate muscle activity from movement data using a statistical model. Linear regression (LR) models and artificial neural networks (ANN) are known statistical models for this purpose. Although ANN has a high estimation capability, in clinical applications a lack of data often leads to performance deterioration. On the other hand, the LR model has limited generalization performance. We therefore propose a muscle activity estimation method that improves generalization performance through the use of a linear logistic regression model. The proposed method was compared with the LR model and ANN in a verification experiment with 7 participants. As a result, the proposed method showed better generalization performance than the conventional methods in various tasks.

  8. Data Transformations for Inference with Linear Regression: Clarifications and Recommendations

    Science.gov (United States)

    Pek, Jolynn; Wong, Octavia; Wong, C. M.

    2017-01-01

    Data transformations have been promoted as a popular and easy-to-implement remedy to address the assumption of normally distributed errors (in the population) in linear regression. However, the application of data transformations introduces non-ignorable complexities which should be fully appreciated before their implementation. This paper adds to…

  9. Alpins and Thibos vectorial astigmatism analyses: proposal of a linear regression model between methods

    Directory of Open Access Journals (Sweden)

    Giuliano de Oliveira Freitas

    2013-10-01

    PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV), assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assigned to two phacoemulsification groups: one to receive the AcrySof® Toric intraocular lens (IOL) in both eyes and another to have the AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both the Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio) and its linear regression to the Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: A significant negative correlation between the Thibos APVratio and the Alpins percentage of success (%Success) was found (Spearman's ρ=-0.93); the linear regression is given by the following equation: %Success = (-APVratio + 1.00) × 100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
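    The regression equation in the RESULTS can be applied as a one-line conversion, assuming it is read as %Success = (1 − APVratio) × 100 with APVratio defined as postoperative APV divided by preoperative APV. The function name and the example magnitudes below are hypothetical and only illustrate the arithmetic.

```python
def percent_success(apv_pre, apv_post):
    """Alpins %Success predicted from Thibos APV magnitudes.

    Assumes the reading %Success = (1 - APVratio) * 100, where
    APVratio = postoperative APV / preoperative APV.
    """
    if apv_pre <= 0:
        raise ValueError("preoperative APV must be positive")
    apv_ratio = apv_post / apv_pre
    return (1.0 - apv_ratio) * 100.0


# Illustrative magnitudes in diopters, not patient data from the study.
print(percent_success(apv_pre=2.0, apv_post=0.5))  # 75.0
```

    Under this reading, a postoperative APV equal to the preoperative value maps to 0% success, and a postoperative APV of zero maps to 100%.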

  10. On the null distribution of Bayes factors in linear regression

    Science.gov (United States)

    We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...

  11. Implementing fuzzy polynomial interpolation (FPI) and fuzzy linear regression (LFR)

    Directory of Open Access Journals (Sweden)

    Maria Cristina Floreno

    1996-05-01

    This paper presents some preliminary results arising within a general framework concerning the development of software tools for fuzzy arithmetic. The program is in a preliminary stage. What has already been implemented consists of a set of routines for elementary operations, optimized function evaluation, interpolation and regression. Some of these have been applied to real problems. This paper describes a prototype of a C++ library for polynomial interpolation of fuzzifying functions, a set of FORTRAN routines for fuzzy linear regression, and a program with a graphical user interface allowing the use of such routines.

  12. Application of range-test in multiple linear regression analysis in ...

    African Journals Online (AJOL)

    Application of range-test in multiple linear regression analysis in the presence of outliers is studied in this paper. First, the plot of the explanatory variables (i.e. Administration, Social/Commercial, Economic services and Transfer) on the dependent variable (i.e. GDP) was done to identify the statistical trend over the years.

  13. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    Directory of Open Access Journals (Sweden)

    C. Wu

    2018-03-01

    Full Text Available Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.

  14. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    Science.gov (United States)

    Wu, Cheng; Zhen Yu, Jian

    2018-03-01

    Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a proper weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
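
    The role of the weighting parameter in Deming regression can be illustrated with the standard closed-form slope estimate. The sketch below is not the authors' Igor Pro program. It assumes that `delta` is the ratio of the Y-error variance to the X-error variance (the abstract does not define λ, so treating λ as `delta` is an assumption), and the function name and data are hypothetical.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression of y on x.

    delta is assumed to be var(error in y) / var(error in x);
    delta = 1 corresponds to orthogonal regression.
    Returns (slope, intercept).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sample variances and covariance.
    s_xx = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_mean) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("zero covariance: the Deming slope is undefined")
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical observations with noise in both variables.
x_obs = [1.0, 2.1, 2.9, 4.2, 5.0]
y_obs = [3.1, 4.9, 7.2, 8.8, 11.1]
for delta in (0.5, 1.0, 2.0):
    slope, intercept = deming_fit(x_obs, y_obs, delta=delta)
    print(f"delta={delta}: slope={slope:.3f}, intercept={intercept:.3f}")
```

    Re-running the fit with several values of `delta`, as in the loop above, gives a rough sense of how sensitive the slope and intercept are to the assumed error ratio.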

  15. Research on the multiple linear regression in non-invasive blood glucose measurement.

    Science.gov (United States)

    Zhu, Jianming; Chen, Zhencheng

    2015-01-01

    A non-invasive blood glucose measurement sensor and the data process algorithm based on the metabolic energy conservation (MEC) method are presented in this paper. The physiological parameters of human fingertip can be measured by various sensing modalities, and blood glucose value can be evaluated with the physiological parameters by the multiple linear regression analysis. Five methods such as enter, remove, forward, backward and stepwise in multiple linear regression were compared, and the backward method had the best performance. The best correlation coefficient was 0.876 with the standard error of the estimate 0.534, and the significance was 0.012 (sig. regression equation was valid. The Clarke error grid analysis was performed to compare the MEC method with the hexokinase method, using 200 data points. The correlation coefficient R was 0.867 and all of the points were located in Zone A and Zone B, which shows the MEC method provides a feasible and valid way for non-invasive blood glucose measurement.

  16. Estimating Loess Plateau Average Annual Precipitation with Multiple Linear Regression Kriging and Geographically Weighted Regression Kriging

    Directory of Open Access Journals (Sweden)

    Qiutong Jin

    2016-06-01

    Full Text Available Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) methods were employed using precipitation data from the period 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had a high accuracy and captured detailed spatial distribution data; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.

  17. Investigation of linear regression of EPR dosimetric signal of the man tooth enamel

    International Nuclear Information System (INIS)

    Pivovarov, S.P.; Rukhin, A.B.; Zhakparov, R.K.; Vasilevskaya, L.A.

    2001-01-01

    The experimental relations of the EPR radiation signal in tooth enamel samples from three human donors of different ages are examined for doses up to 1350 Gy. Linear regression is applicable to all of them. Most of the considerable errors leading to apparent non-linearity are eliminated. (author)

  18. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based...
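
    The principle behind quantile regression can be sketched with the check (pinball) loss: the constant that minimizes the total check loss at level tau is a sample tau-quantile. The example below is only an illustration of that principle for an intercept-only model, not the authors' method; the function names and the sample are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss for one residual: tau * r if r >= 0, (tau - 1) * r otherwise."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(values, candidate, tau):
    """Total check loss when every observation is predicted by `candidate`."""
    return sum(pinball_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Return the observation that minimizes the total check loss.

    A minimizer always exists among the observed values, so searching
    over them is enough for this intercept-only illustration.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


# Hypothetical right-skewed sample: the mean is pulled up by one large value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print("mean:", sum(sample) / len(sample))
print("tau=0.5 minimizer:", best_constant(sample, 0.5))
print("tau=0.9 minimizer:", best_constant(sample, 0.9))
```

    In a full linear quantile regression the constant is replaced by a linear function of the regressors, and the same loss is minimized over its coefficients.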

  19. Optimal choice of basis functions in the linear regression analysis

    International Nuclear Information System (INIS)

    Khotinskij, A.M.

    1988-01-01

    The problem of the optimal choice of basis functions in linear regression analysis is investigated. A step algorithm is suggested, together with an estimate of its efficiency that holds for a finite number of measurements. Conditions ensuring that the probability of a correct choice is close to 1 are formulated. Application of the step algorithm to the analysis of decay curves is substantiated. 8 refs

  20. LINEAR REGRESSION MODEL ESTIMATION FOR RIGHT CENSORED DATA

    Directory of Open Access Journals (Sweden)

    Ersin Yılmaz

    2016-05-01

    Full Text Available In this study, we first define right-censored data. Briefly, right censoring occurs when values above a certain limit are not observed exactly, which may be related to the measuring device. We then use a response variable obtained from right-censored explanatory variables, and the linear regression model is estimated. Because the data are censored, Kaplan-Meier weights are used to estimate the model; with these weights, the regression estimator is consistent and unbiased. Semi-parametric regression is another method for censored data, and it also gives useful results. This study may also be useful for health studies, since censored data are common in medical research.

  1. Association of footprint measurements with plantar kinetics: a linear regression model.

    Science.gov (United States)

    Fascione, Jeanna M; Crews, Ryan T; Wrobel, James S

    2014-03-01

    The use of foot measurements to classify morphology and interpret foot function remains one of the focal concepts of lower-extremity biomechanics. However, only 27% to 55% of midfoot variance in foot pressures has been determined in the most comprehensive models. We investigated whether dynamic walking footprint measurements are associated with inter-individual foot loading variability. Thirty individuals (15 men and 15 women; mean ± SD age, 27.17 ± 2.21 years) walked at a self-selected speed over an electronic pedography platform using the midgait technique. Kinetic variables (contact time, peak pressure, pressure-time integral, and force-time integral) were collected for six masked regions. Footprints were digitized for area and linear boundaries using digital photo planimetry software. Six footprint measurements were determined: contact area, footprint index, arch index, truncated arch index, Chippaux-Smirak index, and Staheli index. Linear regression analysis with a Bonferroni adjustment was performed to determine the association between the footprint measurements and each of the kinetic variables. The findings demonstrate that a relationship exists between increased midfoot contact and increased kinetic values in respective locations. Many of these variables produced large effect sizes while describing 38% to 71% of the common variance of select plantar kinetic variables in the medial midfoot region. In addition, larger footprints were associated with larger kinetic values at the medial heel region and both masked forefoot regions. Dynamic footprint measurements are associated with dynamic plantar loading kinetics, with emphasis on the midfoot region.

  2. Describing Growth Pattern of Bali Cows Using Non-linear Regression Models

    Directory of Open Access Journals (Sweden)

    Mohd. Hafiz A.W

    2016-12-01

    Full Text Available The objective of this study was to evaluate the best fit non-linear regression model to describe the growth pattern of Bali cows. Estimates of asymptotic mature weight, rate of maturing and constant of integration were derived from Brody, von Bertalanffy, Gompertz and Logistic models which were fitted to cross-sectional data of body weight taken from 74 Bali cows raised in MARDI Research Station Muadzam Shah Pahang. Coefficient of determination (R2) and residual mean squares (MSE) were used to determine the best fit model in describing the growth pattern of Bali cows. Von Bertalanffy model was the best model among the four growth functions evaluated to determine the mature weight of Bali cattle as shown by the highest R2 and lowest MSE values (0.973 and 601.9, respectively), followed by Gompertz (0.972 and 621.2, respectively), Logistic (0.971 and 648.4, respectively) and Brody (0.932 and 660.5, respectively) models. The correlation between rate of maturing and mature weight was found to be negative in the range of -0.170 to -0.929 for all models, indicating that animals of heavier mature weight had lower rate of maturing. The use of non-linear model could summarize the weight-age relationship into several biologically interpreted parameters compared to the entire lifespan weight-age data points that are difficult and time consuming to interpret.
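
    The four growth functions and the two fit criteria can be sketched as follows. The abstract does not give the equations, so the parameterizations below are commonly used forms and are an assumption here: `a` is the asymptotic mature weight, `b` the constant of integration and `k` the rate of maturing. The ages, weights and parameter values are hypothetical, and estimating the parameters would need a nonlinear least squares routine that is not shown.

```python
import math


def brody(t, a, b, k):
    return a * (1.0 - b * math.exp(-k * t))


def von_bertalanffy(t, a, b, k):
    return a * (1.0 - b * math.exp(-k * t)) ** 3


def gompertz(t, a, b, k):
    return a * math.exp(-b * math.exp(-k * t))


def logistic(t, a, b, k):
    return a / (1.0 + b * math.exp(-k * t))


def fit_statistics(model, params, ages, weights):
    """Return (R2, MSE) for a model evaluated at fixed parameters.

    MSE is the residual mean square: residual sum of squares divided by
    (number of observations - number of parameters).
    """
    predicted = [model(t, *params) for t in ages]
    ss_res = sum((w - p) ** 2 for w, p in zip(weights, predicted))
    mean_w = sum(weights) / len(weights)
    ss_tot = sum((w - mean_w) ** 2 for w in weights)
    dof = len(weights) - len(params)
    return 1.0 - ss_res / ss_tot, ss_res / dof


# Hypothetical cross-sectional data: age in months, body weight in kg.
ages = [6.0, 12.0, 24.0, 36.0, 48.0, 60.0]
weights = [85.0, 130.0, 190.0, 225.0, 240.0, 248.0]

# Hypothetical parameter values, not estimates from the study.
candidates = {
    "Brody": (brody, (255.0, 0.85, 0.06)),
    "von Bertalanffy": (von_bertalanffy, (252.0, 0.45, 0.08)),
    "Gompertz": (gompertz, (250.0, 1.8, 0.09)),
    "Logistic": (logistic, (248.0, 3.5, 0.12)),
}
for name, (model, params) in candidates.items():
    r2, mse = fit_statistics(model, params, ages, weights)
    print(f"{name}: R2={r2:.3f}, MSE={mse:.1f}")
```

    Under these forms, every curve approaches `a` as age increases, which is why `a` is read as the mature weight.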

  3. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression

    Science.gov (United States)

    Beckstead, Jason W.

    2012-01-01

    The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…

  4. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Directory of Open Access Journals (Sweden)

    Drzewiecki Wojciech

    2016-12-01

    Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.

  5. Weighted functional linear regression models for gene-based association analysis.

    Science.gov (United States)

    Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

    2018-01-01

    Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10⁻⁶), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.

  6. Genomic prediction based on data from three layer lines using non-linear regression models

    NARCIS (Netherlands)

    Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

    2014-01-01

    Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate

  7. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    Science.gov (United States)

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.

  8. QSAR Study of Insecticides of Phthalamide Derivatives Using Multiple Linear Regression and Artificial Neural Network Methods

    Directory of Open Access Journals (Sweden)

    Adi Syahputra

    2014-03-01

    Full Text Available Quantitative structure activity relationship (QSAR) for 21 insecticides of phthalamides containing hydrazone (PCH) was studied using multiple linear regression (MLR), principal component regression (PCR) and artificial neural network (ANN). Five descriptors were included in the model for MLR and ANN analysis, and five latent variables obtained from principal component analysis (PCA) were used in PCR analysis. Calculation of descriptors was performed using semi-empirical PM6 method. ANN analysis was found to be a superior statistical technique compared to the other methods and gave a good correlation between descriptors and activity (r2 = 0.84). Based on the obtained model, we have successfully designed some new insecticides with higher predicted activity than those of previously synthesized compounds, e.g. 2-(decalinecarbamoyl)-5-chloro-N’-((5-methylthiophen-2-yl)methylene) benzohydrazide, 2-(decalinecarbamoyl)-5-chloro-N’-((thiophen-2-yl)-methylene) benzohydrazide and 2-(decaline carbamoyl)-N’-(4-fluorobenzylidene)-5-chlorobenzohydrazide with predicted log LC50 of 1.640, 1.672, and 1.769 respectively.

  9. Computer software for linear and nonlinear regression in organic NMR

    International Nuclear Information System (INIS)

    Canto, Eduardo Leite do; Rittner, Roberto

    1991-01-01

    Calculations involving two-variable linear regressions require specific procedures that are generally unfamiliar to chemists. To meet the need for fast and efficient handling of NMR data, self-explanatory, PC-portable software has been developed that allows the user to produce and use tables recorded on diskette, containing chemical shifts or any other substituent physical-chemical measurements and constants (σ T , σ o R , E s , ...)

  10. Non-linear quantitative structure-activity relationship for adenine derivatives as competitive inhibitors of adenosine deaminase

    International Nuclear Information System (INIS)

    Sadat Hayatshahi, Sayyed Hamed; Abdolmaleki, Parviz; Safarian, Shahrokh; Khajeh, Khosro

    2005-01-01

    Logistic regression and artificial neural networks have been developed as two non-linear models to establish quantitative structure-activity relationships between structural descriptors and biochemical activity of adenosine based competitive inhibitors, toward adenosine deaminase. The training set included 24 compounds with known k i values. The models were trained to solve two-class problems. Unlike the previous work in which multiple linear regression was used, the highest of positive charge on the molecules was recognized to be in close relation with their inhibition activity, while the electric charge on atom N1 of adenosine was found to be a poor descriptor. Consequently, the previously developed equation was improved and the newly formed one could predict the class of 91.66% of compounds correctly. Also optimized 2-3-1 and 3-4-1 neural networks could increase this rate to 95.83%

  11. Do clinical and translational science graduate students understand linear regression? Development and early validation of the REGRESS quiz.

    Science.gov (United States)

    Enders, Felicity

    2013-12-01

    Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 postcourse , and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.

  12. Power properties of invariant tests for spatial autocorrelation in linear regression

    NARCIS (Netherlands)

    Martellosio, F.

    2006-01-01

    Many popular tests for residual spatial autocorrelation in the context of the linear regression model belong to the class of invariant tests. This paper derives a number of exact properties of the power function of such tests. In particular, we extend the work of Krämer (2005, Journal of Statistical

  13. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    Science.gov (United States)

    Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

    2013-01-01

    Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, PLogarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
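
    The comparison between logarithmic and linear time courses can be sketched with ordinary least squares on either ln(time) or time. This is a minimal illustration under assumptions, not the authors' procedure: the function names are hypothetical, the scores are invented, and the two-point formula simply solves score = a + b ln(t) through two baseline observations.

```python
import math


def ols_fit(x, y):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_mean - slope * x_mean, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination of the fitted line."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def compare_models(days, scores):
    """Return (R2 of score ~ ln(day), R2 of score ~ day)."""
    log_days = [math.log(d) for d in days]
    r2_log = r_squared(log_days, scores, *ols_fit(log_days, scores))
    r2_lin = r_squared(days, scores, *ols_fit(days, scores))
    return r2_log, r2_lin


def predict_from_two_points(t1, y1, t2, y2, t):
    """Logarithmic curve through two baseline scores, evaluated at time t."""
    slope = (y2 - y1) / (math.log(t2) - math.log(t1))
    intercept = y1 - slope * math.log(t1)
    return intercept + slope * math.log(t)


# Hypothetical days since admission and cognitive scores.
days = [3.0, 7.0, 14.0, 28.0, 56.0]
scores = [14.0, 18.0, 21.0, 24.0, 27.0]
r2_log, r2_lin = compare_models(days, scores)
print(f"logarithmic R2={r2_log:.3f}, linear R2={r2_lin:.3f}")
print("predicted score at day 28:", round(predict_from_two_points(3.0, 14.0, 7.0, 18.0, 28.0), 1))
```

    The two-point prediction is unbounded, so a practical version would need to respect the maximum of the score scale.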

  14. Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.

    Science.gov (United States)

    Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard

    2017-04-01

    To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.

  15. Comparison of height-diameter models based on geographically weighted regressions and linear mixed modelling applied to large scale forest inventory data

    Energy Technology Data Exchange (ETDEWEB)

    Quirós Segovia, M.; Condés Ruiz, S.; Drápela, K.

    2016-07-01

    Aim of the study: The main objective of this study was to test Geographically Weighted Regression (GWR) for developing height-diameter curves for forests on a large scale and to compare it with Linear Mixed Models (LMM). Area of study: Monospecific stands of Pinus halepensis Mill. located in the region of Murcia (Southeast Spain). Materials and Methods: The dataset consisted of 230 sample plots (2582 trees) from the Third Spanish National Forest Inventory (SNFI) randomly split into training data (152 plots) and validation data (78 plots). Two different methodologies were used for modelling local (Petterson) and generalized height-diameter relationships (Cañadas I): GWR, with different bandwidths, and linear mixed models. Finally, the quality of the estimated models was compared through statistical analysis. Main results: In general, both LMM and GWR provide better prediction capability when applied to a generalized height-diameter function than when applied to a local one, with R2 values increasing from around 0.6 to 0.7 in the model validation. Bias and RMSE were also lower for the generalized function. However, error analysis showed that there were no large differences between these two methodologies, evidencing that GWR provides results which are as good as the more frequently used LMM methodology, at least when no additional measurements are available for calibrating. Research highlights: GWR is a type of spatial analysis for exploring spatially heterogeneous processes. GWR can model spatial variation in tree height-diameter relationship and its regression quality is comparable to LMM. The advantage of GWR over LMM is the possibility to determine the spatial location of every parameter without additional measurements. Abbreviations: GWR (Geographically Weighted Regression); LMM (Linear Mixed Model); SNFI (Spanish National Forest Inventory). (Author)

  16. truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Maria Karlsson

    2014-05-01

    Full Text Available Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.

  17. COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    K. Seetharaman

    2015-08-01

    Full Text Available This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is built from the features, are assembled into a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results show that the proposed technique is comparable to the other existing techniques.
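
    The comparison and evaluation steps named above can be sketched with the usual definitions of the Canberra distance and the F-measure. The feature vectors, image names and precision/recall values below are hypothetical, and the convention of skipping 0/0 terms is an assumption rather than something stated in the abstract.

```python
def canberra_distance(u, v):
    """Sum over components of |u_i - v_i| / (|u_i| + |v_i|)."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:  # a term with both components zero contributes nothing
            total += abs(a - b) / denom
    return total


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical feature vectors (for example, fitted regression coefficients).
query = [0.42, 1.10, -0.35]
targets = {
    "image_a": [0.40, 1.05, -0.30],
    "image_b": [1.20, 0.10, 0.80],
}

# Rank target images by increasing distance from the query.
ranking = sorted(targets, key=lambda name: canberra_distance(query, targets[name]))
print(ranking)
print(round(f_measure(0.8, 0.6), 3))
```

    Smaller distances indicate more similar feature vectors, so the first entry of `ranking` is the closest match to the query.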

  18. Comparison of multiple linear regression, partial least squares and artificial neural networks for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids.

    Science.gov (United States)

    Fragkaki, A G; Farmaki, E; Thomaidis, N; Tsantili-Kakoulidou, A; Angelis, Y S; Koupparis, M; Georgakopoulos, C

    2012-09-21

    The comparison among different modelling techniques, such as multiple linear regression, partial least squares and artificial neural networks, has been performed in order to construct and evaluate models for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. The performance of the quantitative structure-retention relationship study, using the multiple linear regression and partial least squares techniques, has been previously conducted. In the present study, artificial neural networks models were constructed and used for the prediction of relative retention times of anabolic androgenic steroids, while their efficiency is compared with that of the models derived from the multiple linear regression and partial least squares techniques. For overall ranking of the models, a novel procedure [Trends Anal. Chem. 29 (2010) 101-109] based on sum of ranking differences was applied, which permits the best model to be selected. The suggested models are considered useful for the estimation of relative retention times of designer steroids for which no analytical data are available. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. Analysis of interactive fixed effects dynamic linear panel regression with measurement error

    OpenAIRE

    Nayoung Lee; Hyungsik Roger Moon; Martin Weidner

    2011-01-01

    This paper studies a simple dynamic panel linear regression model with interactive fixed effects in which the variable of interest is measured with error. To estimate the dynamic coefficient, we consider the least-squares minimum distance (LS-MD) estimation method.

  20. Multicollinearity in applied economics research and the Bayesian linear regression

    OpenAIRE

    EISENSTAT, Eric

    2016-01-01

    This article revises the popular issue of collinearity amongst explanatory variables in the context of a multiple linear regression analysis, particularly in empirical studies within social science related fields. Some important interpretations and explanations are highlighted from the econometrics literature with respect to the effects of multicollinearity on statistical inference, as well as the general shortcomings of the once fervent search for methods intended to detect and mitigate thes...

  1. Robust best linear estimation for regression analysis using surrogate and instrumental variables.

    Science.gov (United States)

    Wang, C Y

    2012-04-01

    We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.

  2. Relative Importance for Linear Regression in R: The Package relaimpo

    Directory of Open Access Journals (Sweden)

    Ulrike Gromping

    2006-09-01

    Full Text Available Relative importance is a topic that has seen a lot of interest in recent years, particularly in applied work. The R package relaimpo implements six different metrics for assessing relative importance of regressors in the linear model, two of which are recommended - averaging over orderings of regressors and a newly proposed metric (Feldman 2005) called pmvd. Apart from delivering the metrics themselves, relaimpo also provides (exploratory) bootstrap confidence intervals. This paper offers a brief tutorial introduction to the package. The methods and relaimpo’s functionality are illustrated using the data set swiss that is generally available in R. The paper targets readers who have a basic understanding of multiple linear regression. For the background of more advanced aspects, references are provided.

  3. Multiple Linear Regression Analysis Indicates Association of P-Glycoprotein Substrate or Inhibitor Character with Bitterness Intensity, Measured with a Sensor.

    Science.gov (United States)

    Yano, Kentaro; Mita, Suzune; Morimoto, Kaori; Haraguchi, Tamami; Arakawa, Hiroshi; Yoshida, Miyako; Yamashita, Fumiyoshi; Uchida, Takahiro; Ogihara, Takuo

    2015-09-01

    P-glycoprotein (P-gp) regulates absorption of many drugs in the gastrointestinal tract and their accumulation in tumor tissues, but the basis of substrate recognition by P-gp remains unclear. Bitter-tasting phenylthiocarbamide, which stimulates taste receptor 2 member 38 (T2R38), increases P-gp activity and is a substrate of P-gp. This led us to hypothesize that bitterness intensity might be a predictor of P-gp-inhibitor/substrate status. Here, we measured the bitterness intensity of a panel of P-gp substrates and nonsubstrates with various taste sensors, and used multiple linear regression analysis to examine the relationship between P-gp-inhibitor/substrate status and various physical properties, including intensity of bitter taste measured with the taste sensor. We calculated the first principal component analysis score (PC1) as the representative value of bitterness, as the outputs of all taste sensors were significantly correlated. The P-gp substrates showed remarkably greater mean bitterness intensity than non-P-gp substrates. We found that the Km values of P-gp substrates were correlated with molecular weight, log P, and PC1 value, and the coefficient of determination (R²) of the linear regression equation was 0.63. This relationship might be useful as an aid to predict P-gp substrate status at an early stage of drug discovery. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.

  4. Quantitative structure-property relationship study of n-octanol-water partition coefficients of some of diverse drugs using multiple linear regression

    International Nuclear Information System (INIS)

    Ghasemi, Jahanbakhsh; Saaidpour, Saadi

    2007-01-01

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 drug organic compounds to their n-octanol-water partition coefficients (log P o/w). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and predictive ability was tested using 40 compounds. Modeling of log P o/w of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of log P o/w for molecules not yet synthesized. Application of the developed model to a testing set of 40 drug organic compounds demonstrates that the model is reliable with good predictive accuracy and simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and squared correlation coefficient (R²) for the MLR model were 0.22 and 0.99 for the prediction set log P o/w

  5. Recursive and non-linear logistic regression: moving on from the original EuroSCORE and EuroSCORE II methodologies.

    Science.gov (United States)

    Poullis, Michael

    2014-11-01

    EuroSCORE II, despite improving on the original EuroSCORE system, has not solved all the calibration and predictability issues. Recursive, non-linear and mixed recursive and non-linear regression analysis were assessed with regard to sensitivity, specificity and predictability of the original EuroSCORE and EuroSCORE II systems. The original logistic EuroSCORE, EuroSCORE II and recursive, non-linear and mixed recursive and non-linear regression analyses of these risk models were assessed via receiver operator characteristic curves (ROC) and Hosmer-Lemeshow statistic analysis with regard to the accuracy of predicting in-hospital mortality. Analysis was performed for isolated coronary artery bypass grafts (CABGs) (n = 2913), aortic valve replacement (AVR) (n = 814), mitral valve surgery (n = 340), combined AVR and CABG (n = 517), aortic (n = 350), miscellaneous cases (n = 642), and combinations of the above cases (n = 5576). The original EuroSCORE had an ROC below 0.7 for isolated AVR and combined AVR and CABG. None of the methods described increased the ROC above 0.7. The EuroSCORE II risk model had an ROC below 0.7 for isolated AVR only. Recursive regression, non-linear regression, and mixed recursive and non-linear regression all increased the ROC above 0.7 for isolated AVR. The original EuroSCORE had a Hosmer-Lemeshow statistic that was above 0.05 for all patients and the subgroups analysed. All of the techniques markedly increased the Hosmer-Lemeshow statistic. The EuroSCORE II risk model had a Hosmer-Lemeshow statistic that was significant for all patients (P linear regression failed to improve on the original Hosmer-Lemeshow statistic. The mixed recursive and non-linear regression using the EuroSCORE II risk model was the only model that produced an ROC of 0.7 or above for all patients and procedures and had a Hosmer-Lemeshow statistic that was highly non-significant. The original EuroSCORE and the EuroSCORE II risk models do not have adequate ROC and Hosmer

  6. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    Science.gov (United States)

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct a prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan by stratified random sampling using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model on medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.

  7. Linear and support vector regressions based on geometrical correlation of data

    Directory of Open Access Journals (Sweden)

    Kaijun Wang

    2007-10-01

    Full Text Available Linear regression (LR) and support vector regression (SVR) are widely used in data analysis. Geometrical correlation learning (GcLearn) was proposed recently to improve the predictive ability of LR and SVR through mining and using correlations between data of a variable (inner correlation). This paper theoretically analyzes prediction performance of the GcLearn method and proves that GcLearn LR and SVR will have better prediction performance than traditional LR and SVR for prediction tasks when good inner correlations are obtained and predictions by traditional LR and SVR are far away from their neighbor training data under inner correlation. This gives the applicable condition of GcLearn method.

  8. Regularized Label Relaxation Linear Regression.

    Science.gov (United States)

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu

    2018-04-01

    Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper, we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR, which provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs the class compactness graph based on manifold learning and uses it as the regularization item to avoid the problem of overfitting. The class compactness graph is used to ensure that the samples sharing the same labels can be kept close after they are transformed. Two different algorithms, which are, respectively, based on -norm and -norm loss functions are devised. These two algorithms have compact closed-form solutions in each iteration so that they are easily implemented. Extensive experiments show that these two algorithms outperform the state-of-the-art algorithms in terms of the classification accuracy and running time.

  9. Multiple linear stepwise regression of liver lipid levels: proton MR spectroscopy study in vivo at 3.0 T

    International Nuclear Information System (INIS)

    Xu Li; Liang Changhong; Xiao Yuanqiu; Zhang Zhonglin

    2010-01-01

    Objective: To analyze the correlations between liver lipid level determined by in vivo liver 3.0 T ¹H-MRS and influencing factors using multiple linear stepwise regression. Methods: A prospective liver ¹H-MRS study was performed on a 3.0 T system with eight-channel torso phased-array coils using the PRESS sequence. Forty-four volunteers were enrolled. Liver spectra were collected with a TR of 1500 ms, a TE of 30 ms, a volume of interest of 2 cm×2 cm×2 cm, and an NSA of 64. The acquired raw proton MRS data were processed using the software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the lipid and water signals to obtain the percentage of lipid within the liver. Height, weight, age, BMI, line width and water suppression were recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to build the statistical model for predicting liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m², line width (18.9±4.4) and water suppression (90.7±6.5)% were significantly correlated with liver lipid content (0.00 to 0.96%, median 0.02%); r was 0.11, 0.44, 0.40, 0.52 and -0.73, respectively (P<0.05). However, only age, BMI, line width and water suppression entered the multiple linear regression equation. The liver lipid content prediction equation was: Y = 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age); the coefficient of determination was 0.613 and the adjusted coefficient of determination was 0.59. Conclusion: The regression model fitted well, since age, BMI, line width and water suppression explain about 60% of the variation in liver lipid content. (authors)
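
    As a worked illustration, the prediction equation reported in this record can be evaluated directly. The sketch below only transcribes the published coefficients; the function name is hypothetical, the inputs are the sample means quoted in the abstract, and the unit of Y is assumed to match the liver lipid content scale used by the authors, which the abstract does not state explicitly.

```python
def predict_liver_lipid(water_suppression, bmi, line_width, age):
    """Evaluate the reported equation:
    Y = 1.395 - 0.021*WS + 0.022*BMI + 0.014*LW - 0.004*age
    """
    return (1.395
            - 0.021 * water_suppression
            + 0.022 * bmi
            + 0.014 * line_width
            - 0.004 * age)


# Inputs: the sample means quoted in the abstract
# (water suppression 90.7 %, BMI 23.3 kg/m², line width 18.9, age 39.1 years)
y_at_means = predict_liver_lipid(90.7, 23.3, 18.9, 39.1)
```

    Because water suppression has by far the largest mean and a negative coefficient, small changes in it move the prediction more than comparable relative changes in the other predictors.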

  10. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques were used on five years of data (1998-2002) on average humidity, rainfall, and maximum and minimum temperatures. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit for our polynomial regression analysis time series (PRATS). The correlations to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, and in both cases indicated homoscedasticity. We also employed Bartlett's test for homogeneity of variances on the five-year rainfall and humidity data, which showed that the variances were not homogeneous for rainfall but were homogeneous for humidity. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  11. Fragility estimation for seismically isolated nuclear structures by high confidence low probability of failure values and bi-linear regression

    International Nuclear Information System (INIS)

    Carausu, A.

    1996-01-01

    A method for the fragility estimation of seismically isolated nuclear power plant structure is proposed. The relationship between the ground motion intensity parameter (e.g. peak ground velocity or peak ground acceleration) and the response of isolated structures is expressed in terms of a bi-linear regression line, whose coefficients are estimated by the least-square method in terms of available data on seismic input and structural response. The notion of high confidence low probability of failure (HCLPF) value is also used for deriving compound fragility curves for coupled subsystems. (orig.)

  12. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    Science.gov (United States)

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However, it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R² and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.

  13. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Science.gov (United States)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.

  14. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Science.gov (United States)

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984

  15. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression.

    Science.gov (United States)

    Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
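
    The locally weighted linear regression step can be sketched for a single predictor as follows. This is a generic textbook form with a Gaussian kernel, not the authors' FCLWLR pipeline; the bandwidth `tau`, the function name and the toy data are assumptions made for illustration.

```python
import math


def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Predict y at x_query from a straight line fitted by weighted least
    squares, with Gaussian weights centred on x_query (bandwidth tau)."""
    # Training points close to the query get weights near 1, distant ones near 0
    w = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return y_bar + slope * (x_query - x_bar)


# Toy data: a curved relationship that one global straight line fits poorly
xs = [0.5 * k for k in range(13)]   # 0.0, 0.5, ..., 6.0
ys = [x * x for x in xs]
local_prediction = lwlr_predict(3.0, xs, ys, tau=0.5)
```

    A separate line is fitted for every query point, which is why the method is described as nonparametric: a small `tau` follows local structure closely, while a large `tau` approaches ordinary least squares.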

  16. Estimate the contribution of incubation parameters influence egg hatchability using multiple linear regression analysis.

    Science.gov (United States)

    Khalil, Mohamed H; Shebl, Mostafa K; Kosba, Mohamed A; El-Sabrout, Karim; Zaki, Nesma

    2016-08-01

    This research was conducted to determine the parameters that most affect the hatchability of eggs from indigenous and improved local chickens. Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine which one most influences hatchability. The results showed significant differences in commercial and scientific hatchability among strains. The Alexandria strain had the significantly highest commercial hatchability (80.70%). Highly significant differences in hatching chick weight were also observed among the studied strains. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens.

  17. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

    Science.gov (United States)

    Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.

    2009-01-01

    Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.

  18. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject. Expanded coverage of diagnostics and methods of model fitting. Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. More than 200 problems throughout the book plus outline solutions for the exercises. This revision has been extensively class-tested.

  19. Introduction to statistical modelling 2: categorical variables and interactions in linear regression.

    Science.gov (United States)

    Lunt, Mark

    2015-07-01

    In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
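
    A minimal sketch of how a two-level categorical variable enters a linear regression: with a single 0/1 dummy, the least-squares intercept equals the mean of the reference group and the dummy coefficient equals the difference in group means. The data, group labels and function name below are hypothetical.

```python
def fit_dummy_regression(ys, groups, reference):
    """Least-squares fit of y = b0 + b1*d, where d = 0 for the reference
    group and d = 1 for every other observation."""
    ref = [y for y, g in zip(ys, groups) if g == reference]
    other = [y for y, g in zip(ys, groups) if g != reference]
    b0 = sum(ref) / len(ref)            # intercept: mean of the reference group
    b1 = sum(other) / len(other) - b0   # dummy coefficient: difference in means
    return b0, b1


# Hypothetical outcome values for two groups
ys = [4.0, 6.0, 5.0, 9.0, 11.0, 10.0]
groups = ["control", "control", "control", "treated", "treated", "treated"]
b0, b1 = fit_dummy_regression(ys, groups, reference="control")
```

    Testing whether `b1` differs from zero is then a direct test of the between-group difference, rather than a comparison of separate significance results in each group.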

  20. A gentle introduction to quantile regression for ecologists

    Science.gov (United States)

    Cade, B.S.; Noon, B.R.

    2003-01-01

    Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, all the factors that affect ecological processes are not measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
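
    Quantile regression replaces the squared-error criterion with an asymmetric "check" (pinball) loss. The sketch below shows the intercept-only case, where minimizing that loss over the observed values recovers a sample quantile; the data are hypothetical and a full quantile regression would add predictors and a linear-programming or iterative solver.

```python
def pinball_loss(residual, tau):
    """Check loss for quantile level tau in (0, 1):
    tau * r for r >= 0, and (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant(ys, tau):
    """Return the observed value that minimizes the total pinball loss."""
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))


# Hypothetical response values with one extreme observation
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(ys, 0.5)   # the median, barely affected by 100.0
upper_fit = best_constant(ys, 0.9)    # an upper quantile, pulled to the extreme
```

    Fitting several values of `tau` describes different parts of the response distribution, which is the sense in which quantile regression gives a more complete view than a single mean fit.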

  1. A note on the use of multiple linear regression in molecular ecology.

    Science.gov (United States)

    Frasier, Timothy R

    2016-03-01

    Multiple linear regression analyses (also often referred to as generalized linear models--GLMs, or generalized linear mixed models--GLMMs) are widely used in the analysis of data in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misinterpreted, which can lead to incorrect interpretations and conclusions within individual studies, and can propagate to wider-spread errors in the general understanding of a topic. The primary issue revolves around the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficients associated with each independent variable are often interpreted as the independent effect of each predictor variable on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result and to provide alternative approaches for obtaining the desired information. © 2015 John Wiley & Sons Ltd.
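
    The interpretation issue described in this record can be made concrete with a small sketch. In a model with an interaction term, y = b0 + b1·x1 + b2·x2 + b3·x1·x2, the effect of x1 is b1 + b3·x2, so b1 alone is the effect of x1 only where x2 = 0. The coefficient values below are hypothetical.

```python
# Hypothetical coefficients for y = b0 + b1*x1 + b2*x2 + b3*x1*x2
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -0.4


def predict(x1, x2):
    """Predicted response from the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def effect_of_x1(x2):
    """Change in predicted y per unit increase in x1, holding x2 fixed."""
    return b1 + b3 * x2


effect_at_zero = effect_of_x1(0.0)    # equals b1: positive effect
effect_at_ten = effect_of_x1(10.0)    # the effect has changed sign
```

    One common remedy is to centre x2 before fitting, so that the reported coefficient of x1 describes its effect at the mean of x2 rather than at a value that may never occur in the data.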

  2. An evaluation of bias in propensity score-adjusted non-linear regression models.

    Science.gov (United States)

    Wan, Fei; Mitra, Nandita

    2018-03-01

    Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.

  3. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Eric R. Edelman

    2017-06-01

    For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related

  4. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.

    Science.gov (United States)

    Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A

    2017-01-01

    For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.

  5. QUANTITATIVE ELECTRONIC STRUCTURE - ACTIVITY RELATIONSHIP OF ANTIMALARIAL COMPOUND OF ARTEMISININ DERIVATIVES USING PRINCIPAL COMPONENT REGRESSION APPROACH

    Directory of Open Access Journals (Sweden)

    Paul Robert Martin Werfette

    2010-06-01

    Analysis of the quantitative structure-activity relationship (QSAR) for a series of antimalarial artemisinin derivatives has been carried out using principal component regression. The descriptors for the QSAR study represented electronic structure, i.e. atomic net charges of the artemisinin skeleton calculated by the AM1 semi-empirical method. The antimalarial activity of each compound was expressed as log 1/IC50, which is experimental data. The main purpose of the principal component analysis approach is to transform a large data set of atomic net charges into a simpler data set of so-called latent variables. The best QSAR equation for log 1/IC50 can be obtained from the regression method as a linear function of several latent variables, i.e. x1, x2, x3, x4 and x5. [The equation for the best QSAR model is not preserved in this record.] Keywords: QSAR, antimalarial, artemisinin, principal component regression
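
    The general shape of principal component regression can be sketched as follows. This is a minimal illustration with a single latent variable and synthetic numbers, whereas the study above reports five latent variables; the power-iteration routine and all names here are assumptions made for the example, not the authors' procedure.

```python
import math


def mean(values):
    return sum(values) / len(values)


def first_principal_component(X, iterations=100):
    """Leading principal direction and scores, found by power iteration."""
    n, p = len(X), len(X[0])
    col_means = [mean([row[j] for row in X]) for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    # Sample covariance matrix of the centered descriptors.
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return yb - slope * xb, slope


# Synthetic descriptors driven by one hidden factor t (illustrative only).
t = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
loadings = [0.5, -0.3, 0.8]
X = [[10.0 + ti * lj for lj in loadings] for ti in t]
y = [1.0 + 3.0 * ti for ti in t]

direction, scores = first_principal_component(X)
intercept, slope = fit_line(scores, y)  # regress the response on the latent variable
fitted = [intercept + slope * s for s in scores]
```

    Because the three synthetic descriptors share one underlying factor, a single latent variable reproduces the response; with real atomic-charge data several components would normally be retained.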

  6. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani

    2016-01-01

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN

  7. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    Science.gov (United States)

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.

  8. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Science.gov (United States)

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Science.gov (United States)

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  10. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

    Directory of Open Access Journals (Sweden)

    Xu Yu

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
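
    For readers unfamiliar with LWLR, the following is a minimal one-feature sketch. It assumes a Gaussian kernel with a hand-picked bandwidth and uses synthetic data; the feature construction step of FCLWLR is not reproduced, and none of the names below come from the paper.

```python
import math


def lwlr_predict(x0, xs, ys, bandwidth):
    """Fit a weighted straight line around the query point x0 and evaluate it there."""
    # Gaussian kernel: training points near x0 receive larger weights.
    weights = [math.exp(-((x - x0) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(weights)
    xw = sum(w * x for w, x in zip(weights, xs)) / sw
    yw = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - xw) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - xw) * (y - yw) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return yw + slope * (x0 - xw)


# Synthetic data on a regular grid (illustrative only).
xs = [i / 10.0 for i in range(-30, 31)]
ys_line = [2.0 * x + 1.0 for x in xs]
ys_curve = [x * x for x in xs]

pred_line = lwlr_predict(1.234, xs, ys_line, bandwidth=0.3)
pred_curve = lwlr_predict(0.0, xs, ys_curve, bandwidth=0.3)
# A single global straight line fitted to the symmetric curve predicts its mean at x = 0.
global_mean_curve = sum(ys_curve) / len(ys_curve)
```

    On exactly linear data the local fit recovers the line, while on the curved data it stays close to the true value at the query point, where a single global line does not. The bandwidth controls how local the fit is.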

  11. Carbon 13 nuclear magnetic resonance chemical shifts empiric calculations of polymers by multi linear regression and molecular modeling

    International Nuclear Information System (INIS)

    Da Silva Pinto, P.S.; Eustache, R.P.; Audenaert, M.; Bernassau, J.M.

    1996-01-01

    This work deals with the empirical calculation of carbon-13 nuclear magnetic resonance chemical shifts by multiple linear regression and molecular modeling. Multiple linear regression is one way to obtain an equation able to describe the behaviour of the chemical shift for molecules that are in the database (rigid molecules with carbons). The methodology consists of defining structure-descriptor parameters that can be related to the known carbon-13 chemical shifts of these molecules. Linear regression is then used to determine the significant parameters of the equation, which can be extrapolated to molecules that resemble those in the database. (O.L.). 20 refs., 4 figs., 1 tab

  12. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    Science.gov (United States)

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  13. Staff turnover in hotels : exploring the quadratic and linear relationships.

    OpenAIRE

    Mohsin, A.; Lengler, J.F.B.; Aguzzoli, R.L.

    2015-01-01

    The aim of this study is to assess whether the relationship between intention to leave the job and its antecedents is quadratic or linear. To explore those relationships a theoretical model (see Fig. 1) and eight hypotheses are proposed. Each linear hypothesis is followed by an alternative quadratic hypothesis. The alternative hypotheses propose that the relationship between the four antecedent constructs and intention to leave the job might not be linear, as the existing literature suggests....

  14. Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States

    Science.gov (United States)

    Yang, J.; Astitha, M.; Schwartz, C. S.

    2017-12-01

    Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.

  15. [Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

    Science.gov (United States)

    Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

    2017-05-10

    We described the time trend of the acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 with the Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and the age-specific incidence trend (Cochran-Armitage trend P value < linear regression P value). The statistical power of the CAT test decreased, while the result of linear regression analysis remained the same, when the population size was reduced by 100 times and the AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose the statistical method according to the fitting degree of the data, or to comprehensively analyze the results of both methods.
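
    The dependence on population size described above can be reproduced with a small sketch. The case counts and populations below are invented, not the Tianjin data, and the textbook form of the Cochran-Armitage statistic with a normal approximation is assumed.

```python
import math


def cochran_armitage_z(cases, totals, scores):
    """Cochran-Armitage trend statistic (normal approximation)."""
    N = sum(totals)
    p = sum(cases) / N
    t_stat = sum(s * (r - n * p) for s, r, n in zip(scores, cases, totals))
    var = p * (1 - p) * (
        sum(n * s * s for n, s in zip(totals, scores))
        - sum(n * s for n, s in zip(totals, scores)) ** 2 / N
    )
    return t_stat / math.sqrt(var)


def slope_t_statistic(x, y):
    """t statistic of the slope in a simple linear regression of y on x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    intercept = yb - slope * xb
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(sse / (n - 2) / sxx)


def two_sided_p(z):
    """Two-sided P value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical yearly counts; the "big" population has identical rates.
years = [0, 1, 2, 3, 4]
cases_small, totals_small = [10, 12, 13, 15, 17], [1000] * 5
cases_big = [c * 100 for c in cases_small]
totals_big = [n * 100 for n in totals_small]

z_small = cochran_armitage_z(cases_small, totals_small, years)
z_big = cochran_armitage_z(cases_big, totals_big, years)
t_small = slope_t_statistic(years, [c / n for c, n in zip(cases_small, totals_small)])
t_big = slope_t_statistic(years, [c / n for c, n in zip(cases_big, totals_big)])
```

    Scaling the population by 100 at fixed rates multiplies the CAT statistic by 10, while the regression on the yearly rates sees the same five numbers and is unchanged.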

  16. USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES

    Directory of Open Access Journals (Sweden)

    Constantin ANGHELACHE

    2011-10-01

    The article presents the fundamental aspects of linear regression as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, and homoscedasticity and heteroskedasticity. The use of econometric instruments in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretations that can be drawn at this level.

  17. Single camera multi-view anthropometric measurement of human height and mid-upper arm circumference using linear regression.

    Science.gov (United States)

    Liu, Yingying; Sowmya, Arcot; Khamis, Heba

    2018-01-01

    Manually measured anthropometric quantities are used in many applications including human malnutrition assessment. Training is required to collect anthropometric measurements manually, which is not ideal in resource-constrained environments. Photogrammetric methods have been gaining attention in recent years, due to the availability and affordability of digital cameras. The primary goal is to demonstrate that height and mid-upper arm circumference (MUAC), both indicators of malnutrition, can be accurately estimated by applying linear regression to distance measurements from photographs of participants taken from five views, and to determine the optimal view combinations. A secondary goal is to observe the effect on estimate error of two approaches which reduce the complexity of the setup, the computational requirements and the expertise required of the observer. Thirty-one participants (11 female, 20 male; 18-37 years) were photographed from five views. Distances were computed using both camera calibration and reference object techniques from manually annotated photos. To estimate height, linear regression was applied to the distances between the top of the participant's head and the floor, as well as the height of a bounding box enclosing the participant's silhouette, which eliminates the need to identify the floor. To estimate MUAC, linear regression was applied to the mid-upper arm width. Estimates were computed for all view combinations and performance was compared to other photogrammetric methods from the literature: the linear distance method for height, and shape models for MUAC. The mean absolute difference (MAD) between the linear regression estimates and manual measurements was smaller compared to other methods. For the optimal view combinations (smallest MAD), the technical error of measurement and coefficient of reliability also indicate the linear regression methods are more reliable. The optimal view combination was the front and side views. When estimating height by linear

  18. Robust linear registration of CT images using random regression forests

    Science.gov (United States)

    Konukoglu, Ender; Criminisi, Antonio; Pathak, Sayan; Robertson, Duncan; White, Steve; Haynor, David; Siddiqui, Khan

    2011-03-01

    Global linear registration is a necessary first step for many different tasks in medical image analysis. Comparing longitudinal studies [1], cross-modality fusion [2], and many other applications depend heavily on the success of the automatic registration. The robustness and efficiency of this step is crucial as it affects all subsequent operations. Most common techniques cast the linear registration problem as the minimization of a global energy function based on the image intensities. Although these algorithms have proved useful, their robustness in fully automated scenarios is still an open question. In fact, the optimization step often gets caught in local minima yielding unsatisfactory results. Recent algorithms constrain the space of registration parameters by exploiting implicit or explicit organ segmentations, thus increasing robustness [4,5]. In this work we propose a novel robust algorithm for automatic global linear image registration. Our method uses random regression forests to estimate posterior probability distributions for the locations of anatomical structures, represented as axis-aligned bounding boxes [6]. These posterior distributions are later integrated in a global linear registration algorithm. The biggest advantage of our algorithm is that it does not require pre-defined segmentations or regions. Yet it yields robust registration results. We compare the robustness of our algorithm with that of the state-of-the-art Elastix toolbox [7]. Validation is performed via 1464 pair-wise registrations in a database of very diverse 3D CT images. We show that our method decreases the "failure" rate of the global linear registration from 12.5% (Elastix) to only 1.9%.

  19. Using the fuzzy linear regression method to benchmark the energy efficiency of commercial buildings

    International Nuclear Information System (INIS)

    Chung, William

    2012-01-01

    Highlights: ► Fuzzy linear regression method is used for developing benchmarking systems. ► The systems can be used to benchmark energy efficiency of commercial buildings. ► The resulting benchmarking model can be used by public users. ► The resulting benchmarking model can capture the fuzzy nature of input–output data. -- Abstract: Benchmarking systems from a sample of reference buildings need to be developed to conduct benchmarking processes for the energy efficiency of commercial buildings. However, not all benchmarking systems can be adopted by public users (i.e., other non-reference building owners) because of the different methods in developing such systems. An approach for benchmarking the energy efficiency of commercial buildings using statistical regression analysis to normalize other factors, such as management performance, was developed in a previous work. However, the field data given by experts can be regarded as a distribution of possibility. Thus, the previous work may not be adequate to handle such fuzzy input–output data. Consequently, a number of fuzzy structures cannot be fully captured by statistical regression analysis. This present paper proposes the use of fuzzy linear regression analysis to develop a benchmarking process, the resulting model of which can be used by public users. An illustrative example is given as well.

  20. Least median of squares and iteratively re-weighted least squares as robust linear regression methods for fluorimetric determination of α-lipoic acid in capsules in ideal and non-ideal cases of linearity.

    Science.gov (United States)

    Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

    2018-03-26

    This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
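
    To make the contrast between OLS and IRLS concrete, here is a minimal sketch of iteratively re-weighted least squares for a straight-line calibration. The abstract does not state which weight function was used, so Huber weights with a MAD-based scale are assumed here, and the calibration data are synthetic with one deliberately gross outlier.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return yb - slope * xb, slope


def irls_huber(x, y, k=1.345, iterations=50):
    """IRLS with Huber weights; the residual scale is re-estimated from the MAD."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = median(r)
        scale = max(median([abs(ri - med) for ri in r]) / 0.6745, 1e-12)
        # Residuals beyond k * scale are down-weighted in proportion to their size.
        w = [1.0 if abs(ri) <= k * scale else k * scale / abs(ri) for ri in r]
    a, b = weighted_line(x, y, w)
    return a, b, w


# Synthetic calibration line y = 1 + 2x with small deviations (illustrative only).
x = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, 0.0, -0.3, 0.1]
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]
y[7] += 30.0  # one gross outlier

ols_intercept, ols_slope = weighted_line(x, y, [1.0] * len(x))
irls_intercept, irls_slope, weights = irls_huber(x, y)
```

    The outlier pulls the OLS slope well away from 2, whereas IRLS assigns it a small weight and stays near the underlying line. Least median of squares follows a different principle and is not shown.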

  1. Causal correlation of foliar biochemical concentrations with AVIRIS spectra using forced entry linear regression

    Science.gov (United States)

    Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.

    1995-01-01

    A major goal of airborne imaging spectrometry is to estimate the biochemical composition of vegetation canopies from reflectance spectra. Remotely-sensed estimates of foliar biochemical concentrations of forests would provide valuable indicators of ecosystem function at regional and eventually global scales. Empirical research has shown a relationship exists between the amount of radiation reflected from absorption features and the concentration of given biochemicals in leaves and canopies (Matson et al., 1994; Johnson et al., 1994). A technique commonly used to determine which wavelengths have the strongest correlation with the biochemical of interest is unguided (stepwise) multiple regression. Wavelengths are entered into a multivariate regression equation, in their order of importance, each contributing to the reduction of the variance in the measured biochemical concentration. A significant problem with the use of stepwise regression for determining the correlation between biochemical concentration and spectra is that of 'overfitting' as there are significantly more wavebands than biochemical measurements. This could result in the selection of wavebands which may be more accurately attributable to noise or canopy effects. In addition, there is a real problem of collinearity in that the individual biochemical concentrations may covary. A strong correlation between the reflectance at a given wavelength and the concentration of a biochemical of interest, therefore, may be due to the effect of another biochemical which is closely related. Furthermore, it is not always possible to account for potentially suitable waveband omissions in the stepwise selection procedure. This concern about the suitability of stepwise regression has been identified and acknowledged in a number of recent studies (Wessman et al., 1988; Curran, 1989; Curran et al., 1992; Peterson and Hubbard, 1992; Martine and Aber, 1994; Kupiec, 1994). These studies have pointed to the lack of a physical

  2. Piecewise linear regression techniques to analyze the timing of head coach dismissals in Dutch soccer clubs

    NARCIS (Netherlands)

    Schryver, T. de; Eisinga, R.

    2010-01-01

    The key question in research on dismissals of head coaches in sports clubs is not whether they should happen but when they will happen. This paper applies piecewise linear regression to advance our understanding of the timing of head coach dismissals. Essentially, the regression sacrifices degrees

  3. An Introduction to Graphical and Mathematical Methods for Detecting Heteroscedasticity in Linear Regression.

    Science.gov (United States)

    Thompson, Russel L.

    Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…
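
    One common mathematical check of the homoscedasticity assumption is sketched below. The record does not say which tests the paper demonstrates, so the choice of the studentized (Koenker) form of the Breusch-Pagan test is an assumption here, and the two data sets are synthetic.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return yb - slope * xb, slope


def r_squared(x, y):
    a, b = fit_line(x, y)
    yb = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - yb) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan(x, y):
    """LM = n * R^2 from regressing squared residuals on x; chi-square with 1 df."""
    a, b = fit_line(x, y)
    squared_residuals = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    lm = max(0.0, len(x) * r_squared(x, squared_residuals))
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) upper tail
    return lm, p_value


# Synthetic data: the error spread is constant in one set and grows with x in the other.
x = [float(i) for i in range(1, 41)]
e = [1.0 if i % 2 == 0 else -1.0 for i in range(40)]
y_homo = [2.0 * xi + 5.0 * ei for xi, ei in zip(x, e)]
y_hetero = [2.0 * xi + 0.5 * xi * ei for xi, ei in zip(x, e)]

lm_homo, p_homo = breusch_pagan(x, y_homo)
lm_hetero, p_hetero = breusch_pagan(x, y_hetero)
```

    A large statistic (small P value) indicates that the residual variance changes with the predictor. Plotting residuals against fitted values is the usual graphical counterpart.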

  4. Error analysis of dimensionless scaling experiments with multiple points using linear regression

    International Nuclear Information System (INIS)

    Guercan, Oe.D.; Vermare, L.; Hennequin, P.; Bourdelle, C.

    2010-01-01

    A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)
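
    The remark about where to add scan points can be illustrated with the textbook standard error of a least-squares slope, sigma / sqrt(Sxx). This sketch assumes equal, uncorrelated errors in y and exactly known x values; it is not the analytical expression derived in the letter, and the scan positions are invented.

```python
import math


def slope_standard_error(xs, sigma):
    """Standard error of the OLS slope for measurement error sigma in y."""
    xb = sum(xs) / len(xs)
    sxx = sum((x - xb) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)


# Hypothetical three-point scan (e.g. the logarithm of a scanned parameter).
base = [1.0, 2.0, 3.0]
sigma = 0.1

se_base = slope_standard_error(base, sigma)
se_middle = slope_standard_error(base + [2.0], sigma)  # extra point in the middle
se_end = slope_standard_error(base + [3.0], sigma)     # extra point at one end
```

    An extra point at the centre of the range leaves Sxx unchanged and so does not reduce the slope error, whereas an extra point at the end of the range increases Sxx and reduces it.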

  5. Comparison of two-concentration with multi-concentration linear regressions: Retrospective data analysis of multiple regulated LC-MS bioanalytical projects.

    Science.gov (United States)

    Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi

    2013-09-01

    Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). 
Furthermore
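
    The comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the synthetic eight-level curve, and the use of unweighted least squares are all assumptions made for the example (regulated methods often use weighted fits).

```python
import numpy as np

def fit_line(conc, resp):
    """Unweighted least-squares line: resp = slope * conc + intercept."""
    slope, intercept = np.polyfit(conc, resp, 1)
    return slope, intercept

def back_calculate(resp, slope, intercept):
    """Convert an instrument response back to a concentration."""
    return (np.asarray(resp, dtype=float) - intercept) / slope

def compare_calibrations(levels, responses, qc_responses):
    """Back-calculate QC concentrations using all levels and using only the two extremes.

    `levels` must be sorted so that the first and last entries are the
    lowest and highest calibration concentrations.
    """
    levels = np.asarray(levels, dtype=float)
    responses = np.asarray(responses, dtype=float)
    multi = fit_line(levels, responses)
    two = fit_line(levels[[0, -1]], responses[[0, -1]])
    return back_calculate(qc_responses, *multi), back_calculate(qc_responses, *two)

# Hypothetical eight-level curve with proportional noise (not data from the study).
rng = np.random.default_rng(0)
levels = np.array([1, 2, 5, 10, 50, 100, 500, 1000], dtype=float)
responses = (0.02 * levels + 0.001) * (1 + rng.normal(0, 0.02, levels.size))
qc_true = np.array([3.0, 400.0, 800.0])
qc_responses = 0.02 * qc_true + 0.001
multi_conc, two_conc = compare_calibrations(levels, responses, qc_responses)
print("percent difference:", 100 * (two_conc - multi_conc) / multi_conc)
```

    With real batch data, the same comparison would be run per batch and summarized as differences in bias and %CV, as the abstract reports.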

  6. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system

    International Nuclear Information System (INIS)

    Fang, Tingting; Lahdelma, Risto

    2016-01-01

    Highlights: • Social factor is considered for the linear regression models besides weather file. • Simultaneously optimize all the coefficients for linear regression models. • SARIMA combined with linear regression is used to forecast the heat demand. • The accuracy of both linear regression and time series models is evaluated. - Abstract: Forecasting heat demand is necessary for production and operation planning of district heating (DH) systems. In this study we first propose a simple regression model where the hourly outdoor temperature and wind speed forecast the heat demand. Weekly rhythm of heat consumption as a social component is added to the model to significantly improve the accuracy. The other type of model is the seasonal autoregressive integrated moving average (SARIMA) model with exogenous variables as a combination to take weather factors, and the historical heat consumption data as dependent variables. One outstanding advantage of the model is that it pursues high accuracy for both long-term and short-term forecasts by considering both exogenous factors and time series. The forecasting performance of both the linear regression models and the time series model is evaluated based on real-life heat demand data for the city of Espoo in Finland by out-of-sample tests for the last 20 full weeks of the year. The results indicate that the proposed linear regression model (T168h) using 168-h demand pattern with midweek holidays classified as Saturdays or Sundays gives the highest accuracy and strong robustness among all the tested models based on the tested forecasting horizon and corresponding data. Considering the parsimony of the input, the ease of use and the high accuracy, the proposed T168h model is the best in practice. The heat demand forecasting model can also be developed for individual buildings if automated meter reading customer measurements are available. This would allow forecasting the heat demand based on more accurate heat consumption
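
    One way to picture a regression with weather inputs plus a 168-hour weekly pattern is the sketch below. It is an assumption-laden illustration, not the T168h specification from the paper: the function names and the encoding of the weekly rhythm as 168 indicator columns are hypothetical choices.

```python
import numpy as np

def build_design(temp, wind, hour_of_week):
    """Design matrix: temperature, wind speed, and one indicator per hour of the week."""
    temp = np.asarray(temp, dtype=float)
    wind = np.asarray(wind, dtype=float)
    hour_of_week = np.asarray(hour_of_week, dtype=int)
    weekly = np.zeros((temp.size, 168))
    weekly[np.arange(temp.size), hour_of_week] = 1.0
    # No separate intercept: the 168 indicator columns already span it.
    return np.column_stack([temp, wind, weekly])

def fit_demand_model(temp, wind, hour_of_week, demand):
    """Estimate all coefficients simultaneously by least squares."""
    X = build_design(temp, wind, hour_of_week)
    coef, *_ = np.linalg.lstsq(X, np.asarray(demand, dtype=float), rcond=None)
    return coef

def predict_demand(coef, temp, wind, hour_of_week):
    """Forecast demand from weather inputs and the hour-of-week index (0..167)."""
    return build_design(temp, wind, hour_of_week) @ coef
```

    In this sketch a midweek holiday could be handled by passing the hour-of-week index of a Saturday or Sunday for those hours, which mirrors the classification mentioned in the abstract.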

  7. Convergence diagnostics for Eigenvalue problems with linear regression model

    International Nuclear Information System (INIS)

    Shi, Bo; Petrovic, Bojan

    2011-01-01

    Although the Monte Carlo method has been extensively used for criticality/Eigenvalue problems, a reliable, robust, and efficient convergence diagnostics method is still desired. Most methods are based on integral parameters (multiplication factor, entropy) and either condense the local distribution information into a single value (e.g., entropy) or even disregard it. We propose to employ the detailed cycle-by-cycle local flux evolution obtained by using mesh tally mechanism to assess the source and flux convergence. By applying a linear regression model to each individual mesh in a mesh tally for convergence diagnostics, a global convergence criterion can be obtained. We exemplify this method on two problems and obtain promising diagnostics results. (author)
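
    The idea of fitting a regression line to each mesh cell and combining the results into a global criterion might look like the sketch below. The abstract does not state the actual criterion, so the slope t-statistic test, the threshold `t_crit`, and the function names are assumptions made purely for illustration.

```python
import numpy as np

def slope_t_statistics(flux):
    """t-statistic of the least-squares slope of flux versus cycle index, per mesh cell.

    flux has shape (n_cycles, n_cells); each column is one mesh cell.
    """
    flux = np.asarray(flux, dtype=float)
    n = flux.shape[0]
    x = np.arange(n, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = xc @ flux / sxx
    intercept = flux.mean(axis=0) - slope * x.mean()
    resid = flux - (intercept + np.outer(x, slope))
    s2 = np.sum(resid ** 2, axis=0) / (n - 2)  # residual variance per cell
    return slope / np.sqrt(s2 / sxx)

def is_converged(flux, t_crit=2.0):
    """Hypothetical global criterion: no cell shows a significant trend."""
    return bool(np.all(np.abs(slope_t_statistics(flux)) < t_crit))
```

    A cell whose flux still drifts over the examined cycles yields a large slope t-statistic and blocks the global decision, which is the point of keeping the local information instead of condensing it into one integral value.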

  8. Area under the curve predictions of dalbavancin, a new lipoglycopeptide agent, using the end of intravenous infusion concentration data point by regression analyses such as linear, log-linear and power models.

    Science.gov (United States)

    Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally

    2018-02-01

    1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. Area under plasma concentration versus time curve (AUC inf ) of dalbavancin is a key parameter and AUC inf /MIC ratio is a critical pharmacodynamic marker. 2. Using the end of intravenous infusion concentration (i.e. C max ), the C max versus AUC inf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUC inf were performed using published C max data by application of regression equations. The quotient of observed/predicted values rendered fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The C max versus AUC inf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE models predicted AUC inf with a RMSE of 3.02-27.46% with fold difference largely contained within 0.64-1.48. 5. Regardless of the regression models, a single time point strategy of using C max (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUC inf of dalbavancin in patients.
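
    The workflow in this abstract (fit a C max to AUC inf regression, predict, then score with MAE, RMSE and fold difference) can be sketched as below. Only two of the named model families are shown, the power model is fitted through a log-log regression, and all function names are hypothetical; no values from the study are reproduced.

```python
import numpy as np

def fit_linear(cmax, auc):
    """Linear model: AUC = a + b * Cmax."""
    b, a = np.polyfit(cmax, auc, 1)
    return lambda c: a + b * np.asarray(c, dtype=float)

def fit_power(cmax, auc):
    """Power model AUC = a * Cmax**b, fitted as a straight line in log-log space."""
    b, log_a = np.polyfit(np.log(cmax), np.log(auc), 1)
    return lambda c: np.exp(log_a) * np.asarray(c, dtype=float) ** b

def prediction_errors(observed, predicted):
    """MAE, RMSE and the observed/predicted quotient (fold difference)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "fold": observed / predicted,
    }
```

    A model fitted on one set of subject pairs would then be applied to published C max values, and `prediction_errors` would summarize how close the predicted AUC inf values are.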

  9. Geographically weighted linear regression in a GIS environment

    Directory of Open Access Journals (Sweden)

    Luís Eduardo Ximenes Carvalho

    2009-10-01

    Full Text Available

    This article addresses theoretical considerations and results of implementing, in a GIS environment, a confirmatory spatial statistics model, geographically weighted linear regression (GWR; RGP in the original Portuguese), which is not available in free software. The theoretical aspects of this local spatial regression model are discussed at length because the existing literature is scarce. The GWR model was implemented in the GISDK programming language of the TransCAD GIS-T, making comprehensive use of the data manipulation and georeferenced data processing tools and the spatial analysis routines available in GIS platforms. In the end, the authors hope to have developed, albeit partially, an important tool that will contribute to the understanding and refinement of the modeling of geographic phenomena so widely analyzed in Transportation Planning studies.

  10. Development of planning level transportation safety tools using Geographically Weighted Poisson Regression.

    Science.gov (United States)

    Hadayeghi, Alireza; Shalaby, Amer S; Persaud, Bhagwant N

    2010-03-01

    A common technique used for the calibration of collision prediction models is the Generalized Linear Modeling (GLM) procedure with the assumption of Negative Binomial or Poisson error distribution. In this technique, fixed coefficients that represent the average relationship between the dependent variable and each explanatory variable are estimated. However, the stationary relationship assumed may hide some important spatial factors of the number of collisions at a particular traffic analysis zone. Consequently, the accuracy of such models for explaining the relationship between the dependent variable and the explanatory variables may be suspected since collision frequency is likely influenced by many spatially defined factors such as land use, demographic characteristics, and traffic volume patterns. The primary objective of this study is to investigate the spatial variations in the relationship between the number of zonal collisions and potential transportation planning predictors, using the Geographically Weighted Poisson Regression modeling technique. The secondary objective is to build on knowledge comparing the accuracy of Geographically Weighted Poisson Regression models to that of Generalized Linear Models. The results show that the Geographically Weighted Poisson Regression models are useful for capturing spatially dependent relationships and generally perform better than the conventional Generalized Linear Models. Copyright 2009 Elsevier Ltd. All rights reserved.

  11. Enhancement of Visual Field Predictions with Pointwise Exponential Regression (PER) and Pointwise Linear Regression (PLR).

    Science.gov (United States)

    Morales, Esteban; de Leon, John Mark S; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph

    2016-03-01

    The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, Glaucoma Hemifield Test (GHT), and weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values of the differences between the observed and the predicted thresholds at last two follow-ups indicated better model predictions. The mean (SD) follow-up times for the smoothing and prediction phase were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions; PER also predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. The application of smoothing algorithms on VF data can improve forecasting in VF points to assist in treatment decisions.

  12. Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.

    Science.gov (United States)

    Choi, Jae-Seok; Kim, Munchurl

    2017-03-01

    Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower

  13. Using the classical linear regression model in analysis of the dependences of conveyor belt life

    Directory of Open Access Journals (Sweden)

    Miriam Andrejiová

    2013-12-01

    Full Text Available The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.

  14. Multiple linear regressions

    Indian Academy of Sciences (India)

    Abstract. The predictive analysis based on quantitative structure activity relationships (QSAR) on benzim- ... could lead to treatment of obesity, diabetes and related conditions. ..... After discussing the physical and chemical mean- ing of the ...

  15. Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

    Science.gov (United States)

    Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

    2008-01-01

    This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.

  16. Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

    Science.gov (United States)

    Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

    2014-01-01

    The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binomial distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.

  17. Game Theory and its Relationship with Linear Programming Models ...

    African Journals Online (AJOL)

    Game Theory and its Relationship with Linear Programming Models. ... This paper shows that game theory and linear programming problem are closely related subjects since any computing method devised for ... AJOL African Journals Online.

  18. A Feature-Free 30-Disease Pathological Brain Detection System by Linear Regression Classifier.

    Science.gov (United States)

    Chen, Yi; Shao, Ying; Yan, Jie; Yuan, Ti-Fei; Qu, Yanwen; Lee, Elizabeth; Wang, Shuihua

    2017-01-01

    Alzheimer's disease patients are increasing rapidly every year. Scholars tend to use computer vision methods to develop automatic diagnosis system. (Background) In 2015, Gorji et al. proposed a novel method using pseudo Zernike moment. They tested four classifiers: learning vector quantization neural network, pattern recognition neural network trained by Levenberg-Marquardt, by resilient backpropagation, and by scaled conjugate gradient. This study presents an improved method by introducing a relatively new classifier-linear regression classification. Our method selects one axial slice from 3D brain image, and employed pseudo Zernike moment with maximum order of 15 to extract 256 features from each image. Finally, linear regression classification was harnessed as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Therefore, it can be used to detect Alzheimer's disease. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
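
    The abstract names linear regression classification without describing it. A common formulation, assumed here and not confirmed by the source, represents each class by the subspace spanned by its training feature vectors and assigns a test vector to the class with the smallest least-squares residual. The function name and data layout are hypothetical.

```python
import numpy as np

def lrc_predict(class_samples, y):
    """Assign y to the class whose training vectors reconstruct it best.

    class_samples maps a label to an (n_features, n_samples) matrix whose
    columns are that class's training feature vectors.
    """
    y = np.asarray(y, dtype=float)
    best_label, best_dist = None, np.inf
    for label, X in class_samples.items():
        X = np.asarray(X, dtype=float)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # class-specific regression
        dist = np.linalg.norm(y - X @ beta)           # reconstruction error
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

    In the setting of the abstract, `y` would be the feature vector extracted from one image slice and each class matrix would hold the feature vectors of that class's training images.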

  19. MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY

    OpenAIRE

    Chayalakshmi C.L

    2018-01-01

    Calculation of boiler efficiency is essential if its parameters need to be controlled for either maintaining or enhancing its efficiency. But determination of boiler efficiency using conventional method is time consuming and very expensive. Hence, it is not recommended to find boiler efficiency frequently. The work presented in this paper deals with establishing the statistical mo...

  20. Using Recursive Regression to Explore Nonlinear Relationships and Interactions: A Tutorial Applied to a Multicultural Education Study

    Directory of Open Access Journals (Sweden)

    Kenneth David Strang

    2009-03-01

    Full Text Available This paper discusses how a seldom-used statistical procedure, recursive regression (RR), can numerically and graphically illustrate data-driven nonlinear relationships and interaction of variables. This routine falls into the family of exploratory techniques, yet a few interesting features make it a valuable complement to factor analysis and multiple linear regression for method triangulation. By comparison, nonlinear cluster analysis also generates graphical dendrograms to visually depict relationships, but RR (as implemented here) uses multiple combinations of nominal and interval predictors regressed on a categorical or ratio dependent variable. In similar fashion, multidimensional scaling, multiple discriminant analysis and conjoint analysis are constrained at best to predicting an ordinal dependent variable (as currently implemented in popular software). A flexible capability of RR (again as implemented here) is the transformation of factor data (for substituting codes). One powerful RR feature is the ability to treat missing data as a theoretically important predictor value (useful for survey questions that respondents do not wish to answer). For practitioners, the paper summarizes how this technique fits within the generally-accepted statistical methods. Popular software such as SPSS, SAS or LISREL can be used, while sample data can be imported in common formats including ASCII text, comma delimited, Excel XLS, and SPSS SAV. A tutorial approach is applied here using RR in LISREL. The tutorial leverages a partial sample from a study that used recursive regression to predict grades from international student learning styles. Some tutorial portions are technical, to improve the ambiguous RR literature.

  1. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    Science.gov (United States)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts (York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models, i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter (Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both

  2. Improving sensitivity of linear regression-based cell type-specific differential expression deconvolution with per-gene vs. global significance threshold.

    Science.gov (United States)

    Glass, Edmund R; Dozmorov, Mikhail G

    2016-10-06

    The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics. 
The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions

  3. Prediction of retention indices for frequently reported compounds of plant essential oils using multiple linear regression, partial least squares, and support vector machine.

    Science.gov (United States)

    Yan, Jun; Huang, Jian-Hua; He, Min; Lu, Hong-Bing; Yang, Rui; Kong, Bo; Xu, Qing-Song; Liang, Yi-Zeng

    2013-08-01

    Retention indices for frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine, combined with a new variable selection approach called random-frog recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validations were performed to ensure the stability and predictive ability. All three methods could obtain an acceptable model. The optimal results were obtained by the support vector machine based on a small number of informative descriptors, with squared correlation coefficients for cross validation of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, are compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and selection probability in model spaces. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. QSAR study of HCV NS5B polymerase inhibitors using the genetic algorithm-multiple linear regression (GA-MLR).

    Science.gov (United States)

    Rafiei, Hamid; Khanzadeh, Marziyeh; Mozaffari, Shahla; Bostanifar, Mohammad Hassan; Avval, Zhila Mohajeri; Aalizadeh, Reza; Pourbasheer, Eslam

    2016-01-01

    A quantitative structure-activity relationship (QSAR) study has been employed for predicting the inhibitory activities of the Hepatitis C virus (HCV) NS5B polymerase inhibitors. A data set consisting of 72 compounds was selected, and then different types of molecular descriptors were calculated. The whole data set was split into a training set (80% of the dataset) and a test set (20% of the dataset) using principal component analysis. The stepwise (SW) and the genetic algorithm (GA) techniques were used as variable selection tools. The multiple linear regression method was then used to linearly correlate the selected descriptors with inhibitory activities. Several validation techniques, including leave-one-out and leave-group-out cross-validation and the Y-randomization method, were used to evaluate the internal capability of the derived models. The external prediction ability of the derived models was further analyzed using modified r(2), concordance correlation coefficient values and the Golbraikh and Tropsha acceptable model criteria. Based on the derived results (GA-MLR), some new insights toward molecular structural requirements for obtaining better inhibitory activity were obtained.

  5. Significance tests to determine the direction of effects in linear regression models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander

    2015-02-01

    Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. © 2014 The British Psychological Society.
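
    The core observation, that residuals of the correctly specified model are more symmetric than those of the reversed model when the variables are non-normal, can be illustrated descriptively as below. This sketch only compares sample skewness values; it does not implement the significance tests discussed in the paper, and the function names are hypothetical.

```python
import numpy as np

def skewness(values):
    """Sample skewness (third standardized moment)."""
    centered = values - values.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

def ols_residuals(predictor, response):
    """Residuals of the simple regression of response on predictor."""
    slope, intercept = np.polyfit(predictor, response, 1)
    return response - (slope * predictor + intercept)

def residual_skewness(x, y):
    """Residual skewness for both candidate directions of dependence."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return {
        "x->y": float(skewness(ols_residuals(x, y))),  # y regressed on x
        "y->x": float(skewness(ols_residuals(y, x))),  # x regressed on y
    }

def more_plausible_direction(x, y):
    """Pick the direction whose residuals are closer to symmetric."""
    skews = residual_skewness(x, y)
    return min(skews, key=lambda k: abs(skews[k]))
```

    For example, with a skewed `x` and `y` generated as a linear function of `x` plus symmetric noise, the residuals of `y` on `x` stay close to symmetric while those of `x` on `y` inherit part of the skewness of `x`.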

  6. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  7. EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

    Science.gov (United States)

    Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

    2018-01-01

    In a number of environmental studies, relationships between natural processes are often assessed through regression analyses, using time series data. Such data are often multi-scale and non-stationary, leading to a poor accuracy of the resulting regression models and therefore to results with moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists of applying the empirical mode decomposition (EMD) algorithm to data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts for the issues of non-stationarity associated with the data series. Second, this approach acts as a scan for the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology, it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results provide new knowledge concerning the studied relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, which is a scale not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more details than classical models concerning the relationship.

  8. Dynamic Optimization for IPS2 Resource Allocation Based on Improved Fuzzy Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Maokuan Zheng

    2017-01-01

    The study mainly focuses on resource allocation optimization for industrial product-service systems (IPS2). The development of IPS2 leads to a sustainable economy by introducing cooperative mechanisms apart from commodity transaction. The randomness and fluctuation of service requests from customers lead to the volatility of the IPS2 resource utilization ratio. Three basic rules for resource allocation optimization are put forward to improve system operation efficiency and cut unnecessary costs. An approach based on fuzzy multiple linear regression (FMLR) is developed, which integrates the strength and concision of multiple linear regression in data fitting and factor analysis with the merit of fuzzy theory in dealing with uncertain or vague problems, and which helps reduce the costs caused by unnecessary resource transfer. An iteration mechanism is introduced in the FMLR algorithm to improve forecasting accuracy. A case study of human resource allocation optimization in the construction machinery industry is implemented to test and verify the proposed model.

  9. A Linear Regression Model for Global Solar Radiation on Horizontal Surfaces at Warri, Nigeria

    Directory of Open Access Journals (Sweden)

    Michael S. Okundamiya

    2013-10-01

    The growing anxiety about the negative effects of fossil fuels on the environment and the global emission reduction targets call for a more extensive use of renewable energy alternatives. Efficient solar energy utilization is an essential solution to the high atmospheric pollution caused by fossil fuel combustion. Global solar radiation (GSR) data, which are useful for the design and evaluation of solar energy conversion systems, are not measured at the forty-five meteorological stations in Nigeria. The dearth of measured solar radiation data calls for accurate estimation. This study proposed a temperature-based linear regression for predicting the monthly average daily GSR on horizontal surfaces at Warri (latitude 5.02°N and longitude 7.88°E), an oil city located in the south-south geopolitical zone of Nigeria. The proposed model is analyzed based on five statistical indicators (coefficient of correlation, coefficient of determination, mean bias error, root mean square error, and t-statistic), and compared with the existing sunshine-based model for the same study. The results indicate that the proposed temperature-based linear regression model could replace the existing sunshine-based model for generating global solar radiation data. Keywords: air temperature; empirical model; global solar radiation; regression analysis; renewable energy; Warri

  10. Partitioning of late gestation energy expenditure in ewes using indirect calorimetry and a linear regression approach

    DEFF Research Database (Denmark)

    Kiani, Alishir; Chwalibog, André; Nielsen, Mette O

    2007-01-01

    Late gestation energy expenditure (EE(gest)) originates from energy expenditure (EE) of development of conceptus (EE(conceptus)) and EE of homeorhetic adaptation of metabolism (EE(homeorhetic)). Even though EE(gest) is relatively easy to quantify, its partitioning is problematic. In the present...... study metabolizable energy (ME) intake ranges for twin-bearing ewes were 220-440, 350-700, 350-900 kJ per metabolic body weight (W^0.75) at week seven, five, two pre-partum respectively. Indirect calorimetry and a linear regression approach were used to quantify EE(gest) and then partition to EE......(conceptus) and EE(homeorhetic). Energy expenditure of basal metabolism of the non-gravid tissues (EE(bmng)), derived from the intercept of the linear regression equation of retained energy [kJ/W^0.75] and ME intake [kJ/W^0.75], was 298 [kJ/W^0.75]. Values of the intercepts of the regression equations at week seven...

  11. Lattice Designs in Standard and Simple Implicit Multi-linear Regression

    OpenAIRE

    Wooten, Rebecca D.

    2016-01-01

    Statisticians generally use ordinary least squares to minimize the random error in a subject response with respect to independent explanatory variables. However, Wooten illustrates how ordinary least squares can be used to minimize the random error in the system without defining a subject response. Using lattice design Wooten shows that non-response analysis is a superior alternative rotation of the pyramidal relationship between random variables and parameter estimates in multi-linear r...

  12. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    Science.gov (United States)

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic diameter less than 10 μm) pollution a critical task. This requires knowledge of the factors associated with PM 10 variation and good forecasts of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were a multiple linear regression (MLR) model with lagged predictor variables (MLR1), an MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and a regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that the MLR2 model was on the same level as the RTSE model in terms of forecasting accuracy, while the MLR1 model was the worst.
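
    As a rough illustration of what a lagged-predictor model such as MLR2 involves, the sketch below regresses today's value on yesterday's value and yesterday's predictor. The synthetic series, the single predictor, and the helper names (solve_linear_system, fit_ols, lagged_design) are assumptions made for this example; they are not the authors' data or code.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def lagged_design(y, x):
    """Rows [1, y[t-1], x[t-1]] paired with the target y[t]."""
    rows = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    return rows, y[1:]


# Hypothetical, noise-free series: y[t] = 1 + 0.5 * y[t-1] + 2 * x[t-1].
x = [math.sin(0.7 * t) + 0.3 * math.cos(2.1 * t) for t in range(60)]
y = [5.0]
for t in range(1, 60):
    y.append(1.0 + 0.5 * y[t - 1] + 2.0 * x[t - 1])

rows, targets = lagged_design(y, x)
coefs = fit_ols(rows, targets)  # [intercept, lagged-y weight, lagged-x weight]
forecast = coefs[0] + coefs[1] * y[-1] + coefs[2] * x[-1]  # one step ahead
```

    Dropping the y[t-1] column from lagged_design gives the analogue of a model with lagged predictors only, which is one way to compare the two specifications on the same data.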

  13. The relationships among iron supplement use, Hb concentration and linear growth in young children: Ethiopian Demographic and Health Survey.

    Science.gov (United States)

    Mohammed, Shimels Hussien; Esmaillzadeh, Ahmad

    2017-11-01

    Growth faltering and anaemia remain unacceptably high among infants and young children in Ethiopia. In this study, we investigated the relationships among Fe supplement use (ISU), Hb concentration and linear growth, hypothesising positive relationships between ISU and Hb, ISU and linear growth and Hb and linear growth. We used nationally representative data on 2400 children aged 6-24 months from the Ethiopian Demographic and Health Survey (EDHS) 2011, conducted following a stratified, two-stage cluster sampling. We examined the links by Pearson's correlation, bivariate and multivariate linear regression analyses and reported adjusted estimates. We found that ISU was not significantly associated with either Hb (β=1·09; 95 % CI -2·73, 5·01, P=0·567) or linear growth (β=0·07; 95 % CI -0·06, 0·21, P=0·217). We found a positive, albeit weak, correlation between Hb and linear growth (r 0·09; 95 % CI 0·06, 0·11, PHb predicted linear growth independent of a variety of dietary and non-dietary factors (β=0·08; 95 % CI 0·04, 0·11, PHb; age, birth type, size at birth, sex, breast-feeding duration, dietary diversity and deworming were independently associated with linear growth, indicating that Hb and linear growth are multifactorial with both nutritional and non-nutritional factors implicated. Further studies, with better design and exposure assessment, are warranted on the relation of ISU with Hb or linear growth.

  14. Bayesian linear regression : different conjugate models and their (in)sensitivity to prior-data conflict

    NARCIS (Netherlands)

    Walter, G.M.; Augustin, Th.; Kneib, Thomas; Tutz, Gerhard

    2010-01-01

    The paper is concerned with Bayesian analysis under prior-data conflict, i.e. the situation when observed data are rather unexpected under the prior (and the sample size is not large enough to eliminate the influence of the prior). Two approaches for Bayesian linear regression modeling based on

  15. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Science.gov (United States)

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  16. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    Science.gov (United States)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

  17. Assessment of triglyceride and cholesterol in overweight people based on multiple linear regression and artificial intelligence model.

    Science.gov (United States)

    Ma, Jing; Yu, Jiong; Hao, Guangshu; Wang, Dan; Sun, Yanni; Lu, Jianxin; Cao, Hongcui; Lin, Feiyan

    2017-02-20

    The prevalence of hyperlipemia is increasing around the world. Our aims are to analyze the relationship of triglyceride (TG) and cholesterol (TC) with indexes of liver function and kidney function, and to develop a prediction model of TG and TC in overweight people. A total of 302 healthy adult subjects and 273 overweight subjects were enrolled in this study. The levels of fasting indexes of TG (fs-TG), TC (fs-TC), blood glucose, liver function, and kidney function were measured and analyzed by correlation analysis and multiple linear regression (MRL). The back propagation artificial neural network (BP-ANN) was applied to develop prediction models of fs-TG and fs-TC. The results showed there was a significant difference in biochemical indexes between healthy people and overweight people. The correlation analysis showed fs-TG was related to weight, height, blood glucose, and indexes of liver and kidney function, while fs-TC was correlated with age and indexes of liver function (P < 0.01). The MRL analysis indicated that the regression equations of fs-TG and fs-TC were both statistically significant (P < 0.01) when independent indexes were included. The BP-ANN model of fs-TG reached the training goal at 59 epochs, while the fs-TC model achieved high prediction accuracy after training for 1000 epochs. In conclusion, fs-TG and fs-TC were strongly related to weight, height, age, blood glucose, and indexes of liver function and kidney function. Based on related variables, the indexes of fs-TG and fs-TC can be predicted by BP-ANN models in overweight people.

  18. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    Science.gov (United States)

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  19. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

    Science.gov (United States)

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
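
    For readers who want to see how R² can serve directly as a test statistic, the sketch below relies on a standard result: under the null hypothesis that all k slope coefficients are zero, with normal errors and an intercept in the model, R² follows a Beta(k/2, (n-k-1)/2) distribution. The Monte Carlo p-value and the function names are assumptions made for illustration and may differ from the procedure in the article.

```python
import random


def f_statistic_from_r2(r2, n, k):
    """Classical overall F statistic written in terms of R^2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def r2_p_value_by_beta_sampling(r2_observed, n, k, draws=20000, seed=0):
    """Share of Beta(k/2, (n-k-1)/2) draws at least as large as the observed R^2."""
    rng = random.Random(seed)
    a, b = k / 2.0, (n - k - 1) / 2.0
    exceed = sum(rng.betavariate(a, b) >= r2_observed for _ in range(draws))
    return exceed / draws


# Hypothetical study: n = 20 observations, k = 2 predictors.
p_strong = r2_p_value_by_beta_sampling(0.90, n=20, k=2)
p_weak = r2_p_value_by_beta_sampling(0.01, n=20, k=2)
```

    The same decision can be reached through f_statistic_from_r2 and an F table, since the F statistic is a monotone function of R² for fixed n and k.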

  20. Healthcare Expenditures Associated with Depression Among Individuals with Osteoarthritis: Post-Regression Linear Decomposition Approach.

    Science.gov (United States)

    Agarwal, Parul; Sambamoorthi, Usha

    2015-12-01

    Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least squares (OLS) regressions were used to examine differences in healthcare expenditures between those with and without depression. A post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analyses accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.

  1. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    Science.gov (United States)

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  2. Analysis of Relationship Between Personality and Favorite Places with Poisson Regression Analysis

    Directory of Open Access Journals (Sweden)

    Yoon Song Ha

    2018-01-01

    A relationship between human personality and preferred locations has long been conjectured in human mobility research. In this paper, we analyzed the relationship between personality and visited places with Poisson regression. Poisson regression can analyze the correlation between a countable dependent variable and independent variables. For this analysis, 33 volunteers provided their personality data, and data on 49 location categories are used. Raw location data are preprocessed to be normalized into rates of visit, and outlier data are pruned. For the regression analysis, the independent variables are personality data and the dependent variables are the preprocessed location data. Several meaningful results are found. For example, persons with a high tendency to visit university laboratories frequently have personalities with high conscientiousness and low openness. Other meaningful location categories are also presented in this paper.
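
    To make the idea of Poisson regression concrete, here is a minimal sketch that fits log(mean count) = b0 + b1 * x by Newton-Raphson for a single predictor. The data (a trait score and visit counts) and the function name fit_poisson are hypothetical, and the study itself may have used different software and several predictors.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Newton-Raphson for log(mu) = b0 + b1 * x; returns (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2 x 2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system by Cramer's rule.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: a personality trait score and a count of visits.
trait_score = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
visit_count = [2, 2, 3, 4, 5, 7, 10, 13, 18, 25]
b0, b1 = fit_poisson(trait_score, visit_count)
rate_ratio = math.exp(b1)  # multiplicative change in expected visits per unit of score
```

    Reporting exp(b1) rather than b1 is a common choice because it reads directly as a rate ratio.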

  3. An introduction to using Bayesian linear regression with clinical data.

    Science.gov (United States)

    Baldwin, Scott A; Larson, Michael J

    2017-11-01

    Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
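
    As a small taste of how a prior and data combine in Bayesian regression, the sketch below uses the simplest conjugate case: a regression through the origin, y = beta * x + noise, with known noise variance and a normal prior on beta. This is an assumption-laden toy, not the article's R code, and the data and function name are hypothetical.

```python
import math


def posterior_slope(x, y, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of beta in y = beta * x + noise."""
    precision = 1.0 / prior_var + sum(xi * xi for xi in x) / noise_var
    mean = (prior_mean / prior_var
            + sum(xi * yi for xi, yi in zip(x, y)) / noise_var) / precision
    return mean, 1.0 / precision


# Hypothetical data and a weakly informative prior centred on zero.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
post_mean, post_var = posterior_slope(x, y, prior_mean=0.0, prior_var=10.0,
                                      noise_var=0.25)
half_width = 1.96 * math.sqrt(post_var)
credible_interval = (post_mean - half_width, post_mean + half_width)  # 95 %
```

    With a very diffuse prior the posterior mean approaches the least-squares slope, while a tight prior pulls the estimate toward the prior mean.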

  4. Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

    Science.gov (United States)

    Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

    2018-01-01

    Prediction of suspended sediment discharge in a catchment area is very important because it can be used to evaluate the erosion hazard, to manage water resources, water quality, and hydrology projects (dams, reservoirs, and irrigation), and to determine the extent of the damage that occurred in the catchment. Multiple linear regression analysis and artificial neural networks can be used to predict the amount of daily suspended sediment discharge. The regression analysis uses the least squares method, whereas the artificial neural networks use a Radial Basis Function (RBF) and a feedforward multilayer perceptron with three learning algorithms, namely Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number of neurons in the hidden layer ranges from three to sixteen, while the output layer has only one neuron because there is only one output target. In terms of the mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R²) and coefficient of efficiency (CE), the multiple linear regression (MLRg) Model 2 (6 independent input variables) has the lowest MAE and RMSE (0.0000002 and 13.6039) and the highest R² and CE (0.9971 and 0.9971). Compared with LM, SCG and RBF, the BFGS model with structure 3-7-1 is better and more accurate for predicting suspended sediment discharge in the Jenderam catchment. Its performance in the testing process, MAE and RMSE (13.5769 and 17.9011), is the smallest, while R² and CE (0.9999 and 0.9998) are the highest when compared with the other BFGS Quasi-Newton models (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics, MLRg, LM, SCG, BFGS and RBF are suitable and accurate for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. In the comparison between the artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately predicts suspended sediment discharge (kg

  5. Computer software for linear and nonlinear regression in organic NMR; Programa de computador para regressao linear e nao linear em R.M.N. organica

    Energy Technology Data Exchange (ETDEWEB)

    Canto, Eduardo Leite do; Rittner, Roberto [Universidade Estadual de Campinas, SP (Brazil). Inst. de Quimica

    1992-12-31

    Calculations involving two-variable linear regressions require specific procedures generally not familiar to chemists. To meet the need for fast and efficient handling of NMR data, a self-explanatory, PC-portable software package has been developed, which allows the user to produce and use tables recorded on diskette, containing chemical shifts or any other substituent physical-chemical measurements and constants (σ_T, σ°_R, E_s, ...). 9 refs., 1 fig.

  6. Straight line fitting and predictions: On a marginal likelihood approach to linear regression and errors-in-variables models

    Science.gov (United States)

    Christiansen, Bo

    2015-04-01

    Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first order approximations and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied, important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration should be treated as the independent variable. These decisions are often not easy to make but they may have a considerable impact on the results. We seek to give a unified probabilistic - Bayesian with flat priors - treatment of univariate linear regression and prediction by taking, as starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well-known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference cannot be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.
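
    The remark that the choice of independent variable matters can be checked numerically. The sketch below fits the same hypothetical data twice, once regressing y on x and once regressing x on y, and expresses both results as a slope dy/dx. It only illustrates these two limiting cases and does not implement the marginal likelihood approach of the abstract; the names are assumptions.

```python
def regression_slopes(x, y):
    """Return (slope of y on x, slope implied by x on y, r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope_y_on_x = sxy / sxx   # noise assumed to sit in y only
    slope_x_on_y = syy / sxy   # noise assumed to sit in x only, inverted to dy/dx
    r2 = sxy * sxy / (sxx * syy)
    return slope_y_on_x, slope_x_on_y, r2


# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
b_direct, b_inverse, r2 = regression_slopes(x, y)
```

    The ratio of the two slopes equals r², so they agree only when the correlation is perfect; with noise in both variables, the two values bracket the range of slopes that different noise assumptions can produce.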

  7. The detection of influential subsets in linear regression using an influence matrix

    OpenAIRE

    Peña, Daniel; Yohai, Víctor J.

    1991-01-01

    This paper presents a new method to identify influential subsets in linear regression problems. The procedure uses the eigenstructure of an influence matrix which is defined as the matrix of uncentered covariance of the effect on the whole data set of deleting each observation, normalized to include the univariate Cook's statistics in the diagonal. It is shown that points in an influential subset will appear with large weight in at least one of the eigenvectors linked to the largest eigenvalue...
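
    The univariate Cook's statistics mentioned above can be computed directly for a straight-line fit. The sketch below does only that, using the textbook formula with leverages; it does not build the influence matrix or its eigenvectors, and the data and function name are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance of each observation for the line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    s2 = sum(e * e for e in residuals) / (n - p)
    leverages = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1.0 - h) ** 2
            for e, h in zip(residuals, leverages)]


# Hypothetical data: nine points on a line plus one distant, discordant point.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [1, 3, 5, 7, 9, 11, 13, 15, 17, 0]
d = cooks_distances(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
```

    Single-case statistics like these can miss groups of points that mask one another, which is the motivation for subset-based methods.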

  8. Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

    Science.gov (United States)

    Camilleri, Liberato; Cefai, Carmel

    2013-01-01

    Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…

  9. Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

    Science.gov (United States)

    Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

    2014-05-01

    Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.

  10. Chemical composition of the essential oils of Citrus sinensis cv. valencia and a quantitative structure-retention relationship study for the prediction of retention indices by multiple linear regression

    Directory of Open Access Journals (Sweden)

    Larijani Kambiz

    2011-01-01

    The chemical composition of the volatile fraction obtained by head-space solid phase microextraction (HS-SPME), single drop microextraction (SDME) and the essential oil obtained by cold-press from the peels of C. sinensis cv. valencia were analyzed employing gas chromatography-flame ionization detector (GC-FID) and gas chromatography-mass spectrometry (GC-MS). The main components were limonene (61.34 %, 68.27 %, 90.50 %), myrcene (17.55 %, 12.35 %, 2.50 %), sabinene (6.50 %, 7.62 %, 0.5 %) and α-pinene (0 %, 6.65 %, 1.4 %), respectively obtained by HS-SPME, SDME and cold-press. Then a quantitative structure-retention relationship (QSRR) study for the prediction of retention indices (RI) of the compounds was developed by application of structural descriptors and the multiple linear regression (MLR) method. Principal components analysis was used to select the training set. A simple model with low standard errors and high correlation coefficients was obtained. The results illustrated that linear techniques such as MLR combined with a successful variable selection procedure are capable of generating an efficient QSRR model for prediction of the retention indices of different compounds. This model, with high statistical significance (R² train = 0.983, R² test = 0.970, Q² LOO = 0.962, Q² LGO = 0.936, REP(%) = 3.00), could be used adequately for the prediction and description of the retention indices of the volatile compounds.

  11. An Investigation of the Fit of Linear Regression Models to Data from an SAT® Validity Study. Research Report 2011-3

    Science.gov (United States)

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT® scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  12. Regression: The Apple Does Not Fall Far From the Tree.

    Science.gov (United States)

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.

  13. BOX-COX REGRESSION METHOD IN TIME SCALING

    Directory of Open Access Journals (Sweden)

    ATİLLA GÖKTAŞ

    2013-06-01

    The Box-Cox regression method with power transformations λj, for j = 1, 2, ..., k, can be used when the dependent variable and error term of the linear regression model do not satisfy the continuity and normality assumptions. The case of obtaining the smallest mean square error under the optimum power transformation λj, for j = 1, 2, ..., k, of Y is discussed. The Box-Cox regression method is especially appropriate for adjusting for skewness or heteroscedasticity of the error terms when there is a nonlinear functional relationship between the dependent and explanatory variables. In this study, the advantages and disadvantages of the Box-Cox regression method are discussed in the differentiation and differential analysis of the time scale concept.
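
    A minimal sketch of the standard one-parameter Box-Cox procedure may help here: transform the response with (y^λ - 1)/λ (or log y when λ = 0), fit a straight line, and pick the λ that maximizes the profile log-likelihood. The grid search, the synthetic data, and the function names are assumptions for illustration; the paper's multi-parameter λj setting is not reproduced.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of positive values; log(y) in the limit lam -> 0."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def residual_variance(x, z):
    """Mean squared residual of the least-squares line z = a + b * x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sxx
    return sum((b - mz - slope * (a - mx)) ** 2 for a, b in zip(x, z)) / n


def profile_log_likelihood(x, y, lam):
    """Log-likelihood of lam up to a constant, including the Jacobian term."""
    n = len(y)
    sigma2 = residual_variance(x, box_cox(y, lam))
    return -0.5 * n * math.log(sigma2) + (lam - 1.0) * sum(math.log(v) for v in y)


def best_lambda(x, y, grid):
    return max(grid, key=lambda lam: profile_log_likelihood(x, y, lam))


# Hypothetical data that are linear on the log scale, with small perturbations.
x = [float(i) for i in range(1, 21)]
y = [math.exp(0.1 + 0.2 * xi + 0.05 * math.sin(xi)) for xi in x]
grid = [i / 10 for i in range(-20, 21)]
lam_hat = best_lambda(x, y, grid)
```

    The Jacobian term (λ - 1) Σ log y is what makes likelihoods comparable across different values of λ; comparing raw residual variances alone would favor transformations that merely shrink the scale of the response.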

  14. Non-linear calibration models for near infrared spectroscopy

    DEFF Research Database (Denmark)

    Ni, Wangdong; Nørgaard, Lars; Mørup, Morten

    2014-01-01

    by ridge regression (RR). The performance of the different methods is demonstrated by their practical applications using three real-life near infrared (NIR) data sets. Different aspects of the various approaches including computational time, model interpretability, potential over-fitting using the non-linear...... models on linear problems, robustness to small or medium sample sets, and robustness to pre-processing, are discussed. The results suggest that GPR and BANN are powerful and promising methods for handling linear as well as nonlinear systems, even when the data sets are moderately small. The LS......-SVM), relevance vector machines (RVM), Gaussian process regression (GPR), artificial neural network (ANN), and Bayesian ANN (BANN). In this comparison, partial least squares (PLS) regression is used as a linear benchmark, while the relationship of the methods is considered in terms of traditional calibration...

  15. Modeling maximum daily temperature using a varying coefficient regression model

    Science.gov (United States)

    Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith

    2014-01-01

    Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...

  16. Uncertainty of pesticide residue concentration determined from ordinary and weighted linear regression curve.

    Science.gov (United States)

    Yolci Omeroglu, Perihan; Ambrus, Árpad; Boyacioglu, Dilek

    2018-03-28

    Determination of pesticide residues is based on calibration curves constructed for each batch of analysis. Calibration standard solutions are prepared from a known amount of reference material at different concentration levels covering the concentration range of the analyte in the analysed samples. In the scope of this study, the applicability of both ordinary linear and weighted linear regression (OLR and WLR) for pesticide residue analysis was investigated. We used 782 multipoint calibration curves obtained for 72 different analytical batches with high-pressure liquid chromatography equipped with an ultraviolet detector, and gas chromatography with electron capture, nitrogen phosphorus or mass spectrophotometer detectors. Quality criteria of the linear curves including regression coefficient, standard deviation of relative residuals and deviation of back calculated concentrations were calculated both for WLR and OLR methods. Moreover, the relative uncertainty of the predicted analyte concentration was estimated for both methods. It was concluded that calibration curve based on WLR complies with all the quality criteria set by international guidelines compared to those calculated with OLR. It means that all the data fit well with WLR for pesticide residue analysis. It was estimated that, regardless of the actual concentration range of the calibration, relative uncertainty at the lowest calibrated level ranged between 0.3% and 113.7% for OLR and between 0.2% and 22.1% for WLR. At or above 1/3 of the calibrated range, uncertainty of calibration curve ranged between 0.1% and 16.3% for OLR and 0% and 12.2% for WLR, and therefore, the two methods gave comparable results.
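
    The difference between ordinary and weighted calibration lines can be sketched as below. This is a minimal illustration under stated assumptions: the 1/x² weighting is one common choice and is not necessarily the scheme used in the study, and the concentrations and detector signals are hypothetical.

```python
def weighted_line_fit(x, y, weights=None):
    """Fit y = a + b*x by minimizing sum(w_i * (y_i - a - b*x_i)**2).

    With weights=None every point gets weight 1 (ordinary least squares).
    """
    if weights is None:
        weights = [1.0] * len(x)
    total_w = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_w
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical calibration standards (mg/kg) and detector responses.
conc = [0.01, 0.05, 0.10, 0.50, 1.00]
signal = [0.9, 5.2, 10.1, 52.0, 97.0]

fits = (("OLR", None), ("WLR", [1.0 / c ** 2 for c in conc]))
for label, w in fits:
    a, b = weighted_line_fit(conc, signal, w)
    back_calculated = (signal[0] - a) / b  # concentration read off the curve
    deviation = 100.0 * (back_calculated - conc[0]) / conc[0]
    print(f"{label}: intercept={a:.4f}, slope={b:.3f}, "
          f"deviation at lowest level={deviation:+.1f}%")
```

    Down-weighting the high standards keeps them from dominating the fit, so the line is pinned more closely to the low-concentration points, where relative error matters most.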

  17. The regression-calibration method for fitting generalized linear models with additive measurement error

    OpenAIRE

    James W. Hardin; Henrik Schmeidiche; Raymond J. Carroll

    2003-01-01

    This paper discusses and illustrates the method of regression calibration. This is a straightforward technique for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). Discussion will include specified measurement error, measurement error estimated by replicate error-prone proxies, and measurement error estimated by instrumental variables. The discussion focuses on s...

  18. Time-Frequency Analysis of Non-Stationary Biological Signals with Sparse Linear Regression Based Fourier Linear Combiner

    Directory of Open Access Journals (Sweden)

    Yubo Wang

    2017-06-01

    It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomenon. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transform (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.

  19. Time-Frequency Analysis of Non-Stationary Biological Signals with Sparse Linear Regression Based Fourier Linear Combiner.

    Science.gov (United States)

    Wang, Yubo; Veluvolu, Kalyana C

    2017-06-14

    It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomenon. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transform (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.

  20. Quantitative Structure-Activity Relationships and Docking Studies of Calcitonin Gene-Related Peptide Antagonists

    DEFF Research Database (Denmark)

    Jenssen, Håvard; Mehrabian, Mohadeseh; Kyani, Anahita

    2012-01-01

    Defining the role of calcitonin gene-related peptide in migraine pathogenesis could lead to the application of calcitonin gene-related peptide antagonists as novel migraine therapeutics. In this work, quantitative structure-activity relationship modeling of biological activities of a large range...... of calcitonin gene-related peptide antagonists was performed using a panel of physicochemical descriptors. The computational studies evaluated different variable selection techniques and demonstrated shuffling stepwise multiple linear regression to be superior over genetic algorithm-multiple linear regression....... The linear quantitative structure-activity relationship model revealed better statistical parameters of cross-validation in comparison with the non-linear support vector regression technique. Implementing only five peptide descriptors into this linear quantitative structure-activity relationship model...

  1. Linear and non-linear dose-response functions reveal a hormetic relationship between stress and learning.

    Science.gov (United States)

    Zoladz, Phillip R; Diamond, David M

    2008-10-16

    Over a century of behavioral research has shown that stress can enhance or impair learning and memory. In the present review, we have explored the complex effects of stress on cognition and propose that they are characterized by linear and non-linear dose-response functions, which together reveal a hormetic relationship between stress and learning. We suggest that stress initially enhances hippocampal function, resulting from amygdala-induced excitation of hippocampal synaptic plasticity, as well as the excitatory effects of several neuromodulators, including corticosteroids, norepinephrine, corticotropin-releasing hormone, acetylcholine and dopamine. We propose that this rapid activation of the amygdala-hippocampus brain memory system results in a linear dose-response relation between emotional strength and memory formation. More prolonged stress, however, leads to an inhibition of hippocampal function, which can be attributed to compensatory cellular responses that protect hippocampal neurons from excitotoxicity. This inhibition of hippocampal functioning in response to prolonged stress is potentially relevant to the well-described curvilinear dose-response relationship between arousal and memory. Our emphasis on the temporal features of stress-brain interactions addresses how stress can activate, as well as impair, hippocampal functioning to produce a hormetic relationship between stress and learning.

  2. Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels.

    Science.gov (United States)

    Kaneko, Hiromasa

    2018-02-26

    To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.

  3. The review of the achieved degree of sustainable development in South Eastern Europe - The use of linear regression method

    Energy Technology Data Exchange (ETDEWEB)

    Golusin, Mirjana [Educons University, Vojvode Putnika st. bb, 21013 Sremska Kamenica (RS); Ivanovic, Olja Munitlak [Faculty of Business in Services, Vojvode Putnik st. bb, 21013 Sremska Kamenica (RS); Teodorovic, Natasa [Faculty of Entrepreneurial Management, Modene st. 5, 21000 Novi Sad (RS)

    2011-01-15

    The need for preservation and adequate management of the quality of the environment requires the development of new methods and techniques by which the achieved degree of sustainable development can be defined, as well as the laws governing the relationship among its subsystems. The main objective of the research is to point to a strong contradiction between the development of the ecological and economic subsystems. In order to improve on previous research, this study suggests the use of linear evaluation, by which it is possible to determine the exact degree of contradiction between these two subsystems and to define the regularities as well as the deviations. The authors present the essential steps that were used. Conducted by the method of linear regression, this research shows a significant negative correlation between ecological and economic subsystem indicators, and its value of R² = 0.58 confirms the expected contradiction between the two subsystems. By observing sustainable development as a two-dimensional system that includes ecological and economic indicators, the authors suggest a methodology for modelling the relationship between economic and ecological development as an orthogonal distance between the degree of the current state, measured by the relation between economic and ecological indicators of sustainable development, and the degree obtained in a traditional way. The method used in this research proved to be extremely suitable for modelling the relationship between the ecological and economic subsystems of sustainable development. This research was conducted on a repeated sample of countries of South East Europe, with the addition of data for France and Germany, the two countries at the highest level of development in the European Union. (author)

  4. Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander

    2017-01-01

    Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.

  5. Two Paradoxes in Linear Regression Analysis

    Science.gov (United States)

    FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

    2016-01-01

    Summary: Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214

  6. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

    Directory of Open Access Journals (Sweden)

    Christophe Coupé

    2018-04-01

    As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships

  7. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape.

    Science.gov (United States)

    Coupé, Christophe

    2018-01-01

    As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. 
Relying on GAMLSS, we

  8. How to deal with continuous and dichotomic outcomes in epidemiological research: linear and logistic regression analyses

    NARCIS (Netherlands)

    Tripepi, Giovanni; Jager, Kitty J.; Stel, Vianda S.; Dekker, Friedo W.; Zoccali, Carmine

    2011-01-01

    Because of some limitations of stratification methods, epidemiologists frequently use multiple linear and logistic regression analyses to address specific epidemiological questions. If the dependent variable is a continuous one (for example, systolic pressure and serum creatinine), the researcher

  9. BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER'S DISEASE.

    Science.gov (United States)

    Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G

    2015-12-01

    The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.

  10. Toward Customer-Centric Organizational Science: A Common Language Effect Size Indicator for Multiple Linear Regressions and Regressions With Higher-Order Terms.

    Science.gov (United States)

    Krasikova, Dina V; Le, Huy; Bachura, Eric

    2018-01-22

    To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  11. A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection.

    Science.gov (United States)

    Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C

    2011-09-01

    Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
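
    The bias described above can be reproduced in a small simulation. This is a sketch under stated assumptions and only demonstrates the inflated estimate; it does not implement the authors' correction. The effect size, allele frequency, tail fraction, and function names are all hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def simulate_extreme_selection(n=20000, beta=0.2, allele_freq=0.3,
                               tail=0.10, seed=1):
    """Return (slope in full sample, slope after two-tail selection on the trait)."""
    rng = random.Random(seed)
    genotype, trait = [], []
    for _ in range(n):
        # Additive genotype coded 0/1/2: two independent allele draws.
        g = int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        genotype.append(g)
        trait.append(beta * g + rng.gauss(0.0, 1.0))
    full_slope = ols_slope(genotype, trait)
    # Keep only individuals in the lowest and highest `tail` fraction of the trait.
    order = sorted(range(n), key=lambda i: trait[i])
    k = int(n * tail)
    chosen = order[:k] + order[-k:]
    selected_slope = ols_slope([genotype[i] for i in chosen],
                               [trait[i] for i in chosen])
    return full_slope, selected_slope

full_slope, selected_slope = simulate_extreme_selection()
print(f"true effect=0.2, full sample={full_slope:.3f}, "
      f"two-tail selected sample={selected_slope:.3f}")
```

    Because selection acts on the outcome, the trait variance in the selected sample is far larger than in the population, and the naive slope of trait on genotype overstates the genetic effect.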

  12. Multi-stratified multiple regression tests of the linear/no-threshold theory of radon-induced lung cancer

    International Nuclear Information System (INIS)

    Cohen, B.L.

    1992-01-01

    A plot of lung-cancer rates versus radon exposures in 965 US counties, or in all US states, has a strong negative slope, b, in sharp contrast to the strong positive slope predicted by linear/no-threshold theory. The discrepancy between these slopes exceeds 20 standard deviations (SD). Including smoking frequency in the analysis substantially improves fits to a linear relationship but has little effect on the discrepancy in b, because correlations between smoking frequency and radon levels are quite weak. Including 17 socioeconomic variables (SEV) in multiple regression analysis reduces the discrepancy to 15 SD. Data were divided into segments by stratifying on each SEV in turn, and on geography, and on both simultaneously, giving over 300 data sets to be analyzed individually, but negative slopes predominated. The slope is negative whether one considers only the most urban counties or only the most rural; only the richest or only the poorest; only the richest in the South Atlantic region or only the poorest in that region, etc.; and for all the strata in between. Since this is an ecological study, the well-known problems with ecological studies were investigated and found not to be applicable here. The "ecological fallacy" was shown not to apply in testing a linear/no-threshold theory, and the vulnerability to confounding is greatly reduced when confounding factors are only weakly correlated with radon levels, as is generally the case here. All confounding factors known to correlate with radon and with lung cancer were investigated quantitatively and found to have little effect on the discrepancy.

  13. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  14. Linear relationships between cherry tomato traits

    Directory of Open Access Journals (Sweden)

    Bruno Giacomini Sari

    The objective of this study was to identify the linear relationship between cherry tomato yield components. Two uniformity trials, without treatments, were conducted on Lilli cherry tomato plants in a plastic greenhouse during the 2014 spring/summer season, with the plants in two stems. Variables observed for each plant were mean fruit length, mean fruit width, mean fruit weight, number of bunches, number of fruits per bunch, total number of fruits, and total fruit weight; a Pearson's correlation matrix was used to estimate the relationship between the variables. Path analysis was then performed considering total fruit weight as the main variable and the remaining variables as explanatory. Due to the severe multicollinearity, the variable 'number of fruits per bunch' was eliminated. Pearson's correlation coefficients were significant between explanatory and main variables. Mean fruit weight has a low cause-and-effect relationship with the total weight of fruits produced. A low cause-and-effect relationship was also observed between number of fruits and number of bunches. Cherry tomato productivity is directly related to the number of fruits per plant.

  15. Hippocampal atrophy and developmental regression as first sign of linear scleroderma "en coup de sabre".

    Science.gov (United States)

    Verhelst, Helene E; Beele, Hilde; Joos, Rik; Vanneuville, Benedicte; Van Coster, Rudy N

    2008-11-01

    An 8-year-old girl with linear scleroderma "en coup de sabre" is reported who, at preschool age, presented with intractable simple partial seizures more than 1 year before skin lesions were first noticed. MRI revealed hippocampal atrophy, contralateral to the seizures and ipsilateral to the skin lesions. In the following months, a mental and motor regression was noticed. Cerebral CT scan showed multiple foci of calcifications in the affected hemisphere. In previously reported patients the skin lesions preceded the neurological signs. To the best of our knowledge, hippocampal atrophy has not previously been reported as a presenting symptom of linear scleroderma. Linear scleroderma should be included in the differential diagnosis in patients with unilateral hippocampal atrophy even when the typical skin lesions are not present.

  16. QSAR study on the histamine (H3) receptor antagonists using the genetic algorithm: Multi parameter linear regression

    Directory of Open Access Journals (Sweden)

    Adimi Maryam

    2012-01-01

    A quantitative structure-activity relationship (QSAR) model has been produced for predicting the antagonist potency of biphenyl derivatives at human histamine (H3) receptors. The molecular structures of the compounds are numerically represented by various kinds of molecular descriptors. The whole data set was divided into training and test sets. Genetic algorithm based multiple linear regression is used to select the most statistically effective descriptors. The final QSAR model (N = 24, R² = 0.916, F = 51.771, Q²LOO = 0.872, Q²LGO = 0.847, Q²BOOT = 0.857) was fully validated employing the leave-one-out (LOO) cross-validation approach, Fisher statistics (F), the Y-randomisation test, and predictions based on the test data set. The test set presented an external prediction power of R²test = 0.855. In conclusion, the QSAR model generated can be used as a valuable tool for designing similar groups of new antagonists of histamine (H3) receptors.

  17. Re-examining the risk–return relationship in Europe: Linear or non-linear trade-off?

    OpenAIRE

    Salvador, Enrique; Floros, Christos; Arago, Vicent

    2014-01-01

    This paper analyzes the risk–return trade-off in Europe using recent data from 11 European stock markets. After relaxing the linear assumptions in the risk–return relationship by introducing a new approach that considers the current state of the market, we obtain significant evidence for a positive risk–return trade-off for low volatility states. However, this finding is reduced or even non-significant during periods of high volatility. Maintaining the linear assumption over the risk–return t...

  18. A computer tool for a minimax criterion in binary response and heteroscedastic simple linear regression models.

    Science.gov (United States)

    Casero-Alonso, V; López-Fidalgo, J; Torsney, B

    2017-01-01

    Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  19. Two-Sample Tests for High-Dimensional Linear Regression with an Application to Detecting Interactions.

    Science.gov (United States)

    Xia, Yin; Cai, Tianxi; Cai, T Tony

    2018-01-01

    Motivated by applications in genomics, we consider in this paper global and multiple testing for the comparisons of two high-dimensional linear regression models. A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives. We then introduce a multiple testing procedure for identifying unequal coordinates while controlling the false discovery rate and false discovery proportion. Theoretical justifications are provided to guarantee the validity of the proposed tests and optimality results are established under sparsity assumptions on the regression coefficients. The proposed testing procedures are easy to implement. Numerical properties of the procedures are investigated through simulation and data analysis. The results show that the proposed tests maintain the desired error rates under the null and have good power under the alternative at moderate sample sizes. The procedures are applied to the Framingham Offspring study to investigate the interactions between smoking and cardiovascular related genetic mutations important for an inflammation marker.

  20. Soil organic carbon distribution in Mediterranean areas under a climate change scenario via multiple linear regression analysis.

    Science.gov (United States)

    Olaya-Abril, Alfonso; Parras-Alcántara, Luis; Lozano-García, Beatriz; Obregón-Romero, Rafael

    2017-08-15

    Interest in soil studies has increased over time because of the role of soil in carbon sequestration in terrestrial ecosystems, which could contribute to decreasing atmospheric CO₂ rates. In many studies, independent variables were related to soil organic carbon (SOC) individually; however, the degree to which each variable contributes to the experimentally determined SOC content was not considered. In this study, samples from 612 soil profiles were obtained in a protected natural area (Red Natura 2000) of Sierra Morena (Mediterranean area, southern Spain), considering only the topsoil (0-25 cm) for better comparison between results. Twenty-four independent variables were used to define their relationship with SOC content. Subsequently, using multiple linear regression analysis, the effects of these variables on the SOC correlation were considered. Finally, the best parameters determined with the regression analysis were used in a climate change scenario. The model indicated that SOC in a future climate change scenario depends on the average temperature of the coldest quarter (41.9%), the average temperature of the warmest quarter (34.5%), annual precipitation (22.2%) and annual average temperature (1.3%). When the current and future situations were compared, the SOC content in the study area was reduced by 35.4%, and a trend towards migration to higher latitude and altitude was observed. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. The Relationship between Residential Electricity Consumption and Income: A Piecewise Linear Model with Panel Data

    Directory of Open Access Journals (Sweden)

    Yanan Liu

    2016-10-01

    There are many uncertainties and risks in residential electricity consumption associated with economic development. Knowledge of the relationship between residential electricity consumption and its key determinant, income, is important to the sustainable development of the electric power industry. Using panel data from 30 provinces for the 1995–2012 period, this study investigates how residential electricity consumption changes as incomes increase in China. Previous studies typically used linear or quadratic double-logarithmic models imposing ex ante restrictions on the indistinct relationship between residential electricity consumption and income. Contrary to those models, we employed a reduced piecewise linear model that is self-adaptive and highly flexible and circumvents the problem of “prior restrictions”. Robust tests of different segment specifications and regression methods are performed to ensure the validity of the research. The results provide strong evidence that the income elasticity was approximately one, and it remained stable throughout the estimation period. The income threshold at which residential electricity consumption automatically remains stable or slows has not been reached. To ensure the sustainable development of the electric power industry, introducing higher energy efficiency standards for electrical appliances and improving income levels are vital. Government should also emphasize electricity conservation in the industrial sector rather than in residential sector.

  2. A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection

    OpenAIRE

    Kwan, Johnny S. H.; Kung, Annie W. C.; Sham, Pak C.

    2011-01-01

    Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias. © The Author(s) 2011.

  3. Method validation using weighted linear regression models for quantification of UV filters in water samples.

    Science.gov (United States)

    da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues

    2015-01-01

    This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L⁻¹. The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.
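
    The weighted calibration idea in the abstract above can be sketched in a few lines. This is only an illustration: the abstract does not say which weighting scheme was used, so the 1/x² weights, the function name `weighted_linear_fit` and the calibration data below are all assumptions chosen for the example.

```python
def weighted_linear_fit(x, y, w):
    """Return (intercept, slope) minimizing sum(w_i * (y_i - a - b*x_i)**2)."""
    sw = sum(w)
    # Weighted means of x and y.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares about the weighted means.
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical calibration data: concentration (ng/L) vs. instrument response.
conc = [10.0, 50.0, 100.0, 500.0, 1000.0]
resp = [21.0, 103.0, 198.0, 1010.0, 1985.0]
weights = [1.0 / c ** 2 for c in conc]  # assumed 1/x^2 weighting, not from the paper

a, b = weighted_linear_fit(conc, resp, weights)
print(f"intercept={a:.3f}, slope={b:.4f}")
```

    With 1/x² weights the low-concentration standards dominate the fit, which is the usual motivation when the response variance grows with concentration.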

  4. Linear and evolutionary polynomial regression models to forecast coastal dynamics: Comparison and reliability assessment

    Science.gov (United States)

    Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe

    2018-01-01

    In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small-scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical error analysis, which revealed a slightly better estimation by the polynomial model than by the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used to make some conjectures about how uncertainty increases as the extrapolation time of the estimation is extended. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located near Torre Colimena in the Apulia region, southern Italy.

  5. Forecasting on the total volumes of Malaysia's imports and exports by multiple linear regression

    Science.gov (United States)

    Beh, W. L.; Yong, M. K. Au

    2017-04-01

    This study provides insight into the importance of the macroeconomic variables that affect the total volumes of Malaysia's imports and exports by using multiple linear regression (MLR) analysis. The study uses quarterly data on the total volumes of Malaysia's imports and exports covering the period 2000-2015. The macroeconomic variables are limited to eleven: the exchange rate of the US Dollar against the Malaysian Ringgit (USD-MYR), the exchange rate of the Chinese Yuan against the Malaysian Ringgit (RMB-MYR), the exchange rate of the Euro against the Malaysian Ringgit (EUR-MYR), the exchange rate of the Singapore Dollar against the Malaysian Ringgit (SGD-MYR), crude oil prices, gold prices, producer price index (PPI), interest rate, consumer price index (CPI), industrial production index (IPI) and gross domestic product (GDP). This study applied the Johansen co-integration test to investigate the relationship among the total volumes of Malaysia's imports and exports. The results show that crude oil prices, RMB-MYR, EUR-MYR and IPI play important roles in the total volumes of Malaysia's imports. Meanwhile, crude oil prices, USD-MYR and GDP play important roles in the total volumes of Malaysia's exports.

  6. Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Ali Asghar Besalatpour

    2016-02-01

    by 20-m digital elevation model (DEM). The data set was divided into two subsets of training and testing. The training subset was randomly chosen from 70% of the total set of the data and the remaining samples (30% of the data) were used as the testing set. The correlation coefficient (r), mean square error (MSE), and error percentage (ERROR%) between the measured and the predicted GMD values were used to evaluate the performance of the models. Results and Discussion: The descriptive statistics showed that there was little variability in the sample distributions of the variables used in this study to develop the GMD prediction models, indicating that their values were all normally distributed. The constructed SVM model had better performance in predicting GMD compared to the traditional multiple linear regression model. The obtained MSE and r values for the developed SVM model for soil aggregate stability prediction were 0.005 and 0.86, respectively. The obtained ERROR% value for soil aggregate stability prediction using the SVM model was 10.7% while it was 15.7% for the regression model. The scatter plot figures also showed that the SVM model was more accurate in GMD estimation than the MLR model, since the predicted GMD values were closer in agreement with the measured values for most of the samples. The worse performance of the MLR model might be due to the larger amount of data that is required for developing a sustainable regression model compared to intelligent systems. Furthermore, only the linear effects of the predictors on the dependent variable can be extracted by linear models while in many cases the effects may not be linear in nature. Meanwhile, the SVM model is suitable for modelling nonlinear relationships and its major advantage is that the method can be developed without knowing the exact form of the analytical function on which the model should be built. All these indicate that the SVM approach would be a better choice for predicting soil aggregate

  7. Improving ASTER GDEM Accuracy Using Land Use-Based Linear Regression Methods: A Case Study of Lianyungang, East China

    Directory of Open Access Journals (Sweden)

    Xiaoyan Yang

    2018-04-01

    The Advanced Spaceborne Thermal-Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) is important to a wide range of geographical and environmental studies. Its accuracy, to some extent associated with land-use types reflecting topography, vegetation coverage, and human activities, impacts the results and conclusions of these studies. In order to improve the accuracy of ASTER GDEM prior to its application, we investigated ASTER GDEM errors based on individual land-use types and proposed two linear regression calibration methods, one considering only land use-specific errors and the other considering the impact of both land-use and topography. Our calibration methods were tested on the coastal prefectural city of Lianyungang in eastern China. Results indicate that (1) ASTER GDEM is highly accurate for rice, wheat, grass and mining lands but less accurate for scenic, garden, wood and bare lands; (2) despite improvements in ASTER GDEM2 accuracy, multiple linear regression calibration requires more data (topography) and a relatively complex calibration process; (3) simple linear regression calibration proves a practicable and simplified means to systematically investigate and improve the impact of land-use on ASTER GDEM accuracy. Our method is applicable to areas with detailed land-use data based on highly accurate field-based point-elevation measurements.

  8. Assessing risk factors for periodontitis using regression

    Science.gov (United States)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we can assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients will be obtained along with the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the corresponding 95% confidence intervals.

  9. Estimating integrated variance in the presence of microstructure noise using linear regression

    Science.gov (United States)

    Holý, Vladimír

    2017-07-01

    Using high-frequency financial data to estimate the integrated variance of asset prices is beneficial, but as the number of observations increases, so-called microstructure noise occurs. This noise can significantly bias the realized variance estimator. We propose a method for estimating the integrated variance that is robust to microstructure noise, as well as for testing for the presence of the noise. Our method utilizes linear regression in which realized variances estimated from different data subsamples act as the dependent variable while the number of observations acts as the explanatory variable. We compare the proposed estimator with other methods on simulated data for several microstructure noise structures.
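
    The regression described in this abstract can be illustrated with a small simulation. The sketch below assumes the standard i.i.d. noise setting, in which the expected realized variance computed from n returns is approximately IV + 2nω² (ω² being the noise variance), so that the intercept of a regression of realized variance on n estimates the integrated variance. The abstract does not give the paper's exact specification; the function names, subsampling steps and simulation parameters here are hypothetical.

```python
import random


def realized_variance(log_prices, step):
    """Return (RV, n): sum of squared increments sampled every `step` ticks, and their count."""
    sub = log_prices[::step]
    rv = sum((b - a) ** 2 for a, b in zip(sub, sub[1:]))
    return rv, len(sub) - 1


def estimate_iv(ns, rvs):
    """OLS of RV on n: the intercept estimates IV, slope / 2 estimates the noise variance."""
    m = len(ns)
    nbar = sum(ns) / m
    rbar = sum(rvs) / m
    sxy = sum((n - nbar) * (r - rbar) for n, r in zip(ns, rvs))
    sxx = sum((n - nbar) ** 2 for n in ns)
    slope = sxy / sxx
    return rbar - slope * nbar, slope / 2.0


# Simulated data (all parameter values are assumptions for illustration).
random.seed(0)
N, true_iv, noise_sd = 20000, 1e-4, 5e-5
sigma_step = (true_iv / N) ** 0.5
p, efficient = 0.0, [0.0]
for _ in range(N):
    p += random.gauss(0.0, sigma_step)  # efficient log-price as a random walk
    efficient.append(p)
observed = [x + random.gauss(0.0, noise_sd) for x in efficient]  # add i.i.d. noise

pairs = [realized_variance(observed, k) for k in (1, 2, 4, 5, 8, 10, 20)]
rvs = [rv for rv, _ in pairs]
ns = [n for _, n in pairs]
iv_hat, noise_var_hat = estimate_iv(ns, rvs)
print(iv_hat, noise_var_hat)
```

    A slope that is statistically indistinguishable from zero would suggest that no noise is present, which is presumably the basis of the test mentioned in the abstract.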

  10. Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method

    Science.gov (United States)

    Asavaskulkiet, Krissada

    2018-04-01

    In this paper, we propose a new face hallucination technique that reconstructs face images in HSV color space with a semi-orthogonal multilinear principal component analysis (SO-MPCA) method. This novel hallucination technique can operate directly on tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from the FERET database to test our hallucination approach, which is demonstrated by extensive experiments with high-quality hallucinated color faces. The experimental results clearly demonstrate that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.

  11. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    Science.gov (United States)

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
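
    For readers unfamiliar with the pool-adjacent-violators algorithm mentioned above, a minimal sketch follows. It shows only the plain weighted PAVA for a nondecreasing fit, not the empirical likelihood method of the paper; the function name `pava` and the sample data are illustrative assumptions.

```python
def pava(y, w=None):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand each pooled block back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(pava([1.0, 3.0, 2.0, 4.0, 3.5, 5.0]))
# -> [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
```

    Adjacent values that break the ordering are pooled into their weighted mean, which is how the monotonicity constraint is built into the estimate.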

  12. Genomic prediction based on data from three layer lines using non-linear regression models.

    Science.gov (United States)

    Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

    2014-11-06

    Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional

  13. The Non-Linear Relationship Between Fiscal Deficits And Inflation: Evidence From Africa

    Directory of Open Access Journals (Sweden)

    Abu Nurudeen

    2015-12-01

    Although there is abundant research on the fiscal deficit-inflation relationship, little has been done to investigate the non-linear association between them, particularly in Africa. This study employs fixed-effects and GMM estimators to examine the non-linear relationship between deficits and inflation from 1999 to 2011 in 51 African economies, which are further grouped into high-inflation/low-income countries and moderate-inflation/middle-income countries. The results indicate that the deficit-inflation relationship is non-linear for the whole sample and sub-groups. For the whole sample, a percentage point increase in deficit results in a 0.25 percentage point increase in inflation rate, while the relationship becomes quantitatively greater once deficits reach 23% of GDP. The subsamples report different relationships. Although our results cannot be used as the basis for generalization, we identify the importance of grouping African countries according to their levels of inflation and/or income, rather than treating them as a homogeneous entity.

  14. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    Science.gov (United States)

    Gusriani, N.; Firdaniza

    2018-03-01

    The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least squares method is nevertheless applied to such data, it will produce a model that cannot represent most of the data. For that reason, a regression method that is robust against outliers is needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.

  15. U.S. Army Armament Research, Development and Engineering Center Grain Evaluation Software to Numerically Predict Linear Burn Regression for Solid Propellant Grain Geometries

    Science.gov (United States)

    2017-10-01

    ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...distribution is unlimited. AD U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER Munitions Engineering Technology Center Picatinny...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID

  16. Isotherms and thermodynamics by linear and non-linear regression analysis for the sorption of methylene blue onto activated carbon: Comparison of various error functions

    International Nuclear Information System (INIS)

    Kumar, K. Vasanth; Porkodi, K.; Rocha, F.

    2008-01-01

    Linear and non-linear regression methods for selecting the optimum isotherm were compared using experimental equilibrium data for methylene blue sorption by activated carbon. The r² was used to select the best-fit linear theoretical isotherm. In the case of the non-linear regression method, six error functions, namely coefficient of determination (r²), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS), were used to predict the parameters involved in the two- and three-parameter isotherms and also to predict the optimum isotherm. For the two-parameter isotherms, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of the three-parameter isotherms, r² was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor in choosing the optimum isotherm. In addition to the size of the error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K², was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm
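
    The error functions named in this abstract are not defined in it. The sketch below uses the forms commonly quoted in the sorption literature, which is an assumption: the paper's own definitions may differ in detail. The function name `error_functions`, the Langmuir parameters and the data are hypothetical.

```python
import math


def error_functions(q_exp, q_calc, n_params):
    """Common isotherm error functions (assumed textbook forms, not taken from the paper)."""
    n = len(q_exp)
    dof = n - n_params  # degrees of freedom: data points minus fitted parameters
    resid = [e - c for e, c in zip(q_exp, q_calc)]
    rel = [r / e for r, e in zip(resid, q_exp)]
    return {
        "ERRSQ": sum(r ** 2 for r in resid),
        "EABS": sum(abs(r) for r in resid),
        "ARE": 100.0 / n * sum(abs(v) for v in rel),
        "HYBRID": 100.0 / dof * sum(r ** 2 / e for r, e in zip(resid, q_exp)),
        "MPSD": 100.0 * math.sqrt(sum(v ** 2 for v in rel) / dof),
    }


# Hypothetical equilibrium data and a two-parameter Langmuir prediction.
c_eq = [5.0, 10.0, 20.0, 40.0, 80.0]          # equilibrium concentration (mg/L)
q_obs = [48.0, 82.0, 130.0, 175.0, 215.0]     # measured uptake (mg/g)
q_max, k_l = 280.0, 0.04                      # assumed Langmuir parameters
q_fit = [q_max * k_l * c / (1.0 + k_l * c) for c in c_eq]

for name, value in error_functions(q_obs, q_fit, n_params=2).items():
    print(f"{name}: {value:.3f}")
```

    In a non-linear fit, each of these quantities would serve in turn as the objective minimized over the isotherm parameters, which is why different error functions can select different optimum isotherms.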

  17. A Matlab program for stepwise regression

    Directory of Open Access Journals (Sweden)

    Yanhong Qi

    2016-03-01

    Stepwise linear regression is a multi-variable regression method for identifying statistically significant variables in the linear regression equation. In the present study, we present a Matlab program for stepwise regression.

  18. Aspects of robust linear regression

    NARCIS (Netherlands)

    Davies, P.L.

    1993-01-01

    Section 1 of the paper contains a general discussion of robustness. In Section 2 the influence function of the Hampel-Rousseeuw least median of squares estimator is derived. Linearly invariant weak metrics are constructed in Section 3. It is shown in Section 4 that $S$-estimators satisfy an exact

  19. Uncovering state-dependent relationships in shallow lakes using Bayesian latent variable regression.

    Science.gov (United States)

    Vitense, Kelsey; Hanson, Mark A; Herwig, Brian R; Zimmer, Kyle D; Fieberg, John

    2018-03-01

    Ecosystems sometimes undergo dramatic shifts between contrasting regimes. Shallow lakes, for instance, can transition between two alternative stable states: a clear state dominated by submerged aquatic vegetation and a turbid state dominated by phytoplankton. Theoretical models suggest that critical nutrient thresholds differentiate three lake types: highly resilient clear lakes, lakes that may switch between clear and turbid states following perturbations, and highly resilient turbid lakes. For effective and efficient management of shallow lakes and other systems, managers need tools to identify critical thresholds and state-dependent relationships between driving variables and key system features. Using shallow lakes as a model system for which alternative stable states have been demonstrated, we developed an integrated framework using Bayesian latent variable regression (BLR) to classify lake states, identify critical total phosphorus (TP) thresholds, and estimate steady state relationships between TP and chlorophyll a (chl a) using cross-sectional data. We evaluated the method using data simulated from a stochastic differential equation model and compared its performance to k-means clustering with regression (KMR). We also applied the framework to data comprising 130 shallow lakes. For simulated data sets, BLR had high state classification rates (median/mean accuracy >97%) and accurately estimated TP thresholds and state-dependent TP-chl a relationships. Classification and estimation improved with increasing sample size and decreasing noise levels. Compared to KMR, BLR had higher classification rates and better approximated the TP-chl a steady state relationships and TP thresholds. We fit the BLR model to three different years of empirical shallow lake data, and managers can use the estimated bifurcation diagrams to prioritize lakes for management according to their proximity to thresholds and chance of successful rehabilitation. Our model improves upon

  20. Internal correction of spectral interferences and mass bias for selenium metabolism studies using enriched stable isotopes in combination with multiple linear regression.

    Science.gov (United States)

    Lunøe, Kristoffer; Martínez-Sierra, Justo Giner; Gammelgaard, Bente; Alonso, J Ignacio García

    2012-03-01

    The analytical methodology for the in vivo study of selenium metabolism using two enriched selenium isotopes has been modified, allowing for the internal correction of spectral interferences and mass bias both for total selenium and speciation analysis. The method is based on the combination of an already described dual-isotope procedure with a new data treatment strategy based on multiple linear regression. A metabolic enriched isotope (⁷⁷Se) is given orally to the test subject and a second isotope (⁷⁴Se) is employed for quantification. In our approach, all possible polyatomic interferences occurring in the measurement of the isotope composition of selenium by collision cell quadrupole ICP-MS are taken into account and their relative contribution calculated by multiple linear regression after minimisation of the residuals. As a result, all spectral interferences and mass bias are corrected internally allowing the fast and independent quantification of natural abundance selenium (ⁿᵃᵗSe) and enriched ⁷⁷Se. In this sense, the calculation of the tracer/tracee ratio in each sample is straightforward. The method has been applied to study the time-related tissue incorporation of ⁷⁷Se in male Wistar rats while maintaining the ⁿᵃᵗSe steady-state conditions. Additionally, metabolically relevant information such as selenoprotein synthesis and selenium elimination in urine could be studied using the proposed methodology. In this case, serum proteins were separated by affinity chromatography while reverse phase was employed for urine metabolites. In both cases, ⁷⁴Se was used as a post-column isotope dilution spike. The application of multiple linear regression to the whole chromatogram allowed us to calculate the contribution of bromine hydride, selenium hydride, argon polyatomics and mass bias on the observed selenium isotope patterns. By minimising the square sum of residuals for the whole chromatogram, internal correction of spectral interferences and mass

  1. Characteristics and Properties of a Simple Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Kowal Robert

    2016-12-01

    A simple linear regression model is one of the pillars of classic econometrics. Despite the passage of time, it continues to raise interest both from the theoretical side as well as from the application side. One of the many fundamental questions in the model concerns determining derivative characteristics and studying the properties existing in their scope, referring to the first of these aspects. The literature of the subject provides several classic solutions in that regard. In the paper, a completely new design is proposed, based on the direct application of variance and its properties, resulting from the non-correlation of certain estimators with the mean, within the scope of which some fundamental dependencies of the model characteristics are obtained in a much more compact manner. The apparatus allows for a simple and uniform demonstration of multiple dependencies and fundamental properties in the model, and it does it in an intuitive manner. The results were obtained in a classic, traditional area, where everything, as it might seem, has already been thoroughly studied and discovered.

  2. Exhaustive Search for Sparse Variable Selection in Linear Regression

    Science.gov (United States)

    Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato

    2018-04-01

    We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
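
    The exhaustive part of the ES-K idea can be sketched as a loop over all K-column subsets. This is a simplified illustration under stated assumptions: it scores each subset by its residual sum of squares from an ordinary least squares fit without intercept, whereas the abstract does not say which criterion the authors use, and the function name `exhaustive_search` and the synthetic data are hypothetical. The approximate AES-K variant (replica-exchange Monte Carlo) is not shown.

```python
from itertools import combinations

import numpy as np


def exhaustive_search(X, y, k):
    """Fit OLS on every k-column subset of X; return (RSS, subset) pairs sorted by RSS."""
    results = []
    for subset in combinations(range(X.shape[1]), k):
        A = X[:, list(subset)]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ coef) ** 2))
        results.append((rss, subset))
    results.sort(key=lambda t: t[0])
    return results


# Synthetic data in which only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.1 * rng.normal(size=50)

ranked = exhaustive_search(X, y, k=2)
best_rss, best_subset = ranked[0]
print(best_subset, best_rss)
```

    A histogram of the scores in `ranked` gives a rough picture of the density of states mentioned in the abstract. The number of subsets grows combinatorially with the number of variables, which is the motivation for the approximate method.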

  3. Soil moisture estimation using multi linear regression with terraSAR-X data

    Directory of Open Access Journals (Sweden)

    G. García

    2016-06-01

    Full Text Available The first five centimeters of soil form an interface where the main heat flux exchanges between the land surface and the atmosphere occur. Besides ground measurements, remote sensing has proven to be an excellent tool for monitoring spatially and temporally distributed data on the most relevant Earth surface parameters, including soil parameters. Indeed, active microwave sensors (Synthetic Aperture Radar, SAR) offer the opportunity to monitor soil moisture (HS) at global, regional and local scales by monitoring the processes involved. Several inversion algorithms that derive geophysical information such as HS from SAR data have been developed. Many of them use electromagnetic models for simulating the backscattering coefficient and are based on statistical techniques, such as neural networks, inversion methods and regression models. Recent studies have shown that simple multiple regression techniques yield satisfactory results. The geophysical variables involved in these methodologies describe the soil structure, microwave characteristics and land use. Therefore, in this paper we aim to develop a multiple linear regression model to estimate HS on flat agricultural regions using TerraSAR-X satellite data and data from a ground weather station. The results show that the backscatter, the precipitation and the relative humidity are the explanatory variables of HS. The results showed an RMSE of 5.4 and an R² of about 0.6.

  4. Linear and Non-Linear Dose-Response Functions Reveal a Hormetic Relationship Between Stress and Learning

    OpenAIRE

    Zoladz, Phillip R.; Diamond, David M.

    2008-01-01

    Over a century of behavioral research has shown that stress can enhance or impair learning and memory. In the present review, we have explored the complex effects of stress on cognition and propose that they are characterized by linear and non-linear dose-response functions, which together reveal a hormetic relationship between stress and learning. We suggest that stress initially enhances hippocampal function, resulting from amygdala-induced excitation of hippocampal synaptic plasticity, as ...

  5. Linear Regression Based Real-Time Filtering

    Directory of Open Access Journals (Sweden)

    Misel Batmend

    2013-01-01

    Full Text Available This paper introduces a real-time filtering method based on a line fitted by linear least squares. The method can be used when the signal being filtered is linear, a constraint that narrows the range of potential applications. Its advantage over the Kalman filter is that it is computationally less expensive. The paper further deals with the application of the method to filtering data used to evaluate the position of engraved material with respect to an engraving machine. The filter was implemented in the control system of a CNC engraving machine. Experiments showing its performance are included.
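
    The idea of filtering with a least-squares line can be sketched as follows. This is a minimal illustration and not the authors' implementation: the sliding window, the class name `LineFitFilter` and the choice to report the fitted value at the newest sample are all assumptions made for the example.

```python
from collections import deque


class LineFitFilter:
    """Sliding-window filter that fits y = a + b*t by least squares.

    Hypothetical helper for illustration; the window length is an assumption.
    """

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must hold at least two samples")
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def update(self, t, y):
        """Add sample (t, y) and return the fitted line evaluated at time t."""
        self.samples.append((t, y))
        n = len(self.samples)
        if n < 2:
            return y  # a single point cannot define a line
        mean_t = sum(ti for ti, _ in self.samples) / n
        mean_y = sum(yi for _, yi in self.samples) / n
        s_tt = sum((ti - mean_t) ** 2 for ti, _ in self.samples)
        s_ty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in self.samples)
        if s_tt == 0:
            return mean_y  # identical timestamps: fall back to the mean
        slope = s_ty / s_tt
        intercept = mean_y - slope * mean_t
        return intercept + slope * t


if __name__ == "__main__":
    noisy = [0.1, 0.9, 2.2, 2.9, 4.1, 4.8, 6.2]  # roughly y = t plus noise
    filt = LineFitFilter(window=4)
    for t, y in enumerate(noisy):
        print(t, round(filt.update(t, y), 3))
```

    Each update costs O(window) arithmetic operations and needs no matrix algebra, which is in line with the low computational cost the abstract emphasises.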

  6. A unified framework for testing in the linear regression model under unknown order of fractional integration

    DEFF Research Database (Denmark)

    Christensen, Bent Jesper; Kruse, Robinson; Sibbertsen, Philipp

    We consider hypothesis testing in a general linear time series regression framework when the possibly fractional order of integration of the error term is unknown. We show that the approach suggested by Vogelsang (1998a) for the case of integer integration does not apply to the case of fractional...

  7. Use of multiple linear regression and logistic regression models to investigate changes in birthweight for term singleton infants in Scotland.

    Science.gov (United States)

    Bonellie, Sandra R

    2012-10-01

    To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there is still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 standard deviations lower (95% confidence interval 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight differences. It also explains much of the observed birthweight differences by the age of the mother. Identifying mothers at greater risk of having a low birthweight baby has important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.

  8. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  9. A Simple Linear Regression Method for Quantitative Trait Loci Linkage Analysis With Censored Observations

    OpenAIRE

    Anderson, Carl A.; McRae, Allan F.; Visscher, Peter M.

    2006-01-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using...

  10. Bisphenol-A exposures and behavioural aberrations: median and linear spline and meta-regression analyses of 12 toxicity studies in rodents.

    Science.gov (United States)

    Peluso, Marco E M; Munnia, Armelle; Ceppi, Marcello

    2014-11-05

    Exposures to bisphenol-A, a weak estrogenic chemical, largely used for the production of plastic containers, can affect the rodent behaviour. Thus, we examined the relationships between bisphenol-A and the anxiety-like behaviour, spatial skills, and aggressiveness, in 12 toxicity studies of rodent offspring from females orally exposed to bisphenol-A, while pregnant and/or lactating, by median and linear splines analyses. Subsequently, the meta-regression analysis was applied to quantify the behavioural changes. U-shaped, inverted U-shaped and J-shaped dose-response curves were found to describe the relationships between bisphenol-A with the behavioural outcomes. The occurrence of anxiogenic-like effects and spatial skill changes displayed U-shaped and inverted U-shaped curves, respectively, providing examples of effects that are observed at low-doses. Conversely, a J-dose-response relationship was observed for aggressiveness. When the proportion of rodents expressing certain traits or the time that they employed to manifest an attitude was analysed, the meta-regression indicated that a borderline significant increment of anxiogenic-like effects was present at low-doses regardless of sexes (β)=-0.8%, 95% C.I. -1.7/0.1, P=0.076, at ≤120 μg bisphenol-A. Whereas, only bisphenol-A-males exhibited a significant inhibition of spatial skills (β)=0.7%, 95% C.I. 0.2/1.2, P=0.004, at ≤100 μg/day. A significant increment of aggressiveness was observed in both the sexes (β)=67.9,C.I. 3.4, 172.5, P=0.038, at >4.0 μg. Then, bisphenol-A treatments significantly abrogated spatial learning and ability in males (Pbisphenol-A, e.g. ≤120 μg/day, were associated to behavioural aberrations in offspring. Copyright © 2014. Published by Elsevier Ireland Ltd.

  11. (Non) linear regression modelling

    NARCIS (Netherlands)

    Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.

    2012-01-01

    We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or

  12. Estimation of error components in a multi-error linear regression model, with an application to track fitting

    International Nuclear Information System (INIS)

    Fruehwirth, R.

    1993-01-01

    We present an estimation procedure of the error components in a linear regression model with multiple independent stochastic error contributions. After solving the general problem we apply the results to the estimation of the actual trajectory in track fitting with multiple scattering. (orig.)

  13. Linear and non-linear quantitative structure-activity relationship models on indole substitution patterns as inhibitors of HIV-1 attachment.

    Science.gov (United States)

    Nirouei, Mahyar; Ghasemi, Ghasem; Abdolmaleki, Parviz; Tavakoli, Abdolreza; Shariati, Shahab

    2012-06-01

    The antiviral drugs that inhibit human immunodeficiency virus (HIV) entry to the target cells are already in different phases of clinical trials. They prevent viral entry and have a highly specific mechanism of action with a low toxicity profile. Few QSAR studies have been performed on this group of inhibitors. This study was performed to develop a quantitative structure-activity relationship (QSAR) model of the biological activity of indole glyoxamide derivatives as inhibitors of the interaction between HIV glycoprotein gp120 and host cell CD4 receptors. Forty different indole glyoxamide derivatives were selected as a sample set and geometrically optimized using Gaussian 98W. Different combinations of multiple linear regression (MLR), genetic algorithms (GA) and artificial neural networks (ANN) were then utilized to construct the QSAR models. These models were also utilized to select the most efficient subsets of descriptors in a cross-validation procedure for non-linear log (1/EC50) prediction. The results that were obtained using GA-ANN were compared with MLR-MLR and MLR-ANN models. A high predictive ability was observed for the MLR, MLR-ANN and GA-ANN models, with root mean sum square errors (RMSE) of 0.99, 0.91 and 0.67, respectively (N = 40). In summary, machine learning methods were highly effective in designing QSAR models when compared to statistical method.

  14. Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

    International Nuclear Information System (INIS)

    Lin Chao; Chen Yingqiang; Zhang Qingwen; Tan Fuwen; Peng Guanghui

    1991-01-01

    A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients

  15. QSAR Modeling of COX -2 Inhibitory Activity of Some Dihydropyridine and Hydroquinoline Derivatives Using Multiple Linear Regression (MLR) Method.

    Science.gov (United States)

    Akbari, Somaye; Zebardast, Tannaz; Zarghi, Afshin; Hajimahdi, Zahra

    2017-01-01

    COX-2 inhibitory activities of some 1,4-dihydropyridine and 5-oxo-1,4,5,6,7,8-hexahydroquinoline derivatives were modeled by quantitative structure-activity relationship (QSAR) using the stepwise multiple linear regression (SW-MLR) method. The built model was robust and predictive, with correlation coefficients (R²) of 0.972 and 0.531 for the training and test groups, respectively. The quality of the model was evaluated by leave-one-out (LOO) cross validation (LOO correlation coefficient (Q²) of 0.943) and Y-randomization. We also employed a leverage approach to define the applicability domain of the model. Based on the QSAR model results, the COX-2 inhibitory activity of the selected data set correlated with the BEHm6 (highest eigenvalue n. 6 of Burden matrix/weighted by atomic masses), Mor03u (signal 03/unweighted) and IVDE (mean information content on the vertex degree equality) descriptors, which were derived from their structures.

  16. Logarithmic Transformations in Regression: Do You Transform Back Correctly?

    Science.gov (United States)

    Dambolena, Ismael G.; Eriksen, Steven E.; Kopcso, David P.

    2009-01-01

    The logarithmic transformation is often used in regression analysis for a variety of purposes such as the linearization of a nonlinear relationship between two or more variables. We have noticed that when this transformation is applied to the response variable, the computation of the point estimate of the conditional mean of the original response…
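
    The issue raised in this abstract can be illustrated with a short sketch. The abstract is truncated, so the correction used here is an assumption and not necessarily the authors' recommendation: it is the standard lognormal result that, when the errors of the log-scale model are normal with variance σ², the conditional mean of the original response is exp(a + b·x + σ²/2), whereas exp(a + b·x) estimates the conditional median. The function names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def back_transformed_mean(x, y, x_new):
    """Fit log(y) on x and return (naive, corrected) predictions at x_new.

    naive     = exp(a + b*x_new)            (estimates the median of y)
    corrected = naive * exp(s2 / 2)         (estimates the mean of y,
                                             assuming normal log-scale errors)
    """
    log_y = [math.log(v) for v in y]
    a, b = fit_line(x, log_y)
    residuals = [ly - (a + b * xi) for xi, ly in zip(x, log_y)]
    s2 = sum(r * r for r in residuals) / (len(x) - 2)  # residual variance
    naive = math.exp(a + b * x_new)
    return naive, naive * math.exp(s2 / 2)


def simulate_lognormal_data(n, a, b, sigma, seed=0):
    """Synthetic data with log(y) = a + b*x + N(0, sigma^2) noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 4) for _ in range(n)]
    y = [math.exp(a + b * xi + rng.gauss(0, sigma)) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_lognormal_data(2000, a=1.0, b=0.5, sigma=0.8)
    print(back_transformed_mean(x, y, x_new=2.0))
```

    On simulated data of this kind the naive back-transform sits below the true conditional mean, and the gap grows with the residual variance on the log scale.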

  17. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  18. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    Science.gov (United States)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    A credit card is a convenient alternative to cash or cheque, and it is an essential component of electronic and internet commerce. In this study, the researchers attempt to determine the relationship between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt, and to identify which variables are significant. The data cover information on 850 customers. Three methods were applied to the credit card debt data: multiple linear regression (MLR) models, MLR models with the least quartile difference (LQD) method and MLR models with the mean absolute deviation method. A comparison of the three methods found that the MLR model with the LQD method was the best model, with the lowest mean square error (MSE). The final model shows that years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit card debt, whereas age, level of education and other debt are negatively associated with it. This study may serve as a reference for banks on the use of robust methods, so that they can better understand their options and choose the one best aligned with their goals for inference regarding credit card debt.

  19. A hybrid genetic algorithm and linear regression for prediction of NOx emission in power generation plant

    International Nuclear Information System (INIS)

    Bunyamin, Muhammad Afif; Yap, Keem Siah; Aziz, Nur Liyana Afiqah Abdul; Tiong, Sheih Kiong; Wong, Shen Yuong; Kamal, Md Fauzan

    2013-01-01

    This paper presents a new approach to gas emission estimation in power generation plants using a hybrid of Genetic Algorithm (GA) and Linear Regression (LR), denoted GA-LR. LR is one of the approaches that model the relationship between an output dependent variable, y, and one or more explanatory variables or inputs, denoted x. It is able to estimate unknown model parameters from input data. GA, on the other hand, searches for the optimal solution until a specified termination criterion is met. It provides a set of good solutions, rather than a single optimal solution, for complex problems. Thus, GA is widely used for feature selection. By combining LR and GA (GA-LR), the new technique is able to select the most important input features and to give more accurate predictions by minimizing the prediction errors. The technique produces more consistent gas emission estimates, which may help in reducing pollution of the environment. In this paper, the study focuses on the prediction of nitrous oxides (NOx). The results of the experiment are encouraging.

  20. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

    Science.gov (United States)

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-02-01

    A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
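
    The two comparison tools named in this abstract can be sketched in a few lines. This is a generic textbook formulation, not the authors' code: the function names are hypothetical, the 1.96 multiplier assumes approximately normal differences, and `delta` (the ratio of the error variance in y to the error variance in x) defaults to 1 as an assumption.

```python
import math
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement for paired values."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)   # constant error between the two methods
    sd = statistics.stdev(diffs)    # spread of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def deming(x, y, delta=1.0):
    """Deming regression, allowing measurement error in both x and y.

    Returns (intercept, slope). An intercept away from 0 suggests constant
    error; a slope away from 1 suggests proportional error.
    """
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    d = s_yy - delta * s_xx
    slope = (d + math.sqrt(d * d + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    reference = [5.0, 10.0, 20.0, 40.0, 80.0]   # made-up assay values
    candidate = [5.6, 10.9, 21.5, 42.8, 84.9]
    print(bland_altman(candidate, reference))
    print(deming(reference, candidate))
```

    Unlike R², these summaries separate a constant offset (bias, intercept) from a proportional one (slope), which is the distinction the abstract draws.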

  1. Threat Appeals: The Fear-Persuasion Relationship is Linear and Curvilinear.

    Science.gov (United States)

    Dillard, James Price; Li, Ruobing; Huang, Yan

    2017-11-01

    Drive theory may be seen as the first scientific theory of health and risk communication. However, its prediction of a curvilinear association between fear and persuasion is generally held to be incorrect. A close rereading of Hovland et al. reveals that within- and between-persons processes were conflated. Using a message that advocated obtaining a screening for colonoscopy, this study (N = 259) tested both forms of the inverted-U hypothesis. In the between-persons data, analyses revealed a linear effect that was consistent with earlier investigations. However, the data showed an inverted-U relationship in within-persons data. Hence, the relationship between fear and persuasion is linear or curvilinear depending on the level of analysis.

  2. Linear Equating for the NEAT Design: Parameter Substitution Models and Chained Linear Relationship Models

    Science.gov (United States)

    Kane, Michael T.; Mroch, Andrew A.; Suh, Youngsuk; Ripkey, Douglas R.

    2009-01-01

    This paper analyzes five linear equating models for the "nonequivalent groups with anchor test" (NEAT) design with internal anchors (i.e., the anchor test is part of the full test). The analysis employs a two-dimensional framework. The first dimension contrasts two general approaches to developing the equating relationship. Under a "parameter…

  3. On linear relationship between shock velocity and particle velocity

    International Nuclear Information System (INIS)

    Dandache, H.

    1986-11-01

    We attempt to derive the linear relationship between shock velocity U_s and particle velocity U_p from thermodynamic considerations, taking into account an ideal gas equation of state and a Mie-Grueneisen equation of state for solids. 23 refs.

  4. Construction of multiple linear regression models using blood biomarkers for selecting against abdominal fat traits in broilers.

    Science.gov (United States)

    Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H

    2018-01-01

    Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content, and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R² (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h² ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.

  5. Generating linear regression model to predict motor functions by use of laser range finder during TUG.

    Science.gov (United States)

    Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki

    2017-05-01

    The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, variability of the aforementioned parameters, gait velocity, cadence, reaction time from starting signal to first step, and minimum distance between the foot and a marker placed to 3 in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P motor function test. Moreover, the TUG test time regarded as the lower extremity function and mobility has strong predictive ability in each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.

  6. A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.

    Science.gov (United States)

    Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S

    2017-06-01

    The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET_d) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET_d-based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were

  7. Entrepreneurship Education: Non-Linearity in the Satisfaction – Continuation Relationship = Podjetniško izobraževanje: nelineranost v razmerju med zadovoljstvom in nadaljevanjem izobraževanja

    Directory of Open Access Journals (Sweden)

    Boštjan Antoncic

    2007-06-01

    Full Text Available In this paper we propose one possible explanation of the interrelationships between education continuation or avoidance, satisfaction level, and experience (entrepreneurial maturity of potential and practicing entrepreneurs. By using the cusp catastrophe model we propose that relationship between education satisfaction and continuation tends to be linear for less experienced entrepreneurs (pre-entrepreneurs, whereas for more experienced entrepreneurs the relationship is proposed to be positive but non-linear (s-shaped. Data were collected with a structured questionnaire from 122 participants in management and entrepreneurship education and training programs. The proposed model was tested with linear and non-linear regression equations. The relationship between satisfaction and continuation (loyalty was found to be positive for all entrepreneurial and nonentrepreneurial groups. The appropriate functional form for the satisfaction-continuation relationship discovered for non-entrepreneurs and people that are only thinking about entrepreneurship (maybe-entrepreneurs is close to linear and less steep than for more entrepreneurial groups. By contrast, prospective entrepreneurs (people in the process of pre-start up and practicing entrepreneurs tend to be more sensitive to their education satisfaction in their future education continuation decisions. The appropriate functional form for these entrepreneurial groups tends to be cubical, which is close to the s-shaped function proposed in the cusp model. The study provided evidence that the relationships between entrepreneurial maturity, education satisfaction and education continuation may be modeled as a cusp catastrophe model. The proposed model can be helpful for education and for training providers (and marketers in explaining and predicting of education loyalty or the switching behavior of entrepreneurs.

  8. Effects of measurement errors on psychometric measurements in ergonomics studies: Implications for correlations, ANOVA, linear regression, factor analysis, and linear discriminant analysis.

    Science.gov (United States)

    Liu, Yan; Salvendy, Gavriel

    2009-05-01

    This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis and depreciate the significance of discriminant function and discrimination abilities of individual variables in discrimination analysis. The discussions will be restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experiment results, which the authors believe is very critical to research progress in theory development and cumulative knowledge in the ergonomics field.
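
    The attenuation of correlations by random measurement error can be made concrete with a small sketch. The formula used is the classical test theory relation (observed r = true r × √(reliability of x × reliability of y)), which is an assumption brought in for illustration and is not quoted from the abstract; the function names and the simulation settings are hypothetical.

```python
import math
import random


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation under classical test theory."""
    return true_r * math.sqrt(reliability_x * reliability_y)


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def simulate_observed_correlation(true_r, reliability_x, reliability_y, n, seed=0):
    """Simulate true scores with correlation true_r, add random error, and
    return the correlation between the error-laden observed scores."""
    rng = random.Random(seed)
    # With unit-variance true scores, reliability = 1 / (1 + error variance).
    sd_ex = math.sqrt((1 - reliability_x) / reliability_x)
    sd_ey = math.sqrt((1 - reliability_y) / reliability_y)
    xs, ys = [], []
    for _ in range(n):
        true_x = rng.gauss(0, 1)
        true_y = true_r * true_x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        xs.append(true_x + rng.gauss(0, sd_ex))
        ys.append(true_y + rng.gauss(0, sd_ey))
    return pearson(xs, ys)


if __name__ == "__main__":
    print(attenuated_correlation(0.6, 0.8, 0.8))
    print(simulate_observed_correlation(0.6, 0.8, 0.8, n=20000))
```

    With both scales at a reliability of 0.8, a true correlation of 0.6 is expected to appear as roughly 0.48, and the simulated value should land close to that figure.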

  9. Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.

    Science.gov (United States)

    Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H

    2006-01-01

    Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
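
    The Sobel test mentioned in this abstract can be sketched in a few lines. The path estimates below are hypothetical placeholders, not values from the study; `a` stands for the effect of the predictor on the mediator and `b` for the effect of the mediator on the outcome with the predictor held constant.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return the Sobel z statistic and a two-sided normal p-value for the indirect effect a*b."""
    standard_error = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = (a * b) / standard_error
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical path estimates and standard errors (assumed for illustration only).
z_stat, p_val = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")
```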

  10. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

    KAUST Repository

    Rubio, Francisco J.

    2016-02-09

    We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.

  11. Reduction of interferences in graphite furnace atomic absorption spectrometry by multiple linear regression modelling

    Science.gov (United States)

    Grotti, Marco; Abelmoschi, Maria Luisa; Soggia, Francesco; Tiberiade, Christian; Frache, Roberto

    2000-12-01

    The multivariate effects of Na, K, Mg and Ca as nitrates on the electrothermal atomisation of manganese, cadmium and iron were studied by multiple linear regression modelling. Since the models proved to efficiently predict the effects of the considered matrix elements in a wide range of concentrations, they were applied to correct the interferences occurring in the determination of trace elements in seawater after pre-concentration of the analytes. In order to obtain a statistically significant number of samples, a large volume of the certified seawater reference materials CASS-3 and NASS-3 was treated with Chelex-100 resin; then, the chelating resin was separated from the solution, divided into several sub-samples, each of them was eluted with nitric acid and analysed by electrothermal atomic absorption spectrometry (for trace element determinations) and inductively coupled plasma optical emission spectrometry (for matrix element determinations). To minimise any other systematic error besides that due to matrix effects, accuracy of the pre-concentration step and contamination levels of the procedure were checked by inductively coupled plasma mass spectrometric measurements. Analytical results obtained by applying the multiple linear regression models were compared with those obtained with other calibration methods, such as external calibration using acid-based standards, external calibration using matrix-matched standards and the analyte addition technique. Empirical models proved to efficiently reduce interferences occurring in the analysis of real samples, allowing an improvement of accuracy better than for other calibration methods.

  12. Synthesis of linear regression coefficients by recovering the within-study covariance matrix from summary statistics.

    Science.gov (United States)

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-06-01

    Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because the covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. In particular, we also propose an approach to recover (at most) three correlations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Comparison of multiple linear regression and artificial neural network in developing the objective functions of the orthopaedic screws.

    Science.gov (United States)

    Hsu, Ching-Chi; Lin, Jinn; Chao, Ching-Kong

    2011-12-01

    Optimizing the orthopaedic screws can greatly improve their biomechanical performances. However, a methodical design optimization approach requires a long time to search the best design. Thus, the surrogate objective functions of the orthopaedic screws should be accurately developed. To our knowledge, there is no study to evaluate the strengths and limitations of the surrogate methods in developing the objective functions of the orthopaedic screws. Three-dimensional finite element models for both the tibial locking screws and the spinal pedicle screws were constructed and analyzed. Then, the learning data were prepared according to the arrangement of the Taguchi orthogonal array, and the verification data were selected with use of a randomized selection. Finally, the surrogate objective functions were developed by using either the multiple linear regression or the artificial neural network. The applicability and accuracy of those surrogate methods were evaluated and discussed. The multiple linear regression method could successfully construct the objective function of the tibial locking screws, but it failed to develop the objective function of the spinal pedicle screws. The artificial neural network method showed a greater capacity of prediction in developing the objective functions for the tibial locking screws and the spinal pedicle screws than the multiple linear regression method. The artificial neural network method may be a useful option for developing the objective functions of the orthopaedic screws with a greater structural complexity. The surrogate objective functions of the orthopaedic screws could effectively decrease the time and effort required for the design optimization process. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  14. Testing for marginal linear effects in quantile regression

    KAUST Repository

    Wang, Huixia Judy

    2017-10-23

    The paper develops a new marginal testing procedure to detect significant predictors that are associated with the conditional quantiles of a scalar response. The idea is to fit the marginal quantile regression on each predictor one at a time, and then to base the test on the t-statistics that are associated with the most predictive predictors. A resampling method is devised to calibrate this test statistic, which has non-regular limiting behaviour due to the selection of the most predictive variables. Asymptotic validity of the procedure is established in a general quantile regression setting in which the marginal quantile regression models can be misspecified. Even though a fixed dimension is assumed to derive the asymptotic results, the test proposed is applicable and computationally feasible for large dimensional predictors. The method is more flexible than existing marginal screening test methods based on mean regression and has the added advantage of being robust against outliers in the response. The approach is illustrated by using an application to a human immunodeficiency virus drug resistance data set.

  15. Testing for marginal linear effects in quantile regression

    KAUST Repository

    Wang, Huixia Judy; McKeague, Ian W.; Qian, Min

    2017-01-01

    The paper develops a new marginal testing procedure to detect significant predictors that are associated with the conditional quantiles of a scalar response. The idea is to fit the marginal quantile regression on each predictor one at a time, and then to base the test on the t-statistics that are associated with the most predictive predictors. A resampling method is devised to calibrate this test statistic, which has non-regular limiting behaviour due to the selection of the most predictive variables. Asymptotic validity of the procedure is established in a general quantile regression setting in which the marginal quantile regression models can be misspecified. Even though a fixed dimension is assumed to derive the asymptotic results, the test proposed is applicable and computationally feasible for large dimensional predictors. The method is more flexible than existing marginal screening test methods based on mean regression and has the added advantage of being robust against outliers in the response. The approach is illustrated by using an application to a human immunodeficiency virus drug resistance data set.

  16. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  17. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia

    2018-04-10

    Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite the fact that this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure besides point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend it to the case of discrete responses, where there is no 1-to-1 relationship between quantiles and the distribution's parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.

  18. Influence of plant root morphology and tissue composition on phenanthrene uptake: Stepwise multiple linear regression analysis

    International Nuclear Information System (INIS)

    Zhan, Xinhua; Liang, Xiao; Xu, Guohua; Zhou, Lixiang

    2013-01-01

    Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. An understanding of plant root morphologic and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species are grown hydroponically in solutions containing the PAH phenanthrene. Measurements are taken for 1) phenanthrene uptake, 2) root morphology – specific surface area, volume, surface area, tip number and total root length and 3) root tissue composition – water, lipid, protein and carbohydrate content. These factors are compared through Pearson's correlation and multiple linear regression analysis. The major factors which promote phenanthrene uptake are specific surface area and lipid content. -- Highlights: •There is no correlation between phenanthrene uptake and total root length, and water. •Specific surface area and lipid are the most crucial factors for phenanthrene uptake. •The contribution of specific surface area is greater than that of lipid. -- The contribution of specific surface area is greater than that of lipid in the two most important root morphological and compositional factors affecting phenanthrene uptake

  19. An Introduction to the Hybrid Approach of Neural Networks and the Linear Regression Model : An Illustration in the Hedonic Pricing Model of Building Costs

    OpenAIRE

    浅野, 美代子; マーコ, ユー K.W.

    2007-01-01

    This paper introduces the hybrid approach of neural networks and the linear regression model proposed by Asano and Tsubaki (2003). Neural networks are often credited with superiority in data consistency, whereas the linear regression model provides simple interpretation of the data enabling researchers to verify their hypotheses. The hybrid approach aims at combining the strengths of these two well-established statistical methods. A step-by-step procedure for performing the hybrid approach is pr...

  20. A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

    Science.gov (United States)

    Huang, C.; Townshend, J.R.G.

    2003-01-01

    A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.

  1. Radioligand assays - methods and applications. IV. Uniform regression of hyperbolic and linear radioimmunoassay calibration curves

    Energy Technology Data Exchange (ETDEWEB)

    Keilacker, H; Becker, G; Ziegler, M; Gottschling, H D [Zentralinstitut fuer Diabetes, Karlsburg (German Democratic Republic)

    1980-10-01

    In order to handle all types of radioimmunoassay (RIA) calibration curves obtained in the authors' laboratory in the same way, they tried to find a non-linear expression for their regression which allows calibration curves with different degrees of curvature to be fitted. Considering the two boundary cases of the incubation protocol they derived a hyperbolic inverse regression function: $x = a_{1} y + a_{0} + a_{-1} y^{-1}$, where $x$ is the total concentration of antigen, $a_{i}$ are constants, and $y$ is the specifically bound radioactivity. An RIA evaluation procedure based on this function is described providing a fitted inverse RIA calibration curve and some statistical quality parameters. The latter are of an order which is normal for RIA systems. There is an excellent agreement between fitted and experimentally obtained calibration curves having a different degree of curvature.
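
    Because the inverse calibration function in this abstract (antigen concentration as a linear term in bound radioactivity, a constant, and a reciprocal term) is linear in its three constants, it can be fitted by ordinary least squares. The following is a minimal sketch under that assumption, not the authors' evaluation procedure; the function names and the synthetic calibration points are hypothetical.

```python
import numpy as np


def fit_inverse_hyperbola(y, x):
    """Least-squares fit of x = a1*y + a0 + a_minus1/y; returns (a1, a0, a_minus1)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Design matrix columns: y, constant, 1/y.
    design = np.column_stack([y, np.ones_like(y), 1.0 / y])
    coeffs, *_ = np.linalg.lstsq(design, x, rcond=None)
    return coeffs


def predict_concentration(coeffs, y):
    """Evaluate the fitted inverse calibration curve at bound radioactivity y."""
    a1, a0, a_minus1 = coeffs
    y = np.asarray(y, dtype=float)
    return a1 * y + a0 + a_minus1 / y


# Synthetic, noise-free calibration points generated from assumed constants.
true_coeffs = np.array([-2.0, 1.5, 4.0])
y_cal = np.array([0.9, 0.7, 0.5, 0.35, 0.2, 0.1])
x_cal = predict_concentration(true_coeffs, y_cal)

fitted = fit_inverse_hyperbola(y_cal, x_cal)
print(fitted)
```

    With real assay data the residuals of this fit would be the natural starting point for the quality parameters the abstract refers to.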

  2. Trend analysis by a piecewise linear regression model applied to surface air temperatures in Southeastern Spain (1973–2014)

    OpenAIRE

    Campra, Pablo; Morales, Maria

    2016-01-01

    The magnitude of the trends of environmental and climatic changes is mostly derived from the slopes of the linear trends using ordinary least-square fitting. An alternative flexible fitting model, piecewise regression, has been applied here to surface air temperature records in southeastern Spain for the recent warming period (1973–2014) to gain accuracy in the description of the inner structure of change, dividing the time series into linear segments with different slopes. Breakpoint y...
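
    A piecewise linear trend with one breakpoint can be sketched as a grid search over candidate breakpoints, fitting a continuous two-segment line at each candidate and keeping the one with the smallest residual sum of squares. This is an illustrative simplification, not the procedure used in the paper; the function name and the synthetic series are hypothetical.

```python
import numpy as np


def fit_single_breakpoint(t, y, candidates):
    """Grid-search one breakpoint for a continuous two-segment linear trend."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in candidates:
        # Columns: intercept, slope before the break, change in slope after it.
        design = np.column_stack([np.ones_like(t), t, np.maximum(t - c, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(c), coef)
    return best


# Synthetic series (not the paper's data): the slope changes by 0.04 per year at year 17.
t = np.arange(42, dtype=float)  # years since the start of the record
y = 15.0 + 0.01 * t + 0.04 * np.maximum(t - 17.0, 0.0)

sse, break_at, coef = fit_single_breakpoint(t, y, candidates=t[2:-2])
print(break_at, coef[1], coef[1] + coef[2])  # breakpoint, slope before, slope after
```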

  3. A study on direct determination of uranium in ore by analyzing γ-ray spectrum with dual linear regression

    International Nuclear Information System (INIS)

    Liu Chunkui

    1996-01-01

    The method introduced is based on the different energies of γ-rays emitted from radionuclides in the uranium-radium decay series in ore. The pulse counting rates of two spectral bands, i.e. $N_1$ (55∼193 keV) and $N_2$ (260∼1500 keV), are measured by a portable type HYX-3 400-channel γ-ray spectrometer. In addition, the uranium content ($Q_U$) is obtained by chemical analysis of channel sampling. Then the regression coefficients ($b_0$, $b_1$, $b_2$) can be determined through dual linear regression by using $Q_U$ and $N_1$, $N_2$. The direct determination of uranium can be made with the regression equation $Q_U = b_0 + b_1 N_1 + b_2 N_2$
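
    The dual linear regression in this abstract is a least-squares fit of the uranium content on the two band counting rates plus an intercept. A minimal sketch follows; the counting rates, coefficients, and function name are made up for illustration and carry no physical meaning.

```python
import numpy as np


def fit_dual_linear(n1, n2, q_u):
    """Least-squares estimate of (b0, b1, b2) in q_u = b0 + b1*n1 + b2*n2."""
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    design = np.column_stack([np.ones_like(n1), n1, n2])
    b, *_ = np.linalg.lstsq(design, np.asarray(q_u, dtype=float), rcond=None)
    return b


# Hypothetical counting rates for the low- and high-energy bands.
n1 = np.array([120.0, 340.0, 560.0, 800.0, 950.0, 1300.0])
n2 = np.array([80.0, 150.0, 310.0, 390.0, 520.0, 610.0])

# Assumed coefficients used only to generate noise-free synthetic uranium contents.
b_true = np.array([0.002, 1.5e-4, -0.5e-4])
q_u = b_true[0] + b_true[1] * n1 + b_true[2] * n2

b_hat = fit_dual_linear(n1, n2, q_u)
print(b_hat)
```

    Once the coefficients are calibrated against chemically analysed samples, a new pair of counting rates can be converted to a uranium content by evaluating the same equation.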

  4. Standardizing effect size from linear regression models with log-transformed variables for meta-analysis.

    Science.gov (United States)

    Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José

    2017-03-17

    Meta-analysis is very useful for summarizing the effect of a treatment or a risk factor for a given disease. Studies often report results based on log-transformed variables in order to satisfy the principal assumptions of a linear regression model. If this is the case for some, but not all studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow including all results in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that significantly differ from the normal distribution. We illustrate the procedure with an application to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The procedure proposed has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.
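
    The abstract does not reproduce the derived formulae. As a hedged illustration of the general idea, the standard textbook identities for reading coefficients from models with natural-log-transformed variables are sketched below; they are not claimed to be the paper's formulae, and the function names are hypothetical.

```python
import math


def pct_change_log_outcome(beta, delta_x=1.0):
    """Model log(Y) = a + beta*X: percent change in Y for an absolute change delta_x in X."""
    return 100.0 * (math.exp(beta * delta_x) - 1.0)


def abs_change_log_predictor(beta, pct_x):
    """Model Y = a + beta*log(X): absolute change in Y for a pct_x percent increase in X."""
    return beta * math.log(1.0 + pct_x / 100.0)


def pct_change_log_log(beta, pct_x):
    """Model log(Y) = a + beta*log(X): percent change in Y for a pct_x percent increase in X."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** beta - 1.0)


# A coefficient of log(2) on a log-transformed outcome corresponds to a doubling of Y per unit of X.
print(pct_change_log_outcome(math.log(2.0)))
print(abs_change_log_predictor(2.0, 10.0))
print(pct_change_log_log(0.5, 10.0))
```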

  5. Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

    Science.gov (United States)

    Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

    2017-03-01

    Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several models, and each model has two components, parametric and nonparametric. The model used has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling the effect of regional socio-economic variables on the use of information technology. More specifically, the response variables are the percentage of households with access to the internet and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification and the percentage of economic growth. Based on identification of the relationship between response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that the multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination of 90 percent.

  6. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, the paper first demonstrates the broad usage of polynomial functions and deduces their parameters with the ordinary least squares estimate. Then a significance test method for the polynomial regression function is derived considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and a significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram in accord with the authors' real work. (authors)
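
    Treating each power of the predictor as a separate regressor, the overall significance of a polynomial fit can be assessed with the usual regression F statistic. The sketch below assumes that standard construction rather than the paper's own derivation, and it uses synthetic data with a small deterministic perturbation instead of decay heat measurements; the function name is hypothetical.

```python
import numpy as np


def polynomial_f_statistic(x, y, degree):
    """Fit a polynomial by least squares and return (coeffs, F statistic, degrees of freedom)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)  # highest power first
    fitted = np.polyval(coeffs, x)
    sse = np.sum((y - fitted) ** 2)            # residual sum of squares
    ssr = np.sum((fitted - y.mean()) ** 2)     # regression sum of squares
    dof_model = degree
    dof_resid = len(x) - degree - 1
    f_stat = (ssr / dof_model) / (sse / dof_resid)
    return coeffs, float(f_stat), (dof_model, dof_resid)


# Synthetic quadratic with a small alternating perturbation standing in for noise.
x = np.arange(10, dtype=float)
wiggle = 0.05 * np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
y = 1.0 + 2.0 * x - 0.3 * x ** 2 + wiggle

coeffs, f_stat, dof = polynomial_f_statistic(x, y, degree=2)
print(coeffs, f_stat, dof)
```

    The F statistic is then compared with the critical value of the F distribution at the reported degrees of freedom to decide whether the polynomial as a whole is significant.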

  7. Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods

    Directory of Open Access Journals (Sweden)

    Dick Apronti

    2016-12-01

    Traffic volume is an important parameter in most transportation planning applications. Low volume roads make up about 69% of road miles in the United States. Estimating traffic on the low volume roads is a cost-effective alternative to taking traffic counts. This is because traditional traffic counts are expensive and impractical for low priority roads. The purpose of this paper is to present the development of two alternative means of cost-effectively estimating traffic volumes for low volume roads in Wyoming and to make recommendations for their implementation. The study methodology involves reviewing existing studies, identifying data sources, and carrying out the model development. The utility of the models developed was then verified by comparing actual traffic volumes to those predicted by the model. The study resulted in two regression models that are inexpensive and easy to implement. The first regression model was a linear regression model that utilized pavement type, access to highways, predominant land use types, and population to estimate traffic volume. In verifying the model, an R² value of 0.64 and a root mean square error of 73.4% were obtained. The second model was a logistic regression model that identified the level of traffic on roads using five thresholds or levels. The logistic regression model was verified by estimating traffic volume thresholds and determining the percentage of roads that were accurately classified as belonging to the given thresholds. For the five thresholds, the percentage of roads classified correctly ranged from 79% to 88%. In conclusion, the verification of the models indicated both model types to be useful for accurate and cost-effective estimation of traffic volumes for low volume Wyoming roads. The models developed were recommended for use in traffic volume estimations for low volume roads in pavement management and environmental impact assessment studies.

  8. Linear and logistic regression analysis

    NARCIS (Netherlands)

    Tripepi, G.; Jager, K. J.; Dekker, F. W.; Zoccali, C.

    2008-01-01

    In previous articles of this series, we focused on relative risks and odds ratios as measures of effect to assess the relationship between exposure to risk factors and clinical outcomes and on control for confounding. In randomized clinical trials, the random allocation of patients is hoped to

  9. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  10. Detection of epistatic effects with logic regression and a classical linear regression model.

    Science.gov (United States)

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as a disease) is caused by Boolean combinations of genotypes of several QTLs, Cockerham's approach is often not capable of identifying them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction), the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and real data analysis.

  11. Adaptive Linear and Normalized Combination of Radial Basis Function Networks for Function Approximation and Regression

    Directory of Open Access Journals (Sweden)

    Yunfeng Wu

    2014-01-01

    This paper presents a novel adaptive linear and normalized combination (ALNC) method that can be used to combine the component radial basis function networks (RBFNs) to implement better function approximation and regression tasks. The optimization of the fusion weights is obtained by solving a constrained quadratic programming problem. According to the instantaneous errors generated by the component RBFNs, the ALNC is able to perform the selective ensemble of multiple learners by adaptively adjusting the fusion weights from one instance to another. The results of the experiments on eight synthetic function approximation and six benchmark regression data sets show that the ALNC method can effectively help the ensemble system achieve a higher accuracy (measured in terms of mean-squared error) and better fidelity (characterized by the normalized correlation coefficient) of approximation, in relation to the popular simple average, weighted average, and Bagging methods.

  12. Functions Represented as Linear Sequential Data: Relationships between Presentation and Student Responses

    Science.gov (United States)

    Ayalon, Michal; Watson, Anne; Lerman, Steve

    2015-01-01

    This study investigates students' ways of attending to linear sequential data in two tasks, and conjectures possible relationships between those ways and elements of the task design. Drawing on the substantial literature about such situations, we focus for this paper on linear rate of change, and on covariation and correspondence approaches to…

  13. SU-G-BRA-08: Diaphragm Motion Tracking Based On KV CBCT Projections with a Constrained Linear Regression Optimization

    Energy Technology Data Exchange (ETDEWEB)

    Wei, J [City College of New York, New York, NY (United States); Chao, M [The Mount Sinai Medical Center, New York, NY (United States)

    2016-06-15

    Purpose: To develop a novel strategy to extract the respiratory motion of the thoracic diaphragm from kilovoltage cone beam computed tomography (CBCT) projections by a constrained linear regression optimization technique. Methods: A parabolic function was identified as the geometric model and was employed to fit the shape of the diaphragm on the CBCT projections. The search was initialized by five manually placed seeds on a pre-selected projection image. Temporal redundancies (the enabling phenomenology in video compression and encoding techniques) inherent in the dynamic properties of the diaphragm motion were integrated with the geometrical shape of the diaphragm boundary and the associated algebraic constraint, which significantly reduced the search space of viable parabolic parameters, so that the fit can be effectively optimized by a constrained linear regression approach on the subsequent projections. The innovative algebraic constraints stipulating the kinetic range of the motion and the spatial constraint preventing any unphysical deviations made it possible to obtain the optimal contour of the diaphragm with minimal initialization. The algorithm was assessed by a fluoroscopic movie acquired at a fixed anterior-posterior direction and kilovoltage CBCT projection image sets from four lung and two liver patients. The automatic tracing by the proposed algorithm and manual tracking by a human operator were compared in both space and frequency domains. Results: The error between the estimated and manual detections for the fluoroscopic movie was 0.54 mm with a standard deviation (SD) of 0.45 mm, while the average error for the CBCT projections was 0.79 mm with an SD of 0.64 mm for all enrolled patients. The submillimeter accuracy exhibits the promise of the proposed constrained linear regression approach to track the diaphragm motion on rotational projection images. Conclusion: The new algorithm will provide a potential solution to rendering diaphragm motion and ultimately

  14. SU-G-BRA-08: Diaphragm Motion Tracking Based On KV CBCT Projections with a Constrained Linear Regression Optimization

    International Nuclear Information System (INIS)

    Wei, J; Chao, M

    2016-01-01

    Purpose: To develop a novel strategy to extract the respiratory motion of the thoracic diaphragm from kilovoltage cone beam computed tomography (CBCT) projections by a constrained linear regression optimization technique. Methods: A parabolic function was identified as the geometric model and was employed to fit the shape of the diaphragm on the CBCT projections. The search was initialized by five manually placed seeds on a pre-selected projection image. Temporal redundancies (the enabling phenomenology in video compression and encoding techniques) inherent in the dynamic properties of the diaphragm motion were integrated with the geometrical shape of the diaphragm boundary and the associated algebraic constraint, which significantly reduced the search space of viable parabolic parameters; the resulting problem can be effectively optimized by a constrained linear regression approach on the subsequent projections. The algebraic constraints stipulating the kinetic range of the motion and the spatial constraint preventing any unphysical deviations made it possible to obtain the optimal contour of the diaphragm with minimal initialization. The algorithm was assessed with a fluoroscopic movie acquired at a fixed anteroposterior direction and kilovoltage CBCT projection image sets from four lung and two liver patients. The automatic tracing by the proposed algorithm and manual tracking by a human operator were compared in both space and frequency domains. Results: The error between the estimated and manual detections for the fluoroscopic movie was 0.54 mm with standard deviation (SD) of 0.45 mm, while the average error for the CBCT projections was 0.79 mm with SD of 0.64 mm for all enrolled patients. The submillimeter accuracy demonstrates the promise of the proposed constrained linear regression approach for tracking diaphragm motion on rotational projection images. Conclusion: The new algorithm will provide a potential solution to rendering diaphragm motion and ultimately
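As a rough illustration of the constrained fit described in this abstract, the sketch below tracks only the vertical offset of a parabola between consecutive projections. It assumes that the curvature and tilt (a, b) are carried over from the previous projection and that the offset may move by at most max_shift; the function name, the parameters and the sample points are hypothetical, and the abstract does not state the actual constraints used. For a single bounded parameter, the constrained least-squares solution is the unconstrained one clipped to the allowed interval.

```python
def fit_offset_with_range(xs, ys, a, b, c_prev, max_shift):
    """Bounded least-squares estimate of c in y = a*x**2 + b*x + c.

    a and b are held fixed (assumed known from the previous projection);
    c is restricted to [c_prev - max_shift, c_prev + max_shift].
    """
    # With a and b fixed, the least-squares estimate of c is the mean residual.
    residuals = [y - (a * x * x + b * x) for x, y in zip(xs, ys)]
    c_free = sum(residuals) / len(residuals)
    # The objective is a convex quadratic in c, so clipping the free
    # estimate to the interval gives the constrained minimiser.
    lower, upper = c_prev - max_shift, c_prev + max_shift
    return min(max(c_free, lower), upper)


# Hypothetical edge points on a parabola whose offset moved from 10 to 13.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.5 * x * x + 13.0 for x in xs]

c_loose = fit_offset_with_range(xs, ys, a=0.5, b=0.0, c_prev=10.0, max_shift=5.0)
c_tight = fit_offset_with_range(xs, ys, a=0.5, b=0.0, c_prev=10.0, max_shift=2.0)
print(c_loose, c_tight)
```

With the looser range the fit follows the data; with the tighter range the estimate stops at the edge of the permitted motion, which is the role a kinetic-range constraint would play.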

  15. Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure.

    Science.gov (United States)

    Yoo, Yun Joo; Sun, Lei; Poirier, Julia G; Paterson, Andrew D; Bull, Shelley B

    2017-02-01

    By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. © 2016 The Authors Genetic Epidemiology Published by Wiley Periodicals, Inc.

  16. Plateletpheresis efficiency and mathematical correction of software-derived platelet yield prediction: A linear regression and ROC modeling approach.

    Science.gov (United States)

    Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David

    2017-10-01

    Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
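The linear-regression correction mentioned in this abstract can be sketched as a simple least-squares line relating software-predicted yield to actual yield. The numbers below are invented for illustration (they are not the study's data), and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def corrected_yield(predicted, intercept, slope):
    """Apply the fitted correction to a software-predicted yield."""
    return intercept + slope * predicted


# Invented example: the software systematically underestimates the yield.
predicted = [3.0, 4.0, 5.0, 6.0, 7.0]
actual = [3.5, 4.6, 5.7, 6.8, 7.9]

intercept, slope = fit_line(predicted, actual)
print(intercept, slope, corrected_yield(5.5, intercept, slope))
```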

  17. Modeling the kinetics of essential oil hydrodistillation from juniper berries (Juniperus communis L.) using non-linear regression

    Directory of Open Access Journals (Sweden)

    Radosavljević Dragana B.

    2017-01-01

    This paper presents kinetics modeling of essential oil hydrodistillation from juniper berries (Juniperus communis L.) using a non-linear regression methodology. The proposed model has a polynomial-logarithmic form. The initial equation of the proposed non-linear model is $q = q_\infty \cdot (a \cdot (\log t)^2 + b \cdot \log t + c)$, and by substituting $a_1 = q_\infty \cdot a$, $b_1 = q_\infty \cdot b$ and $c_1 = q_\infty \cdot c$, the final equation is obtained as $q = a_1 \cdot (\log t)^2 + b_1 \cdot \log t + c_1$. In this equation $q$ is the quantity of oil obtained at time $t$, while $a_1$, $b_1$ and $c_1$ are parameters to be determined for each sample. From the final equation it can be seen that the key parameter $q_\infty$, which represents the maximal oil quantity obtained after infinite time, is already included in the parameters $a_1$, $b_1$ and $c_1$. In this way, experimental determination of this parameter is avoided. Using the proposed model with parameters obtained by regression, the values of oil hydrodistillation in time are calculated for each sample and compared to the experimental values. In addition, two kinetic models previously proposed in the literature were applied to the same experimental results. The developed model provided better agreement with the experimental values than the two generally accepted kinetic models of this process. The average values of the error measures (RSS, RSE, AIC and MRPD) obtained for our model (0.005; 0.017; –84.33; 1.65) were generally lower than the corresponding values of the other two models (0.025; 0.041; –53.20; 3.89) and (0.0035; 0.015; –86.83; 1.59). Also, parameter estimation for the proposed model was significantly simpler (maximum 2 iterations per sample) using the non-linear regression than that for the existing models (maximum 9 iterations per sample). [Project of the Serbian Ministry of Education, Science and Technological Development, Grant no. TR-35026]
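A minimal sketch of fitting the final equation of this abstract, q = a1·(log t)² + b1·log t + c1, is given below. Once log t is used as the predictor the model is linear in a1, b1 and c1, so this sketch uses ordinary least squares via the normal equations rather than the iterative non-linear routine the authors report. The base-10 logarithm, the function names and the synthetic data are all assumptions made for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_log_quadratic(times, quantities):
    """Least-squares estimates (a1, b1, c1) for q = a1*L**2 + b1*L + c1, L = log10(t)."""
    rows = [[math.log10(t) ** 2, math.log10(t), 1.0] for t in times]
    # Normal equations: (X^T X) beta = X^T q
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtq = [sum(r[i] * q for r, q in zip(rows, quantities)) for i in range(3)]
    return solve_linear_system(xtx, xtq)


# Synthetic, noise-free data generated from assumed parameter values.
true_a1, true_b1, true_c1 = -0.10, 0.60, 0.05
times = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
quantities = [
    true_a1 * math.log10(t) ** 2 + true_b1 * math.log10(t) + true_c1 for t in times
]

a1, b1, c1 = fit_log_quadratic(times, quantities)
print(a1, b1, c1)
```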

  18. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure.

    Science.gov (United States)

    Li, Yanming; Nan, Bin; Zhu, Ji

    2015-06-01

    We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. © 2015, The International Biometric Society.

  19. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    Science.gov (United States)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for the agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth under different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and the atmospheric pressure (reduced to sea level), collected from five synoptic stations (Kerman, Ahvaz, Tabriz, Saghez, and Rasht) located respectively in the hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite good at the surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers, especially at 100 cm depth. Moreover, both models performed better in humid climate conditions than in arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance, especially in the case of SVR.
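The abstract does not say how the periodicity component was constructed. One common way to encode an annual cycle for a regression model, shown here purely as an assumption, is to append sine and cosine terms of the day of year to the climatic predictors. The function name and the sample values are hypothetical.

```python
import math


def add_periodicity(day_of_year, predictors, period=365.25):
    """Return the predictor row extended with annual sine and cosine terms."""
    angle = 2.0 * math.pi * day_of_year / period
    return list(predictors) + [math.sin(angle), math.cos(angle)]


# Hypothetical row: min air temp (°C), max air temp (°C), relative humidity (%).
row = add_periodicity(day_of_year=0, predictors=[4.2, 15.8, 61.0])
print(row)
```

The extended rows can then be passed to either model in place of the original predictors.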

  20. Searching for the main anti-bacterial components in artificial Calculus bovis using UPLC and microcalorimetry coupled with multi-linear regression analysis.

    Science.gov (United States)

    Zang, Qing-Ce; Wang, Jia-Bo; Kong, Wei-Jun; Jin, Cheng; Ma, Zhi-Jie; Chen, Jing; Gong, Qian-Feng; Xiao, Xiao-He

    2011-12-01

    The fingerprints of artificial Calculus bovis extracts from different solvents were established by ultra-performance liquid chromatography (UPLC) and the anti-bacterial activities of artificial C. bovis extracts on Staphylococcus aureus (S. aureus) growth were studied by microcalorimetry. The UPLC fingerprints were evaluated using hierarchical clustering analysis. Some quantitative parameters obtained from the thermogenic curves of S. aureus growth affected by artificial C. bovis extracts were analyzed using principal component analysis. The spectrum-effect relationships between UPLC fingerprints and anti-bacterial activities were investigated using multi-linear regression analysis. The results showed that peak 1 (taurocholate sodium), peak 3 (unknown compound), peak 4 (cholic acid), and peak 6 (chenodeoxycholic acid) are more significant than the other peaks with the standard parameter estimate 0.453, -0.166, 0.749, 0.025, respectively. So, compounds cholic acid, taurocholate sodium, and chenodeoxycholic acid might be the major anti-bacterial components in artificial C. bovis. Altogether, this work provides a general model of the combination of UPLC chromatography and anti-bacterial effect to study the spectrum-effect relationships of artificial C. bovis extracts, which can be used to discover the main anti-bacterial components in artificial C. bovis or other Chinese herbal medicines with anti-bacterial effects. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Development of a Multiple Linear Regression Model to Forecast Facility Electrical Consumption at an Air Force Base.

    Science.gov (United States)

    1981-09-01

    corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regression, as enumerated in...7. Data related to the sample of bases is assumed to be representative of bases in the population. Limitations Basic limitations on this research were... Ratemaking --Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John

  2. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials.

    Science.gov (United States)

    Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D

    2015-05-01

    Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist of stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical

  3. Genetic algorithm as a variable selection procedure for the simulation of 13C nuclear magnetic resonance spectra of flavonoid derivatives using multiple linear regression.

    Science.gov (United States)

    Ghavami, Raoof; Najafi, Amir; Sajadi, Mohammad; Djannaty, Farhad

    2008-09-01

    In order to accurately simulate 13C NMR spectra of hydroxy, polyhydroxy and methoxy substituted flavonoids, a quantitative structure-property relationship (QSPR) model, relating atom-based calculated descriptors to 13C NMR chemical shifts (ppm, TMS = 0), is developed. A dataset consisting of 50 flavonoid derivatives was employed for the present analysis. A set of 417 topological, geometrical, and electronic descriptors representing various structural characteristics was calculated, and separate multilinear QSPR models were developed between each carbon atom of the flavonoids and the calculated descriptors. Genetic algorithm (GA) and multiple linear regression analysis (MLRA) were used to select the descriptors and to generate the correlation models. Analysis of the results revealed a correlation coefficient and root mean square error (RMSE) of 0.994 and 2.53 ppm, respectively, for the prediction set.

  4. Implicit collinearity effect in linear regression: Application to basal ...

    African Journals Online (AJOL)

    Collinearity of predictor variables is a severe problem in the least square regression analysis. It contributes to the instability of regression coefficients and leads to a wrong prediction accuracy. Despite these problems, studies are conducted with a large number of observed and derived variables linked with a response ...
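To make the instability mentioned in this abstract concrete, the sketch below computes the variance inflation factor (VIF) for the two-predictor case, where VIF = 1 / (1 − r²) and r is the Pearson correlation between the predictors. The data are invented and the function names are hypothetical; the abstract does not state which diagnostic the authors used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Invented data: x2 is almost a copy of x1, x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
x3 = [4.0, 1.0, 0.0, 1.0, 4.0]

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
print(vif_collinear, vif_independent)
```

A VIF near 1 means the coefficient variance is not inflated, while values well above 10 are conventionally read as a sign of serious collinearity.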

  5. Using multiple linear regression and physicochemical changes of amino acid mutations to predict antigenic variants of influenza A/H3N2 viruses.

    Science.gov (United States)

    Cui, Haibo; Wei, Xiaomei; Huang, Yu; Hu, Bin; Fang, Yaping; Wang, Jia

    2014-01-01

    Among human influenza viruses, strain A/H3N2 accounts for over a quarter of a million deaths annually. Antigenic variants of these viruses often render current vaccinations ineffective and lead to repeated infections. In this study, a computational model was developed to predict antigenic variants of the A/H3N2 strain. First, 18 critical antigenic amino acids in the hemagglutinin (HA) protein were recognized using a scoring method combining the phi (ϕ) coefficient and information entropy. Next, a prediction model was developed by integrating the multiple linear regression method with eight types of physicochemical changes in critical amino acid positions. When compared to three other known models, our prediction model achieved the best performance not only on the training dataset but also on the commonly used testing dataset composed of 31878 antigenic relationships of the H3N2 influenza virus.

  6. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    Science.gov (United States)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.

  7. Investigation of the UK37' vs. SST relationship for Atlantic Ocean suspended particulate alkenones: An alternative regression model and discussion of possible sampling bias

    Science.gov (United States)

    Gould, Jessica; Kienast, Markus; Dowd, Michael

    2017-05-01

    Alkenone unsaturation, expressed as the UK37' index, is closely related to growth temperature of prymnesiophytes, thus providing a reliable proxy to infer past sea surface temperatures (SSTs). Here we address two lingering uncertainties related to this SST proxy. First, calibration models developed for core-top sediments and those developed for surface suspended particulate organic material (SPOM) show systematic offsets, raising concerns regarding the transfer of the primary signal into the sedimentary record. Second, questions remain regarding changes in slope of the UK37' vs. growth temperature relationship at the temperature extremes. Based on (re)analysis of 31 new and 394 previously published SPOM UK37' data from the Atlantic Ocean, a new regression model to relate UK37' to SST is introduced: the Richards curve (Richards, 1959). This non-linear regression model provides a robust calibration of the UK37' vs. SST relationship for Atlantic SPOM samples and uniquely accounts for both the fact that the UK37' index is a proportion, and so must lie between 0 and 1, and the observed reduction in slope at the warm and cold ends of the temperature range. As with prior fits of SPOM UK37' vs. SST, the Richards model is offset from traditional regression models of sedimentary UK37' vs. SST. We posit that (some of) this offset can be attributed to the seasonally and depth-biased sampling of SPOM material.
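The two properties this abstract attributes to the Richards curve, a response confined between 0 and 1 and a slope that flattens at both temperature extremes, can be seen in a small sketch. The abstract does not give the authors' parametrisation or fitted values, so the simplified generalised-logistic form and every parameter value below are assumptions chosen only for illustration.

```python
import math


def richards(sst, lower=0.0, upper=1.0, growth=0.2, midpoint=15.0, shape=1.0):
    """Simplified Richards (generalised logistic) curve.

    lower/upper are the asymptotes, growth controls steepness, midpoint
    shifts the curve along the temperature axis, and shape (> 0) makes the
    approach to the two asymptotes asymmetric when it differs from 1.
    """
    base = 1.0 + math.exp(-growth * (sst - midpoint))
    return lower + (upper - lower) / base ** (1.0 / shape)


# Evaluate over a hypothetical SST range (°C) with made-up parameters.
temperatures = list(range(-2, 31))
index_values = [richards(t, shape=0.8) for t in temperatures]
print(index_values[0], index_values[-1])
```

Unlike a straight-line calibration, this curve cannot predict index values outside the interval from 0 to 1, however extreme the temperature.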

  8. Prediction of octanol-water partition coefficients of organic compounds by multiple linear regression, partial least squares, and artificial neural network.

    Science.gov (United States)

    Golmohammadi, Hassan

    2009-11-30

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results obtained showed the ability of the developed artificial neural network to predict the partition coefficients of organic compounds. Also, the results revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.

  9. Linear regression models and k-means clustering for statistical analysis of fNIRS data.

    Science.gov (United States)

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-02-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.

  10. Multiple linear regression approach for the analysis of the relationships between joints mobility and regional pressure-based parameters in the normal-arched foot.

    Science.gov (United States)

    Caravaggi, Paolo; Leardini, Alberto; Giacomozzi, Claudia

    2016-10-03

    Plantar load can be considered as a measure of the foot's ability to transmit forces at the foot/ground or foot/footwear interface during ambulatory activities via the lower limb kinematic chain. While morphological and functional measures have been shown to be correlated with plantar load, no exhaustive data are currently available on the possible relationships between range of motion of foot joints and plantar load regional parameters. Joints' kinematics from a validated multi-segmental foot model were recorded together with plantar pressure parameters in 21 normal-arched healthy subjects during three barefoot walking trials. Plantar pressure maps were divided into six anatomically-based regions of interest associated with corresponding foot segments. A stepwise multiple regression analysis was performed to determine the relationships between pressure-based parameters, joints' range of motion and normalized walking speed (speed/subject height). Sagittal- and frontal-plane joint motion were those most correlated with plantar load. Foot joints' range of motion and normalized walking speed explained between 6% and 43% of the model variance (adjusted R²) for pressure-based parameters. In general, joints presenting lower mobility during stance were associated with lower vertical force at the forefoot and with larger mean and peak pressure at the hindfoot and forefoot. Normalized walking speed was always positively correlated with mean and peak pressure at the hindfoot and forefoot. While a large variance in plantar pressure data is still not accounted for by the present models, this study provides statistical corroboration of the close relationship between joint mobility and plantar pressure during stance in the normal healthy foot. Copyright © 2016 Elsevier Ltd. All rights reserved.
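The adjusted R² quoted in this abstract penalises the ordinary R² for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sketch below evaluates that standard formula; the input values are hypothetical and are not taken from the study.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted coefficient of determination for a linear model with an intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: R² = 0.50 from 21 observations and 2 predictors.
value = adjusted_r2(0.50, n_obs=21, n_predictors=2)
print(round(value, 4))
```

Because the penalty grows with the number of predictors, the adjusted value is a fairer basis for comparing stepwise models that retain different numbers of predictors.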

  11. A biological basis for the linear non-threshold dose-response relationship for low-level carcinogen exposure

    International Nuclear Information System (INIS)

    Albert, R.E.

    1981-01-01

    This chapter examines low-level dose-response relationships in terms of the two-stage mouse tumorigenesis model. Analyzes the feasibility of the linear non-threshold dose-response model which was first adopted for use in the assessment of cancer risks from ionizing radiation and more recently from chemical carcinogens. Finds that both the interaction of B(a)P with epidermal DNA of the mouse skin and the dose-response relationship for the initiation stage of mouse skin tumorigenesis showed a linear non-threshold dose-response relationship. Concludes that low level exposure to environmental carcinogens has a linear non-threshold dose-response relationship with the carcinogen acting as an initiator and the promoting action being supplied by the factors that are responsible for the background cancer rate in the target tissue

  12. The estimation and prediction of the inventories for the liquid and gaseous radwaste systems using the linear regression analysis

    International Nuclear Information System (INIS)

    Kim, J. Y.; Shin, C. H.; Kim, J. K.; Lee, J. K.; Park, Y. J.

    2003-01-01

    The trends in the inventories of the liquid radwaste system and of the radioactive gas released in containment, and their predicted values according to the operation histories of Yonggwang (YGN) 3 and 4, were analyzed by linear regression analysis. The results show that the inventories of those systems increase linearly with operation history, but the inventories released to the environment are considerably lower than the recommended values based on the FSAR suggestions. It is considered that some conservatism was present in the estimation methodology at the preparation stage of the FSAR.

  13. Investigating Years 7 to 12 students' knowledge of linear relationships through different contexts and representations

    Science.gov (United States)

    Wilkie, Karina J.; Ayalon, Michal

    2018-02-01

    A foundational component of developing algebraic thinking for meaningful calculus learning is the idea of "function" that focuses on the relationship between varying quantities. Students have demonstrated widespread difficulties in learning calculus, particularly interpreting and modeling dynamic events, when they have a poor understanding of relationships between variables. Yet, there are differing views on how to develop students' functional thinking over time. In the Australian curriculum context, linear relationships are introduced to lower secondary students with content that reflects a hybrid of traditional and reform algebra pedagogy. This article discusses an investigation into Australian secondary students' understanding of linear functional relationships from Years 7 to 12 (approximately 12 to 18 years old; n = 215) in their approaches to three tasks (finding rate of change, pattern generalisation and interpretation of gradient) involving four different representations (table, geometric growing pattern, equation and graph). From the findings, it appears that these students' knowledge of linear functions remains context-specific rather than becoming connected over time.

  14. Multiple linear regression analysis of bacterial deposition to polyurethane coatings after conditioning film formation in the marine environment

    NARCIS (Netherlands)

    Bakker, D.P.; Busscher, H.J.; Zanten, J. van; Vries, J. de; Klijnstra, J.W.; Mei, H.C. van der

    2004-01-01

    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial

  15. Multiple linear regression analysis of bacterial deposition to polyurethane coating after conditioning film formation in the marine environment

    NARCIS (Netherlands)

    Bakker, Dewi P; Busscher, Henk J; van Zanten, Joyce; de Vries, Jacob; Klijnstra, Job W; van der Mei, Henny C

    Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial

  16. Structure/property relationships in non-linear optical materials

    Energy Technology Data Exchange (ETDEWEB)

    Cole, J M [Institut Max von Laue - Paul Langevin (ILL), 38 - Grenoble (France); [Durham Univ. (United Kingdom); Howard, J A.K. [Durham Univ. (United Kingdom); McIntyre, G J [Institut Max von Laue - Paul Langevin (ILL), 38 - Grenoble (France)

    1997-04-01

    The application of neutrons to the study of structure/property relationships in organic non-linear optical materials (NLOs) is described. In particular, charge-transfer effects and intermolecular interactions are investigated. Charge-transfer effects are studied by charge-density analysis and an example of one such investigation is given. The study of intermolecular interactions concentrates on the effects of hydrogen-bonding and an example is given of two structurally similar molecules with very disparate NLO properties, as a result of different types of hydrogen-bonding. (author). 3 refs.

  17. Genomic-Enabled Prediction Based on Molecular Markers and Pedigree Using the Bayesian Linear Regression Package in R

    Directory of Open Access Journals (Sweden)

    Paulino Pérez

    2010-09-01

    The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, which are not always available to the end user and not yet implemented in standard statistical software. The R package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO) in a unified framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed.

  18. On the linearity of the dose-effect relationship of DNA double strand breaks

    International Nuclear Information System (INIS)

    Chadwick, K.H.; Leenhouts, H.P.

    1994-01-01

    Most radiation biologists believe that DNA double-strand breaks are induced linearly with radiation dose for all types of radiation. Since 1985, with the advent of elution and gel electrophoresis techniques which permit the measurement of DNA double-strand breaks induced in mammalian cells at doses having radiobiological relevance, the true nature of the dose-effect relationship has been brought into some doubt. Many investigators measured curvilinear dose-effect relationships and a few found good correlations between the induction of the DNA double-strand breaks and cell survival. We approach the problem pragmatically by assuming that the induction of DNA double-strand breaks by 125I Auger electron emitters incorporated into the DNA of the cells is a linear function of the number of 125I decays, and by comparing the dose-effect relationship for sparsely ionizing radiation against this standard. The conclusion drawn is that the curvilinear dose-effect relationships and the correlations with survival are real. (Author)

  19. pulver: an R package for parallel ultra-rapid p-value computation for linear regression interaction terms.

    Science.gov (United States)

    Molnos, Sophie; Baumbach, Clemens; Wahl, Simone; Müller-Nurasyid, Martina; Strauch, Konstantin; Wang-Sattler, Rui; Waldenberger, Melanie; Meitinger, Thomas; Adamski, Jerzy; Kastenmüller, Gabi; Suhre, Karsten; Peters, Annette; Grallert, Harald; Theis, Fabian J; Gieger, Christian

    2017-09-29

    Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about the disease-causing mechanisms, so it is usual to investigate the associations between genetic variants and metabolite levels. However, only considering genetic variants and their effects on one trait ignores the possible interplay between different "omics" layers. Existing tools only consider single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigations of the interactions between pairs of arbitrary quantitative variables. We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This is achieved by using the correlation coefficient to test the null-hypothesis, which avoids the costly computation of inversions. Additional tricks are a rearrangement of the order, when iterating through the different "omics" layers, and implementing this algorithm in the fast programming language C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels. The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++, and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/ .

  20. Monopole and dipole estimation for multi-frequency sky maps by linear regression

    Science.gov (United States)

    Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.

    2017-01-01

    We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
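
    The T-T plot regression described above can be illustrated with an ordinary least-squares line fitted to the pixel values of two maps within one sky patch. This is only a minimal sketch: the function name `tt_plot_fit` and the pixel values are hypothetical, and the published method additionally combines many patches and frequencies, which is not reproduced here.

```python
def tt_plot_fit(map_a, map_b):
    """Fit map_b = slope * map_a + offset by ordinary least squares.

    map_a and map_b hold the pixel values of two frequency maps
    inside the same sky patch (hypothetical inputs).
    """
    if len(map_a) != len(map_b) or len(map_a) < 2:
        raise ValueError("need at least two paired pixel values")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    sxx = sum((a - mean_a) ** 2 for a in map_a)
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    if sxx == 0:
        raise ValueError("map_a has no variation inside the patch")
    slope = sxy / sxx
    offset = mean_b - slope * mean_a
    return slope, offset


# Hypothetical patch: map_b is a scaled copy of map_a plus a constant offset.
patch_a = [1.0, 2.0, 3.0, 4.0]
patch_b = [0.5 * a + 2.0 for a in patch_a]
slope, offset = tt_plot_fit(patch_a, patch_b)
```

    In this toy setting the slope plays the role of the frequency scaling of the dominant foreground and the offset absorbs the relative constant term between the two maps.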

  1. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

    Science.gov (United States)

    Barrett, C. A.

    1985-01-01

    Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.

  2. Non-linear relationship between oxygen uptake and power output in the Astrand nomogram-old data revisited.

    Science.gov (United States)

    Zoladz, J A; Szkutnik, Z; Majerczak, J; Duda, K; Pedersen, P K

    2007-06-01

    For the last decade there has been considerable discussion concerning the linearity / non-linearity of the oxygen uptake (V(O2)) - power output (W) relationship with strong experimental evidence of non-linearity provided mainly by breath-by-breath measurements. In this study, we attempted to answer the question whether the V(O2) - W relationship in the Astrand nomogram, as presented in the Textbook of Work Physiology, P.-O. Astrand et al. (2003), page 281, based on the Douglas bag method, is indeed linear, as stated by the authors before, or if a change point in V(O2), described by Zoladz et al. (1998) Eur J Appl Physiol 78: 369-377, can possibly be detected in those data. The V(O2) - W data were taken from the Astrand nomogram referenced above and from Table 9.5 on page 282 in the same reference and tested for the presence of the change point in V(O2), using our two-phase model (see the reference above). In the first phase, a linear V(O2) - W relationship was assumed, whereas in the second one (above the so-called change point) an additional increase in V(O2) above the values expected from the linear model was allowed. It was found that in the data taken from the Astrand nomogram (data for men), as well as in the data taken from Table 9.5, statistically significant change points in V(O2) were present at the power output of 150 W. The documentation of the presence of a change point in the V(O2) - W relationship in the Astrand data provides further evidence for the existence of a non-linearity in the V(O2) - W relationship in incremental exercise tests of humans, also in V(O2) data based upon the Douglas bag method.
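
    A two-phase model of the kind described above can be sketched as a straight line plus a hinge term that switches on above a candidate change point. The sketch below is an assumption-laden illustration, not the authors' procedure: the hinge parameterization, the grid of candidate change points, the helper names and the data are all hypothetical, and no significance test for the change point is included.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_phase(power, vo2, candidates):
    """Return (change_point, sse, [intercept, slope, extra_slope]) with the lowest SSE.

    Model: vo2 = intercept + slope * W + extra_slope * max(0, W - change_point).
    """
    best = None
    for cp in candidates:
        rows = [[1.0, w, max(0.0, w - cp)] for w in power]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, vo2)) for i in range(3)]
        try:
            coef = solve_linear_system(xtx, xty)
        except ValueError:
            continue  # hinge term is all zeros for this candidate
        sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
                  for r, y in zip(rows, vo2))
        if best is None or sse < best[1]:
            best = (cp, sse, coef)
    return best


# Hypothetical, noise-free data with an extra rise in V(O2) above 150 W.
power = [25.0 * i for i in range(1, 13)]
vo2 = [0.3 + 0.010 * w + 0.004 * max(0.0, w - 150.0) for w in power]
change_point, sse, coef = fit_two_phase(power, vo2, [100.0, 125.0, 150.0, 175.0, 200.0])
```

    Scanning a grid of candidates keeps each inner fit linear, so only ordinary least squares is needed; a finer grid or a continuous optimizer would be the natural refinement.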

  3. Modeling of chemical exergy of agricultural biomass using improved general regression neural network

    International Nuclear Information System (INIS)

    Huang, Y.W.; Chen, M.Q.; Li, Y.; Guo, J.

    2016-01-01

    A comprehensive evaluation of the energy potential contained in agricultural biomass is a vital step for energy utilization of agricultural biomass. The chemical exergy of typical agricultural biomass was evaluated based on the second law of thermodynamics. The chemical exergy was significantly influenced by C and O elements rather than H element. The standard entropy of the samples was also examined based on their element compositions. Two predictive models of the chemical exergy were developed: a general regression neural network model based upon the element composition, and a linear model based upon the high heat value. An auto-refinement algorithm was first developed to improve the performance of the regression neural network model. The developed general regression neural network model with K-fold cross-validation had a better ability for predicting the chemical exergy than the linear model, with lower prediction errors (±1.5%). - Highlights: • Chemical exergies of agricultural biomass were evaluated based upon fifty samples. • Values for the standard entropy of agricultural biomass samples were calculated. • A linear relationship between chemical exergy and HHV of samples was detected. • An improved GRNN prediction model for the chemical exergy of biomass was developed.

  4. Dose-Response Relationship between Dietary Magnesium Intake and Risk of Type 2 Diabetes Mellitus: A Systematic Review and Meta-Regression Analysis of Prospective Cohort Studies

    Directory of Open Access Journals (Sweden)

    Xin Fang

    2016-11-01

    The epidemiological evidence for a dose-response relationship between magnesium intake and risk of type 2 diabetes mellitus (T2D) is sparse. The aim of the study was to summarize the evidence for the association of dietary magnesium intake with risk of T2D and evaluate the dose-response relationship. We conducted a systematic review and meta-analysis of prospective cohort studies that reported dietary magnesium intake and risk of incident T2D. We identified relevant studies by searching major scientific literature databases and grey literature resources from their inception to February 2016. We included cohort studies that provided risk ratios, i.e., relative risks (RRs), odds ratios (ORs) or hazard ratios (HRs), for T2D. Linear dose-response relationships were assessed using random-effects meta-regression. Potential nonlinear associations were evaluated using restricted cubic splines. A total of 25 studies met the eligibility criteria. These studies comprised 637,922 individuals including 26,828 with a T2D diagnosis. Compared with the lowest magnesium consumption group in the population, the risk of T2D was reduced by 17% across all the studies; 19% in women and 16% in men. A statistically significant linear dose-response relationship was found between incremental magnesium intake and T2D risk. After adjusting for age and body mass index, the risk of T2D incidence was reduced by 8%–13% per 100 mg/day increment in dietary magnesium intake. There was no evidence to support a nonlinear dose-response relationship between dietary magnesium intake and T2D risk. The combined data support a role for magnesium in reducing risk of T2D, with a statistically significant linear dose-response pattern within the reference dose range of dietary intake among Asian and US populations. The evidence from Europe and black people is limited and more prospective studies are needed for the two subgroups.

  5. Predicting hyperketonemia by logistic and linear regression using test-day milk and performance variables in early-lactation Holstein and Jersey cows.

    Science.gov (United States)

    Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M

    2018-03-01

    Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). Diagnosis of HYK was at a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK of Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83

  6. A step-by-step guide to non-linear regression analysis of experimental data using a Microsoft Excel spreadsheet.

    Science.gov (United States)

    Brown, A M

    2001-06-01

    The objective of this present study was to introduce a simple, easily understood method for carrying out non-linear regression analysis based on user input functions. While it is relatively straightforward to fit data with simple functions such as linear or logarithmic functions, fitting data with more complicated non-linear functions is more difficult. Commercial specialist programmes are available that will carry out this analysis, but these programmes are expensive and are not intuitive to learn. An alternative method described here is to use the SOLVER function of the ubiquitous spreadsheet programme Microsoft Excel, which employs an iterative least squares fitting routine to produce the optimal goodness of fit between data and function. The intent of this paper is to lead the reader through an easily understood step-by-step guide to implementing this method, which can be applied to any function in the form y=f(x), and is well suited to fast, reliable analysis of data in all fields of biology.
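
    The idea of iteratively adjusting parameters until the sum of squared residuals stops decreasing can be sketched outside a spreadsheet as well. The example below uses a simple derivative-free compass search, which is a stand-in chosen for brevity and is not the algorithm used by Excel's SOLVER; the function names, starting values and data are hypothetical.

```python
import math


def sum_squared_residuals(func, params, xs, ys):
    """Goodness-of-fit measure: sum of squared differences between data and model."""
    return sum((y - func(x, params)) ** 2 for x, y in zip(xs, ys))


def fit_by_compass_search(func, start, xs, ys, step=0.5, tol=1e-9, max_iter=100000):
    """Minimize the sum of squared residuals for any user-supplied y = f(x, params).

    Each parameter is nudged up and down by `step`; improvements are kept,
    and the step is halved whenever no nudge helps.
    """
    params = list(start)
    best = sum_squared_residuals(func, params, xs, ys)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                sse = sum_squared_residuals(func, trial, xs, ys)
                if sse < best:
                    params, best, improved = trial, sse, True
        if not improved:
            step /= 2.0
    return params, best


def exponential_decay(x, params):
    """User-input function: amplitude * exp(-rate * x)."""
    amplitude, rate = params
    return amplitude * math.exp(-rate * x)


# Hypothetical noise-free data generated with amplitude 2.0 and rate 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
fitted, sse = fit_by_compass_search(exponential_decay, [1.0, 1.0], xs, ys)
```

    As with any iterative fit, the result depends on the starting values; a poor initial guess can leave the search in a local minimum.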

  7. A non-linear regression analysis program for describing electrophysiological data with multiple functions using Microsoft Excel.

    Science.gov (United States)

    Brown, Angus M

    2006-04-01

    The objective of this present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet Microsoft Excel. SOLVER minimizes the difference between the sum of the squares of the data to be fit and the function(s) describing the data using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, usually requiring specialized expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.

  8. The Multiple Correspondence Analysis Method and Brain Functional Connectivity: Its Application to the Study of the Non-linear Relationships of Motor Cortex and Basal Ganglia.

    Science.gov (United States)

    Rodriguez-Sabate, Clara; Morales, Ingrid; Sanchez, Alberto; Rodriguez, Manuel

    2017-01-01

    The complexity of basal ganglia (BG) interactions is often condensed into simple models mainly based on animal data and that present BG in closed-loop cortico-subcortical circuits of excitatory/inhibitory pathways which analyze the incoming cortical data and return the processed information to the cortex. This study was aimed at identifying functional relationships in the BG motor-loop of 24 healthy subjects who provided written, informed consent and whose BOLD-activity was recorded by MRI methods. The analysis of the functional interaction between these centers by correlation techniques and multiple linear regression showed non-linear relationships which cannot be suitably addressed with these methods. The multiple correspondence analysis (MCA), an unsupervised multivariable procedure which can identify non-linear interactions, was used to study the functional connectivity of BG when subjects were at rest. Linear methods showed different functional interactions expected according to current BG models. MCA showed additional functional interactions which were not evident when using linear methods. Seven functional configurations of BG were identified with MCA, two involving the primary motor and somatosensory cortex, one involving the deepest BG (external-internal globus pallidum, subthalamic nucleus and substantia nigra), one with the input-output BG centers (putamen and motor thalamus), two linking the input-output centers with other BG (external pallidum and subthalamic nucleus), and one linking the external pallidum and the substantia nigra. The results provide evidence that the non-linear MCA and linear methods are complementary and are best used in conjunction to more fully understand the nature of functional connectivity of brain centers.

  9. Predicting musically induced emotions from physiological inputs: linear and neural network models.

    Science.gov (United States)

    Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M

    2013-01-01

    Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants: heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.

  10. Evaluation of Multiple Linear Regression-Based Limited Sampling Strategies for Enteric-Coated Mycophenolate Sodium in Adult Kidney Transplant Recipients.

    Science.gov (United States)

    Brooks, Emily K; Tett, Susan E; Isbel, Nicole M; McWhinney, Brett; Staatz, Christine E

    2018-04-01

    Although multiple linear regression-based limited sampling strategies (LSSs) have been published for enteric-coated mycophenolate sodium, none have been evaluated for the prediction of subsequent mycophenolic acid (MPA) exposure. This study aimed to examine the predictive performance of the published LSS for the estimation of future MPA area under the concentration-time curve from 0 to 12 hours (AUC0-12) in renal transplant recipients. Total MPA plasma concentrations were measured in 20 adult renal transplant patients on 2 occasions a week apart. All subjects received concomitant tacrolimus and were approximately 1 month after transplant. Samples were taken at 0, 0.33, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 6, and 8 hours and 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 3, 4, 6, 9, and 12 hours after dose on the first and second sampling occasion, respectively. Predicted MPA AUC0-12 was calculated using 19 published LSSs and data from the first or second sampling occasion for each patient and compared with the second occasion full MPA AUC0-12 calculated using the linear trapezoidal rule. Bias (median percentage prediction error) and imprecision (median absolute prediction error) were determined. Median percentage prediction error and median absolute prediction error for the prediction of full MPA AUC0-12 were multiple linear regression-based LSS was not possible without concentrations up to at least 8 hours after the dose.
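
    The reference AUC and the two performance measures named above can be sketched in a few lines. The concentration-time points and the predicted values are hypothetical, the function names are not from the study, and expressing imprecision as the median of the absolute percentage errors is an assumption about how the abstract's "median absolute prediction error" is defined.

```python
from statistics import median


def auc_linear_trapezoid(times, concentrations):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two paired time/concentration points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        area += dt * (concentrations[i] + concentrations[i - 1]) / 2.0
    return area


def bias_and_imprecision(predicted, observed):
    """Median percentage prediction error and median absolute percentage error."""
    errors = [100.0 * (p - o) / o for p, o in zip(predicted, observed)]
    return median(errors), median([abs(e) for e in errors])


# Hypothetical sampling times (h) and concentrations (mg/L) for one profile.
times = [0.0, 1.0, 2.0, 4.0]
concentrations = [0.0, 4.0, 2.0, 1.0]
full_auc = auc_linear_trapezoid(times, concentrations)

# Hypothetical LSS-predicted versus full AUC values for three patients.
bias, imprecision = bias_and_imprecision([110.0, 90.0, 105.0], [100.0, 100.0, 100.0])
```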

  11. Best linear unbiased prediction of genomic breeding values using a trait-specific marker-derived relationship matrix.

    Directory of Open Access Journals (Sweden)

    Zhe Zhang

    2010-09-01

    With the availability of high density whole-genome single nucleotide polymorphism chips, genomic selection has become a promising method to estimate genetic merit with potentially high accuracy for animal, plant and aquaculture species of economic importance. With markers covering the entire genome, genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, by using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest. In the framework of mixed model equations, a new best linear unbiased prediction (BLUP) method including a trait-specific relationship matrix (TA) was presented and termed TABLUP. The TA matrix was constructed on the basis of marker genotypes and their weights in relation to the trait of interest. A simulation study with 1,000 individuals as the training population and five successive generations as candidate population was carried out to validate the proposed method. The proposed TABLUP method outperformed the ridge regression BLUP (RRBLUP) and BLUP with realized relationship matrix (GBLUP). It performed slightly worse than BayesB with an accuracy of 0.79 in the standard scenario. The proposed TABLUP method is an improvement of the RRBLUP and GBLUP method. It might be equivalent to the BayesB method but it has additional benefits like the calculation of accuracies for individual breeding values. The results also showed that the TA-matrix performs better in predicting ability than the classical numerator relationship matrix and the realized relationship matrix which are derived solely from pedigree or markers without regard to the trait. This is because the TA-matrix not only accounts for the Mendelian sampling term, but also puts greater emphasis on those markers that explain more of the genetic variance in the trait.

  12. Combined genetic algorithm and multiple linear regression (GA-MLR) optimizer: Application to multi-exponential fluorescence decay surface.

    Science.gov (United States)

    Fisz, Jacek J

    2006-12-07

    The optimization approach based on the genetic algorithm (GA) combined with the multiple linear regression (MLR) method is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of the recently introduced GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to χ², obtained from the Taylor series expansion of χ², is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi-linear
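
    The split between nonlinear and linear parameters can be illustrated on a biexponential decay, where the two lifetimes are nonlinear and the two amplitudes are linear. In the sketch below a plain grid search over candidate lifetimes stands in for the genetic algorithm, purely to keep the example short; the function names, candidate lifetimes and data are hypothetical.

```python
import math
from itertools import combinations


def solve_amplitudes(ts, ys, tau1, tau2):
    """Linear step: least-squares amplitudes for fixed lifetimes (2x2 normal equations)."""
    b1 = [math.exp(-t / tau1) for t in ts]
    b2 = [math.exp(-t / tau2) for t in ts]
    s11 = sum(u * u for u in b1)
    s22 = sum(v * v for v in b2)
    s12 = sum(u * v for u, v in zip(b1, b2))
    r1 = sum(u * y for u, y in zip(b1, ys))
    r2 = sum(v * y for v, y in zip(b2, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        return None  # basis functions are numerically indistinguishable
    c1 = (r1 * s22 - r2 * s12) / det
    c2 = (s11 * r2 - s12 * r1) / det
    sse = sum((y - c1 * u - c2 * v) ** 2 for u, v, y in zip(b1, b2, ys))
    return c1, c2, sse


def fit_biexponential(ts, ys, tau_candidates):
    """Outer step: search the nonlinear lifetimes; amplitudes are never searched.

    Returns (tau1, tau2, c1, c2, sse) for the best pair of candidate lifetimes.
    """
    best = None
    for tau1, tau2 in combinations(sorted(tau_candidates), 2):
        result = solve_amplitudes(ts, ys, tau1, tau2)
        if result is None:
            continue
        c1, c2, sse = result
        if best is None or sse < best[4]:
            best = (tau1, tau2, c1, c2, sse)
    return best


# Hypothetical noise-free decay: 3*exp(-t/1) + 1*exp(-t/4).
ts = [0.5 * i for i in range(20)]
ys = [3.0 * math.exp(-t / 1.0) + 1.0 * math.exp(-t / 4.0) for t in ts]
best_fit = fit_biexponential(ts, ys, [0.5, 1.0, 2.0, 4.0, 8.0])
```

    Because the amplitudes are recomputed exactly for every trial pair of lifetimes, the outer search only has to explore two dimensions instead of four, which is the simplification the abstract attributes to the approach.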

  13. Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.

    Science.gov (United States)

    Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T

    2016-12-01

    We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmol/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  14. Time series linear regression of half-hourly radon levels in a residence

    International Nuclear Information System (INIS)

    Hull, D.A.

    1990-01-01

    This paper uses time series linear regression modelling to assess the impact of temperature and pressure differences on the radon measured in the basement and in the basement drain of a research house in the Princeton area of New Jersey. The models examine half-hour averages of several climate and house parameters for several periods of up to 11 days. The drain radon concentrations follow a strong diurnal pattern that shifts 12 hours in phase between the summer and the fall seasons. This shift can be linked both to the change in temperature differences between seasons and to an experiment which involved sealing the connection between the drain and the basement. We have found that both the basement and the drain radon concentrations are correlated with basement-outdoor and soil-outdoor temperature differences (the coefficient of determination varies between 0.6 and 0.8). The statistical models for the summer periods clearly describe a physical system where the basement drain pumps radon in during the night and sucks radon out during the day.

  15. An Ionospheric Index Model based on Linear Regression and Neural Network Approaches

    Science.gov (United States)

    Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John

    2017-04-01

    The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the presence of electrons and ions within the region. To optimise the use of long distance HF communications, it is important to understand the drivers of ionospheric storms and accurately predict the propagation conditions, especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for HF communication users. The model will provide a valuable tool to measure the complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of the ionospheric storm-time index is based on data from a single ionosonde station at Grahamstown (33.3°S, 26.5°E), South Africa. Critical frequency of the F2 layer (foF2) measurements for the period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk, validation results for low, medium and high solar activity periods will be discussed to demonstrate the model's performance.

  16. Nonlinear regression models applied to groups of garlic accessions

    OpenAIRE

    Reis, Renata M; Cecon, Paulo R; Puiatti, Mário; Finger, Fernando L; Nascimento, Moysés; Silva, Fabyano F; Carneiro, Antônio PS; Silva, Anderson R

    2014-01-01

    The main objective of this study was to compare nonlinear regression models able to describe the dry mass accumulation of different parts of the garlic plant over time (60, 90, 120 and 150 days after planting). A further aim was to identify accessions that are similar with respect to the evaluated traits by means of cluster analysis. Twenty garlic accessions belonging to the Vegetable Germplasm Bank of the Universidade Federal de Viçosa (BGH/UFV) were used. The mass content ...

  17. Relationships between otolith size and fish length in some mesopelagic teleosts (Myctophidae, Paralepididae, Phosichthyidae and Stomiidae).

    Science.gov (United States)

    Battaglia, P; Malara, D; Ammendolia, G; Romeo, T; Andaloro, F

    2015-09-01

    Length-mass relationships and linear regressions are given for otolith size (length and height) and standard length (LS) of certain mesopelagic fishes (Myctophidae, Paralepididae, Phosichthyidae and Stomiidae) living in the central Mediterranean Sea. The length-mass relationship showed isometric growth in six species, whereas linear regressions of LS and otolith size fit the data well for all species. These equations represent a useful tool for dietary studies on Mediterranean marine predators. © 2015 The Fisheries Society of the British Isles.

  18. Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

    DEFF Research Database (Denmark)

    Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf

    2017-01-01

    Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis as they often contain very precise information and a large number of observations. But there is evidence that some variables can be subject to severe misclassification or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but such models are rarely used in practice. To close this gap, a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 m observations from Germany. It is shown that estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate...

  19. hMuLab: A Biomedical Hybrid MUlti-LABel Classifier Based on Multiple Linear Regression.

    Science.gov (United States)

    Wang, Pu; Ge, Ruiquan; Xiao, Xuan; Zhou, Manli; Zhou, Fengfeng

    2017-01-01

    Many biomedical classification problems are multi-label by nature, e.g., a gene involved in a variety of functions or a patient with multiple diseases. The majority of existing classification algorithms assume that each sample has only one class label, and multi-label classification remains a challenge for biomedical researchers. This study proposes a novel multi-label learning algorithm, hMuLab, which integrates both feature-based and neighbor-based similarity scores. The multiple linear regression modeling techniques make hMuLab capable of producing multiple label assignments for a query sample. Comparison results over six commonly used multi-label performance measurements suggest that hMuLab performs accurately and stably on the biomedical datasets, and may serve as a complement to the existing literature.

  20. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  1. Competition Experiments as a Means of Evaluating Linear Free Energy Relationships

    Science.gov (United States)

    Mullins, Richard J.; Vedernikov, Andrei; Viswanathan, Rajesh

    2004-01-01

    The use of competition experiments as a means of evaluating linear free energy relationships in the undergraduate teaching laboratory is reported. Competition experiments proved to be a reliable method for constructing Hammett plots with good correlation, providing great flexibility with regard to the compounds and reactions that…

  2. Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

    Science.gov (United States)

    Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing

    2014-01-01

    Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas. PMID:24936948

  3. Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

    Directory of Open Access Journals (Sweden)

    Wen-Yuan Chen

    2014-06-01

    Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.

  4. Non-Linear Dose Response Relationships in Biology, Toxicology, and Medicine (June 8-10, 2004). Final Report

    International Nuclear Information System (INIS)

    Calabrese, Edward J.

    2004-01-01

    The conference attracts approximately 500 scientists researching non-linear low dose effects. These scientists represent a wide range of biological/medical fields and technical disciplines. Biphasic dose responses are frequently reported in each of these areas, but the similarity of dose response relationships across disciplines is very rarely appreciated and exploited. Bringing together scientists of such diverse backgrounds who are working on the common area of non-linear dose response relationships will enhance our understanding of the occurrence, origin, mechanism, significance and practical applications of such dose response relationships.

  5. Introduction to generalized linear models

    CERN Document Server

    Dobson, Annette J

    2008-01-01

    Introduction: Background; Scope; Notation; Distributions Related to the Normal Distribution; Quadratic Forms; Estimation. Model Fitting: Introduction; Examples; Some Principles of Statistical Modeling; Notation and Coding for Explanatory Variables. Exponential Family and Generalized Linear Models: Introduction; Exponential Family of Distributions; Properties of Distributions in the Exponential Family; Generalized Linear Models; Examples. Estimation: Introduction; Example: Failure Times for Pressure Vessels; Maximum Likelihood Estimation; Poisson Regression Example. Inference: Introduction; Sampling Distribution for Score Statistics; Taylor Series Approximations; Sampling Distribution for MLEs; Log-Likelihood Ratio Statistic; Sampling Distribution for the Deviance; Hypothesis Testing. Normal Linear Models: Introduction; Basic Results; Multiple Linear Regression; Analysis of Variance; Analysis of Covariance; General Linear Models. Binary Variables and Logistic Regression: Probability Distributions ...

  6. Linear regression models for quantitative assessment of left ...

    African Journals Online (AJOL)

    Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...

  7. Scale of association: hierarchical linear models and the measurement of ecological systems

    Science.gov (United States)

    Sean M. McMahon; Jeffrey M. Diez

    2007-01-01

    A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance-covariance parameters in hierarchically structured...

  8. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan

    2010-09-28

    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  9. Comparing parametric and nonparametric regression methods for panel data

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs. The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...

  10. Using multiple linear regression techniques to quantify carbon ...

    African Journals Online (AJOL)

    Fallow ecosystems provide a significant carbon stock that can be quantified for inclusion in the accounts of global carbon budgets. Process and statistical models of productivity, though useful, are often technically rigid as the conditions for their application are not easy to satisfy. Multiple regression techniques have been ...

  11. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study.

    Science.gov (United States)

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf

    2015-10-01

    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. © The Author 2014. Published by Oxford University Press.

  12. Direct integral linear least square regression method for kinetic evaluation of hepatobiliary scintigraphy

    International Nuclear Information System (INIS)

    Shuke, Noriyuki

    1991-01-01

    In hepatobiliary scintigraphy, kinetic model analysis, which provides kinetic parameters such as hepatic extraction or excretion rate, has been used for quantitative evaluation of liver function. In this analysis, unknown model parameters are usually determined using the nonlinear least square regression method (NLS method), which requires iterative calculation and initial estimates for the unknown parameters. As a simple alternative to the NLS method, the direct integral linear least square regression method (DILS method), which can determine model parameters by a simple calculation without initial estimates, is proposed, and its applicability to the analysis of hepatobiliary scintigraphy was tested. To see whether the DILS method could determine model parameters as well as the NLS method, and to determine an appropriate weight for the DILS method, simulated theoretical data based on prefixed parameters were fitted to a 1 compartment model using both the DILS method with various weightings and the NLS method. The parameter values obtained were then compared with the prefixed values that were used for data generation. The effect of various weights on the error of the parameter estimates was examined, and the inverse of time was found to be the weight that minimised the error. With this weight, the DILS method gave parameter values close to those obtained by the NLS method, and both sets of parameter values were very close to the prefixed values. With appropriate weighting, the DILS method could provide reliable parameter estimates that are relatively insensitive to data noise. In conclusion, the DILS method could be used as a simple alternative to the NLS method, providing reliable parameter estimates. (author)
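
    The abstract does not state the model equations, so the following is only a minimal sketch of the general idea behind a direct integral fit, under an assumed mono-exponential washout dy/dt = -k*y. Integrating both sides gives y(t) - y(0) = -k * (integral of y from 0 to t), which is linear in k and can be solved in one step without an initial estimate. The function names, the trapezoidal integration and the synthetic data are all hypothetical, and the inverse-time weighting mentioned in the abstract is not implemented here.

```python
import math

def cumulative_trapezoid(ts, ys):
    """Running trapezoidal integral of y from ts[0] up to each ts[i]."""
    total, out = 0.0, [0.0]
    for i in range(1, len(ts)):
        total += 0.5 * (ys[i] + ys[i - 1]) * (ts[i] - ts[i - 1])
        out.append(total)
    return out

def estimate_rate(ts, ys):
    """Estimate k in dy/dt = -k*y from the integrated form
    y(t) - y(0) = -k * integral_0^t y(s) ds,
    using unweighted least squares through the origin."""
    integrals = cumulative_trapezoid(ts, ys)
    num = sum(i * (y - ys[0]) for i, y in zip(integrals, ys))
    den = sum(i * i for i in integrals)
    return -num / den

# Noise-free synthetic curve with an assumed (hypothetical) rate constant.
true_k = 0.3
ts = [0.1 * i for i in range(101)]
ys = [math.exp(-true_k * t) for t in ts]
k_hat = estimate_rate(ts, ys)
```

    On this synthetic curve the only error in k_hat comes from the trapezoidal approximation of the integral; with noisy data a weighting scheme such as the one examined in the abstract would matter.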

  13. Estimating leaf photosynthetic pigments information by stepwise multiple linear regression analysis and a leaf optical model

    Science.gov (United States)

    Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei

    2014-10-01

    Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and it has difficulty capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. the PROSPECT model) was employed to simulate leaf reflectance at wavelengths from 400 to 780 nm at 1 nm intervals, and these values were then treated as the data from remote sensing observations. Simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were each taken as the target of a regression model. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures; 70% of these samples were used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log(1/r)) were each used to build regression models. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R2) larger than 0.8 when 6 wavebands were selected to build the SMLR model. The largest values of R2 for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematical transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate chlorophyll, carotenoids and their ratio based on a statistical model with leaf reflectance data.
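
    For readers unfamiliar with the adjusted coefficient of determination reported above, the short sketch below shows the standard adjustment formula and the sample counts implied by a 70/30 split of 4000 samples. The helper names and the raw R2 value of 0.90 are hypothetical illustrations, not figures from the study.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

def split_counts(n_total, train_fraction):
    """Number of training and validation samples for a given split."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train

# 4000 simulated samples split 70/30, as described in the abstract.
n_train, n_valid = split_counts(4000, 0.70)

# Hypothetical raw R^2 of 0.90 with 6 selected wavebands as predictors.
example = adjusted_r2(0.90, n_train, 6)
```

    With thousands of samples and only six predictors the adjustment is very small, so adjusted and raw R2 would be nearly identical in a setting like this one.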

  14. Assessment of Brown Bear's (Ursus arctos syriacus) Winter Habitat Using Geographically Weighted Regression and Generalized Linear Model in South of Iran

    Directory of Open Access Journals (Sweden)

    A. A. Zarei

    2016-03-01

    Winter dens are one of the important components of brown bear's (Ursus arctos syriacus) habitat, affecting their reproduction and survival. Therefore identification of factors affecting the habitat selection and suitable denning areas in the conservation of our largest carnivore is necessary. We used Geographically Weighted Logistic Regression (GWLR) and Generalized Linear Model (GLM) for modeling suitability of denning habitat in Kouhkhom region in Fars province. In the present research, 20 dens (presence locations) and 20 caves where signs of bear were not found (absence locations) were used as dependent variables and six environmental factors were used for each location as independent variables. The results of GLM showed that variables of distance to settlements, altitude, and distance to water were the most important parameters affecting suitability of the brown bear's denning habitat. The results of GWLR showed the significant local variations in the relationship between occurrence of brown bear dens and the variable of distance to settlements. Based on the results of both models, suitable habitats for denning of the species are impassable areas in the mountains and inaccessible for humans.

  15. Standards for Standardized Logistic Regression Coefficients

    Science.gov (United States)

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  16. Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis.

    Science.gov (United States)

    Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X

    2016-09-01

    The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.

  17. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
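
    As a small illustration of how an odds ratio falls out of a logistic regression, the sketch below fits a model with a single binary predictor by Newton-Raphson and exponentiates the slope. The data (30 of 100 exposed subjects and 10 of 100 unexposed subjects with the event) and the function name are hypothetical; with several explanatory variables the same idea applies, but each coefficient is then adjusted for the others.

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x for one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of X'WX (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical 2x2 table: exposed 30/100 events, unexposed 10/100 events.
xs = [1] * 100 + [0] * 100
ys = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90
b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)
```

    For a single binary predictor the fitted odds ratio coincides with the cross-product ratio of the 2x2 table, here (30 * 90) / (70 * 10), which is a convenient sanity check.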

  18. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
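
    The objective that quantile regression minimises is the check (or pinball) loss; the linear optimization formulation arises because each residual can be split into a positive and a negative part with different linear costs. The sketch below is not the time-adaptive simplex algorithm of the paper, only a hypothetical illustration of the loss itself: it finds the constant that minimises the summed loss by searching over the data points.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def best_constant(ys, tau):
    """Constant q minimising the summed pinball loss, searched over the data values."""
    candidates = sorted(set(ys))
    return min(candidates, key=lambda q: sum(pinball_loss(y - q, tau) for y in ys))

# Small made-up sample with an odd number of points, so the median is unique.
ys = [3, 1, 4, 1, 5, 9, 2, 6, 5]
median_fit = best_constant(ys, 0.5)
```

    For tau = 0.5 the loss is half the absolute error, so the minimiser is the sample median; other values of tau tilt the penalty and move the minimiser to the corresponding sample quantile.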

  19. Relationships between street characteristics and perceived attractiveness for walking reported by elderly people

    NARCIS (Netherlands)

    Borst, H.C.; Miedema, H.M.E.; Vries, S.I. de; Graham, J.M.A.; Dongen, J.E.F. van

    2008-01-01

    Walking is important for the health of elderly people. Previous studies have found a relationship between neighbourhood characteristics, physical activity and related health aspects. The multivariate linear regression model presented here describes the relationships between the perceived

  20. Application of single-step genomic best linear unbiased prediction with a multiple-lactation random regression test-day model for Japanese Holsteins.

    Science.gov (United States)

    Baba, Toshimi; Gotoh, Yusaku; Yamaguchi, Satoshi; Nakagawa, Satoshi; Abe, Hayato; Masuda, Yutaka; Kawahara, Takayoshi

    2017-08-01

    This study aimed to evaluate the validation reliability of single-step genomic best linear unbiased prediction (ssGBLUP) with a multiple-lactation random regression test-day model and to investigate the effect of adding genotyped cows on the reliability. Two data sets of test-day records from the first three lactations were used: full data from February 1975 to December 2015 (60 850 534 records from 2 853 810 cows) and reduced data cut off in 2011 (53 091 066 records from 2 502 307 cows). We used marker genotypes of 4480 bulls and 608 cows. Genomic enhanced breeding values (GEBV) of 305-day milk yield in all the lactations were estimated for at least 535 young bulls using two marker data sets: bull genotypes only, and both bull and cow genotypes. The realized reliability (R2) from linear regression analysis was used as an indicator of validation reliability. Using only genotyped bulls, R2 ranged from 0.41 to 0.46 and was always higher than that of parent averages. Very similar R2 values were observed when genotyped cows were added. An application of ssGBLUP to a multiple-lactation random regression model is feasible, and adding a limited number of genotyped cows has no significant effect on the reliability of GEBV for genotyped bulls. © 2016 Japanese Society of Animal Science.

  1. The non-linear link between electricity consumption and temperature in Europe: A threshold panel approach

    Energy Technology Data Exchange (ETDEWEB)

    Bessec, Marie [CGEMP, Universite Paris-Dauphine, Place du Marechal de Lattre de Tassigny Paris (France); Fouquau, Julien [LEO, Universite d' Orleans, Faculte de Droit, d' Economie et de Gestion, Rue de Blois, BP 6739, 45067 Orleans Cedex 2 (France)

    2008-09-15

    This paper investigates the relationship between electricity demand and temperature in the European Union. We address this issue by means of a panel threshold regression model on 15 European countries over the last two decades. Our results confirm the non-linearity of the link between electricity consumption and temperature found in more limited geographical areas in previous studies. By distinguishing between North and South countries, we also find that this non-linear pattern is more pronounced in the warm countries. Finally, rolling regressions show that the sensitivity of electricity consumption to temperature in summer has increased in the recent period. (author)
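
    To make the idea of a threshold regression concrete, the sketch below fits a two-regime linear model to a single series by grid search over candidate thresholds, keeping the threshold with the smallest total sum of squared residuals. This is a deliberately simplified, hypothetical illustration: it ignores the panel structure and country effects of the study, and the V-shaped demand curve is made-up data.

```python
def ols(xs, ys):
    """Return (intercept, slope, sum of squared residuals) of a simple OLS fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, ssr

def fit_threshold(xs, ys, candidates):
    """Grid search for the threshold whose two-regime fit has the smallest total SSR."""
    best = None
    for c in candidates:
        low = [(x, y) for x, y in zip(xs, ys) if x <= c]
        high = [(x, y) for x, y in zip(xs, ys) if x > c]
        if len(low) < 3 or len(high) < 3:
            continue  # too few points to fit one of the regimes
        fit_low = ols(*zip(*low))
        fit_high = ols(*zip(*high))
        total = fit_low[2] + fit_high[2]
        if best is None or total < best[0]:
            best = (total, c, fit_low[1], fit_high[1])
    return best

# Hypothetical demand that falls with temperature up to 16 degrees, then rises.
temps = list(range(0, 31))
demand = [100 + 2 * (16 - t) if t <= 16 else 100 + (t - 16) for t in temps]
ssr, threshold, slope_cold, slope_warm = fit_threshold(temps, demand, range(5, 26))
```

    Because the kink point lies on both line segments in this noise-free example, thresholds of 15 and 16 fit equally well; with real data the grid search would usually single out one value.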

  2. Assessing the Liquidity of Firms: Robust Neural Network Regression as an Alternative to the Current Ratio

    Science.gov (United States)

    de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia

    Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative data base from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.

  3. STREAMFLOW AND WATER QUALITY REGRESSION MODELING ...

    African Journals Online (AJOL)

    ... downstream Obigbo station show: consistent time-trends in degree of contamination; linear and non-linear relationships for water quality models against total dissolved solids (TDS), total suspended sediment (TSS), chloride, pH and sulphate; and non-linear relationship for streamflow and water quality transport models.

  4. A primer on linear models

    CERN Document Server

    Monahan, John F

    2008-01-01

    Preface. Examples of the General Linear Model: Introduction; One-Sample Problem; Simple Linear Regression; Multiple Regression; One-Way ANOVA; First Discussion; The Two-Way Nested Model; Two-Way Crossed Model; Analysis of Covariance; Autoregression; Discussion. The Linear Least Squares Problem: The Normal Equations; The Geometry of Least Squares; Reparameterization; Gram-Schmidt Orthonormalization. Estimability and Least Squares Estimators: Assumptions for the Linear Mean Model; Confounding, Identifiability, and Estimability; Estimability and Least Squares Estimators; F

  5. Vectors, a tool in statistical regression theory

    NARCIS (Netherlands)

    Corsten, L.C.A.

    1958-01-01

    Using linear algebra this thesis developed linear regression analysis including analysis of variance, covariance analysis, special experimental designs, linear and fertility adjustments, analysis of experiments at different places and times. The determination of the orthogonal projection, yielding

  6. Estimating linear effects in ANOVA designs: the easy way.

    Science.gov (United States)

    Pinhas, Michal; Tzelgov, Joseph; Ganor-Stern, Dana

    2012-09-01

    Research in cognitive science has documented numerous phenomena that are approximated by linear relationships. In the domain of numerical cognition, the use of linear regression for estimating linear effects (e.g., distance and SNARC effects) became common following Fias, Brysbaert, Geypens, and d'Ydewalle's (1996) study on the SNARC effect. While their work has become the model for analyzing linear effects in the field, it requires statistical analysis of individual participants and does not provide measures of the proportions of variability accounted for (cf. Lorch & Myers, 1990). In the present methodological note, using both the distance and SNARC effects as examples, we demonstrate how linear effects can be estimated in a simple way within the framework of repeated measures analysis of variance. This method allows for estimating effect sizes in terms of both slope and proportions of variability accounted for. Finally, we show that our method can easily be extended to estimate linear interaction effects, not just linear effects calculated as main effects.
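
    A linear effect across ordered conditions can be summarised by a slope and by the share of between-condition variability that the linear component captures. The sketch below computes both from condition means using centred contrast weights; it assumes equal cell sizes, and the reaction times are hypothetical. It is meant as an illustration of the two quantities named in the abstract, not as the authors' exact procedure.

```python
def linear_trend(levels, means):
    """Slope of the condition means on the level values, and the share of
    between-condition variability captured by the linear component."""
    k = len(levels)
    mx = sum(levels) / k
    my = sum(means) / k
    weights = [x - mx for x in levels]  # centred linear contrast weights
    sww = sum(w * w for w in weights)
    slope = sum(w * m for w, m in zip(weights, means)) / sww
    ss_linear = slope ** 2 * sww
    ss_between = sum((m - my) ** 2 for m in means)
    return slope, ss_linear / ss_between

# Hypothetical mean reaction times (ms) at four numerical distances.
slope, share = linear_trend([1, 2, 3, 4], [520.0, 490.0, 475.0, 455.0])
```

    In this made-up example reaction time drops by 21 ms per unit of distance and the linear component accounts for 98% of the variability among the four condition means.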

  7. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study.

    Science.gov (United States)

    Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J

    2014-08-27

    State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and are thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental to the prediction performance of a scoring function, and that replacing linear regression with machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that, given sufficient training samples, RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for a comparison study. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvent the fixed functional form relating structural features to binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.

  8. A simplified calculation procedure for mass isotopomer distribution analysis (MIDA) based on multiple linear regression.

    Science.gov (United States)

    Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio

    2016-10-01

    We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two ¹³C atoms (¹³C₂-glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of ¹³C₂-glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% ¹³C₂-glycine at 1:1, 1:3 and 3:1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd.

  9. Aortic and Hepatic Contrast Enhancement During Hepatic-Arterial and Portal Venous Phase Computed Tomography Scanning: Multivariate Linear Regression Analysis Using Age, Sex, Total Body Weight, Height, and Cardiac Output.

    Science.gov (United States)

    Masuda, Takanori; Nakaura, Takeshi; Funama, Yoshinori; Higaki, Toru; Kiguchi, Masao; Imada, Naoyuki; Sato, Tomoyasu; Awai, Kazuo

    We evaluated the effect of the age, sex, total body weight (TBW), height (HT) and cardiac output (CO) of patients on aortic and hepatic contrast enhancement during hepatic-arterial phase (HAP) and portal venous phase (PVP) computed tomography (CT) scanning. This prospective study received institutional review board approval; prior informed consent to participate was obtained from all 168 patients. All were examined using our routine protocol; the contrast material was 600 mg/kg iodine. Cardiac output was measured with a portable electrical velocimeter within 5 minutes of starting the CT scan. We calculated contrast enhancement (per gram of iodine: ΔHU/gI) of the abdominal aorta during the HAP and of the liver parenchyma during the PVP. We performed univariate and multivariate linear regression analysis between all patient characteristics and the ΔHU/gI of aortic and liver parenchymal enhancement. Univariate linear regression analysis demonstrated statistically significant correlations between the ΔHU/gI and the age, sex, TBW, HT, and CO (all P linear regression analysis showed that only the TBW and CO were of independent predictive value (P linear regression analysis only the TBW and CO were significantly correlated with aortic and liver parenchymal enhancement; the age, sex, and HT were not. The CO was the only independent factor affecting aortic and liver parenchymal enhancement at hepatic CT when the protocol was adjusted for the TBW.

  10. Water pollution and income relationships: A seemingly unrelated partially linear analysis

    Science.gov (United States)

    Pandit, Mahesh; Paudel, Krishna P.

    2016-10-01

    We used a seemingly unrelated partially linear model (SUPLM) to address a potential correlation between pollutants (nitrogen, phosphorous, dissolved oxygen and mercury) in an environmental Kuznets curve study. Simulation studies show that the SUPLM performs well to address potential correlation among pollutants. We find that the relationship between income and pollution follows an inverted U-shaped curve for nitrogen and dissolved oxygen and a cubic shaped curve for mercury. Model specification tests suggest that a SUPLM is better specified compared to a parametric model to study the income-pollution relationship. Results suggest a need to continually assess policy effectiveness of pollution reduction as income increases.

  11. Predicting musically induced emotions from physiological inputs: Linear and neural network models

    Directory of Open Access Journals (Sweden)

    Frank A. Russo

    2013-08-01

    Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of 'felt' emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants – heart rate, respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a nonlinear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The nonlinear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the nonlinear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.

  12. True phosphorus digestibility and the endogenous phosphorus outputs associated with brown rice for weanling pigs measured by the simple linear regression analysis technique.

    Science.gov (United States)

    Yang, H; Li, A K; Yin, Y L; Li, T J; Wang, Z R; Wu, G; Huang, R L; Kong, X F; Yang, C B; Kang, P; Deng, J; Wang, S X; Tan, B E; Hu, Q; Xing, F F; Wu, X; He, Q H; Yao, K; Liu, Z J; Tang, Z R; Yin, F G; Deng, Z Y; Xie, M Y; Fan, M Z

    2007-03-01

    The objectives of this study were to determine true phosphorus (P) digestibility, degradability of phytate-P complex and the endogenous P outputs associated with brown rice feeding in weanling pigs by using the simple linear regression analysis technique. Six barrows with an average initial body weight of 12.5 kg were fitted with a T-cannula and fed six diets according to a 6 × 6 Latin-square design. Six maize starch-based diets, containing six levels of P at 0.80, 1.36, 1.93, 2.49, 3.04, and 3.61 g/kg per kg dry-matter (DM) intake (DMI), were formulated with brown rice. Each experimental period lasted 10 days. After a 7-day adaptation, all faecal samples were collected on days 8 and 9. Ileal digesta samples were collected for a total of 24 h on day 10. The apparent ileal and faecal P digestibility values of brown rice were affected ( P Linear relationships ( P simple regression analysis technique. There were no differences (P > 0.05) in true P digestibility values (57.7 ± 5.4 v. 58.2 ± 5.9%), phytate P degradability (76.4 ± 6.7 v. 79.0 ± 4.4%) and the endogenous P outputs (0.812 ± 0.096 v. 0.725 ± 0.083 g/kg DMI) between the ileal and the faecal levels. The endogenous faecal P output represented 14 and 25% of the National Research Council (1998) recommended daily total and available P requirements in the weanling pig, respectively. About 58% of the total P in brown rice could be digested and absorbed by the weanling pig. Our results suggest that the large intestine of the weanling pigs does not play a significant role in the digestion of P in brown rice. Diet formulation on the basis of total or apparent P digestibility with brown rice may lead to P overfeeding and excessive P excretion in pigs.
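
    The regression technique named in this record can be sketched as follows. The abstract does not state the model, so the form used here is an assumption: apparent digestible P is regressed on dietary P, the slope is read as true digestibility, and the negative intercept is read as endogenous P output. The dietary P levels come from the abstract; the response values are synthetic, generated from an assumed line, and are not the study's data.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Dietary P levels quoted in the abstract (g/kg DMI).
dietary_p = [0.80, 1.36, 1.93, 2.49, 3.04, 3.61]

# SYNTHETIC responses: built from an assumed slope and intercept purely to
# show how the two quantities are read off the fitted line.
ASSUMED_TRUE_DIGESTIBILITY = 0.58   # hypothetical fraction
ASSUMED_ENDOGENOUS_P = 0.80         # hypothetical, g/kg DMI
apparent_digestible_p = [
    ASSUMED_TRUE_DIGESTIBILITY * p - ASSUMED_ENDOGENOUS_P for p in dietary_p
]

intercept, slope = simple_linear_regression(dietary_p, apparent_digestible_p)
true_digestibility_pct = 100 * slope   # slope read as true digestibility
endogenous_p = -intercept              # negative intercept read as endogenous loss
print(f"true digestibility: {true_digestibility_pct:.1f}%")
print(f"endogenous P output: {endogenous_p:.3f} g/kg DMI")
```

    With real data the points scatter around the line, and the standard errors of slope and intercept would carry the uncertainty reported as "±" in the abstract.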

  13. Least-Squares Linear Regression and Schrodinger's Cat: Perspectives on the Analysis of Regression Residuals.

    Science.gov (United States)

    Hecht, Jeffrey B.

    The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…

  14. Electricity consumption forecasting in Italy using linear regression models

    Energy Technology Data Exchange (ETDEWEB)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio [DIAM, Seconda Universita degli Studi di Napoli, Via Roma 29, 81031 Aversa (CE) (Italy)

    2009-09-15

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  15. Electricity consumption forecasting in Italy using linear regression models

    International Nuclear Information System (INIS)

    Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio

    2009-01-01

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  16. Possible factors determining the non-linearity in the VO2-power output relationship in humans: theoretical studies.

    Science.gov (United States)

    Korzeniewski, Bernard; Zoladz, Jerzy A

    2003-08-01

    At low power output exercise (below lactate threshold), the oxygen uptake increases linearly with power output, but at high power output exercise (above lactate threshold) some additional oxygen consumption causes a non-linearity in the overall VO(2) (oxygen uptake rate)-power output relationship. The functional significance of this phenomenon for human exercise tolerance is very important, but the mechanisms underlying it remain unknown. In the present work, a computer model of oxidative phosphorylation in intact skeletal muscle developed previously is used to examine the background of this relationship in different modes of exercise. Our simulations demonstrate that the non-linearity in the VO(2)-power output relationship and the difference in the magnitude of this non-linearity between incremental exercise mode and square-wave exercise mode (constant power output exercise) can be generated by introducing into the model some hypothetical factor F (group of associated factors) that accumulate(s) in time during exercise. The performed computer simulations, based on this assumption, give proper time courses of changes in VO(2) and [PCr] after an onset of work of different intensities, including the slow component in VO(2), well matching the experimental results. Moreover, if it is assumed that the exercise terminates because of fatigue when the amount/intensity of F exceed some threshold value, the model allows the generation of a proper shape of the well-known power-duration curve. This fact suggests that the phenomenon of the non-linearity of the VO(2)-power output relationship and the magnitude of this non-linearity in different modes of exercise is determined by some factor(s) responsible for muscle fatigue.

  17. Evaluation of force-velocity and power-velocity relationship of arm muscles.

    Science.gov (United States)

    Sreckovic, Sreten; Cuk, Ivan; Djuric, Sasa; Nedeljkovic, Aleksandar; Mirkov, Dragan; Jaric, Slobodan

    2015-08-01

    A number of recent studies have revealed an approximately linear force-velocity (F-V) and, consequently, a parabolic power-velocity (P-V) relationship of multi-joint tasks. However, the measurement characteristics of their parameters have been neglected, particularly those regarding arm muscles, which could be a problem for using the linear F-V model in both research and routine testing. Therefore, the aims of the present study were to evaluate the strength, shape, reliability, and concurrent validity of the F-V relationship of arm muscles. Twelve healthy participants performed maximum bench press throws against loads ranging from 20 to 70% of their maximum strength, and a linear regression model was applied to the obtained range of F and V data. One-repetition maximum bench press and medicine ball throw tests were also conducted. The observed individual F-V relationships were exceptionally strong (r = 0.96-0.99; all P stronger relationships. The reliability of parameters obtained from the linear F-V regressions proved to be mainly high (ICC > 0.80), while their concurrent validity regarding directly measured F, P, and V ranged from high (for maximum F) to medium-to-low (for maximum P and V). The findings add to the evidence that the linear F-V and, consequently, parabolic P-V models could be used to study the mechanical properties of muscular systems, as well as to design a relatively simple, reliable, and ecologically valid routine test of the muscle ability of force, power, and velocity production.
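
    The step from a linear F-V relationship to a parabolic P-V relationship can be written out briefly. The symbols F_0 (force intercept) and V_0 (velocity intercept) are notation assumed here for the two regression intercepts; the abstract does not name them.

```latex
\begin{align}
  F(V) &= F_0 - \frac{F_0}{V_0}\,V
       && \text{(fitted linear F-V line)} \\
  P(V) &= F(V)\,V = F_0 V - \frac{F_0}{V_0}\,V^2
       && \text{(power is a downward-opening parabola in } V\text{)} \\
  \frac{dP}{dV} &= F_0 - \frac{2F_0}{V_0}\,V = 0
       \;\Longrightarrow\; V = \frac{V_0}{2} \\
  P_{\max} &= P\!\left(\frac{V_0}{2}\right) = \frac{F_0 V_0}{4}
\end{align}
```

    Under this model, maximum power follows from the two intercepts alone, which is presumably why the reliability of the fitted F-V parameters matters for routine testing.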

  18. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  19. Under which climate and soil conditions the plant productivity-precipitation relationship is linear or nonlinear?

    Science.gov (United States)

    Ye, Jian-Sheng; Pei, Jiu-Ying; Fang, Chao

    2018-03-01

    Understanding under which climate and soil conditions the plant productivity-precipitation relationship is linear or nonlinear is useful for accurately predicting the response of ecosystem function to global environmental change. Using long-term (2000-2016) net primary productivity (NPP)-precipitation datasets derived from satellite observations, we identify >5600 pixels in the Northern Hemisphere landmass that fit either linear or nonlinear temporal NPP-precipitation relationships. Differences in climate (precipitation, radiation, ratio of actual to potential evapotranspiration, temperature) and soil factors (nitrogen, phosphorous, organic carbon, field capacity) between the linear and nonlinear types are evaluated. Our analysis shows that both linear and nonlinear types exhibit similar interannual precipitation variabilities and occurrences of extreme precipitation. Permutational multivariate analysis of variance suggests that linear and nonlinear types differ significantly with regard to radiation, ratio of actual to potential evapotranspiration, and soil factors. The nonlinear type possesses lower radiation and/or fewer soil nutrients than the linear type, thereby suggesting that the nonlinear type features a higher degree of limitation from resources other than precipitation. This study suggests several factors limiting the responses of plant productivity to changes in precipitation, thus causing a nonlinear NPP-precipitation pattern. Precipitation manipulation and modeling experiments should be combined with changes in other climate and soil factors to better predict the response of plant productivity under future climate. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Association between resting-state brain network topological organization and creative ability: Evidence from a multiple linear regression model.

    Science.gov (United States)

    Jiao, Bingqing; Zhang, Delong; Liang, Aiying; Liang, Bishan; Wang, Zengjian; Li, Junchao; Cai, Yuxuan; Gao, Mengxia; Gao, Zhenni; Chang, Song; Huang, Ruiwang; Liu, Ming

    2017-10-01

    Previous studies have indicated a tight linkage between resting-state functional connectivity of the human brain and creative ability. This study aimed to further investigate the association between the topological organization of resting-state brain networks and creativity. Therefore, we acquired resting-state fMRI data from 22 high-creativity participants and 22 low-creativity participants (as determined by their Torrance Tests of Creative Thinking scores). We then constructed functional brain networks for each participant and assessed group differences in network topological properties before exploring the relationships between respective network topological properties and creative ability. We identified an optimized organization of intrinsic brain networks in both groups. However, compared with low-creativity participants, high-creativity participants exhibited increased global efficiency and substantially decreased path length, suggesting increased efficiency of information transmission across brain networks in creative individuals. Using a multiple linear regression model, we further demonstrated that regional functional integration properties (i.e., the betweenness centrality and global efficiency) of brain networks, particularly the default mode network (DMN) and sensorimotor network (SMN), significantly predicted the individual differences in creative ability. Furthermore, the associations between network regional properties and creative performance were creativity-level dependent, where the difference in the resource control component may be important in explaining individual difference in creative performance. These findings provide novel insights into the neural substrate of creativity and may facilitate objective identification of creative ability. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Quantitative structure-retention relationship studies using immobilized artificial membrane chromatography I: amended linear solvation energy relationships with the introduction of a molecular electronic factor.

    Science.gov (United States)

    Li, Jie; Sun, Jin; Cui, Shengmiao; He, Zhonggui

    2006-11-03

    Linear solvation energy relationships (LSERs) amended by the introduction of a molecular electronic factor were employed to establish quantitative structure-retention relationships using immobilized artificial membrane (IAM) chromatography, in particular ionizable solutes. The chromatographic indices, log k(IAM), were determined by HPLC on an IAM.PC.DD2 column for 53 structurally diverse compounds, including neutral, acidic and basic compounds. Unlike neutral compounds, the IAM chromatographic retention of ionizable compounds was affected by their molecular charge state. When the mean net charge per molecule (delta) was introduced into the amended LSER as the sixth variable, the LSER regression coefficient was significantly improved for the test set including ionizable solutes. The delta coefficients of acidic and basic compounds were quite different indicating that the molecular electronic factor had a markedly different impact on the retention of acidic and basic compounds on IAM column. Ionization of acidic compounds containing a carboxylic group tended to impair their retention on IAM, while the ionization of basic compounds did not have such a marked effect. In addition, the extra-interaction with the polar head of phospholipids might cause a certain change in the retention of basic compounds. A comparison of calculated and experimental retention indices suggested that the semi-empirical LSER amended by the addition of a molecular electronic factor was able to reproduce adequately the experimental retention factors of the structurally diverse solutes investigated.

  2. Application of semi-empirical modeling and non-linear regression to unfolding fast neutron spectra from integral reaction rate data

    International Nuclear Information System (INIS)

    Harker, Y.D.

    1976-01-01

    A semi-empirical analytical expression representing a fast reactor neutron spectrum has been developed. This expression was used in a non-linear regression computer routine to obtain from measured multiple foil integral reaction data the neutron spectrum inside the Coupled Fast Reactivity Measurement Facility. In this application six parameters in the analytical expression for neutron spectrum were adjusted in the non-linear fitting process to maximize consistency between calculated and measured integral reaction rates for a set of 15 dosimetry detector foils. In two-thirds of the observations the calculated integral agreed with its respective measured value to within the experimental standard deviation, and in all but one case agreement within two standard deviations was obtained. Based on this quality of fit the estimated 70 to 75 percent confidence intervals for the derived spectrum are 10 to 20 percent for the energy range 100 eV to 1 MeV, 10 to 50 percent for 1 MeV to 10 MeV and 50 to 90 percent for 10 MeV to 18 MeV. The analytical model has demonstrated a flexibility to describe salient features of neutron spectra of the fast reactor type. The use of regression analysis with this model has produced a stable method to derive neutron spectra from a limited amount of integral data

  3. Alternative regression models to assess increase in childhood BMI

    OpenAIRE

    Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M

    2008-01-01

    Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...

  4. Estimating severity of sideways fall using a generic multi linear regression model based on kinematic input variables.

    Science.gov (United States)

    van der Zijden, A M; Groen, B E; Tanck, E; Nienhuis, B; Verdonschot, N; Weerdesteyn, V

    2017-03-21

    Many research groups have studied fall impact mechanics to understand how fall severity can be reduced to prevent hip fractures. Yet, direct impact force measurements with force plates are restricted to a very limited repertoire of experimental falls. The purpose of this study was to develop a generic model for estimating hip impact forces (i.e. fall severity) in in vivo sideways falls without the use of force plates. Twelve experienced judokas performed sideways Martial Arts (MA) and Block ('natural') falls on a force plate, both with and without a mat on top. Data were analyzed to determine the hip impact force and to derive 11 selected (subject-specific and kinematic) variables. Falls from kneeling height were used to perform a stepwise regression procedure to assess the effects of these input variables and build the model. The final model includes four input variables, involving one subject-specific measure and three kinematic variables: maximum upper body deceleration, body mass, shoulder angle at the instant of 'maximum impact' and maximum hip deceleration. The results showed that estimated and measured hip impact forces were linearly related (explained variances ranging from 46 to 63%). Hip impact forces of MA falls onto the mat from a standing position (3650±916N) estimated by the final model were comparable with measured values (3698±689N), even though these data were not used for training the model. In conclusion, a generic linear regression model was developed that enables the assessment of fall severity through kinematic measures of sideways falls, without using force plates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Two-Stage Method Based on Local Polynomial Fitting for a Linear Heteroscedastic Regression Model and Its Application in Economics

    Directory of Open Access Journals (Sweden)

    Liyun Su

    2012-01-01

    We introduce the extension of local polynomial fitting to the linear heteroscedastic regression model. Firstly, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a nonparametric technique, we do not need to know the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we focus on comparison of parameters and reach an optimal fitting. Besides, we verify the asymptotic normality of parameters based on numerical simulations. Finally, this approach is applied to a case in economics, and it indicates that our method is effective in finite-sample situations.

  6. A Quantile Regression Approach to Estimating the Distribution of Anesthetic Procedure Time during Induction.

    Directory of Open Access Journals (Sweden)

    Hsin-Lun Wu

    Although procedure time analyses are important for operating room management, it is not easy to extract useful information from clinical procedure time data. A novel approach was proposed to analyze procedure time during anesthetic induction. A two-step regression analysis was performed to explore influential factors of anesthetic induction time (AIT). Linear regression with stepwise model selection was used to select significant correlates of AIT and then quantile regression was employed to illustrate the dynamic relationships between AIT and selected variables at distinct quantiles. A total of 1,060 patients were analyzed. The first and second-year residents (R1-R2) required longer AIT than the third and fourth-year residents and attending anesthesiologists (p = 0.006). Factors prolonging AIT included American Society of Anesthesiologists physical status ≧ III, arterial, central venous and epidural catheterization, and use of bronchoscopy. Presence of the surgeon before induction would decrease AIT (p < 0.001). Types of surgery also had significant influence on AIT. Quantile regression satisfactorily estimated the extra time needed to complete induction for each influential factor at distinct quantiles. Our analysis of AIT demonstrated the benefit of quantile regression analysis in providing a more comprehensive view of the relationships between procedure time and related factors. This novel two-step regression approach has potential applications to procedure time analysis in operating room management.
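
    As background for the quantile regression step, the following sketch shows the check (pinball) loss that quantile regression minimizes, reduced to the simplest case of a constant prediction. The induction times are hypothetical values chosen for illustration, not data from the study, and the function names are assumptions of this sketch.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau in (0, 1)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def total_loss(constant, values, tau):
    """Summed pinball loss when every value is predicted by one constant."""
    return sum(pinball_loss(v - constant, tau) for v in values)

def best_constant(values, tau):
    """Constant minimizing the pinball loss, searched over the observed values."""
    return min(sorted(values), key=lambda c: total_loss(c, values, tau))

# HYPOTHETICAL induction times in minutes (illustration only).
times = [8, 9, 10, 11, 12, 14, 15, 18, 22, 30, 45]

typical = best_constant(times, 0.5)   # minimizer at tau = 0.5 is the median
slow = best_constant(times, 0.9)      # minimizer at tau = 0.9 is an upper quantile
print(typical, slow)
```

    Quantile regression replaces the constant with a linear function of covariates and minimizes the same loss, which is how a factor's effect can differ between typical and unusually long inductions.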

  7. Regression filter for signal resolution

    International Nuclear Information System (INIS)

    Matthes, W.

    1975-01-01

    The problem considered is that of resolving a measured pulse height spectrum of a material mixture, e.g. gamma ray spectrum, Raman spectrum, into a weighted sum of the spectra of the individual constituents. The model on which the analytical formulation is based is described. The problem reduces to that of a multiple linear regression. A stepwise linear regression procedure was constructed. The efficiency of this method was then tested by transforming the procedure into a computer programme which was used to unfold test spectra obtained by mixing some spectra, from a library of arbitrarily chosen spectra, and adding a noise component. (U.K.)
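
    The reduction to multiple linear regression can be illustrated with a minimal sketch: the measured spectrum is modelled as a weighted sum of library spectra, and the weights are found by ordinary least squares through the normal equations. This is plain least squares, not the stepwise procedure of the report, and the two five-channel library spectra are invented for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def unmix(measured, library):
    """Least-squares weights w minimizing ||measured - sum_k w[k] * library[k]||^2."""
    k = len(library)
    gram = [[sum(p * q for p, q in zip(library[i], library[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * y for p, y in zip(library[i], measured)) for i in range(k)]
    return solve_linear_system(gram, rhs)

# HYPOTHETICAL library of two constituent spectra over five channels.
library = [
    [1.0, 4.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 3.0, 5.0, 1.0],
]
true_weights = [2.0, 0.5]
# Noise-free mixture built from the assumed weights.
measured = [sum(w * s[ch] for w, s in zip(true_weights, library)) for ch in range(5)]

weights = unmix(measured, library)
print(weights)
```

    With added noise the recovered weights would only approximate the true ones, and a stepwise variant would additionally decide which library spectra to keep in the model.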

  8. Forecasting the daily power output of a grid-connected photovoltaic system based on multivariate adaptive regression splines

    International Nuclear Information System (INIS)

    Li, Yanting; He, Yong; Su, Yan; Shu, Lianjie

    2016-01-01

    Highlights: • Suggests a nonparametric model based on MARS for output power prediction. • Compare the MARS model with a wide variety of prediction models. • Show that the MARS model is able to provide an overall good performance in both the training and testing stages. - Abstract: Both linear and nonlinear models have been proposed for forecasting the power output of photovoltaic systems. Linear models are simple to implement but less flexible. Due to the stochastic nature of the power output of PV systems, nonlinear models tend to provide better forecast than linear models. Motivated by this, this paper suggests a fairly simple nonlinear regression model known as multivariate adaptive regression splines (MARS), as an alternative to forecasting of solar power output. The MARS model is a data-driven modeling approach without any assumption about the relationship between the power output and predictors. It maintains simplicity of the classical multiple linear regression (MLR) model while possessing the capability of handling nonlinearity. It is simpler in format than other nonlinear models such as ANN, k-nearest neighbors (KNN), classification and regression tree (CART), and support vector machine (SVM). The MARS model was applied on the daily output of a grid-connected 2.1 kW PV system to provide the 1-day-ahead mean daily forecast of the power output. The comparisons with a wide variety of forecast models show that the MARS model is able to provide reliable forecast performance.

  9. The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

    International Nuclear Information System (INIS)

    Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina

    2009-01-01

    Prediction of the amount of hospital waste produced will be helpful in the storage, transportation and disposal stages of hospital waste management. Based on this fact, two predictor models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation in total and for the sharp, infectious and general waste types. In this study, a 5-fold cross-validation procedure on a database containing a total of 50 hospitals of Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R², were used to evaluate the performance of the models. The MLR, as a conventional model, obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as the more significant parameters. On the other hand, ANNs, a more powerful model that had not previously been introduced for predicting the rate of medical waste generation, showed high performance measure values, especially an R² value of 0.99, confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving, which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.
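
    The evaluation setup named in this abstract (k-fold cross-validation scored with RMSE and R²) can be sketched as below. The helper names are hypothetical, the folds are contiguous rather than shuffled, and no model is fitted here; the abstract does not state how its folds were formed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n records."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# 50 records split into 5 folds: each fold holds out 10 and trains on 40.
folds = list(k_fold_indices(50, 5))
```

    In practice each model would be refitted on every training split and the measures averaged over the held-out folds.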

  10. Dimension Reduction and Discretization in Stochastic Problems by Regression Method

    DEFF Research Database (Denmark)

    Ditlevsen, Ove Dalager

    1996-01-01

    The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given. Keywords: Random fields discretization, Linear regression, Stochastic interpolation, ...

  11. Relationship between Personality Traits and Happiness in Patients with Thalassemia

    Directory of Open Access Journals (Sweden)

    Babollah Bakhshipour

    2014-12-01

    Background: The aim of this study was to determine the relationship between personality traits and happiness in patients with major thalassemia. Materials and Methods: The design of this study was descriptive (correlational study). The target population of this study was all patients under treatment for major thalassemia at the Amirkola thalassemia center in 2011. Among these patients, 150 were sampled using the simple random sampling method and Morgan's table. The data were analyzed by calculating Pearson correlation coefficients and by multiple linear regression analysis. The patients were asked to complete the NEO Five-Factor Inventory (short form) and the Oxford Happiness Inventory. Results: Based on the results, the coefficient of the regression analysis of the NEO personality factors (big five) and happiness was 0.45, which shows a linear relationship between the NEO personality factors and happiness in patients with thalassemia. Thus, there is a statistically significant relationship between personality traits (neuroticism, extroversion, openness, agreeableness, conscientiousness) and happiness. Conclusion: Among personality traits, extroversion, flexibility, agreeableness and conscientiousness had a positive, statistically meaningful relationship with happiness, and patients with lower scores in neuroticism were happier.

  12. Quantification of endocrine disruptors and pesticides in water by gas chromatography-tandem mass spectrometry. Method validation using weighted linear regression schemes.

    Science.gov (United States)

    Mansilha, C; Melo, A; Rebelo, H; Ferreira, I M P L V O; Pinho, O; Domingues, V; Pinho, C; Gameiro, P

    2010-10-22

    A multi-residue methodology based on solid phase extraction followed by gas chromatography-tandem mass spectrometry was developed for trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine disrupting properties. Matrix standard calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample to compensate for the matrix-induced chromatographic response enhancement observed for certain pesticides. Validation was done mainly according to the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticide analysis and/or GC-MS methodology. As the assumption of homoscedasticity was not met for the analytical data, a weighted least squares linear regression procedure was applied as a simple and effective way to counteract the greater influence of the higher concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limit of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness. Copyright © 2010 Elsevier B.V. All rights reserved.
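
    The weighted least squares idea can be sketched for a straight-line calibration curve. The 1/x² weighting used here is an assumption chosen for illustration (it is a common choice in calibration work); the abstract does not say which weighting schemes were evaluated. Names and data are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x.

    Each point contributes w_i * (y_i - intercept - slope * x_i)**2
    to the objective, so small weights mute a point's influence.
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up calibration levels spanning two orders of magnitude.
conc = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
response = [3.0 * c + 2.0 for c in conc]
weights = [1.0 / c ** 2 for c in conc]  # assumed 1/x^2 weighting
intercept, slope = weighted_line_fit(conc, response, weights)
```

    Down-weighting the high-concentration standards keeps them from dominating the fit, which is what improves accuracy near the lower end of the curve.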

  13. LENGTH-WEIGHT RELATIONSHIP AND CONDITION FACTOR OF ...

    African Journals Online (AJOL)

    Data Collection and Analysis. The measurements of length (cm), weight (g) and the condition factor of individual fish sampled were recorded. The relationship between length and weight of the fish was examined by simple linear regression using WINKS software. The variations in the length-weight represented by 'b' were.
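
    A length-weight analysis of this kind can be sketched as a simple linear regression. Two assumptions are made here that the truncated record does not confirm: the relationship is taken to have the usual power form W = a·L^b and is fitted on log-transformed data, and the condition factor is taken to be Fulton's K = 100·W/L³. Function names and data are hypothetical.

```python
import math


def fit_length_weight(lengths_cm, weights_g):
    """Fit W = a * L**b by ordinary least squares on log10(W) vs log10(L)."""
    xs = [math.log10(v) for v in lengths_cm]
    ys = [math.log10(v) for v in weights_g]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: the exponent 'b'
    a = 10 ** (y_mean - b * x_mean)    # back-transformed intercept
    return a, b


def fulton_condition_factor(length_cm, weight_g):
    """Fulton's condition factor K = 100 * W / L**3 (assumed definition)."""
    return 100.0 * weight_g / length_cm ** 3


# Made-up fish following W = 0.01 * L**3 exactly (isometric growth).
lengths = [10.0, 12.0, 15.0, 20.0, 25.0]
weights = [0.01 * length ** 3 for length in lengths]
a, b = fit_length_weight(lengths, weights)
```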

  14. Multiple Linear Regression Analysis of Factors Affecting Real Property Price Index From Case Study Research In Istanbul/Turkey

    Science.gov (United States)

    Denli, H. H.; Koc, Z.

    2015-12-01

    Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis constructs mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Applied to real estate valuation, using properties presented in the real market with their current characteristics and quantifiers, the method helps to find the factors or variables that are effective in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors in the building, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices of 60 properties have been used for the analysis. Locations with the same price value have been found and plotted on the map, and equivalence curves have been drawn to identify zones of equal value as lines.

  15. Quality of life in breast cancer patients--a quantile regression analysis.

    Science.gov (United States)

    Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma

    2008-01-01

    Quality of life studies have an important role in health care, especially in chronic diseases, in clinical judgment and in the supply of medical resources. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients using a quantile regression model and to compare them with those from linear regression. A cross-sectional study was conducted on 119 breast cancer patients who were admitted and treated in the chemotherapy ward of Namazi hospital in Shiraz. We used the QLQ-C30 questionnaire to assess quality of life in these patients. A quantile regression was employed to assess the associated factors and the results were compared to linear regression. All analyses were carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92 ± 11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In contrast to linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for the first quartile. Also, emotional functioning and duration of disease statistically predicted the QOL score in the third quartile. The results demonstrate that using quantile regression leads to better interpretation and richer inference about predictors of breast cancer patients' quality of life.
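
    The difference between modelling a mean and modelling a quantile comes down to the loss function. The sketch below shows the pinball (check) loss that quantile regression minimizes, in the simplest intercept-only case where the minimizer is a sample quantile. It is an illustration with made-up scores, not the SAS analysis of the study.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant_quantile(y, tau):
    """Data value minimizing the total pinball loss (intercept-only model)."""
    return min(sorted(y), key=lambda q: sum(pinball_loss(v - q, tau) for v in y))


# Made-up skewed scores: one extreme value pulls the mean far upward.
scores = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_score = sum(scores) / len(scores)
median_fit = best_constant_quantile(scores, 0.5)
first_quartile_fit = best_constant_quantile(scores, 0.25)
```

    A full quantile regression replaces the constant with a linear predictor and minimizes the same loss, which is why different predictors can matter at different quartiles.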

  16. Principal component regression analysis with SPSS.

    Science.gov (United States)

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces the indices used for diagnosing multicollinearity, the basic principle of principal component regression and the method for determining the 'best' equation. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0, including all calculating processes of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. With SPSS, the principal component regression analysis becomes simpler, faster and accurate.

  17. Minimax Regression Quantiles

    DEFF Research Database (Denmark)

    Bache, Stefan Holst

    A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is, however, a different and therefore new estimator. It allows for both linear and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question.

  18. Linear relationship between the distribution of thallium-201 and blood flow in ischemic and nonischemic myocardium during exercise

    International Nuclear Information System (INIS)

    Nielsen, A.P.; Morris, K.G.; Murdock, R.; Bruno, F.P.; Cobb, F.R.

    1980-01-01

    The purpose of this study was to compare the myocardial distribution of thallium-201 and regional myocardial blood flow during ischemia and the physiologic stress of exercise. Studies were carried out in six dogs with chronically implanted catheters in the atrium and aorta and a snare on the circumflex coronary artery distal to the first marginal branch. Regional myocardial blood flow was measured during quiet, resting conditions using 7 to 10 ] of radioisotope-labeled microspheres. Each dog was then exercised on a treadmill at speeds of 5 to 9 mph at a 5° incline. After 1 minute of exercise the circumflex coronary artery was occluded and thallium-201 and a second label of microspheres were injected. Exercise was continued for 5 minutes. The dogs were then sacrificed and the left ventricle was sectioned into approximately 80 1-2-g samples to compare thallium-201 activity and regional myocardial blood flow. The maximum increase in blood flow ranged from 3.3 to 7.2 times resting control values. Each dog had myocardial samples in which blood flow was markedly reduced, to less than 0.10 ml/min/g. In each dog there was a close linear relationship between thallium-201 distribution and direct measurements of regional myocardial blood flow. Linear regression analyses demonstrated a correlation coefficient of 0.98 or greater in each dog. These data indicate that during the physiologic stress of exercise, the myocardial distribution of thallium activity is linearly related to regional myocardial blood flow in both the ischemic and nonischemic regions.

  19. Regression with Sparse Approximations of Data

    DEFF Research Database (Denmark)

    Noorzad, Pardis; Sturm, Bob L.

    2012-01-01

    We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of \(k\)-nearest neighbors regression (\(k\)-NNR), and more generally, local polynomial kernel regression. Unlike \(k\)-NNR, however, SPARROW can adapt the number of regressors to use based...
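
    For reference, the baseline that SPARROW is compared with, \(k\)-nearest neighbors regression, can be sketched in a few lines. This is the plain \(k\)-NNR with a fixed \(k\) and equal weights, not SPARROW itself; the function name and the data are hypothetical.

```python
import math


def knn_regress(x0, regressors, regressands, k):
    """Predict at x0 as the mean regressand of the k nearest regressors.

    Plain k-NN regression with Euclidean distance and equal weights;
    SPARROW instead selects and weights neighbours by sparse approximation.
    """
    order = sorted(range(len(regressors)), key=lambda i: math.dist(x0, regressors[i]))
    nearest = order[:k]
    return sum(regressands[i] for i in nearest) / k


# Made-up two-dimensional regressors with one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
values = [1.0, 2.0, 3.0, 100.0]
estimate = knn_regress((0.1, 0.1), points, values, k=3)
```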

  20. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

    A new edition of the definitive guide to logistic regression modeling for health science and other applications. This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  1. Thermal Efficiency Degradation Diagnosis Method Using Regression Model

    International Nuclear Information System (INIS)

    Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol

    2011-01-01

    This paper proposes an idea for thermal efficiency degradation diagnosis in turbine cycles, which is based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can inversely return the intrinsic state associated with a superficial state observed at a power plant. The diagnosis method proposed herein is divided into three processes: 1) simulations of degradation conditions to obtain measured states (referred to as the what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of the current plant and the linear regression model (referred to as the inverse what-if method). The what-if method generates the outputs for inputs that include various root causes and/or boundary conditions, whereas the inverse what-if method is the process of calculating the inverse matrix with the given superficial states, that is, component degradation modes. The method suggested in this paper was validated using the turbine cycle model for an operating power plant.

  2. Precision Interval Estimation of the Response Surface by Means of an Integrated Algorithm of Neural Network and Linear Regression

    Science.gov (United States)

    Lo, Ching F.

    1999-01-01

    Radial Basis Function Networks and Back Propagation Neural Networks have been integrated with Multiple Linear Regression to map nonlinear response surfaces over a wide range of independent variables in the process of the Modern Design of Experiments. The integrated method is capable of estimating precision intervals, including confidence and prediction intervals. The power of the innovative method has been demonstrated by applying it to a set of wind tunnel test data in the construction of a response surface and the estimation of precision intervals.

  3. Gibrat’s law and quantile regressions

    DEFF Research Database (Denmark)

    Distante, Roberta; Petrella, Ivan; Santoro, Emiliano

    2017-01-01

    The nexus between firm growth, size and age in U.S. manufacturing is examined through the lens of quantile regression models. This methodology allows us to overcome serious shortcomings entailed by linear regression models employed by much of the existing literature, unveiling a number of important...

  4. On the analysis of clonogenic survival data: Statistical alternatives to the linear-quadratic model

    International Nuclear Information System (INIS)

    Unkel, Steffen; Belka, Claus; Lauber, Kirsten

    2016-01-01

    The most frequently used method to quantitatively describe the response to ionizing irradiation in terms of clonogenic survival is the linear-quadratic (LQ) model. In the LQ model, the logarithm of the surviving fraction is regressed linearly on the radiation dose by means of a second-degree polynomial. The ratio of the estimated parameters for the linear and quadratic term, respectively, represents the dose at which both terms have the same weight in the abrogation of clonogenic survival. This ratio is known as the α/β ratio. However, there are plausible scenarios in which the α/β ratio fails to sufficiently reflect differences between dose-response curves, for example when curves with similar α/β ratio but different overall steepness are being compared. In such situations, the interpretation of the LQ model is severely limited. Colony formation assays were performed in order to measure the clonogenic survival of nine human pancreatic cancer cell lines and immortalized human pancreatic ductal epithelial cells upon irradiation at 0-10 Gy. The resulting dataset was subjected to LQ regression and non-linear log-logistic regression. Dimensionality reduction of the data was performed by cluster analysis and principal component analysis. Both the LQ model and the non-linear log-logistic regression model resulted in accurate approximations of the observed dose-response relationships in the dataset of clonogenic survival. However, in contrast to the LQ model the non-linear regression model allowed the discrimination of curves with different overall steepness but similar α/β ratio and revealed an improved goodness-of-fit. Additionally, the estimated parameters in the non-linear model exhibit a more direct interpretation than the α/β ratio. Dimensionality reduction of clonogenic survival data by means of cluster analysis was shown to be a useful tool for classifying radioresistant and sensitive cell lines. More quantitatively, principal component analysis allowed
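
    The LQ fit described in this abstract can be sketched as a least-squares regression of -ln(S) on dose with a linear and a quadratic term and no intercept, since ln S = -(αD + βD²). The helper name, the doses and the parameter values are hypothetical, and the surviving fractions are generated without noise.

```python
import math


def fit_lq(doses, surviving_fractions):
    """Least-squares estimates (alpha, beta) for ln S = -(alpha*D + beta*D**2).

    Solves the 2x2 normal equations for regressors D and D**2 (no intercept).
    Surviving fractions must be strictly positive.
    """
    y = [-math.log(s) for s in surviving_fractions]
    s11 = sum(d ** 2 for d in doses)
    s12 = sum(d ** 3 for d in doses)
    s22 = sum(d ** 4 for d in doses)
    b1 = sum(d * yi for d, yi in zip(doses, y))
    b2 = sum(d ** 2 * yi for d, yi in zip(doses, y))
    det = s11 * s22 - s12 * s12
    alpha = (b1 * s22 - b2 * s12) / det
    beta = (b2 * s11 - b1 * s12) / det
    return alpha, beta


# Made-up noise-free survival curve with alpha = 0.3 /Gy, beta = 0.03 /Gy^2.
doses = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
survival = [math.exp(-(0.3 * d + 0.03 * d * d)) for d in doses]
alpha, beta = fit_lq(doses, survival)
alpha_beta_ratio = alpha / beta  # dose (Gy) where alpha*D equals beta*D**2
```

    Two curves can share the same α/β ratio while differing in overall steepness (for example, doubling both α and β), which is the limitation the abstract points to.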

  5. A Simple Piece of Apparatus to Aid the Understanding of the Relationship between Angular Velocity and Linear Velocity

    Science.gov (United States)

    Unsal, Yasin

    2011-01-01

    One of the subjects that is confusing and difficult for students to fully comprehend is the concept of angular velocity and linear velocity. It is the relationship between linear and angular velocity that students find difficult; most students understand linear motion in isolation. In this article, we detail the design, construction and…

  6. Prolificacy and Its Relationship with Age, Body Weight, Parity, Previous Litter Size and Body Linear Type Traits in Meat-type Goats

    Directory of Open Access Journals (Sweden)

    Avijit Haldar

    2014-05-01

    Data on age and body weight at breeding, parity, previous litter size, days open and some descriptive body linear traits from 389 meat-type, prolific Black Bengal goats in Tripura State of India were collected for 3 and 1/2 years (2007 to 2010) and analyzed using a logistic regression model. The objectives of the study were (i) to evaluate the effect of age and body weight at breeding, parity, previous litter size and days open on litter size of does; and (ii) to investigate whether body linear type traits influenced litter size in meat-type, prolific goats. An incidence of 68.39% multiple births with a prolificacy rate of 175.07% was recorded. Higher age (>2.69 years), higher parity order (>2.31), greater body weight at breeding (>20.5 kg) and larger previous litter size (>1.65) showed an increased likelihood of multiple litter size when compared to single litter size. There was a strong, positive relationship between litter size and various body linear type traits like neck length (>22.78 cm), body length (>54.86 cm), withers height (>48.85 cm), croup height (>50.67 cm), distance between tuber coxae bones (>11.38 cm) and distance between tuber ischii bones (>4.56 cm) for discriminating the goats bearing multiple fetuses from those bearing a single fetus.

  7. Analysis of the Covered Electrode Welding Process Stability on the Basis of Linear Regression Equation

    Directory of Open Access Journals (Sweden)

    Słania J.

    2014-10-01

    The article presents the process of production of coated electrodes and their welding properties. The factors concerning the welding properties and the currently applied method of assessing are given. The methodology of the testing based on the measuring and recording of instantaneous values of welding current and welding arc voltage is discussed. Algorithm for creation of reference data base of the expert system is shown, aiding the assessment of covered electrodes welding properties. The stability of voltage–current characteristics was discussed. Statistical factors of instantaneous values of welding current and welding arc voltage waveforms used for determining of welding process stability are presented. The results of coated electrodes welding properties are compared. The article presents the results of linear regression as well as the impact of the independent variables on the welding process performance. Finally the conclusions drawn from the research are given.

  8. Relationship between particular dendrobiometrical indicators of natural European beech (Fagus sylvatica L.) dendrocenoses in Central Balkan Range

    Directory of Open Access Journals (Sweden)

    Ferezliev Angel

    2017-12-01

    In parallel studies, different regression models were tested to identify relationships between particular dendrobiometrical indicators on two sample plots representing forests dominated by the European beech in the Central Balkan Range (Bulgaria). The presence of incomplete multicollinearity was studied through the correlation matrix for factor variables. To avoid the negative impact of multicollinearity, stepwise multiple regression was applied and adequate regression equations of the relationships under consideration were formulated. The results of statistical analysis confirmed that the link between the investigated indicators is strong and that the 'cloud' of data shows some 'sphericity' and a distribution close to normal. In one of the sample plots, one major volume-forming factor, height, does not participate in the obtained regression equation, so it is not possible to estimate its influence. By testing linear and several nonlinear regression dependencies and by applying widely used statistical criteria for model selection, the optimal linear model of the considered link was chosen.

  9. The calculated reference value of the tubular extraction rate in infants and children. An attempt to use a new regression equation

    International Nuclear Information System (INIS)

    Watanabe, Nami; Sugai Yukio; Komatani, Akio; Yamaguchi, Koichi; Takahashi, Kazuei

    1999-01-01

    This study was designed to investigate the empirical tubular extraction rate (TER) of normal renal function in childhood and then to propose a new equation for obtaining TER theoretically. The empirical TER was calculated using Russell's method for determination of single-sample plasma clearance and 99mTc-MAG3 in 40 patients with renal disease younger than 10 years of age who were classified as having normal renal function using diagnostic criteria defined by the Paediatric Task Group of EANM. First, we investigated the relationships of the empirical value of absolute TER to age, body weight, body surface area (BSA) and distribution volume. Next, we investigated the relationships of the empirical value of BSA-corrected TER to age, body weight, BSA and distribution volume. A linear relationship was found between the absolute TER and each body dimensional factor; for BSA in particular, the correlation coefficient was 0.90 (p value). The BSA-corrected TER showed a logarithmic relationship with BSA, but linear regression did not show any significant correlation. Therefore, it was thought that the normal value of TER could be calculated theoretically using the body surface area, and we propose the following linear regression equation: Theoretical TER (ml/min/1.73 m²) = (-39.8 + 257.2 × BSA)/BSA/1.73. The theoretical TER could be one of the reference values of renal function in the period of renal maturation. (author)

  10. THE RELATIONSHIP BETWEEN SATISFACTION WITH LIFE AND EMPLOYEE ENGAGEMENT

    Directory of Open Access Journals (Sweden)

    Anton Vorina

    2013-04-01

    Modern organizations need dedicated employees who are engaged with their work. The theme of employee engagement has generated a great deal of attention among many human resource practitioners and academic researchers across the world. In this paper we present an analysis of the relationship between satisfaction with life and employee engagement in a casual sample of 1006 respondents in Slovenia. Based on multiple linear regression analysis, we found that the relation between satisfaction with life and employee engagement is statistically significant (F: 381.80, Sig.: 0.000). Of the two regression models evaluated, the most appropriate was the linear regression model with one regressor (satisfaction with life) and a sample size of 1006. We found that employee engagement would increase if satisfaction with life increased.

  11. A Poisson Regression Examination of the Relationship between Website Traffic and Search Engine Queries

    OpenAIRE

    Tierney, Heather L.R.; Pan, Bing

    2010-01-01

    A new area of research involves the use of Google data, which has been normalized and scaled to predict economic activity. This new source of data holds both many advantages as well as disadvantages, which are discussed through the use of daily and weekly data. Daily and weekly data are employed to show the effect of aggregation as it pertains to Google data, which can lead to contradictory findings. In this paper, Poisson regressions are used to explore the relationship between the online...

  12. Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

    Science.gov (United States)

    Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim

    2012-01-01

    Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…

  13. Understanding logistic regression analysis.

    Science.gov (United States)

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is that it avoids confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After the definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
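
    The link between logistic regression coefficients and odds ratios can be shown in the one case that needs no iterative fitting: a single binary explanatory variable. There the maximum-likelihood fit reproduces the observed proportions, so the slope is the log odds ratio of the 2x2 table. The function names and counts below are hypothetical; with several explanatory variables the coefficients have to be estimated numerically.

```python
import math


def odds_ratio(exp_events, exp_nonevents, unexp_events, unexp_nonevents):
    """Odds ratio of a 2x2 table: odds in the exposed over odds in the unexposed."""
    return (exp_events / exp_nonevents) / (unexp_events / unexp_nonevents)


def logistic_coefficients_2x2(exp_events, exp_nonevents, unexp_events, unexp_nonevents):
    """Closed-form logistic fit for one binary predictor.

    intercept = log odds in the unexposed group,
    slope     = log odds ratio (exposed vs unexposed).
    """
    intercept = math.log(unexp_events / unexp_nonevents)
    slope = math.log(exp_events / exp_nonevents) - intercept
    return intercept, slope


# Made-up counts: 30/100 events among exposed, 10/100 among unexposed.
intercept, slope = logistic_coefficients_2x2(30, 70, 10, 90)
table_or = odds_ratio(30, 70, 10, 90)
model_or = math.exp(slope)  # exponentiated coefficient equals the odds ratio
```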

  14. QRank: a novel quantile regression tool for eQTL discovery.

    Science.gov (United States)

    Song, Xiaoyu; Li, Gen; Zhou, Zhenwei; Wang, Xianling; Ionita-Laza, Iuliana; Wei, Ying

    2017-07-15

    Over the past decade, there has been a remarkable improvement in our understanding of the role of genetic variation in complex human diseases, especially via genome-wide association studies. However, the underlying molecular mechanisms are still poorly characterized, impeding the development of therapeutic interventions. Identifying genetic variants that influence the expression level of a gene, i.e. expression quantitative trait loci (eQTLs), can help us understand how genetic variants influence traits at the molecular level. While most eQTL studies focus on identifying mean effects on gene expression using linear regression, evidence suggests that genetic variation can impact the entire distribution of the expression level. Motivated by the potential higher order associations, several studies investigated variance eQTLs. In this paper, we develop a Quantile Rank-score based test (QRank), which provides an easy way to identify eQTLs that are associated with the conditional quantile functions of gene expression. We have applied the proposed QRank to the Genotype-Tissue Expression project, an international tissue bank for studying the relationship between genetic variation and gene expression in human tissues, and found that the proposed QRank complements the existing methods, and identifies new eQTLs with heterogeneous effects across different quantile levels. Notably, we show that the eQTLs identified by QRank but missed by linear regression are associated with greater enrichment in genome-wide significant SNPs from the GWAS catalog, and are also more likely to be tissue specific than eQTLs identified by linear regression. An R package is available on R CRAN at https://cran.r-project.org/web/packages/QRank . Contact: xs2148@cumc.columbia.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  15. Regression analysis using dependent Polya trees.

    Science.gov (United States)

    Schörgendorfer, Angela; Branscum, Adam J

    2013-11-30

    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.

  16. Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

    Directory of Open Access Journals (Sweden)

    Biqing Cai

    2015-04-01

    This paper discusses nonparametric kernel regression with the regressor being a \(d\)-dimensional \(\beta\)-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \(\sqrt{n(T)h^{d}}\), where \(n(T)\) is the number of regenerations for a \(\beta\)-null recurrent process, and that the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of the Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity in the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than that of the linear model.
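
    As an illustration of kernel regression with leave-one-out bandwidth selection, the following is a minimal univariate sketch (d = 1) of a Nadaraya-Watson estimator. The Gaussian kernel, the function names, and the candidate bandwidth grid are assumptions made for this example; the abstract does not specify them, and the sketch does not reproduce the paper's null recurrent setting or its two-step volatility estimator.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an assumed kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate of E[y | x = x0] with bandwidth h.

    If skip is an index, that observation is left out (for cross validation).
    """
    num = 0.0
    den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = gaussian_kernel((x0 - x) / h)
        num += w * y
        den += w
    return num / den if den > 0 else float("nan")


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = [
        (ys[i] - nw_estimate(xs[i], xs, ys, h, skip=i)) ** 2
        for i in range(len(xs))
    ]
    return sum(errors) / len(errors)


def select_bandwidth(xs, ys, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))
```

    Each observation is predicted from all the others, so the score penalizes bandwidths that are too small (noisy fit) as well as too large (oversmoothed fit).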

  17. Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds.

    Science.gov (United States)

    Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena

    2013-01-01

    The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results show, in general, better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLR has, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R² of 0.874 and RMSE of 0.437 against R² of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis, showing an activity close to that predicted by the majority of the models. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  18. Air - water temperature relationships in the trout streams of southeastern Minnesota’s carbonate - sandstone landscape

    Science.gov (United States)

    Krider, Lori A.; Magner, Joseph A.; Perry, Jim; Vondracek, Bruce C.; Ferrington, Leonard C.

    2013-01-01

    Carbonate-sandstone geology in southeastern Minnesota creates a heterogeneous landscape of springs, seeps, and sinkholes that supply groundwater into streams. Air temperatures are effective predictors of water temperature in surface-water dominated streams. However, no published work investigates the relationship between air and water temperatures in groundwater-fed streams (GWFS) across watersheds. We used simple linear regressions to examine weekly air-water temperature relationships for 40 GWFS in southeastern Minnesota. A 40-stream, composite linear regression model has a slope of 0.38, an intercept of 6.63, and R2 of 0.83. The regression models for GWFS have lower slopes and higher intercepts in comparison to surface-water dominated streams. Regression models for streams with high R2 values offer promise for use as predictive tools for future climate conditions. Climate change is expected to alter the thermal regime of groundwater-fed systems, but will do so at a slower rate than surface-water dominated systems. A regression model of intercept vs. slope can be used to identify streams for which water temperatures are more meteorologically than groundwater controlled, and thus more vulnerable to climate change. Such relationships can be used to guide restoration vs. management strategies to protect trout streams.
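
    The composite model reported above can be applied directly as a prediction rule. The sketch below uses the slope (0.38) and intercept (6.63) from the abstract; the function name and the assumption that both temperatures are weekly means in degrees Celsius are illustrative, since the abstract does not state units.

```python
# Coefficients of the 40-stream composite model quoted in the abstract.
SLOPE = 0.38
INTERCEPT = 6.63  # assumed to be in degrees Celsius (units not stated in the abstract)


def predict_water_temp(air_temp):
    """Predicted weekly stream temperature for a given weekly air temperature."""
    return INTERCEPT + SLOPE * air_temp


# Each additional degree of air temperature adds only 0.38 degrees of water temperature.
warming_per_degree = predict_water_temp(11.0) - predict_water_temp(10.0)
```

    The low slope and high intercept are what the abstract describes as the signature of groundwater control: water temperature stays near the intercept and responds weakly to air temperature.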

  19. Fungible weights in logistic regression.

    Science.gov (United States)

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  20. Finding-equal regression method and its application in predication of U resources

    International Nuclear Information System (INIS)

    Cao Huimo

    1995-03-01

    The deposit model method commonly adopted in mineral resource prediction has two main parts: model data that reveal the geological mineralization law for a deposit, and a statistical prediction method suited to the characteristics of those data, namely an appropriate regression method. This kind of regression method may be called finding-equal regression; it combines linear regression with the distribution finding-equal method. The distribution finding-equal method is a data pretreatment that satisfies the mathematical precondition for linear regression, namely equal distribution theory, and this pretreatment is feasible in practice. Finding-equal regression can therefore not only overcome the nonlinear limitations that commonly occur in traditional linear regression or other regression methods and that often have no solution, but can also identify outliers and reduce their influence, a problem that usually arises in robust regression when outliers occur in the independent variables. The author thus argues that finding-equal regression holds the best position among regression methods. Finally, two examples of quantitative prediction of U resources are provided.

  1. Convert a low-cost sensor to a colorimeter using an improved regression method

    Science.gov (United States)

    Wu, Yifeng

    2008-01-01

    Closed loop color calibration is a process to maintain consistent color reproduction for color printers. To perform closed loop color calibration, a pre-designed color target should be printed and automatically measured by a color measuring instrument. A low cost sensor has been embedded in the printer to perform the color measurement. A series of sensor calibration and color conversion methods have been developed. The purpose is to get accurate colorimetric measurements from the data measured by the low cost sensor. In order to get high accuracy colorimetric measurements, we need to carefully calibrate the sensor and minimize all possible errors during the color conversion. After comparing several classical color conversion methods, a regression based color conversion method has been selected. Regression is a powerful method to estimate the color conversion functions, but the main difficulty in using it is to find an appropriate function to describe the relationship between the input and the output data. In this paper, we propose to use 1D pre-linearization tables to improve the linearity between the input sensor measuring data and the output colorimetric data. Using this method, we can increase the accuracy of the regression method, and so improve the accuracy of the color conversion.

  2. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models.

    Science.gov (United States)

    Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael

    2017-01-01

    Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties (sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen) in a 580 km² agricultural watershed in south-western Burkina Faso. Four statistical prediction models, namely multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB), were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat 8 as well as soil specific indices of redness

  3. Tests of the linearity assumption in the dose-effect relationship for radiation-induced cancer

    International Nuclear Information System (INIS)

    Cohen, A.F.; Cohen, B.L.

    1978-01-01

    The validity of the BEIR linear extrapolation to low doses of the dose-effect relationship for radiation induced cancer is tested by use of natural radiation making use of selectivity on type of cancer, sex, age group, geographic area, and time period. For lung cancer, a linear interpolation between zero dose-zero effect and the data from radon-induced cancers in miners over-estimates the total number of observed lung cancers in many countries in the early years of this century; the discrepancy is substantially increased if the 30-44 year age range and/or if only females are considered, and by the fact that many other causes of lung cancer are shown to have been important at that time. The degree to which changes of diagnostic efficiency with time can influence the analysis is considered at some length. It is concluded that the linear relationship substantially over-estimates effects of low radiation doses. A similar analysis is applied to leukemia induced by natural radiation, applying selectivity by age, sex, natural background level, and date, and considering other causes. It is concluded that effects substantially larger than those obtained from linear extrapolation are excluded. The use of the selectivities mentioned above is justified by the fact that the incidence of cancer or leukemia is an upper limit on the rate at which it is caused by radiation effects; in determining upper limits it is justifiable to select situations which minimize it. (author)

  4. Multiple Linear Regression Model Based on Neural Network and Its Application in the MBR Simulation

    Directory of Open Access Journals (Sweden)

    Chunqing Li

    2012-01-01

    Computer simulation has become a research focus for the membrane bioreactor (MBR). To compensate for drawbacks of physical testing, such as long test periods, high cost, and sealed equipment whose interior cannot be observed, and building on an in-depth study of the mathematical model of the MBR combined with neural network theory, this paper proposes a three-dimensional simulation system for MBR wastewater treatment that is fast, efficient, and well visualized. The system was developed with hybrid programming in VC++ and OpenGL, using a multifactor linear regression model, based on a neural network, of the factors affecting MBR membrane flux, together with a modeling method that uses integers instead of floats and quad-tree recursion. The experiments show that the three-dimensional simulation system built on these models and methods offers inspiration and a reference for future research on and application of MBR simulation technology.

  5. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
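
    One possible reading of the equivalent two-sample construction for logistic regression can be sketched as follows: two equally sized groups whose log odds differ by the slope times twice the standard deviation of the covariate, with the average event probability held at the overall response probability. The function names, the bisection search, and the normal-approximation power formula for two proportions are assumptions made for this illustration and are not necessarily the formulas used by the authors.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def equivalent_two_sample(slope, sd_x, overall_p):
    """Return (p1, p2) with logit(p2) - logit(p1) = 2 * slope * sd_x
    and (p1 + p2) / 2 equal to overall_p, found by bisection."""
    delta = 2.0 * slope * sd_x
    lo, hi = -30.0, 30.0  # bracket for the log odds of group 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        mean_p = 0.5 * (expit(mid) + expit(mid + delta))
        if mean_p < overall_p:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    return expit(a), expit(a + delta)


def two_sample_power(p1, p2, n_total, z_alpha=1.959964):
    """Approximate power of a two-sided 5% test comparing two proportions,
    with n_total split equally between the groups (assumed formula)."""
    n = n_total / 2.0
    se = math.sqrt(p1 * (1.0 - p1) / n + p2 * (1.0 - p2) / n)
    return normal_cdf(abs(p2 - p1) / se - z_alpha)
```

    For example, `equivalent_two_sample(0.5, 1.0, 0.3)` gives the two event probabilities for a hypothetical slope of 0.5 per unit of a covariate with standard deviation 1 and an overall response probability of 0.3; passing them to `two_sample_power` with a total sample size gives the approximate power.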

  6. Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.

    Science.gov (United States)

    Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu

    2016-08-01

    This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Statistical learning techniques applied to epidemiology: a simulated case-control comparison study with logistic regression

    Directory of Open Access Journals (Sweden)

    Land Walker H

    2011-01-01

    Background: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR) modeling representing a parametric approach. The SL technique comprised a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results: The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions: The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.

  8. Linear proportional relationship between N(OH) and N(CH) in the diffuse interstellar medium

    Science.gov (United States)

    Hong, Seung Yeong; Kwak, Kyujin

    2018-04-01

    It has been known that there is a linearly proportional relationship between the column densities of CH and OH measured toward bright UV-emitting stars, although there are four outliers in this relationship among the total 24 measured targets. By using the Simbad database, we investigate reasonable configurations of diffuse interstellar medium (ISM) which could explain the observed relationship. We first identify the locations of 24 targets on the celestial sphere getting the distances to them and then count the number of molecular clouds, nebulae, and peculiar stars toward the targets which could contribute to the production of OH and CH. We present the results of our search by testing three hypothetical configurations of diffuse ISM which may explain the observed relationship.

  9. Linear solvation energy relationships: "rule of thumb" for estimation of variable values

    Science.gov (United States)

    Hickey, James P.; Passino-Reader, Dora R.

    1991-01-01

    For the linear solvation energy relationship (LSER), values are listed for each of the variables (Vi/100, π*, βm, αm) for fundamental organic structures and functional groups. We give guidelines to estimate LSER variable values quickly for a vast array of possible organic compounds such as those found in the environment. The difficulty in generating these variables has greatly discouraged the application of this quantitative structure-activity relationship (QSAR) method. This paper presents the first compilation of molecular functional group values together with a utilitarian set of LSER variable estimation rules. The availability of these variable values and rules should facilitate widespread application of LSER for hazard evaluation of environmental contaminants.

  10. Linear-regression convolutional neural network for fully automated coronary lumen segmentation in intravascular optical coherence tomography

    Science.gov (United States)

    Yong, Yan Ling; Tan, Li Kuo; McLaughlin, Robert A.; Chee, Kok Han; Liew, Yih Miin

    2017-12-01

    Intravascular optical coherence tomography (OCT) is an optical imaging modality commonly used in the assessment of coronary artery diseases during percutaneous coronary intervention. Manual segmentation to assess luminal stenosis from OCT pullback scans is challenging and time consuming. We propose a linear-regression convolutional neural network to automatically perform vessel lumen segmentation, parameterized in terms of radial distances from the catheter centroid in polar space. Benchmarked against gold-standard manual segmentation, our proposed algorithm achieves average locational accuracy of the vessel wall of 22 microns, and 0.985 and 0.970 in Dice coefficient and Jaccard similarity index, respectively. The average absolute error of luminal area estimation is 1.38%. The processing rate is 40.6 ms per image, suggesting the potential to be incorporated into a clinical workflow and to provide quantitative assessment of vessel lumen in an intraoperative time frame.
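
    The two overlap metrics quoted above are closely related. The sketch below computes both for segmentations represented as sets of pixel indices (a representation chosen only for this illustration) and shows the per-image identity J = D / (2 - D). Applied to the reported Dice value of 0.985 it gives about 0.970, in line with the reported Jaccard index, although the identity holds exactly only for a single image and not necessarily for averages over a data set.

```python
def dice(a, b):
    """Dice coefficient of two segmentations given as collections of pixel indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard similarity index (intersection over union)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_from_dice(d):
    """Per-image identity linking the two metrics: J = D / (2 - D)."""
    return d / (2.0 - d)


# Toy masks (hypothetical pixel indices) sharing two of their four pixels each.
manual = {1, 2, 3, 4}
predicted = {3, 4, 5, 6}
toy_dice = dice(manual, predicted)
toy_jaccard = jaccard(manual, predicted)
```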

  11. Pengaruh Customer Relationship Management Terhadap Loyalitas Pelanggan Indosat Bengkulu

    OpenAIRE

    Yulinda, Ade Tiara

    2017-01-01

    The objective of this research is to analyze the influence of customer relationship management (technology, people, process and knowledge) on customer loyalty at PT. Indosat Bengkulu. The study uses survey research, with responses measured on a Likert scale and analyzed by multiple linear regression. The sample comprised 100 respondents. From the results, it can be concluded that customer relationship management in the variables of technology, people, process, knowledge...

  12. Variable selection in multiple linear regression: The influence of ...

    African Journals Online (AJOL)

    provide an indication of whether the fit of the selected model improves or ... and calculate M(−i); quantify the influence of case i in terms of a function, f(•), of M and ..... [21] Venter JH & Snyman JLJ, 1997, Linear model selection based on risk ...

  13. Antimicrobial efficacy of Curcuma zedoaria extract as assessed by linear regression compared with commercial mouthrinses Eficácia antimicrobiana do extrato de Curcuma zedoaria avaliada por regressão linear comparada com anti-sépticos bucais comerciais

    Directory of Open Access Journals (Sweden)

    Adriana Bugno

    2007-09-01

    The antimicrobial activity of Curcuma zedoaria (Christm.) Roscoe extract against some oral microorganisms was compared with the antimicrobial activity of five commercial mouthrinses in order to evaluate the potential of the plant extract to be incorporated into formulas for improving or creating antiseptic activity. The in vitro antimicrobial efficacy of the plant extract and the commercial products was evaluated against Streptococcus mutans, Enterococcus faecalis, Staphylococcus aureus and Candida albicans, using a linear regression method to evaluate the microbial reduction obtained as a function of exposure time, with efficacy defined as a 99.999% reduction in the count of standardized microbial populations within 60 seconds. The results showed that the antimicrobial efficacy of Curcuma zedoaria (Christm.) Roscoe extract was similar to that of commercial products, and its incorporation into a mouthrinse could be an alternative for improving the antimicrobial efficacy of the oral product.

  14. Assessing the human cardiovascular response to moderate exercise: feature extraction by support vector regression

    International Nuclear Information System (INIS)

    Wang, Lu; Su, Steven W; Celler, Branko G; Chan, Gregory S H; Cheng, Teddy M; Savkin, Andrey V

    2009-01-01

    This study aims to quantitatively describe the steady-state relationships among percentage changes in key central cardiovascular variables (i.e. stroke volume, heart rate (HR), total peripheral resistance and cardiac output), measured using non-invasive means, in response to moderate exercise, and the oxygen uptake rate, using a new nonlinear regression approach, support vector regression. Ten untrained normal males exercised in an upright position on an electronically braked cycle ergometer with constant workloads ranging from 25 W to 125 W. Throughout the experiment, V̇O2 was determined breath by breath and the HR was monitored beat by beat. During the last minute of each exercise session, the cardiac output was measured beat by beat using a novel non-invasive ultrasound-based device and blood pressure was measured using a tonometric measurement device. Based on the analysis of experimental data, nonlinear steady-state relationships between key central cardiovascular variables and V̇O2 were qualitatively observed, except for the HR, which increased linearly as a function of increasing V̇O2. Quantitative descriptions of these complex nonlinear behaviours were provided by nonparametric models obtained using support vector regression.

  15. Non-linear auto-regressive models for cross-frequency coupling in neural time series

    Science.gov (United States)

    Tallot, Lucille; Grabot, Laetitia; Doyère, Valérie; Grenier, Yves; Gramfort, Alexandre

    2017-01-01

    We address the issue of reliably detecting and quantifying cross-frequency coupling (CFC) in neural time series. Based on non-linear auto-regressive models, the proposed method provides a generative and parametric model of the time-varying spectral content of the signals. As this method models the entire spectrum simultaneously, it avoids the pitfalls related to incorrect filtering or the use of the Hilbert transform on wide-band signals. As the model is probabilistic, it also provides a score of the model “goodness of fit” via the likelihood, enabling easy and legitimate model selection and parameter comparison; this data-driven feature is unique to our model-based approach. Using three datasets obtained with invasive neurophysiological recordings in humans and rodents, we demonstrate that these models are able to replicate previous results obtained with other metrics, but also reveal new insights such as the influence of the amplitude of the slow oscillation. Using simulations, we demonstrate that our parametric method can reveal neural couplings with shorter signals than non-parametric methods. We also show how the likelihood can be used to find optimal filtering parameters, suggesting new properties on the spectrum of the driving signal, but also to estimate the optimal delay between the coupled signals, enabling a directionality estimation in the coupling. PMID:29227989

  16. Magnitude conversion to unified moment magnitude using orthogonal regression relation

    Science.gov (United States)

    Das, Ranjit; Wason, H. R.; Sharma, M. L.

    2012-05-01

    Homogenization of an earthquake catalog, being a pre-requisite for seismic hazard assessment, requires region based magnitude conversion relationships. Linear Standard Regression (SR) relations fail when both the magnitudes have measurement errors. To accomplish homogenization, techniques like Orthogonal Standard Regression (OSR) are thus used. In this paper a technique is proposed for using such OSR for preparation of a homogenized earthquake catalog in moment magnitude \(M_w\). For derivation of the orthogonal regression relation between \(m_b\) and \(M_w\), a data set consisting of 171 events with observed body wave magnitude (\(m_{b,\mathrm{obs}}\)) and moment magnitude (\(M_{w,\mathrm{obs}}\)) values has been taken from ISC and GCMT databases for Northeast India and adjoining region for the period 1978-2006. Firstly, the OSR relation given below has been developed using \(m_{b,\mathrm{obs}}\) and \(M_{w,\mathrm{obs}}\) values corresponding to 150 events from this data set: \(M_w = 1.3(\pm 0.004)\, m_{b,\mathrm{proxy}} - 1.4(\pm 0.130)\), where \(m_{b,\mathrm{proxy}}\) are body wave magnitude values of the points on the OSR line given by the orthogonality criterion, for observed (\(m_{b,\mathrm{obs}}\), \(M_{w,\mathrm{obs}}\)) points. A linear relation is then developed between these 150 \(m_{b,\mathrm{obs}}\) values and corresponding \(m_{b,\mathrm{proxy}}\) values given by the OSR line using the orthogonality criterion. The relation obtained is \(m_{b,\mathrm{proxy}} = 0.878(\pm 0.03)\, m_{b,\mathrm{obs}} + 0.653(\pm 0.15)\). The accuracy of the above procedure has been checked with the rest of the data, i.e., the values for 21 events. The improvement in the correlation coefficient value between \(m_{b,\mathrm{obs}}\) and \(M_w\) estimated using the proposed procedure compared to the correlation coefficient value between \(m_{b,\mathrm{obs}}\) and \(M_{w,\mathrm{obs}}\) shows the advantage of the OSR relationship for homogenization. The OSR procedure developed in this study can be used to homogenize any catalog containing various magnitudes (e.g., \(M_L\), \(m_b\), \(M_S\)) with measurement errors, by their conversion to unified moment magnitude \(M_w\). The proposed procedure also remains valid in case the magnitudes have measurement errors of different orders, i.e. the error variance ratio is
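
    The two-step conversion described above can be sketched as a pair of linear maps. The coefficients are the central values quoted in the abstract (uncertainties are ignored). The function names are hypothetical, and the assignment of symbols (observed body wave magnitude to proxy, then proxy to moment magnitude) is an interpretation of the abstract's description rather than a confirmed transcription of the paper's equations.

```python
def mb_obs_to_proxy(mb_obs):
    """Map an observed body wave magnitude to its proxy value on the OSR line."""
    return 0.878 * mb_obs + 0.653


def mb_proxy_to_mw(mb_proxy):
    """OSR relation from the proxy body wave magnitude to moment magnitude."""
    return 1.3 * mb_proxy - 1.4


def mb_obs_to_mw(mb_obs):
    """Two-step conversion: observed mb -> proxy mb -> estimated Mw."""
    return mb_proxy_to_mw(mb_obs_to_proxy(mb_obs))


# Example: an event with an observed body wave magnitude of 5.0.
example_mw = mb_obs_to_mw(5.0)
```

    Composing the two relations gives a single linear rule, Mw = 1.1414 mb,obs - 0.5511, which is what a catalog conversion would apply in practice under these assumptions.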

  17. Wavelet regression model in forecasting crude oil price

    Science.gov (United States)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.

  18. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    Science.gov (United States)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model, the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression describes the probabilistic relationship between the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables; diagnostics are used to check that the assumptions hold, including independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity among students are influenced by obesity in the family and by the interaction between a student's ethnicity and routine meal intake. The odds of a student being overweight or obese are higher for a student with a family history of obesity and for a non-Malay student who frequently takes routine meals, as compared to a Malay student.
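
    The logit model described here, in which the log odds is a linear combination of the predictors, can be sketched as below. The coefficients and the predictor name are hypothetical values chosen for illustration; they are not results from this study.

```python
import math

def log_odds(intercept, coefficients, predictors):
    """Linear predictor: intercept plus the sum of coefficient * predictor."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

def probability(intercept, coefficients, predictors):
    """Inverse logit: converts the log odds into an event probability."""
    return 1.0 / (1.0 + math.exp(-log_odds(intercept, coefficients, predictors)))

def odds_ratio(coefficient):
    """Multiplicative change in the odds per one-unit predictor increase."""
    return math.exp(coefficient)

# Hypothetical model with one binary predictor (family history: 1 = yes).
intercept = -1.5
coefficients = [0.9]

p_without = probability(intercept, coefficients, [0])
p_with = probability(intercept, coefficients, [1])
print(f"P(event | no family history) = {p_without:.3f}")
print(f"P(event | family history)    = {p_with:.3f}")
print(f"odds ratio = {odds_ratio(coefficients[0]):.3f}")
```

    An odds ratio above 1 means the odds of the outcome rise with the predictor, which is how statements such as "the odds are higher for a student having a family history of obesity" are read.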

  19. Linear relationship between peak and season-long abundances in insects.

    Directory of Open Access Journals (Sweden)

    Ksenia S Onufrieva

    An accurate quantitative relationship between key characteristics of an insect population, such as season-long and peak abundances, can be very useful in pest management programs. To the best of our knowledge, no such relationship has yet been established. Here we establish a predictive linear relationship between insect catch during the week of peak abundance, Mpw, the length of the seasonal flight period, F (number of weeks), and season-long cumulative catch (abundance), A: A = 0.41 Mpw F. The derivation of the equation is based on several general assumptions and does not involve fitting to experimental data, which implies generality of the result. A quantitative criterion for the validity of the model is presented. The equation was tested using extensive data collected on captures of male gypsy moths Lymantria dispar (L.) (Lepidoptera: Erebidae) in pheromone-baited traps over 15 years. The model was also tested using trap catch data for two species of mosquitoes, Culex pipiens (L.) (Diptera: Culicidae) and Aedes albopictus (Skuse) (Diptera: Culicidae), in Gravid and BG-sentinel mosquito traps, respectively. The simple, parameter-free equation approximates experimental data points with a relative error of 13% and R² = 0.997 across all of the species tested. For gypsy moth, we also related season-long and weekly trap catches to the daily trap catches during peak flight. We describe several usage scenarios in which the derived relationships are employed to help link results of small-scale field studies to operational pest management programs.
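
    As a small worked example of the relationship A = 0.41 Mpw F quoted in this abstract, the sketch below evaluates it for one set of inputs. The function name and the input numbers are hypothetical, not data from the study.

```python
def season_long_abundance(peak_week_catch, flight_weeks):
    """Estimate season-long cumulative catch A from the peak-week catch Mpw
    and the flight period length F in weeks, using A = 0.41 * Mpw * F."""
    return 0.41 * peak_week_catch * flight_weeks

# Hypothetical inputs: 100 insects caught in the peak week, 10-week flight period.
estimate = season_long_abundance(100, 10)
print(f"Estimated season-long catch: {estimate:.0f}")
```

    The abstract also mentions a quantitative validity criterion for the model; it is not reproduced here because the abstract does not state it.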

  20. The Relationship between Counselors' Multicultural Counseling Competence and Poverty Beliefs

    Science.gov (United States)

    Clark, Madeline; Moe, Jeff; Hays, Danica G.

    2017-01-01

    The authors explored the relationship between counselors' multicultural counseling competence (MCC), poverty beliefs, and select demographic factors. Results of hierarchical linear regressions indicate that MCC is predictive of counselor individualistic and structural poverty beliefs. Implications for counselor multicultural training and immersion…