Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Weisberg, Sanford
2005-01-01
Master linear regression techniques with a new edition of a classic text. Reviews of the Second Edition: "I found it enjoyable reading and so full of interesting material that even the well-informed reader will probably find something new . . . a necessity for all of those who do linear regression." -Technometrics, February 1987 "Overall, I feel that the book is a valuable addition to the now considerable list of texts on applied linear regression. It should be a strong contender as the leading text for a first serious course in regression analysis." -American Scientist, May-June 1987
Weisberg, Sanford
2013-01-01
Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." -International Statistical Institute. The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus
Scaled Sparse Linear Regression
Sun, Tingni
2011-01-01
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual squares and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs nearly nothing beyond the computation of a path of the sparse regression estimator for penalty levels above a threshold. For the scaled Lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the method yields simultaneously an estimator for the noise level and an estimated coefficient vector in the Lasso path satisfying certain oracle inequalities for the estimation of the noise level, prediction, and the estimation of regression coefficients. These oracle inequalities provide sufficient conditions for the consistency and asymptotic...
Recursive Algorithm For Linear Regression
Varanasi, S. V.
1988-01-01
Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Seber, George A F
2012-01-01
Concise, mathematically clear, and comprehensive treatment of the subject. Expanded coverage of diagnostics and methods of model fitting. Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. More than 200 problems throughout the book plus outline solutions for the exercises. This revision has been extensively class-tested.
Linear regression in astronomy. II
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Linear regression in astronomy. I
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
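The five slope estimators named in this abstract can be illustrated with a short sketch. The formulas below are the commonly quoted closed forms for these lines, written from memory rather than taken from the abstract, and the function name `five_slopes` is hypothetical. Uncertainty estimates, which the paper also provides, are not attempted here.

```python
import math

def five_slopes(x, y):
    """Slopes of five symmetric/asymmetric line fits to bivariate data.

    Assumes the sample covariance is non-zero (otherwise the X-on-Y
    line is undefined and this function divides by zero).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sign = math.copysign(1.0, sxy)

    b1 = sxy / sxx                      # OLS of Y on X
    b2 = syy / sxy                      # OLS of X on Y, expressed as dY/dX
    # Line bisecting the angle between the two OLS lines.
    b3 = (b1 * b2 - 1.0 + math.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    # Orthogonal regression (minimizes perpendicular distances).
    d = b2 - 1.0 / b1
    b4 = 0.5 * (d + sign * math.sqrt(4.0 + d ** 2))
    # Reduced major axis: geometric mean of the two OLS slopes.
    b5 = sign * math.sqrt(b1 * b2)

    slopes = {"ols_yx": b1, "ols_xy": b2, "bisector": b3,
              "orthogonal": b4, "rma": b5}
    # Every one of these lines passes through the centroid (mx, my).
    return {name: (my - b * mx, b) for name, b in slopes.items()}
```

Each entry of the returned dictionary is an `(intercept, slope)` pair. On perfectly correlated data all five lines coincide; with scatter, the two OLS slopes bracket the three symmetric ones.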
Inferential Models for Linear Regression
Zuoyi Zhang
2011-09-01
Linear regression is arguably one of the most widely used statistical methods in applications. However, important problems, especially variable selection, remain a challenge for classical modes of inference. This paper develops a recently proposed framework of inferential models (IMs) in the linear regression context. In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense. Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis --- including model checking, variable selection, and prediction --- and for uncertain inference in general.
Practical Session: Multiple Linear Regression
Clausel, M.; Grégoire, G.
2014-12-01
Three exercises are proposed to illustrate the simple linear regression. The first one investigates the influence of several factors on atmospheric pollution. It has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data from 20 U.S. cities. Exercise 2 is an introduction to model selection whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).
Knowledge and Awareness: Linear Regression
Monika Raghuvanshi
2016-12-01
Knowledge and awareness are factors guiding development of an individual. These may seem simple and practicable, but in reality a proper combination of these is a complex task. Economically driven state of development in younger generations is an impediment to the correct manner of development. As youths are at the learning phase, they can be molded to follow a correct lifestyle. Awareness and knowledge are important components of any formal or informal environmental education. The purpose of this study is to evaluate the relationship of these components among students of secondary/senior secondary schools who have undergone a formal study of environment in their curricula. A suitable instrument is developed in order to measure the elements of Awareness and Knowledge among the participants of the study. Data was collected from various secondary and senior secondary school students in the age group 14 to 20 years using cluster sampling technique from the city of Bikaner, India. Linear regression analysis was performed using the IBM SPSS 23 statistical tool. There exists a weak relation between knowledge and awareness about environmental issues, caused by mishandling of routine practices; hence one component can be complemented by the other for improvement in both. Knowledge and awareness are crucial factors and can provide huge opportunities in any field. Resource utilization for economic solutions may pave the way for eco-friendly products and practices. If green practices are inculcated at the learning phase, they may become normal routine. This will also help in repletion of the environment.
Varying-coefficient functional linear regression
Wu, Yichao; Müller, Hans-Georg; 10.3150/09-BEJ231
2011-01-01
Functional linear regression analysis aims to model regression relations which include a functional predictor. The analog of the regression parameter vector or matrix in conventional multivariate or multiple-response linear regression models is a regression parameter function in one or two arguments. If, in addition, one has scalar predictors, as is often the case in applications to longitudinal studies, the question arises how to incorporate these into a functional regression model. We study a varying-coefficient approach where the scalar covariates are modeled as additional arguments of the regression parameter function. This extension of the functional linear regression model is analogous to the extension of conventional linear regression models to varying-coefficient models and shares its advantages, such as increased flexibility; however, the details of this extension are more challenging in the functional case. Our methodology combines smoothing methods with regularization by truncation at a finite numb...
Functional linear regression via canonical analysis
He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; 10.3150/09-BEJ228
2011-01-01
We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.
[From clinical judgment to linear regression model].
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as its first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "x" equals 0, and "b" (also called the slope) indicates the increase or decrease that occurs in "Y" when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R^2) indicates the importance of independent variables in the outcome.
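A minimal sketch of the quantities described in this abstract, assuming a single predictor: the intercept a, the slope b, and the coefficient of determination for the line Y = a + bx. The function name `fit_line` and the sample numbers are illustrative only and do not come from the paper.

```python
def fit_line(x, y):
    """Least-squares fit of Y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                 # change in Y per one-unit change in x
    a = my - b * mx               # value of Y when x equals 0
    r_squared = (sxy * sxy) / (sxx * syy)   # share of variance explained
    return a, b, r_squared

# Illustrative data, not taken from the paper.
a, b, r2 = fit_line([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
prediction_at_6 = a + b * 6
```

For one predictor, R^2 equals the squared correlation between x and Y, which is why it can be computed directly from the three sums of squares.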
Discriminative Elastic-Net Regularized Linear Regression.
Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen
2017-03-01
In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to make the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html.
LINEAR REGRESSION WITH R AND HADOOP
Bogdan OANCEA
2015-07-01
In this paper we present a way to solve the linear regression model with R and Hadoop using the RHadoop library. We show how the linear regression model can be solved even for very large models that require special technologies. For storing the data we used Hadoop and for computation we used R. The interface between R and Hadoop is the open source library RHadoop. We present the main features of the Hadoop and R software systems and the way of interconnecting them. We then show how the least squares solution for the linear regression problem can be expressed in terms of the map-reduce programming paradigm and how it can be implemented using the RHadoop library.
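The idea of expressing least squares as map-reduce can be sketched without Hadoop. The example below is not the paper's RHadoop code: it uses plain Python, with `functools.reduce` standing in for the reduce phase, and it is restricted to one predictor so that the normal equations can be solved by hand. The names `map_chunk`, `combine` and `solve` are hypothetical. The point is that each data chunk only has to emit a few sums, which can be added in any order.

```python
from functools import reduce

def map_chunk(rows):
    """Map step: reduce one chunk of (x, y) rows to its sufficient statistics."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)

def combine(s, t):
    """Reduce step: sufficient statistics from two chunks simply add."""
    return tuple(a + b for a, b in zip(s, t))

def solve(stats):
    """Solve the 2x2 normal equations for intercept and slope."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Two chunks that could live on different nodes (illustrative data).
chunks = [[(0, -1), (1, 2)], [(2, 5), (3, 8), (4, 11)]]
intercept, slope = solve(reduce(combine, map(map_chunk, chunks)))
```

With several predictors the same pattern applies, with each chunk emitting its contribution to X'X and X'y instead of five scalars.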
Removing Malmquist bias from linear regressions
Verter, Frances
1993-01-01
Malmquist bias is present in all astronomical surveys where sources are observed above an apparent brightness threshold. Those sources which can be detected at progressively larger distances are progressively more limited to the intrinsically luminous portion of the true distribution. This bias does not distort any of the measurements, but distorts the sample composition. We have developed the first treatment to correct for Malmquist bias in linear regressions of astronomical data. A demonstration of the corrected linear regression that is computed in four steps is presented.
Finite Algorithms for Robust Linear Regression
Madsen, Kaj; Nielsen, Hans Bruun
1990-01-01
The Huber M-estimator for robust linear regression is analyzed. Newton type methods for solution of the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate efficiency and indicate that this kind of approach may...
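For orientation, the Huber M-estimator can be computed for a straight-line fit as sketched below. This is not the Newton-type method analyzed in the paper: it swaps in iteratively reweighted least squares (IRLS), a simpler and widely used scheme, and it assumes the residual scale is known and equal to 1 so that the tuning constant `c` is expressed in data units. The function names are hypothetical.

```python
def weighted_line(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

def huber_line(x, y, c=1.0, max_iter=200, tol=1e-10):
    """Huber M-estimate of a line via IRLS (scale assumed known and equal to 1)."""
    a, b = weighted_line(x, y, [1.0] * len(x))   # start from ordinary least squares
    for _ in range(max_iter):
        weights = []
        for xi, yi in zip(x, y):
            r = abs(yi - (a + b * xi))
            # Huber weights: full weight for small residuals, c/|r| beyond c.
            weights.append(1.0 if r <= c else c / r)
        a_new, b_new = weighted_line(x, y, weights)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b
```

Large residuals are down-weighted rather than discarded, so a single gross outlier shifts the fitted line far less than it shifts the ordinary least-squares line.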
Post-processing through linear regression
B. Van Schaeybroeck
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.
These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Post-processing through linear regression
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Controlling attribute effect in linear regression
Calders, Toon
2013-12-01
In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models. © 2013 IEEE.
Genetic Programming Transforms in Linear Regression Situations
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
A tutorial on Bayesian Normal linear regression
Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens
2015-12-01
Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
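The closed-form posterior mentioned in this abstract can be illustrated under stated assumptions. The abstract does not name the prior class; the sketch below assumes the conjugate normal-inverse-gamma family, and it simplifies to a single coefficient with no intercept (y = beta * x + noise) so that no matrix algebra is needed. The function name `nig_posterior` and the hyperparameter names are hypothetical.

```python
def nig_posterior(x, y, m0, v0, a0, b0):
    """Conjugate update for y_i = beta * x_i + e_i, with e_i ~ N(0, sigma^2).

    Assumed prior: beta | sigma^2 ~ N(m0, sigma^2 * v0),
                   sigma^2 ~ Inverse-Gamma(a0, b0).
    Returns the posterior hyperparameters (mn, vn, an, bn) of the same family.
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    vn = 1.0 / (1.0 / v0 + sxx)          # posterior scale of beta
    mn = vn * (m0 / v0 + sxy)            # posterior mean of beta
    an = a0 + n / 2.0
    bn = b0 + 0.5 * (syy + m0 * m0 / v0 - mn * mn / vn)
    return mn, vn, an, bn

# Illustrative data and a deliberately vague prior (not from the tutorial).
mn, vn, an, bn = nig_posterior([1, 2, 3, 4, 5], [2, 4, 6, 8, 10],
                               m0=0.0, v0=100.0, a0=1.0, b0=1.0)
sigma2_mean = bn / (an - 1.0)            # posterior mean of sigma^2 (needs an > 1)
```

Because the posterior stays in the prior's family, the update is a handful of sums and no sampling is required, which is the advantage the abstract attributes to its choice of priors.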
A Gibbs Sampler for Multivariate Linear Regression
Mantz, Adam B
2015-01-01
Kelly (2007, hereafter K07) described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modeled by a flexible mixture of Gaussians rather than assumed to be uniform. Here I extend the K07 algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Second, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamica...
Privacy Preserving Linear Regression on Distributed Databases
Fida K. Dankar
2015-04-01
Studies that combine data from multiple sources can tremendously improve the outcome of the statistical analysis. However, combining data from these various sources for analysis poses privacy risks. A number of protocols have been proposed in the literature to address the privacy concerns; however they do not fully deliver on either privacy or complexity. In this paper, we present a (theoretical) privacy preserving linear regression model for the analysis of data owned by several sources. The protocol uses a semi-trusted third party and delivers on privacy and complexity.
Neutrosophic Correlation and Simple Linear Regression
A. A. Salama
2014-09-01
Since the world is full of indeterminacy, neutrosophics have found their place in contemporary research. The fundamental concepts of the neutrosophic set were introduced by Smarandache. Recently, Salama et al. introduced the concept of the correlation coefficient of neutrosophic data. In this paper, we introduce and study the concepts of correlation and correlation coefficient of neutrosophic data in probability spaces and study some of their properties. Also, we introduce and study the neutrosophic simple linear regression model. Possible applications to data processing are touched upon.
Revisit of Sheppard corrections in linear regression
(no author listed)
2010-01-01
Dempster and Rubin (D&R) in their JRSSB paper considered the statistical error caused by data rounding in a linear regression model and compared the Sheppard correction, BRB correction and the ordinary LSE by simulations. Some asymptotic results when the rounding scale tends to 0 were also presented. In a previous research, we found that the ordinary sample variance of rounded data from normal populations is always inconsistent while the sample mean of rounded data is consistent if and only if the true mean is a multiple of the half rounding scale. In the light of these results, in this paper we further investigate the rounding errors in linear regressions. We notice that these results form the basic reasons that the Sheppard corrections perform better than other methods in D&R examples and their conclusion in general cases is incorrect. Examples in which the Sheppard correction works worse than the BRB correction are also given. Furthermore, we propose a new approach to estimate the parameters, called "two-stage estimator", and establish the consistency and asymptotic normality of the new estimators.
Hierarchical linear regression models for conditional quantiles
TIAN Maozai; CHEN Gemai
2006-01-01
The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models, but it cannot deal effectively with the data with a hierarchical structure. In practice, the existence of such data hierarchies is neither accidental nor ignorable, it is a common phenomenon. To ignore this hierarchical data structure risks overlooking the importance of group effects, and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid. On the other hand, the hierarchical models take a hierarchical data structure into account and have also many applications in statistics, ranging from overdispersion to constructing min-max estimators. However, the hierarchical models are virtually the mean regression, therefore, they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates. Furthermore, the estimated coefficient vector (marginal effects) is sensitive to an outlier observation on the dependent variable. In this article, a new approach, which is based on the Gauss-Seidel iteration and taking a full advantage of the quantile regression and hierarchical models, is developed. On the theoretical front, we also consider the asymptotic properties of the new method, obtaining the simple conditions for an n^{1/2}-convergence and an asymptotic normality. We also illustrate the use of the technique with the real educational data which is hierarchical and how the results can be explained.
Mahani, Mohamad Khayatzadeh; Chaloosi, Marzieh; Maragheh, Mohamad Ghanadi; Khanchi, Ali Reza; Afzali, Daryoush
2007-09-01
The oral acute in vivo toxicity of 32 amine and amide drugs was related to their structural-dependent properties. Genetic algorithm-partial least-squares and stepwise variable selection were applied to select meaningful descriptors. Multiple linear regression (MLR), artificial neural network (ANN) and partial least square (PLS) models were created with selected descriptors. The predictive ability of all three models was evaluated and compared on a set of five drugs, which were not used in modeling steps. Average errors of 0.168, 0.169 and 0.259 were obtained for MLR, ANN and PLS, respectively.
Fuzzy multiple linear regression: A computational approach
Juang, C. H.; Huang, X. H.; Fleming, J. W.
1992-01-01
This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.
Linear Regression Based Real-Time Filtering
Misel Batmend
2013-01-01
This paper introduces a real-time filtering method based on a line fitted by linear least squares. The method can be used when the filtered signal is linear. This constraint narrows the band of potential applications. Its advantage over the Kalman filter is that it is computationally less expensive. The paper further deals with the application of the method to filtering data used to evaluate the position of engraved material with respect to an engraving machine. The filter was implemented in the CNC engraving machine control system. Experiments showing its performance are included.
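The filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `line_fit_filter`, the sliding-window design, and the window length are assumptions made here for illustration. Each output value is the least-squares line through the most recent samples, evaluated at the newest sample.

```python
import numpy as np

def line_fit_filter(samples, window=5):
    """Smooth a signal by fitting a least-squares line to a sliding window.

    Hypothetical helper: each output is the fitted line evaluated at the
    newest sample in the window.
    """
    samples = np.asarray(samples, dtype=float)
    out = np.empty_like(samples)
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        y = samples[start:i + 1]
        t = np.arange(len(y), dtype=float)
        if len(y) < 2:
            # A single point cannot define a line; pass it through.
            out[i] = y[-1]
        else:
            slope, intercept = np.polyfit(t, y, 1)
            out[i] = intercept + slope * t[-1]
    return out
```

A noise-free linear signal passes through such a filter unchanged, which matches the abstract's remark that the method suits signals that are linear.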
Augmenting Data with Published Results in Bayesian Linear Regression
de Leeuw, Christiaan; Klugkist, Irene
2012-01-01
In most research, linear regression analyses are performed without taking into account published results (i.e., reported summary statistics) of similar previous studies. Although the prior density in Bayesian linear regression could accommodate such prior knowledge, formal models for doing so are absent from the literature. The goal of this…
Who Will Win?: Predicting the Presidential Election Using Linear Regression
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
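The normal equations mentioned in the abstract can be sketched in a few lines. This is an illustration only: the function name and the toy data are made up here and are not the election data used in the classroom activity. For the line y = a + bx, the normal equations are (XᵀX)β = Xᵀy, where X has a column of ones and a column of x values.

```python
import numpy as np

def fit_line_normal_equations(x, y):
    """Solve the normal equations (X^T X) beta = X^T y for y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta[0], beta[1]  # intercept a, slope b

# Made-up toy data lying exactly on y = 1 + 2x.
a, b = fit_line_normal_equations([0, 1, 2, 3], [1, 3, 5, 7])
```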
Local Linear Regression for Data with AR Errors
Runze Li; Yan Li
2009-01-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for nonparametric regression by using the local linear regression method and profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of the auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedures and to compare their performance with the existing one. In our empirical studies, the newly proposed procedures dramatically improve on the accuracy of naive local linear regression with a working-independence error structure. We illustrate the proposed methodology by an analysis of a real data set.
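For orientation, the naive local linear estimator that the abstract uses as its baseline (working-independence errors) can be sketched as a kernel-weighted least-squares fit. This sketch does not implement the authors' profile least squares or SCAD steps; the Gaussian kernel, the function name, and the bandwidth parameter are assumptions made here.

```python
import numpy as np

def local_linear(x, y, x0, bandwidth):
    """Naive local linear estimate of E[y | x = x0] (independent errors assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)   # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])   # local intercept and slope
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)       # weighted normal equations
    return beta[0]                                   # intercept = fitted value at x0
```

Because the local model is a line, the estimator reproduces an exactly linear mean function for any bandwidth.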
Compound Identification Using Penalized Linear Regression on Metabolomics.
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2016-05-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson's correlation along with the penalized linear regression are proposed in this study.
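The singularity problem and the ridge remedy can be shown with a small sketch on random data. It is not the study's pipeline: the function name, the penalty value, and the synthetic matrix are assumptions. With more columns than rows, XᵀX is singular, so ordinary least squares has no unique solution, while the ridge system (XᵀX + λI)β = Xᵀy is always solvable for λ > 0.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic example: 5 observations, 20 predictors (p > n).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))
y = rng.normal(size=5)
beta = ridge(X, y, lam=1.0)
```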
Use of probabilistic weights to enhance linear regression myoelectric control
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts' law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p [...] amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
Distributed Monitoring of the R2 Statistic for Linear Regression
National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...
Spectral Experts for Estimating Mixtures of Linear Regressions
Chaganty, Arun Tejasvi; Liang, Percy
2013-01-01
Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of ...
Identification of Influential Points in a Linear Regression Model
Jan Grosz
2011-03-01
The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independent) datasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.
Learning a Nonnegative Sparse Graph for Linear Regression.
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung
2015-09-01
Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning a NNSG for linear regression was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perceptiveness for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
Evaluation of Linear Regression Simultaneous Myoelectric Control Using Intramuscular EMG.
Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J
2016-04-01
The objective of this study was to evaluate the ability of linear regression models to decode patterns of muscle coactivation from intramuscular electromyogram (EMG) and provide simultaneous myoelectric control of a virtual 3-DOF wrist/hand system. Performance was compared to the simultaneous control of conventional myoelectric prosthesis methods using intramuscular EMG (parallel dual-site control)-an approach that requires users to independently modulate individual muscles in the residual limb, which can be challenging for amputees. Linear regression control was evaluated in eight able-bodied subjects during a virtual Fitts' law task and was compared to performance of eight subjects using parallel dual-site control. An offline analysis also evaluated how different types of training data affected prediction accuracy of linear regression control. The two control systems demonstrated similar overall performance; however, the linear regression method demonstrated improved performance for targets requiring use of all three DOFs, whereas parallel dual-site control demonstrated improved performance for targets that required use of only one DOF. Subjects using linear regression control could more easily activate multiple DOFs simultaneously, but often experienced unintended movements when trying to isolate individual DOFs. Offline analyses also suggested that the method used to train linear regression systems may influence controllability. Linear regression myoelectric control using intramuscular EMG provided an alternative to parallel dual-site control for 3-DOF simultaneous control at the wrist and hand. The two methods demonstrated different strengths in controllability, highlighting the tradeoff between providing simultaneous control and the ability to isolate individual DOFs when desired.
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
A Bayesian approach to linear regression in astronomy
Sereno, Mauro
2015-01-01
Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical modeling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modeling. The R-package LIRA (LInear Regression in Astronomy) is made available to perform the regression.
Exploring compact reinforcement-learning representations with linear regression
Walsh, Thomas J; Diuk, Carlos; Littman, Michael L
2012-01-01
This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.
Biostatistics Series Module 6: Correlation and Linear Regression.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
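The quantities named in this abstract can be computed directly, as in the following sketch. The function names and the five data points are made up for illustration. It computes Pearson's r, fits the line y = a + bx by least squares, and shows that in simple linear regression r² equals the proportion of variance in y explained by the fitted line.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two numeric arrays."""
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    xc = x - x.mean()
    b = (xc @ (y - y.mean())) / (xc @ xc)
    a = y.mean() - b * x.mean()
    return a, b

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
r = pearson_r(x, y)
a, b = least_squares_line(x, y)
residuals = y - (a + b * x)
r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
```

A scatter plot of x against y remains the first check before trusting either number, as the abstract stresses.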
On the null distribution of Bayes factors in linear regression
We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...
Common pitfalls in statistical analysis: Linear regression analysis.
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
Fractals with point impact in functional linear regression
McKeague, Ian W; 10.1214/10-AOS791
2010-01-01
This paper develops a point impact linear regression model in which the trajectory of a continuous stochastic process, when evaluated at a sensitive time point, is associated with a scalar response. The proposed model complements and is more interpretable than the functional linear regression approach that has become popular in recent years. The trajectories are assumed to have fractal (self-similar) properties in common with a fractional Brownian motion with an unknown Hurst exponent. Bootstrap confidence intervals based on the least-squares estimator of the sensitive time point are developed. Misspecification of the point impact model by a functional linear model is also investigated. Non-Gaussian limit distributions and rates of convergence determined by the Hurst exponent play an important role.
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
Implementing fuzzy polynomial interpolation (FPI) and fuzzy linear regression (LFR)
Maria Cristina Floreno
1996-05-01
This paper presents some preliminary results arising within a general framework concerning the development of software tools for fuzzy arithmetic. The program is in a preliminary stage. What has already been implemented consists of a set of routines for elementary operations, optimized function evaluation, interpolation and regression. Some of these have been applied to real problems. This paper describes a prototype of a library in C++ for polynomial interpolation of fuzzifying functions, a set of routines in FORTRAN for fuzzy linear regression and a program with a graphical user interface allowing the use of such routines.
A linear regression solution to the spatial autocorrelation problem
Griffith, Daniel A.
The Moran Coefficient spatial autocorrelation index can be decomposed into orthogonal map pattern components. This decomposition relates it directly to standard linear regression, in which corresponding eigenvectors can be used as predictors. This paper reports comparative results between these linear regressions and their auto-Gaussian counterparts for the following georeferenced data sets: Columbus (Ohio) crime, Ottawa-Hull median family income, Toronto population density, southwest Ohio unemployment, Syracuse pediatric lead poisoning, and Glasgow standard mortality rates, and a small remotely sensed image of the High Peak district. This methodology is extended to auto-logistic and auto-Poisson situations, with selected data analyses including percentage of urban population across Puerto Rico, and the frequency of SIDs cases across North Carolina. These data analytic results suggest that this approach to georeferenced data analysis offers considerable promise.
Deduction of Oral Cancer Using Fuzzy Linear Regression
S. Arulchinnappan
2011-01-01
Problem statement: To examine the risk factors of oral cancer at an early stage. Smoking, chewing, and drinking, the major risk factors that cause oral cancer, are considered as input variables. Approach: A case-control study was conducted at JKK Nataraj Dental College and Hospital in Namakkal District, Tamilnadu, India, during the period from September 2007 to November 2009. Data collected were analyzed using fuzzy linear regression, for which a Java program was developed. Results: Using this fuzzy linear regression model, smoking, drinking, and chewing were identified as potent risk factors of oral cancer. Conclusion: Smoking, drinking, and chewing are the most dangerous risk factors that will cause oral cancer. This study will help to improve clinical practice and provide guidance for analyzing the risk factors of oral cancer.
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Return-Volatility Relationship: Insights from Linear and Non-Linear Quantile Regression
D.E. Allen (David); A.K. Singh (Abhay); R.J. Powell (Robert); M.J. McAleer (Michael); J. Taylor (James); L. Thomas (Lyn)
2013-01-01
The purpose of this paper is to examine the asymmetric relationship between price and implied volatility and the associated extreme quantile dependence using linear and nonlinear quantile regression approaches. Our goal in this paper is to demonstrate that the relationship between the...
Smith, Paul F; Ganesh, Siva; Liu, Ping
2013-10-30
Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R(2) values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R(2) values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
Local Linear Regression on Manifolds and its Geometric Interpretation
Cheng, Ming-Yen
2012-01-01
We study nonparametric regression with high-dimensional data, when the predictors lie on an unknown, lower-dimensional manifold. In this context, Aswani et al. (2011) recently suggested performing the conventional local linear regression (LLR) in the ambient space and regularizing the estimation problem using information obtained from learning the manifold locally. By contrast, our approach is to reduce the dimensionality first and then construct the LLR directly on a tangent plane approximation to the manifold. Under mild conditions, asymptotic expressions for the conditional mean squared error of the proposed estimator are derived for both the interior and the boundary cases. One implication of these results is that the optimal convergence rate depends only on the intrinsic dimension $d$ of the manifold, but not on the ambient space dimension $p$. Another implication is that the estimator is design adaptive and automatically adapts to the boundary of the unknown manifold. The bias and variance expressi...
Relative Importance for Linear Regression in R: The Package relaimpo
Ulrike Gromping
2006-09-01
Relative importance is a topic that has seen a lot of interest in recent years, particularly in applied work. The R package relaimpo implements six different metrics for assessing relative importance of regressors in the linear model, two of which are recommended: averaging over orderings of regressors and a newly proposed metric (Feldman 2005) called pmvd. Apart from delivering the metrics themselves, relaimpo also provides (exploratory) bootstrap confidence intervals. This paper offers a brief tutorial introduction to the package. The methods and relaimpo’s functionality are illustrated using the data set swiss that is generally available in R. The paper targets readers who have a basic understanding of multiple linear regression. For the background of more advanced aspects, references are provided.
Qiutong Jin
2016-06-01
Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) methods were employed using precipitation data from the period 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had a high accuracy and captured detailed spatial distribution data; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.
The Role of Data Range in Linear Regression
da Silva, M. A. Salgueiro; Seixas, T. M.
2017-09-01
Measuring one physical quantity as a function of another often requires making some choices prior to the measurement process. Two of these choices are: the data range where measurements should focus and the number (n) of data points to acquire in the chosen data range. Here, we consider data range as the interval of variation of the independent variable (x) that is associated with a given interval of variation of the dependent variable (y). We analyzed the role of the width and lower endpoint of measurement data range on parameter estimation by linear regression. We show that, when feasible, increasing data range width is more effective than increasing the number of data points on the same data range in reducing the uncertainty in the slope of a regression line. Moreover, the uncertainty in the intercept of a regression line depends not only on the number of data points but also on the ratio between the lower endpoint and the width of the measurement data range, reaching its minimum when the dataset is centered at the ordinate axis. Since successful measurement methodologies require a good understanding of factors ruling data analysis, it is pedagogically justified and highly recommended to teach these two subjects alongside each other.
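The two claims in this abstract follow from the textbook standard errors of a least-squares line with independent, equal-variance noise. The sketch below is an illustration under that assumption, with equally spaced points and hypothetical function names; it is not the authors' code. It compares doubling the number of points with doubling the range width, and an off-center range with a centered one.

```python
import math


def slope_intercept_se(xs, sigma):
    """Standard errors of slope and intercept for OLS with noise s.d. sigma."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    se_slope = sigma / math.sqrt(sxx)
    se_intercept = sigma * math.sqrt(1.0 / n + xbar ** 2 / sxx)
    return se_slope, se_intercept


def grid(lower, width, n):
    """n equally spaced x values covering [lower, lower + width]."""
    step = width / (n - 1)
    return [lower + i * step for i in range(n)]


sigma = 1.0  # assumed noise level, arbitrary units
base = slope_intercept_se(grid(1.0, 1.0, 10), sigma)         # reference design
more_points = slope_intercept_se(grid(1.0, 1.0, 20), sigma)  # twice the points
wider = slope_intercept_se(grid(1.0, 2.0, 10), sigma)        # twice the width
centered = slope_intercept_se(grid(-0.5, 1.0, 10), sigma)    # range centered on x = 0

print("slope s.e.: base %.3f, more points %.3f, wider %.3f"
      % (base[0], more_points[0], wider[0]))
print("intercept s.e.: base %.3f, centered %.3f" % (base[1], centered[1]))
```

The slope uncertainty scales as 1/width but only roughly as 1/sqrt(n), which is why widening the range wins. The intercept uncertainty contains the term xbar²/Sxx, which vanishes when the data are centered on the ordinate axis, leaving sigma/sqrt(n).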
Robust linear registration of CT images using random regression forests
Konukoglu, Ender; Criminisi, Antonio; Pathak, Sayan; Robertson, Duncan; White, Steve; Haynor, David; Siddiqui, Khan
2011-03-01
Global linear registration is a necessary first step for many different tasks in medical image analysis. Comparing longitudinal studies, cross-modality fusion, and many other applications depend heavily on the success of the automatic registration. The robustness and efficiency of this step is crucial as it affects all subsequent operations. Most common techniques cast the linear registration problem as the minimization of a global energy function based on the image intensities. Although these algorithms have proved useful, their robustness in fully automated scenarios is still an open question. In fact, the optimization step often gets caught in local minima, yielding unsatisfactory results. Recent algorithms constrain the space of registration parameters by exploiting implicit or explicit organ segmentations, thus increasing robustness. In this work we propose a novel robust algorithm for automatic global linear image registration. Our method uses random regression forests to estimate posterior probability distributions for the locations of anatomical structures, represented as axis-aligned bounding boxes. These posterior distributions are later integrated in a global linear registration algorithm. The biggest advantage of our algorithm is that it does not require pre-defined segmentations or regions, yet it yields robust registration results. We compare the robustness of our algorithm with that of the state-of-the-art Elastix toolbox. Validation is performed via 1464 pair-wise registrations in a database of very diverse 3D CT images. We show that our method decreases the "failure" rate of the global linear registration from 12.5% (Elastix) to only 1.9%.
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension, the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method described by a first-degree equation, the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of the AIC criterion (criterion 3) and use of the F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxin, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically, parabola-type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second
Statistical Inference for Partially Linear Regression Models with Measurement Errors
Jinhong YOU; Qinfeng XU; Bin ZHOU
2008-01-01
In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly, a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed, which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. Same as "Variable selection via nonconcave penalized likelihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regularization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.
Prediction by linear regression on a quantum computer
Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco
2016-08-01
We give an algorithm for prediction on a quantum computer which is based on a linear regression model with least-squares optimization. In contrast to related previous contributions suffering from the problem of reading out the optimal parameters of the fit, our scheme focuses on the machine-learning task of guessing the output corresponding to a new input given examples of data points. Furthermore, we adapt the algorithm to process nonsparse data matrices that can be represented by low-rank approximations, and significantly improve the dependency on its condition number. The prediction result can be accessed through a single-qubit measurement or used for further quantum information processing routines. The algorithm's runtime is logarithmic in the dimension of the input space provided the data is given as quantum information as an input to the routine.
Contiguous Uniform Deviation for Multiple Linear Regression in Pattern Recognition
Andriana, A. S.; Prihatmanto, D.; Hidaya, E. M. I.; Supriana, I.; Machbub, C.
2017-01-01
Understanding images by recognizing their objects is still a challenging task. Face element detection has been developed by researchers but does not yet provide enough information (low resolution in information) for recognizing objects. Available face recognition methods still make classification errors and need a huge number of examples, which may still be incomplete. Another approach to understanding images, which is still rare, uses pattern structures or syntactic grammars describing detailed shape features. Image pixel values are also processed as signal patterns which are approximated by mathematical function curve fitting. This paper attempts to add the contiguous uniform deviation method to the curve fitting algorithm to increase applicability in image recognition systems related to object movement. The combination of multiple linear regression and the contiguous uniform deviation method is applied to the function of image pixel values, and shows results with higher resolution (more information) in the detailed description of visual objects in object movement.
Robust linear regression with broad distributions of errors
Postnikov, Eugene B
2015-01-01
We consider the problem of linear fitting of noisy data in the case of broad (say $\alpha$-stable) distributions of random impacts ("noise"), which can lack even the first moment. This situation, common in statistical physics of small systems, in Earth sciences, in network science or in econophysics, does not allow for application of conventional Gaussian maximum-likelihood estimators resulting in usual least-squares fits. Such fits lead to large deviations of fitted parameters from their true values due to the presence of outliers. The approaches discussed here aim at minimizing the width of the distribution of residuals. The corresponding width of the distribution can either be defined via the interquantile distance of the corresponding distributions or via the scale parameter in its characteristic function. The methods provide robust regression even in the case of short samples with large outliers, and are equivalent to the normal least squares fit for Gaussian noise. Our discussion is il...
Wavelet-based LASSO in functional linear regression.
Zhao, Yihong; Ogden, R Todd; Reiss, Philip T
2012-07-01
In linear regression with functional predictors and scalar responses, it may be advantageous, particularly if the function is thought to contain features at many scales, to restrict the coefficient function to the span of a wavelet basis, thereby converting the problem into one of variable selection. If the coefficient function is sparsely represented in the wavelet domain, we may employ the well-known LASSO to select a relatively small number of nonzero wavelet coefficients. This is a natural approach to take but to date, the properties of such an estimator have not been studied. In this paper we describe the wavelet-based LASSO approach to regressing scalars on functions and investigate both its asymptotic convergence and its finite-sample performance through both simulation and real-data application. We compare the performance of this approach with existing methods and find that the wavelet-based LASSO performs relatively well, particularly when the true coefficient function is spiky. Source code to implement the method and data sets used in the study are provided as supplemental materials available online.
K factor estimation in distribution transformers using linear regression models
Juan Miguel Astorga Gómez
2016-06-01
Background: Due to the massive incorporation of electronic equipment into distribution systems, distribution transformers are subject to operating conditions other than the design ones because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor from the total current harmonic content obtained with low-cost equipment. Method: Two distribution transformers that feed different loads are studied; the variables current total harmonic distortion (THDi) and K-factor are recorded, and the regression model that best fits the field data is determined. To select the regression model, the coefficient of determination R2 and the Akaike Information Criterion (AIC) are used. With the selected model, the K-factor is estimated for actual operating conditions. Results: Once the model was determined, it was found that for both the agricultural load and the industrial mining load, the present harmonic content (THDi) exceeds the values that these transformers can handle (average of 12.54% and minimum of 8.90% in the agricultural case, and average of 18.53% and minimum of 6.80% in the industrial mining case). Conclusions: When estimating the K-factor using polynomial models, it was determined that the studied transformers cannot withstand the current total harmonic distortion of their present loads. The appropriate K-factor for the studied transformers should be 4; this allows the transformers to support the current total harmonic distortion of their respective loads.
Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions.
Kumar, Gaurav; Bajaj, Rakesh Kumar
2014-01-01
In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept to triangular intuitionistic fuzzy numbers, establishing a one-to-one correspondence with their position and entropies. Using the concept of fuzzy entropy, the estimators of the intuitionistic fuzzy regression coefficients have been obtained in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Jaber Almedeij
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. A multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in reasonable agreement with observed values.
Rafidah Ismail
2014-12-01
In this study, linear relationships between response and concentration were used to estimate the detection limit (DL) and quantification limit (QL) for five avermectins: emamectin, abamectin, doramectin, moxidectin, and ivermectin. Estimation of DL and QL was based on the standard deviation of residual and y-intercept of the regression line at low concentrations of avermectins, using the dispersive solid-phase extraction procedure. Avermectin extracts were analyzed using liquid chromatography tandem mass spectrometry. Based on the regression slope, DL and QL were higher at concentrations of 0.3–0.4 μg/kg and 1 μg/kg, respectively, for all avermectin compounds. Linearity assessment was performed by linear regression, which incorporated a regression model, outlier rejection, and evaluation of the assumption with a significance test. For all avermectins, there is a significant correlation between response and concentration in the range 1–15 μg/kg, and the y-intercept passes through the origin (zero).
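A calibration-based estimate of DL and QL can be sketched as follows. The abstract does not state the multipliers used, so the factors 3.3 and 10 below are an assumption taken from the common ICH Q2 convention (limit = factor × residual standard deviation / slope). The calibration data and function names are hypothetical, not values from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def detection_limits(xs, ys, dl_factor=3.3, ql_factor=10.0):
    """DL and QL from the residual standard deviation and the calibration slope.

    The default factors are an assumed convention, not taken from the abstract.
    """
    slope, intercept = fit_line(xs, ys)
    n = len(xs)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_res = math.sqrt(rss / (n - 2))  # residual standard deviation
    return dl_factor * s_res / slope, ql_factor * s_res / slope


# Hypothetical calibration: concentration (ug/kg) against instrument response.
conc = [1.0, 2.0, 5.0, 10.0, 15.0]
resp = [10.2, 19.7, 50.5, 99.6, 150.3]

dl, ql = detection_limits(conc, resp)
print("DL = %.3f ug/kg, QL = %.3f ug/kg" % (dl, ql))
```

Some protocols use the standard deviation of the y-intercept in place of the residual standard deviation; the abstract mentions both, so either variant may correspond to what the authors computed.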
The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?
Jiangshan Lai
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
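The two fitting strategies compared in this abstract can be sketched for the power law M = a·D^b. The code below is an illustration only, with hypothetical function names and synthetic data. LR fits a straight line to (log D, log M), which corresponds to multiplicative error. NLR minimizes squared error on the original scale, which corresponds to additive error; here it is done by profiling out a in closed form and running a golden-section search over b, which assumes the profiled error is unimodal on the search interval.

```python
import math


def fit_loglog(d, m):
    """LR on log-transformed data: log M = log a + b log D."""
    lx = [math.log(v) for v in d]
    ly = [math.log(v) for v in m]
    n = len(lx)
    xbar = sum(lx) / n
    ybar = sum(ly) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(lx, ly))
         / sum((x - xbar) ** 2 for x in lx))
    return math.exp(ybar - b * xbar), b


def fit_nls(d, m, b_lo=0.5, b_hi=4.0, iters=100):
    """NLR on the original scale via profile least squares over the exponent b."""

    def profile(b):
        p = [v ** b for v in d]
        a = sum(pi * mi for pi, mi in zip(p, m)) / sum(pi * pi for pi in p)
        rss = sum((mi - a * pi) ** 2 for pi, mi in zip(p, m))
        return rss, a

    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio
    lo, hi = b_lo, b_hi
    for _ in range(iters):
        c = hi - g * (hi - lo)
        e = lo + g * (hi - lo)
        if profile(c)[0] < profile(e)[0]:
            hi = e
        else:
            lo = c
    b = (lo + hi) / 2.0
    return profile(b)[1], b


# Synthetic, noise-free data from M = 0.02 * D^2.5 (hypothetical values).
diam = [5.0, 10.0, 20.0, 40.0, 80.0]
mass = [0.02 * v ** 2.5 for v in diam]

a_lr, b_lr = fit_loglog(diam, mass)
a_nl, b_nl = fit_nls(diam, mass)
```

On noise-free data both methods recover the same parameters. They diverge once noise is added, because NLR lets the largest trees dominate the fit, which is consistent with the abstract's warning about stands dominated by smaller trees.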
Morales, Esteban; de Leon, John Mark S; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph
2016-03-01
The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, Glaucoma Hemifield Test (GHT), and weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values of the differences between the observed and the predicted thresholds at the last two follow-ups indicated better model predictions. The mean (SD) follow-up times for the smoothing and prediction phase were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were as follows: unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions; PER also predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. The application of smoothing algorithms on VF data can improve forecasting in VF points to assist in treatment decisions.
Widely Linear Complex-Valued Kernel Methods for Regression
Boloix-Tortosa, Rafael; Murillo-Fuentes, Juan Jose; Santos, Irene; Perez-Cruz, Fernando
2017-10-01
Usually, complex-valued RKHS are presented as a straightforward application of the real-valued case. In this paper we prove that this procedure yields a limited solution for regression. We show that another kernel, here denoted as pseudo-kernel, is needed to learn any function in complex-valued fields. Accordingly, we derive a novel RKHS to include it, the widely RKHS (WRKHS). When the pseudo-kernel cancels, WRKHS reduces to the complex-valued RKHS of previous approaches. We address the kernel and pseudo-kernel design, paying attention to the kernel and the pseudo-kernel being complex-valued. In the experiments included we report remarkable improvements in simple scenarios where real and imaginary parts have different similitude relations for given inputs or cases where real and imaginary parts are correlated. In the context of these novel results we revisit the problem of non-linear channel equalization, to show that the WRKHS helps to design more efficient solutions.
Printed Arabic Text Recognition using Linear and Nonlinear Regression
Ashraf A. Shahin
2017-01-01
Arabic is one of the most widely spoken languages in the world. Hundreds of millions of people in many countries around the world speak Arabic as their native language. However, due to the complexity of the Arabic language, recognition of printed and handwritten Arabic text remained untouched for a very long time compared with English and Chinese. Although a significant amount of research has been done in the last few years on recognizing printed and handwritten Arabic text, it is still an open research field due to the cursive nature of Arabic script. This paper proposes an automatic printed Arabic text recognition technique based on linear and ellipse regression techniques. After collecting all possible forms of each character, a unique code is generated to represent each character form. Each code contains a sequence of lines and ellipses. To recognize fonts, a unique list of codes is identified to be used as a fingerprint of the font. The proposed technique has been evaluated using over 14000 different Arabic words with different fonts, and experimental results show that the average recognition rate of the proposed technique is 86%.
Forecasting Gold Prices Using Multiple Linear Regression Method
Z. Ismail
2009-01-01
Problem statement: Forecasting is a function in management to assist decision making. It is also described as the process of estimation in unknown future situations. In more general terms it is commonly known as prediction, which refers to estimation of time series or longitudinal type data. Gold is a precious yellow commodity once used as money. It was made illegal in USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of the US dollar, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts the movement of the gold price. The most appropriate approach to the understanding of gold prices is the Multiple Linear Regression (MLR) model. MLR is a study of the relationship between a single dependent variable and one or more independent variables, as in this case, with gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as forecast-1 was considered as a benchmark model in order to evaluate the performance of the model. Results: Many factors determine the price of gold and, based on a hunch of experts, several economic factors had been identified as having an influence on gold prices. Variables such as the Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to
Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis
Nielsen, Allan Aasbjerg
2007-01-01
This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying...... and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...
Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf
We consider the semiparametric generalised linear regression model which has mainstream empirical models such as the (partially) linear mean regression, logistic and multinomial regression as special cases. As an extension to related literature we allow a misclassified covariate to be interacted...
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Linearity and Misspecification Tests for Vector Smooth Transition Regression Models
Teräsvirta, Timo; Yang, Yukai
The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...
PARAMETER ESTIMATION IN LINEAR REGRESSION MODELS FOR LONGITUDINAL CONTAMINATED DATA
Qian Weimin; Li Yumei
2005-01-01
The parameter estimation and the coefficient of contamination for regression models with repeated measures are studied when the response variables are contaminated by another random variable sequence. Under suitable conditions it is proved that the estimators established in the paper are strongly consistent.
The Static Stiffness Linear Regression of Parallel Mechanism Based on the Orthogonal Experiment
Wang-Nan; Zhao-Cheng Kang; Gao-Peng; Pang-Bo; Zhou-Shasha
2013-01-01
Using the orthogonal experimental method, we can obtain a linear regression model of parallel mechanism stiffness. Using a four-factor, three-level orthogonal design, we carried out a static stiffness analysis of a spatial three-rotation 3-SPS/S parallel mechanism in ANSYS Workbench and obtained nine sets of experimental data. Linear regression was then applied to the experimental data in MATLAB, which gives the static stiffness linear regression of parallel mechanis...
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses.
Central limit theorem of linear regression model under right censorship
He, Shuyuan (何书元); Huang, Xiang (Heung Wong) (黄香)
2003-01-01
In this paper, the estimation of the joint distribution F(y,z) of (Y, Z) and the estimation in the linear regression model Y = b′Z + ε for complete data are extended to that of the right censored data. The regression parameter estimates of b and the variance of ε are weighted least square estimates with random weights. The central limit theorems of the estimators are obtained under very weak conditions and the derived asymptotic variance has a very simple form.
CONSISTENCY OF LS ESTIMATOR IN SIMPLE LINEAR EV REGRESSION MODELS
Liu Jixue; Chen Xiru
2005-01-01
Consistency of LS estimate of simple linear EV model is studied. It is shown that under some common assumptions of the model, both weak and strong consistency of the estimate are equivalent but it is not so for quadratic-mean consistency.
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
Consistent group selection in high-dimensional linear regression
Wei, Fengrong; 10.3150/10-BEJ252
2010-01-01
In regression problems where covariates can be naturally grouped, the group Lasso is an attractive method for variable selection since it respects the grouping structure in the data. We study the selection and estimation properties of the group Lasso in high-dimensional settings when the number of groups exceeds the sample size. We provide sufficient conditions under which the group Lasso selects a model whose dimension is comparable with the underlying model with high probability and is estimation consistent. However, the group Lasso is, in general, not selection consistent and also tends to select groups that are not important in the model. To improve the selection results, we propose an adaptive group Lasso method which is a generalization of the adaptive Lasso and requires an initial estimator. We show that the adaptive group Lasso is consistent in group selection under certain conditions if the group Lasso is used as the initial estimator.
Two biased estimation techniques in linear regression: Application to aircraft
Klein, Vladislav
1988-01-01
Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
Remodeling and Estimation for Sparse Partially Linear Regression Models
Yunhui Zeng
2013-01-01
When the dimension of covariates in the regression model is high, one usually uses a submodel as a working model that contains significant variables. But it may be highly biased and the resulting estimator of the parameter of interest may be very poor when the coefficients of removed variables are not exactly zero. In this paper, based on the selected submodel, we introduce a two-stage remodeling method to get the consistent estimator for the parameter of interest. More precisely, in the first stage, by a multistep adjustment, we reconstruct an unbiased model based on the correlation information between the covariates; in the second stage, we further reduce the adjusted model by a semiparametric variable selection method and get a new estimator of the parameter of interest simultaneously. Its convergence rate and asymptotic normality are also obtained. The simulation results further illustrate that the new estimator outperforms those obtained by the submodel and the full model in the sense of mean square errors of point estimation and mean square prediction errors of model prediction.
A simplified procedure of linear regression in a preliminary analysis
Silvia Facchinetti
2013-05-01
The analysis of a large statistical data-set can be led by the study of a particularly interesting variable Y (the regressed variable) and an explicative variable X, chosen among the remaining variables, jointly observed. The study gives a simplified procedure to obtain the functional link of the variables y = y(x) by a partition of the data-set into m subsets, in which the observations are synthesized by location indices (mean or median) of X and Y. Polynomial models for y(x) of order r are considered to verify the characteristics of the given procedure; in particular we assume r = 1 and 2. The distributions of the parameter estimators are obtained by simulation, when the fitting is done for m = r + 1. Comparisons of the results, in terms of distribution and efficiency, are made with the results obtained by the ordinary least squares method. The study also gives some considerations on the consistency of the estimated parameters obtained by the given procedure.
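For the case r = 1 and m = r + 1 = 2, the procedure described above reduces to drawing a line through two summary points. The following is a minimal sketch under assumptions that the abstract does not state: the data are split into lower and upper halves after sorting by X, the mean is used as the location index, and the function name `two_group_line` is hypothetical.

```python
from statistics import mean


def two_group_line(x, y):
    """Fit y = a + b*x through the mean points of two subsets of the data."""
    pairs = sorted(zip(x, y))          # order the observations by X
    half = len(pairs) // 2
    lower, upper = pairs[:half], pairs[half:]
    x1, y1 = mean(p[0] for p in lower), mean(p[1] for p in lower)
    x2, y2 = mean(p[0] for p in upper), mean(p[1] for p in upper)
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = two_group_line(x, y)
print(intercept, slope)
```

Replacing `mean` with `statistics.median` gives the median-based version of the same idea.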
Identifying predictors of physics item difficulty: A linear regression approach
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two previously published examples using CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
van Gaans, P. F. M.; Vriend, S. P.
Application of ridge regression in geoscience usually is a more appropriate technique than ordinary least-squares regression, especially in the situation of highly intercorrelated predictor variables. A FORTRAN 77 program RIDGE for ridged multiple linear regression is presented. The theory of linear regression and ridge regression is treated, to allow for a careful interpretation of the results and to understand the structure of the program. The program gives various parameters to evaluate the extent of multicollinearity within a given regression problem, such as the correlation matrix, multiple correlations among the predictors, variance inflation factors, eigenvalues, condition number, and the determinant of the predictors correlation matrix. The best method for the optimum choice of the ridge parameter with ridge regression has not been established yet. Estimates of the ridge bias, ridged variance inflation factors, estimates, and norms for the ridge parameter therefore are given as output by RIDGE and should complement inspection of the ridge traces. Application within the earth sciences is discussed.
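The ridge idea treated in the abstract above can be sketched in a few lines. This is not the RIDGE FORTRAN program: it is an illustrative standard-library Python version that assumes the predictor columns are already centered and scaled (so no intercept is fitted), and the names `solve` and `ridge` as well as the toy data are hypothetical. It solves (X'X + kI) b = X'y for a given ridge parameter k.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge(X, y, k):
    """Ridge coefficients for design matrix X (list of rows) and ridge parameter k."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Two nearly collinear predictors (toy values): trace the coefficients over k.
X = [[-1.0, -1.01], [0.0, 0.02], [1.0, 0.99]]
y = [-2.0, 0.1, 1.9]
for k in (0.0, 0.01, 0.1, 1.0):
    print(k, ridge(X, y, k))
```

Printing the coefficients over a grid of k values gives a crude ridge trace; with k = 0 the function reduces to ordinary least squares.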
Parappagoudar, Mahesh B.; Pratihar, Dilip K.; Datta, Gouranga L.
2008-08-01
A cement-bonded moulding sand system takes a fairly long time to attain the required strength. Hence, the moulds prepared with cement as a bonding material will have to wait a long time for the metal to be poured. In this work, an accelerator was used to accelerate the process of developing the bonding strength. Regression analysis was carried out on the experimental data collected as per statistical design of experiments (DOE) to establish input-output relationships of the process. The experiments were conducted to measure compression strength and hardness (output parameters) by varying the input variables, namely amount of cement, amount of accelerator, water in the form of cement-to-water ratio, and testing time. A two-level full-factorial design was used for linear regression model, whereas a three-level central composite design (CCD) had been utilized to develop non-linear regression model. Surface plots and main effects plots were used to study the effects of amount of cement, amount of accelerator, water and testing time on compression strength, and mould hardness. It was observed from both the linear as well as non-linear models that amount of cement, accelerator, and testing time have some positive contributions, whereas cement-to-water ratio has negative contribution to both the above responses. Compression strength was found to have linear relationship with the amount of cement and accelerator, and non-linear relationship with the remaining process parameters. Mould hardness was seen to vary linearly with testing time and non-linearly with the other parameters. Analysis of variance (ANOVA) was performed to test statistical adequacy of the models. Twenty random test cases were considered to test and compare their performances. Non-linear regression models were found to perform better than the linear models for both the responses. An attempt was also made to express compression strength of the moulding sand system as a function of mould hardness.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Simultaneous Determination of Cobalt, Copper, and Nickel by Multivariate Linear Regression.
Dado, Greg; Rosenthal, Jeffrey
1990-01-01
Presented is an experiment where the concentrations of three metal ions in a solution are simultaneously determined by ultraviolet-vis spectroscopy. Availability of the computer program used for statistically analyzing data using a multivariate linear regression is listed. (KR)
Enders, Felicity
2013-12-01
Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 postcourse, and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.
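The internal-reliability statistic quoted above, Cronbach's alpha, can be computed from an item-score matrix as k/(k - 1) times (1 minus the sum of item variances divided by the variance of the total score). The sketch below uses made-up scores, not the REGRESS data, and the function name `cronbach_alpha` is an assumption for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical quiz with two items that rank the three respondents identically.
scores = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(scores)
print(alpha)
```

Items that move together push alpha toward 1, while unrelated items pull it toward 0.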
Multivariable Linear Regression Model for Promotional Forecasting: The Coca Cola - Morrisons Case
Zheng, Yiwei
2009-01-01
This paper describes a promotional forecasting model built with the linear regression module in Microsoft Excel. It is intended to provide quick and reliable forecasts with a moderate credit and to assist the CPFR between Coca Cola Enterprises (CCE) and Morrisons. The model is derived from previous research and a literature review on CPFR, promotion, forecasting and modelling. It is designed as a multivariable linear regression model, which involves several promotional mix as variables includi...
Linear Regression in High Dimension and/or for Correlated Inputs
Jacques, J.; Fraix-Burnet, D.
2014-12-01
Ordinary least squares is the common way to estimate linear regression models. When inputs are correlated or when they are too numerous, regression methods using derived input directions or shrinkage methods can be efficient alternatives. Methods using derived input directions build new uncorrelated variables as linear combinations of the initial inputs, whereas shrinkage methods introduce regularization and variable selection by penalizing the usual least squares criterion. Both kinds of methods are presented and illustrated using the R software on an astronomical dataset.
How to use linear regression and correlation in quantitative method comparison studies.
Twomey, P J; Kroll, M H
2008-04-01
Linear regression methods try to determine the best linear relationship between data points, while correlation coefficients assess the association (as opposed to agreement) between the two methods. Linear regression and correlation play an important part in the interpretation of quantitative method comparison studies. Their major strength is that they are widely known, and as a result both are employed in the vast majority of method comparison studies. While previously performed by hand, regression analysis is now usually performed by statistical software packages, including MS Excel, with or without the software program Analyze-it, as well as by other software packages. Such techniques need to be employed in a way that compares the agreement between the two methods examined and, more importantly, because we are dealing with individual patients, whether the degree of agreement is clinically acceptable. Despite their use for many years, there is a lot of ignorance about the validity as well as the pros and cons of linear regression and correlation techniques. This review article describes the types of linear regression and correlation (parametric and non-parametric methods) and the necessary general and specific requirements. The selection of the type of regression depends on where one has been trained, the tradition of the laboratory and the availability of adequate software.
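The distinction between association and agreement drawn above can be made concrete with a small numerical sketch. The two measurement series are invented for illustration, and the helper names `pearson_r` and `ols_line` are hypothetical. Method B is constructed as exactly 2 × A + 1, so the correlation is perfect even though the methods clearly disagree.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_line(x, y):
    """Ordinary least squares intercept and slope of y on x."""
    xbar, ybar = mean(x), mean(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [2.0 * v + 1.0 for v in method_a]   # proportional and constant bias

r = pearson_r(method_a, method_b)
intercept, slope = ols_line(method_a, method_b)
mean_diff = mean(b - a for a, b in zip(method_a, method_b))
print(r, intercept, slope, mean_diff)
```

Agreement would require a slope near 1 and an intercept near 0; here the regression line and the mean difference expose the bias that the correlation coefficient hides.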
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
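The pipeline described above (sliding-window regressions, patterns as nodes, transmissions as weighted edges) can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: patterns here are only the sign of the window slope, whereas the abstract defines them from parameter intervals combined with significance-test results. The function names and the toy series are assumptions.

```python
from collections import Counter
from statistics import mean


def window_slope(x, y):
    """Ordinary least squares slope of y on x within one window."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def pattern(slope, tol=1e-12):
    """Map a slope to a coarse pattern label (a simplifying assumption)."""
    if slope > tol:
        return "up"
    if slope < -tol:
        return "down"
    return "flat"


def pattern_transmissions(x, y, window):
    """Return the pattern sequence and the weighted directed edges between patterns."""
    patterns = [pattern(window_slope(x[i:i + window], y[i:i + window]))
                for i in range(len(x) - window + 1)]
    edges = Counter(zip(patterns, patterns[1:]))  # edge weight = transition frequency
    return patterns, edges


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
patterns, edges = pattern_transmissions(x, y, window=3)
print(patterns)
print(dict(edges))
```

The resulting `edges` counter is a weighted edge list that could be passed to a graph library to compute measures such as weighted out-degree or betweenness centrality.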
The number of subjects per variable required in linear regression analyses
P.C. Austin (Peter); E.W. Steyerberg (Ewout)
2015-01-01
Objectives: To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression c
Tightness of M-estimators for multiple linear regression in time series
Johansen, Søren; Nielsen, Bent
We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf
We consider the semiparametric generalised linear regression model which has mainstream empirical models such as the (partially) linear mean regression, logistic and multinomial regression as special cases. As an extension to related literature we allow a misclassified covariate to be interacted...... with a nonparametric function of a continuous covariate. This model is tailormade to address known data quality issues of administrative labour market data. Using a sample of 20m observations from Germany we estimate the determinants of labour market transitions and illustrate the role of considerable...
A Stochastic Restricted Principal Components Regression Estimator in the Linear Model
Daojiang He
2014-01-01
We propose a new estimator to combat multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator, called the stochastic restricted principal components (SRPC) regression estimator, is constructed by combining the ordinary mixed estimator (OME) and the principal components regression (PCR) estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived in the sense of the mean squared error matrix criterion. Finally, we give a numerical example and a Monte Carlo study to illustrate the performance of the proposed estimator.
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
2017-04-08
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
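The difference between a global linear model and a geographically weighted one can be illustrated with a locally weighted least squares fit. The sketch below is a deliberately reduced version and not the study's model: locations are one-dimensional, the kernel is Gaussian with a fixed bandwidth, there is a single predictor, and all names and data are hypothetical.

```python
import math


def weighted_line(x, y, w):
    """Weighted least squares intercept and slope of y on x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def local_fit(focal, coords, x, y, bandwidth):
    """Fit a line at one focal location, down-weighting distant observations."""
    w = [math.exp(-0.5 * ((c - focal) / bandwidth) ** 2) for c in coords]
    return weighted_line(x, y, w)


# Toy data: the effect of x on y is 1 in the western cluster and 3 in the eastern one.
coords = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 6.0, 9.0, 12.0]

_, slope_west = local_fit(1.5, coords, x, y, bandwidth=1.0)
_, slope_east = local_fit(11.5, coords, x, y, bandwidth=1.0)
_, slope_global = weighted_line(x, y, [1.0] * len(x))
print(slope_west, slope_east, slope_global)
```

The local fits recover the two regional effects, while the unweighted global fit averages them into a single coefficient. A spatiotemporal variant would presumably let the weights depend on distance in time as well.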
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Tao Hu; Heng-jian Cui; Xing-wei Tong
2009-01-01
This article considers a semiparametric varying-coefficient partially linear regression model with current status data. The semiparametric varying-coefficient partially linear regression model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estimator for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies are conducted to examine the small-sample properties of the proposed estimates and a real dataset is used to illustrate our approach.
Prediction on adsorption ratio of carbon dioxide to methane on coals with multiple linear regression
YU Hong-guan; MENG Xian-ming; FAN Wei-tang; YE Jian-ping
2007-01-01
The multiple linear regression equations for adsorption ratio of CO2/CH4 and its coal quality indexes were built with SPSS software on the basis of existing coal quality data and its adsorption amount of CO2 and CH4. The regression equations built were tested with data collected from some S, and the influences of coal quality indexes on adsorption ratio of CO2/CH4 were studied with investigation of regression equations. The study results show that the regression equation for adsorption ratio of CO2/CH4 and volatile matter, ash and moisture in coal can be obtained with multiple linear regression analysis, and that the influence of same coal quality index with the degree of metamorphosis or influence of coal quality indexes for same coal rank on adsorption ratio is not consistent.
An improved multiple linear regression and data analysis computer program package
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Meaney, Christopher; Moineddin, Rahim
2014-01-24
In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
Karadag, Dogan; Koc, Yunus; Turan, Mustafa; Ozturk, Mustafa
2007-06-01
Ammonium ion exchange from aqueous solution using clinoptilolite zeolite was investigated at laboratory scale. Batch experimental studies were conducted to evaluate the effect of various parameters such as pH, zeolite dosage, contact time, initial ammonium concentration and temperature. Freundlich and Langmuir isotherm models and pseudo-second-order model were fitted to experimental data. Linear and non-linear regression methods were compared to determine the best fitting of isotherm and kinetic model to experimental data. The rate limiting mechanism of ammonium uptake by zeolite was determined as chemical exchange. Non-linear regression has better performance for analyzing experimental data and Freundlich model was better than Langmuir to represent equilibrium data.
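The linear regression route to isotherm parameters mentioned above can be sketched for the Langmuir model q = qm·KL·C / (1 + KL·C). One common linearized form is C/q = C/qm + 1/(KL·qm), so a straight-line fit of C/q against C has slope 1/qm and intercept 1/(KL·qm). The data below are synthetic and noise-free, and the parameter values and function names are assumptions for illustration, not values from the study.

```python
from statistics import mean


def ols_line(x, y):
    """Ordinary least squares intercept and slope of y on x."""
    xbar, ybar = mean(x), mean(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


def langmuir(c, qm, kl):
    """Langmuir isotherm: uptake q at equilibrium concentration c."""
    return qm * kl * c / (1.0 + kl * c)


# Synthetic equilibrium data generated from assumed parameters qm = 10, KL = 0.5.
conc = [1.0, 2.0, 5.0, 10.0, 20.0]
uptake = [langmuir(c, 10.0, 0.5) for c in conc]

# Linearized fit: regress C/q on C, then back-transform the coefficients.
intercept, slope = ols_line(conc, [c / q for c, q in zip(conc, uptake)])
qm_hat = 1.0 / slope
kl_hat = slope / intercept
print(qm_hat, kl_hat)
```

With exact data the linearized fit recovers the parameters. With measurement noise, the transformation changes how errors are weighted, which is one reason studies of this kind compare linearized fits with direct non-linear least squares.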
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electromyographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data, providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve performance similar to KRR at much lower computational cost. In particular, ME, a physiologically inspired extension of linear regression, represents a promising candidate for the next generation of prosthetic devices.
Yadav, Manish; Singh, Nitin Kumar
2017-08-01
A comparison of the linear and non-linear regression methods in selecting the optimum isotherm among the three most commonly used adsorption isotherms (Langmuir, Freundlich, and Redlich-Peterson) was made using the experimental data of fluoride (F) sorption onto Bio-F at a solution temperature of 30 ± 1 °C. The coefficient of determination (r²) was used to select the best theoretical isotherm among the investigated ones. A total of four Langmuir linear equations were discussed, of which the most popular linear forms, Langmuir-1 and Langmuir-2, showed higher coefficients of determination (0.976 and 0.989) than the other Langmuir linear equations. Freundlich and Redlich-Peterson isotherms showed a better fit to the experimental data in the linear least-squares method, while in the non-linear method the Redlich-Peterson isotherm equation showed the best fit to the tested data set. The present study showed that the non-linear method could be a better way to obtain the isotherm parameters and represent the most suitable isotherm. The Redlich-Peterson isotherm was found to be the best representative (r² = 0.999) for this sorption system. It is also observed that the values of β are not close to unity, which means the isotherms are approaching the Freundlich but not the Langmuir isotherm.
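The linearized isotherm fitting that this abstract and the ammonium study above compare against non-linear fitting can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the Langmuir form q = q_max·K·C / (1 + K·C) and one common linearization, C/q = C/q_max + 1/(K·q_max). The data are synthetic and the function names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def langmuir_from_linear_form(conc, uptake):
    """Estimate (q_max, K) by regressing C/q on C.

    Under the assumed linearization, slope = 1/q_max and
    intercept = 1/(K*q_max), so K = slope/intercept.
    """
    ratio = [c / q for c, q in zip(conc, uptake)]
    intercept, slope = linear_fit(conc, ratio)
    return 1.0 / slope, slope / intercept


# Synthetic, noise-free data generated from assumed parameters (not from the paper).
Q_MAX_TRUE, K_TRUE = 10.0, 0.5
conc = [1.0, 2.0, 5.0, 10.0, 20.0]
uptake = [Q_MAX_TRUE * K_TRUE * c / (1 + K_TRUE * c) for c in conc]

q_max_est, k_est = langmuir_from_linear_form(conc, uptake)
print(q_max_est, k_est)
```

With noise-free data the linearized fit recovers the generating parameters. With measurement error, transforming the response changes the error structure, which is one reason such studies also fit the untransformed equation by non-linear least squares.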
Makoto Suzuki
Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm of time and on time itself (linear). In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, P<0.0001; linear regression modeling, R(2) = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
On asymptotics of t-type regression estimation in multiple linear model
None
2004-01-01
We consider a robust estimator (t-type regression estimator) of the multiple linear regression model obtained by maximizing the marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
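As a rough illustration of the calculator or spreadsheet route described in point 4, the ordinary least products (OLP) line can be computed from summary statistics. The sketch below assumes the usual geometric-mean definition, in which the slope is sign(r) · SD(y)/SD(x) and the line passes through the point of means. The function name `olp_fit` is hypothetical, and no confidence intervals are computed, since the abstract notes that the simple ones are inaccurate.

```python
import math


def olp_fit(x, y):
    """Ordinary least products (geometric mean) regression.

    Returns (slope, intercept). The slope magnitude is SD(y)/SD(x) and its
    sign follows the sign of the covariance between x and y.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = my - slope * mx
    return slope, intercept


# Illustrative data only: both variables are treated as subject to error.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = olp_fit(x, y)
print(slope, intercept)
```

Unlike the Model I least squares slope, this slope does not depend on which variable is labelled x: swapping the two variables gives the reciprocal slope.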
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Daniel Rece
2009-09-01
This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report statistically analyzes data collected from the USA, Russia, Romania and eleven other European countries, yielding a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering) is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.
Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models
Zhangong ZHOU; Rong JIANG; Weimin QIAN
2011-01-01
Quantile estimation methods are proposed for the functional-coefficient partially linear regression (FCPLR) model by combining the nonparametric and functional-coefficient regression (FCR) models. The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model. The resulting estimators are asymptotically normal, but each of them has a large variance. To reduce the variances of these quantile estimators, the one-step backfitting technique is used to obtain efficient quantile estimators of all unknown functions, and their asymptotic normality is derived. Two simulated examples are carried out to illustrate the proposed estimation methodology.
Linear regression model selection using p-values when the model dimension grows
Pokarowski, Piotr; Teisseyre, Paweł
2012-01-01
We consider a new criterion-based approach to model selection in linear regression. Properties of selection criteria based on p-values of a likelihood ratio statistic are studied for families of linear regression models. We prove that such procedures are consistent i.e. the minimal true model is chosen with probability tending to 1 even when the number of models under consideration slowly increases with a sample size. The simulation study indicates that introduced methods perform promisingly when compared with Akaike and Bayesian Information Criteria.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Asymptotic Normality of LS Estimate in Simple Linear EV Regression Model
Jixue LIU
2006-01-01
Though the errors-in-variables (EV) model is theoretically more appropriate for applications in which measurement errors exist, people are still more inclined to use ordinary regression models and the traditional least squares (LS) method owing to the difficulties of statistical inference and computation. So it is meaningful to study the performance of the LS estimate in the EV model. In this article we obtain general conditions guaranteeing the asymptotic normality of the estimates of regression coefficients in the linear EV model. Notably, the result differs in some respects from the corresponding result in the ordinary regression model.
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Miriam Andrejiová
2013-12-01
The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.
The number of subjects per variable required in linear regression analyses.
Austin, Peter C; Steyerberg, Ewout W
2015-06-01
To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R(2) of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R(2), although adjusted R(2) estimates behaved well. The bias in estimating the model R(2) statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
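The contrast between R(2) and adjusted R(2) at low subjects-per-variable (SPV) counts can be made concrete with the standard adjustment formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below is only an illustration of that formula, not the simulation design of the study, and the helper names are hypothetical.

```python
def subjects_per_variable(n_subjects, n_predictors):
    """SPV as used informally here: subjects divided by predictor count."""
    return n_subjects / n_predictors


def adjusted_r2(r2, n_subjects, n_predictors):
    """Standard adjusted R^2 for a model with an intercept and p predictors."""
    dof = n_subjects - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_subjects - 1) / dof


# Same apparent R^2 = 0.50 with 10 predictors at two sample sizes.
for n in (21, 101):
    print(n, subjects_per_variable(n, 10), adjusted_r2(0.50, n, 10))
```

At roughly two SPV the adjustment removes all of an apparent R² of 0.50, while at roughly ten SPV most of it survives, which is in line with the abstract's remark that the unadjusted R(2) needs many more subjects per variable to be trustworthy.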
Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi
2013-01-01
Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm of time and on time itself (linear). In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, P<0.0001; linear regression modeling, R(2) = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
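The comparison between logarithmic and linear modeling of recovery over time can be sketched as two ordinary least squares fits, one on time and one on the logarithm of time, compared by R². The scores below are synthetic values generated from an assumed logarithmic curve, not patient data, and the function name is hypothetical.

```python
import math


def ols_with_r2(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic assessment days and scores (assumed curve: 18 + 2.5 * ln(day)).
days = [1.0, 7.0, 14.0, 28.0, 56.0]
scores = [18.0 + 2.5 * math.log(d) for d in days]

_, _, r2_linear = ols_with_r2(days, scores)
_, _, r2_log = ols_with_r2([math.log(d) for d in days], scores)
print(r2_linear, r2_log)
```

Because the synthetic scores follow a logarithmic curve exactly, the logarithmic fit is perfect here and the linear fit is not. Real scores would show a smaller gap, such as the 0.676 versus 0.598 reported in the abstract.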
Kiani, Alishir; Chwalibog, André; Nielsen, Mette O
2007-01-01
study metabolizable energy (ME) intake ranges for twin-bearing ewes were 220-440, 350-700, and 350-900 kJ per metabolic body weight (W^0.75) at weeks seven, five, and two pre-partum, respectively. Indirect calorimetry and a linear regression approach were used to quantify EE(gest) and then partition to EE...
Lybol, C.; Sweep, F.C.; Ottevanger, P.B.; Massuger, L.F.A.G.; Thomas, C.M.G.
2013-01-01
OBJECTIVE: Currently, human chorionic gonadotropin (hCG) follow-up after evacuation of hydatidiform moles is essential to identify patients requiring chemotherapeutic treatment for gestational trophoblastic neoplasia (GTN). We propose a model based on linear regression of postevacuation serum hCG co
Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course
Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna
2010-01-01
Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…
Dufrenois, F; Noyer, J C
2013-02-01
Linear discriminant analysis, such as Fisher's criterion, is a statistical learning tool traditionally devoted to separating a training dataset into two or more classes by way of linear decision boundaries. In this paper, we show that this tool can formalize the robust linear regression problem as a robust estimator does. More precisely, we develop a one-class Fisher's criterion whose maximization provides both the regression parameters and the separation of the data into two classes: typical data and atypical data or outliers. This new criterion is built on the statistical properties of the subspace decomposition of the hat matrix. From this angle, we improve the discriminative properties of the hat matrix, which is traditionally used as an outlier diagnostic measure in linear regression. Naturally, we call this new approach the discriminative hat matrix. The proposed algorithm is fully unsupervised and needs only the initialization of one parameter. Synthetic and real datasets are used to study the performance of the proposed approach in terms of both regression and classification. We also illustrate its potential application to image recognition and fundamental matrix estimation in computer vision.
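For context, the traditional use of the hat matrix as an outlier diagnostic, which the abstract takes as its starting point, can be sketched for simple linear regression. This is the classical leverage calculation only, not the discriminative hat matrix proposed by the authors. It assumes the closed form h_ii = 1/n + (x_i − x̄)²/S_xx and the common rule of thumb that flags h_ii > 2p/n, with p = 2 fitted parameters. The function names are hypothetical.

```python
def leverages(x):
    """Diagonal of the hat matrix for y = a + b*x (intercept plus one slope)."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]


def high_leverage_indices(x, n_params=2):
    """Indices whose leverage exceeds the 2p/n rule-of-thumb cutoff."""
    cutoff = 2.0 * n_params / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


# Illustrative predictor values with one point far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
print(leverages(x))
print(high_leverage_indices(x))
```

The leverages always sum to the number of fitted parameters, so a single large value necessarily comes at the expense of the others. Leverage depends only on the predictor values, which is one limitation a response-aware criterion aims to address.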
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression
Thomas, D. Roland; Zhu, PengCheng; Decady, Yves J.
2007-01-01
The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an…
Christensen, Bent Jesper; Kruse, Robinson; Sibbertsen, Philipp
We consider hypothesis testing in a general linear time series regression framework when the possibly fractional order of integration of the error term is unknown. We show that the approach suggested by Vogelsang (1998a) for the case of integer integration does not apply to the case of fractional...
Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf
2017-01-01
observations from Germany. It is shown that estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate...
Giuliano de Oliveira Freitas
2013-10-01
PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV), assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assigned to one of two phacoemulsification groups: one received an AcrySof® Toric intraocular lens (IOL) in both eyes, and the other received an AcrySof Natural IOL combined with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio) and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: A significant negative correlation between the Thibos APVratio and Alpins percentage of success (%Success) was found (Spearman's ρ = -0.93); the linear regression is given by the following equation: %Success = (-APVratio + 1.00) × 100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
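Reading the reported regression as %Success = (1 − APVratio) × 100, the conversion is a one-line calculation. The sketch below assumes that reading of the equation and uses a hypothetical function name. The input values are illustrative, not study data.

```python
def percent_success(apv_post, apv_pre):
    """Percentage of success implied by the reported linear relation.

    APVratio is the postoperative astigmatic power vector magnitude divided
    by the preoperative one; the assumed relation is (1 - APVratio) * 100.
    """
    if apv_pre <= 0:
        raise ValueError("preoperative APV must be positive")
    apv_ratio = apv_post / apv_pre
    return (1.0 - apv_ratio) * 100.0


# Illustrative values: astigmatic power reduced from 1.00 to 0.25.
print(percent_success(0.25, 1.00))
```

Under this relation an unchanged APV gives 0% and a residual APV of zero gives 100%. A postoperative APV larger than the preoperative one gives a negative value.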
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction models for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model for medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model for county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.
Wheeler, David C.; Calder, Catherine A.
2007-06-01
The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.
Linear Regression on Sparse Features for Single-Channel Speech Separation
Schmidt, Mikkel N.; Olsson, Rasmus Kongsgaard
2007-01-01
In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both compared to linear regression on spectral features and compared to separation based directly on the non-negative sparse features.
truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models
Maria Karlsson
2014-05-01
Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However, whether ADD is an important variable in predicting stormwater discharge characteristics remains controversial across studies. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments located in Gwangju, Korea. Data from the monitoring were used to calibrate the United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was run for 55 storm events, and the results for total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression on ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMCs in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data.
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Hoffman, Haydn; Lee, Sunghoon I; Garst, Jordan H; Lu, Derek S; Li, Charles H; Nagasawa, Daniel T; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C
2015-09-01
This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R²) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R²=0.452; MAD=0.0887; p=1.17 × 10⁻³). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R²=0.932; MAD=0.0283; p=5.73 × 10⁻¹²). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate.
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, our input data set consisted of 78 rats, with blood loss in percent as the dependent variable and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). In that study, machine learning methods for multicategory classification were applied to a rat model of acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of the support vector regression and MLR models with relative values, obtained by predicting blood loss in percent, were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% achieved by direct multicategory classification using the support vector machine one-versus-one model in our previous study on the same validation data set. Moreover, the simple MLR models with both absolute and relative values point to the possibility of a future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than with absolute values.
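The regress-then-classify idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly cited ATLS blood-loss cut-offs of 15%, 30% and 40%, the treatment of values that fall exactly on a boundary is an assumption, and the predicted and true values are hypothetical numbers standing in for the output of a regression model.

```python
def atls_class(blood_loss_percent):
    """Map a predicted blood loss (percent) to an ATLS shock class 1-4.

    Cut-offs of 15, 30 and 40 percent are assumed; boundary handling is
    an illustrative choice, not taken from the abstract.
    """
    if blood_loss_percent < 15:
        return 1
    if blood_loss_percent < 30:
        return 2
    if blood_loss_percent <= 40:
        return 3
    return 4


def accuracy(true_classes, predicted_classes):
    """Fraction of cases whose predicted class equals the true class."""
    hits = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return hits / len(true_classes)


# Hypothetical regression outputs (percent blood loss) and true values.
predicted_loss = [8.0, 22.5, 33.0, 47.0, 28.0]
true_loss = [10.0, 20.0, 35.0, 45.0, 32.0]

predicted_classes = [atls_class(v) for v in predicted_loss]
true_classes = [atls_class(v) for v in true_loss]
class_accuracy = accuracy(true_classes, predicted_classes)
```

The design point is that the regression model only has to produce one continuous number; the four-class decision is then a fixed lookup, which avoids training a multicategory classifier.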
Sieve M-estimation for semiparametric varying-coefficient partially linear regression model
None
2010-01-01
This article considers a semiparametric varying-coefficient partially linear regression model. This model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Our main objective is to estimate the nonparametric component and the unknown parameters simultaneously. The method is easier to compute, and its computational burden is much lower than that of the existing two-stage estimation method. Furthermore, the sieve M-estimation is robust in the presence of outliers if an appropriate ρ(·) is chosen. Under some mild conditions, the estimators are shown to be strongly consistent; the convergence rate of the estimator for the unknown nonparametric component is obtained, and the estimator for the unknown parameter is shown to be asymptotically normally distributed. Numerical experiments are carried out to investigate the performance of the proposed method.
S. Goyal
2012-03-01
This paper highlights the significance of computational intelligence models for predicting the shelf life of processed cheese stored at 7-8 °C. Linear Layer and Generalized Regression models were developed with the input parameters soluble nitrogen, pH, standard plate count, yeast and mould count, and spores, and with sensory score as the output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash-Sutcliffe Coefficient were used to compare the prediction ability of the models. The study revealed that Generalized Regression computational intelligence models are quite effective in predicting the shelf life of processed cheese stored at 7-8 °C.
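The error measures named in the abstract above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the authors' code: the function names and the sensory scores are hypothetical, and the Nash-Sutcliffe coefficient is written in its usual form, one minus the ratio of the residual sum of squares to the sum of squares about the observed mean.

```python
import math


def mse(observed, predicted):
    """Mean square error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(mse(observed, predicted))


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_about_observed_mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical sensory scores: measured values and model predictions.
observed_scores = [80.0, 75.0, 70.0, 65.0]
predicted_scores = [79.0, 76.0, 69.0, 66.0]

model_mse = mse(observed_scores, predicted_scores)
model_rmse = rmse(observed_scores, predicted_scores)
model_nse = nash_sutcliffe(observed_scores, predicted_scores)
```

A Nash-Sutcliffe value of 1 means a perfect fit, and a value of 0 means the model predicts no better than the mean of the observations.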
Ge-mai Chen; Jin-hong You
2005-01-01
Consider a repeated measurement partially linear regression model with an unknown vector parameter β. To improve on the semiparametric generalized least squares estimator (SGLSE) of β, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE cannot improve upon the SGLSE. These results are generalizations of those in [2] to the case of semiparametric regressions.
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.
Kawashima, Issaku; Kumano, Hiroaki
2017-01-01
Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Allore, Heather; Tinetti, Mary E; Araujo, Katy L B; Hardy, Susan; Peduzzi, Peter
2005-02-01
Many important physiologic and clinical predictors are continuous. Clinical investigators and epidemiologists' interest in these predictors lies, in part, in the risk they pose for adverse outcomes, which may be continuous as well. The relationship between continuous predictors and a continuous outcome may be complex and difficult to interpret. Therefore, methods to detect levels of a predictor variable that predict the outcome and determine the threshold for clinical intervention would provide a beneficial tool for clinical investigators and epidemiologists. We present a case study using regression tree methodology to predict Social and Productive Activities score at 3 years using five modifiable impairments. The predictive ability of regression tree methodology was compared with multiple linear regression using two independent data sets, one for development and one for validation. The regression tree approach and the multiple linear regression model provided similar fit (model deviances) on the development cohort. In the validation cohort, the deviance of the multiple linear regression model was 31% greater than the regression tree approach. Regression tree analysis developed a better model of impairments predicting Social and Productive Activities score that may be more easily applied in research settings than multiple linear regression alone.
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
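The abstract above does not reproduce the rate expressions, so the following is a sketch of the widely used form of Ho's pseudo-second order model and one common linearisation of it. The symbols are assumptions for illustration: q_t is the amount sorbed at time t, q_e the amount sorbed at equilibrium, and k the rate constant.

```latex
% Rate law and its integrated form (q_t = 0 at t = 0)
\frac{dq_t}{dt} = k\,(q_e - q_t)^2
\qquad\Longrightarrow\qquad
q_t = \frac{k\,q_e^{2}\,t}{1 + k\,q_e\,t}

% One linearised form: a straight line in t/q_t against t
\frac{t}{q_t} = \frac{1}{k\,q_e^{2}} + \frac{1}{q_e}\,t
```

Linear regression fits a straight line to t/q_t against t, so that the slope estimates 1/q_e and the intercept estimates 1/(k q_e²). Non-linear regression instead fits the integrated expression for q_t directly to the measured uptake, typically by minimising the sum of squared differences between measured and modelled q_t. Because the linearising transformation changes the error structure of the data, the two approaches can return different parameter estimates from the same experiment.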
SERF: A Simple, Effective, Robust, and Fast Image Super-Resolver From Cascaded Linear Regression.
Hu, Yanting; Wang, Nannan; Tao, Dacheng; Gao, Xinbo; Li, Xuelong
2016-09-01
Example learning-based image super-resolution techniques estimate a high-resolution image from a low-resolution input image by relying on high- and low-resolution image pairs. An important issue for these techniques is how to model the relationship between high- and low-resolution image patches: most existing complex models either generalize hard to diverse natural images or require a lot of time for model training, while simple models have limited representation capability. In this paper, we propose a simple, effective, robust, and fast (SERF) image super-resolver for image super-resolution. The proposed super-resolver is based on a series of linear least squares functions, namely, cascaded linear regression. It has few parameters to control the model and is thus able to robustly adapt to different image data sets and experimental settings. The linear least square functions lead to closed form solutions and therefore achieve computationally efficient implementations. To effectively decrease these gaps, we group image patches into clusters via k-means algorithm and learn a linear regressor for each cluster at each iteration. The cascaded learning process gradually decreases the gap of high-frequency detail between the estimated high-resolution image patch and the ground truth image patch and simultaneously obtains the linear regression parameters. Experimental results show that the proposed method achieves superior performance with lower time consumption than the state-of-the-art methods.
On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.
Pek, Jolynn; Chalmers, R Philip; Monette, Georges
2016-01-01
When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.
Agha, Salah R; Alnahhal, Mohammed J
2012-11-01
The current study investigates the possibility of obtaining the anthropometric dimensions, critical to school furniture design, without measuring all of them. The study first selects some anthropometric dimensions that are easy to measure. Two methods are then used to check if these easy-to-measure dimensions can predict the dimensions critical to the furniture design. These methods are multiple linear regression and neural networks. Each dimension that is deemed necessary to ergonomically design school furniture is expressed as a function of some other measured anthropometric dimensions. Results show that out of the five dimensions needed for chair design, four can be related to other dimensions that can be measured while children are standing. Therefore, the method suggested here would definitely save time and effort and avoid the difficulty of dealing with students while measuring these dimensions. In general, it was found that neural networks perform better than multiple linear regression in the current study.
Lunt, Mark
2015-07-01
In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading.
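Including a categorical variable and an interaction term, as described in the abstract above, can be sketched as follows. This is an illustrative example under stated assumptions, not code from the article: the two-group data are hypothetical and noiseless, the helper names are invented, and ordinary least squares is solved through the normal equations with a small hand-written elimination routine so that the block has no dependencies.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def design_row(x, group):
    """Intercept, predictor, group dummy (0 or 1), predictor-by-group interaction."""
    return [1.0, x, float(group), x * group]


# Hypothetical data: group 0 follows y = 1 + 2x, group 1 follows y = 3 + 5x.
data = [(x, g, (1 + 2 * x) if g == 0 else (3 + 5 * x))
        for g in (0, 1) for x in (0.0, 1.0, 2.0, 3.0)]
rows = [design_row(x, g) for x, g, _ in data]
y = [yi for _, _, yi in data]

# b2 is the difference in intercepts, b3 the difference in slopes between groups.
b0, b1, b2, b3 = ols(rows, y)
```

The interaction coefficient b3 estimates how much the slope differs between the groups, so testing whether b3 is zero addresses the question directly, in contrast to comparing the significance of separate per-group fits.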
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-12-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.
The applicability of linear regression models in working environments' thermal evaluation.
Pablo Adamoglu de Oliveira
2006-04-01
The simultaneous analysis of normally distributed thermal variables, with the aim of checking whether there is any significant correlation among them or whether the values of some of them can be predicted from the values of others, is considered a problem of great importance in statistical studies. The aim of this paper is to study the applicability of linear regression models in thermal comfort studies of working environments, thus contributing to the understanding of possible environmental cooling, heating or ventilation needs. The study starts with a literature review, followed by field research, data collection and statistical-mathematical treatment of the data in software. Data analysis was then performed and linear regression models were constructed, using t and F tests to determine the consistency of the models and their parameters; conclusions were drawn from the information obtained and from the significance of the mathematical models built.
COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS
K. Seetharaman
2015-08-01
This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, in which the least-squares estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted from each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is built on the features, form a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results show that the proposed technique is comparable to other existing techniques.
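The two measures named in the abstract above have standard definitions, sketched here in plain Python. The feature vectors and the precision and recall values are hypothetical, and treating a coordinate where both entries are zero as contributing nothing to the distance is a common convention assumed here, not something stated in the abstract.

```python
def canberra_distance(u, v):
    """Canberra distance: sum over coordinates of |u_i - v_i| / (|u_i| + |v_i|).

    Coordinates where both entries are zero are skipped (assumed convention).
    """
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        if denominator > 0:
            total += abs(a - b) / denominator
    return total


def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical feature vectors (estimated regression parameters).
query_vector = [1.0, 3.0, 0.0]
target_vector = [3.0, 1.0, 0.0]
distance = canberra_distance(query_vector, target_vector)
score = f_measure(0.5, 0.5)
```

Because each term is normalised by the magnitudes of the two entries, the Canberra distance gives small-valued features the same potential influence as large-valued ones, which can matter when regression coefficients differ widely in scale.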
USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES
Constantin ANGHELACHE
2011-10-01
The article presents the fundamental aspects of linear regression as a toolbox that can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, and homoscedasticity and heteroscedasticity. The use of econometric instruments in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretations that can be drawn at this level.
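Parameter estimation and a basic significance test for the simple linear regression model can be sketched as follows. This is a minimal illustration and not taken from the article: the data are hypothetical, the function name is invented, and the standard error shown is the classical one, which assumes homoscedastic errors.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares slope and intercept, with the slope's classical t statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_variance = ss_res / (n - 2)
    slope_se = math.sqrt(residual_variance / sxx)
    t_statistic = slope / slope_se
    return slope, intercept, slope_se, t_statistic


# Hypothetical macroeconomic series (for example, an indicator against time).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, slope_se, t_statistic = simple_linear_regression(x_values, y_values)
```

The t statistic would be compared with a Student t distribution on n - 2 degrees of freedom. Under heteroscedasticity the slope estimate is still unbiased, but this standard error is no longer reliable.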
Portfolio optimization using local linear regression ensembles in RapidMiner
Gabor Nagy; Gergo Barta; Tamas Henk
2015-01-01
In this paper we implement a Local Linear Regression Ensemble Committee (LOLREC) to predict 1-day-ahead returns of 453 assets from the S&P500. The estimates and the historical returns of the committees are used to compute the weights of the portfolio from the 453 stocks. The proposed method outperforms benchmark portfolio selection strategies that optimize the growth rate of the capital. We investigate the effect of algorithm parameter m: the number of selected stocks on achieved average annua...
Multiple Linear Regression Application on the Inter-Network Settlement of Internet
YANG Qing-feng; ZHANG Qi-xiang; L(U) Ting-jie
2006-01-01
This paper develops an analytical framework to explain the Internet interconnection settlement issues. The paper shows that multiple linear regression can be used in assessing the network value of Internet Backbone Providers (IBPs). By using the exchange rate of each network, we can define a rate of network value, which reflects the contribution of each network to interconnection and the interconnected network resource usage by each of the networks.
Asymptotic Properties in Semiparametric Partially Linear Regression Models for Functional Data
Tao ZHANG
2013-01-01
We consider the semiparametric partially linear regression models with mean function xᵀβ + g(z), where X and z are functional data. The new estimators of β and g(z) are presented and some asymptotic results are given. The strong convergence rates of the proposed estimators are obtained. In our estimation, the observation number of each subject will be completely flexible. Some simulation study is conducted to investigate the finite sample performance of the proposed estimators.
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; the coefficients of the regression model are then obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a non-parametric technique, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
LINEAR REGRESSION MODEL ESTIMATION FOR RIGHT CENSORED DATA
Ersin Yılmaz
2016-05-01
In this study, we first define right-censored data. Briefly, right-censored data arise when values above a certain limit are censored. This may be related to the measuring device. We then use a response variable obtained from right-censored explanatory variables, and the linear regression model is estimated. To account for the censored data, Kaplan-Meier weights are used in the estimation of the model. With these weights, the regression model estimates are consistent and unbiased. There is also a semiparametric regression method for censored data, and this method likewise gives useful results. This study may also be useful for health studies, because censored data commonly occur in medical research.
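The abstract above does not say which form of Kaplan-Meier weights is used. One widely used construction, the jumps of the Kaplan-Meier estimator applied as weights in least squares, can be sketched as follows. This is an illustration under assumptions, not the author's method: it assumes the response is the right-censored quantity, ties are broken by input order, and the data and function names are hypothetical.

```python
def kaplan_meier_weights(times, observed):
    """Kaplan-Meier jump weights for right-censored responses.

    `observed[i]` is 1 if times[i] is an exact value and 0 if it is censored.
    Censored observations receive weight 0; their mass is passed to later ones.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    weights = [0.0] * n
    survival = 1.0  # product over earlier uncensored ranks of (n - j) / (n - j + 1)
    for rank, idx in enumerate(order, start=1):
        if observed[idx]:
            weights[idx] = survival / (n - rank + 1)
            survival *= (n - rank) / (n - rank + 1)
    return weights


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b x."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: the second response is censored (only a lower bound is known).
x_values = [1.0, 2.0, 3.0]
y_values = [1.0, 2.0, 3.0]
status = [1, 0, 1]

weights = kaplan_meier_weights(y_values, status)
intercept, slope = weighted_line(x_values, y_values, weights)
```

With no censoring every weight equals 1/n, so the procedure reduces to ordinary least squares.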
A note on constrained M-estimation and its recursive analog in multivariate linear regression models
RAO; Calyampudi; R
2009-01-01
In this paper, the constrained M-estimation of the regression coefficients and scatter parameters in a general multivariate linear regression model is considered. Since the constrained M-estimation is not easy to compute, an updating recursion procedure is proposed to simplify the computation of the estimators when a new observation is obtained. We show that, under mild conditions, the recursion estimates are strongly consistent. In addition, the asymptotic normality of the recursive constrained M-estimators of regression coefficients is established. A Monte Carlo simulation study of the recursion estimates is also provided. Besides, robustness and asymptotic behavior of constrained M-estimators are briefly discussed.
Hassan, A K
2015-01-01
...°. Applying the simple linear regression least squares method statistical analysis to the temperature-conductivity obtained data determines the effective surfactants blend concentration required...
Radial basis function networks with linear interval regression weights for symbolic interval data.
Su, Shun-Feng; Chuang, Chen-Chia; Tao, C W; Jeng, Jin-Tsong; Hsiao, Chih-Ching
2012-02-01
This paper introduces a new structure of radial basis function networks (RBFNs) that can successfully model symbolic interval-valued data. In the proposed structure, to handle symbolic interval data, the Gaussian functions required in the RBFNs are modified to consider interval distance measure, and the synaptic weights of the RBFNs are replaced by linear interval regression weights. In the linear interval regression weights, the lower and upper bounds of the interval-valued data as well as the center and range of the interval-valued data are considered. In addition, in the proposed approach, two stages of learning mechanisms are proposed. In stage 1, an initial structure (i.e., the number of hidden nodes and the adjustable parameters of radial basis functions) of the proposed structure is obtained by the interval competitive agglomeration clustering algorithm. In stage 2, a gradient-descent kind of learning algorithm is applied to fine-tune the parameters of the radial basis function and the coefficients of the linear interval regression weights. Various experiments are conducted, and the average behavior of the root mean square error and the square of the correlation coefficient in the framework of a Monte Carlo experiment are considered as the performance index. The results clearly show the effectiveness of the proposed structure.
Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.
Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo
2015-08-01
Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.
Inferring cellular regulatory networks with Bayesian model averaging for linear regression (BMALR).
Huang, Xun; Zi, Zhike
2014-08-01
Bayesian network and linear regression methods have been widely applied to reconstruct cellular regulatory networks. In this work, we propose a Bayesian model averaging for linear regression (BMALR) method to infer molecular interactions in biological systems. This method uses a new closed form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. We have assessed the performance of BMALR by benchmarking on both in silico DREAM datasets and real experimental datasets. The results show that BMALR achieves both high prediction accuracy and high computational efficiency across different benchmarks. A pre-processing of the datasets with the log transformation can further improve the performance of BMALR, leading to a new top overall performance. In addition, BMALR can achieve robust high performance in community predictions when it is combined with other competing methods. The proposed method BMALR is competitive compared to the existing network inference methods. Therefore, BMALR will be useful to infer regulatory interactions in biological networks. A free open source software tool for the BMALR algorithm is available at https://sites.google.com/site/bmalr4netinfer/.
Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J
2015-08-01
Regression-based prosthesis control using surface electromyography (EMG) has demonstrated real-time simultaneous control of multiple degrees of freedom (DOFs) in transradial amputees. However, these systems have been limited to control of wrist DOFs. Use of intramuscular EMG has shown promise for both wrist and hand control in able-bodied subjects, but to date has not been evaluated in amputee subjects. The objective of this study was to evaluate two regression-based simultaneous control methods using intramuscular EMG in transradial amputees and compare their performance to able-bodied subjects. Two transradial amputees and sixteen able-bodied subjects used fine wire EMG recorded from six forearm muscles to control three wrist/hand DOFs: wrist rotation, wrist flexion/extension, and hand open/close. Both linear regression and probability-weighted regression systems were evaluated in a virtual Fitts' Law test. Though both amputee subjects initially produced worse performance metrics than the able-bodied subjects, the amputee subject who completed multiple experimental blocks of the Fitts' law task demonstrated substantial learning. This subject's performance was within the range of able-bodied subjects by the end of the experiment. Both amputee subjects also showed improved performance when using probability-weighted regression for targets requiring use of only one DOF, and mirrored statistically significant differences observed with able-bodied subjects. These results indicate that amputee subjects may require more learning to achieve similar performance metrics as able-bodied subjects. These results also demonstrate that comparative findings between linear and probability-weighted regression with able-bodied subjects reflect performance differences when used by the amputee population.
Lu, Zudi
2001-01-01
[1] Engle, R. F., Granger, C. W. J., Rice, J. et al., Semiparametric estimates of the relation between weather and electricity sales, Journal of the American Statistical Association, 1986, 81: 310.
[2] Heckman, N. E., Spline smoothing in partly linear models, Journal of the Royal Statistical Society, Ser. B, 1986, 48: 244.
[3] Rice, J., Convergence rates for partially splined models, Statistics & Probability Letters, 1986, 4: 203.
[4] Chen, H., Convergence rates for parametric components in a partly linear model, Annals of Statistics, 1988, 16: 136.
[5] Robinson, P. M., Root-n-consistent semiparametric regression, Econometrica, 1988, 56: 931.
[6] Speckman, P., Kernel smoothing in partial linear models, Journal of the Royal Statistical Society, Ser. B, 1988, 50: 413.
[7] Cuzick, J., Semiparametric additive regression, Journal of the Royal Statistical Society, Ser. B, 1992, 54: 831.
[8] Cuzick, J., Efficient estimates in semiparametric additive regression models with unknown error distribution, Annals of Statistics, 1992, 20: 1129.
[9] Chen, H., Shiau, J. H., A two-stage spline smoothing method for partially linear models, Journal of Statistical Planning & Inference, 1991, 27: 187.
[10] Chen, H., Shiau, J. H., Data-driven efficient estimators for a partially linear model, Annals of Statistics, 1994, 22: 211.
[11] Schick, A., Root-n consistent estimation in partly linear regression models, Statistics & Probability Letters, 1996, 28: 353.
[12] Hamilton, S. A., Truong, Y. K., Local linear estimation in partly linear model, Journal of Multivariate Analysis, 1997, 60: 1.
[13] Mills, T. C., The Econometric Modeling of Financial Time Series, Cambridge: Cambridge University Press, 1993, 137.
[14] Engle, R. F., Autoregressive conditional heteroscedasticity with estimates of United Kingdom inflation, Econometrica, 1982, 50: 987.
[15] Bera, A. K., Higgins, M. L., A survey of ARCH models: properties of estimation and testing, Journal of Economic
Verhelst, Helene E; Beele, Hilde; Joos, Rik; Vanneuville, Benedicte; Van Coster, Rudy N
2008-11-01
An 8-year-old girl with linear scleroderma "en coup de sabre" is reported who, at preschool age, presented with intractable simple partial seizures more than 1 year before skin lesions were first noticed. MRI revealed hippocampal atrophy, contralateral to the seizures and ipsilateral to the skin lesions. In the following months, mental and motor regression was noticed. A cerebral CT scan showed multiple foci of calcification in the affected hemisphere. In previously reported patients the skin lesions preceded the neurological signs. To the best of our knowledge, hippocampal atrophy has not previously been reported as a presenting symptom of linear scleroderma. Linear scleroderma should be included in the differential diagnosis in patients with unilateral hippocampal atrophy even when the typical skin lesions are not present.
Significance tests to determine the direction of effects in linear regression models.
Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander
2015-02-01
Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice.
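The symmetry argument in the abstract above can be illustrated with a small simulation. The following is only a descriptive sketch, not one of the significance tests the authors propose: it assumes a skewed (exponential) explanatory variable, normal errors, and a true model in which Y depends on X, and it compares the sample skewness of the residuals from the two competing regressions. The helper names `skewness` and `ols_residuals` are hypothetical.

```python
import random


def skewness(values):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def ols_residuals(xs, ys):
    """Residuals from the least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Simulated data (assumption): X is skewed, the error is normal, Y depends on X.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]

skew_correct = skewness(ols_residuals(x, y))  # Y regressed on X
skew_reverse = skewness(ols_residuals(y, x))  # X regressed on Y

print("residual skewness, Y on X:", skew_correct)
print("residual skewness, X on Y:", skew_reverse)
```

Under these assumptions the residuals of the correctly specified model should be close to symmetric, while those of the reversed model inherit part of the skewness of X.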
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.
Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A
2017-01-01
For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.
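The comparison between the fixed ratio model and a regression model can be sketched as follows. The records below are invented for illustration and are not taken from the benchmarking database; the sketch also uses only eSCT as a predictor, whereas the best model in the abstract additionally uses type of operation, ASA classification, and type of anesthesia. The function names are hypothetical.

```python
import math

# Hypothetical records (assumption): (eSCT in minutes, observed TPT in minutes).
records = [(60, 85), (90, 118), (120, 150), (45, 70), (200, 240), (150, 185)]


def fixed_ratio_prediction(esct, ratio=1.33):
    """Fixed ratio model: predicted TPT is eSCT multiplied by a constant."""
    return ratio * esct


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(predictions, observed):
    """Root mean squared error between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predictions, observed)]
    return math.sqrt(sum(squared) / len(squared))


esct = [r[0] for r in records]
tpt = [r[1] for r in records]

intercept, slope = fit_simple_ols(esct, tpt)
rmse_fixed = rmse([fixed_ratio_prediction(x) for x in esct], tpt)
rmse_ols = rmse([intercept + slope * x for x in esct], tpt)

print("fixed ratio RMSE:", rmse_fixed)
print("regression RMSE: ", rmse_ols)
```

Because the regression is evaluated on the same records it was fitted to, its in-sample RMSE cannot exceed that of any fixed straight line; a fair comparison would use held-out cases.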
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling
Eric R. Edelman
2017-06-01
For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related
Sparse Logistic Regression for Diagnosis of Liver Fibrosis in Rat by Using SCAD-Penalized Likelihood
Fang-Rong Yan
2011-01-01
The objective of the present study is to determine the quantitative relationship between the progression of liver fibrosis and the levels of certain serum markers using a mathematical model. We apply sparse logistic regression with the smoothly clipped absolute deviation (SCAD) penalty function to diagnose liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides users with precise classification probabilities along with the class information. In both the simulated and the experimental case, the proposed method is comparable to stepwise linear discriminant analysis (SLDA) and to sparse logistic regression with the least absolute shrinkage and selection operator (LASSO) penalty, as assessed by receiver operating characteristic (ROC) analysis with Bayesian bootstrap estimation of the area under the curve (AUC) and diagnostic sensitivity for the selected variables. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA) in rats. This approach might also be used in predicting the development of liver cirrhosis.
Ncibi, Mohamed Chaker
2008-05-01
In any single-component isotherm study, determining the best-fitting model is a key step in describing the sorption system mathematically and, therefore, in exploring the related theoretical assumptions. Hence, several error functions have been widely used to estimate the deviations between experimental and theoretically predicted equilibrium adsorption values (Q(e,exp) vs. Q(e,theo) as X- and Y-axis, respectively), including the average relative error deviation, Marquardt's percent standard error deviation, the hybrid fractional error function, the sum of the squares of the errors, the correlation coefficient and the residuals. In this study, five other statistical functions are analysed to investigate their applicability as tools for evaluating isotherm model fitness, namely the Pearson correlation coefficient, the coefficient of determination, the chi-square test, the F-test and the Student's t-test, using the commonly used functions as references. The adsorption of textile dye onto Posidonia oceanica seagrass fibres was carried out, as a case study, in batch mode at 20 °C. In addition, to obtain an overall view of the possible use of these statistical functions in this setting, the examination was carried out for both linear and non-linear regression analysis. The results showed that, among the five statistical tools studied, the chi-square test and the Student's t-test were suitable for determining the best-fitting isotherm model under the linear modelling approach. For the non-linear analysis, on the other hand, all the functions except the Student's t-test gave satisfactory results, in agreement with the commonly used error functions.
Distributed Monitoring of the R² Statistic for Linear Regression
Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.
2011-01-01
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large-scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
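The monitored quantity can be illustrated with a simple aggregation of per-node summaries. This sketch is not the DReMo algorithm: it assumes a single input variable, assumes every node ships its summary to one place (which DReMo is designed to avoid), and uses a hypothetical threshold of 0.9. It only shows that R² over the union of the nodes' data can be computed from additive sufficient statistics without moving the raw instances.

```python
def local_summary(points):
    """Additive sufficient statistics for one node's (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    syy = sum(y * y for _, y in points)
    return (n, sx, sy, sxx, sxy, syy)


def merge(summaries):
    """Combine node summaries by element-wise addition."""
    return tuple(sum(parts) for parts in zip(*summaries))


def r_squared(summary):
    """R^2 of the least-squares line y = a + b * x for the pooled data."""
    n, sx, sy, sxx, sxy, syy = summary
    cxx = sxx - sx * sx / n  # centred sum of squares of x
    cyy = syy - sy * sy / n  # centred sum of squares of y
    cxy = sxy - sx * sy / n  # centred cross product
    return cxy * cxy / (cxx * cyy)


# Hypothetical data held at two nodes (assumption).
node_a = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2)]
node_b = [(3.0, 6.8), (4.0, 9.1), (5.0, 10.9)]

THRESHOLD = 0.9  # hypothetical monitoring threshold
global_r2 = r_squared(merge([local_summary(node_a), local_summary(node_b)]))
needs_recompute = global_r2 < THRESHOLD
print("global R^2:", global_r2, "recompute model:", needs_recompute)
```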
Multiple regression technique for Pth degree polynomials with and without linear cross products
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated, so that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs, which evaluate the various constants in the model regression equation, were developed for the case without linear cross products and for the case incorporating such cross products. In addition, the significance of the estimated regression equation is considered, and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent error are evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique; they show the output formats and typical plots comparing computer results to each set of input data.
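The two cases described above can be sketched in a few lines. This is not a reconstruction of the original computer programs: it assumes that "linear cross products" means pairwise products of first-degree terms x_i * x_j, it fits by solving the normal equations directly, and all names and data are hypothetical.

```python
from itertools import combinations


def design_row(x, degree, cross_products):
    """Terms for one observation: 1, x_i**k for k = 1..degree, optionally x_i * x_j."""
    row = [1.0]
    for value in x:
        for power in range(1, degree + 1):
            row.append(value ** power)
    if cross_products:
        for i, j in combinations(range(len(x)), 2):
            row.append(x[i] * x[j])
    return row


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a square system a @ s = b."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_polynomial(xs, ys, degree, cross_products):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    rows = [design_row(x, degree, cross_products) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def sum_squared_error(xs, ys, coefs, degree, cross_products):
    """Residual sum of squares of a fitted polynomial surface."""
    total = 0.0
    for x, y in zip(xs, ys):
        row = design_row(x, degree, cross_products)
        total += (y - sum(c * t for c, t in zip(coefs, row))) ** 2
    return total


# Hypothetical noise-free surface with an interaction: y = 1 + 2*x1 + 3*x2**2 + 0.5*x1*x2
xs = [(float(a), float(b)) for a in range(4) for b in range(4)]
ys = [1 + 2 * a + 3 * b ** 2 + 0.5 * a * b for a, b in xs]

with_cross = fit_polynomial(xs, ys, 2, True)
without_cross = fit_polynomial(xs, ys, 2, False)
sse_with = sum_squared_error(xs, ys, with_cross, 2, True)
sse_without = sum_squared_error(xs, ys, without_cross, 2, False)
print("SSE with cross products:   ", sse_with)
print("SSE without cross products:", sse_without)
```

For N = 2 and P = 2 the term order is 1, x1, x1², x2, x2², x1·x2. Solving the normal equations is adequate for a small, well-conditioned example like this one; a QR-based least-squares routine would be the more robust choice for higher degrees.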
Research on the multiple linear regression in non-invasive blood glucose measurement.
Zhu, Jianming; Chen, Zhencheng
2015-01-01
A non-invasive blood glucose measurement sensor and a data processing algorithm based on the metabolic energy conservation (MEC) method are presented in this paper. The physiological parameters of the human fingertip can be measured by various sensing modalities, and the blood glucose value can be estimated from these physiological parameters by multiple linear regression analysis. Five variable-entry methods in multiple linear regression (enter, remove, forward, backward and stepwise) were compared, and the backward method had the best performance. The best correlation coefficient was 0.876 with a standard error of the estimate of 0.534, and the significance was 0.012, indicating that the regression equation was valid. The Clarke error grid analysis was performed to compare the MEC method with the hexokinase method, using 200 data points. The correlation coefficient R was 0.867 and all of the points were located in Zone A and Zone B, which shows that the MEC method provides a feasible and valid way for non-invasive blood glucose measurement.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Linear Regression Model for Global Solar Radiation on Horizontal Surfaces at Warri, Nigeria
Michael S. Okundamiya
2013-10-01
The growing anxiety about the negative effects of fossil fuels on the environment and the global emission reduction targets call for a more extensive use of renewable energy alternatives. Efficient solar energy utilization is an essential solution to the high atmospheric pollution caused by fossil fuel combustion. Global solar radiation (GSR) data, which are useful for the design and evaluation of solar energy conversion systems, are not measured at the forty-five meteorological stations in Nigeria. The dearth of measured solar radiation data calls for accurate estimation. This study proposed a temperature-based linear regression for predicting the monthly average daily GSR on horizontal surfaces at Warri (latitude 5.02°N and longitude 7.88°E), an oil city located in the south-south geopolitical zone of Nigeria. The proposed model is analyzed based on five statistical indicators (coefficient of correlation, coefficient of determination, mean bias error, root mean square error, and t-statistic), and compared with the existing sunshine-based model for the same study. The results indicate that the proposed temperature-based linear regression model could replace the existing sunshine-based model for generating global solar radiation data. Keywords: air temperature; empirical model; global solar radiation; regression analysis; renewable energy; Warri
A note on the use of multiple linear regression in molecular ecology.
Frasier, Timothy R
2016-03-01
Multiple linear regression analyses (also often referred to as generalized linear models--GLMs, or generalized linear mixed models--GLMMs) are widely used in the analysis of data in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misinterpreted, which can lead to incorrect interpretations and conclusions within individual studies, and can propagate to wider-spread errors in the general understanding of a topic. The primary issue revolves around the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficients associated with each independent variable are often interpreted as the independent effect of each predictor variable on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result and to provide alternative approaches for obtaining the desired information.
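The interpretation issue can be made concrete with a two-predictor model that includes an interaction, y = b0 + b1*x1 + b2*x2 + b3*x1*x2. The sketch below uses made-up coefficients and hypothetical function names. It shows that b1 is the slope of x1 only where x2 = 0, and that re-expressing the same model around a chosen reference point (centring, one common remedy and not necessarily the alternative the author recommends) changes the reported main-effect coefficients without changing any prediction.

```python
def predict(coef, x1, x2):
    """Prediction from y = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = coef
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_of_x1(coef, x2):
    """Change in y per unit increase in x1, evaluated at a given x2."""
    _, b1, _, b3 = coef
    return b1 + b3 * x2


def recenter(coef, c1, c2):
    """Coefficients of the same model written in terms of (x1 - c1) and (x2 - c2)."""
    b0, b1, b2, b3 = coef
    return (
        b0 + b1 * c1 + b2 * c2 + b3 * c1 * c2,  # intercept at the reference point
        b1 + b3 * c2,                            # slope of x1 when x2 = c2
        b2 + b3 * c1,                            # slope of x2 when x1 = c1
        b3,                                      # the interaction is unchanged
    )


# Made-up coefficients (assumption): b0, b1, b2, b3.
coef = (1.0, 2.0, -1.0, 0.5)

print("slope of x1 at x2 = 0:", slope_of_x1(coef, 0.0))  # equals b1
print("slope of x1 at x2 = 4:", slope_of_x1(coef, 4.0))  # differs from b1

# Re-express the model around x2 = 4 (for example, the sample mean of x2).
centred = recenter(coef, 0.0, 4.0)
print("centred coefficients:", centred)
```

After centring on x2 = 4, the coefficient reported for x1 is its effect at that value of x2 rather than at x2 = 0, which is often the more meaningful number when zero lies outside the observed range.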
Adnane El Hamidi
2012-01-01
Interactions of Cu(II) ions with calcium phosphate Brushite (DCPD) in aqueous solutions were investigated under batch conditions for several sorption parameters: contact time, solution pH and initial metal concentration. The retention of copper was found to be maximal and dominated by an exchange reaction process in the pH range 4-6. The reaction was initially fast, and more than 98% of the copper was removed at equilibrium. The kinetic data of the batch interaction were analyzed with various kinetic models. The pseudo-first-order model fitted by the non-linear regression method predicted the experimental data best. Furthermore, the adsorption process was modeled by the Langmuir isotherm, and the removal capacity was 331.64 mg g⁻¹. Consequently, Cu²⁺ concentration-independent kinetics and a single-surface-layer sorption isotherm are suggested as appropriate mechanisms for the whole process.
Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.
Choi, Jae-Seok; Kim, Munchurl
2017-03-01
Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower
Yunfeng Wu
2014-01-01
This paper presents a novel adaptive linear and normalized combination (ALNC) method that can be used to combine component radial basis function networks (RBFNs) to achieve better function approximation and regression. The optimization of the fusion weights is obtained by solving a constrained quadratic programming problem. According to the instantaneous errors generated by the component RBFNs, the ALNC is able to perform the selective ensemble of multiple learners by adaptively adjusting the fusion weights from one instance to another. The results of the experiments on eight synthetic function approximation and six benchmark regression data sets show that the ALNC method can effectively help the ensemble system achieve higher accuracy (measured in terms of mean-squared error) and better fidelity (characterized by the normalized correlation coefficient) of approximation, in relation to the popular simple average, weighted average, and Bagging methods.
strucchange: An R Package for Testing for Structural Change in Linear Regression Models
Achim Zeileis
2002-01-01
This paper reviews tests for structural change in linear regression models from the generalized fluctuation test framework as well as from the F test (Chow test) framework. It introduces a unified approach for implementing these tests and presents how these ideas have been realized in an R package called strucchange. Enhancing the standard significance test approach, the package contains methods to fit, plot and test empirical fluctuation processes (like CUSUM, MOSUM and estimates-based processes) and to compute, plot and test sequences of F statistics with the supF, aveF and expF tests. Thus, it makes powerful tools available to display information about structural changes in regression relationships and to assess their significance. Furthermore, it is described how incoming data can be monitored.
Shetty, Rahul; Bigiel, Frank
2012-01-01
We develop a Bayesian linear regression method which rigorously treats measurement uncertainties, and accounts for hierarchical data structure for investigating the relationship between the star formation rate and gas surface density. The method simultaneously estimates the intercept, slope, and scatter about the regression line of each individual subject (e.g. a galaxy) and the population (e.g. an ensemble of galaxies). Using synthetic datasets, we demonstrate that the Bayesian method accurately recovers the parameters of both the individuals and the population, especially when compared to commonly employed least squares methods, such as the bisector. We apply the Bayesian method to estimate the Kennicutt-Schmidt (KS) parameters of a sample of spiral galaxies compiled by Bigiel et al. (2008). We find significant variation in the KS parameters, indicating that no single KS relationship holds for all galaxies. This suggests that the relationship between molecular gas and star formation differs between galaxies...
A multivariate linear regression model for the Jordanian industrial electric energy consumption
Al-Ghandoor, A.; Nahleh, Y.A.; Sandouqa, Y.; Al-Salaymeh, M. [Hashemite Univ., Zarqa (Jordan). Dept. of Industrial Engineering
2007-08-09
The amount of electricity used by the industrial sector in Jordan is an important driver for determining the future energy needs of the country. This paper proposed a model to simulate electricity and energy consumption by industry. The general model approach was based on multivariate regression analysis to provide valuable information regarding energy demands and analysis, and to identify the various factors that influence Jordanian industrial electricity consumption. It was determined that industrial gross output and capacity utilization are the most important variables that drive electricity consumption. The results revealed that the multivariate linear regression model can be used to adequately model the Jordanian industrial electricity consumption with coefficient of determination (R2) and adjusted R2 values of 99.3 and 99.2 per cent, respectively. 19 refs., 4 tabs., 2 figs.
(no author listed)
2001-01-01
The partly linear regression model is useful in practice, but little has been investigated in the literature on adapting it to real data that are dependent and conditionally heteroscedastic. In this paper, the estimators of the regression components are constructed via local polynomial fitting and their large sample properties are explored. Under certain mild regularity conditions, it is shown that the estimators of the nonparametric component and its derivatives are consistent up to the convergence rates that are optimal in the i.i.d. case, and that the estimator of the parametric component is root-n consistent, with the same rate as for a parametric model. The technique adopted in the proof differs from that used in the reference by Hamilton and Truong for i.i.d. samples and corrects the errors in that reference.
Monopole and dipole estimation for multi-frequency sky maps by linear regression
Wehus, I K; Eriksen, H K; Banday, A J; Dickinson, C; Ghosh, T; Gorski, K M; Lawrence, C R; Leahy, J P; Maino, D; Reich, P; Reich, W
2014-01-01
We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called "T-T plots". Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the 9-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are...
Li, Yanming; Nan, Bin; Zhu, Ji
2015-06-01
We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study.
Linear and support vector regressions based on geometrical correlation of data
Kaijun Wang
2007-10-01
Linear regression (LR) and support vector regression (SVR) are widely used in data analysis. Geometrical correlation learning (GcLearn) was proposed recently to improve the predictive ability of LR and SVR through mining and using correlations between data of a variable (inner correlation). This paper theoretically analyzes the prediction performance of the GcLearn method and proves that GcLearn LR and SVR will have better prediction performance than traditional LR and SVR for prediction tasks when good inner correlations are obtained and predictions by traditional LR and SVR are far away from their neighbor training data under inner correlation. This gives the applicable condition of the GcLearn method.
Farzaneh Ahmadzadeh
2013-06-01
The liner of an ore grinding mill is a critical component in the grinding process, necessary for both high metal recovery and shell protection. From an economic point of view, it is important to keep mill liners in operation as long as possible, minimising the downtime for maintenance or repair. Therefore, predicting their wear is crucial. This paper tests different methods of predicting wear in the context of remaining height and remaining life of the liners. The key concern is to make decisions on replacement and maintenance without stopping the mill for extra inspection as this leads to financial savings. The paper applies linear multiple regression and artificial neural network (ANN) techniques to determine the most suitable methodology for predicting wear. The advantages of the ANN model over the traditional approach of multiple regression analysis include its high accuracy.
Barbara D. Klein
1999-01-01
Although databases used in many organizations have been found to contain errors, little is known about the effect of these errors on predictions made by linear regression models. The paper uses a real-world example, the prediction of the net asset values of mutual funds, to investigate the effect of data quality on linear regression models. The results of two experiments are reported. The first experiment shows that the error rate and magnitude of error in data used in model prediction negatively affect the predictive accuracy of linear regression models. The second experiment shows that the error rate and the magnitude of error in data used to build the model positively affect the predictive accuracy of linear regression models. All findings are statistically significant. The findings have managerial implications for users and builders of linear regression models.
Agarwal, Parul; Sambamoorthi, Usha
2015-12-01
Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least squares (OLS) regressions were used to examine differences in healthcare expenditures between those with and without depression. A post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analyses accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.
Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-04-01
To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes was analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with a narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
Yubo Wang
2017-06-01
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transform (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transform (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.
Variable selection in multiple linear regression: The influence of individual cases
SJ Steel
2007-12-01
The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calculated. It is argued that the selection procedure may be improved by taking the selection influence of individual data cases into account.
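The idea of a selection influence can be illustrated with a minimal sketch. It assumes an all-subsets search scored by a Gaussian AIC and defines the influence of case i as the relative change in the best criterion value when case i is dropped. The function names, the exact AIC form, and the normalisation are illustrative assumptions, not necessarily the measures defined in the paper.

```python
import itertools
import numpy as np

def aic_ols(design, y):
    """Gaussian AIC (up to a constant) for an OLS fit: n*log(RSS/n) + 2k."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def best_subset_aic(X, y):
    """Search all predictor subsets (intercept always kept); return (best AIC, subset)."""
    n, p = X.shape
    best = (np.inf, ())
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            score = aic_ols(design, y)
            if score < best[0]:
                best = (score, subset)
    return best

def selection_influence(X, y):
    """Relative change in the best AIC when each case is omitted in turn (assumed definition)."""
    full_score, _ = best_subset_aic(X, y)
    influence = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        score_i, _ = best_subset_aic(X[keep], y[keep])
        influence[i] = (score_i - full_score) / abs(full_score)
    return influence
```

Because this AIC form depends on the sample size, part of each change reflects the smaller n rather than the omitted case itself; cases whose value stands out from the rest are the ones worth inspecting.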
Circular and linear regression fitting circles and lines by least squares
Chernov, Nikolai
2010-01-01
Find the right algorithm for your image processing application. Exploring the recent achievements that have occurred since the mid-1990s, Circular and Linear Regression: Fitting Circles and Lines by Least Squares explains how to use modern algorithms to fit geometric contours (circles and circular arcs) to observed data in image processing and computer vision. The author covers all facets (geometric, statistical, and computational) of the methods. He looks at how the numerical algorithms relate to one another through underlying ideas, compares the strengths and weaknesses of each algorithm, and il
Vesnin, V. L.; Muradov, V. G.
2012-09-01
Absorption spectra of multicomponent hydrocarbon mixtures based on n-heptane and isooctane with addition of benzene (up to 1%) and toluene and o-xylene (up to 20%) were investigated experimentally in the region of the first overtones of the hydrocarbon groups (λ = 1620-1780 nm). It was shown that their concentrations could be determined separately by using a multiple linear regression method. The optimum result was obtained by including four wavelengths at 1671, 1680, 1685, and 1695 nm, which took into account absorption of CH groups in benzene, toluene, and o-xylene and CH3 groups, respectively.
Describing Adequacy of cure with maximum hardness ratios and non-linear regression.
Bouschlicher, Murray; Berning, Kristen; Qian, Fang
2008-01-01
Knoop Hardness (KH) ratios (HR) ≥ 80% are commonly used as criteria for the adequate cure of a composite. These per-specimen HRs can be misleading, as both numerator and denominator may increase concurrently, prior to reaching an asymptotic, top-surface maximum hardness value (H(MAX)). Extended cure times were used to establish H(MAX), and descriptive statistics and non-linear regression analysis were used to describe the relationship between exposure duration and HR and predict the time required for HR-H(MAX) = 80%. Composite samples 2.00 x 5.00 mm diameter (n = 5/grp) were cured for 10 seconds, 20 seconds, 40 seconds, 60 seconds, 90 seconds, 120 seconds, 180 seconds and 240 seconds in a 2-composite x 2-light curing unit design. A microhybrid (Point 4, P4) or microfill resin (Heliomolar, HM) composite was cured with a QTH or LED light curing unit and then stored in the dark for 24 hours prior to KH testing. Non-linear regression was calculated with H = (H(MAX) - c)(1 - e^(-kt)) + c, where H(MAX) = maximum hardness (a theoretical asymptotic value), c = constant (the value at t = 0), k = rate constant and t = exposure duration; this equation describes the relationship between radiant exposure (irradiance x time) and HRs. Exposure durations for HR-H(MAX) = 80% were calculated. Two-sample t-tests for pairwise comparisons evaluated relative performance of the light curing units for similar surface x composite x exposure (10-90s). The goodness-of-fit of the non-linear regression, r2, ranged from 0.68 to 0.95 (mean = 0.82). Microhybrid (P4) exposure to achieve HR-H(MAX) = 80% was 21 seconds for QTH and 34 seconds for the LED light curing unit. Corresponding values for microfill (HM) were 71 and 74 seconds, respectively. P4 HR-H(MAX) of LED vs QTH was statistically similar for 10 to 40 seconds, while HM HR-H(MAX) of LED was significantly lower than QTH for 10 to 40 seconds. It was concluded that redefined hardness ratios based on maximum hardness used in conjunction with non-linear regression
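The exposure time at which the fitted curve reaches a given fraction of H(MAX) follows from inverting the exponential model. The sketch below assumes that HR-H(MAX) means hardness divided by H(MAX); the function names are hypothetical, and any parameter values passed in would be illustrative rather than the fitted values from the study.

```python
import math

def hardness(t, h_max, c, k):
    """Exponential saturation model: H(t) = (H_max - c) * (1 - exp(-k t)) + c."""
    return (h_max - c) * (1.0 - math.exp(-k * t)) + c

def time_to_ratio(h_max, c, k, ratio=0.8):
    """Exposure time t at which H(t) = ratio * H_max.

    Solving (H_max - c)(1 - exp(-k t)) + c = ratio * H_max for t gives
    t = ln((H_max - c) / (H_max - ratio * H_max)) / k.
    """
    target = ratio * h_max
    if target <= c:
        return 0.0  # the curve already starts at or above the target
    return math.log((h_max - c) / (h_max - target)) / k
```

With fitted values of H(MAX), c and k for one composite and curing unit, `time_to_ratio` gives the kind of exposure duration the abstract reports for each combination.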
DOA Finding with Support Vector Regression Based Forward–Backward Linear Prediction
Jingjing Pan
2017-05-01
Direction-of-arrival (DOA) estimation has drawn considerable attention in array signal processing, particularly with coherent signals and a limited number of snapshots. Forward–backward linear prediction (FBLP) is able to directly deal with coherent signals. Support vector regression (SVR) is robust with small samples. This paper proposes the combination of the advantages of FBLP and SVR in the estimation of DOAs of coherent incoming signals with low snapshots. The performance of the proposed method is validated with numerical simulations in coherent scenarios, in terms of different angle separations, numbers of snapshots, and signal-to-noise ratios (SNRs). Simulation results show the effectiveness of the proposed method.
Avval Zhila Mohajeri
2015-01-01
This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique, combined with the stepwise (SW) and the genetic algorithm (GA) methods as variable selection tools, was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.
SECANT-FUZZY LINEAR REGRESSION METHOD FOR HARMONIC COMPONENTS ESTIMATION IN A POWER SYSTEM
Garba Inoussa; LUO An
2003-01-01
In order to avoid unnecessary damage to electrical equipment and installations, high quality power should be delivered to the end user and strict control of frequency should be maintained. Therefore, it is important to estimate the power system's harmonic components with higher accuracy. This paper presents a new approach for estimating harmonic components in a power system using the secant-fuzzy linear regression method. In this approach the non-sinusoidal voltage or current waveform is written as a linear function. The coefficients of this function are assumed to be fuzzy numbers with a membership function that has a center and a spread value. The time dependent quantity is written as a Taylor series with two different time dependent quantities. The objective is to use the samples obtained from the transmission line to find the power system harmonic components and frequencies. We used an experimental voltage signal from a sub power station as a numerical test.
Pagowski, M O; Grell, G A; Devenyi, D; Peckham, S E; McKeen, S A; Gong, W; Monache, L D; McHenry, J N; McQueen, J; Lee, P
2006-02-02
Forecasts from seven air quality models and surface ozone data collected over the eastern USA and southern Canada during July and August 2004 provide a unique opportunity to assess benefits of ensemble-based ozone forecasting and devise methods to improve ozone forecasts. In this investigation, past forecasts from the ensemble of models and hourly surface ozone measurements at over 350 sites are used to issue deterministic 24-h forecasts using a method based on dynamic linear regression. Forecasts of hourly ozone concentrations as well as maximum daily 8-h and 1-h averaged concentrations are considered. It is shown that the forecasts issued with the application of this method have reduced bias and root mean square error and better overall performance scores than any of the ensemble members and the ensemble average. Performance of the method is similar to another method based on linear regression described previously by Pagowski et al., but unlike the latter, the current method does not require measurements from multiple monitors since it operates on individual time series. Improvement in the forecasts can be easily implemented and requires minimal computational cost.
R B Magar; V Jothiprakash
2011-12-01
In this study, a multi-linear regression (MLR) approach is used to construct an intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in the MLR approach, the Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also compared with autoregressive integrated moving average (ARIMA) models. MLR attempts to model the relationship between two or more independent variables and a dependent variable by fitting a linear regression equation. The main aim of the present study is to see the consequences of development and applicability of simple models when sufficient data length is available. Out of 47 years of daily historical rainfall and reservoir inflow data, 33 years of data are used for building the model and 14 years of data are used for validating the model. Based on the observed daily rainfall and reservoir inflow, various types of time-series, cause-effect and combined models are developed using lumped and distributed input data. Model performance was evaluated using various performance criteria, and it was found that, in the present case of well correlated input data, both lumped and distributed MLR models perform equally well. For the present case study, both MLR and ARIMA models performed equally well owing to the availability of a large dataset.
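A build-then-validate workflow of this kind can be sketched as follows, assuming the predictors (for example lagged rainfall and inflow) are already arranged as columns in time order. The function names and the root mean square error criterion are illustrative choices; the abstract does not name its performance criteria.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply fitted coefficients to new predictor rows."""
    return coef[0] + X @ coef[1:]

def chronological_split(X, y, train_fraction):
    """Split time-ordered data without shuffling: early part to build, late part to validate."""
    cut = int(round(len(y) * train_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]

def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Calling `chronological_split(X, y, 33 / 47)` mirrors the 33-year and 14-year partition described in the abstract; keeping the split chronological avoids validating on data that precede the training period.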
Setyaningsih, S.
2017-01-01
Building a leading university requires professional commitment from its lecturers. Commitment is measured through willpower, loyalty, pride, and integrity as a professional lecturer. A total of 135 of 337 university lecturers were sampled to collect data. Data were analyzed using validity and reliability tests and multiple linear regression. Many studies have found a link with the commitment of lecturers, but the basic cause of the causal relationship is generally neglected. These results indicate that the professional commitment of lecturers is affected by the variables empowerment, academic culture, and trust. The relationship model between variables is composed of three substructures. The first substructure consists of the endogenous variable professional commitment and three exogenous variables, namely academic culture, empowerment and trust, as well as the residue variable ɛ_y. The second substructure consists of one endogenous variable, trust, and two exogenous variables, namely empowerment and academic culture, and the residue variable ɛ_3. The third substructure consists of one endogenous variable, academic culture, and one exogenous variable, empowerment, as well as the residue variable ɛ_2. Multiple linear regression was used in the path model for each substructure. The results showed that the hypothesis has been proved, and these findings provide empirical evidence that increasing the variables will have an impact on increasing the professional commitment of the lecturers.
Asmaa S. Abdul Jabar
2016-09-01
On 31 May 2003, the scan line corrector (SLC) of the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) sensor, which compensates for the forward motion of the satellite in the imagery acquired, failed permanently, resulting in loss of the ability to scan about 20% of the pixels in each Landsat 7 SLC-off image. This permanent failure has seriously hampered the scientific applications of ETM+ images. In this study, an innovative gap filling approach has been introduced to recover the missing pixels in the SLC-off images using multi-temporal ETM+ SLC-off auxiliary fill images. A correlation is established between the corresponding pixels in the target SLC-off image and two fill images in parallel using the multiple linear regression (MLR) model. Simulated and actual SLC-off ETM+ images were used to assess the performance of the proposed method by comparing with multi-temporal data based methods, the LLHM method which is based on the simple linear regression (SLR) model. The qualitative and quantitative evaluations indicate that the proposed method can recover the value of un-scanned pixels accurately, especially in heterogeneous landscape and even with more temporally distant fill images.
A Comparative Investigation of Confidence Intervals for Independent Variables in Linear Regression.
Dudgeon, Paul
2016-01-01
In linear regression, the most appropriate standardized effect size for individual independent variables having an arbitrary metric remains open to debate, despite researchers typically reporting a standardized regression coefficient. Alternative standardized measures include the semipartial correlation, the improvement in the squared multiple correlation, and the squared partial correlation. No arguments based on either theoretical or statistical grounds for preferring one of these standardized measures have been mounted in the literature. Using a Monte Carlo simulation, the performance of interval estimators for these effect-size measures was compared in a 5-way factorial design. Formal statistical design methods assessed both the accuracy and robustness of the four interval estimators. The coverage probability of a large-sample confidence interval for the semipartial correlation coefficient derived from Aloe and Becker was highly accurate and robust in 98% of instances. It was better in small samples than the Yuan-Chan large-sample confidence interval for a standardized regression coefficient. It was also consistently better than both a bootstrap confidence interval for the improvement in the squared multiple correlation and a noncentral interval for the squared partial correlation.
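The four standardized measures compared in the abstract are closely related, which a short numerical sketch makes concrete. It uses the textbook identities: the squared semipartial correlation equals the improvement in R^2 from adding the predictor, and the squared partial correlation divides that improvement by the variance left unexplained without the predictor. The function names are hypothetical, and only point estimates are computed, not the interval estimators studied in the paper.

```python
import numpy as np

def _fit(X, y):
    """OLS with intercept; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

def effect_sizes(X, y, j):
    """Point estimates of four standardized effect sizes for predictor column j."""
    beta_full, r2_full = _fit(X, y)
    _, r2_reduced = _fit(np.delete(X, j, axis=1), y)  # model without predictor j
    delta_r2 = max(r2_full - r2_reduced, 0.0)
    b_j = beta_full[j + 1]  # +1 skips the intercept
    return {
        "standardized_beta": b_j * X[:, j].std(ddof=1) / y.std(ddof=1),
        "semipartial_r": np.sign(b_j) * np.sqrt(delta_r2),
        "delta_r2": delta_r2,
        "partial_r2": delta_r2 / (1.0 - r2_reduced),
    }
```

The squared partial correlation is never smaller than the improvement in R^2, because its denominator is at most one.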
Yoneoka, Daisuke; Henmi, Masayuki
2017-06-01
Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because the covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. In particular, we also propose an approach to recover (at most) three correlations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd.
Aboveground biomass and carbon stocks modelling using non-linear regression model
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forests because of the variation in species biodiversity and the complex structure of the tropical rain forest. Nevertheless, tropical rainforests are the most extensive forests in the world, with a vast diversity of trees and layered canopies. Using optical sensors integrated with empirical models is a common way to assess AGB. With regression, a link between remotely sensed data and a biophysical parameter of the forest may be made. Therefore, this paper illustrates the accuracy of a non-linear regression equation of quadratic form for estimating the AGB and carbon stocks of the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters from field plots and the remotely sensed data using a non-linear regression model. The results showed a good relationship between crown projection area (CPA) and carbon stocks (CS), with a Pearson correlation coefficient (r) of 0.671 (p < 0.01). The study concluded that the integration of Worldview-3 imagery with the LiDAR-based canopy height model (CHM) raster was useful for quantifying the AGB and carbon stocks for a larger sample area of the lowland Dipterocarp forest.
Post-L1-Penalized Estimators in High-Dimensional Linear Regression Models
Belloni, Alexandre
2010-01-01
In this paper we study the post-penalized estimator which applies ordinary, unpenalized linear regression to the model selected by the first step penalized estimators, typically the LASSO. We show that post-LASSO can perform as well or nearly as well as the LASSO in terms of the rate of convergence. We show that this performance occurs even if the LASSO-based model selection "fails", in the sense of missing some components of the "true" regression model. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the "true" model as a subset and enough sparsity is obtained. Of course, in the extreme case, when LASSO perfectly selects the true model, the post-LASSO estimator becomes the oracle estimator. We show that the results hold in both parametric and non-parametric models; and by the "true" model we mean the best $s$-dimensional approximation to the true regression model, whe...
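The two-step estimator described here can be sketched in a few lines: fit the LASSO, keep the columns with non-zero coefficients, then refit those columns by ordinary least squares. The coordinate-descent solver, the objective scaling (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1, the absence of an intercept (centered data are assumed), and the function names are all assumptions made for illustration; the paper's own penalty choice is not reproduced.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # residual for beta = 0 (astype makes a copy)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # partial residual without column j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # restore the full residual
    return beta

def post_lasso(X, y, lam):
    """Unpenalized OLS refit on the support selected by the LASSO."""
    support = np.flatnonzero(lasso_cd(X, y, lam))
    beta = np.zeros(X.shape[1])
    if support.size:
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta
```

The refit removes the shrinkage that the penalty applies to the selected coefficients, which is the intuition behind the bias reduction discussed in the abstract.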
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.
Multivariate linear regression of high-dimensional fMRI data with multiple target variables.
Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia
2014-05-01
Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets.
KAYODE AYINDE
2012-11-01
Performances of estimators of the linear regression model with autocorrelated error term have been attributed to the nature and specification of the explanatory variables. The violation of the assumption of independence of the explanatory variables is not uncommon, especially in business, economic and social sciences, leading to the development of many estimators. Moreover, prediction is one of the main essences of regression analysis. This work, therefore, attempts to examine the parameter estimates of the Ordinary Least Squares estimator (OLS), Cochrane-Orcutt estimator (COR), Maximum Likelihood estimator (ML) and the estimators based on Principal Component analysis (PC) in prediction of the linear regression model with autocorrelated error terms under violations of the assumption of independent regressors (multicollinearity) using a Monte-Carlo experiment approach. With uniform variables as regressors, it further identifies the best estimator that can be used for prediction purposes by averaging the adjusted coefficient of determination of each estimator over the number of trials. Results reveal that the performances of the COR and ML estimators at each level of multicollinearity over the levels of autocorrelation are convex-like while those of the OLS and PC estimators are concave; and that as the level of multicollinearity increases, the estimators perform much better at all the levels of autocorrelation. Except when the sample size is small (n=10), the performances of the COR and ML estimators are generally best and asymptotically the same. When the sample size is small, the COR estimator is still best except when the autocorrelation level is low. At these instances, the PC estimator is either best or competes with the best estimator. Moreover, at low level of autocorrelation in all the sample sizes, the OLS estimator competes with the best estimator in all the levels of multicollinearity.
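Of the estimators compared, the Cochrane-Orcutt procedure is the easiest to write down. The following is a minimal sketch of the textbook iterative version for AR(1) errors, assuming the design matrix already contains an intercept column; the function names and stopping rule are illustrative and are not taken from the paper's simulation code.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cochrane_orcutt(X, y, n_iter=20, tol=1e-8):
    """Iterative Cochrane-Orcutt estimation for y = X b + u, u_t = rho*u_{t-1} + e_t.

    X must include the intercept column. Returns (beta, rho).
    """
    beta = ols(X, y)
    rho = 0.0
    for _ in range(n_iter):
        e = y - X @ beta                              # residuals on the original scale
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        y_star = y[1:] - rho_new * y[:-1]             # quasi-differenced response
        X_star = X[1:] - rho_new * X[:-1]             # quasi-differenced regressors
        beta = ols(X_star, y_star)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho
```

Because the intercept column is quasi-differenced along with the other regressors, the returned intercept stays on the original scale. The first observation is dropped at each step, which is one reason the procedure can behave differently in very small samples such as n = 10.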
Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.
Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander
2017-01-01
Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.
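A homoscedasticity check that can be run on each candidate direction is sketched below. It uses the studentized (Koenker) form of the Breusch-Pagan test for a single predictor, chosen here only as a familiar example; the abstract does not say which nine tests the authors use, and the function names are hypothetical.

```python
import math
import numpy as np

def _residuals(x, y):
    """Residuals from a simple linear regression of y on x with intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def breusch_pagan_pvalue(x, y):
    """Studentized Breusch-Pagan test for the model x -> y.

    LM = n * R^2 from regressing squared residuals on x; under homoscedasticity
    LM is approximately chi-square with 1 degree of freedom.
    """
    e2 = _residuals(x, y) ** 2
    aux_resid = _residuals(x, e2)
    r2_aux = 1.0 - (aux_resid @ aux_resid) / np.sum((e2 - e2.mean()) ** 2)
    lm = len(y) * r2_aux
    # Survival function of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    return math.erfc(math.sqrt(max(lm, 0.0) / 2.0))
```

Calling `breusch_pagan_pvalue(x, y)` and `breusch_pagan_pvalue(y, x)` gives one p-value per direction; under the argument in the abstract, the mis-specified direction would be expected to show the stronger evidence of heteroscedasticity when the true predictor is nonnormal.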
Kew, William; Mitchell, John B O
2015-09-01
The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. This investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the 'wisdom of crowds' principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data preprocessing methodology was found to be crucial to performance of each method too. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.
1995-01-01
link between wavelengths chosen by stepwise regression and the biochemical of interest, and this in turn has cast doubts on the use of imaging spectrometry for the estimation of foliar biochemical concentrations at sites distant from the training sites. To investigate this problem, an analysis was conducted on the variation in canopy biochemical concentrations and reflectance spectra using forced entry linear regression.
Sharifzadeh, Sara; Clemmensen, Line Katrine Harder; Borggaard, Claus
2014-01-01
... of meat samples (430–970 nm) were used for training and testing of the L*a*b prediction models. Finding a sparse solution or the use of a minimum number of bands is of particular interest to make an industrial vision set-up simpler and cost effective. In this paper, a wide range of linear, non-linear, kernel-based regression and sparse regression methods are compared. In order to improve the prediction results of these models, we propose a supervised feature selection strategy which is compared with the Principal component analysis (PCA) as a pre-processing step. The results showed that the proposed feature selection method outperforms the PCA for both linear and non-linear methods. The highest performance was obtained by linear ridge regression applied on the selected features from the proposed Elastic net (EN)-based feature selection strategy. All the best models use a reduced number ...
Linear regression analysis of oxygen ionic conductivity in co-doped electrolyte
XIE Guang-yuan; LI Jian; PU Jian; GUO Mi
2006-01-01
A mathematical model for the estimation of the oxygen-ion conductivity of doped ZrO2 and CeO2 electrolytes was established based on the assumptions that electronic conduction and defect association can be neglected. A linear regression method was employed to determine the parameters in the model. The model was confirmed by published conductivity data for doped ZrO2 and CeO2 electrolytes. In addition, a series of compositions in the Ce0.8Gd0.2-xMxO1.9-δ system (M is the co-dopant) was prepared and their high-temperature conductivities were measured. The model was further validated by the measured conductivity data.
Maniquiz, Marla C; Lee, Soyoung; Kim, Lee-Hyung
2010-01-01
Rainfall is an important factor in estimating the event mean concentration (EMC), which is used to quantify the washed-off pollutant concentrations from non-point sources (NPSs). Pollutant loads can also be calculated using rainfall, catchment area and runoff coefficient. In this study, runoff quantity and quality data gathered from 28 months of monitoring conducted at road and parking lot sites in Korea were evaluated using multiple linear regression (MLR) to develop equations for estimating pollutant loads and EMCs as a function of rainfall variables. The results revealed that total event rainfall and average rainfall intensity are possible predictors of pollutant loads. Overall, the models are indicators of the high uncertainties of NPSs; EMCs and loads could perhaps be estimated accurately by means of water quality sampling, or long-term monitoring may be needed to gather more data for the development of estimation models.
Słania J.
2014-10-01
The article presents the process of production of coated electrodes and their welding properties. The factors concerning the welding properties and the currently applied method of assessing them are given. The testing methodology, based on measuring and recording instantaneous values of welding current and welding arc voltage, is discussed. An algorithm for creating the reference database of an expert system that aids the assessment of covered electrodes' welding properties is shown. The stability of voltage–current characteristics is discussed. Statistical factors of the instantaneous welding current and welding arc voltage waveforms used for determining welding process stability are presented. The results for the welding properties of coated electrodes are compared. The article presents the results of linear regression as well as the impact of the independent variables on the welding process performance. Finally, the conclusions drawn from the research are given.
Linear Regression Model of the Ash Mass Fraction and Electrical Conductivity for Slovenian Honey
Mojca Jamnik
2008-01-01
Mass fraction of ash is a quality criterion for determining the botanical origin of honey. At present, this parameter is generally being replaced by the measurement of electrical conductivity (κ). The value of κ depends on the ash and acid content of honey; the higher their content, the higher the resulting conductivity. A linear regression model for the relationship between ash and electrical conductivity has been established for Slovenian honey by analysing 290 samples of Slovenian honey (including acacia, lime, chestnut, spruce, fir, multifloral and mixed forest honeydew honey). The obtained model differs from the one proposed by the International Honey Commission (IHC) in the slope, but not in the intercept of the relation formula. Therefore, the Slovenian model is recommended when calculating the ash mass fraction from the results of electrical conductivity in samples of Slovenian honey.
The Omega Counter, a Frequency Counter Based on the Linear Regression
Rubiola, E; Bourgeois, P -Y; Vernotte, F
2015-01-01
This article introduces the Ω counter, a frequency counter (or a frequency-to-digital converter, in a different jargon) based on the Linear Regression (LR) algorithm on time stamps. We discuss the noise of the electronics. We derive the statistical properties of the Ω counter on a rigorous mathematical basis, including the weighted measure and the frequency response. We describe an implementation based on a SoC, under test in our laboratory, and we compare the Ω counter to the traditional Π and Λ counters. The LR exhibits optimum rejection of white phase noise, superior to that of the Π and Λ counters. White noise is the major practical problem of wideband digital electronics, both in the instrument internal circuits and in the fast processes which we may want to measure. The Ω counter finds a natural application in the measurement of the Parabolic Variance, described in the companion article arXiv:1506.00687 [physics.data-an].
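The idea of estimating a frequency by linear regression on time stamps can be illustrated with a short sketch. It assumes one time stamp per signal cycle with white timing jitter; the function names, the 1 kHz test frequency and the 1 µs jitter are assumptions for the example, and the code does not reproduce the article's SoC implementation or its weighting analysis. An endpoint-only estimate (in the spirit of a Π counter) is included for comparison.

```python
import random

def lr_frequency(timestamps):
    """Frequency from a least-squares line through (cycle index k, time stamp t_k).

    The fitted slope is the mean period; its reciprocal is the frequency.
    """
    n = len(timestamps)
    mk = (n - 1) / 2                      # mean of the cycle indices 0..n-1
    mt = sum(timestamps) / n
    skk = sum((k - mk) ** 2 for k in range(n))
    skt = sum((k - mk) * (t - mt) for k, t in enumerate(timestamps))
    period = skt / skk
    return 1.0 / period

def endpoint_frequency(timestamps):
    """Frequency from the first and last time stamps only."""
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])

# Hypothetical measurement: 1000 cycles of a 1 kHz signal, 1 microsecond white jitter.
rng = random.Random(1)
true_f = 1000.0
stamps = [k / true_f + rng.gauss(0, 1e-6) for k in range(1000)]

f_lr = lr_frequency(stamps)
f_end = endpoint_frequency(stamps)
print(f_lr, f_end)
```

The regression uses every time stamp, which is why it averages down white timing noise more effectively than an estimate built from two stamps.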
Chicken barn climate and hazardous volatile compounds control using simple linear regression and PID
Abdullah, A. H.; Bakar, M. A. A.; Shukor, S. A. A.; Saad, F. S. A.; Kamis, M. S.; Mustafa, M. H.; Khalid, N. S.
2016-07-01
The hazardous volatile compounds from chicken manure in chicken barns are a potential health threat to the farm animals and workers. Ammonia (NH3) and hydrogen sulphide (H2S) produced in a chicken barn are influenced by climate changes. The Electronic Nose (e-nose) is used for sampling the barn's air, temperature and humidity data. Simple Linear Regression is used to identify the correlation between temperature and humidity, humidity and ammonia, and ammonia and hydrogen sulphide. MATLAB Simulink software was used for the sample data analysis using a PID controller. Results show that a PID controller tuned using the Ziegler-Nichols technique can improve the system's control of the climate in the chicken barn.
Multiple Linear Regression Model Based on Neural Network and Its Application in the MBR Simulation
Chunqing Li
2012-01-01
The computer simulation of the membrane bioreactor (MBR) has become the research focus of MBR simulation. To compensate for defects such as long test periods, high cost, and invisible sealed equipment, this paper builds on an in-depth study of the mathematical model of the MBR, combined with neural network theory, and proposes a three-dimensional simulation system for MBR wastewater treatment with fast speed, high efficiency, and good visualization. The system is developed with hybrid programming in VC++ and OpenGL, with a multifactor linear regression model of the factors affecting MBR membrane flux based on a neural network, and applies the modeling methods of integers instead of floats and quadtree recursion. The experiments show that the three-dimensional simulation system, using the above models and methods, offers inspiration and a reference for future research on and application of MBR simulation technology.
Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing
2014-06-16
Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.
Wen-Yuan Chen
2014-06-01
Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.
Tai, Shen-Chuan; Chen, Peng-Yu; Chao, Chian-Yen
2016-07-01
The Consultative Committee for Space Data Systems proposed an efficient image compression standard that can do lossless compression (CCSDS-ICS). CCSDS-ICS is the most widely utilized standard for satellite communications. However, the original CCSDS-ICS is weak in terms of error resilience with even a single incorrect bit possibly causing numerous missing pixels. A restoration algorithm based on the neighborhood similar pixel interpolator is proposed to fill in missing pixels. The linear regression model is used to generate the reference image from other panchromatic or multispectral images. Furthermore, an adaptive search window is utilized to sieve out similar pixels from the pixels in the search region defined in the neighborhood similar pixel interpolator. The experimental results show that the proposed methods are capable of reconstructing missing regions with good visual quality.
Ghazali, Nurul Adyani; Ramli, Nor Azam; Yahaya, Ahmad Shukri; Yusof, Noor Faizah Fitri M D; Sansuddin, Nurulilyana; Al Madhoun, Wesam Ahmed
2010-06-01
Analysis and forecasting of air quality parameters are important topics of atmospheric and environmental research today due to the health impact caused by air pollution. This study examines the transformation of nitrogen dioxide (NO2) into ozone (O3) in an urban environment using time series plots. Data on the concentration of environmental pollutants and meteorological variables were employed to predict the concentration of O3 in the atmosphere. The possibility of employing multiple linear regression models as a tool for prediction of O3 concentration was tested. Results indicated that the presence of NO2 and sunshine influences the concentration of O3 in Malaysia. The influence of the previous hour's ozone on the next hour's concentrations was also demonstrated.
High dimensional linear regression models under long memory dependence and measurement error
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the consistency, n^(1/2−d)-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the
Yoo, Yun Joo; Sun, Lei; Poirier, Julia G.; Paterson, Andrew D.
2016-01-01
By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. PMID:27885705
Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José
2017-03-17
Meta-analysis is very useful to summarize the effect of a treatment or a risk factor for a given disease. Often studies report results based on log-transformed variables in order to satisfy the principal assumptions of a linear regression model. If this is the case for some, but not all studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow including all results in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that significantly differ from the normal distribution. We illustrate the procedure by applying it to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The procedure proposed has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.
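The kind of conversion between absolute and relative changes described here can be illustrated with the standard identities for natural-log-transformed regressions. The abstract does not give the authors' own formulae, so the functions below are only the textbook relationships, with hypothetical function names; they assume natural logarithms and a fitted slope b.

```python
import math

def pct_change_log_outcome(b, delta_x=1.0):
    """Model log(y) = a + b*x: percent change in y for an absolute change delta_x in x."""
    return (math.exp(b * delta_x) - 1.0) * 100.0

def abs_change_log_predictor(b, pct_x):
    """Model y = a + b*log(x): absolute change in y for a pct_x percent increase in x."""
    return b * math.log(1.0 + pct_x / 100.0)

def pct_change_log_log(b, pct_x):
    """Model log(y) = a + b*log(x): percent change in y for a pct_x percent increase in x."""
    return ((1.0 + pct_x / 100.0) ** b - 1.0) * 100.0

# Illustrative slopes (assumed values, not taken from any study).
print(pct_change_log_outcome(0.05))        # about a 5.1% increase in y per unit of x
print(abs_change_log_predictor(2.0, 10))   # change in y when x rises by 10%
print(pct_change_log_log(0.5, 10))         # percent change in y when x rises by 10%
```

Once every study's slope is expressed on the same change criterion, the effects can be pooled in the usual way.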
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression
Abdul Jameel, Abdul Gani
2016-09-14
An improved model for the prediction of the ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and are collectively referred to as D/CN herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. Particular emphasis was placed on the effect of branching (i.e., methyl substitution) on the D/CN, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. MLR modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R2 = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
Rubio, Francisco J.
2016-02-09
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
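The Sobel test cited in this abstract checks whether an indirect (mediated) effect a·b differs from zero, where a is the path from predictor to mediator and b the path from mediator to outcome. A minimal sketch of the usual first-order statistic follows; the coefficient values and standard errors in the example are hypothetical and are not taken from the study.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """First-order Sobel z statistic and two-sided normal p-value for the indirect effect a*b."""
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = (a * b) / se_ab
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area of the standard normal
    return z, p

# Hypothetical path coefficients and standard errors (assumed for illustration).
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 3), p)
```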
Multiple linear and principal component regressions for modelling ecotoxicity bioassay response.
Gomes, Ana I; Pires, José C M; Figueiredo, Sónia A; Boaventura, Rui A R
2014-01-01
The ecotoxicological response of the living organisms in an aquatic system depends on the physical, chemical and bacteriological variables, as well as the interactions between them. An important challenge to scientists is to understand the interaction and behaviour of factors involved in a multidimensional process such as the ecotoxicological response. With this aim, multiple linear regression (MLR) and principal component regression were applied to the ecotoxicity bioassay response of Chlorella vulgaris and Vibrio fischeri in water collected at seven sites of Leça river during five monitoring campaigns (February, May, June, August and September of 2006). The river water characterization included the analysis of 22 physicochemical and 3 microbiological parameters. The model that best fitted the data was MLR, which shows: (i) a negative correlation with dissolved organic carbon, zinc and manganese, and a positive one with turbidity and arsenic, regarding C. vulgaris toxic response; (ii) a negative correlation with conductivity and turbidity and a positive one with phosphorus, hardness, iron, mercury, arsenic and faecal coliforms, concerning V. fischeri toxic response. This integrated assessment may allow the evaluation of the effect of future pollution abatement measures over the water quality of Leça River.
Zare Abyaneh, Hamid
2014-01-01
This paper examined the efficiency of multivariate linear regression (MLR) and artificial neural network (ANN) models in prediction of two major water quality parameters in a wastewater treatment plant. Biochemical oxygen demand (BOD) and chemical oxygen demand (COD), as indirect indicators of organic matter, are representative parameters for sewer water quality. Performance of the ANN models was evaluated using the coefficient of correlation (r), root mean square error (RMSE) and bias values. The values of BOD and COD computed by the ANN method and by regression analysis were in close agreement with their respective measured values. Results showed that the ANN model performed better than the MLR model. Comparative indices of the optimized ANN with input values of temperature (T), pH, total suspended solid (TSS) and total suspended (TS) were RMSE = 25.1 mg/L and r = 0.83 for prediction of BOD, and RMSE = 49.4 mg/L and r = 0.81 for prediction of COD. It was found that the ANN model could be employed successfully in estimating the BOD and COD in the inlet of wastewater biochemical treatment plants. Moreover, sensitivity analysis results showed that pH has more effect on BOD and COD prediction than the other parameters. Also, both implemented models predicted BOD better than COD.
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can influence the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01. The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB), resulting in a total of 96 observations for each variable. The data underwent a logarithmic transformation, particularly the Dependent Variable (y), to satisfy all the assumptions of Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The regression analysis gives a coefficient of determination of 98.7%, meaning that the Independent Variables explain 98.7% of the variation in the Dependent Variable. With 97.6% of the result in the Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
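The matrix form of the normal equations referred to here is β = (XᵀX)⁻¹Xᵀy, where X holds a column of ones plus one column per independent variable. The sketch below solves that system in plain Python on a small made-up data set; the helper names and numbers are assumptions for illustration, and it does not reproduce the study's MATLAB code, its GDP data, or its logarithmic transformation.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(i + 1, n):
            factor = aug[r][i] / aug[i][i]
            for c in range(i, n + 1):
                aug[r][c] -= factor * aug[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def normal_equations(X, y):
    """Least-squares coefficients from (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (first column is the intercept).
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 7]]
y = [9, 8, 19, 18, 32]
beta = normal_equations(X, y)
print(beta)
```

Solving the linear system directly, rather than forming the explicit inverse, is a design choice made here for numerical stability; both give the same coefficients in exact arithmetic.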
Predicting students' success at pre-university studies using linear and logistic regressions
Suliman, Noor Azizah; Abidin, Basir; Manan, Norhafizah Abdul; Razali, Ahmad Mahir
2014-09-01
The study aims to find the most suitable model for predicting students' success in the medical pre-university studies at the Centre for Foundation in Science, Languages and General Studies of Cyberjaya University College of Medical Sciences (CUCMS). The predictors under investigation were achievements in the national high school exit examination, Sijil Pelajaran Malaysia (SPM), namely Biology, Chemistry, Physics, Additional Mathematics, Mathematics, English and Bahasa Malaysia results, as well as gender and high school background factors. The outcomes showed that there is a significant difference in the final CGPA and in the Biology and Mathematics subjects at pre-university by gender, and in the Mathematics subject by high school background. In general, the correlation between academic achievement at high school and at medical pre-university is moderately significant at an α-level of 0.05, except for language subjects. It was also found that logistic regression techniques gave better prediction models than the multiple linear regression technique for this data set. The developed logistic models were able to give probabilities that almost match the real cases. Hence, they could be used to identify successful students who are qualified to enter the CUCMS medical faculty before accepting any students to its foundation program.
2014-09-18
Acronym list (excerpt): CDF, Cumulative Distribution Function; CEMA, Correlation Electro-Magnetic Attack; DPA, Differential Power Analysis; DRA, Dimensionality Reduction Assessment ... CEMA) SCA attacks are examined. A novel method to find time samples with high information leakage of sensitive data using the adjusted coefficient of ... correlation R2a in a linear regression attack is introduced [92]. Three linear regression attacks from current literature [34, 50, 115] and CEMA [19] are ...
Multiple linear regression model for predicting biomass digestibility from structural features.
Zhu, Li; O'Dwyer, Jonathan P; Chang, Vincent S; Granda, Cesar B; Holtzapple, Mark T
2010-07-01
A total of 147 model lignocellulose samples with a broad spectrum of structural features (lignin contents, acetyl contents, and crystallinity indices) were hydrolyzed with a wide range of cellulase loadings during 1-, 6-, and 72-h hydrolysis periods. Carbohydrate conversions at 1, 6, and 72 h were linearly proportional to the logarithm of cellulase loadings from approximately 10% to 90% conversion, indicating that the simplified HCH-1 model is valid for predicting lignocellulose digestibility. The HCH-1 model is a modified Michaelis-Menten model that accounts for the fraction of insoluble substrate available to bind with enzyme. The slopes and intercepts of a simplified HCH-1 model were correlated with structural features using multiple linear regression (MLR) models. The agreement between the measured and predicted 1-, 6-, and 72-h slopes and intercepts of glucan, xylan, and total sugar hydrolyses indicates that lignin content, acetyl content, and cellulose crystallinity are key factors that determine biomass digestibility. The 1-, 6-, and 72-h glucan, xylan, and total sugar conversions predicted from structural features using MLR models and the simplified HCH-1 model fit satisfactorily with the measured data (R2 approximately 1.0). The parameter selection suggests that lignin content and cellulose crystallinity affect digestibility more strongly than acetyl content. Cellulose crystallinity has greater influence during short hydrolysis periods whereas lignin content has more influence during longer hydrolysis periods. Cellulose crystallinity shows more influence on glucan hydrolysis whereas lignin content affects xylan hydrolysis to a greater extent.
Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio
2016-10-01
We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two ¹³C atoms (¹³C₂-glycine) as the labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of ¹³C₂-glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% ¹³C₂-glycine at 1:1, 1:3 and 3:1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd.
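The regression idea in the abstract above can be illustrated with a much simplified sketch: a measured isotopomer pattern is modelled as a linear mixture of two reference patterns, and the two mixing coefficients are obtained from the normal equations of a no-intercept least-squares fit. The function name `two_component_fit` and all abundance values are hypothetical illustrations, not the authors' procedure or data; the published method also estimates the precursor pool enrichment, which this sketch does not attempt.

```python
def two_component_fit(measured, pattern_a, pattern_b):
    """Least-squares fit of measured ~ xa*pattern_a + xb*pattern_b (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    saa = sum(a * a for a in pattern_a)
    sbb = sum(b * b for b in pattern_b)
    sab = sum(a * b for a, b in zip(pattern_a, pattern_b))
    sam = sum(a * m for a, m in zip(pattern_a, measured))
    sbm = sum(b * m for b, m in zip(pattern_b, measured))
    det = saa * sbb - sab * sab  # zero only if the two patterns are collinear
    xa = (sam * sbb - sab * sbm) / det
    xb = (saa * sbm - sab * sam) / det
    return xa, xb


# Hypothetical relative abundances (M+0, M+1, M+2, M+3), for illustration only.
natural = [0.90, 0.08, 0.02, 0.00]
labelled = [0.10, 0.20, 0.60, 0.10]

# Synthetic "measurement": 70% natural-abundance and 30% labelled peptide.
measured = [0.7 * a + 0.3 * b for a, b in zip(natural, labelled)]

x_nat, x_lab = two_component_fit(measured, natural, labelled)
fraction_labelled = x_lab / (x_nat + x_lab)
print(x_nat, x_lab, fraction_labelled)
```

Because the synthetic measurement is an exact mixture, the fit recovers the mixing coefficients up to floating-point error; with real spectra the residuals would carry the measurement noise.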
Describing Growth Pattern of Bali Cows Using Non-linear Regression Models
Mohd. Hafiz A.W
2016-12-01
The objective of this study was to evaluate the best fit non-linear regression model to describe the growth pattern of Bali cows. Estimates of asymptotic mature weight, rate of maturing and constant of integration were derived from Brody, von Bertalanffy, Gompertz and Logistic models which were fitted to cross-sectional data of body weight taken from 74 Bali cows raised in MARDI Research Station Muadzam Shah Pahang. Coefficient of determination (R²) and residual mean squares (MSE) were used to determine the best fit model in describing the growth pattern of Bali cows. The von Bertalanffy model was the best model among the four growth functions evaluated to determine the mature weight of Bali cattle, as shown by the highest R² and lowest MSE values (0.973 and 601.9, respectively), followed by the Gompertz (0.972 and 621.2, respectively), Logistic (0.971 and 648.4, respectively) and Brody (0.932 and 660.5, respectively) models. The correlation between rate of maturing and mature weight was found to be negative, in the range of -0.170 to -0.929 for all models, indicating that animals of heavier mature weight had a lower rate of maturing. The use of a non-linear model could summarize the weight-age relationship into several biologically interpretable parameters, compared to the entire lifespan weight-age data points that are difficult and time consuming to interpret.
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Taneja, Abhishek
2011-01-01
The growing volume of data creates a need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two data mining techniques is compared on the following parameters: mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools like SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given datasets, factor analysis outperform multiple linear re...
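Several of the comparison criteria listed above have simple closed forms. The following minimal sketch computes MSE, RMSE, R-square and adjusted R-square from observed and predicted values; the function name `regression_metrics` and the sample numbers are hypothetical, and MSE is taken here as the residual sum of squares divided by n (some packages divide by the residual degrees of freedom instead).

```python
import math


def regression_metrics(y_true, y_pred, n_predictors):
    """Return MSE, RMSE, R-square and adjusted R-square for a fitted model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual SS
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total SS
    mse = ss_res / n  # assumption: divide by n, not by n - p - 1
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-square penalises additional predictors; needs n > p + 1.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2, "adj_r2": adj_r2}


# Hypothetical observed and predicted values for a one-predictor model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = regression_metrics(observed, predicted, n_predictors=1)
print(metrics)
```

Adjusted R-square is the criterion to prefer when models with different numbers of predictors are compared, since plain R-square never decreases when a predictor is added.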
Measuring treatment and scale bias effects by linear regression in the analysis of OHI-S scores.
Moore, B J
1977-05-01
A linear regression model is presented for estimating unbiased treatment effects from OHI-S scores. An example is given to illustrate an analysis and to compare results of an unbiased regression estimator with those based on a biased simple difference estimator.
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of the acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 using the Cochran-Armitage trend (CAT) test and linear regression analysis, and compared the results. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and the age-specific incidence trend (Cochran-Armitage trend P value < linear regression P value). The statistical power of the CAT test decreased, while the result of linear regression analysis remained the same, when the population size was reduced by 100 times and the AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose the statistical method according to the fitting degree of the data, or to comprehensively analyze the results of the two methods.
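The dependence of the CAT test on population size can be seen from the standard form of its statistic. The sketch below implements that textbook statistic (without a continuity or N/(N-1) correction) and applies it to two hypothetical populations with identical incidence rates but sizes differing by a factor of 100; the counts are invented for illustration and are not the Tianjin data.

```python
import math


def cochran_armitage(cases, totals, scores):
    """Cochran-Armitage trend test: return (z, two-sided p-value)."""
    n_all = sum(totals)
    p_bar = sum(cases) / n_all  # pooled proportion under the null hypothesis
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    var = p_bar * (1.0 - p_bar) * (
        sum(n * s * s for n, s in zip(totals, scores))
        - sum(n * s for n, s in zip(totals, scores)) ** 2 / n_all
    )
    z = t_stat / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


years = [0, 1, 2, 3, 4]  # ordinal scores, one per calendar year

# Hypothetical counts: same yearly rates, population reduced 100-fold.
z_big, p_big = cochran_armitage(
    [1000, 1100, 1200, 1300, 1400], [1_000_000] * 5, years)
z_small, p_small = cochran_armitage(
    [10, 11, 12, 13, 14], [10_000] * 5, years)
print(z_big, p_big)
print(z_small, p_small)
```

With unchanged rates, both the numerator and the variance scale with population size, so z shrinks by the square root of the reduction factor. A regression fitted to the yearly rates alone sees identical inputs in both cases, which is consistent with the behaviour the abstract reports.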
Non-linear regression model for spatial variation in precipitation chemistry for South India
Siva Soumya, B.; Sekhar, M.; Riotte, J.; Braun, Jean-Jacques
Chemical composition of rainwater changes from sea to inland under the influence of several major factors: topographic location of the area, its distance from the sea, and annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO₃, NO₃ and Mg do not change much from coast to inland, while changes in SO₄ and Ca are subject to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power law reduction with distance from the sea. Cl and Na decrease rapidly for the first 100 km distance from the sea, then decrease marginally for the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent (R² ≈ 0.8). Variation in one of the parameters accounted for urbanization. The model was validated using data points from the southern peninsular region of the country. Estimates are found to be within the 99.9% confidence interval. Finally, this relationship between the three parameters (rainfall amount, coastline distance, and concentration in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in south-west India. Chemistry estimated using the model was in good correlation with observed values, with a relative error of ~5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful in estimating the concentrations at different spatio-temporal scales and is especially applicable for the south-west region of India.
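A power-law decrease with distance, C = a·d^b, becomes a straight line after taking logarithms, so its parameters can be estimated with ordinary least squares on (ln d, ln C). The sketch below uses this log-linearisation as a stand-in for the non-linear regression described above; it is not the authors' fitting procedure, and the function name `fit_power_law` and the sample values are hypothetical.

```python
import math


def fit_power_law(distances, concentrations):
    """Fit C = a * d**b by least squares on the log-transformed data."""
    lx = [math.log(d) for d in distances]
    ly = [math.log(c) for c in concentrations]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                       # exponent of the power law
    a = math.exp(mean_y - b * mean_x)   # prefactor (value at d = 1)
    return a, b


# Hypothetical distances from the coast (km) and synthetic concentrations
# generated from a = 50, b = -0.5, so the fit should recover those values.
distance_km = [1.0, 10.0, 50.0, 100.0, 200.0]
chloride = [50.0 * d ** -0.5 for d in distance_km]

a_hat, b_hat = fit_power_law(distance_km, chloride)
print(a_hat, b_hat)
```

Fitting in log space weights relative rather than absolute errors, so with noisy data the estimates can differ from a direct non-linear fit on the original scale.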
Monopole and dipole estimation for multi-frequency sky maps by linear regression
Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.
2017-01-01
We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
Gangopadhyay, S.; Clark, M. P.; Rajagopalan, B.
2002-12-01
The success of short term (days to fortnight) streamflow forecasting largely depends on the skill of surface climate (e.g., precipitation and temperature) forecasts at local scales in the individual river basins. The surface climate forecasts are used to drive the hydrologic models for streamflow forecasting. Typically, Medium Range Forecast (MRF) models provide forecasts of large scale circulation variables (e.g. pressures, wind speed, relative humidity etc.) at different levels in the atmosphere on a regular grid - which are then used to "downscale" to the surface climate at locations within the model grid box. Several statistical and dynamical methods are available for downscaling. This paper compares the utility of two statistical downscaling methodologies: (1) multiple linear regression (MLR) and (2) a nonparametric approach based on k-nearest neighbor (k-NN) bootstrap method, in providing local-scale information of precipitation and temperature at a network of stations in the Upper Colorado River Basin. Downscaling to the stations is based on output of large scale circulation variables (i.e. predictors) from the NCEP Medium Range Forecast (MRF) database. Fourteen-day six hourly forecasts are developed using these two approaches, and their forecast skill evaluated. A stepwise regression is performed at each location to select the predictors for the MLR. The k-NN bootstrap technique resamples historical data based on their "nearness" to the current pattern in the predictor space. Prior to resampling a Principal Component Analysis (PCA) is performed on the predictor set to identify a small subset of predictors. Preliminary results using the MLR technique indicate a significant value in the downscaled MRF output in predicting runoff in the Upper Colorado Basin. It is expected that the k-NN approach will match the skill of the MLR approach at individual stations, and will have the added advantage of preserving the spatial co-variability between stations, capturing
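The k-NN bootstrap step described above, resampling historical data according to their nearness to the current predictor pattern, can be sketched as follows. The abstract does not state how neighbours are weighted; the 1/rank weights used here are an assumption borrowed from a common form of the k-NN bootstrap, and the function name `knn_bootstrap` and all data values are hypothetical.

```python
import math
import random


def knn_bootstrap(history_x, history_y, current_x, k, rng):
    """Resample one historical response from the k nearest predictor patterns."""
    # Euclidean distance from the current pattern to every historical pattern.
    dists = [math.dist(x, current_x) for x in history_x]
    nearest = sorted(range(len(history_x)), key=lambda i: dists[i])[:k]
    # Assumed kernel: the j-th nearest neighbour gets weight proportional to 1/j.
    weights = [1.0 / rank for rank in range(1, k + 1)]
    chosen = rng.choices(nearest, weights=weights, k=1)[0]
    return history_y[chosen]


# Hypothetical predictor patterns (e.g. leading principal components) and the
# station observation recorded on each historical day.
patterns = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.2, 0.1), (9.0, 9.0)]
observed = ["day0", "day1", "day2", "day3", "day4"]

rng = random.Random(42)
draws = [knn_bootstrap(patterns, observed, (0.1, 0.1), 2, rng) for _ in range(200)]
print(sorted(set(draws)))
```

Because an entire historical day is resampled, values at all stations on that day can be carried along together, which is how this kind of scheme preserves co-variability between stations.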
Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression
Ali Asghar Besalatpour
2016-02-01
Introduction: Soil aggregate stability is a key factor in soil resistivity to mechanical stresses, including the impacts of rainfall and surface runoff, and thus to water erosion (Canasveras et al., 2010). Various indicators have been proposed to characterize and quantify soil aggregate stability, for example percentage of water-stable aggregates (WSA), mean weight diameter (MWD), geometric mean diameter (GMD) of aggregates, and water-dispersible clay (WDC) content (Calero et al., 2008). Unfortunately, the experimental methods available to determine these indicators are laborious, time-consuming and difficult to standardize (Canasveras et al., 2010). Therefore, it would be advantageous if aggregate stability could be predicted indirectly from more easily available data (Besalatpour et al., 2014). The main objective of this study is to investigate the potential use of the support vector machines (SVMs) method for estimating soil aggregate stability (as quantified by GMD) as compared to the multiple linear regression approach. Materials and Methods: The study area was part of the Bazoft watershed (31° 37′ to 32° 39′ N and 49° 34′ to 50° 32′ E), which is located in the northern part of the Karun river basin in central Iran. A total of 160 soil samples were collected from the top 5 cm of the soil surface. Some easily available characteristics including topographic, vegetation, and soil properties were used as inputs. Soil organic matter (SOM) content was determined by the Walkley-Black method (Nelson & Sommers, 1986). Particle size distribution in the soil samples (clay, silt, sand, fine sand, and very fine sand) was measured using the procedure described by Gee & Bauder (1986), and calcium carbonate equivalent (CCE) content was determined by the back-titration method (Nelson, 1982). The modified Kemper & Rosenau (1986) method was used to determine wet-aggregate stability (GMD). The topographic attributes of elevation, slope, and aspect were characterized using a 20-m
An Ionospheric Index Model based on Linear Regression and Neural Network Approaches
Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John
2017-04-01
The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the presence of electrons and ions within the region. To optimise the use of long distance HF communications, it is important to understand the drivers of ionospheric storms and accurately predict the propagation conditions, especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for HF communication users. The model will result in a valuable tool to measure the complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of the ionospheric storm-time index is based on data from a single ionosonde station at Grahamstown (33.3°S, 26.5°E), South Africa. Critical frequency of the F2 layer (foF2) measurements for the period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk, validation results for low, medium and high solar activity periods will be discussed to demonstrate the model's performance.
Abolghasem Beheshti
2016-05-01
A quantitative structure–activity relationship (QSAR) study was performed to analyze the antimalarial activities of 68 urea derivatives using multiple linear regression (MLR). QSAR analyses were performed on the available 68 IC50 oral data based on theoretical molecular descriptors. A suitable set of molecular descriptors was calculated to represent the molecular structures of the compounds, such as constitutional, topological, geometrical, electrostatic and quantum-chemical descriptors. The important descriptors were selected with the aid of the genetic algorithm (GA) method. The obtained model was validated using leave-one-out (LOO) cross-validation, an external test set, and a Y-randomization test. The root mean square errors (RMSE) of the training set and the test set for the GA–MLR model were calculated to be 0.314 and 0.486, and the squared correlation coefficients (R²) were 0.801 and 0.803, respectively. Results showed that the predictive ability of the model was satisfactory, and it can be used for designing similar groups of antimalarial compounds.
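Leave-one-out cross-validation, one of the validation steps named above, refits the model once per compound with that compound held out and scores the prediction for it. The sketch below shows the procedure for a single-descriptor linear model to stay self-contained; the study itself uses several GA-selected descriptors, and the function names and data values here are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_rmse(xs, ys):
    """Root mean square error of leave-one-out predictions."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical descriptor values and activities (not data from the study).
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
exact_activity = [1.0, 3.0, 5.0, 7.0, 9.0]    # perfectly linear
noisy_activity = [1.2, 2.7, 5.3, 6.8, 9.1]    # linear trend plus noise

rmse_exact = loo_rmse(descriptor, exact_activity)
rmse_noisy = loo_rmse(descriptor, noisy_activity)
print(rmse_exact, rmse_noisy)
```

The LOO error is normally somewhat larger than the training error because each prediction is made by a model that never saw the held-out sample.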
Boldizsar Nagy
2017-05-01
In the present study the biosorption characteristics of Cd(II) and Zn(II) ions from monocomponent aqueous solutions by the macrofungus Agaricus bisporus were investigated. The initial metal ion concentrations, contact time, initial pH and temperature were the parameters that influenced the biosorption. Maximum removal efficiencies up to 76.10% and 70.09% (318 K) for Cd(II) and Zn(II), respectively, and adsorption capacities up to 3.49 and 2.39 mg/g for Cd(II) and Zn(II), respectively, at the highest concentration, were calculated. The experimental data were analyzed using pseudo-first- and pseudo-second-order kinetic models and various isotherm models in linear and nonlinear (CMA-ES optimization algorithm) regression, and thermodynamic parameters were calculated. The results showed that the biosorption process of both studied metal ions followed pseudo-second-order kinetics, while equilibrium is best described by the Sips isotherm. The changes in morphological structure after heavy metal-biomass interactions were evaluated by SEM analysis. Our results confirmed that the macrofungus A. bisporus could be used as a cost-effective, efficient biosorbent for the removal of Cd(II) and Zn(II) from aqueous synthetic solutions.
QSAR study of prolylcarboxypeptidase inhibitors by genetic algorithm: Multiple linear regressions
Eslam Pourbasheer; Saadat Vahdani; Reza Aalizadeh; Alireza Banaei; Mohammad Reza Ganjali
2015-07-01
A predictive analysis based on quantitative structure-activity relationships (QSAR) was performed on benzimidazolepyrrolidinyl amides as prolylcarboxypeptidase (PrCP) inhibitors. Molecules were represented by chemical descriptors that encode constitutional, topological, geometrical, and electronic structure features. The hierarchical clustering method was used to classify the dataset into training and test subsets. The important descriptors were selected with the aid of the genetic algorithm method. The QSAR model was constructed using multiple linear regression (MLR), and its robustness and predictability were verified by internal and external cross-validation methods. Furthermore, the calculation of the domain of applicability defines the area of reliable predictions. The root mean square errors (RMSE) of the training set and the test set for the GA-MLR model were calculated to be 0.176 and 0.279, and the correlation coefficients (R²) were 0.839 and 0.923, respectively. The proposed model has good stability, robustness and predictability when verified by internal and external validation.
A Fast Incremental Learning for Radial Basis Function Networks Using Local Linear Regression
Ozawa, Seiichi; Okamoto, Keisuke
To avoid catastrophic interference in incremental learning, we have proposed the Resource Allocating Network with Long Term Memory (RAN-LTM). In RAN-LTM, not only new training data but also some memory items stored in long-term memory are trained either by a gradient descent algorithm or by solving a linear regression problem. In the latter approach, radial basis function (RBF) centers are not trained but selected based on output errors when connection weights are updated. The proposed incremental learning algorithm belongs to the latter approach, where the errors not only for the training data but also for several retrieved memory items and pseudo training data are minimized to suppress catastrophic interference. The novelty of the proposed algorithm is that the connection weights to be learned are restricted based on RBF activation in order to improve the efficiency in learning time and memory size. We evaluate the performance of the proposed algorithm in one-dimensional and multi-dimensional function approximation problems in terms of approximation accuracy, learning time, and average memory size. The experimental results demonstrate that the proposed algorithm learns fast and performs well with a smaller memory size than memory-based learning methods.
Forecasting on the total volumes of Malaysia's imports and exports by multiple linear regression
Beh, W. L.; Yong, M. K. Au
2017-04-01
This study gives insight into the importance of the macroeconomic variables that affect the total volumes of Malaysia's imports and exports, using multiple linear regression (MLR) analysis. The study uses quarterly data on the total volumes of Malaysia's imports and exports covering the period 2000-2015. The macroeconomic variables are limited to eleven: the exchange rate of the US Dollar with the Malaysia Ringgit (USD-MYR), the exchange rate of the China Yuan with the Malaysia Ringgit (RMB-MYR), the exchange rate of the European Euro with the Malaysia Ringgit (EUR-MYR), the exchange rate of the Singapore Dollar with the Malaysia Ringgit (SGD-MYR), crude oil prices, gold prices, producer price index (PPI), interest rate, consumer price index (CPI), industrial production index (IPI) and gross domestic product (GDP). This study has applied the Johansen Co-integration test to investigate the relationship among the total volumes of Malaysia's imports and exports. The result shows that crude oil prices, RMB-MYR, EUR-MYR and IPI play important roles in the total volumes of Malaysia's imports. Meanwhile, crude oil prices, USD-MYR and GDP play important roles in the total volumes of Malaysia's exports.
C. Makendran
2015-01-01
Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India is investing a huge quantum of money in road construction every year, poor control over the quality of road construction and its subsequent maintenance is leading to faster road deterioration. In this regard, it is essential that scientific maintenance procedures be evolved on the basis of the performance of low volume flexible pavements. Considering the above, an attempt has been made in this research endeavor to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements exposed to little or no routine maintenance. Distress data were collected from low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for practicing engineers maintaining flexible pavements on low volume roads.
Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa
2015-11-03
Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, LDA makes more assumptions about the data. When categorical and continuous variables are used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable to predict the accuracy of the outcome. The present study compared LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. CE revealed a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect of sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
Correction of TRMM 3B42V7 Based on Linear Regression Models over China
Shaohua Liu
2016-01-01
High temporal-spatial precipitation is necessary for hydrological simulation and water resource management, and remotely sensed precipitation products (RSPPs) play a key role in supporting high temporal-spatial precipitation, especially in sparse gauge regions. TRMM 3B42V7 data (TRMM precipitation) is an essential RSPP outperforming other RSPPs. Yet the utilization of TRMM precipitation is still limited by its inaccuracy and low spatial resolution at regional scale. In this paper, linear regression models (LRMs) have been constructed to correct and downscale the TRMM precipitation based on the gauge precipitation at 2257 stations over China from 1998 to 2013. Then, the corrected TRMM precipitation was validated by gauge precipitation at 839 out of 2257 stations in 2014 at station and grid scales. The results show that both monthly and annual LRMs have obviously improved the accuracy of corrected TRMM precipitation with acceptable error, and the monthly LRM performs slightly better than the annual LRM in Mideastern China. Although the performance of corrected TRMM precipitation from the LRMs has increased in Northwest China and the Tibetan plateau, the error of corrected TRMM precipitation is still significant due to the large deviation between TRMM precipitation and low-density gauge precipitation.
Retrieving Soil Water Contents from Soil Temperature Measurements by Using Linear Regression
Qin XU; Binbin ZHOU
2003-01-01
A simple linear regression method is developed to retrieve daily averaged soil water content from diurnal variations of soil temperature measured at three or more depths. The method is applied to Oklahoma Mesonet soil temperature data collected at the depths of 5, 10, and 30 cm during 11-20 June 1995. The retrieved bulk soil water contents are compared with direct measurements for one pair of nearly collocated Mesonet and ARM stations and also compared with the retrievals of a previous method at 14 enhanced Oklahoma Mesonet stations. The results show that the current method gives more persistent retrievals than the previous method. The method is also applied to Oklahoma Mesonet soil temperature data collected at the depths of 5, 25, 60, and 75 cm from the Norman site during 20-30 July 1998 and 1-31 July 2000. The retrieved soil water contents are verified by collocated soil water content measurements with rms differences smaller than the soil water observation error (0.05 m³ m⁻³). The retrievals are found to be moderately sensitive to random errors (±0.1 K) in the soil temperature observations and errors in the soil type specifications.
Ennourri, Karim; Hassen, Hanen Ben; Zouari, Nabil
2013-01-01
Multiple linear regression analyses were performed to screen for the significant factors simultaneously influencing production of delta-endotoxin, proteolytic activities and spore formation by a Bacillus thuringiensis kurstaki strain. Investigated factors included pH of the medium, available oxygen and inoculum size. It was observed that oxygen availability was the most influential setting for both delta-endotoxin production and spore counts, followed by initial pH of the medium and inoculum size. On the other hand, pH of the medium was found to be the most significant parameter for proteolytic activity, followed by inoculum size and dissolved oxygen. Our results suggested that the first order with two-factor interaction model seemed to be more satisfactory than the simple first order model for optimization of delta-endotoxin overproduction. The coefficients of determination (R²) indicated a better adequacy of the second order models to justify the obtained data. Based on the results, relationships between delta-endotoxin production, proteolytic activities and spore counts were established. Our results can help to balance delta-endotoxin production and its stability.
Claudia Ledesma
2013-08-01
Water quality is traditionally monitored and evaluated based upon field data collected at limited locations. The storage capacity of reservoirs is reduced by deposits of suspended matter. The major factors affecting surface water quality are suspended sediments, chlorophyll and nutrients. Modeling and monitoring the biogeochemical status of reservoirs can be done through data from remote sensors. Since the improvement of sensors’ spatial and spectral resolutions, satellites have been used to monitor the interior areas of bodies of water. Water quality parameters, such as chlorophyll-a concentration and secchi disk depth, were found to have a high correlation with transformed spectral variables derived from bands 1, 2, 3 and 4 of the LANDSAT 5TM satellite. We created models of estimated responses in regard to values of chlorophyll-a. To do so, we used population models of single and multiple linear regression, whose parameters are associated with the reflectance data of bands 2 and 4 of the sub-image of the satellite, as well as the data of chlorophyll-a obtained in 25 selected stations. According to the physico-chemical analyses performed, the characteristics of the water in the reservoir of Rio Tercero correspond to somewhat hard freshwater with calcium bicarbonate. The water was classified as usable as a source of plant treatment, excellent for irrigation because of its low salinity and low residual sodium carbonate content, but unsuitable for animal consumption because of its low salt content.
A Vehicle Traveling Time Prediction Method Based on Grey Theory and Linear Regression Analysis
TU Jun; LI Yan-ming; LIU Cheng-liang
2009-01-01
Vehicle traveling time prediction is an important part of the research of intelligent transportation systems. By now, there have been various kinds of methods for vehicle traveling time prediction, but few consider both aspects of time and space. In this paper, a vehicle traveling time prediction method based on grey theory (GT) and linear regression analysis (LRA) is presented. In the aspect of time, we use the history data sequence of bus speed on a certain road to predict the future bus speed on that road by GT. In the aspect of space, we calculate the traffic affecting factors between various roads by LRA. Using these factors we can predict the vehicle's speed at the lower road if the vehicle's speed at the current road is known. Finally, we use the time factor and space factor as the weighting factors of the two results predicted by GT and LRA respectively to find the final result, thus calculating the vehicle's traveling time. The method also considers such factors as dwell time, thus making the prediction more accurate.
Inference of gene regulatory networks from genetic perturbations with linear regression model.
Zijian Dong
It is an effective strategy to use both genetic perturbation data and gene expression data to infer regulatory networks, with the aim of improving the detection accuracy of the regulatory relationships among genes. Based on both types of data, the genetic regulatory networks can be accurately modeled by Structural Equation Modeling (SEM). In this paper, a linear regression (LR) model is formulated based on the SEM, and a novel iterative scheme using Bayesian inference is proposed to estimate the parameters of the LR model (LRBI). Comparative evaluations of LRBI with two other algorithms, the Adaptive Lasso (AL-Based) and the Sparsity-aware Maximum Likelihood (SML), are also presented. Simulations show that LRBI has significantly better performance than AL-Based, and outperforms SML in terms of power of detection. Applying the LRBI algorithm to experimental data, we inferred the interactions in a network of 35 yeast genes. An open-source program of the LRBI algorithm is freely available upon request.
Zhang, Yiwei; Pan, Wei
2015-03-01
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Fushimi, Akihiro; Kawashima, Hiroto; Kajihara, Hideo
Understanding the contribution of each emission source of air pollutants to ambient concentrations is important to establish effective measures for risk reduction. We have developed a source apportionment method based on an atmospheric dispersion model and multiple linear regression analysis (MLR) in conjunction with ambient concentrations simultaneously measured at points in a grid network. We used a Gaussian plume dispersion model developed by the US Environmental Protection Agency called the Industrial Source Complex model (ISC) in the method. Our method does not require emission amounts or source profiles. The method was applied to the case of benzene in the vicinity of the Keiyo Central Coastal Industrial Complex (KCCIC), one of the biggest industrial complexes in Japan. Benzene concentrations were simultaneously measured from December 2001 to July 2002 at sites in a grid network established in the KCCIC and the surrounding residential area. The method was used to estimate benzene emissions from the factories in the KCCIC and from automobiles along a section of a road, and then the annual average contribution of the KCCIC to the ambient concentrations was estimated based on the estimated emissions. The estimated contributions of the KCCIC were 65% inside the complex, 49% at 0.5-km sites, 35% at 1.5-km sites, 20% at 3.3-km sites, and 9% at a 5.6-km site. The estimated concentrations agreed well with the measured values. The estimated emissions from the factories and the road were slightly larger than those reported in the first Pollutant Release and Transfer Register (PRTR). These results support the reliability of our method. This method can be applied to other chemicals or regions to achieve reasonable source apportionments.
Optimization of end-members used in multiple linear regression geochemical mixing models
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
Cecchini Diego M
2009-11-01
Background: The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF) viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1) CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer); (2) CSF HIV-1 viral load and HIV-1 plasma viral load; and (3) CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods: We prospectively collected and analyzed pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis treated at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006). We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche). Data were processed with Statistix 7.0 software (linear regression analysis). Results: Samples from 34 patients were analyzed. CSF leukocyte count showed a statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01). No correlation was found with the plasma viral load, CSF protein concentration or cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and the CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123). Conclusion: Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
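The kind of correlation analysis reported above can be sketched as follows. This is only an illustration: the paired values are invented and are not the study's data, the helper names `pearson_r` and `fisher_ci` are hypothetical, and the Fisher z-transform interval is one common approximation that may differ from the interval computed by the software the authors used.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via the Fisher z-transform."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical paired measurements (NOT the study's data)
csf_leukocytes = [2, 5, 8, 12, 20, 35, 50, 80]
log_csf_viral_load = [2.1, 3.0, 2.6, 3.4, 3.1, 4.0, 3.6, 4.4]

r = pearson_r(csf_leukocytes, log_csf_viral_load)
lo, hi = fisher_ci(r, len(csf_leukocytes))
print(f"r = {r:.2f}, approximate 95% CI = ({lo:.2f}, {hi:.2f})")
```

A correlation coefficient of this kind measures linear association only; it does not by itself establish the direction of any effect.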
Fisher, Charles K; Mehta, Pankaj
2014-01-01
Human-associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics, and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in time-series models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called "errors-in-variables". Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct "keystone species", Bacteroides fragilis and Bacteroides stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut
A non-linear regression method for CT brain perfusion analysis
Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.
2015-03-01
CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-art methods.
Faridah Hani Mohamed Salleh
2017-01-01
Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted in past studies were not specifically geared towards proving the ability of GRN prediction methods to avoid cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experimental datasets was far smaller than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
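Why a multiple regression can separate an indirect interaction from a direct one can be illustrated with a small simulation. The sketch below is not the authors' procedure: the cascade A → B → C, the noise levels, the sample size and the helper `ols_two_predictors` are all assumptions made for illustration. Regressing C on A alone gives a large slope, while regressing C on A and B together leaves A with a coefficient near zero.

```python
import random

def ols_two_predictors(x1, x2, y):
    """OLS slopes for y ~ b0 + b1*x1 + b2*x2 via the centered 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Simulated cascade (hypothetical): A drives B, B drives C, no direct A -> C edge
rng = random.Random(0)
n = 2000
gene_a = [rng.gauss(0, 1) for _ in range(n)]
gene_b = [a + 0.5 * rng.gauss(0, 1) for a in gene_a]
gene_c = [b + 0.5 * rng.gauss(0, 1) for b in gene_b]

# Marginal slope of C on A alone: large, which would suggest a direct edge
ma, mc = sum(gene_a) / n, sum(gene_c) / n
marginal_slope = (sum((a - ma) * (c - mc) for a, c in zip(gene_a, gene_c))
                  / sum((a - ma) ** 2 for a in gene_a))

# Partial slopes from the multiple regression C ~ A + B
coef_a, coef_b = ols_two_predictors(gene_a, gene_b, gene_c)
print(f"C ~ A alone: slope = {marginal_slope:.2f}")
print(f"C ~ A + B:   A = {coef_a:.2f}, B = {coef_b:.2f}")
```

Note that A and B are strongly correlated here, so the partial coefficients are estimated with less precision than the marginal slope; this is the multicollinearity concern the abstract mentions.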
Evaluating Non-Linear Regression Models in Analysis of Persian Walnut Fruit Growth
I. Karamatlou
2016-02-01
Introduction: Persian walnut (Juglans regia L.) is a large, wind-pollinated, monoecious, dichogamous, long-lived, perennial tree cultivated for its high-quality wood and nuts throughout the temperate regions of the world. Growth model methodology has been widely used in the modeling of plant growth. Mathematical models are important tools for studying plant growth and agricultural systems. These models can be applied for decision-making and designing management procedures in horticulture. Through growth analysis, planning for planting systems, fertilization, pruning operations and harvest time, as well as obtaining economical yield, can be made more accessible. Non-linear models are more difficult to specify and estimate than linear models. This research aimed to study non-linear regression models based on data obtained from fruit weight, length and width. Selecting the best models to explain the inherent fruit growth pattern of Persian walnut was a further goal of this study. Materials and Methods: The experimental material comprised 14 Persian walnut genotypes propagated by seed, collected from a walnut orchard in Golestan province, Minoudasht region, Iran, at latitude 37°04'N; longitude 55°32'E; altitude 1060 m, in a silt loam soil type. These genotypes were selected as a representative sampling of the many walnut genotypes available throughout Northeastern Iran. The age range of the walnut trees was 30 to 50 years. The annual mean temperature at the location is 16.3°C, with annual mean rainfall of 690 mm. The data used here are the averages of walnut fresh fruit measurements, recorded in gram/millimeter/day in 2011. According to the data distribution pattern, several equations have been proposed to describe sigmoidal growth patterns. Here, we used double-sigmoid and logistic–monomolecular models to evaluate fruit growth based on fruit weight and 4 different regression models including Richards, Gompertz, Logistic and Exponential growth for evaluation
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D
2015-05-01
Transient sensory, motor or cognitive events not only elicit phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist of stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical
D'Souza, Sonia; Rasmussen, John; Schwirtz, Ansgar
2012-01-01
and valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability in the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent...
Optimization of operational flow rates of an oil pipeline on the basis of a linear regression model
Smati, A.; Djelloul, A. (Institut National des Hydrocarbures et de la Chimie, Boumerdes (Algeria))
Many uncontrollable factors cause random fluctuations in the properties of an oil pipeline. After a brief statistical analysis of the leading parameters used to identify the phenomenon, this article describes an optimization algorithm for minimizing energy consumption in pumping stations. The proposed algorithm is based on a linear regression model. Several very flexible approaches to multivariable identification are examined.
Accumulated feedlot manure negatively affects the environment. The objective was to test the validity of using EMI mapping methods combined with predictive-based sampling and ordinary linear regression for measuring spatially variable manure accumulation. A Dualem-1S EMI meter also recording GPS c...
Mohammad Ali HORMOZI
2015-06-01
We analyzed the effect of chemical fertilizer, seed, biocide, farm machinery and labor hours on the production of paddy (paddy rice) in Khuzestan province in the southwestern part of Iran. We test two methods (linear regression and a neural network). We conclude that the results obtained by a neural network with no hidden layer and by linear regression are close to each other. We maintain that for a data set of this type regression analysis yields more reliable results than a neural network. The results suggest that machinery has a very clear positive effect on yield, while fertilizer and labor do not affect it. One can say that increasing the amount of some "useful input" does not necessarily increase paddy production.
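The observation that a network with no hidden layer behaves like linear regression can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' setup: it uses a single invented input, an identity activation, a mean-squared-error loss and plain gradient descent, and compares the trained weights with the closed-form least-squares fit.

```python
import random

# Hypothetical data: one scaled input (e.g. machinery use) and a yield-like response
rng = random.Random(1)
x = [i / 10 for i in range(50)]
y = [2.0 + 0.8 * xi + rng.gauss(0, 0.3) for xi in x]
n = len(x)

# Closed-form ordinary least squares for y ~ intercept + slope * x
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

# "Network" with no hidden layer and identity activation: y_hat = w * x + b,
# trained by gradient descent on the mean squared error
w, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(5000):
    errors = [w * xi + b - yi for xi, yi in zip(x, y)]
    grad_w = 2.0 * sum(e * xi for e, xi in zip(errors, x)) / n
    grad_b = 2.0 * sum(errors) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"OLS:     slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"Network: w     = {w:.4f}, b         = {b:.4f}")
```

Because the squared-error loss of this model is convex with a unique minimum, gradient descent converges to the same coefficients as least squares; differences between the two methods in practice would come from nonlinear activations, early stopping or regularization.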
Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
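The two calibration approaches compared above can be written out for a straight-line instrument response. This is a minimal sketch with invented calibration data; `fit_line`, `classical_estimate` and `reverse_estimate` are hypothetical names, and no claim is made about which estimator the paper favors.

```python
def fit_line(x, y):
    """Simple least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical calibration experiment: known standards and instrument readings
standards = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
readings = [0.12, 1.05, 2.21, 2.95, 4.10, 5.02]

# Classical approach: regress readings on standards, then invert the fitted line
a0, a1 = fit_line(standards, readings)

def classical_estimate(reading):
    return (reading - a0) / a1

# Reverse approach: regress standards on readings and predict directly
b0, b1 = fit_line(readings, standards)

def reverse_estimate(reading):
    return b0 + b1 * reading

new_reading = 3.5
est_classical = classical_estimate(new_reading)
est_reverse = reverse_estimate(new_reading)
print(f"classical: {est_classical:.4f}, reverse: {est_reverse:.4f}")
```

With a tight calibration line the two estimates nearly coincide; they diverge as measurement noise grows, because the reverse fit treats the noisy readings as an error-free regressor.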
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study.
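For readers unfamiliar with the pool-adjacent-violators algorithm, a minimal sketch of the standard complete-data version is given below. It does not implement the empirical likelihood method proposed in the paper; the function name `pava` and the example values are illustrative assumptions.

```python
def pava(y, w=None):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit to y."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool the last two blocks while they violate the monotone ordering
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Responses ordered by increasing covariate value (hypothetical)
print(pava([1, 3, 2, 4, 3, 5]))  # -> [1, 2.5, 2.5, 3.5, 3.5, 5]
```

Each violating pair is replaced by its weighted mean, so the fitted sequence is the closest nondecreasing sequence to the data in the weighted least-squares sense.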
Linear regressive model structures for estimation and prediction of compartmental diffusive systems
Vries, D.; Keesman, K.J.; Zwart, H.
2006-01-01
In input-output relations of (compartmental) diffusive systems, physical parameters appear non-linearly, resulting in the use of (constrained) non-linear parameter estimation techniques with its short-comings regarding global optimality and computational effort. Given a LTI system in state
J. Alm
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test whether the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and whether the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model, which is considered to be caused by violations of the underlying model assumptions
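The bias that arises when a straight line is fitted to a curved concentration record can be illustrated numerically. The sketch below assumes a simple saturating exponential c(t) = c_inf + (c0 - c_inf) exp(-k t) with invented parameter values; this is a stand-in for, not a reproduction of, the model developed in the paper. The flux at closure time is proportional to the initial slope dc/dt at t = 0, which the linear fit underestimates.

```python
import math

def ols_slope(t, c):
    """Least-squares slope of c against t."""
    n = len(t)
    mt, mc = sum(t) / n, sum(c) / n
    return (sum((a - mt) * (b - mc) for a, b in zip(t, c))
            / sum((a - mt) ** 2 for a in t))

# Assumed noise-free headspace concentration curve (hypothetical parameter values)
c0, c_inf, k = 380.0, 600.0, 0.005   # ppm, ppm, 1/s
t = list(range(0, 121, 5))           # 2-minute closure sampled every 5 s
c = [c_inf + (c0 - c_inf) * math.exp(-k * ti) for ti in t]

true_initial_slope = k * (c_inf - c0)  # dc/dt at t = 0 for the assumed curve
linear_slope = ols_slope(t, c)         # slope a linear regression would report
ratio = linear_slope / true_initial_slope

print(f"initial slope: {true_initial_slope:.3f} ppm/s")
print(f"linear-fit slope: {linear_slope:.3f} ppm/s (ratio {ratio:.2f})")
```

Even over this short closure time the linear fit averages over the flattening part of the curve, so the reported slope falls well below the slope at closure.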
Linear regression models of floor surface parameters on friction between Neolite and quarry tiles.
Chang, Wen-Ruey; Matz, Simon; Grönqvist, Raoul; Hirvonen, Mikko
2010-01-01
For slips and falls, friction is widely used as an indicator of surface slipperiness. Surface parameters, including surface roughness and waviness, were shown to influence friction by correlating individual surface parameters with the measured friction. A collective input from multiple surface parameters as a predictor of friction, however, could provide a broader perspective on the contributions from all the surface parameters evaluated. The objective of this study was to develop regression models between the surface parameters and measured friction. The dynamic friction was measured using three different mixtures of glycerol and water as contaminants. Various surface roughness and waviness parameters were measured using three different cut-off lengths. The regression models indicate that the selected surface parameters can predict the measured friction coefficient reliably in most of the glycerol concentrations and cut-off lengths evaluated. The results of the regression models were, in general, consistent with those obtained from the correlation between individual surface parameters and the measured friction in eight out of nine conditions evaluated in this experiment. A hierarchical regression model was further developed to evaluate the cumulative contributions of the surface parameters in the final iteration by adding these parameters to the regression model one at a time from the easiest to measure to the most difficult to measure and evaluating their impacts on the adjusted R² values. For practical purposes, the surface parameter Ra alone would account for the majority of the measured friction even if it did not reach a statistically significant level in some of the regression models.
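The role of the adjusted R² in a hierarchical (one-predictor-at-a-time) regression can be shown with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The R² values and sample size below are invented for illustration and are not results from the study.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

n_obs = 30  # hypothetical number of measured surfaces
# Hypothetical R-squared after adding each predictor in turn
steps = [("first parameter only", 0.800),
         ("+ second parameter", 0.810),
         ("+ third parameter", 0.812)]

adjusted = []
for p, (label, r2) in enumerate(steps, start=1):
    adj = adjusted_r2(r2, n_obs, p)
    adjusted.append(adj)
    print(f"{label}: R2 = {r2:.3f}, adjusted R2 = {adj:.4f}")
```

In this made-up sequence the second predictor raises the adjusted value slightly while the third lowers it, which is the kind of signal used to judge whether a harder-to-measure parameter is worth adding.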
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Modeling protein tandem mass spectrometry data with an extended linear regression strategy.
Liu, Han; Bonner, Anthony J; Emili, Andrew
2004-01-01
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. The intensity patterns present in mass spectra are useful information for the identification of peptides and proteins. However, widely used algorithms cannot predict the peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large-scale protein expression profile modeling. By proving an important technical result, namely that the regression coefficient vector is just the eigenvector corresponding to the least eigenvalue of a space-transformed version of the original data, this extended regression problem can be reduced to a singular value decomposition (SVD) problem, thus gaining robustness and efficiency. To evaluate the performance of our model, from 60,960 spectra we chose 2,859 with high-confidence, non-redundant matches as training data. Based on this specific problem, we derived some measurements of goodness of fit to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes.
NetRaVE: constructing dependency networks using sparse linear regression
Phatak, A.; Kiiveri, H.; Clemmensen, Line Katrine Harder;
2010-01-01
NetRaVE is a small suite of R functions for generating dependency networks using sparse regression methods. Such networks provide an alternative to interpreting 'top n lists' of genes arising out of an analysis of microarray data, and they provide a means of organizing and visualizing the resulting...
Weighted linear regression using D2H and D2 as the independent variables
Hans T. Schreuder; Michael S. Williams
1998-01-01
Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared ( D2H = diameter squared times height or D...
NURWAHA Deogratias; WANG Xin-hou
2008-01-01
This paper presents a comparison of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy inference system (ANFIS) and multiple linear regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models, and the count-strength-product (CSP) is the target. The predictive performances of the two models are estimated and compared. We found that the ANFIS has better predictive power than the multiple linear regression model. The impact of each fiber property is also illustrated.
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID: 27806075
Erdi Tosun
2016-12-01
This study deals with the use of linear regression (LR) and artificial neural network (ANN) modeling to predict the engine performance (torque) and exhaust emissions (carbon monoxide, CO, and oxides of nitrogen, NOx) of a naturally aspirated diesel engine fueled with standard diesel, peanut biodiesel (PME), and biodiesel-alcohol (EME, MME, PME) mixtures. Experimental work was conducted to obtain data to train and test the models. The backpropagation algorithm was used as the learning algorithm of the ANN in multilayered feedforward networks. Engine speed (rpm) and the fuel properties cetane number (CN), lower heating value (LHV), and density (ρ) were used as input parameters to predict the performance and emission parameters. While the linear regression approach proved inadequate for predicting the desired parameters, the ANN gave more accurate results.
Exchange Rates and Monetary Fundamentals: What Do We Learn from Linear and Nonlinear Regressions?
Guangfeng Zhang
2014-01-01
This paper revisits the association between exchange rates and monetary fundamentals with the focus on both linear and nonlinear approaches. With the monthly data of Euro/US dollar and Japanese yen/US dollar, our linear analysis demonstrates the monetary model is a long-run description of exchange rate movements, and our nonlinear modelling suggests the error correction model describes the short-run adjustment of deviations of exchange rates, and monetary fundamentals are capable of explaining exchange rate dynamics under an unrestricted framework.
Telleria Carlos M
2005-08-01
Background: In pregnant rats, structural luteal regression takes place after parturition and is associated with cell death by apoptosis. We have recently shown that the hormonal environment is responsible for the fate of the corpora lutea (CL). Changing the levels of circulating hormones in post-partum rats, either by injecting androgen or progesterone, or by allowing dams to suckle, was coupled with a delay in the onset of apoptosis in the CL. The objectives of the present investigation were: (i) to examine the effect of exogenous estradiol on apoptosis of the rat CL during post-partum luteal regression; and (ii) to evaluate the post-partum luteal expression of the estrogen receptor (ER) genes. Methods: In a first experiment, rats after parturition were separated from their pups and injected daily with vehicle or estradiol benzoate for 4 days. On day 4 post-partum, animals were sacrificed, blood samples were taken to determine serum concentrations of hormones, and the ovaries were isolated to study apoptosis in situ. In a second experiment, non-lactating rats after parturition received vehicle, estradiol benzoate or estradiol benzoate plus bromoergocryptine for 4 days, and their CL were isolated and used to study apoptosis ex vivo. In a third experiment, we obtained CL from rats on day 15 of pregnancy and from non-lactating rats on day 4 post-partum, and studied the expression of the messenger RNAs (mRNAs) encoding the ERalpha and ERbeta genes. Results: Exogenous administration of estradiol benzoate induced an increase in the number of apoptotic cells within the CL on day 4 post-partum when compared with animals receiving vehicle alone. Animals treated with the estrogen had higher serum prolactin and progesterone concentrations, with no changes in serum androstenedione. Administration of bromoergocryptine blocked the increase in serum prolactin and progesterone concentrations, and DNA fragmentation induced by the estrogen treatment. ERalpha and
Ardeshir Khazaei
2017-09-01
The quantitative structure–activity relationship (QSAR) analyses were carried out on a series of novel sulfonamide derivatives as procollagen C-proteinase (PCP) inhibitors for the treatment of fibrotic conditions. The sphere exclusion method was used to divide the data set into training and test sets at different radii ranging from 0.9 to 0.5. Multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) were used as the regression methods, and stepwise (SW), genetic algorithm (GA), and simulated annealing (SA) were used as the feature selection methods. Three of the statistically most significant models were chosen from the results for discussion. Model 1 was obtained by the MLR–SA methodology at a radius of 1.6. This model, with a coefficient of determination (r2) of 0.71, can predict the real inhibitor activities well. The cross-validated q2 of this model, 0.64, indicates good internal predictive power. External validation (pred_r2 = 0.85) showed that the model can predict the activity of novel PCP inhibitors well. Model 2, developed using PLS–SW, explains 72% (r2 = 0.72) of the total variance in the training set and has internal (q2) and external (pred_r2) predictive abilities of ∼67% and ∼71%, respectively. The last model, developed by PCR–SA, has a coefficient of determination (r2) of 0.68, which explains 68% of the variance in the observed activity values. In this case the internal and external validation values are 0.61 and 0.75, respectively. Alignment Independent (AI) descriptors and the atomic valence connectivity index (chiv) have the greatest effect on the biological activities. The developed models can be useful in designing and synthesizing effective, optimized novel PCP inhibitors for the treatment of fibrotic conditions.
Creating a non-linear total sediment load formula using polynomial best subset regression model
Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali
2016-08-01
The aim of this study is to derive a new total sediment load formula that is more accurate and has fewer application constraints than the well-known formulae in the literature. The five best-known stream power concept sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach called polynomial best subset regression (PBSR) analysis. The aim of the PBSR analysis is to fit and test all possible combinations of the input variables and select the best subset. All input variables, together with their second and third powers, are included in the regression to test the possible relation between the explanatory variables and the dependent variable. The best subset is selected with a multistep approach that depends on significance values and on the degree of multicollinearity of the inputs. The new formula is compared to the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab datasets within this holdout data. Different goodness-of-fit statistics are used, as they represent different perspectives on model accuracy. After the detailed comparisons, we identified the most accurate equation that is also applicable to both flume and river data. In particular, on the field dataset the prediction performance of the proposed formula exceeded that of the benchmark formulations.
Leon, Andrew C; Heo, Moonseong
2009-01-15
Mixed-effects linear regression models have become more widely used for analysis of repeatedly measured outcomes in clinical trials over the past decade. There are formulae and tables for estimating sample sizes required to detect the main effects of treatment and the treatment by time interactions for those models. A formula is proposed to estimate the sample size required to detect an interaction between two binary variables in a factorial design with repeated measures of a continuous outcome. The formula is based, in part, on the fact that the variance of an interaction is fourfold that of the main effect. A simulation study examines the statistical power associated with the resulting sample sizes in a mixed-effects linear regression model with a random intercept. The simulation varies the magnitude (Δ) of the standardized main effects and interactions, the intraclass correlation coefficient (ρ), and the number (k) of repeated measures within-subject. The results of the simulation study verify that the sample size required to detect a 2 × 2 interaction in a mixed-effects linear regression model is fourfold that required to detect a main effect of the same magnitude.
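The fourfold variance relationship stated in the abstract above can be checked with a small sketch. It rests on simplifying assumptions that are not taken from the paper: a balanced 2 × 2 design, independent cell means with a common variance, and no within-subject correlation. The function name contrast_variance and the cell size of 25 are hypothetical.

```python
import numpy as np

def contrast_variance(contrast, sigma2=1.0, n_per_cell=1):
    """Variance of a linear contrast of independent cell means, each with variance sigma2/n."""
    c = np.asarray(contrast, dtype=float)
    return sigma2 / n_per_cell * np.sum(c ** 2)

# Cell order: (A0,B0), (A0,B1), (A1,B0), (A1,B1)
main_effect_A = [-0.5, -0.5, 0.5, 0.5]    # difference of the two marginal means of A
interaction_AB = [1.0, -1.0, -1.0, 1.0]   # difference of differences

var_main = contrast_variance(main_effect_A, sigma2=1.0, n_per_cell=25)
var_inter = contrast_variance(interaction_AB, sigma2=1.0, n_per_cell=25)
ratio = var_inter / var_main
print(var_main, var_inter, ratio)
```

Because the required sample size scales with the variance of the estimated effect, a fourfold variance translates into a fourfold sample size for an effect of the same magnitude, which is the conclusion the simulation study is reported to verify.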
FLORIN MARIUS PAVELESCU
2010-12-01
In econometric models, linear regressions with three explanatory variables are widely used. Examples include the Cobb-Douglas production function with three inputs (capital, labour and disembodied technical change), the Kmenta function used to approximate the parameters of the CES production function, and error-correction models. In multiple linear regressions, the estimated parameter values and some statistical tests are influenced by collinearity between the explanatory variables. In fact, collinearity acts as noise that distorts the signal (the proper parameter values). This influence is captured by the values of the coefficients of alignment to collinearity hazard. These coefficients have some similarities with the signal-to-noise ratio. Consequently, they may be used when determining the type of collinearity. For these reasons, the main purpose of this paper is to identify all the modeling factors and quantify their impact on the values of the above-mentioned indicator in the context of linear regression with three explanatory variables. JEL Classification: C13, C20, C51, C52. Keywords: types of collinearity, coefficient of mediated correlation, rank of explanatory variable, order of attractor of collinearity, mediated collinearity, anticollinearity.
Ristya Widi Endah Yani
2008-12-01
Background: The bootstrap is a computer simulation-based method for assessing the accuracy of estimates of inferential statistical parameters. Purpose: This article describes a study using secondary data (n = 30) that aimed to examine the bootstrap method as an estimator in linear regression, based on the computer programs MINITAB 13, SPSS 13, and MacroMINITAB. Methods: The bootstrap regression method determines $\hat{\beta}$ and $\hat{Y}$ from OLS (ordinary least squares), computes the residuals $\varepsilon_i = Y_i - \hat{Y}_i$, fixes the number of bootstrap repetitions (B), draws a sample of size n with replacement from the $\varepsilon_i$ to obtain $\varepsilon_{(i)}$, forms the bootstrap responses $Y^{*}_i = \hat{Y}_i + \varepsilon_{(i)}$, and computes $\hat{\beta}$ from the bootstrap sample. If the number of repetitions is less than B, the procedure returns to the step of drawing n residuals with replacement. Otherwise, the bootstrap estimate is the average of the $\hat{\beta}$ values over the B bootstrap samples. Results: The bootstrap results were similar to those of the linear regression equation obtained with the OLS method (α = 5%). The resulting regression equation was caries = 1.90 + 2.02 (OHI-S), indicating that every one-unit increase in OHI-S results in a caries increase of 2.02 units. Conclusion: The analysis was conducted with as many as B = 10,500 repetitions and 10 iterations.
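The residual bootstrap procedure described in the abstract above can be sketched in a few lines. This is an illustration only, not the study's MINITAB or SPSS macros: the data are simulated (n = 30, with made-up coefficients 1.0 and 2.0), and the function names ols and residual_bootstrap, the seed values, and B = 2000 are assumptions chosen for the example.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def residual_bootstrap(X, y, B=2000, seed=0):
    """Resample OLS residuals B times; return OLS fit, bootstrap mean and bootstrap SE."""
    rng = np.random.default_rng(seed)
    beta_hat = ols(X, y)
    fitted = X @ beta_hat
    resid = y - fitted
    n = len(y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        e_star = rng.choice(resid, size=n, replace=True)  # n residuals, with replacement
        y_star = fitted + e_star                          # bootstrap responses
        draws[b] = ols(X, y_star)                         # refit on the bootstrap sample
    return beta_hat, draws.mean(axis=0), draws.std(axis=0, ddof=1)

# Simulated data standing in for the study's secondary data (values are made up).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

beta_hat, beta_boot, se_boot = residual_bootstrap(X, y)
print(beta_hat, beta_boot, se_boot)
```

The bootstrap mean stays close to the OLS estimate, in line with the abstract's finding that the two approaches gave similar equations; the spread of the bootstrap draws is what supplies the standard errors.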
Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.
Vidal, Sherry
Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…
Bertrand-Krajewski, J L
2004-01-01
In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers by means of sensor- and site-specific empirical equations established by linear regression of on-site turbidity values T against TSS concentrations C measured in corresponding samples. As the ordinary least-squares method cannot account for measurement uncertainties in both the T and C variables, an appropriate regression method is used to solve this difficulty and to evaluate correctly the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of the variances and covariance of the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. To show how the established regression could be used, an independent 24-hour dry weather turbidity data series recorded at a 2 min time interval is transformed into estimated TSS concentrations and compared to TSS concentrations measured in samples. The comparison appears satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.
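The abstract above does not name the regression method it uses for errors in both variables. As a stand-in, the sketch below uses Deming regression, a textbook errors-in-variables fit, which may differ from the paper's method. The turbidity and TSS values are hypothetical and noise-free, and delta, the assumed ratio of the error variance in C to the error variance in T, is an input the user must supply.

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x; delta = var(errors in y) / var(errors in x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))   # must be nonzero for the slope to be defined
    slope = (syy - delta * sxx
             + np.sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = ym - slope * xm
    return slope, intercept

turbidity = np.array([50.0, 80.0, 120.0, 160.0, 210.0])  # hypothetical T values
tss = 2.0 * turbidity + 1.0                               # hypothetical, noise-free C values

slope, intercept = deming_fit(turbidity, tss, delta=1.0)
print(slope, intercept)
```

On noise-free data the fit recovers the generating line exactly; with real calibration data the choice of delta matters, and the parameter variances and covariance that the abstract mentions would need a separate calculation.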
Marcos Antonio Tavares Lira
2011-09-01
The work presented here addresses the estimation of wind resources on the coast of Ceará using linear regression theory. Its main objective is to estimate mean wind speed at height from data observed at 10 metres. Two regions are investigated, Paracuru and Camocim, both located in the State of Ceará, and the same procedure is adopted for each. First, the region is characterized by the daily and monthly mean wind speed profiles obtained from the raw data of the region's Data Collection Platform (PCD) and Anemometric Tower (TA). Data characterizing the predominant wind direction are also used. Using the logarithmic wind profile equation, mean wind speeds at heights of 20, 40 and 60 metres are estimated from the surface data observed at 10 metres, and the correlation coefficients between these estimates and the values observed at the region's TA are then calculated. Next, the linear regression model is used to estimate new values at height. This is done first for a calibration period of the model and then for its validation period. In both periods the linear regression model performed well, as shown by the high index of agreement between the estimated and observed data series and their correlation coefficients, and by the low errors between these series.
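The extrapolation step with the logarithmic wind profile can be sketched as follows. The sketch assumes neutral atmospheric stability and a surface roughness length z0 of 0.03 m; both are illustrative assumptions, since the abstract reports neither, and the 6.0 m/s reference speed is made up. The function name log_profile_speed is hypothetical.

```python
import math

def log_profile_speed(v_ref, z, z_ref=10.0, z0=0.03):
    """Extrapolate mean wind speed from height z_ref to height z (metres) with the log law."""
    return v_ref * math.log(z / z0) / math.log(z_ref / z0)

v10 = 6.0  # hypothetical mean speed observed at 10 m, in m/s
estimates = {z: log_profile_speed(v10, z) for z in (20.0, 40.0, 60.0)}

for z, v in estimates.items():
    print(f"{z:.0f} m: {v:.2f} m/s")
```

In the workflow the abstract describes, estimates of this kind are then compared with tower observations, and a linear regression is fitted to correct them over a calibration period.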
Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M
2014-01-01
This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which the EMD-LLQ, EMD, and Holt-Winters methods are compared. The proposed EMD-LLQ model is determined to be superior to the EMD and Holt-Winters methods in predicting the stock closing prices.
Linear Quantile Mixed Models: The lqmm Package for Laplace Quantile Regression
Marco Geraci
2014-05-01
Inference in quantile analysis has received considerable attention in recent years. Linear quantile mixed models (Geraci and Bottai 2014) represent a flexible statistical tool to analyze data from sampling designs such as multilevel, spatial, panel or longitudinal, which induce some form of clustering. In this paper, I will show how to estimate conditional quantile functions with random effects using the R package lqmm. Modeling, estimation and inference are discussed in detail using a real data example. A thorough description of the optimization algorithms is also provided.
Abobaker M. Jaber
2014-01-01
Empirical mode decomposition (EMD) is particularly useful in analyzing nonstationary and nonlinear time series. However, only partial data within boundaries are available because of the bounded support of the underlying time series. Consequently, applying EMD to finite time series data produces large biases at the edges and creates artificial wiggles. This study introduces a new two-stage method to automatically decrease the boundary effects present in EMD. In the first stage, local polynomial quantile regression (LLQ) is applied to provide an efficient description of the corrupted and noisy data. The remaining series is assumed to be hidden in the residuals. Hence, EMD is applied to the residuals in the second stage. The final estimate is the sum of the fitted estimates from LLQ and EMD. A simulation was conducted to assess the practical performance of the proposed method. Results show that the proposed method is superior to classical EMD.
Gurudeo Anand Tularam
2012-01-01
House price prediction continues to be important for government agencies, insurance companies, and the real estate industry. This study investigates the performance of house sales price models based on linear and nonlinear approaches to study the effects of selected variables. A linear stepwise Multivariate Regression (MR) model and nonlinear Neural Network (NN) and Adaptive Neuro-Fuzzy (ANFIS) models are developed and compared. GIS methods are used to integrate the data for the study area (Bathurst, Australia). While the nonlinear methods were expected to be much better, the analysis shows that NN and ANFIS are only slightly better than MR, which raises questions about the high R2 values often found in the literature. While structural data and macro-finance variables may contribute to higher R2, performance comparison was the goal of this study, and the Australian data lacked structural elements. The results show that the MR model could be improved. Also, land value and location explained at best about 45% of the sale price variation. The analysis of price forecasts (within the 10% range of the actual prediction on average) revealed that the nonlinear models performed slightly better (29%) than the linear model (26%). The inclusion of social data improves the MR prediction in most of the suburbs. The suburb analysis shows the importance of socially based locations and also the variance due to the dominant types of housing. In terms of overall R2, the NN model (0.45) performed only slightly better than ANFIS (0.39) and better than MR (0.37), but the linear MRsoc model performed better (0.42). At the suburb level, the NN model (7/15) performed better than ANFIS (3/15), but the linear MR (5/15) was better than ANFIS. The improved linear MR (6/15) performed nearly as well as the nonlinear NN. Linear methods appear to be just as precise as the more time-consuming nonlinear methods in most cases in accounting for the differences and variation. However, when a much more in depth analysis is
Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara
2017-01-01
In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most common models for analyzing such complex longitudinal data are based on mean regression, which fails to provide efficient estimates in the presence of outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying the benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish Bayesian joint models that account for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations
Belenkiy, Ari
2008-01-01
In 1700, Newton, in designing a new universal calendar contained in the manuscripts known as Yahuda MS 24 from the Jewish National and University Library at Jerusalem and analyzed in our recent article in Notes & Records Royal Society (59 (3), Sept 2005, pp. 223-54), attempted to compute the length of the tropical year using the ten ancient equinox observations reported by the famous Greek astronomer Hipparchus of Rhodes. Though Newton had a very thin sample of data, he obtained a tropical year only a few seconds longer than the correct length. The reason lies in Newton's application of a technique similar to modern regression analysis. In fact, he wrote down the first of the two so-called "normal equations" known from the Ordinary Least Squares method. Newton also had a vague understanding of qualitative variables. This paper concludes by discussing open historico-astronomical problems related to the inclination of the Earth's axis of rotation. In particular, ignorance about the long-range variation...
Lo, Ching F.
1999-01-01
The integration of Radial Basis Function Networks and Back Propagation Neural Networks with Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modern Design of Experiments. The integrated method is capable of estimating precision intervals, including confidence and prediction intervals. The power of the method has been demonstrated by applying it to a set of wind tunnel test data to construct a response surface and estimate precision intervals.
Meric de Bellefon, G.; van Duysen, J. C.; Sridharan, K.
2017-08-01
The stacking fault energy (SFE) plays an important role in deformation behavior and radiation damage of FCC metals and alloys such as austenitic stainless steels. In the present communication, existing expressions to calculate SFE in those steels from chemical composition are reviewed and an improved multivariate linear regression with random intercepts is used to analyze a new database of 144 SFE measurements collected from 30 literature references. It is shown that the use of random intercepts can account for experimental biases in these literature references. A new expression to predict SFE from austenitic stainless steel compositions is proposed.
None listed
2008-01-01
A class of estimators of the mean survival time with interval-censored data is studied by the unbiased transformation method. The estimators are constructed from the observations to ensure unbiasedness, in the sense that the estimators in a certain class have the same expectation as the mean survival time. The estimators have good properties such as strong consistency (with rate $O(n^{-1/2}(\log\log n)^{1/2})$) and asymptotic normality. The application to linear regression is considered, and simulation results are given.
Nadson S. Timbó
2016-01-01
The stock exchange is an important instrument for economic growth, as it gives investors an opportunity to acquire equity and, at the same time, provides resources for organizations' expansion. On the other hand, a major concern about entering this market relates to the dynamics with which deals are made, since share prices move in a smart and oscillatory way. In this context, several researchers are studying techniques to predict the stock exchange, maximize profits and reduce risks. This study proposes a linear regression model for stock exchange prediction which, combined with financial indicators, supports decision-making by investors.
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies, and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five of the most widely used statistical analysis tools are discussed and illustrated: correlation, ANOVA, linear regression, factor analysis, and linear discriminant analysis. It is shown that measurement errors can greatly attenuate correlations between variables, reduce the statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanatory contributions of the most important factors in factor analysis, and reduce the significance of the discriminant function and the discrimination abilities of individual variables in discriminant analysis. The discussion is restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have measurement error issues, but they are beyond the scope of this paper. As interest in the development and testing of theories in ergonomics research has grown, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experimental results, which the authors believe is critical to progress in theory development and cumulative knowledge in the ergonomics field.
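The attenuation effect described in the abstract above can be made concrete with the classical test theory formulas, which are standard results rather than expressions quoted from the paper. The sketch assumes random, uncorrelated measurement errors; the reliabilities (0.70 and 0.80), the true correlation (0.60) and the true slope (1.50) are made-up inputs, and both function names are hypothetical.

```python
import math

def attenuated_correlation(r_true, rel_x, rel_y):
    """Expected observed correlation given the reliabilities of the two measures."""
    return r_true * math.sqrt(rel_x * rel_y)

def attenuated_slope(beta_true, rel_x):
    """Expected simple-regression slope when only the predictor carries random error."""
    return beta_true * rel_x

r_obs = attenuated_correlation(0.60, rel_x=0.70, rel_y=0.80)
b_obs = attenuated_slope(1.50, rel_x=0.70)
print(round(r_obs, 3), round(b_obs, 3))
```

These formulas cover only the two-variable case, where error always shrinks the estimate toward zero. With several error-prone predictors the coefficients can move in either direction, which matches the abstract's remark that regression coefficients may be overestimated, underestimated, or change sign.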
Elliott, J; Krone-Martins, A; Cameron, E; Ishida, E E O; Hilbe, J
2014-01-01
Machine learning techniques offer a precious tool box for use within astronomy to solve problems involving so-called big data. They provide a means to make accurate predictions about a particular system without prior knowledge of the underlying physical processes of the data. In this article, and the companion papers of this series, we present the set of Generalized Linear Models (GLMs) as a fast alternative method for tackling general astronomical problems, including the ones related to the machine learning paradigm. To demonstrate the applicability of GLMs to inherently positive and continuous physical observables, we explore their use in estimating the photometric redshifts of galaxies from their multi-wavelength photometry. Using the gamma family with a log link function we predict redshifts from the photo-z Accuracy Testing simulated catalogue and a subset of the Sloan Digital Sky Survey from Data Release 10. We obtain fits that result in catastrophic outlier rates as low as ~1% for simulated and ~2% for...
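A gamma-family GLM with a log link, as named in the abstract above, can be fitted with a short iteratively reweighted least squares (IRLS) loop. The sketch below is not the authors' code and uses no survey data: the response is simulated, the single predictor called colour is a hypothetical photometric feature, and the coefficients, gamma shape parameter and sample size are arbitrary.

```python
import numpy as np

def fit_gamma_log_glm(X, y, n_iter=25):
    """Gamma GLM with log link, fitted by iteratively reweighted least squares."""
    eta = np.log(y)                      # start the linear predictor at log(y)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(eta)
        # With variance function mu^2 and a log link the IRLS weights are constant,
        # so each step is an unweighted least-squares fit to the working response z.
        z = eta + (y - mu) / mu
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        eta = X @ beta
    return beta

rng = np.random.default_rng(0)
n = 500
colour = rng.uniform(0.0, 1.0, size=n)                # hypothetical photometric feature
X = np.column_stack([np.ones(n), colour])
true_beta = np.array([-1.0, 1.5])
shape = 50.0
y = rng.gamma(shape, np.exp(X @ true_beta) / shape)   # positive, redshift-like response

beta_hat = fit_gamma_log_glm(X, y)
print(beta_hat)
```

The log link keeps every prediction positive, which is why this family suits inherently positive, continuous observables such as redshift.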
Outlier Detection Method in Linear Regression Based on Sum of Arithmetic Progression
Adikaram, K. K. L. B.; Hussein, M. A.; Effenberger, M.; Becker, T.
2014-01-01
We introduce a new nonparametric outlier detection method for linear series, which requires no missing or removed data imputation. For an arithmetic progression (a series without outliers) with n elements, the ratio (R) of the sum of the minimum and the maximum elements to the sum of all elements is always 2/n ∈ (0,1]. R ≠ 2/n always implies the existence of outliers. Usually, R < 2/n implies that the minimum is an outlier, and R > 2/n implies that the maximum is an outlier. Based upon this, we derived a new method for identifying significant and nonsignificant outliers, separately. Two different techniques were used to manage missing data and removed outliers: (1) recalculate the terms after (or before) the removed or missing element while maintaining the initial angle in relation to a certain point or (2) transform data into a constant value, which is not affected by missing or removed elements. With a reference element, which was not an outlier, the method detected all outliers from data sets with 6 to 1000 elements containing 50% outliers which deviated by a factor of ±1.0e−2 to ±1.0e+2 from the correct value. PMID:25121139
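The ratio property stated in this abstract is easy to check numerically. The following is a minimal sketch of only that ratio test, not of the authors' full method (which also handles missing data and separates significant from nonsignificant outliers); the function names and sample series are hypothetical.

```python
import math

def ap_ratio(series):
    """Return R = (min + max) / sum for a numeric series with nonzero sum."""
    total = sum(series)
    if total == 0:
        raise ValueError("R is undefined when the series sums to zero")
    return (min(series) + max(series)) / total

def ratio_flags_outlier(series, rel_tol=1e-9):
    """True when R differs from 2/n, which the abstract says implies an outlier.

    A value of R equal to 2/n is not, by itself, treated here as proof
    that the series is outlier-free.
    """
    return not math.isclose(ap_ratio(series), 2 / len(series), rel_tol=rel_tol)

clean = [3, 5, 7, 9, 11]      # arithmetic progression, n = 5
spiked = [3, 5, 7, 9, 110]    # last element replaced by a large value

r_clean = ap_ratio(clean)     # (3 + 11) / 35 = 0.4 = 2/5
r_spiked = ap_ratio(spiked)   # (3 + 110) / 134, larger than 2/5
```

In the spiked series R exceeds 2/n, consistent with the maximum being the offending element.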
A componential model of human interaction with graphs: 1. Linear regression modeling
Gillan, Douglas J.; Lewis, Robert
1994-01-01
Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes (searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and responding); and (2) that the type of graph and user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications in the MA-P model, alternative models, and design implications from the MA-P model.
Gupta, Kinjal Dhar; Vilalta, Ricardo; Asadourian, Vicken; Macri, Lucas
2014-05-01
We describe an approach to automate the classification of Cepheid variable stars into two subtypes according to their pulsation mode. Automating such classification is relevant to obtain a precise determination of distances to nearby galaxies, which in addition helps reduce the uncertainty in the current expansion of the universe. One main difficulty lies in the compatibility of models trained using different galaxy datasets; a model trained using a training dataset may be ineffectual on a testing set. A solution to such difficulty is to adapt predictive models across domains; this is necessary when the training and testing sets do not follow the same distribution. The gist of our methodology is to train a predictive model on a nearby galaxy (e.g., Large Magellanic Cloud), followed by a model-adaptation step to make the model operable on other nearby galaxies. We follow a parametric approach to density estimation by modeling the training data (anchor galaxy) using a mixture of linear models. We then use maximum likelihood to compute the right amount of variable displacement, until the testing data closely overlaps the training data. At that point, the model can be directly used in the testing data (target galaxy).
Barrett, C. A.
1985-01-01
Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47-term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.
Hassan, A K
2015-01-01
In this work, O/W emulsion sets were prepared using different concentrations of two nonionic surfactants. The two surfactants, Tween 80 (HLB = 15.0) and Span 80 (HLB = 4.3), were used in fixed proportions of 0.55:0.45, respectively. The HLB value of the surfactant blend was fixed at 10.185. The surfactant blend concentration ranged from 3% to 19%. For each O/W emulsion set, the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying simple linear regression (least squares) to the temperature-conductivity data determines the effective surfactant blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by physical stability centrifugation testing and phase inversion temperature range measurements. The results indicated that the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that the most stable O/W emulsion can be identified by finding the maximum R² value when simple linear regression is applied to the temperature-conductivity data up to 80°; in addition, the true maximum slope is given by the equation that has the maximum R² value. Because conditions change in a more complex formulation, the method for determining the effective surfactant blend concentration was verified by applying it to more complex formulations of 2% O/W miconazole nitrate cream, and the results indicate its reproducibility.
Elliott, J.; de Souza, R. S.; Krone-Martins, A.; Cameron, E.; Ishida, E. E. O.; Hilbe, J.
2015-04-01
Machine learning techniques offer a precious tool box for use within astronomy to solve problems involving so-called big data. They provide a means to make accurate predictions about a particular system without prior knowledge of the underlying physical processes of the data. In this article, and the companion papers of this series, we present the set of Generalized Linear Models (GLMs) as a fast alternative method for tackling general astronomical problems, including the ones related to the machine learning paradigm. To demonstrate the applicability of GLMs to inherently positive and continuous physical observables, we explore their use in estimating the photometric redshifts of galaxies from their multi-wavelength photometry. Using the gamma family with a log link function we predict redshifts from the PHoto-z Accuracy Testing simulated catalogue and a subset of the Sloan Digital Sky Survey from Data Release 10. We obtain fits that result in catastrophic outlier rates as low as ∼1% for simulated and ∼2% for real data. Moreover, we can easily obtain such levels of precision within a matter of seconds on a normal desktop computer and with training sets that contain merely thousands of galaxies. Our software is made publicly available as a user-friendly package developed in Python, R and via an interactive web application. This software allows users to apply a set of GLMs to their own photometric catalogues and generates publication quality plots with minimum effort. By facilitating their ease of use to the astronomical community, this paper series aims to make GLMs widely known and to encourage their implementation in future large-scale projects, such as the Large Synoptic Survey Telescope.
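To make the "gamma family with a log link" concrete, here is a minimal sketch of fitting such a GLM with one predictor by iteratively reweighted least squares. It is not the authors' software; the function names and the toy data are hypothetical. For the gamma family with a log link the working weights are constant, so each iteration reduces to an ordinary least-squares fit of the working response.

```python
import math

def ols_line(xs, zs):
    """Ordinary least-squares intercept and slope of zs on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope

def fit_gamma_log_glm(xs, ys, n_iter=50, tol=1e-12):
    """Fit log(E[y]) = b0 + b1 * x for strictly positive y by IRLS."""
    if any(y <= 0 for y in ys):
        raise ValueError("the gamma family needs strictly positive responses")
    # Start from a least-squares fit to log(y).
    b0, b1 = ols_line(xs, [math.log(y) for y in ys])
    for _ in range(n_iter):
        eta = [b0 + b1 * x for x in xs]
        mu = [math.exp(e) for e in eta]
        # Working response; the IRLS weights are all 1 for this family/link.
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]
        new_b0, new_b1 = ols_line(xs, z)
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1

# Hypothetical toy data: a positive response that grows with x.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.2, 2.1, 2.9, 5.2, 6.8]
b0, b1 = fit_gamma_log_glm(xs, ys)
predicted = [math.exp(b0 + b1 * x) for x in xs]
```

Because predictions are `exp(b0 + b1 * x)`, they are always positive, which is the property that makes this family a natural choice for inherently positive observables such as redshift.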
Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf
2015-10-01
The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset.
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Molnos, Sophie; Baumbach, Clemens; Wahl, Simone; Müller-Nurasyid, Martina; Strauch, Konstantin; Wang-Sattler, Rui; Waldenberger, Melanie; Meitinger, Thomas; Adamski, Jerzy; Kastenmüller, Gabi; Suhre, Karsten; Peters, Annette; Grallert, Harald; Theis, Fabian J; Gieger, Christian
2017-09-29
Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about the disease-causing mechanisms, so it is usual to investigate the associations between genetic variants and metabolite levels. However, only considering genetic variants and their effects on one trait ignores the possible interplay between different "omics" layers. Existing tools only consider single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigations of the interactions between pairs of arbitrary quantitative variables. We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This is achieved by using the correlation coefficient to test the null-hypothesis, which avoids the costly computation of inversions. Additional tricks are a rearrangement of the order, when iterating through the different "omics" layers, and implementing this algorithm in the fast programming language C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels. The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++, and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/ .
Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David
2017-10-01
Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
Avdakovic, Samir; Nuhanovic, Amir
2013-01-01
In this paper, the relationship between the Gross Domestic Product (GDP), air temperature variations and power consumption is evaluated using the linear regression and Wavelet Coherence (WTC) approach on a 1971-2011 time series for the United Kingdom (UK). The results based on the linear regression approach indicate that some 66% variability of the UK electricity demand can be explained by the quarterly GDP variations, while only 11% of the quarterly changes of the UK electricity demand are caused by seasonal air temperature variations. WTC however, can detect the period of time when GDP and air temperature significantly correlate with electricity demand and the results of the wavelet correlation at different time scales indicate that a significant correlation is to be found on a long-term basis for GDP and on an annual basis for seasonal air-temperature variations. This approach provides an insight into the properties of the impact of the main factors on power consumption on the basis of which the power syst...
Regression of Adjuvant-Induced Arthritis in Rats Following Bone Marrow Transplantation
van Bekkum, Dirk W.; Bohre, Els P. M.; Houben, Paul F. J.; Knaan-Shanzer, Shoshan
1989-12-01
Total body irradiation followed by bone marrow transplantation was found to be an effective treatment for adjuvant arthritis induced in rats. This treatment is most effective when applied shortly after the clinical manifestation of arthritis--i.e., 4-7 weeks after administration of Mycobacterium tuberculosis. Transplantation of bone marrow at a later stage results in a limited recovery, in that the inflammatory reaction regresses but the newly formed excessive bone is not eliminated. Local irradiation of the affected joints had no effect on the disease. It could also be excluded that the recovery of arthritis following marrow transplantation is due to lack of available antigen. Transplantation of syngeneic bone marrow is as effective as that of allogeneic bone marrow from a rat strain that is not susceptible to induction of adjuvant arthritis. The beneficial effect of this treatment cannot be ascribed to the immunosuppressive effect of total body irradiation, since treatment with the highly immunosuppressive drug Cyclosporin A resulted in a regression of the joint swelling but relapse occurred shortly after discontinuation of the treatment.
Wu, W.; Chen, G. Y.; Kang, R.; Xia, J. C.; Huang, Y. P.; Chen, K. J.
2017-07-01
During slaughtering and further processing, chicken carcasses are inevitably contaminated by microbial pathogen contaminants. Due to food safety concerns, many countries implement a zero-tolerance policy that forbids the placement of visibly contaminated carcasses in ice-water chiller tanks during processing. Manual detection of contaminants is labor consuming and imprecise. Here, a successive projections algorithm (SPA)-multivariable linear regression (MLR) classifier based on an optimal performance threshold was developed for automatic detection of contaminants on chicken carcasses. Hyperspectral images were obtained using a hyperspectral imaging system. A regression model of the classifier was established by MLR based on twelve characteristic wavelengths (505, 537, 561, 562, 564, 575, 604, 627, 656, 665, 670, and 689 nm) selected by SPA, and the optimal threshold T = 1 was obtained from the receiver operating characteristic (ROC) analysis. The SPA-MLR classifier provided the best detection results when compared with the SPA-partial least squares (PLS) regression classifier and the SPA-least squares support vector machine (LS-SVM) classifier. The true positive rate (TPR) of 100% and the false positive rate (FPR) of 0.392% indicate that the SPA-MLR classifier can utilize spatial and spectral information to effectively detect contaminants on chicken carcasses.
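Turning a regression output into a classifier by thresholding, and scoring it with TPR and FPR, can be sketched as follows. The abstract does not state which ROC criterion produced T = 1, so the use of Youden's J (TPR minus FPR) below is an assumption, and the scores and labels are made-up illustrative values rather than data from the study.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR) when a score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

def best_threshold(scores, labels):
    """Pick the candidate threshold that maximises Youden's J = TPR - FPR.

    Youden's J is an assumed criterion chosen for illustration.
    """
    best, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(scores, labels, candidate)
        if tpr - fpr > best_j:
            best, best_j = candidate, tpr - fpr
    return best

# Hypothetical regression outputs: 1 = contaminated pixel, 0 = clean pixel.
scores = [0.2, 0.4, 0.9, 1.1, 1.3, 0.1]
labels = [0, 0, 0, 1, 1, 0]
threshold = best_threshold(scores, labels)
tpr, fpr = rates_at_threshold(scores, labels, threshold)
```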
Regression of warfarin-induced medial elastocalcinosis by high intake of vitamin K in rats.
Schurgers, Leon J; Spronk, Henri M H; Soute, Berry A M; Schiffers, Paul M; DeMey, Jo G R; Vermeer, Cees
2007-04-01
Arterial calcification (AC) is generally regarded as an independent risk factor for cardiovascular morbidity and mortality. Matrix Gla protein (MGP) is a potent inhibitor of AC, and its activity depends on vitamin K (VK). In rats, inactivation of MGP by treatment with the vitamin K antagonist warfarin leads to rapid calcification of the arteries. Here, we investigated whether preformed AC can be regressed by a VK-rich diet. Rats received a calcification-inducing diet containing both VK and warfarin (W&K). During a second 6-week period, animals were randomly assigned to receive either W&K (3.0 mg/g and 1.5 mg/g, subsequently), a diet containing a normal (5 microg/g) or high (100 microg/g) amount of VK (either K1 or K2). Increased aortic calcium concentration was observed in the group that continued to receive W&K and also in the group changed to the normal dose of VK and AC progressed. Both the VK-rich diets decreased the arterial calcium content by some 50%. In addition, arterial distensibility was restored by the VK-rich diet. Using MGP antibodies, local VK deficiency was demonstrated at sites of calcification. This is the first study in rats demonstrating that AC and the resulting decreased arterial distensibility are reversible by high-VK intake.
Jinhong YOU; CHEN Min; Gemai CHEN
2004-01-01
Consider a semiparametric regression model with linear time series errors $Y_k = x_k'\beta + g(t_k) + \varepsilon_k$, $1 \le k \le n$, where the $Y_k$ are responses, $x_k = (x_{k1}, x_{k2}, \ldots, x_{kp})'$ and $t_k \in T \subset \mathbb{R}$ are fixed design points, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is an unknown parameter vector, $g(\cdot)$ is an unknown bounded real-valued function defined on a compact subset $T$ of the real line $\mathbb{R}$, and $\varepsilon_k$ is a linear process given by $\varepsilon_k = \sum_{j=0}^{\infty} \psi_j e_{k-j}$, $\psi_0 = 1$, where $\sum_{j=0}^{\infty} |\psi_j| < \infty$, and the $e_j$, $j = 0, \pm 1, \pm 2, \ldots$, are i.i.d. random variables. In this paper we establish the asymptotic normality of the least squares estimator of $\beta$, a smooth estimator of $g(\cdot)$, and estimators of the autocovariance and autocorrelation functions of the linear process $\varepsilon_k$.
Ulf-Peter Hansen
2007-10-01
The demonstrated modified spectrophotometric method makes use of the 2,2-diphenyl-1-picrylhydrazyl (DPPH) radical and its specific absorbance properties. The absorbance decreases when the radical is reduced by antioxidants. In contrast to other investigations, the absorbance was measured at a wavelength of 550 nm. This wavelength enabled the measurements of the stable free DPPH radical without interference from microalgal pigments. This approach was applied to methanolic microalgae extracts for two different DPPH concentrations. The changes in absorbance measured vs. the concentration of the methanolic extract resulted in curves with a linear decrease ending in a saturation region. Linear regression analysis of the linear part of DPPH reduction versus extract concentration enabled the determination of the antioxidative potentials of the microalgae's methanolic extracts, which were independent of the employed DPPH concentrations. The resulting slopes showed significant differences (6 - 34 μmol DPPH g⁻¹ extract concentration) between the single different species of microalgae (Anabaena sp., Isochrysis galbana, Phaeodactylum tricornutum, Porphyridium purpureum, Synechocystis sp. PCC6803) in their ability to reduce the DPPH radical. The independence of the signal from the DPPH concentration is a valuable advantage over the determination of the EC50 value.
曹慧; 李祖光; 陈小珍
2011-01-01
The volatile compounds emitted from Mosla chinensis Maxim were analyzed by headspace solid-phase microextraction (HS-SPME) and headspace liquid-phase microextraction (HS-LPME) combined with gas chromatography-mass spectrometry (GC-MS). The main volatiles from Mosla chinensis Maxim were studied in this paper. In total, 61 compounds were separated and identified. Forty-nine volatile compounds were identified by the SPME method, mainly including myrcene, α-terpinene, p-cymene, (E)-ocimene, thymol, thymol acetate and (E)-β-farnesene. Forty-five major volatile compounds were identified by the LPME method, including α-thujene, α-pinene, camphene, butanoic acid, 2-methylpropyl ester, myrcene, butanoic acid, butyl ester, α-terpinene, p-cymene, (E)-ocimene, butane, 1,1-dibutoxy-, thymol, thymol acetate and (E)-β-farnesene. After analyzing the volatile compounds, the multiple linear regression (MLR) method was used for building the regression model. Then the quantitative structure-retention relationship (QSRR) model was validated by a predictive-ability test. The prediction results were in good agreement with the experimental values. The results demonstrated that headspace SPME-GC-MS and LPME-GC-MS are simple, rapid and easy sample enrichment techniques suitable for analysis of volatile compounds. This investigation provided an effective method for predicting the retention indices of new compounds even in the absence of the standard candidates.
Singh, S.; Jaishi, H. P.; Tiwari, R. P.; Tiwari, R. C.
2017-07-01
This paper reports the analysis of soil radon data recorded in the seismic zone-V, located in the northeastern part of India (latitude 23.73N, longitude 92.73E). Continuous measurements of soil-gas emission along Chite fault in Mizoram (India) were carried out with the replacement of solid-state nuclear track detectors at weekly interval. The present study was done for the period from March 2013 to May 2015 using LR-115 Type II detectors, manufactured by Kodak Pathe, France. In order to reduce the influence of meteorological parameters, statistical analysis tools such as multiple linear regression and artificial neural network have been used. Decrease in radon concentration was recorded prior to some earthquakes that occurred during the observation period. Some false anomalies were also recorded which may be attributed to the ongoing crustal deformation which was not major enough to produce an earthquake.
Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.
2016-01-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
P Shivakumara; G Hemantha Kumar; D S Guru; P Nagabhushan
2005-02-01
When a document is scanned either mechanically or manually for digitization, it often suffers from some degree of skew or tilt. Skew-angle detection plays an important role in the field of document analysis systems and OCR in achieving the expected accuracy. In this paper, we consider skew estimation of Roman script. The method uses the boundary growing approach to extract the lowermost and uppermost coordinates of pixels of characters of text lines present in the document, which can be subjected to linear regression analysis (LRA) to determine the skew angle of a skewed document. Further, the proposed technique works fine for scaled text binary documents also. The technique works based on the assumption that the space between the text lines is greater than the space between the words and characters. Finally, in order to evaluate the performance of the proposed methodology we compare the experimental results with those of well-known existing methods.
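The final step described above, turning boundary coordinates into a skew angle by linear regression, can be sketched as follows. This is only an illustration of that regression step and does not implement the boundary growing extraction; the function name and the synthetic points are hypothetical. The angle is taken as the arctangent of the fitted slope, and its sign depends on whether the y axis of the image points up or down.

```python
import math

def skew_angle_degrees(points):
    """Least-squares line through (x, y) points; return its angle in degrees."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("points must span more than one x position")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan(sxy / sxx))

# Synthetic lowermost-pixel coordinates of one text line tilted by 3 degrees.
slope = math.tan(math.radians(3.0))
baseline = [(x, 10.0 + slope * x) for x in range(0, 500, 50)]
angle = skew_angle_degrees(baseline)
```

In practice one might average the angles obtained from several text lines, since a single line with descenders or noise can bias the fit.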
Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena
2013-01-01
The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and results show, in general, a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLRs have, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R² of 0.874 and RMSE of 0.437 against R² of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis showing an activity close to that predicted by the majority of the models.
Upender Manne
2007-01-01
Background: Although a majority of studies in cancer biomarker discovery claim to use proportional hazards regression (PHREG) to study the ability of a biomarker to predict survival, few studies use the predicted probabilities obtained from the model to test the quality of the model. In this paper, we compared the quality of predictions by a PHREG model to that of a linear discriminant analysis (LDA) in both training and test set settings. Methods: The PHREG and LDA models were built on a 491 colorectal cancer (CRC) patient dataset comprised of demographic and clinicopathologic variables, and phenotypic expression of p53 and Bcl-2. Two variable selection methods, stepwise discriminant analysis and backward selection, were used to identify the final models. The endpoint of prediction in these models was five-year post-surgery survival. We also used a linear regression model to examine the effect of bin size in the training set on the accuracy of prediction in the test set. Results: The two variable selection techniques resulted in different models when stage was included in the list of variables available for selection. However, the proportion of survivors and non-survivors correctly identified was identical in both of these models. When stage was excluded from the variable list, the error rate for the LDA model was 42% as compared to an error rate of 34% for the PHREG model. Conclusions: This study suggests that a PHREG model can perform as well as or better than a traditional classifier such as LDA to classify patients into prognostic classes. Also, this study suggests that in the absence of the tumor stage as a variable, Bcl-2 expression is a strong prognostic molecular marker of CRC.
Buermeyer, Jonas; Gundlach, Matthias; Grund, Anna-Lisa; Grimm, Volker; Spizyn, Alexander; Breckow, Joachim
2016-09-01
This work is part of the analysis of the effects of constructional energy-saving measures on radon concentration levels in dwellings, performed on behalf of the German Federal Office for Radiation Protection. In parallel to radon measurements for five buildings, both meteorological data outside the buildings and the indoor climate factors were recorded. In order to assess effects of inhabited buildings, the amount of carbon dioxide (CO2) was measured. For a statistical linear regression model, the data of one object was chosen as an example. Three dummy variables were extracted from the course of the CO2 concentration to provide information on the usage and ventilation of the room. The analysis revealed a highly autoregressive model for the radon concentration with additional influence by the natural environmental factors. The autoregression implies a strong dependency on a radon source since it reflects a backward dependency in time. At this point of the investigation, it cannot be determined whether the influence by outside factors affects the source of radon or the inhabitants' ventilation behavior resulting in variation of the occurring concentration levels. In any case, the regression analysis might provide further information that would help to distinguish these effects. In the next step, the influence factors will be weighted according to their impact on the concentration levels. This might lead to a model that enables the prediction of radon concentration levels based on the measurement of CO2 in combination with environmental parameters, as well as the development of advice for ventilation.
Silva, Ana Elisa Pereira; Freitas, Corina da Costa; Dutra, Luciano Vieira; Molento, Marcelo Beltrão
2016-02-15
Fasciola hepatica is the causative agent of fasciolosis, a disease that triggers a chronic inflammatory process in the liver affecting mainly ruminants and other animals including humans. In Brazil, F. hepatica occurs in larger numbers in the southernmost state of Rio Grande do Sul. The objective of this study was to estimate areas at risk using an eight-year (2002-2010) time series of climatic and environmental variables that best relate to the disease, applying a linear regression method to municipalities in the state of Rio Grande do Sul. The positivity index of the disease, which is the rate of infected animals per slaughtered animal, was divided into three risk classes: low, medium and high. The accuracy of the known sample classification on the confusion matrix for the low, medium and high rates produced by the estimated model presented values between 39 and 88% depending on the year. The regression analysis showed the importance of the time-based data for the construction of the model, considering the two variables of the previous year of the event (positivity index and maximum temperature). The generated data is important for epidemiological and parasite control studies mainly because F. hepatica is an infection that can last from months to years.
Panatchai Chetchotisak
2015-09-01
Because of nonlinear strain distributions caused either by abrupt changes in geometry or in loading in deep beams, the approach for conventional beams is not applicable. Consequently, the strut-and-tie model (STM) has been applied as the most rational and simple method for strength prediction and design of reinforced concrete deep beams. A deep beam is idealized by the STM as a truss-like structure consisting of diagonal concrete struts and tension ties. There have been numerous works proposing STMs for deep beams. However, uncertainty and complexity in shear strength computations of deep beams can be found in some STMs. Therefore, improvement of methods for predicting the shear strengths of deep beams is still needed. By means of a large experimental database of 406 deep beam test results covering a wide range of influencing parameters, several shapes and geometries of STM, and six state-of-the-art formulations of the efficiency factors found in the design codes and literature, new STMs for predicting the shear strength of simply supported reinforced concrete deep beams using multiple linear regression analysis are proposed in this paper. Furthermore, the regression diagnostics and the validation process are included in this study. Finally, two numerical examples are also provided for illustration.
Ebrahimi, Hadi; Rajaee, Taher
2017-01-01
Simulation of groundwater level (GWL) fluctuations is an important task in the management of groundwater resources. In this study, the effect of wavelet analysis on the training of the artificial neural network (ANN), multiple linear regression (MLR) and support vector regression (SVR) approaches was investigated, and the ANN, MLR and SVR models, along with the wavelet-ANN (WNN), wavelet-MLR (WLR) and wavelet-SVR (WSVR) models, were compared in simulating GWL one month ahead. The only variable used to develop the models was the monthly GWL data recorded over a period of 11 years from two wells in the Qom plain, Iran. The results showed that decomposing the GWL time series into several sub-time series greatly improved the training of the models. For both wells 1 and 2, the Meyer and Db5 wavelets produced better results than the other wavelets, which indicated that wavelet types behaved similarly in similar case studies. The optimal number of delays was 6 months, which seems to be due to natural phenomena. The best WNN model, using the Meyer mother wavelet with two decomposition levels, simulated GWL one month ahead with RMSE values of 0.069 m and 0.154 m for wells 1 and 2, respectively. The RMSE values for the WLR model were 0.058 m and 0.111 m, and those for the WSVR model were 0.136 m and 0.060 m, for wells 1 and 2, respectively.
Jäntschi, Lorentz; Bálint, Donatella; Bolboacă, Sorana D
2016-01-01
Multiple linear regression analysis is widely used to link an outcome with predictors for better understanding of the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients of linear models with two predictors without any restrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as a proof of concept using fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated by molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error is significantly different from the convenient value of two when the Gauss-Laplace distribution was used to relax the restrictive assumption of the normal distribution of the error. Therefore, the Gauss-Laplace distribution of the error could not be rejected, while the hypothesis that the power of the error from the Gauss-Laplace distribution is normally distributed also failed to be rejected.
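The role of the "power of the error" can be made concrete with a short derivation. This is a sketch under an assumption: the Gauss-Laplace family is taken here to be the generalized normal (power exponential) density with scale \sigma and power p, which may differ from the authors' exact parameterization. The symbols \beta_0, \beta_1, \beta_2, \sigma and p are notation introduced for this illustration.

```latex
% Model with two predictors and errors of power p (assumed parameterization)
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i,
\qquad
f(\varepsilon;\sigma,p) = \frac{p}{2\sigma\,\Gamma(1/p)}
  \exp\!\left[-\left(\frac{|\varepsilon|}{\sigma}\right)^{p}\right]

% Log-likelihood of n independent observations
\ell(\beta,\sigma,p) = n\ln p - n\ln\!\bigl(2\sigma\,\Gamma(1/p)\bigr)
  - \frac{1}{\sigma^{p}} \sum_{i=1}^{n}
    \bigl|\,y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i}\bigr|^{p}
```

For fixed \sigma and p, maximizing \ell over the coefficients is the same as minimizing \sum_i |y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i}|^p. With p = 2 this is ordinary least squares, and with p = 1 it is least absolute deviations, so estimating p from the data relaxes the usual normality assumption.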
vahid Rezaverdinejad
2017-01-01
important models to estimate ETc in greenhouse. The inputs of these models are net radiation, temperature, day after planting and air vapour pressure deficit (or relative humidity). Materials and Methods: In this study, daily ETc of reference crop, greenhouse tomato and cucumber crops were measured using lysimeter method in Urmia region. Several linear, nonlinear regressions and artificial neural networks were considered for ETc modelling in greenhouse. For this purpose, the effective meteorological parameters on ETc process includes: air temperature (T), air humidity (RH), air pressure (P), air vapour pressure deficit (VPD), day after planting (N) and greenhouse net radiation (SR) were considered and measured. According to the goodness of fit, different models of artificial neural networks and regression were compared and evaluated. Furthermore, based on partial derivatives of regression models, sensitivity analysis was conducted. The accuracy and performance of the employed models was judged by ten statistical indices namely root mean square error (RMSE), normalized root mean square error (NRMSE) and coefficient of determination (R2). Results and Discussion: Based on the results, the most accurate regression model to reference ETc prediction was obtained three variables exponential function of VPD, RH and SR with RMSE=0.378 mm day-1. The RMSE of optimal artificial neural network to reference ET prediction for train and test data sets were obtained 0.089 and 0.365 mm day-1, respectively. The performance of logarithmic and exponential functions to prediction of cucumber ETc were proper, with high dependent variables especially, and the most accurate regression model to cucumber ET prediction was obtained for exponential function of five variables: VPD, N, T, RH and SR with RMSE=0.353 mm day-1. In addition, for tomato ET prediction, the most accurate regression model was obtained for exponential function of four variables: VPD, N, RH and SR with RMSE= 0.329 mm day-1. The best
Bie, Peter
2011-01-01
Sodium intake and renin system activity: Effects of metoprolol on the log-linear relationship in conscious rats.
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Ahangar, Reza Gharoie; Pournaghshband, Hassan
2010-01-01
In this paper, the researchers estimated the stock prices of companies active on the Tehran (Iran) stock exchange. Linear regression and artificial neural network methods were used and compared. For the artificial neural network, the General Regression Neural Network (GRNN) architecture was used. The researchers first considered 10 macroeconomic variables and 30 financial variables and then, using Independent Component Analysis (ICA), obtained seven final variables (3 macroeconomic and 4 financial) to estimate the stock price. An equation was presented for each of the two methods, and a comparison of their results showed that the artificial neural network method is more efficient than the linear regression method.
Shih-Hung Yang
2016-12-01
Several neural decoding algorithms have successfully converted brain signals into commands to control a computer cursor and prosthetic devices. A majority of decoding methods, such as population vector algorithms (PVA), optimal linear estimators (OLE), and neural networks (NN), are effective in predicting movement kinematics, including movement direction, speed and trajectory, but usually require a large number of neurons to achieve desirable performance. This study proposed a novel decoding algorithm that works even with signals obtained from a smaller number of neurons. We adopted sliced inverse regression (SIR) to predict forelimb movement from single-unit activities recorded in the rat primary motor (M1) cortex in a water-reward lever-pressing task. SIR performed weighted principal component analysis (PCA) to achieve effective dimension reduction for nonlinear regression. To demonstrate the decoding performance, SIR was compared to PVA, OLE, and NN. Furthermore, PCA and sequential feature selection (SFS), which are popular feature selection techniques, were implemented for comparison of feature selection effectiveness. Among the SIR, PVA, OLE, PCA, SFS, and NN decoding methods, the trajectories predicted by SIR (with a root mean square error, RMSE, of 8.47 ± 1.32 mm) were closer to the actual trajectories than those predicted by PVA (30.41 ± 11.73 mm), OLE (20.17 ± 6.43 mm), PCA (19.13 ± 0.75 mm), SFS (22.75 ± 2.01 mm), and NN (16.75 ± 2.02 mm). The superiority of SIR was most obvious when the sample size of neurons was small. We concluded that SIR sorted the input data to obtain the effective transform matrices for movement prediction, making it a robust decoding method for conditions with sparse neuronal information.
Ma, Jing; Yu, Jiong; Hao, Guangshu; Wang, Dan; Sun, Yanni; Lu, Jianxin; Cao, Hongcui; Lin, Feiyan
2017-02-20
The prevalence of hyperlipemia is increasing around the world. Our aims are to analyze the relationship of triglyceride (TG) and cholesterol (TC) with indexes of liver function and kidney function, and to develop a prediction model of TG and TC in overweight people. A total of 302 healthy adult subjects and 273 overweight subjects were enrolled in this study. The levels of fasting indexes of TG (fs-TG), TC (fs-TC), blood glucose, liver function, and kidney function were measured and analyzed by correlation analysis and multiple linear regression (MRL). The back propagation artificial neural network (BP-ANN) was applied to develop prediction models of fs-TG and fs-TC. The results showed there was a significant difference in biochemical indexes between healthy people and overweight people. The correlation analysis showed fs-TG was related to weight, height, blood glucose, and indexes of liver and kidney function, while fs-TC was correlated with age and indexes of liver function (P < 0.01). The MRL analysis indicated that the regression equations of fs-TG and fs-TC were both statistically significant (P < 0.01) when independent indexes were included. The BP-ANN model of fs-TG reached the training goal at 59 epochs, while the fs-TC model achieved high prediction accuracy after training for 1000 epochs. In conclusion, there was a strong relationship of fs-TG and fs-TC with weight, height, age, blood glucose, and indexes of liver function and kidney function. Based on related variables, the indexes of fs-TG and fs-TC can be predicted by BP-ANN models in overweight people.
Kokaly, R.F.; Clark, R.N.
1999-01-01
We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R2 from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.
Stephen Eyije Abechi
2016-04-01
Aim: To develop good and rational Quantitative Structure Activity Relationship (QSAR) mathematical models that can predict to a significant level the anti-tyrosinase and anti-Candida Albicans Minimum inhibitory concentration (MIC) of ketone and tetra-ketone derivatives. Place and Duration of Study: Department of Chemistry (Mathieson Laboratory (3-Physical Chemistry unit)), Ahmadu Bello University, Zaria, Nigeria, between December 2015 and March 2016. Methodology: A set of 44 ketone and tetra-ketone derivatives with their anti-tyrosinase and anti-Candida Albicans activities in terms of minimum inhibitory concentration (MIC) against the gram-positive fungal and hyperpigmentation were selected for 1D-3D quantitative structure activity relationship (QSAR) analysis using the parameterization method 6 (PM6) basis set. The computed descriptors were correlated with their experimental MIC. The Genetic Function Approximation (GFA) method and Multi-Linear Regression analysis (MLR) were used to derive the most statistically significant QSAR model. Results: The result obtained indicates that the most statistically significant QSAR model was a five-parametric linear equation with a squared correlation coefficient (R2) value of 0.9914, an adjusted squared correlation coefficient (R2adj) value of 0.9896 and a Leave one out (LOO) cross validation coefficient (Q2) value of 0.9853. An external set was used for confirming the predictive power of the model, its R2 pred = 0.9618 and rm^2 = 0.8981. Conclusion: The QSAR results reveal that molecular mass, atomic mass, polarity, electronic and topological descriptors predominantly influence the anti-tyrosinase and anti-Candida Albicans activity of the complexes. The wealth of information in this study will provide an insight into designing novel bioactive ketone and tetra-ketone compounds that will curb the emerging trend of multi-drug resistant strain of fungal and hyperpigmentation
Fereshteh Shiri
2010-08-01
In the present work, support vector machines (SVMs) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and a genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build QSPR models. The robustness of the QSPR models was characterized by statistical validation and the applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measured by r2 and q2 are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of different descriptors on the retention times are described.
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.
Casey P Durand
INTRODUCTION: Statistical interactions are a common component of data analysis across a broad range of scientific disciplines. However, the statistical power to detect interactions is often undesirably low. One solution is to elevate the Type 1 error rate so that important interactions are not missed in a low power situation. To date, no study has quantified the effects of this practice on power in a linear regression model. METHODS: A Monte Carlo simulation study was performed. A continuous dependent variable was specified, along with three types of interactions: continuous variable by continuous variable; continuous by dichotomous; and dichotomous by dichotomous. For each of the three scenarios, the interaction effect sizes, sample sizes, and Type 1 error rate were varied, resulting in a total of 240 unique simulations. RESULTS: In general, power to detect the interaction effect was either so low or so high at α = 0.05 that raising the Type 1 error rate only served to increase the probability of including a spurious interaction in the model. A small number of scenarios were identified in which an elevated Type 1 error rate may be justified. CONCLUSIONS: Routinely elevating Type 1 error rate when testing interaction effects is not an advisable practice. Researchers are best served by positing interaction effects a priori and accounting for them when conducting sample size calculations.
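A Monte Carlo power study of this kind can be sketched for the simplest of the three scenarios, a dichotomous-by-dichotomous interaction in a balanced design. This is only an illustration under stated assumptions, not the study's simulation code: the function name, the main effects of 0.5, the default sample size, and the use of a normal critical value in place of the t distribution (reasonable only for large samples) are all choices made here.

```python
import random
from statistics import NormalDist, mean


def interaction_power(effect, n_per_cell=50, sd=1.0, alpha=0.05, reps=500, seed=0):
    """Estimate power to detect a 2x2 interaction by simulation.

    In a balanced saturated 2x2 model the least-squares interaction
    coefficient equals the difference of differences of the cell means.
    """
    rng = random.Random(seed)
    # Assumption: normal approximation to the t critical value.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        cells = {}
        for x1 in (0, 1):
            for x2 in (0, 1):
                mu = 0.5 * x1 + 0.5 * x2 + effect * x1 * x2
                cells[(x1, x2)] = [rng.gauss(mu, sd) for _ in range(n_per_cell)]
        means = {key: mean(values) for key, values in cells.items()}
        estimate = means[(1, 1)] - means[(1, 0)] - means[(0, 1)] + means[(0, 0)]
        # Pooled residual variance on 4 * n_per_cell - 4 degrees of freedom.
        sse = sum((y - means[key]) ** 2 for key, values in cells.items() for y in values)
        residual_variance = sse / (4 * n_per_cell - 4)
        std_error = (residual_variance * 4 / n_per_cell) ** 0.5
        if abs(estimate / std_error) > crit:
            rejections += 1
    return rejections / reps
```

Calling the function with `effect=0.0` estimates the false-positive rate, so comparing `alpha=0.05` with `alpha=0.20` at a zero and a nonzero effect shows both sides of the trade-off the abstract discusses.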
de Souza, R. S.; Hilbe, J. M.; Buelens, B.; Riggs, J. D.; Cameron, E.; Ishida, E. E. O.; Chies-Santos, A. L.; Killedar, M.
2015-10-01
In this paper, the third in a series illustrating the power of generalized linear models (GLMs) for the astronomical community, we elucidate the potential of the class of GLMs which handles count data. The size of a galaxy's globular cluster (GC) population (NGC) is a prolonged puzzle in the astronomical literature. It falls in the category of count data analysis, yet it is usually modelled as if it were a continuous response variable. We have developed a Bayesian negative binomial regression model to study the connection between NGC and the following galaxy properties: central black hole mass, dynamical bulge mass, bulge velocity dispersion and absolute visual magnitude. The methodology introduced herein naturally accounts for heteroscedasticity, intrinsic scatter, errors in measurements in both axes (either discrete or continuous) and allows modelling the population of GCs on their natural scale as a non-negative integer variable. Prediction intervals of 99 per cent around the trend for expected NGC comfortably envelope the data, notably including the Milky Way, which has hitherto been considered a problematic outlier. Finally, we demonstrate how random intercept models can incorporate information of each particular galaxy morphological type. Bayesian variable selection methodology allows for automatically identifying galaxy types with different productions of GCs, suggesting that on average S0 galaxies have a GC population 35 per cent smaller than other types with similar brightness.
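The reason a negative binomial model suits overdispersed counts can be illustrated with its probability mass function. The sketch below uses the common mean and dispersion parameterization, which is an assumption made here; it does not reproduce the Bayesian model of the paper, and the function names are hypothetical.

```python
import math


def neg_binomial_log_pmf(y, mu, k):
    """Log P(Y = y) for a negative binomial with mean mu and dispersion (size) k."""
    return (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu))
        + y * math.log(mu / (k + mu))
    )


def neg_binomial_variance(mu, k):
    """Variance mu + mu^2 / k: always larger than the Poisson variance mu."""
    return mu + mu * mu / k
```

As k grows the variance approaches mu and the distribution approaches the Poisson, so k measures how much extra scatter the counts show beyond Poisson noise.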
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
Jiao, Bingqing; Zhang, Delong; Liang, Aiying; Liang, Bishan; Wang, Zengjian; Li, Junchao; Cai, Yuxuan; Gao, Mengxia; Gao, Zhenni; Chang, Song; Huang, Ruiwang; Liu, Ming
2017-09-07
Previous studies have indicated a tight linkage between resting-state functional connectivity of the human brain and creative ability. This study aimed to further investigate the association between the topological organization of resting-state brain networks and creativity. Therefore, we acquired resting-state fMRI data from 22 high-creativity participants and 22 low-creativity participants (as determined by their Torrance Tests of Creative Thinking scores). We then constructed functional brain networks for each participant and assessed group differences in network topological properties before exploring the relationships between respective network topological properties and creative ability. We identified an optimized organization of intrinsic brain networks in both groups. However, compared with low-creativity participants, high-creativity participants exhibited increased global efficiency and substantially decreased path length, suggesting increased efficiency of information transmission across brain networks in creative individuals. Using a multiple linear regression model, we further demonstrated that regional functional integration properties (i.e., the betweenness centrality and global efficiency) of brain networks, particularly the default mode network (DMN) and sensorimotor network (SMN), significantly predicted the individual differences in creative ability. Furthermore, the associations between network regional properties and creative performance were creativity-level dependent, where the difference in the resource control component may be important in explaining individual difference in creative performance. These findings provide novel insights into the neural substrate of creativity and may facilitate objective identification of creative ability. Copyright © 2017. Published by Elsevier B.V.
Neela Deshpande
2014-12-01
In the recent past, Artificial Neural Networks (ANN) have emerged as a promising technique for predicting the compressive strength of concrete. In the present study, back propagation was used to predict the 28 day compressive strength of recycled aggregate concrete (RAC), along with two other data driven techniques, namely Model Tree (MT) and Non-linear Regression (NLR). Recycled aggregate is the current need of the hour owing to its environmentally friendly aspect of re-using construction waste. The study observed that prediction of the 28 day compressive strength of RAC was done better by ANN than by NLR and MT. The input parameters were cubic meter proportions of Cement, Natural fine aggregate, Natural coarse Aggregates, recycled aggregates, Admixture and Water (also called raw data). The study also concluded that ANN performs better when non-dimensional parameters like Sand–Aggregate ratio, Water–total materials ratio, Aggregate–Cement ratio, Water–Cement ratio and Replacement ratio of natural aggregates by recycled aggregates were used as additional input parameters. Study of each network developed using raw data and each non-dimensional parameter facilitated studying the impact of each parameter on the performance of the models developed using ANN, MT and NLR, as well as the performance of the ANN models developed with a limited number of inputs. The results indicate that ANNs learn from the examples and grasp the fundamental domain rules governing the strength of concrete.
Alexeeff, Stacey E; Carroll, Raymond J; Coull, Brent
2016-04-01
Spatial modeling of air pollution exposures is widespread in air pollution epidemiology research as a way to improve exposure assessment. However, there are key sources of exposure model uncertainty when air pollution is modeled, including estimation error and model misspecification. We examine the use of predicted air pollution levels in linear health effect models under a measurement error framework. For the prediction of air pollution exposures, we consider a universal Kriging framework, which may include land-use regression terms in the mean function and a spatial covariance structure for the residuals. We derive the bias induced by estimation error and by model misspecification in the exposure model, and we find that a misspecified exposure model can induce asymptotic bias in the effect estimate of air pollution on health. We propose a new spatial simulation extrapolation (SIMEX) procedure, and we demonstrate that the procedure has good performance in correcting this asymptotic bias. We illustrate spatial SIMEX in a study of air pollution and birthweight in Massachusetts.
Kamruzzaman, Md; Mamun, A S M A; Bakar, Sheikh Muhammad Abu; Saw, Aik; Kamarul, T; Islam, Md Nurul; Hossain, Md Golam
2016-11-21
The aim of this study was to investigate the socioeconomic and demographic factors influencing the body mass index (BMI) of non-pregnant married Bangladeshi women of reproductive age. Secondary (Hierarchy) data from the 2011 Bangladesh Demographic and Health Survey, collected using two-stage stratified cluster sampling, were used. Two-level linear regression analysis was performed to remove the cluster effect of the variables. The mean BMI of married non-pregnant Bangladeshi women was 21.60±3.86 kg/m2, and the prevalence of underweight, overweight and obesity was 22.8%, 14.9% and 3.2%, respectively. After removing the cluster effect, age and age at first marriage were found to be positively (pchildren was negatively related with women's BMI. Lower BMI was especially found among women from rural areas and poor families, with an uneducated husband, with no television at home and who were currently breast-feeding. Age, total children ever born, age at first marriage, type of residence, education level, level of husband's education, wealth index, having a television at home and practising breast-feeding were found to be important predictors for the BMI of married Bangladeshi non-pregnant women of reproductive age. This information could be used to identify sections of the Bangladeshi population that require special attention, and to develop more effective strategies to resolve the problem of malnutrition.
MATLAB Programming for Unary Total Linear Regression Model
彭友
2015-01-01
This paper derives, from the error equation of the unary total linear regression model, a set of formulas for computing the regression parameters and the goodness of fit by total least squares, and gives MATLAB code for the iterative computation of the total linear regression prediction method. A simple case study is then used to validate the method and the program. The results show that when the independent variable is also perturbed by random noise, the given code is correct, and the total estimate is better and more reasonable than the least squares (LS) estimate of the unary linear regression model.
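The difference between the two estimators can be sketched for a straight line. The following is a minimal Python illustration, not the paper's iterative MATLAB program: it uses the closed-form orthogonal regression slope and assumes equal error variances in x and y. The function names are hypothetical.

```python
import math


def _centered_sums(x, y):
    """Return the means of x and y and the centered sums Sxx, Syy, Sxy."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def ordinary_least_squares_line(x, y):
    """Slope and intercept minimizing vertical squared distances."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_least_squares_line(x, y):
    """Slope and intercept minimizing perpendicular squared distances.

    Assumes errors of equal variance in both variables.
    """
    mean_x, mean_y, sxx, syy, sxy = _centered_sums(x, y)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x
```

When x is noisy, the ordinary least squares slope is pulled toward zero, whereas the total least squares slope is not, which is consistent with the comparison reported above.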
Poullis, Michael
2014-11-01
EuroSCORE II, despite improving on the original EuroSCORE system, has not solved all the calibration and predictability issues. Recursive, non-linear and mixed recursive and non-linear regression analysis were assessed with regard to sensitivity, specificity and predictability of the original EuroSCORE and EuroSCORE II systems. The original logistic EuroSCORE, EuroSCORE II and recursive, non-linear and mixed recursive and non-linear regression analyses of these risk models were assessed via receiver operator characteristic curves (ROC) and Hosmer-Lemeshow statistic analysis with regard to the accuracy of predicting in-hospital mortality. Analysis was performed for isolated coronary artery bypass grafts (CABGs) (n = 2913), aortic valve replacement (AVR) (n = 814), mitral valve surgery (n = 340), combined AVR and CABG (n = 517), aortic (n = 350), miscellaneous cases (n = 642), and combinations of the above cases (n = 5576). The original EuroSCORE had an ROC below 0.7 for isolated AVR and combined AVR and CABG. None of the methods described increased the ROC above 0.7. The EuroSCORE II risk model had an ROC below 0.7 for isolated AVR only. Recursive regression, non-linear regression, and mixed recursive and non-linear regression all increased the ROC above 0.7 for isolated AVR. The original EuroSCORE had a Hosmer-Lemeshow statistic that was above 0.05 for all patients and the subgroups analysed. All of the techniques markedly increased the Hosmer-Lemeshow statistic. The EuroSCORE II risk model had a Hosmer-Lemeshow statistic that was significant for all patients (P linear regression failed to improve on the original Hosmer-Lemeshow statistic. The mixed recursive and non-linear regression using the EuroSCORE II risk model was the only model that produced an ROC of 0.7 or above for all patients and procedures and had a Hosmer-Lemeshow statistic that was highly non-significant. The original EuroSCORE and the EuroSCORE II risk models do not have adequate ROC and Hosmer
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2017-07-26
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression, with the coefficient of determination (R²) as the primary metric of assay agreement. However, R² alone does not adequately quantify the constant or proportional errors that must be characterized for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing (NGS) assays. NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
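As a rough illustration of the Bland-Altman idea mentioned in this abstract, the sketch below computes the mean difference (an estimate of constant error) and the conventional 95% limits of agreement for paired measurements. The function name `bland_altman` and the sample values are hypothetical, and the 1.96 multiplier assumes approximately normal differences; this is not the authors' implementation.

```python
from statistics import mean, stdev


def bland_altman(reference, test):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (test - reference); the limits are the
    conventional bias +/- 1.96 * SD of the differences.
    """
    if len(reference) != len(test) or len(reference) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired values from a validated method and a new assay.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
test = [12.0, 21.0, 33.0, 41.0, 53.0]
bias, lower, upper = bland_altman(reference, test)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias far from zero points to constant error; a trend in the differences when plotted against the pairwise means would instead point to proportional error, which this sketch does not test.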
Faezehossadat Khademi
2016-12-01
Compressive strength, one of the most significant mechanical properties of concrete, is an essential factor in the quality assurance of concrete. In the current study, three data-driven models, i.e., Artificial Neural Network (ANN), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Multiple Linear Regression (MLR), were used to predict the 28-day compressive strength of recycled aggregate concrete (RAC). Recycled aggregate is in demand because reusing construction waste is environmentally friendly. Fourteen input parameters, including both dimensional and non-dimensional parameters, were used to predict the 28-day compressive strength of concrete. The study concluded that ANN and ANFIS estimated the 28-day compressive strength of recycled aggregate concrete better than MLR. Comparing the test step of the three models, the MLR model is better suited to preliminary mix design of concrete, whereas the ANN and ANFIS models are suggested for mix design optimization and for cases requiring higher accuracy. In addition, the performance of the data-driven models with and without the non-dimensional parameters is explored. The data-driven models showed better accuracy when the non-dimensional parameters were used as additional input parameters. Furthermore, the effect of each non-dimensional parameter on the performance of each data-driven model is investigated. Finally, the effect of the number of input parameters on the 28-day compressive strength of concrete is examined.
Manu Batra
2016-01-01
Context: Dental caries among children has been described as a pandemic disease with a multifactorial nature. Various sociodemographic factors and oral hygiene practices are commonly tested for their influence on dental caries. In recent years, a statistical model that allows for covariate adjustment has been developed, commonly referred to as the zero-inflated negative binomial (ZINB) model. Aim: To compare the fit of two models, the conventional linear regression (LR) model and the ZINB model, in assessing the risk factors associated with dental caries. Materials and Methods: A cross-sectional survey was conducted on 1138 12-year-old school children in Moradabad Town, Uttar Pradesh, from February to August 2014. Selected participants were interviewed using a questionnaire. Dental caries was assessed by recording the decayed, missing, or filled teeth (DMFT) index. Statistical Analysis Used: To assess the risk factors associated with dental caries in children, two approaches were applied: the LR model and the ZINB model. Results: The prevalence of caries-free subjects was 24.1%, and mean DMFT was 3.4 ± 1.8. In the LR model, all the variables were statistically significant. In the ZINB model, the negative binomial part showed place of residence, father's education level, tooth brushing frequency, and dental visits to be statistically significant, implying that the likelihood of being caries-free (DMFT = 0) increases for children who live in urban areas, whose father is a university graduate, who brush twice a day, and who have ever visited a dentist. Conclusion: The current study reports that the LR model is poorly fitted and may lead to spurious conclusions, whereas the ZINB model showed better goodness of fit (Akaike information criterion values - LR: 3.94; ZINB: 2.39) and can be preferred when high variance and an excess of zeroes are present.
Clement, Dominic; Gruber, Nicolas
2017-04-01
Major progress has been made by the international community (e.g., GO-SHIP, IOCCP, IMBER/SOLAS carbon working groups) in recent years by collecting and providing homogenized datasets for carbon and other biogeochemical variables in the surface ocean (SOCAT) and interior ocean (GLODAPv2). Together with previous efforts, this has enabled the community to develop methods to assess changes in the ocean carbon cycle through time. Of particular interest is the determination of the decadal change in the anthropogenic CO2 inventory solely based on in-situ measurements from at least two time periods in the interior ocean. However, all such methods face the difficulty of a scarce dataset in both space and time, making the use of appropriate interpolation techniques in time and space a crucial element of any method. Here we present a new method based on the parameter C*, whose variations reflect the total change in dissolved inorganic carbon (DIC) driven by the exchange of CO2 across the air-sea interface. We apply the extended Multiple Linear Regression method (Friis et al., 2005) on C* in order (1) to calculate the change in anthropogenic CO2 from the original DIC/C* measurements, and (2) to interpolate the result onto a spatial grid using other biogeochemical variables (T, S, AOU, etc.). These calculations are made on isopycnal slabs across whole ocean basins. In combination with the transient steady state assumption (Tanhua et al., 2007) providing a temporal correction factor, we address the spatial and temporal interpolation challenges. Using synthetic data from a hindcast simulation with a global ocean biogeochemistry model (NCAR-CCSM with BEC), we tested the method for robustness and accuracy in determining ΔCant. We will present data-based results for all ocean basins, with the most recent estimate of a global uptake of 32±6 Pg C between 1994 and 2007, indicating an uptake rate of 2.5±0.5 Pg C yr⁻¹ for this time period. These results are compared with regional and
López-Serrano PM
2016-04-01
The Sierra Madre Occidental mountain range (Durango, Mexico) is of great ecological interest because of the high degree of environmental heterogeneity in the area. The objective of the present study was to estimate the biomass of mixed and uneven-aged forests in the Sierra Madre Occidental by using Landsat-5 TM spectral data and forest inventory data. We used the ATCOR3® atmospheric and topographic correction module to convert remotely sensed imagery digital signals to surface reflectance values. The usual approach of modeling stand variables by using multiple linear regression was compared with a hybrid model developed in two steps: in the first step a regression tree was used to obtain an initial classification of homogeneous biomass groups, and multiple linear regression models were then fitted to each node of the pruned regression tree. Cross-validation of the hybrid model explained 72.96% of the observed stand biomass variation, with a reduction in the RMSE of 25.47% with respect to the estimates yielded by the linear model fitted to the complete database. The most important variables for the binary classification process in the regression tree were the albedo, the corrected readings of the short-wave infrared band of the satellite (2.08-2.35 µm), and the topographic moisture index. We used the model output to construct a map for estimating biomass in the study area, which yielded values of between 51 and 235 Mg ha⁻¹. The use of regression trees in combination with stepwise regression of corrected satellite imagery proved a reliable method for estimating forest biomass.
Leif E. Peterson
1997-11-01
A computer program for multifactor relative risks, confidence limits, and tests of hypotheses using regression coefficients and a variance-covariance matrix obtained from a previous additive or multiplicative regression analysis is described in detail. Data used by the program can be stored and input from an external disk-file or entered via the keyboard. The output contains a list of the input data, point estimates of single or joint effects, confidence intervals and tests of hypotheses based on a minimum modified chi-square statistic. Availability of the program is also discussed.
Mortaza Jamshidian
2005-01-01
The problem of simultaneous inference and multiple comparison for comparing means of k (≥ 3) populations has long been studied in the statistics literature and is widely available in statistical software. However, to date, the problem of multiple comparison of regression models has not found its way into software. It is only recently that the computational aspects of this problem have been resolved in a general setting. SimReg employs this new methodology and provides users with software for multiple comparison of several regression models. The comparisons can be among any set of pairs, and moreover any number of predictors can be included in the model. More importantly, predictors can be constrained to their natural boundaries, if known. Computational methods for the problem of simultaneous confidence bands when predictors are constrained to intervals have also recently been addressed. SimReg utilizes this recent development to offer simultaneous confidence bands for regression models with any number of predictor variables. Again, the predictors can be constrained to their natural boundaries, which results in narrower bands than when no restriction is imposed. A by-product of these confidence bands is a new method for comparing two regression surfaces that is more informative than the usual partial F test.
Glass, Edmund R; Dozmorov, Mikhail G
2016-10-06
The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics. The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions
Denli, H. H.; Koc, Z.
2015-12-01
Standards-based valuation of real property is difficult to apply consistently across time and location. Regression analysis constructs mathematical models that describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Applied to real estate valuation, with properties described by their current market characteristics and quantifiers, regression analysis helps identify the factors or variables that determine value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are related to the characteristics of the properties to find a price index, based on information received from a real estate web page. The variables used for the analysis are age, size in m², number of floors in the building, floor number of the property, and number of rooms. The price of the property is the dependent variable, whereas the rest are independent variables. Prices of 60 properties were used for the analysis. Locations of equal value were identified and plotted on the map, and equivalence curves were drawn to delineate zones of equal value.
KIM, J.
2015-12-01
Aerosols play an important role in air quality in the short term and in climate change in the long term. In particular, it is important to understand how aerosol optical depth (AOD) has changed to date in order to project the future atmospheric state and radiation budget, which are related to human life. In this study, the trend of AOD at 550 nm from MODIS Aqua (MYD08) was estimated for 10 years from 2004 to 2014 using the linear regression method and the ensemble empirical mode decomposition (EEMD) method. The study region was East Asia [18.5°N-51.5°N, 85.5°E-150.5°E], which is of great interest as an emission source region. The linear regression results show a markedly increasing trend in North and East China, including Sanjiang, Hailun, Beijing, Beijing forest and Jinozhou Bay, and a rather downward trend in other neighboring regions. However, AOD is itself seasonal and its trend is continually affected by external sources, so a non-linear trend analysis was conducted to analyze how AOD trends change over time. The secular trends of AOD defined by EEMD showed similar values over the entire region, but their shapes over time are quite different from those of linear regression. The AOD linear trend in Beijing has increased monotonically [0.03% yr⁻¹] since 2004, but its non-linear trend shows that the initial increase slowed and even turned into a downward trend from about 2010. Lastly, MODIS AOD was validated against AErosol RObotic NETwork (AERONET) measurements and showed fairly good agreement (R=0.901, RMSE=0.226, MAE=0.031, MBE=-0.001).
Hittner, James B.; N. Clayton Silver
2016-01-01
In linear multiple regression it is common practice to test whether the squared multiple correlation coefficient, R², differs significantly from zero. Although frequently used, this test is misleading because the expected value of R² is not zero under the null hypothesis that ρ, the population value of the multiple correlation coefficient, equals zero. The non-zero expected value of R² has implications both for significance testing and effect size estimation involving the squared multipl...
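To make the non-zero expectation concrete, the sketch below evaluates the standard textbook result that, for a model with an intercept, k predictors, n observations and normally distributed errors, the expected value of R² when ρ = 0 is k / (n − 1). This formula is supplied here as background and is not taken from the truncated abstract; the function name `expected_r2_under_null` is hypothetical.

```python
def expected_r2_under_null(n_obs, n_predictors):
    """Expected sample R^2 when the population multiple correlation is zero.

    Assumes an intercept, normally distributed errors, and
    n_obs > n_predictors + 1 (standard textbook setting).
    """
    if n_predictors < 1 or n_obs <= n_predictors + 1:
        raise ValueError("need n_predictors >= 1 and n_obs > n_predictors + 1")
    return n_predictors / (n_obs - 1)


# With 21 observations and 5 predictors, pure noise is still expected
# to "explain" a quarter of the variance on average.
print(expected_r2_under_null(21, 5))
```

The practical point is that an observed R² should be judged against this baseline rather than against zero, especially when the number of predictors is large relative to the sample size.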
Asymptotic Optimality in Linear Regression
牟唯嫣; 张辉; 陈建杰
2014-01-01
The theoretical optimal mean squared error (MSE) of a class of estimators of the coefficients in linear regression is provided. It is proved that neither the least squares estimator nor the lasso estimator attains the optimal MSE asymptotically. Finally, a new regression estimator that attains the asymptotically minimal MSE is given.
Ganji, Shobha H; Kukes, Gary D; Lambrecht, Nils; Kashyap, Moti L; Kamanna, Vaijinath S
2014-02-15
Nonalcoholic fatty liver disease (NAFLD), a leading cause of liver damage, comprises a spectrum of liver abnormalities including the early fat deposition in the liver (hepatic steatosis) and advanced nonalcoholic steatohepatitis. Niacin decreases plasma triglycerides, but its effect on hepatic steatosis is elusive. To examine the effect of niacin on steatosis, rats were fed either a rodent normal chow, chow containing high fat (HF), or HF containing 0.5% or 1.0% niacin in the diet for 4 wk. For regression studies, rats were first fed the HF diet for 6 wk to induce hepatic steatosis and were then treated with niacin (0.5% in the diet) while on the HF diet for 6 wk. The findings indicated that inclusion of niacin at 0.5% and 1.0% doses in the HF diet significantly decreased liver fat content, liver weight, and hepatic oxidative products, and prevented hepatic steatosis. Niacin treatment of rats with preexisting hepatic steatosis induced by the HF diet significantly regressed steatosis. Niacin had no effect on the mRNA expression of fatty acid synthesis or oxidation genes (including sterol-regulatory element-binding protein 1, acetyl-CoA carboxylase 1, fatty acid synthase, and carnitine palmitoyltransferase 1) but significantly inhibited mRNA levels, protein expression, and activity of diacylglycerol acyltransferase 2, a key enzyme in triglyceride synthesis. These novel findings suggest that niacin effectively prevents and causes the regression of experimental hepatic steatosis. Approved niacin formulation(s) for other indications or niacin analogs may offer a very cost-effective opportunity for the clinical development of niacin for treating NAFLD and fatty liver disease.
Rapakoulia, Trisevgeni
2017-08-09
Motivation: Drug combination therapy for treatment of cancers and other multifactorial diseases has the potential of increasing the therapeutic effect, while reducing the likelihood of drug resistance. In order to reduce time and cost spent in comprehensive screens, methods are needed which can model additive effects of possible drug combinations. Results: We here show that the transcriptional response to combinatorial drug treatment at promoters, as measured by single molecule CAGE technology, is accurately described by a linear combination of the responses of the individual drugs at a genome wide scale. We also find that the same linear relationship holds for transcription at enhancer elements. We conclude that the described approach is promising for eliciting the transcriptional response to multidrug treatment at promoters and enhancers in an unbiased genome wide way, which may minimize the need for exhaustive combinatorial screens.
M Taki
2017-05-01
Introduction: Controlling the greenhouse microclimate not only influences the growth of plants but is also critical in limiting the spread of diseases inside the greenhouse. The microclimate parameters considered were the inside air, greenhouse roof and soil temperatures, relative humidity and solar radiation intensity. Predicting the microclimate conditions inside a greenhouse and enabling the use of automatic control systems are the two main objectives of a greenhouse climate model. The microclimate inside a greenhouse can be predicted by conducting experiments or by using simulation. Static and dynamic models are used for this purpose as a function of the meteorological conditions and the parameters of the greenhouse components. Several studies up to 2015 have simulated and predicted the inside variables in different greenhouse structures. Simulation often has difficulty predicting the inside climate of a greenhouse, and the simulation errors reported in the literature are relatively high. The main objective of this paper is to compare heat transfer and regression models for predicting the inside air and roof temperatures in a semi-solar greenhouse at Tabriz University. Materials and Methods: In this study, a semi-solar greenhouse was designed and constructed in the north-west of Iran in Azerbaijan Province (geographical location of 38°10′ N and 46°18′ E, with an elevation of 1364 m above sea level). The shape and orientation of the greenhouse were selected from among common greenhouse shapes so as to receive maximum solar radiation throughout the year. An internal thermal screen and a cement north wall were also used to store heat and prevent heat loss during the cold period of the year. We therefore called this structure a ‘semi-solar’ greenhouse. It was covered with glass (4 mm thickness). It has a floor area of approximately 15.36 m² and a volume of 26.4 m³. The orientation of this greenhouse was East–West and perpendicular to the direction of the wind prevailing
Veeraragavan, Surabi; Wan, Ying-Wooi; Connolly, Daniel R; Hamilton, Shannon M; Ward, Christopher S; Soriano, Sirena; Pitcher, Meagan R; McGraw, Christopher M; Huang, Sharon G; Green, Jennie R; Yuva, Lisa A; Liang, Agnes J; Neul, Jeffrey L; Yasui, Dag H; LaSalle, Janine M; Liu, Zhandong; Paylor, Richard; Samaco, Rodney C
2016-08-01
Mouse models of the transcriptional modulator Methyl-CpG-Binding Protein 2 (MeCP2) have advanced our understanding of Rett syndrome (RTT). RTT is a 'prototypical' neurodevelopmental disorder with many clinical features overlapping with other intellectual and developmental disabilities (IDD). Therapeutic interventions for RTT may therefore have broader applications. However, the reliance on the laboratory mouse to identify viable therapies for the human condition may present challenges in translating findings from the bench to the clinic. In addition, the need to identify outcome measures in well-chosen animal models is critical for preclinical trials. Here, we report that a novel Mecp2 rat model displays high face validity for modelling psychomotor regression of a learned skill, a deficit that has not been shown in Mecp2 mice. Juvenile play, a behavioural feature that is uniquely present in rats and not mice, is also impaired in female Mecp2 rats. Finally, we demonstrate that evaluating the molecular consequences of the loss of MeCP2 in both mouse and rat may result in higher predictive validity with respect to transcriptional changes in the human RTT brain. These data underscore the similarities and differences caused by the loss of MeCP2 among divergent rodent species which may have important implications for the treatment of individuals with disease-causing MECP2 mutations. Taken together, these findings demonstrate that the Mecp2 rat model is a complementary tool with unique features for the study of RTT and highlight the potential benefit of cross-species analyses in identifying potential disease-relevant preclinical outcome measures. © The Author 2016. Published by Oxford University Press.
Dashtbozorgi, Zahra; Golmohammadi, Hassan
2010-12-01
The main aim of this study was the development of a quantitative structure-property relationship method using an artificial neural network (ANN) for predicting the water-to-wet butyl acetate partition coefficients of organic solutes. As a first step, a genetic algorithm-multiple linear regression model was developed; the descriptors appearing in this model were considered as inputs for the ANN. These descriptors are principal moment of inertia C (I_C), area-weighted surface charge of hydrogen-bonding donor atoms (HACA-2), Kier and Hall index (order 2) (²χ), Balaban index (J), minimum bond order of a C atom (P_C) and relative negative-charged SA (RNCS). Then a 6-4-1 neural network was generated for the prediction of water-to-wet butyl acetate partition coefficients of 76 organic solutes. Comparing the results obtained from the multiple linear regression and ANN models shows that the statistical parameters (Fisher ratio, correlation coefficient and standard error) of the ANN model are better than those of the regression model, which indicates that the nonlinear model can simulate the relationship between the structural descriptors and the partition coefficients of the investigated molecules more accurately.
Baba, Toshimi; Gotoh, Yusaku; Yamaguchi, Satoshi; Nakagawa, Satoshi; Abe, Hayato; Masuda, Yutaka; Kawahara, Takayoshi
2017-08-01
This study aimed to evaluate the validation reliability of single-step genomic best linear unbiased prediction (ssGBLUP) with a multiple-lactation random regression test-day model and to investigate the effect of adding genotyped cows on the reliability. Two data sets of test-day records from the first three lactations were used: full data from February 1975 to December 2015 (60 850 534 records from 2 853 810 cows) and reduced data cut off in 2011 (53 091 066 records from 2 502 307 cows). We used marker genotypes of 4480 bulls and 608 cows. Genomic enhanced breeding values (GEBV) of 305-day milk yield in all the lactations were estimated for at least 535 young bulls using two marker data sets: bull genotypes only, and both bull and cow genotypes. The realized reliability (R²) from linear regression analysis was used as an indicator of validation reliability. Using only genotyped bulls, R² ranged from 0.41 to 0.46 and was always higher than that of parent averages. Very similar R² values were observed when genotyped cows were added. Applying ssGBLUP to a multiple-lactation random regression model is feasible, and adding a limited number of genotyped cows has no significant effect on the reliability of GEBV for genotyped bulls. © 2016 Japanese Society of Animal Science.
Suliman, Mohamed
2016-12-19
This paper proposes a new approach to find the regularization parameter for linear least-squares discrete ill-posed problems. In the proposed approach, an artificial perturbation matrix with a bounded norm is forced into the discrete ill-posed model matrix. This perturbation is introduced to enhance the singular-value (SV) structure of the matrix and hence to provide a better solution. The proposed approach is derived to select the regularization parameter in a way that minimizes the mean-squared error (MSE) of the estimator. Numerical results demonstrate that the proposed approach outperforms a set of benchmark methods in most cases when applied to different scenarios of discrete ill-posed problems. Jointly, the proposed approach enjoys the lowest run-time and offers the highest level of robustness amongst all the tested methods.
Babapour, R; Naghdi, R; Ghajar, I; Ghodsi, R
2015-07-01
The rock proportion of the subsoil directly influences the cost of embankment in forest road construction. Therefore, developing a reliable framework for estimating the rock ratio prior to road planning could lead to lighter excavation and lower-cost operations. Rock proportion was predicted from rock type and terrain slope properties using an Artificial Neural Network (ANN) in MATLAB and five link functions of ordinal logistic regression (OLR). In addition to bed rock and slope maps, more than 100 samples of rock proportion, observed by geologists, were collected from every available bed rock type in every slope class. Four predictive models were developed for rock proportion, employing the independent variables and applying both the selected probit link function of OLR and the Layer Recurrent and Feed-forward back-propagation networks. In the ANN, different numbers of neurons were considered for the hidden layer(s). Goodness-of-fit measures showed that the ANN models produced better results than OLR, with R² = 0.72 and Root Mean Square Error = 0.42. Furthermore, to show the applicability of the proposed approach and to illustrate the variability of rock proportion resulting from the model application, the optimum models were applied to a mountainous forest where a forest road network had been constructed in the past.
Neal, Dan J; Simons, Jeffrey S
2007-12-01
Analysis of alcohol use data and other low base rate risk behaviors using ordinary least squares regression models can be problematic. This article presents 2 alternative statistical approaches, generalized linear models and bootstrapping, that may be more appropriate for such data. First, the basic theory behind the approaches is presented. Then, using a data set of alcohol use behaviors and consequences, results based on these approaches are contrasted with the results from ordinary least squares regression. The less traditional approaches consistently demonstrated better fit with model assumptions, as demonstrated by graphical analysis of residuals, and identified more significant variables potentially resulting in theoretically different interpretations of the models of alcohol use. In conclusion, these models show significant promise for furthering the understanding of alcohol-related behaviors.
Santra, Lakshman; Rajmani, R S; Kumar, G V P P S Ravi; Saxena, Shikha; Dhara, Sujoy K; Kumar, Amit; Sahoo, Aditya Prasad; Singh, Lakshya Veer; Desai, G S; Chaturvedi, Uttara; Kumar, Sudesh; Tiwari, Ashok K
2014-10-01
The Non-Structural protein 1 of Canine Parvovirus-2 (CPV2.NS1) plays a major role in viral cytotoxicity and pathogenicity. CPV2.NS1 has been proven to cause apoptosis in HeLa cells in vitro in our laboratory. Here we report that CPV2.NS1 has no toxic side effects on healthy cells but regresses skin tumors in Wistar rats. Histopathological examination of tumor tissue from the CPV2.NS1-treated group revealed infiltration of mononuclear and polymorphonuclear cells with increased extracellular matrix, indicating signs of regression. Tumor regression was also evidenced by a significant decrease in mitotic index, AgNOR count and PCNA index, and an increase in TUNEL-positive apoptotic cells in the CPV2.NS1-treated group. Further, CPV2.NS1 induced an anti-tumor immune response through a significant increase in the CD8+ and NK cell populations in the CPV2.NS1-treated group. These findings suggest that CPV2.NS1 can be a possible therapeutic candidate as an alternative to chemotherapy for the treatment of cancer.
Cozzi-Lepri, Alessandro; Prosperi, Mattia C F; Kjær, Jesper;
2011-01-01
of treatment change episodes (TCE). We used the average square error (ASE) on a 10-fold cross-validation and on a test dataset (the EuroSIDA TCE database) to compare the performance of a newly derived lopinavir/r score with that of the 3 most widely used expert-based interpretation rules (ANRS, HIVDB and Rega...... explored the potential of linear regression to construct a simple predictive model for lopinavir/r-based TCE. Although, the performance of our proposed score was similar to that of already existing IS, previously unrecognized lopinavir/r-associated mutations were identified. The analysis illustrates...
Shamsipur, M; Hemmateenejad, B; Akhond, M; Sharghi, H
2001-07-06
A quantitative structure-property relationship study is suggested for the prediction of acidity constants of some recently synthesized 9,10-anthraquinone derivatives in binary methanol-water mixtures. Modeling of the acidity constant of the anthraquinones as a function of physicochemical parameters and mole fraction of methanol was established by means of the partial least-squares algorithm based on singular value decomposition (PLS-SVD) and multiple linear regression. The PLS-SVD procedure resulted in a better prediction ability and was found to be insensitive to noneffective descriptors. The classification of anthraquinones by the calculated descriptors was established.
Lambert, Ronald J W; Mytilinaios, Ioannis; Maitland, Luke; Brown, Angus M
2012-08-01
This study describes a method to obtain parameter confidence intervals from the fitting of non-linear functions to experimental data, using the SOLVER and Analysis ToolPak add-ins of the Microsoft Excel spreadsheet. Previously we have shown that Excel can fit complex multiple functions to biological data, obtaining values equivalent to those returned by more specialized statistical or mathematical software. However, a disadvantage of the Excel method was its inability to return confidence intervals for the computed parameters or the correlations between them. Using a simple Monte Carlo procedure within the Excel spreadsheet (without recourse to programming), SOLVER can provide parameter estimates (up to 200 at a time) for multiple 'virtual' data sets, from which the required confidence intervals and correlation coefficients can be obtained. The general utility of the method is exemplified by applying it to the analysis of the growth of Listeria monocytogenes, the growth inhibition of Pseudomonas aeruginosa by chlorhexidine and the further analysis of the electrophysiological data from the compound action potential of the rodent optic nerve.
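The Monte Carlo idea in the abstract above (refit the model to many 'virtual' data sets and read confidence intervals off the spread of the refitted parameters) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' spreadsheet procedure: a straight line stands in for a non-linear function to keep the example short, virtual data are generated by adding Gaussian noise with the residual standard deviation to the fitted values, and the names `fit_line` and `monte_carlo_slope_ci` as well as the data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def monte_carlo_slope_ci(xs, ys, n_virtual=200, level=0.95, seed=0):
    """Percentile interval for the slope from refits to virtual data sets."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Residual standard deviation with n - 2 degrees of freedom (assumed noise model).
    sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    slopes = []
    for _ in range(n_virtual):
        virtual = [f + rng.gauss(0.0, sd) for f in fitted]
        slopes.append(fit_line(xs, virtual)[1])
    slopes.sort()
    lo_idx = max(0, round((1 - level) / 2 * n_virtual))
    hi_idx = min(n_virtual - 1, round((1 + level) / 2 * n_virtual) - 1)
    return b, slopes[lo_idx], slopes[hi_idx]

# Invented data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.1]
b_hat, ci_low, ci_high = monte_carlo_slope_ci(xs, ys)
```

The same loop applies to a non-linear model if `fit_line` is swapped for a non-linear optimiser; the percentile step does not change.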
Azadi, Sama; Karimi-Jashni, Ayoub
2016-02-01
Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate.
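The four performance measures named in the abstract above have standard definitions, sketched here for paired observed and predicted values. The function name `regression_metrics` and the numbers are hypothetical and are not taken from the study; MAPE is reported in percent and assumes no observed value is zero.

```python
import math

def regression_metrics(observed, predicted):
    """Return MAE, MAPE (percent), RMSE and Pearson's R for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is zero; assumed non-zero here.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R": r}

# Invented values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
```

MAE and RMSE are in the units of the response, MAPE is scale-free, and R measures only linear association, so a model can have a high R and still be biased.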
Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar
2016-01-01
Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of the tree Pistacia atlantica (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal on various influential adsorption parameters was well modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were a γ value of 0.76 and a σ(2) of 0.15. For the testing data set, a mean square error (MSE) of 0.0010 and a coefficient of determination (R(2)) of 0.976 were obtained for the LSSVR model, and an MSE of 0.0037 and an R(2) of 0.897 were obtained for the MLR model. The adsorption equilibrium data were well fitted by the Langmuir isotherm model, and the kinetic data by the second-order equation and intra-particle diffusion models. A small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g, respectively) is sufficient for successful rapid removal of methyl orange (>95%). The maximum adsorption capacities of SnO2-NP-AC and AC-PAW were 250 mg g(-1) and 125 mg g(-1), respectively.
Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally
2017-03-14
1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. Area under plasma concentration versus time curve (AUCinf) of dalbavancin is a key parameter and AUCinf/MIC ratio is a critical pharmacodynamic marker. 2. Using end of intravenous infusion concentration (i.e. Cmax) Cmax versus AUCinf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUCinf were performed using published Cmax data by application of regression equations. The quotient of observed/predicted values rendered fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The Cmax versus AUCinf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE regression models, a single time point strategy of using Cmax (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUCinf of dalbavancin in patients.
Effect of sirolimus on the regression of peritoneal sclerosis in an experimental rat model.
Ceri, Mevlut; Unverdi, Selman; Dogan, Mehmet; Unverdi, Hatice; Karaca, Gokhan; Kocak, Gulay; Kurultak, Ilhan; Akbal, Erdem; Can, Murat; Duranay, Murat
2012-06-01
Immunosuppressive and anti-inflammatory agents have recently become increasingly popular in the treatment of encapsulating peritoneal sclerosis (EPS). The aim of our study was to investigate the effects of sirolimus on EPS in a rat model. We separated 32 non-uremic rats into four groups: (1) control group, 2 ml isotonic saline injected IP daily for 3 weeks; (2) chlorhexidine gluconate (CG) group, 2 ml 0.1% CG and 15% ethanol dissolved in saline injected IP daily for 3 weeks; (3) resting group, CG (weeks 0-3) plus peritoneal rest (weeks 3-6); (4) sirolimus group, CG (weeks 0-3) plus 0.2 ml (1 mg/ml) sirolimus (weeks 3-6). Pathological samples were examined by using hematoxylin eosin (HE) and Masson's trichrome stains. Peritoneal thickness, fibrosis, vascular changes, and inflammation were evaluated by light microscopy. Finally, tissue metalloproteinase (MMP)-2 levels were measured by enzyme-linked immunoassay. In the CG group, there was a significant increase in peritoneal thickness, inflammatory activity, and fibrosis score compared to the control group (p < 0.05). We also observed a lower fibrosis score and less peritoneal thickening in the sirolimus group compared to the resting and CG groups (p < 0.05). There was no difference in histopathologic findings, except for the inflammatory activity in the sirolimus group, compared to the control group. Although the CG group had higher tissue MMP-2 levels than the control group, the tissue MMP-2 levels were not significantly different from the other groups. Sirolimus has a beneficial effect on peritoneal fibrosis induced by CG. This suggests that sirolimus may have therapeutic value in the management of EPS.
Yano, Kentaro; Mita, Suzune; Morimoto, Kaori; Haraguchi, Tamami; Arakawa, Hiroshi; Yoshida, Miyako; Yamashita, Fumiyoshi; Uchida, Takahiro; Ogihara, Takuo
2015-09-01
P-glycoprotein (P-gp) regulates absorption of many drugs in the gastrointestinal tract and their accumulation in tumor tissues, but the basis of substrate recognition by P-gp remains unclear. Bitter-tasting phenylthiocarbamide, which stimulates taste receptor 2 member 38 (T2R38), increases P-gp activity and is a substrate of P-gp. This led us to hypothesize that bitterness intensity might be a predictor of P-gp-inhibitor/substrate status. Here, we measured the bitterness intensity of a panel of P-gp substrates and nonsubstrates with various taste sensors, and used multiple linear regression analysis to examine the relationship between P-gp-inhibitor/substrate status and various physical properties, including intensity of bitter taste measured with the taste sensor. We calculated the first principal component score (PC1) as the representative value of bitterness, as the outputs of all taste sensors were significantly correlated. The P-gp substrates showed remarkably greater mean bitterness intensity than non-P-gp substrates. We found that the Km values of P-gp substrates were correlated with molecular weight, log P, and PC1 value, and the coefficient of determination (R(2)) of the linear regression equation was 0.63. This relationship might be useful as an aid to predict P-gp substrate status at an early stage of drug discovery.
Giovanni Leopoldo Rozza
2015-09-01
As the world becomes more of a global village each day, enterprises continuously seek to optimize their internal processes to maintain or improve their competitiveness and make better use of natural resources. In this context, decision support tools are an underlying requirement. Such tools help predict operational issues and avoid cost increases, loss of productivity, work-related accident leave, and environmental disasters. This paper focuses on predicting the caustic concentration of spent liquor in the Bayer process for alumina production. Measuring caustic concentration is essential to keep it at expected levels; otherwise, quality issues might arise. The organization requests a caustic concentration measurement from the chemical analysis laboratory once a day. Such information is not enough to trigger preventive actions, because process inefficiencies become known only after the next day's measurement. This paper therefore proposes a mathematical model, built with Multiple Linear Regression and Artificial Neural Network techniques, to predict the spent liquor's caustic concentration so that preventive actions can occur in real time. The models were built using a software tool for numerical computation (MATLAB) and a statistical analysis software package (SPSS). The models' output (predicted caustic concentration) was compared with the real laboratory data. We found evidence suggesting superior results from Artificial Neural Networks over the Multiple Linear Regression model. The results demonstrate that replacing laboratory analysis with the forecasting model to support technical staff in decision making could be feasible.
Adaptive Lasso for Poisson log-linear regression model
崔静; 郭鹏江; 夏志明
2011-01-01
Aim: To study the adaptive Lasso for the Poisson log-linear regression model. Methods: Methods of mathematical analysis and probability theory are used. Results: Under some conditions, the adaptive Lasso estimator for Poisson log-linear regression has the oracle properties, namely sparsity and asymptotic normality. Conclusion: The adaptive Lasso can effectively select variables for the Poisson log-linear regression model and simultaneously estimate their coefficients.
Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah
2017-08-01
In regression analysis, missing covariate data has been a common problem. Many researchers use ad hoc methods to overcome this problem due to the ease of implementation. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding of missing data concepts that can help the researcher select appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated using an underlying multivariate normal distribution and the dependent variable was generated as a combination of explanatory variables. Missing values in the covariates were simulated using a missing at random (MAR) mechanism. Four levels of missingness (10%, 20%, 30% and 40%) were imposed. ML and MI techniques available within SAS software were investigated. A linear regression model was fitted and two model performance measures, MSE and R-squared, were obtained. Results of the analysis showed that MI is superior in handling missing data, with the highest R-squared and lowest MSE, when the percentage of missingness is less than 30%. Neither method is able to handle levels of missingness larger than 30%.
Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza
2017-09-01
In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
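The link between t-distributed errors and iteratively reweighted least squares mentioned in the abstract above can be illustrated with a much-reduced sketch. It is not the authors' algorithm: the autoregressive noise model is omitted, the degrees of freedom are fixed at an assumed value instead of being estimated, and only a straight line is fitted. The names `weighted_line_fit` and `t_irls` and the data are hypothetical. Each pass gives observation i the weight (ν + 1) / (ν + r_i²/σ²), so large residuals are down-weighted.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def t_irls(xs, ys, dof=3.0, iterations=50):
    """EM-style IRLS for a line with scaled t errors and fixed dof (assumed)."""
    n = len(xs)
    ws = [1.0] * n  # unit weights: the first pass is ordinary least squares
    for _ in range(iterations):
        a, b = weighted_line_fit(xs, ys, ws)
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        scale2 = sum(w * r * r for w, r in zip(ws, residuals)) / n
        if scale2 == 0.0:  # perfect fit: nothing left to reweight
            break
        ws = [(dof + 1.0) / (dof + r * r / scale2) for r in residuals]
    return a, b, ws

# Invented data: y = 1 + 2x with small deterministic noise and one outlier at index 5.
xs = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 30.0, -0.1, 0.2, -0.15, 0.05]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]

a_ols, b_ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_t, b_t, weights = t_irls(xs, ys)
```

On this toy data the outlier ends up with the smallest weight and the reweighted slope lies closer to 2 than the ordinary least squares slope does.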
Parinet, Julien; Julien, Maxime; Nun, Pierrick; Robins, Richard J; Remaud, Gerald; Höhener, Patrick
2015-09-01
We aim to predict the effect of structure and isotopic substitution on the equilibrium vapour pressure isotope effect of various organic compounds (alcohols, acids, alkanes, alkenes and aromatics) at intermediate temperatures. We explore quantitative structure property relationships using an artificial neural network (ANN), the multi-layer perceptron (MLP), and compare its performance with that of multi-linear regression (MLR). These approaches are based on the relationship between the molecular structure (organic chain, polar functions, type of functions, type of isotope involved) of the organic compounds and their equilibrium vapour pressure. A data set of 130 equilibrium vapour pressure isotope effects was used: 112 were used in the training set and the remaining 18 were used for the test/validation dataset. Two sets of descriptors were tested: a set with all the descriptors (number of (12)C, (13)C, (16)O, (18)O, (1)H, (2)H, OH functions, OD functions, CO functions, Connolly Solvent Accessible Surface Area (CSA) and temperature) and a reduced set of descriptors. The dependent variable (the output) is the natural logarithm of the ratios of vapour pressures (ln R), expressed as light/heavy as in classical literature. Since the database is rather small, the leave-one-out procedure was used to validate both models. Considering its higher determination coefficients and lower error values, it is concluded that the multi-layer perceptron provided better results than multi-linear regression. The stepwise regression procedure is a useful tool to reduce the number of descriptors. To our knowledge, a Quantitative Structure Property Relationship (QSPR) approach for isotopic studies is novel.
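The leave-one-out procedure mentioned in the abstract above holds out each observation once, refits the model on the rest, and scores the held-out prediction. The sketch below shows the mechanics for a single-descriptor straight line, which is an assumption made for brevity (the study used several descriptors); `fit_line`, `leave_one_out_rmse` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def leave_one_out_rmse(xs, ys):
    """Refit with each point held out once and score its prediction."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

# Invented data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_exact = [2.0 * x + 1.0 for x in xs]   # points exactly on a line
ys_noisy = [3.1, 4.9, 7.2, 8.8, 11.1]    # same trend with scatter

loo_exact = leave_one_out_rmse(xs, ys_exact)
loo_noisy = leave_one_out_rmse(xs, ys_noisy)
```

Because every prediction is made for a point the fit has not seen, the resulting error is a less optimistic estimate than the in-sample residual error, which matters most for small data sets.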
Clostridium novyi-NT can cause regression of orthotopically implanted glioblastomas in rats.
Staedtke, Verena; Bai, Ren-Yuan; Sun, Weiyun; Huang, Judy; Kibler, Kathleen Kazuko; Tyler, Betty M; Gallia, Gary L; Kinzler, Kenneth; Vogelstein, Bert; Zhou, Shibin; Riggins, Gregory J
2015-03-20
Glioblastoma (GBM) is a highly aggressive primary brain tumor that is especially difficult to treat. The tumor's ability to withstand hypoxia leads to enhanced cancer cell survival and therapy resistance, but also yields a microenvironment that is in many aspects unique within the human body, thus offering potential therapeutic opportunities. The spore-forming anaerobic bacterium Clostridium novyi-NT (C. novyi-NT) has the ability to propagate in tumor-generated hypoxia, leading to oncolysis. Here, we show that intravenously injected spores of C. novyi-NT led to dramatic tumor destruction and significant survival increases in implanted, intracranial syngeneic F98 and human xenograft 060919 rat GBM models. C. novyi-NT germination was specific and confined to the neoplasm, with sparing of the normal brain parenchyma. All animals tolerated the bacteriolytic treatment, but edema and increased intracranial pressure could quickly be lethal if not monitored and medically managed with hydration and antibiotics. These results provide pre-clinical data supporting the development of this therapeutic approach for the treatment of patients with GBM.
Unitary Response Regression Models
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Liška, František; Peterková, Renata; Peterka, Miroslav; Landa, Vladimír; Zídek, Václav; Mlejnek, Petr; Šilhavý, Jan; Šimáková, Miroslava; Křen, Vladimír; Starker, Colby G.; Voytas, Daniel F.; Izsvák, Zsuzsanna; Pravenec, Michal
2016-01-01
Recently, it has been found that spontaneous mutation Lx (polydactyly-luxate syndrome) in the rat is determined by deletion of a conserved intronic sequence of the Plzf (Promyelocytic leukemia zinc finger protein) gene. In addition, Plzf is a prominent candidate gene for quantitative trait loci (QTLs) associated with cardiac hypertrophy and fibrosis in the spontaneously hypertensive rat (SHR). In the current study, we tested the effects of Plzf gene targeting in the SHR using TALENs (transcription activator-like effector nucleases). SHR ova were microinjected with constructs pTAL438/439 coding for a sequence-specific endonuclease that binds to target sequence in the first coding exon of the Plzf gene. Out of 43 animals born after microinjection, we detected a single male founder. Sequence analysis revealed a deletion of G that resulted in frame shift mutation starting in codon 31 and causing a premature stop codon at position of amino acid 58. The Plzftm1Ipcv allele is semi-lethal since approximately 95% of newborn homozygous animals died perinatally. All homozygous animals exhibited manifestations of a caudal regression syndrome including tail anomalies and serious size reduction and deformities of long bones, and oligo- or polydactyly on the hindlimbs. The heterozygous animals only exhibited the tail anomalies. Impaired development of the urinary tract was also revealed: one homozygous and one heterozygous rat exhibited a vesico-ureteric reflux with enormous dilatation of ureters and renal pelvis. In the homozygote, this was combined with a hypoplastic kidney. These results provide evidence for the important role of Plzf gene during development of the caudal part of a body—column vertebrae, hindlimbs and urinary system in the rat. PMID:27727328
Kim, Yong-Il; Kim, Yong Joong; Paeng, Jin Chul; Cheon, Gi Jeong; Lee, Dong Soo; Chung, June-Key; Kang, Keon Wook
2017-06-20
(18)F-Fluorodeoxyglucose (FDG) positron emission tomography (PET)/computed tomography (CT) has been investigated as a method to predict pancreatic cancer recurrence after pancreatic surgery. We evaluated the recently introduced heterogeneity indices of (18)F-FDG PET/CT used for predicting pancreatic cancer recurrence after surgery and compared them with current clinicopathologic and (18)F-FDG PET/CT parameters. A total of 93 pancreatic ductal adenocarcinoma patients (M:F = 60:33, mean age = 64.2 ± 9.1 years) who underwent preoperative (18)F-FDG PET/CT following pancreatic surgery were retrospectively enrolled. The standardized uptake values (SUVs) and tumor-to-background ratios (TBR) were measured on each (18)F-FDG PET/CT, as metabolic parameters. Metabolic tumor volume (MTV) and total lesion glycolysis (TLG) were examined as volumetric parameters. The coefficient of variance (heterogeneity index-1; SUVmean divided by the standard deviation) and linear regression slopes (heterogeneity index-2) of the MTV, according to SUV thresholds of 2.0, 2.5 and 3.0, were evaluated as heterogeneity indices. Predictive values of clinicopathologic and (18)F-FDG PET/CT parameters and heterogeneity indices were compared in terms of pancreatic cancer recurrence. Seventy patients (75.3%) showed recurrence after pancreatic cancer surgery (mean recurrence = 9.4 ± 8.4 months). Comparing the recurrence and no recurrence patients, all of the (18)F-FDG PET/CT parameters and heterogeneity indices demonstrated significant differences. In univariate Cox-regression analyses, MTV (P = 0.013), TLG (P = 0.007), and heterogeneity index-2 (P = 0.027) were significant. Among the clinicopathologic parameters, CA19-9 (P = 0.025) and venous invasion (P = 0.002) were selected as significant parameters. In multivariate Cox-regression analyses, MTV (P = 0.005), TLG (P = 0.004), and heterogeneity index-2 (P = 0.016) with venous invasion (P < 0.001, 0.001, and 0
Ramoelo, A.; Skidmore, A. K.; Cho, M. A.; Mathieu, R.; Heitkönig, I. M. A.; Dudeni-Tlhone, N.; Schlerf, M.; Prins, H. H. T.
2013-08-01
Grass nitrogen (N) and phosphorus (P) concentrations are direct indicators of rangeland quality and provide imperative information for sound management of wildlife and livestock. It is challenging to estimate grass N and P concentrations using remote sensing in the savanna ecosystems. These areas are diverse and heterogeneous in soil and plant moisture, soil nutrients, grazing pressures, and human activities. The objective of the study is to test the performance of non-linear partial least squares regression (PLSR) for predicting grass N and P concentrations through integrating in situ hyperspectral remote sensing and environmental variables (climatic, edaphic and topographic). Data were collected along a land use gradient in the greater Kruger National Park region. The data consisted of: (i) in situ-measured hyperspectral spectra, (ii) environmental variables and measured grass N and P concentrations. The hyperspectral variables included published starch, N and protein spectral absorption features, red edge position, narrow-band indices such as simple ratio (SR) and normalized difference vegetation index (NDVI). The results of the non-linear PLSR were compared to those of conventional linear PLSR. Using non-linear PLSR, integrating in situ hyperspectral and environmental variables yielded the highest grass N and P estimation accuracy (R2 = 0.81, root mean square error (RMSE) = 0.08, and R2 = 0.80, RMSE = 0.03, respectively) as compared to using remote sensing variables only, and conventional PLSR. The study demonstrates the importance of an integrated modeling approach for estimating grass quality which is a crucial effort towards effective management and planning of protected and communal savanna ecosystems.
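The two narrow-band indices named in the abstract above have simple closed forms: the simple ratio is NIR/Red and the NDVI is (NIR - Red)/(NIR + Red). The sketch below uses invented reflectance values; which wavelengths serve as the "NIR" and "red" bands is an assumption left to the reader, since the abstract does not state the band positions.

```python
def simple_ratio(nir, red):
    """Simple ratio (SR): near-infrared reflectance over red reflectance."""
    return nir / red

def ndvi(nir, red):
    """Normalized difference vegetation index, bounded between -1 and 1."""
    return (nir - red) / (nir + red)

# Invented reflectances for illustration only.
nir, red = 0.50, 0.10
sr_value = simple_ratio(nir, red)
ndvi_value = ndvi(nir, red)

# The two indices carry the same information: NDVI = (SR - 1) / (SR + 1).
ndvi_from_sr = (sr_value - 1.0) / (sr_value + 1.0)
```

Since NDVI is a monotone transform of SR, the two differ mainly in scaling; NDVI compresses high ratios, which is one reason both are often tried as predictors.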
Jarrahi, Morteza; Vafaei, Abbas Ali; Taherian, Abbas Ali; Miladi, Hossein; Rashidi Pour, Ali
2010-05-01
In this investigation, the effect of Matricaria chamomilla extract on linear incisional wound healing was studied. Thirty male Wistar rats were subjected to a linear 3 cm incision made over the skin of the back. The animals were randomly divided into three experimental groups: control, olive oil, and treatment. The control group did not receive any drug or cold cream. The olive oil group received topical olive oil once a day from the beginning of the experiments until complete wound closure. The treatment group was treated topically with M. chamomilla extract dissolved in olive oil at the same time. To compute the percentage of wound healing, the area of the wound was measured at the beginning of the experiments and after 2, 5, 8, 11, 14, 17, and 20 days. The percentage of wound healing was calculated by the Walker formula after measurement of the wound area. Results showed that there were statistically significant differences between treatment and olive oil animals (p chamomilla administered topically has wound healing potential in linear incisional wound model in rats.
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies, provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least-squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
Alexandrowicz, Rainer W; Jahn, Rebecca; Friedrich, Fabian; Unger, Anne
2016-06-01
Various studies have shown that caregiving relatives of schizophrenic patients are at risk of suffering from depression. These studies differ with respect to the applied statistical methods, which could influence the findings. Therefore, the present study analyzes to what extent different methods may cause differing results. Using one data set, it contrasts the results of three different modelling approaches: Rasch Modelling (RM), Structural Equation Modelling (SEM), and Linear Regression Modelling (LRM). The results of the three models varied considerably, reflecting the different assumptions of the respective models. Latent trait models (i.e., RM and SEM) generally provide more convincing results by correcting for measurement error, and the RM specifically proves superior because it treats ordered categorical data most adequately.
Huttunen, Jani; Kokkola, Harri; Mielonen, Tero; Esa Juhani Mononen, Mika; Lipponen, Antti; Reunanen, Juha; Vilhelm Lindfors, Anders; Mikkonen, Santtu; Erkki Juhani Lehtinen, Kari; Kouremeti, Natalia; Bais, Alkiviadis; Niska, Harri; Arola, Antti
2016-07-01
In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during
A Restricted Least Squares Estimation for Fuzzy Linear Regression Models
王宁; 张文修
2006-01-01
Fuzzy linear regression has been extensively studied since its inception, marked by the work of Tanaka et al. in 1982. As one of the main estimation methods, the fuzzy least squares approach is appealing because it corresponds, to some extent, to the well-known statistical regression analysis. In this article, based on a suitably defined distance between two fuzzy numbers, a restricted least squares method is proposed to fit fuzzy linear models with crisp inputs and symmetric fuzzy output. This method yields not only non-negative spreads of the estimated fuzzy parameters but also a center line of the fitted fuzzy output that coincides with the traditional least squares fit, which is of particular importance to a decision maker. Numerical examples are further considered to demonstrate the practical application of the proposed method.
Baird, Jim; Curry, Robin; Reid, Tim
2013-03-01
This article describes the development and application of a multiple linear regression model to identify how the key elements of waste and recycling infrastructure, namely container capacity and frequency of collection, affect the yield from municipal kerbside recycling programmes. The overall aim of the research was to gain an understanding of the factors affecting the yield from municipal kerbside recycling programmes in Scotland with an underlying objective to evaluate the efficacy of the model as a decision-support tool for informing the design of kerbside recycling programmes. The study isolates the principal kerbside collection service offered by all 32 councils across Scotland, eliminating those recycling programmes associated with flatted properties or multi-occupancies. The results of the regression analysis model have identified three principal factors which explain 80% of the variability in the average yield of the principal dry recyclate services: weekly residual waste capacity, number of materials collected and the weekly recycling capacity. The use of the model has been evaluated and recommendations made on ongoing methodological development and the use of the results in informing the design of kerbside recycling programmes. We hope that the research can provide insights for the further development of methods to optimise the design and operation of kerbside recycling programmes.
Lewin, M.D.; Sarasua, S.; Jones, P.A. (Agency for Toxic Substances and Disease Registry, Atlanta, GA (United States). Div. of Health Studies)
1999-07-01
For the purpose of examining the association between blood lead levels and household-specific soil lead levels, the authors used a multivariate linear regression model to find a slope factor relating soil lead levels to blood lead levels. They used previously collected data from the Agency for Toxic Substances and Disease Registry's (ATSDR's) multisite lead and cadmium study. The data included the blood lead measurements of 1,015 children aged 6-71 months, and corresponding household-specific environmental samples. The environmental samples included lead in soil, house dust, interior paint, and tap water. After adjusting for income, education of the parents, presence of a smoker in the household, sex, and dust lead, and using a double log transformation, they found a slope factor of 0.1388 with a 95% confidence interval of 0.09-0.19 for the dose-response relationship between the natural log of the soil lead level and the natural log of the blood lead level. The predicted blood lead level corresponding to a soil lead level of 500 mg/kg was 5.99 µg/kg with a 95% prediction interval of 2.08-17.29. Predicted values and their corresponding prediction intervals varied by covariate level. The model shows that increased soil lead level is associated with elevated blood leads in children, but that predictions based on this regression model are subject to high levels of uncertainty and variability.
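As a reading aid for the slope factor in this abstract: in a double-log model, multiplying the soil lead level by a factor r multiplies the predicted blood lead level by r raised to the slope, with the other covariates held fixed. The minimal Python sketch below works this out for a doubling of soil lead, using only the slope (0.1388) and the confidence limits (0.09, 0.19) quoted above. The function name is illustrative and does not come from the study.

```python
SLOPE = 0.1388                 # slope factor quoted in the abstract
CI_LOW, CI_HIGH = 0.09, 0.19   # quoted 95% confidence limits for the slope


def blood_lead_ratio(soil_ratio: float, slope: float = SLOPE) -> float:
    """Multiplicative change in predicted blood lead when soil lead is
    multiplied by `soil_ratio`, all other covariates held fixed.

    Follows from ln(blood) = a + slope * ln(soil) + (other terms).
    """
    return soil_ratio ** slope


if __name__ == "__main__":
    for label, b in (("lower limit", CI_LOW), ("estimate", SLOPE), ("upper limit", CI_HIGH)):
        pct = (blood_lead_ratio(2.0, b) - 1.0) * 100.0
        print(f"{label}: doubling soil lead -> {pct:.1f}% higher predicted blood lead")
```

Under these assumptions a doubling of soil lead corresponds to roughly a 10% increase in predicted blood lead, which illustrates how shallow the fitted dose-response relationship is.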
ERLS Algorithm for Linear Regression Model with Missing Response Variable
刘力军
2012-01-01
A novel Expectation Recursive Least Squares (ERLS) algorithm is proposed for the linear regression model. When part of the response data is missing, ERLS replaces each missing value with the expected value of the response. Based on this expected value and the data on the independent variables, ERLS estimates the regression coefficients adaptively and recursively, which avoids the difficulty of inverting the correlation matrix of high-dimensional data. ERLS makes full use of all available data and solves the regression problem in an online manner. Numerical experiments show that, when the observations contain outliers, ERLS with a nonlinear suppression function outperforms the LS solution.
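The recursive idea in this abstract can be sketched in a few lines of Python. This is not the authors' ERLS algorithm: it is a standard recursive least squares update in which a missing response (encoded here as NaN) is replaced by the current model prediction, used as an assumed stand-in for the "expected value" of the response. The function name, the NaN encoding and the `delta` parameter are all illustrative choices, and the nonlinear suppression function for outliers is omitted.

```python
import numpy as np


def rls_with_missing(X, y, delta=1e3):
    """Recursive least squares over the rows of X.

    Entries of y that are NaN are treated as missing and replaced by the
    current prediction x @ beta (an assumption made for this sketch).
    No matrix inversion is needed: P is updated by a rank-one formula.
    """
    n, p = X.shape
    beta = np.zeros(p)
    P = delta * np.eye(p)              # large initial P = weak prior on beta
    for x, target in zip(X, y):
        if np.isnan(target):
            target = x @ beta          # imputed value, so the residual is zero
        Px = P @ x
        gain = Px / (1.0 + x @ Px)
        beta = beta + gain * (target - x @ beta)
        P = P - np.outer(gain, Px)     # note: P still shrinks on imputed rows
    return beta
```

With this choice of imputation the coefficients do not move on a row with a missing response, but the matrix P is still updated, so later observations receive a smaller gain; whether that matches the published method is not stated in the abstract.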
BELDHUIS, HJA; SUZUKI, T; PIJN, JPM; TEISMAN, A; DASILVA, FHL; BOHUS, B
1993-01-01
The relationship between ipsi- and contralateral epileptiform electroencephalographic (EEG) activity was investigated in rats that were kindled daily in the amygdala. Two types of relationship-linear and non-linear associations-were studied and used to estimate time delays of EEG activity between ho
Comparison of linear polarization degree in healthy and wounded rat skin
Ribeiro, Martha S.; Freitas, Anderson Z.; Silva, Daniela F.; Zezell, Denise M.; Pellegrini, Cleusa M. R.; Costa, Fabiano G.; Zorn, Telma M. T.
2001-10-01
Low-intensity laser therapy (LILT) with adequate wavelength, intensity, and dose can accelerate tissue repair. However, information about the relevant light characteristics is still scattered. Several works indicate that laser polarization plays an important role in the wound healing process. This study was conducted to verify the degree of linear polarization in normal and pathological rat skin samples. Artificial burns about 6 mm in diameter were created with liquid N2 on the back of the animals. The degree of polarization was measured in normal and pathological skin samples. It was verified that linearly polarized light can survive in the superficial layers of skin and that it is better preserved in skin under pathological conditions than in healthy skin. The present study supports the hypothesis that polarized laser radiation can be used to treat open wounds and improve healing.
Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea
2016-12-01
Assessing the contamination of agricultural soils by metals and organic micropollutants is a difficult challenge because of the extensive areas over which a very large number of samples must be collected and analyzed. Regarding measurement methods for dioxins and dioxin-like PCBs and the subsequent treatment of data, the European Community advises developing low-cost, fast methods that allow routine analysis of a great number of samples and provide rapid measurement of these compounds in the environment, feeds and food. The aim of the present work was to find a method suitable for describing the relations between organic and inorganic contaminants and to use the values of the latter to forecast the former. In practice, a portable metal soil analyzer coupled with an efficient statistical procedure enables this objective to be achieved. Compared with Multiple Linear Regression, the Artificial Neural Networks technique proved to be an excellent forecasting method, even though there is no linear correlation between the variables analyzed.
Chunggil Jung
2017-08-01
This study attempts to estimate spatial soil moisture in South Korea (99,000 km2) from January 2013 to December 2015 using a multiple linear regression (MLR) model and the Terra moderate-resolution imaging spectroradiometer (MODIS) land surface temperature (LST) and normalized difference vegetation index (NDVI) data. The MODIS NDVI was used to reflect vegetation variations. Observed precipitation was measured using the automatic weather stations (AWSs) of the Korea Meteorological Administration (KMA), and soil moisture data were recorded at 58 stations operated by various institutions. Prior to MLR analysis, satellite LST data were corrected by applying the conditional merging (CM) technique and observed LST data from 71 KMA stations. The coefficient of determination (R2) of the original LST and observed LST was 0.71, and the R2 of corrected LST and observed LST was 0.95 for 3 selected LST stations. The R2 values of all corrected LSTs were greater than 0.83 for the total of 71 LST stations. The regression coefficients of the MLR model were estimated seasonally considering the five-day antecedent precipitation. The p-values of all the regression coefficients were less than 0.05, and the R2 values were between 0.28 and 0.67. The R2 values below 0.5 are attributed to the soil classification at each observation site not being completely accurate. Additionally, the observations at most of the soil moisture monitoring stations used in this study started in December 2014, and the soil moisture measurements had not yet stabilized. Notably, R2 and root mean square error (RMSE) in winter were poor, as reflected by the many missing values, and uncertainty existed in observations due to freezing and mechanical errors in the soil. Thus, the prediction accuracy is low in winter due to the difficulty of establishing an appropriate regression model. Specifically, the estimated map of the soil moisture index (SMI) can be used to better understand the severity of droughts with the
Corina V. Sasso
2014-01-01
Prolactin (PRL) is a key player in the development of mammary cancer. We studied the effects of parity or hyperprolactinemia on mammary carcinogenesis in OFA hr/hr rats treated with 7,12-dimethylbenzanthracene. They were divided into three groups: nulliparous (Null), primiparous (PL, after pregnancy and lactation), and hyperprolactinemic rats (I, implanted in the arcuate nucleus with 17β-estradiol). The tumor incidence was similar in the three groups. However, a higher percentage of regressing tumors was evident in the PL group. Serum PRL, mammary development, and mammary β-casein content were higher in I rats compared to Null. The expression of hormone receptors was similar in the different groups. However, mammary tissue from PL rats bearing tumors had increased expression of PRL and estrogen alpha receptors compared to rats free of tumors. Our results suggest that serum PRL levels are not relevant to the incidence of tumors, probably because the low levels of PRL in OFA rats are not further decreased by PL as they are in other strains. However, supraphysiological levels of PRL affect carcinogenesis. PL induces regression of the tumors due to the differentiation produced in the mammary cells. Alterations in the expression of hormonal receptors may be involved in progression and regression of tumors.
El Dib, Regina; Gomaa, Huda; Ortiz, Alberto; Politei, Juan; Kapoor, Anil; Barreto, Fellype
2017-01-01
Anderson-Fabry disease (AFD) is an X-linked recessive inborn error of glycosphingolipid metabolism caused by a deficiency of alpha-galactosidase A. Renal failure, heart and cerebrovascular involvement reduce survival. A Cochrane review provided little evidence on the use of enzyme replacement therapy (ERT). We now complement this review through a linear regression and a pooled analysis of proportions from cohort studies. The objective was to evaluate the efficacy and safety of ERT for AFD. For the systematic review, a literature search was performed, from inception to March 2016, using Medline, EMBASE and LILACS. Inclusion criteria were cohort studies, patients with AFD on ERT or natural history, and at least one patient-important outcome (all-cause mortality, renal, cardiovascular or cerebrovascular events, and adverse events) reported. The pooled proportion and the confidence interval (CI) are shown for each outcome. Simple linear regressions for composite endpoints were performed. 77 cohort studies involving 15,305 participants proved eligible. The pooled proportions were as follows: a) for renal complications, agalsidase alfa 15.3% [95% CI 0.048, 0.303; I2 = 77.2%, p = 0.0005]; agalsidase beta 6% [95% CI 0.04, 0.07; I2 = not applicable]; and untreated patients 21.4% [95% CI 0.1522, 0.2835; I2 = 89.6%, p < 0.0001]. Effect differences favored agalsidase beta compared to untreated patients; b) for cardiovascular complications, agalsidase alfa 28% [95% CI 0.07, 0.55; I2 = 96.7%, p < 0.0001]; agalsidase beta 7% [95% CI 0.05, 0.08; I2 = not applicable]; and untreated patients 26.2% [95% CI 0.149, 0.394; I2 = 98.8%, p < 0.0001]. Effect differences favored agalsidase beta compared to untreated patients; and c) for cerebrovascular complications, agalsidase alfa 11.1% [95% CI 0.058, 0.179; I2 = 70.5%, p = 0.0024]; agalsidase beta 3.5% [95% CI 0.024, 0.046; I2 = 0%, p = 0.4209]; and untreated patients 18.3% [95% CI 0.129, 0.245; I2 = 95%, p < 0.0001]. Effect differences favored agalsidase beta
Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.
2016-06-01
Diameter-at-breast-height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR technology can directly provide different forest parameters, except DBH, from the behavior and characteristics of the point cloud unique to different forest classes. An extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and species were identified for comparison with LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated automatically using Python scripts together with regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yields the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was taken to be the best equation. The equation performs best using 11 parameters at 10 m grid size, with an r-squared of 0.604, an AIC of 154.04 and a BIC of 175.08. The combination of parameters may differ among forest classes, which is left for further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be used to help determine the correlation among parameters.
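The exhaustive search described in this abstract can be illustrated with a short Python sketch. It is not the authors' script: it fits ordinary least squares with NumPy for every subset of candidate columns and ranks subsets by AIC, using the common Gaussian-likelihood forms n·ln(RSS/n) + 2k and n·ln(RSS/n) + k·ln(n) as assumed definitions of AIC and BIC. The function names and the `max_size` argument are illustrative.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """OLS with an intercept; returns (r2, aic, bic) assuming Gaussian errors."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    k = A.shape[1]                       # number of fitted coefficients
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return 1.0 - rss / tss, aic, bic


def best_subset(X, y, max_size=None):
    """Try every non-empty column subset up to max_size; keep the lowest AIC."""
    p = X.shape[1]
    max_size = max_size or p
    best = None
    for size in range(1, max_size + 1):
        for cols in combinations(range(p), size):
            r2, aic, bic = fit_ols(X[:, list(cols)], y)
            if best is None or aic < best[2]:
                best = (cols, r2, aic, bic)
    return best                          # (columns, r2, aic, bic)
```

With 27 candidate parameters an unrestricted search covers 2^27 - 1 (about 134 million) subsets, so a cap such as `max_size` or a stepwise strategy may be needed in practice; the abstract does not say how the authors handled this cost.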
Lewin, M D; Sarasua, S; Jones, P A
1999-07-01
For the purpose of examining the association between blood lead levels and household-specific soil lead levels, we used a multivariate linear regression model to find a slope factor relating soil lead levels to blood lead levels. We used previously collected data from the Agency for Toxic Substances and Disease Registry's (ATSDR's) multisite lead and cadmium study. The data included the blood lead measurements (0.5 to 40.2 µg/dL) of 1015 children aged 6-71 months, and corresponding household-specific environmental samples. The environmental samples included lead in soil (18.1-9980 mg/kg), house dust (5.2-71,000 mg/kg), interior paint (0-16.5 mg/cm2), and tap water (0.3-103 µg/L). After adjusting for income, education of the parents, presence of a smoker in the household, sex, and dust lead, and using a double log transformation, we found a slope factor of 0.1388 with a 95% confidence interval of 0.09-0.19 for the dose-response relationship between the natural log of the soil lead level and the natural log of the blood lead level. The predicted blood lead level corresponding to a soil lead level of 500 mg/kg was 5.99 µg/kg with a 95% prediction interval of 2.08-17.29. Predicted values and their corresponding prediction intervals varied by covariate level. The model shows that increased soil lead level is associated with elevated blood leads in children, but that predictions based on this regression model are subject to high levels of uncertainty and variability.
A Matlab program for stepwise regression
Yanhong Qi
2016-03-01
Stepwise linear regression is a multi-variable regression method for identifying statistically significant variables in a linear regression equation. In the present study, we present a Matlab program for stepwise regression.
Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. However, the potential of Remote Sensing data for improving knowledge of local scale soil information in West Africa has not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties (sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen) in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models (multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB)) were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat 8 as well as soil specific indices of
Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. However, the potential of Remote Sensing data for improving knowledge of local scale soil information in West Africa has not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties (sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen) in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models (multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB)) were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat 8 as well as soil specific indices of redness
Boosting Variable Selection Algorithm for Linear Regression Models
李毓; 张春霞; 王冠伟
2015-01-01
To address variable selection for linear regression models, this paper proposes a novel Boosting learning method based on the genetic algorithm. In the new algorithm, all training examples are first assigned equal weights, and a traditional genetic algorithm is adopted as the base learning algorithm of Boosting. The training set, together with its weight distribution, is then taken as the input of the genetic algorithm to perform variable selection. Subsequently, the weight distribution is updated according to the quality of the previous variable selection results. After these steps are repeated multiple times, the results are fused via a weighted combination rule. The performance of the proposed Boosting method is investigated on simulated and real-world data. The experimental results show that the method significantly improves the variable selection performance of the traditional genetic algorithm and accurately identifies the covariates related to the response. Thus, the new Boosting method can be regarded as an effective technique for variable selection in linear regression models.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A
2010-11-15
This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-points sin x(i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by that of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely overlapping chromatographic peaks and very low analyte concentrations. For example, in the case of overlapping peaks, the correlation coefficient of sodium benzoate improved from 0.9975 with the conventional peak area method to 0.9998 with the first derivative under Fourier functions method. A significant improvement in the precision and accuracy of the determination of synthetic mixtures and dosage forms in non-ideal cases was also achieved. For example, in the case of overlapping peaks, the mean recovery% and RSD% of guaiphenesin went from 91.57 and 9.83 with the conventional peak area method to 100.04 and 0.78 with the first derivative under Fourier functions method. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and
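For readers unfamiliar with Theil's method, the following minimal Python sketch shows the complete-pairs variant, in which the slope is the median of all pairwise slopes and the intercept is the median of the resulting offsets. It uses only the standard library; the function name is illustrative, and the abstract does not state which variant of Theil's method the authors applied.

```python
from itertools import combinations
from statistics import median


def theil_fit(x, y):
    """Theil's non-parametric line fit (complete-pairs variant).

    slope     = median of (yj - yi) / (xj - xi) over all pairs with xi != xj
    intercept = median of (yi - slope * xi)
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


if __name__ == "__main__":
    # Calibration-style data on the line y = 2x + 1 with one gross outlier.
    conc = [1, 2, 3, 4, 5, 6]
    resp = [3, 5, 7, 9, 11, 100]
    print(theil_fit(conc, resp))
```

Because medians are used in place of sums of squares, a single aberrant point has little influence on the fitted line, which is the usual reason for preferring this estimator when calibration data contain outliers.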
Caselli; Daniele; Mangone; Paolillo
2000-01-15
The apparent pK(a) of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Extended principal-component analysis allows the precise determination of the apparent pK(a) and of the spectra of the acid and base forms of the dye. Combination with multiple linear regression increases the precision. The pK(a) of 7-hydroxycoumarin (umbelliferone) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at various different water/surfactant ratios. The spectra of the acid and base forms of the dye in the microemulsion are very similar to those in bulk water in the presence of Tris and ammonia. The presence of carbonate changes somewhat the spectrum of the acid form. Results are discussed taking into account the profile of the electrostatic potential drop in the water pool and the possible partition of umbelliferone between the aqueous core and the surfactant. The pK(a) values corrected for these effects are independent of w(0) and are close to the value of the pK(a) in bulk water. Copyright 2000 Academic Press.
Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.
2016-06-01
Ailments related to urbanization, such as heat stress, are a prevalent threat. Measures such as green roofs or tree planting can lessen the effect of urbanization on an area's surface temperature, so land use matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in a mathematical model is important because it provides a way to predict LST from the LULC alone. This study aims to examine the relationship between LST and LULC and to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and the LULC classification was derived from LiDAR and orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs, and these metrics were analysed using a statistical framework. Multiple linear regression was used to create models that predict LST for each class, and the spatial metric "Effective mesh size" was found to be a top predictor of LST in 6 out of 7 classes. The model can still be refined by adding a temporal aspect, analysing the LST of another farming period (for rural areas) and looking for common predictors between the LSTs of the two farming periods.
Khanfar, Mohammad A; Taha, Mutasem O
2013-10-28
The mammalian target of rapamycin (mTOR) has an important role in cell growth, proliferation, and survival. mTOR is frequently hyperactivated in cancer, and therefore, it is a clinically validated target for cancer therapy. In this study, we combined exhaustive pharmacophore modeling and quantitative structure-activity relationship (QSAR) analysis to explore the structural requirements for potent mTOR inhibitors employing 210 known mTOR ligands. Genetic function algorithm (GFA) coupled with k nearest neighbor (kNN) and multiple linear regression (MLR) analyses were employed to build self-consistent and predictive QSAR models based on optimal combinations of pharmacophores and physicochemical descriptors. Successful pharmacophores were complemented with exclusion spheres to optimize their receiver operating characteristic curve (ROC) profiles. Optimal QSAR models and their associated pharmacophore hypotheses were validated by identification and experimental evaluation of several new promising mTOR inhibitory leads retrieved from the National Cancer Institute (NCI) structural database. The most potent hit illustrated an IC50 value of 48 nM.
McGrane, Scott J; Tetzlaff, Doerthe; Soulsby, Chris
2014-11-01
Faecal coliform (FC) bacteria were used as a proxy of faecal indicator organisms (FIOs) to assess the microbiological pollution risk for eight mesoscale catchments with increasing lowland influence across north-east Scotland. This study sought to assess the impact of urban areas on microbial contaminant fluxes. Fluxes were lowest in upland catchments where populations are relatively low. By contrast, lowland catchments with larger settlements and a greater number of grazing populations have more elevated FC concentrations throughout the year. Peak FC counts occurred during the summer months (April-September) when biological activity is at its highest. Lowland catchments experience high FC concentrations throughout the year whereas upland catchments exhibit more seasonal variations with elevated summer conditions and reduced winter concentrations. A simple linear regression model based on catchment characteristics provided scope to predict FC fluxes. Percentage of improved grazing pasture and human population explained 90 and 62 % of the variation in mean annual FC concentrations. This approach provides scope for an initial screening tool to predict the impact of urban space and agricultural practice on FC concentrations at the catchment scale and can aid in pragmatic planning and water quality improvement decisions. However, greater understanding of the short-term dynamics is still required which would benefit from higher resolution sampling than the approach undertaken here.
A. A. Zarei
2016-03-01
Winter dens are one of the important components of the brown bear's (Ursus arctos syriacus) habitat, affecting reproduction and survival. Identifying the factors that affect habitat selection and the areas suitable for denning is therefore necessary for the conservation of our largest carnivore. We used Geographically Weighted Logistic Regression (GWLR) and a Generalized Linear Model (GLM) to model the suitability of denning habitat in the Kouhkhom region in Fars province. In the present research, 20 dens (presence locations) and 20 caves where signs of bear were not found (absence locations) were used as dependent variables, and six environmental factors were used for each location as independent variables. The results of the GLM showed that distance to settlements, altitude, and distance to water were the most important parameters affecting the suitability of the brown bear's denning habitat. The results of GWLR showed significant local variations in the relationship between the occurrence of brown bear dens and distance to settlements. Based on the results of both models, suitable denning habitats for the species are impassable mountain areas that are inaccessible to humans.
Hui Wang
2014-01-01
Immunoglobulin A nephropathy (IgAN) is a complex trait regulated by the interaction among multiple physiologic regulatory systems and probably involving numerous genes, which leads to inconsistent findings in genetic studies. One possible reason for the failure to replicate some single-locus results is that the underlying genetics of IgAN is based on multiple genes with minor effects. To examine the association between 23 single nucleotide polymorphisms (SNPs) in 14 genes predisposing to chronic glomerular diseases and IgAN in Han males, the genotypes of the 23 SNPs in 21 Han males were detected with a BaiO gene chip, and their associations were analyzed with univariate analysis and multiple linear regression analysis. The analysis showed that CTLA4 rs231726 and CR2 rs1048971 were significantly associated with IgAN. These findings support the multi-gene nature of the etiology of IgAN and propose a potential gene-gene interactive model for future studies.
Yu, Jianwei; Liu, Juan; An, Wei; Wang, Yongjing; Zhang, Junzhi; Wei, Wei; Su, Ming; Yang, Min
2015-01-01
A total of 86 source water samples from 38 cities across major watersheds of China were collected for a bromide (Br⁻) survey, and the bromate (BrO3⁻) formation potentials (BFPs) of 41 samples with Br⁻ concentration >20 μg L⁻¹ were evaluated using a batch ozonation reactor. Statistical analyses indicated that higher alkalinity, hardness, and pH of water samples could lead to higher BFPs, with alkalinity as the most important factor. Based on the survey data, a multiple linear regression (MLR) model including three parameters (alkalinity, ozone dose, and total organic carbon (TOC)) was established with a relatively good prediction performance (model selection criterion = 2.01, R² = 0.724), using logarithmic transformation of the variables. Furthermore, a contour plot was used to interpret the influence of alkalinity and TOC on BrO3⁻ formation with prediction accuracy as high as 71 %, suggesting that these two parameters, apart from ozone dosage, were the most important ones affecting the BFPs of source waters with Br⁻ concentration >20 μg L⁻¹. The model could be a useful tool for the prediction of the BFPs of source water.
Caselli, Maurizio; Mangone, Annarosa; Paolillo, Paola; Traini, Angela
2002-01-01
The pKa values of 3',3'',5',5''-tetrabromo-m-cresolsulfonephthalein (Bromocresol Green) and o-cresolsulfonephthalein (Cresol Red) were measured spectrophotometrically in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at different water/surfactant ratios. Extended Principal Component Analysis was used for a precise determination of the apparent pKa and of the spectra of the acid and base forms of the dye. The apparent pKa of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Combination with multiple linear regression increases the precision. Results are discussed taking into account the profile of the electrostatic potential in the water pool and the possible partition of the indicator between the aqueous core and the surfactant. The pKa values corrected for these effects are independent of w0 and are close to the value of the pKa in bulk water. On the basis of a tentative hypothesis it is possible to calculate the true pKa of the buffer in the pool.
Zvezdelina Lyubenova Yaneva
2013-01-01
The study assessed the applicability of dead Rhizopus oryzae fungi as a biosorbent medium for p-nitrophenol (p-NP) removal from the aqueous phase. The extent of biosorption was measured through five equilibrium sorption isotherms represented by the Langmuir, Freundlich, Redlich-Peterson, multilayer and Fritz-Schlunder models. Linear and nonlinear regression methods were compared to determine the best-fitting equilibrium model for the experimental data. A detailed error analysis was undertaken to investigate the effect of applying seven error criteria for the determination of the single-component isotherm parameters. According to the comparison of the error functions and the estimate of the corrected Akaike information criterion, the Freundlich equation was ranked first and the Fritz-Schlunder second among the best-fitting models describing the experimental data. The present investigations demonstrated the high efficiency (94%) of Rhizopus oryzae as an alternative adsorbent for p-NP removal from the aqueous phase and revealed the mechanism of the separation process.
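As a rough sketch of how isotherm models can be ranked with a corrected Akaike information criterion, the example below fits the linearized Freundlich and Langmuir forms to synthetic equilibrium data and compares their AICc values. Everything here is an assumption for illustration: the data are generated from a Freundlich curve, only two of the five models are shown, only the linearized fits are used (not the nonlinear regression the study also applied), and the least-squares form AICc = N ln(SSE/N) + 2k + 2k(k+1)/(N-k-1) is one common convention.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic equilibrium data: Ce (liquid phase) and qe (solid phase).
Ce = np.linspace(1.0, 100.0, 10)
qe = 2.0 * Ce**0.5 * (1 + rng.normal(0, 0.02, Ce.size))

def aicc(sse, n, k):
    """Corrected AIC for a least-squares fit with k fitted parameters."""
    return n * np.log(sse / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Freundlich, linearized: log10(qe) = log10(KF) + (1/n) * log10(Ce)
slope_f, intercept_f = np.polyfit(np.log10(Ce), np.log10(qe), 1)
kf, inv_n = 10**intercept_f, slope_f
qe_freundlich = kf * Ce**inv_n

# Langmuir, linearized: Ce/qe = 1/(qmax*KL) + Ce/qmax
slope_l, intercept_l = np.polyfit(Ce, Ce / qe, 1)
qmax, kl = 1 / slope_l, slope_l / intercept_l
qe_langmuir = qmax * kl * Ce / (1 + kl * Ce)

# Compare both models on the original qe scale.
sse_f = float(np.sum((qe - qe_freundlich)**2))
sse_l = float(np.sum((qe - qe_langmuir)**2))
aicc_f = aicc(sse_f, Ce.size, 2)
aicc_l = aicc(sse_l, Ce.size, 2)

print("Freundlich AICc:", round(float(aicc_f), 2))
print("Langmuir AICc:  ", round(float(aicc_l), 2))
```

The model with the lower AICc is preferred. Because both models here have two parameters, the ranking reduces to a comparison of their sums of squared errors; the penalty terms matter when models with different numbers of parameters are compared.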
Introduction to regression graphics
Cook, R Dennis
2009-01-01
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have written their own regression code in the Xlisp-Stat language, called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Peluso, Marco E M; Munnia, Armelle; Ceppi, Marcello
2014-11-05
Exposure to bisphenol-A, a weak estrogenic chemical largely used for the production of plastic containers, can affect rodent behaviour. We therefore examined the relationships between bisphenol-A and anxiety-like behaviour, spatial skills, and aggressiveness in 12 toxicity studies of rodent offspring from females orally exposed to bisphenol-A while pregnant and/or lactating, using median and linear spline analyses. Subsequently, meta-regression analysis was applied to quantify the behavioural changes. U-shaped, inverted U-shaped and J-shaped dose-response curves were found to describe the relationships between bisphenol-A and the behavioural outcomes. The occurrence of anxiogenic-like effects and spatial skill changes displayed U-shaped and inverted U-shaped curves, respectively, providing examples of effects that are observed at low doses. Conversely, a J-shaped dose-response relationship was observed for aggressiveness. When the proportion of rodents expressing certain traits or the time that they took to manifest an attitude was analysed, the meta-regression indicated that a borderline significant increment of anxiogenic-like effects was present at low doses regardless of sex (β = -0.8%, 95% C.I. -1.7/0.1, P=0.076, at ≤120 μg bisphenol-A), whereas only bisphenol-A-exposed males exhibited a significant inhibition of spatial skills (β = 0.7%, 95% C.I. 0.2/1.2, P=0.004, at ≤100 μg/day). A significant increment of aggressiveness was observed in both sexes (β = 67.9, C.I. 3.4/172.5, P=0.038, at >4.0 μg). Bisphenol-A treatments also significantly abrogated spatial learning and ability in males (P<0.001 vs. females). Overall, our study showed that developmental exposures to low doses of bisphenol-A, e.g. ≤120 μg/day, were associated with behavioural aberrations in offspring.
Classified Linear Regression Based Landsat Image Cloud Removal Method
吴炜; 骆剑承; 沈占锋; 王卫红
2013-01-01
An approach for cloud removal based on linear regression after image classification is proposed in this article. First, the clouds in the remote sensing image to be processed and in its reference image are detected, and two cloud masks are built. Then, an ISODATA classification is applied to the masked reference image. Next, the masked part of the contaminated image is classified into the existing clusters of the reference data using the minimum distance method. Last, the digital numbers of the cloudy areas of the contaminated image are replaced by values predicted from the reference data, using the linear relationships determined, cluster by cluster, between the reference image and the corresponding pixels of the contaminated image. The algorithm is programmed to automatically detect and remove cloud areas in Landsat images. The accuracy of cloud detection and of the prediction of the original values under the cloud cover is evaluated. Results show that the proposed method is effective, with prediction accuracy considerably higher than that of traditional methods.
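The final regression step of such a pipeline can be sketched as below. This is a simplified illustration, not the authors' program: cloud detection, ISODATA clustering and minimum-distance classification are skipped, the class labels are assumed to be given, pixels are treated as a flat array, and all gains, offsets and names are invented for the synthetic data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels = 600

# Reference image values and (assumed, precomputed) class labels per pixel.
ref = rng.uniform(20, 200, n_pixels)
labels = rng.integers(0, 3, n_pixels)

# Synthetic "true" cloud-free target image: a different linear
# relationship to the reference image in each class, plus noise.
true_gain = np.array([0.8, 1.0, 1.3])
true_offset = np.array([10.0, 0.0, -5.0])
target_clear = (true_gain[labels] * ref + true_offset[labels]
                + rng.normal(0, 1.0, n_pixels))

# Cloud mask on the target image; cloudy pixels carry a saturated value.
cloud = rng.random(n_pixels) < 0.25
target = np.where(cloud, 255.0, target_clear)

# Per class: fit target = gain * ref + offset on cloud-free pixels,
# then predict the values hidden under the cloud.
filled = target.copy()
for k in np.unique(labels):
    clear_k = (labels == k) & ~cloud
    cloudy_k = (labels == k) & cloud
    gain, offset = np.polyfit(ref[clear_k], target[clear_k], 1)
    filled[cloudy_k] = gain * ref[cloudy_k] + offset

# Because the data are synthetic, the prediction error can be checked.
rmse = float(np.sqrt(np.mean((filled[cloud] - target_clear[cloud])**2)))
print("RMSE of predicted values under cloud:", round(rmse, 2))
```

Fitting one regression per class rather than a single global one lets each land-cover type keep its own radiometric relationship between the two acquisition dates.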
Esbaugh, A J; Brix, K V; Mager, E M; De Schamphelaere, K; Grosell, M
2012-03-01
The current study examined the chronic toxicity of lead (Pb) to three invertebrate species: the cladoceran Ceriodaphnia dubia, the snail Lymnaea stagnalis and the rotifer Philodina rapida. The test media consisted of natural waters from across North America, varying in pertinent water chemistry parameters including dissolved organic carbon (DOC), calcium, pH and total CO2. Chronic toxicity was assessed using reproductive endpoints for C. dubia and P. rapida while growth was assessed for L. stagnalis, with chronic toxicity varying markedly according to water chemistry. A multi-linear regression (MLR) approach was used to identify the relative importance of individual water chemistry components in predicting chronic Pb toxicity for each species. DOC was an integral component of MLR models for C. dubia and L. stagnalis, but surprisingly had no predictive impact on chronic Pb toxicity for P. rapida. Furthermore, sodium and total CO2 were also identified as important factors affecting C. dubia toxicity; no other factors were predictive for L. stagnalis. The Pb toxicity of P. rapida was predicted by calcium and pH. The predictive power of the C. dubia and L. stagnalis MLR models was generally similar to that of the current C. dubia BLM, with R² values of 0.55 and 0.82 for the respective MLR models, compared to 0.45 and 0.79 for the respective BLMs. In contrast the BLM poorly predicted P. rapida toxicity (R² = 0.19), as compared to the MLR (R² = 0.92). The cross species variability in the effects of water chemistry, especially with respect to rotifers, suggests that cross species modeling of invertebrate chronic Pb toxicity using a C. dubia model may not always be appropriate.
Keszler, Agnes; Kalyanaraman, B; Hogg, Neil
2003-11-01
The kinetics of the reaction between superoxide and the spin trapping agents 5,5-dimethyl-1-pyrroline N-oxide (DMPO), 5-(diethoxyphosphoryl)-5-methyl-1-pyrroline N-oxide (DEPMPO), and 5-tert-butoxycarbonyl-5-methyl-1-pyrroline N-oxide (BMPO) were re-examined in the superoxide-generating xanthine/xanthine oxidase system, by competition with spontaneous dismutation. The approach used singular value decomposition (SVD), multiple linear regression, and spectral simulation. The experiments were carried out using a two-syringe mixing arrangement with fast scan acquisition of 100 consecutive EPR spectra. Using SVD analysis, both temporal and spectral information could be extracted from a single run. The superoxide spin adduct was the exclusive EPR active species in the case of DEPMPO and BMPO, and the major component when DMPO was used. In the latter case a very low concentration of hydroxyl adduct was also observed, which did not change during the decay of the DMPO-superoxide adduct. This indicates that the hydroxyl radical adduct is not formed from the spontaneous decay of the superoxide radical adduct, as has been previously suggested. It was established that in short-term studies (up to 100 s) DMPO was the superior spin trapping agent, but for reaction times longer than 100 s the other two spin traps were more advantageous. The second order rate constants for the spin trapping reaction were found to be DMPO (2.4 M⁻¹s⁻¹), DEPMPO (0.53 M⁻¹s⁻¹), and BMPO (0.24 M⁻¹s⁻¹), determined through competition with spontaneous dismutation of superoxide, at pH 7.4 and 20 °C.
Barbu, N.; Cuculeanu, V.; Stefan, S.
2016-10-01
The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. In order to achieve this, two catalogues from COST733Action were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and have been utilized as predictors within the multiple linear regression model (MLRM) for the estimation of winter and summer TX90p values for 85 synoptic stations covering the entire Romania. A forward selection procedure has been utilized to find adequate predictor combinations and those predictor combinations were tested for collinearity. The performance of the MLRMs has been quantified based on the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level in order to obtain reliable evidence of MLRM robustness. From this analysis, it can be stated that the MLRM performance is higher in winter compared to summer. This is due to the annual cycle of incoming insolation and to the local factors such as orography and surface albedo variations. The MLRM performances exhibit distinct variations between regions with high performance in wintertime for the eastern and southern part of the country and in summertime for the western part of the country. One can conclude that the MLRM generally captures quite well the TX90p variability and reveals the potential for statistical downscaling of TX90p values based on circulation types.
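The combination of forward predictor selection, leave-one-out cross-validation and an RMSE skill score can be sketched as follows. The abstract does not state the selection criterion or the skill-score reference, so this sketch assumes that a predictor is added only if it lowers the leave-one-out RMSE and that the reference forecast is the leave-one-out mean (climatology). The data are synthetic; `loo_rmse` and `forward_select` are hypothetical helper names.

```python
import numpy as np

def loo_rmse(X, y):
    """Leave-one-out RMSE of an OLS model with intercept.

    With zero predictor columns this is the error of the
    leave-one-out mean, used here as the reference forecast.
    """
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = np.column_stack([np.ones(n - 1), X[keep]])
        beta, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        errors[i] = y[i] - np.concatenate(([1.0], X[i])) @ beta
    return float(np.sqrt(np.mean(errors**2)))

def forward_select(X, y):
    """Greedy forward selection driven by leave-one-out RMSE."""
    selected = []
    best = loo_rmse(X[:, :0], y)  # start from the reference forecast
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: loo_rmse(X[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the cross-validated error
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

rng = np.random.default_rng(7)
n_years, n_types = 49, 6  # 1962-2010 gives 49 seasons; 6 types is arbitrary

# Synthetic seasonal occurrence frequencies of circulation types.
freq = rng.poisson(10, size=(n_years, n_types)).astype(float)
# Synthetic predictand driven by types 0 and 3 (an invented relationship).
tx90p = 2.0 + 0.5 * freq[:, 0] - 0.3 * freq[:, 3] + rng.normal(0, 0.5, n_years)

selected, rmse_model = forward_select(freq, tx90p)
rmse_reference = loo_rmse(freq[:, :0], tx90p)
skill = 1.0 - rmse_model / rmse_reference

print("selected predictors:", selected)
print("RMSE skill score:", round(skill, 2))
```

A skill score above zero means the regression beats the reference forecast on data it was not fitted to, which is the sense in which cross-validation gives evidence of robustness.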
Nakamura, Kengo; Yasutaka, Tetsuo; Kuwatani, Tatsu; Komai, Takeshi
2017-11-01
In this study, we applied sparse multiple linear regression (SMLR) analysis to clarify the relationships between soil properties and adsorption characteristics for a range of soils across Japan and identify easily-obtained physical and chemical soil properties that could be used to predict K and n values of cadmium, lead and fluorine. A model was first constructed that can easily predict the K and n values from nine soil parameters (pH, cation exchange capacity, specific surface area, total carbon, soil organic matter from loss on ignition and water holding capacity, the ratio of sand, silt and clay). The K and n values of cadmium, lead and fluorine of 17 soil samples were used to verify the SMLR models by the root mean square error values obtained from 512 combinations of soil parameters. The SMLR analysis indicated that fluorine adsorption to soil may be associated with organic matter, whereas cadmium or lead adsorption to soil is more likely to be influenced by soil pH, IL. We found that an accurate K value can be predicted from more than three soil parameters for most soils. Approximately 65% of the predicted values were between 33 and 300% of their measured values for the K value; 76% of the predicted values were within ±30% of their measured values for the n value. Our findings suggest that adsorption properties of lead, cadmium and fluorine to soil can be predicted from the soil physical and chemical properties using the presented models.
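One way to read the 512 combinations is as all 2^9 subsets of the nine soil parameters, each scored by its root mean square error on held-out samples. That reading is an assumption, as is everything in the sketch below: the data are synthetic, the training-set size of 40 is invented (only the 17 verification samples come from the abstract), and the helper names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_train, n_valid, n_params = 40, 17, 9

# Synthetic standardized soil parameters; the response depends on
# parameters 0 and 4 only (an invented relationship).
X = rng.normal(size=(n_train + n_valid, n_params))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 4] + rng.normal(0, 0.3, n_train + n_valid)
X_tr, y_tr = X[:n_train], y[:n_train]
X_va, y_va = X[n_train:], y[n_train:]

def design(X_part, cols):
    """Design matrix with an intercept and the chosen parameter columns."""
    return np.column_stack([np.ones(len(X_part))] + [X_part[:, j] for j in cols])

def validation_rmse(cols):
    """Fit OLS on the training rows, score on the verification rows."""
    beta, *_ = np.linalg.lstsq(design(X_tr, cols), y_tr, rcond=None)
    err = y_va - design(X_va, cols) @ beta
    return float(np.sqrt(np.mean(err**2)))

# Enumerate every subset of the nine parameters, including the empty one.
subsets = [c for k in range(n_params + 1)
           for c in itertools.combinations(range(n_params), k)]
rmse = {c: validation_rmse(c) for c in subsets}
best = min(rmse, key=rmse.get)

print("number of combinations:", len(subsets))
print("best subset:", best)
```

Exhaustive enumeration is feasible here only because nine parameters give 512 candidate models; the count doubles with every added parameter.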
Hu, L; Liang, M; Mouraux, A; Wise, R G; Hu, Y; Iannetti, G D
2011-12-01
Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLR(d)) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLR(d) method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLR(d) approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLR(d) effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLR(d) can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli.
Hu, L.; Liang, M.; Mouraux, A.; Wise, R. G.; Hu, Y.
2011-01-01
Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLRd) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLRd method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLRd approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLRd effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLRd can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli. PMID:21880936
Dikaios, Nikolaos; Atkinson, David; Tudisca, Chiara; Purpura, Pierpaolo; Forster, Martin; Ahmed, Hashim; Beale, Timothy; Emberton, Mark; Punwani, Shonit
2017-03-01
The aim of this work is to compare Bayesian Inference for nonlinear models with commonly used traditional non-linear regression (NR) algorithms for estimating tracer kinetics in Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI). The algorithms are compared in terms of accuracy, and reproducibility under different initialization settings. Further it is investigated how a more robust estimation of tracer kinetics affects cancer diagnosis. The derived tracer kinetics from the Bayesian algorithm were validated against traditional NR algorithms (i.e. Levenberg-Marquardt, simplex) in terms of accuracy on a digital DCE phantom and in terms of goodness-of-fit (Kolmogorov-Smirnov test) on ROI-based concentration time courses from two different patient cohorts. The first cohort consisted of 76 men, 20 of whom had significant peripheral zone prostate cancer (any cancer-core-length (CCL) with Gleason>3+3 or any-grade with CCL>=4mm) following transperineal template prostate mapping biopsy. The second cohort consisted of 9 healthy volunteers and 24 patients with head and neck squamous cell carcinoma. The diagnostic ability of the derived tracer kinetics was assessed with receiver operating characteristic area under curve (ROC AUC) analysis. The Bayesian algorithm accurately recovered the ground-truth tracer kinetics for the digital DCE phantom consistently improving the Structural Similarity Index (SSIM) across the 50 different initializations compared to NR. For optimized initialization, Bayesian did not improve significantly the fitting accuracy on both patient cohorts, and it only significantly improved the ve ROC AUC on the HN population from ROC AUC=0.56 for the simplex to ROC AUC=0.76. For both cohorts, the values and the diagnostic ability of tracer kinetic parameters estimated with the Bayesian algorithm weren't affected by their initialization. To conclude, the Bayesian algorithm led to a more accurate and reproducible quantification of tracer kinetic
Callén, M S; López, J M; Mastral, A M
2010-08-15
The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view, especially with the introduction of Directive 2004/107/EC and because of the carcinogenic character of this pollutant. Particulate matter less than or equal to 10 microns (PM10) was sampled during 2008-2009 at four locations in Spain, and BaP concentrations were determined experimentally by gas chromatography tandem mass spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations at two sampling sites, taking PM10 and meteorological variables as possible predictors. The model obtained with data from two sampling sites (all sites model) (R² = 0.817, PRESS/SSY = 0.183) included the significant variables PM10, temperature, solar radiation and wind speed and was internally and externally validated. Internal validation was performed by cross-validation and external validation with BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation for estimating BaP concentrations in urban atmospheres, with very good internal prediction (Q²(CV) = 0.813, PRESS/SSY = 0.187) and with the best external prediction for the 2001-2002 campaign (Q²(ext) = 0.679, PRESS/SSY = 0.321) versus the 2001-2004 campaign (Q²(ext) = 0.551, PRESS/SSY = 0.449).
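The reported pairs are consistent with the usual identity Q² = 1 - PRESS/SSY (for example, 0.813 + 0.187 = 1). The sketch below computes leave-one-out PRESS for an ordinary least squares model using the hat-matrix shortcut e_i / (1 - h_ii). The data and coefficients are synthetic and the predictor names merely mirror those in the abstract; this is not the authors' model or their validation scheme.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60

# Synthetic predictors named after those in the abstract (invented values).
pm10 = rng.uniform(10, 80, n)
temperature = rng.uniform(0, 30, n)
solar_radiation = rng.uniform(50, 300, n)
wind_speed = rng.uniform(0.5, 8, n)

X = np.column_stack([np.ones(n), pm10, temperature, solar_radiation, wind_speed])
y = (0.5 + 0.02 * pm10 - 0.03 * temperature - 0.001 * solar_radiation
     - 0.05 * wind_speed + rng.normal(0, 0.15, n))

# Ordinary least squares fit and in-sample residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leverages h_ii: the diagonal of the hat matrix X (X'X)^-1 X'.
leverage = np.sum(X * np.linalg.solve(X.T @ X, X.T).T, axis=1)

# Leave-one-out prediction errors without refitting n times.
press = float(np.sum((resid / (1 - leverage))**2))
ssy = float(np.sum((y - y.mean())**2))

r2 = 1 - float(resid @ resid) / ssy
q2 = 1 - press / ssy

print("R^2:", round(r2, 3), " Q^2(CV):", round(q2, 3),
      " PRESS/SSY:", round(press / ssy, 3))
```

Q² is never larger than R² for the same model, because each leave-one-out error is the in-sample residual inflated by 1 / (1 - h_ii).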
Quantitative Analysis of Yu Ebao Based on Linear Regression Model
刘冬青
2014-01-01
With its low entry threshold, Yu Ebao has brought money market funds within reach of more people. After introducing the main framework of Yu Ebao and its income and expenses in 2013, a linear regression model is established to analyze its returns quantitatively. The study finds that, behind the high returns, Yu Ebao may have subsidized interest in its early stage in order to attract customers; that while Yu Ebao transfers profit from banks to customers through agreement deposits, it also passes risk on to those customers to a certain extent; and that Yu Ebao has undeniably brought revolutionary innovation to the financial industry, but has also brought potential systemic risk.
A. M. Bernales
2016-06-01
The threat of ailments related to urbanization, such as heat stress, is very prevalent. Many things can be done to lessen the effect of urbanization on the surface temperature of an area, such as using green roofs or planting trees. Land use therefore matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important, as it provides a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC and to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image, and the LULC classification was derived from LiDAR and orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs, and these metrics were analysed using a statistical framework. Multiple linear regression was used to create models that predict LST for each class, and the spatial metric "Effective mesh size" was found to be a top predictor for LST in 6 out of 7 classes. The model can still be refined by adding a temporal aspect: analysing the LST of another farming period (for rural areas) and looking for common predictors between the LSTs of the two farming periods.
Canciam, Cesar Augusto [Universidade Tecnologica Federal do Parana (UTFPR), Campus Ponta Grossa, PR (Brazil)], e-mail: canciam@utfpr.edu.br
2012-07-01
When evaluating the consumption of biofuels, knowledge of the density is of great importance for correcting for the effect of temperature. The thermal expansion coefficient is a thermodynamic property that provides a measure of the density variation in response to temperature variation at constant pressure. This study aimed to predict the thermal expansion coefficients of ethyl biodiesels from castor bean, soybean, sunflower seed and Mabea fistulifera Mart. oils and of methyl biodiesels from soybean, sunflower seed, souari nut, cotton, coconut, castor bean and palm oils, and from beef tallow, chicken fat and residual hydrogenated vegetable fat. For this purpose, a linear regression analysis of the density of each biodiesel as a function of temperature was performed. These data were obtained from other works. The thermal expansion coefficients of the biodiesels are between 6.3729×10⁻⁴ and 1.0410×10⁻³ °C⁻¹. In all cases, the correlation coefficients were over 0.99.
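A thermal expansion coefficient can be estimated from density-temperature data by simple linear regression. The sketch below assumes the common formulation β = -d(ln ρ)/dT at constant pressure, so that β is minus the slope of ln ρ against T; the abstract does not say which exact form the author regressed, and the density values here are synthetic, generated with an assumed β of 8.0×10⁻⁴ °C⁻¹.

```python
import numpy as np

# Synthetic density data for an imaginary biodiesel (not from the study).
beta_assumed = 8.0e-4        # 1/degC, chosen inside the reported range
rho_ref, t_ref = 880.0, 20.0  # kg/m^3 at 20 degC, invented reference point
temperature = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
density = rho_ref * np.exp(-beta_assumed * (temperature - t_ref))

# Regress ln(density) on temperature: ln(rho) = a - beta * T.
slope, intercept = np.polyfit(temperature, np.log(density), 1)
beta_hat = -slope

# Correlation coefficient of the fitted relationship.
r = np.corrcoef(temperature, np.log(density))[0, 1]

print("estimated thermal expansion coefficient (1/degC):", beta_hat)
print("correlation coefficient:", r)
```

Over a narrow temperature range ln ρ and ρ are both nearly linear in T, so regressing ρ directly and dividing the slope by a reference density gives a similar estimate.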
Liu, Tong-Zu; Xu, Chang; Rota, Matteo; Cai, Hui; Zhang, Chao; Shi, Ming-Jun; Yuan, Rui-Xia; Weng, Hong; Meng, Xiang-Yu; Kwong, Joey S W; Sun, Xin
2017-04-01
Approximately 27-37% of the general population experience prolonged sleep duration and 12-16% report shortened sleep duration. However, prolonged or shortened sleep duration may be associated with serious health problems. A comprehensive, flexible, non-linear meta-regression with restricted cubic spline (RCS) was used to investigate the dose-response relationship between sleep duration and all-cause mortality in adults. Medline (Ovid), Embase, EBSCOhost-PsycINFO, and EBSCOhost-CINAHL Plus databases, reference lists of relevant review articles, and included studies were searched up to Nov. 29, 2015. Prospective cohort studies investigating the association between sleep duration and all-cause mortality in adults with at least three categories of sleep duration were eligible for inclusion. We eventually included in our study 40 cohort studies enrolling 2,200,425 participants with 271,507 deaths. A J-shaped association between sleep duration and all-cause mortality was present: compared with 7 h of sleep (reference for 24-h sleep duration), both shortened and prolonged sleep durations were associated with increased risk of all-cause mortality (4 h: relative risk [RR] = 1.05; 95% confidence interval [CI] = 1.02-1.07; 5 h: RR = 1.06; 95% CI = 1.03-1.09; 6 h: RR = 1.04; 95% CI = 1.03-1.06; 8 h: RR = 1.03; 95% CI = 1.02-1.05; 9 h: RR = 1.13; 95% CI = 1.10-1.16; 10 h: RR = 1.25; 95% CI = 1.22-1.28; 11 h: RR = 1.38; 95% CI = 1.33-1.44; n = 29; P < 0.01 for non-linear test). With regard to the night-sleep duration, prolonged night-sleep duration was associated with increased all-cause mortality (8 h: RR = 1.01; 95% CI = 0.99-1.02; 9 h: RR = 1.08; 95% CI = 1.05-1.11; 10 h: RR = 1.24; 95% CI = 1.21-1.28; n = 13; P < 0.01 for non-linear test). Subgroup analysis showed females with short sleep duration a day (<7 h) were at high risk of all-cause mortality (4 h: RR = 1.07; 95% CI = 1.02-1.13; 5 h: RR = 1.08; 95
RAO Calyampudi R; WU YueHua
2009-01-01
In this paper, the constrained M-estimation of the regression coefficients and scatter parameters in a general multivariate linear regression model is considered. Since the constrained M-estimation is not easy to compute, an updating recursion procedure is proposed to simplify the computation of the estimators when a new observation is obtained. We show that, under mild conditions, the recursion estimates are strongly consistent. In addition, the asymptotic normality of the recursive constrained M-estimators of regression coefficients is established. A Monte Carlo simulation study of the recursion estimates is also provided. Besides, robustness and asymptotic behavior of constrained M-estimators are briefly discussed.
Kanazashi, Miho; Tanaka, Masayuki; Murakami, Shinichiro; Kondo, Hiroyo; Nagatomo, Fumiko; Ishihara, Akihiko; Roy, Roland R; Fujino, Hidemi
2014-08-01
A chronic decrease in neuromuscular activity (activation and/or loading) results in muscle atrophy and capillary regression that are due, in part, to the overproduction of reactive oxygen species. We have reported that antioxidant treatment with astaxanthin attenuates the overexpression of reactive oxygen species in atrophied muscles that, in turn, ameliorates capillary regression in hindlimb-unloaded rats. Astaxanthin supplementation, however, had little effect on muscle mass and fibre cross-sectional area. In contrast, intermittent loading of the hindlimbs of hindlimb-unloaded rats ameliorates muscle atrophy. Therefore, we hypothesized that the combination of astaxanthin supplementation and intermittent loading would attenuate both muscle atrophy and capillary regression during hindlimb unloading. As expected, 2 weeks of hindlimb unloading resulted in atrophy, a decrease in capillary volume and a shift towards smaller-diameter capillaries in the soleus muscle. Intermittent loading alone (1 h of cage ambulation per day) attenuated atrophy of the soleus, while astaxanthin treatment alone maintained the capillary network to near control levels. The combination of intermittent loading and astaxanthin treatment, however, ameliorated atrophy of the soleus and maintained the capillary volume and luminal diameters and the superoxide dismutase-1 protein levels near control values. These results indicate that intermittent loading combined with astaxanthin supplementation could be an effective therapy for both the muscle atrophy and the capillary regression associated with a chronic decrease in neuromuscular activity.
Honório Sampaio Menezes
2009-08-01
PURPOSE: To compare body weight and length, heart weight and length, heart-to-body weight ratio, glycemia, and morphometric cellular data of offspring of diabetic rats (ODR) and of normal rats (control). METHODS: Diabetes was induced in 3 pregnant Wistar rats, bearing 30 rats, on the 11th day after conception by intraperitoneal injection of 50 mg/kg of streptozotocin. Six normal pregnant Wistar rats, bearing 50 rats, made up the control group. Body length, body weight and heart weight were measured with a caliper and a scale. Morphometric cellular data were obtained by a computer-assisted method applied to the measurements of myocytes. Statistical analysis used Student's t-test, ANOVA and Levene's test. RESULTS: From birth to 21 days of life, mean body weight and length were significantly greater (p<0.001) in the control offspring than in ODR. At birth, heart weight, heart size and heart-to-body ratio were significantly greater (p<0.001) in ODR, regressing over the 21 days of life. ODR showed a significant regression of nuclear area and perimeter (p<0.01) from birth to 21 days of life, which did not occur in the control group. CONCLUSIONS: At birth, ODR showed greater
M.A. Mousavi Shalmani
2014-08-01
To assess water quality, characterize seasonal variation in 18O and 2H in relation to different chemical and physiographical parameters, and model the effective parameters, a study was conducted during 2010 and 2011 in 30 ponds in the north of Iran. Samples were collected in three seasons and analysed for chemical and isotopic components. The highest values of δ18O and δ2H were recorded in summer (-1.15‰ and -12.11‰, respectively) and the lowest in winter (-7.50‰ and -47.32‰, respectively). The data also reveal a significant increase in d-excess during spring and summer in ponds 20, 21, 22, 24, 25 and 26. We conclude that residual surface runoff (from upper lands) is an important source of water that transfers soluble salts into these ponds. In this respect, a high retention time may be the main reason for the movement of light isotopes into the ponds. This has made the d-excess of pond 12 even greater in summer than in winter. It could also explain why ponds 25 and 26 (Siyahkal county) have the highest d-excess and the lowest δ18O and δ2H. Light water pumped from groundwater wells with a minor source of salt (originating from the sea), together with deep percolation into the ponds, may be another reason for the significant decrease in the heavy isotopes of water (18O and 2H) in ponds 2, 12, 14 and 25 from spring to summer. The multiple linear regression results indicate, first, that of the 30 variables under investigation only a few can be used to identify changes in 18O and 2H. Second, among the variables studied, phytoplankton content was a common factor for interpreting 18O and 2H during spring and summer and over the whole year. Third, spring is recommended over the other seasons for sampling water for 18O and 2H interpretation. This is because of function can be
Masuda, Takanori; Nakaura, Takeshi; Funama, Yoshinori; Higaki, Toru; Kiguchi, Masao; Imada, Naoyuki; Sato, Tomoyasu; Awai, Kazuo
We evaluated the effect of the age, sex, total body weight (TBW), height (HT) and cardiac output (CO) of patients on aortic and hepatic contrast enhancement during hepatic-arterial phase (HAP) and portal venous phase (PVP) computed tomography (CT) scanning. This prospective study received institutional review board approval; prior informed consent to participate was obtained from all 168 patients. All were examined using our routine protocol; the contrast material was 600 mg/kg iodine. Cardiac output was measured with a portable electrical velocimeter within 5 minutes of starting the CT scan. We calculated contrast enhancement (per gram of iodine: ΔHU/gI) of the abdominal aorta during the HAP and of the liver parenchyma during the PVP. We performed univariate and multivariate linear regression analysis between all patient characteristics and the ΔHU/gI of aortic and liver parenchymal enhancement. Univariate linear regression analysis demonstrated statistically significant correlations between the ΔHU/gI and the age, sex, TBW, HT, and CO (all P [...] linear regression analysis showed that only the TBW and CO were of independent predictive value (P [...] linear regression analysis only the TBW and CO were significantly correlated with aortic and liver parenchymal enhancement; the age, sex, and HT were not. The CO was the only independent factor affecting aortic and liver parenchymal enhancement at hepatic CT when the protocol was adjusted for the TBW.
Ramoelo, Abel
2013-06-01
...squares regression (PLSR) for predicting grass N and P concentrations through integrating in situ hyperspectral remote sensing and environmental variables (climatic, edaphic and topographic). Data were collected along a land use gradient in the greater...
Adriana Bugno
2007-09-01
The antimicrobial activity of Curcuma zedoaria (Christm.) Roscoe extract against some oral microorganisms was compared with the antimicrobial activity of five commercial mouthrinses in order to evaluate the potential of the plant extract to be incorporated into formulas for improving or creating antiseptic activity. The in vitro antimicrobial efficacy of the plant extract and the commercial products was evaluated against Streptococcus mutans, Enterococcus faecalis, Staphylococcus aureus and Candida albicans using a linear regression method to evaluate the microbial reduction obtained as a function of exposure time, with effectiveness defined as a 99.999% reduction in the count of standardized microbial populations within 60 seconds. The results showed that the antimicrobial efficacy of Curcuma zedoaria (Christm.) Roscoe extract was similar to that of commercial products, and its incorporation into a mouthrinse could be an alternative for improving the antimicrobial efficacy of the oral product.
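The criterion in the abstract above (a 99.999% reduction, i.e. a 5-log10 drop, within 60 seconds) can be illustrated with a minimal sketch. It assumes a straight-line fit of log10 counts against exposure time; the function names and the data are hypothetical and are not taken from the study.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def time_to_log_reduction(times_s, counts_cfu, log_reduction=5.0):
    """Exposure time at which the fitted line predicts the requested log10 drop."""
    log_counts = [math.log10(c) for c in counts_cfu]
    _, slope = fit_line(times_s, log_counts)
    if slope >= 0:
        raise ValueError("counts do not decrease with exposure time")
    return -log_reduction / slope

# Hypothetical counts (CFU/mL) at each exposure time (seconds); illustration only.
times_s = [0, 15, 30, 45, 60]
counts_cfu = [1e6, 1e5, 1e4, 1e3, 1e2]

t_5log = time_to_log_reduction(times_s, counts_cfu)
meets_criterion = t_5log <= 60.0  # would the product count as effective?
```

With these made-up counts the fitted line loses one log10 unit every 15 seconds, so the 5-log reduction is predicted at about 75 seconds and the 60-second criterion is not met.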
Giorgio M. Ribeiro
2005-03-01
Several studies have recently been carried out to evaluate irrigation water quality in the semi-arid region of Northeast Brazil. In some of these studies, the authors have fitted chemical characteristics such as calcium, magnesium, sodium, chloride and the sum of cations as functions of electrical conductivity (EC) through empirical equations; however, attention should be given to the temporal and spatial variation of these variables. The objective of this work was to evaluate the influence of water source, sampling time and soil type on electrical conductivity as a function of the ions in irrigation water, using linear regression. A database of 562 analyses from 55 rural properties was used. The chemical determinations made on the water samples were pH, EC, Ca2+, Mg2+, Na+, K+, Cl-, HCO3-, CO3(2-) and SO4(2-). Starting in January 1988, the properties were sampled for up to 411 days. The database was divided into 14 sampling times, three sources (well, river and reservoir) and 10 soils. To compare the fitted equations, a model identity test was used. Its results showed that the linear equations fitted for electrical conductivity as a function of calcium, magnesium, potassium, sodium, chloride, bicarbonate, carbonate and sulfate contents varied significantly with sampling time, water source and soil type.
T.A. Uggere
2000-05-01
Cardiopulmonary reflexes are activated via changes in cardiac filling pressure (volume-sensitive reflex) and chemical stimulation (chemosensitive reflex). The sensitivity of the cardiopulmonary reflexes to these stimuli is impaired in the spontaneously hypertensive rat (SHR) and other models of hypertension and is thought to be associated with cardiac hypertrophy. The present study investigated whether the sensitivity of the cardiopulmonary reflexes in SHR is restored when cardiac hypertrophy and hypertension are reduced by enalapril treatment. Untreated SHR and WKY rats were fed a normal diet. Other groups of rats were treated with enalapril (10 mg kg-1 day-1, mixed in the diet; SHRE or WKYE) for one month. After treatment, the volume-sensitive reflex was evaluated in each group by determining the decrease in magnitude of the efferent renal sympathetic nerve activity (RSNA) produced by acute isotonic saline volume expansion. Chemoreflex sensitivity was evaluated by examining the bradycardia response elicited by phenyldiguanide administration. Cardiac hypertrophy was determined from the left ventricular/body weight (LV/BW) ratio. Volume expansion produced an attenuated renal sympathoinhibitory response in SHR as compared to WKY rats. Enalapril treatment, however, restored the volume expansion-induced decrease in RSNA in SHRE to the levels observed in normotensive WKY rats. SHR with established hypertension had a higher LV/BW ratio (45%) as compared to normotensive WKY rats. With enalapril treatment, the LV/BW ratio was reduced to 19% in SHRE. Finally, the reflex-induced bradycardia response produced by phenyldiguanide was significantly attenuated in SHR compared to WKY rats. Unlike the effects on the volume reflex, the sensitivity of the cardiac chemosensitive reflex to phenyldiguanide was not restored by enalapril treatment in SHRE. Taken together, these results indicate that the impairment of the volume-sensitive, but not the
Datta N
2005-01-01
BACKGROUND: Tumor regression parameters and time factor during external radiotherapy (EXTRT) are of paramount importance. AIMS: To quantify the parameters of tumor regression and time factor during EXTRT in cancer cervix. SETTINGS AND DESIGN: Patients treated solely with radiotherapy and enrolled in other prospective studies, for whom weekly tumor regressions had been recorded, were considered. MATERIALS AND METHODS: Seventy-seven patients received 50 Gy of EXTRT followed by intracavitary brachytherapy. Loco-regional regressions were assessed clinically and the regression fraction (RF) was represented as $RF = c + a_1 D + a_2 D^2 - a_3 T$, with c, D and T as the constant, cumulative EXTRT dose and treatment time, respectively. STATISTICAL ANALYSIS USED: Stepwise linear regression was performed for RF. Scatter plots were fitted using a linear-quadratic fit. RESULTS: Coefficients of the parameters $D$, $D^2$ and $T$ were computed for various dose intervals, namely 0-20 Gy, 0-30 Gy, 0-40 Gy and 0-50 Gy. At 0-20 Gy and 0-30 Gy, only the coefficient of $D^2$ was significant (P < 0.001), while both $D^2$ and $T$ turned significant (P < 0.001) at 0-40 Gy. For the entire range of 0-50 Gy, all the coefficients of $D$, $D^2$ and $T$ showed significance, leading to an estimate of 26 Gy for $a_1/a_2$ and 0.96 Gy/day for $a_3/a_1$. CONCLUSIONS: As with $\alpha/\beta$ and $\gamma/\alpha$ of post-irradiation cell survival curves, $a_1/a_2$ and $a_3/a_1$ represent the cumulative effect of various radiobiological factors influencing clinical regression of tumor during the course of EXTRT. The dynamic changes in the coefficients of $D$, $D^2$ and $T$ indicate their relative importance during various phases of EXTRT.
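The regression-fraction model in the abstract above, RF = c + a1·D + a2·D² − a3·T, is linear in its coefficients, so it can be fitted by ordinary least squares. The sketch below is one way to do that, not the authors' stepwise procedure. The helper names, the dose and time values, and the coefficients used to generate the data are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_regression_fraction(doses, times, fractions):
    """Least-squares fit of RF = c + a1*D + a2*D^2 - a3*T via the normal equations."""
    rows = [[1.0, d, d * d, -t] for d, t in zip(doses, times)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, fractions)) for i in range(p)]
    return tuple(solve_linear_system(xtx, xty))

# Hypothetical cumulative doses (Gy) and elapsed treatment times (days).
doses = [10, 10, 20, 20, 30, 30, 40, 40, 50, 50]
times = [7, 9, 14, 17, 21, 26, 28, 33, 35, 42]
# Noise-free RF values from made-up coefficients, so the fit can be checked.
true_c, true_a1, true_a2, true_a3 = 0.02, 0.012, 0.0001, 0.004
fractions = [true_c + true_a1 * d + true_a2 * d * d - true_a3 * t
             for d, t in zip(doses, times)]

c, a1, a2, a3 = fit_regression_fraction(doses, times, fractions)
dose_ratio = a1 / a2   # in Gy, the analogue of a1/a2 in the abstract
time_ratio = a3 / a1   # in Gy/day, the analogue of a3/a1 in the abstract
```

Because the data here are generated without noise, the fit simply recovers the made-up coefficients. With real weekly measurements, the ratios would carry the uncertainty of all three slope estimates.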
Geospatial measurements of ancillary sensor data, such as bulk soil electrical conductivity or remotely sensed imagery data, are commonly used to characterize spatial variation in soil or crop properties. Geostatistical techniques like kriging with external drift or regression kriging are often use...
Maleki, Afshin; Daraei, Hiua; Alaei, Loghman; Faraji, Aram
2014-01-01
Four stepwise multiple linear regression (SMLR) models and a genetic algorithm (GA) based multiple linear regression (MLR) model, together with artificial neural network (ANN) models, were applied for quantitative structure-activity relationship (QSAR) modeling of dissociation constants (Kd) of 62 arylsulfonamide (ArSA) derivatives as human carbonic anhydrase II (HCA II) inhibitors. The best subsets of molecular descriptors were selected by the SMLR and GA-MLR methods. These selected variables were used to generate MLR and ANN models. The predictive power of the models was examined with an external test set and cross validation. In addition, some tests were done to examine other aspects of the models. The results show that for certain purposes GA-MLR is better than SMLR, and for others ANN outperforms the MLR models.
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
2012-06-15
correlations of the individual variables with each other in a one to one relationship. Variance Inflation Factor (VIF) analysis was also conducted to further...accuracy of the model. VIF determines if the variances of the estimated coefficients in the regression model are inflated due to...larger the variance of bk (Simon, 2004). The VIF itself is comprised of the last portion of the equation: $1/(1 - R_k^2)$. According to Dr. Simon, VIF values
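The standard definition of the variance inflation factor is VIF_k = 1/(1 − R_k²), where R_k² comes from regressing predictor k on the other predictors. The sketch below covers only the two-predictor special case, where R_k² equals the squared Pearson correlation between the two predictors. The function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r_squared = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)

# Hypothetical predictor columns; illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]      # correlated with x1 (r = 0.8)
x3 = [2, -1, -2, -1, 2]   # uncorrelated with x1 (r = 0)

vif_correlated = vif_two_predictors(x1, x2)
vif_uncorrelated = vif_two_predictors(x1, x3)
```

A VIF of 1 means no inflation. With more than two predictors, each R_k² has to come from a full regression of predictor k on all the others.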
Singh, Y.; Nair, R.R.; Singh, H.; Datta, P.; Jaiswal, P.; Dewangan, P.; Ramprasad, T.
-Godavari basin. Log prediction process, with uncertainties based on root mean square error properties, was implemented by way of a multi-layer feed forward neural network. The log properties were merged with seismic data by applying a non-linear transform...
Tomlinson, Sean
2016-04-01
The calculation and comparison of physiological characteristics of thermoregulation has provided insight into patterns of ecology and evolution for over half a century. Thermoregulation has typically been explored using linear techniques; I explore the application of non-linear scaling to more accurately calculate and compare characteristics and thresholds of thermoregulation, including the basal metabolic rate (BMR), peak metabolic rate (PMR) and the lower (Tlc) and upper (Tuc) critical limits to the thermo-neutral zone (TNZ) for Australian rodents. An exponentially-modified logistic function accurately characterised the response of metabolic rate to ambient temperature, while evaporative water loss was accurately characterised by a Michaelis-Menten function. When these functions were used to resolve unique parameters for the nine species studied here, the estimates of BMR and TNZ were consistent with the previously published estimates. The approach resolved differences in rates of metabolism and water loss between subfamilies of Australian rodents that haven't been quantified before. I suggest that non-linear scaling is not only more effective than the established segmented linear techniques, but also is more objective. This approach may allow broader and more flexible comparison of characteristics of thermoregulation, but it needs testing with a broader array of taxa than those used here.
Zhao, Feng; Wang, Yi-Xiang J.; Yuan, Jing; Deng, Min; Ahuja, Anil T. [Chinese University of Hong Kong, Department of Imaging and Interventional Radiology, Prince of Wales Hospital, Hong Kong SAR (China); Wong, Hing Lok [School of Public Health and Primary Care, Prince of Wales Hospital, The Chinese University of Hong Kong, Jockey Club Centre for Osteoporosis Care and Control, Hong Kong SAR (China); Chu, Eagle S.H.; Go, Minnie Y.Y.; Yu, Jun [Chinese University of Hong Kong, Institute of Digestive Disease and Department of Medicine and Therapeutics, Li Ka Shing Institute of Health Sciences, Hong Kong SAR (China); Teng, Gao-Jun [Southeast University, Department of Radiology, Zhongda Hospital, Nanjing (China)
2012-08-15
Recently it was shown that the magnetic resonance imaging (MRI) T1ρ value increased with the severity of liver fibrosis in rats with bile duct ligation. Using a rat carbon tetrachloride (CCl4) liver injury model, this study further investigated the merit of T1ρ relaxation for liver fibrosis evaluation. Male Sprague-Dawley rats received intraperitoneal injection of 2 ml/kg CCl4 twice weekly for up to 6 weeks. Then CCl4 was withdrawn and the animals were allowed to recover. Liver T1ρ MRI and conventional T2-weighted images were acquired. Animals underwent MRI at baseline and at 2 days, 2 weeks, 4 weeks and 6 weeks post CCl4 injection, and they were also examined at 1 week and 4 weeks post CCl4 withdrawal. Liver histology was also sampled at these time points. Liver T1ρ values increased slightly, though significantly, on day 2, and then increased further and were highest at week 6 post CCl4 insults. The relative liver signal intensity change on T2-weighted images followed a different time course compared with that of T1ρ. Liver T1ρ values decreased upon the withdrawal of the CCl4 insult. Histology confirmed the animals had typical CCl4 liver injury and fibrosis progression and regression processes. MR T1ρ imaging can monitor CCl4-induced liver injury and fibrosis.
Credit Scoring Problem Based on Regression Analysis
Khassawneh, Bashar Suhil Jad Allah
2014-01-01
ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....
陈璐璐
2016-01-01
First, a multiple linear regression equation for stock prices is established; the regression coefficients are calculated with the EVIEWS software and are subjected to tests of economic meaning and to statistical tests. Next, methods from an econometrics course are used to check whether the regression equation suffers from multicollinearity, heteroscedasticity or autocorrelation. The model is then improved; the resulting regression equation has a relatively large coefficient of determination and satisfies the classical assumptions of multiple linear regression. Finally, the improved model is applied to forecast the opening price on the target forecast day, and the prediction error is within an acceptable range.
Cavalcanti T.C.
2003-01-01
Two variants (A and B) of the widely employed Walker 256 rat tumor cells are known. When inoculated sc, the A variant produces solid, invasive, highly metastasizing tumors that cause severe systemic effects and death. We have obtained a regressive variant (AR) whose sc growth is slower, resulting in 70-80% regression followed by development of immunity against the A and AR variants. Simultaneously with the beginning of tumor regression, a temporary anemia developed (~8 days duration), accompanied by marked splenomegaly (~300%) and changes in red blood cell osmotic fragility, with mean corpuscular fragility increasing from 4.1 to 6.5 g/l NaCl. The possibility was raised that plasma factors associated with the immune response induced these changes. In the present study, we identify and compare the osmotic fragility increasing activity of plasma fractions obtained from A and AR tumor bearers at different stages of tumor development. The results showed that by day 4 compounds precipitating in 60% (NH4)2SO4 and able to increase red blood cell osmotic fragility appeared in the plasma of A and AR tumor bearers. Later, these compounds disappeared from the plasma of A tumor bearers but slightly increased in the plasma of AR tumor bearers. Furthermore, by day 10, compounds precipitating between 60 and 80% (NH4)2SO4 and with similar effects appeared only in the plasma of AR tumor bearers. The salt solubility, production kinetics and hemolytic activity of these compounds resemble those of the immunoglobulins. This, together with their preferential increase in rats bearing the AR variant, suggests their association with an immune response against this tumor.
Peter G Traber
Galectin-3 protein is critical to the development of liver fibrosis because galectin-3 null mice have attenuated fibrosis after liver injury. Therefore, we examined the ability of novel complex carbohydrate galectin inhibitors to treat toxin-induced fibrosis and cirrhosis. Fibrosis was induced in rats by intraperitoneal injections with thioacetamide (TAA) and groups were treated with vehicle, GR-MD-02 (galactoarabino-rhamnogalacturonan) or GM-CT-01 (galactomannan). In initial experiments, 4 weeks of treatment with GR-MD-02 following completion of 8 weeks of TAA significantly reduced collagen content by almost 50% based on Sirius red staining. Rats were then exposed to more intense and longer TAA treatment, which included either GR-MD-02 or GM-CT-01 during weeks 8 through 11. TAA rats treated with vehicle developed extensive fibrosis and pathological stage 6 Ishak fibrosis, or cirrhosis. Treatment with either GR-MD-02 (90 mg/kg ip) or GM-CT-01 (180 mg/kg ip) given once weekly during weeks 8-11 led to marked reduction in fibrosis with reduction in portal and septal galectin-3 positive macrophages and reduction in portal pressure. Vehicle-treated animals had cirrhosis whereas in the treated animals the fibrosis stage was significantly reduced, with evidence of resolved or resolving cirrhosis and reduced portal inflammation and ballooning. In this model of toxin-induced liver fibrosis, treatment with two galectin protein inhibitors with different chemical compositions significantly reduced fibrosis, reversed cirrhosis, reduced galectin-3 expressing portal and septal macrophages, and reduced portal pressure. These findings suggest a potential role of these drugs in human liver fibrosis and cirrhosis.
1984-09-01
scores of 0.26 (unadjusted, 0.32). This peak score was achieved with the inclusion of the first predictor. In this case, the first selected predictor...models. The results of these models are very similar with the EVAR model yielding an independent adjusted VISCAT I threat score of 0.17 (unadjusted...68.91%) for MAXPROB I and -16.87% (unadjusted, 61.78%) for natural regression. Fig. 38 shows the relationship of equally populous grouping size to
Maryam Khodadadi
2016-06-01
Background: Data mining (DM) is an approach used in extracting valuable information from environmental processes. This research depicts a DM approach used in extracting some information from influent and effluent wastewater characteristic data of a waste stabilization pond (WSP) in Birjand, a city in Eastern Iran. Methods: Multiple regression (MR) and neural network (NN) models were examined using influent characteristics (pH, biochemical oxygen demand [BOD5], temperature, chemical oxygen demand [COD], total suspended solids [TSS], total dissolved solids [TDS], electrical conductivity [EC] and turbidity) as the regression input vectors. Models were fitted to these input attributes to predict effluent BOD5 (BODout) and COD (CODout). The models' performances were estimated by 10-fold external cross-validation. An internal 5-fold cross-validation was also used for the training data set in the NN model. The models were compared using the regression error characteristic (REC) plot and other statistical measures such as relative absolute error (RAE). Sensitivity analysis was also applied to extract useful knowledge from the NN model. Results: NN models (with RAE = 78.71% ± 1.16 for BODout and 83.67% ± 1.35 for CODout) and MR models (with RAE = 84.40% ± 1.07 for BODout and 88.07% ± 0.80 for CODout) indicate different performances, and the former was better (P < 0.05) for the prediction of both effluent BOD5 and COD parameters. For the prediction of CODout the NN model with hidden layer size H = 4 and decay factor = 0.75 ± 0.03 presented the best predictive results. For BODout the H and decay factor were found to be 4 and 0.73 ± 0.03, respectively. TDS was found to be the most descriptive influent wastewater characteristic for the prediction of the WSP performance. The REC plots confirmed the superior performance of the NN model for both BOD and COD effluent prediction. Conclusion: Modeling the performance of WSP systems using NN models along with sensitivity analysis can offer better
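The relative absolute error (RAE) reported above is commonly defined as the total absolute prediction error divided by the total absolute error of always predicting the mean, expressed as a percentage. The sketch below assumes that standard definition, since the abstract does not state which formula was used. The function name and data are hypothetical.

```python
def relative_absolute_error(actual, predicted):
    """RAE in percent: sum|y - y_hat| / sum|y - mean(y)| * 100."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    return 100.0 * model_error / baseline_error

# Hypothetical effluent values (mg/L) and model predictions; illustration only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

rae_percent = relative_absolute_error(actual, predicted)
```

Values below 100% mean the model beats the mean-only baseline. On that reading, the RAE values near 80% in the abstract correspond to a modest improvement over predicting the mean.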
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
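The single-line estimator described above (slope as the median of all pairwise slopes, intercept chosen so the line passes through the medians of the data) can be sketched in a few lines. This is a minimal illustration of the technique and not the KTRLine program itself. The function name and data are hypothetical.

```python
from itertools import combinations
from statistics import median

def kendall_theil_line(x, y):
    """Return (intercept, slope) of the Kendall-Theil robust line."""
    # Slopes between every pair of points with distinct x values.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    # The line is forced through (median of x, median of y).
    intercept = median(y) - slope * median(x)
    return intercept, slope

# Hypothetical data: y = 2x with one gross outlier in the last point.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]

intercept, slope = kendall_theil_line(x, y)
```

The outlier changes four of the ten pairwise slopes, but the median slope stays at 2. An ordinary least-squares fit to the same points would be pulled strongly toward the last point.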
On the efficacy of linear system analysis of renal autoregulation in rats
Chon, K H; Chen, Y M; Holstein-Rathlou, N H;
1993-01-01
In order to assess the linearity of the mechanisms subserving renal blood flow autoregulation, broad-band arterial pressure fluctuations at three different power levels were induced experimentally and the resulting renal blood flow responses were recorded. Linear system analysis methods were...
Scott W. Keith
2014-09-01
This paper details the design, evaluation, and implementation of a framework for detecting and modeling nonlinearity between a binary outcome and a continuous predictor variable adjusted for covariates in complex samples. The framework provides familiar-looking parameterizations of output in terms of linear slope coefficients and odds ratios. Estimation methods focus on maximum likelihood optimization of piecewise linear free-knot splines formulated as B-splines. Correctly specifying the optimal number and positions of the knots improves the model, but is marked by computational intensity and numerical instability. Our inference methods utilize both parametric and nonparametric bootstrapping. Unlike other nonlinear modeling packages, this framework is designed to incorporate multistage survey sample designs common to nationally representative datasets. We illustrate the approach and evaluate its performance in specifying the correct number of knots under various conditions with an example using body mass index (BMI; kg/m2) and the complex multi-stage sampling design from the Third National Health and Nutrition Examination Survey to simulate binary mortality outcomes data having realistic nonlinear sample-weighted risk associations with BMI. BMI and mortality data provide a particularly apt example and area of application since BMI is commonly recorded in large health surveys with complex designs, often categorized for modeling, and nonlinearly related to mortality. When complex sample design considerations were ignored, our method was generally similar to or more accurate than two common model selection procedures, Schwarz's Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC), in terms of selecting the correct number of knots. Our approach provided accurate knot selections when complex sampling weights were incorporated, while AIC and BIC were not effective under these conditions.
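For reference, the two comparison criteria named above have simple closed forms: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, for k parameters, n observations and maximized likelihood L. The sketch below applies them to knot-count selection. The log-likelihoods, the parameter counts and the sample size are hypothetical, and no sampling weights are involved.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Schwarz's Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: number of knots -> (maximized log-likelihood, parameters).
# The parameter count assumes an intercept, a slope, and a slope change plus a
# location for each free knot.
n_obs = 1000
candidates = {0: (-520.0, 2), 1: (-510.0, 4), 2: (-507.0, 6)}

aic_scores = {k: aic(ll, p) for k, (ll, p) in candidates.items()}
bic_scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
```

With these made-up numbers AIC prefers two knots and BIC prefers one, because BIC penalizes each extra parameter by ln n rather than by 2.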