Sample records for regression

  1. Quantile regression

    Hao, Lingxin


    Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao…

  2. Regression Basics

    Kahane, Leo H


    Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition: offers greater coverage of simple panel-data estimation…

  3. Autistic Regression

    Matson, Johnny L.; Kozlowski, Alison M.


    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  4. Logistic regression.

    Nick, Todd G; Campbell, Kathleen M


    The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
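
    A minimal sketch in R of the record's simple and multiple binary logistic models; the simulated data set and effect sizes here are hypothetical, not from the chapter.

      set.seed(1)
      n   <- 200
      dat <- data.frame(exposure = rbinom(n, 1, 0.4),
                        age      = rnorm(n, 50, 10))
      p   <- plogis(-4 + 1.2 * dat$exposure + 0.05 * dat$age)   # true model
      dat$disease <- rbinom(n, 1, p)                            # binary outcome
      fit_simple   <- glm(disease ~ exposure, family = binomial, data = dat)
      fit_multiple <- glm(disease ~ exposure + age, family = binomial, data = dat)
      exp(coef(fit_multiple))    # exponentiated coefficients: odds ratios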

  5. Linear regression

    Olive, David J


    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response transformations…
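
    The response plot described above (fitted values against the response, with the identity line) is easy to reproduce; a hedged base-R sketch on simulated data:

      set.seed(1)
      x   <- matrix(rnorm(300), 100, 3)
      y   <- drop(x %*% c(1, 2, 0)) + rnorm(100)
      fit <- lm(y ~ x)
      plot(fitted(fit), y, xlab = "fitted values", ylab = "response")
      abline(0, 1)   # identity line; points far from it flag potential outliers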

  6. Regression: A Bibliography.

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  8. Regression analysis by example

    Chatterjee, Samprit; Hadi, Ali S


    .... The emphasis continues to be on exploratory data analysis rather than statistical theory. The coverage offers in-depth treatment of regression diagnostics, transformation, multicollinearity, logistic regression, and robust regression...

  9. Reduced Rank Regression

    Johansen, Søren


    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating e...
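
    A hedged sketch of one standard reduced rank regression estimator (identity weighting assumed): fit OLS, then project the coefficient matrix onto the top right singular vectors of the fitted values.

      set.seed(1)
      n <- 100; p <- 5; q <- 4; r <- 2
      X <- matrix(rnorm(n * p), n, p)
      B_true <- matrix(rnorm(p * r), p, r) %*% matrix(rnorm(r * q), r, q)  # rank r
      Y <- X %*% B_true + matrix(rnorm(n * q), n, q)
      B_ols <- solve(crossprod(X), crossprod(X, Y))   # (X'X)^{-1} X'Y
      V     <- svd(X %*% B_ols)$v[, 1:r]              # top-r right singular vectors
      B_rrr <- B_ols %*% V %*% t(V)                   # rank-r coefficient estimate
      qr(B_rrr)$rank                                  # check: equals r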

  10. Regression analysis by example

    Chatterjee, Samprit


    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded…

  11. Unitary Response Regression Models

    Lipovetsky, S.


    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  12. Flexible survival regression modelling

    Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben


    Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varying…

  13. Quantile Regression Methods

    Fitzenberger, Bernd; Wilke, Ralf Andreas


    Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by m...... treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work....

  14. Regression for economics

    Naghshpour, Shahdad


    Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method…

  15. Autistic epileptiform regression.

    Canitano, Roberto; Zappella, Michele


    Autistic regression is a well known condition that occurs in one third of children with pervasive developmental disorders, who, after normal development in the first year of life, undergo a global regression during the second year that encompasses language, social skills and play. In a portion of these subjects, epileptiform abnormalities are present with or without seizures, resembling, in some respects, other epileptiform regressions of language and behaviour such as Landau-Kleffner syndrome. In these cases, for a more accurate definition of the clinical entity, the term autistic epileptiform regression has been suggested. As in other epileptic syndromes with regression, the relationships between EEG abnormalities, language and behaviour, in autism, are still unclear. We describe two cases of autistic epileptiform regression selected from a larger group of children with autistic spectrum disorders, with the aim of discussing the clinical features of the condition, the therapeutic approach and the outcome.

  16. Scaled Sparse Linear Regression

    Sun, Tingni


    Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual squares and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs nearly nothing beyond the computation of a path of the sparse regression estimator for penalty levels above a threshold. For the scaled Lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the method yields simultaneously an estimator for the noise level and an estimated coefficient vector in the Lasso path satisfying certain oracle inequalities for the estimation of the noise level, prediction, and the estimation of regression coefficients. These oracle inequalities provide sufficient conditions for the consistency and asymptotic...
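
    A hedged sketch of the iteration described above, assuming the glmnet package for the lasso steps; the base penalty lambda0 and the simulated data are illustrative choices, not the paper's.

      library(glmnet)
      set.seed(1)
      n <- 100; p <- 50
      X <- scale(matrix(rnorm(n * p), n, p))
      y <- drop(X[, 1:3] %*% c(2, -1, 1)) + rnorm(n)
      lambda0 <- sqrt(2 * log(p) / n)        # a common theoretical base penalty
      sigma <- sd(y)                         # crude starting value
      for (it in 1:20) {
        fit <- glmnet(X, y, lambda = sigma * lambda0)      # penalty scaled by noise
        sigma_new <- sqrt(mean((y - predict(fit, X))^2))   # mean residual squares
        if (abs(sigma_new - sigma) < 1e-6) break
        sigma <- sigma_new
      }
      c(sigma_hat = sigma)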

  17. Rolling Regressions with Stata

    Kit Baum


    This talk will describe some work underway to add a "rolling regression" capability to Stata's suite of time series features. Although commands such as "statsby" permit analysis of non-overlapping subsamples in the time domain, they are not suited to the analysis of overlapping (e.g. "moving window") samples. Both moving-window and widening-window techniques are often used to judge the stability of time series regression relationships. We will present an implementation of a rolling regression...
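
    Since the talk's Stata command is not reproduced in the record, here is a hedged moving-window sketch in R that re-fits the model on each overlapping window and tracks the slope:

      set.seed(1)
      n <- 200
      x <- rnorm(n)
      y <- 1 + 0.5 * x + rnorm(n)
      w <- 50                                    # window width
      slopes <- sapply(1:(n - w + 1), function(s) {
        idx <- s:(s + w - 1)
        coef(lm(y[idx] ~ x[idx]))[2]             # slope on this window
      })
      plot(slopes, type = "l", xlab = "window start", ylab = "rolling slope")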

  18. Unbiased Quasi-regression

    Guijun YANG; Lu LIN; Runchu ZHANG


    Quasi-regression, motivated by problems arising in computer experiments, focuses mainly on speeding up evaluation. However, its theoretical properties have not been explored systematically. This paper shows that quasi-regression is unbiased, strongly convergent and asymptotically normal for parameter estimation, but it is biased for the fitting of the curve. Furthermore, a new method called unbiased quasi-regression is proposed. In addition to retaining the above asymptotic behaviors of parameter estimation, unbiased quasi-regression is unbiased for the fitting of the curve.

  19. Introduction to regression graphics

    Cook, R Dennis


    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, written in the Xlisp-Stat language and called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is available…

  20. Applied linear regression

    Weisberg, Sanford


    Master linear regression techniques with a new edition of a classic text. Reviews of the Second Edition: "I found it enjoyable reading and so full of interesting material that even the well-informed reader will probably find something new . . . a necessity for all of those who do linear regression." -Technometrics, February 1987 "Overall, I feel that the book is a valuable addition to the now considerable list of texts on applied linear regression. It should be a strong contender as the leading text for a first serious course in regression analysis." -American Scientist, May-June 1987

  1. Morse–Smale Regression

    Gerber, Samuel [Univ. of Utah, Salt Lake City, UT (United States)]; Rubel, Oliver [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Bremer, Peer-Timo [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States)]; Whitaker, Ross T. [Univ. of Utah, Salt Lake City, UT (United States)]


    This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.

  2. Regression to Causality

    Bordacconi, Mats Joe; Larsen, Martin Vinæs


    Humans are fundamentally primed for making causal attributions based on correlations. This implies that researchers must be careful to present their results in a manner that inhibits unwarranted causal attribution. In this paper, we present the results of an experiment that suggests regression...... models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results...... of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression...

  3. Boosted beta regression.

    Matthias Schmid

    Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

  4. Applied logistic regression

    Hosmer, David W; Sturdivant, Rodney X


    A new edition of the definitive guide to logistic regression modeling for health science and other applications. This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-the-art…

  5. Applied linear regression

    Weisberg, Sanford


    Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." -International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illustrates…

  6. Transductive Ordinal Regression

    Seah, Chun-Wei; Ong, Yew-Soon


    Ordinal regression is commonly formulated as a multi-class problem with ordinal constraints. The challenge of designing accurate classifiers for ordinal regression generally increases with the number of classes involved, due to the large number of labeled patterns that are needed. The availability of ordinal class labels, however, are often costly to calibrate or difficult to obtain. Unlabeled patterns, on the other hand, often exist in much greater abundance and are freely available. To benefit from the abundance of unlabeled patterns, we present a novel transductive learning paradigm for ordinal regression in this paper, namely Transductive Ordinal Regression (TOR). The key challenge of the present study lies in the precise estimation of both the ordinal class label of the unlabeled data and the decision functions of the ordinal classes, simultaneously. The core elements of the proposed TOR include an objective function that caters to several commonly used loss functions cast in the transductive setting...

  7. Nonparametric Predictive Regression

    Ioannis Kasparis; Elena Andreou; Phillips, Peter C.B.


    A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...

  8. [Understanding logistic regression].

    El Sanharawi, M; Naudet, F


    Logistic regression is one of the most common multivariate analysis models utilized in epidemiology. It allows the measurement of the association between the occurrence of an event (qualitative dependent variable) and factors susceptible to influence it (explanatory variables). The choice of explanatory variables that should be included in the logistic regression model is based on prior knowledge of the disease pathophysiology and the statistical association between the variable and the event, as measured by the odds ratio. The main steps for the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of the choice of variables that must be included and retained in the regression model in order to avoid the omission of important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.

  9. Constrained Sparse Galerkin Regression

    Loiseau, Jean-Christophe


    In this work, we demonstrate the use of sparse regression techniques from machine learning to identify nonlinear low-order models of a fluid system purely from measurement data. In particular, we extend the sparse identification of nonlinear dynamics (SINDy) algorithm to enforce physical constraints in the regression, leading to energy conservation. The resulting models are closely related to Galerkin projection models, but the present method does not require the use of a full-order or high-fidelity Navier-Stokes solver to project onto basis modes. Instead, the most parsimonious nonlinear model is determined that is consistent with observed measurement data and satisfies necessary constraints. The constrained Galerkin regression algorithm is implemented on the fluid flow past a circular cylinder, demonstrating the ability to accurately construct models from data.
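
    A hedged sketch of the core SINDy step in R (the paper's energy-conservation constraints are omitted): sequentially thresholded least squares over a library of candidate nonlinear terms.

      set.seed(1)
      n  <- 500
      x1 <- rnorm(n); x2 <- rnorm(n)
      dx <- 0.5 * x2 - 0.2 * x1 * x2 + rnorm(n, sd = 0.01)  # "measured" derivative
      Theta <- cbind(1, x1, x2, x1^2, x1 * x2, x2^2)        # candidate library
      xi <- qr.solve(Theta, dx)                  # initial least-squares fit
      for (k in 1:10) {                          # threshold small terms, refit
        keep <- abs(xi) >= 0.1
        xi[!keep] <- 0
        xi[keep]  <- qr.solve(Theta[, keep, drop = FALSE], dx)
      }
      setNames(round(xi, 3), c("1", "x1", "x2", "x1^2", "x1*x2", "x2^2"))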

  10. Practical Session: Logistic Regression

    Clausel, M.; Grégoire, G.


    An exercise is proposed to illustrate logistic regression. One investigates the different risk factors in the occurrence of coronary heart disease. It has been proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010), and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of…). This example is based on data given in the file evans.txt coming from…

  11. Minimax Regression Quantiles

    Bache, Stefan Holst

    A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....

  12. Nonlinear Regression with R

    Ritz, Christian; Parmigiani, Giovanni


    R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.

  13. Multiple linear regression analysis

    Edwards, T. R.


    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  14. Adaptive metric kernel regression

    Goutte, Cyril; Larsen, Jan


    regression by minimising a cross-validation estimate of the generalisation error. This makes it possible to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms…

  15. Software Regression Verification


    …of recursive procedures. Acta Informatica, 45(6):403–439, 2008. [GS11] Benny Godlin and Ofer Strichman. Regression verification. Technical Report… functions. Therefore, we need to redefine m-term. – Mutual termination. If either function f or function f′ (or both) is non-deterministic, then their…

  16. Linear Regression Analysis

    Seber, George A F


    Concise, mathematically clear, and comprehensive treatment of the subject. Expanded coverage of diagnostics and methods of model fitting. Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. More than 200 problems throughout the book plus outline solutions for the exercises. This revision has been extensively class-tested.

  17. Low rank Multivariate regression

    Giraud, Christophe


    We consider in this paper the multivariate regression problem, when the target regression matrix $A$ is close to a low rank matrix. Our primary interest is in the practical case where the variance of the noise is unknown. Our main contribution is to propose in this setting a criterion to select among a family of low rank estimators and prove a non-asymptotic oracle inequality for the resulting estimator. We also investigate the easier case where the variance of the noise is known and outline that the penalties appearing in our criteria are minimal (in some sense). These penalties involve the expected value of the Ky-Fan quasi-norm of some random matrices. These quantities can be evaluated easily in practice and upper-bounds can be derived from recent results in random matrix theory.

  18. Subset selection in regression

    Miller, Alan


    Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition: a separate chapter on Bayesian methods; complete revision of the chapter on estimation; a major example from the field of near infrared spectroscopy; more emphasis on cross-validation; greater focus on bootstrapping; stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible; software available on the Internet for implementing many of the algorithms presented; more examples. Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...

  19. Classification and regression trees

    Breiman, Leo; Olshen, Richard A; Stone, Charles J


    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  20. Aid and growth regressions

    Hansen, Henrik; Tarp, Finn


    . There are, however, decreasing returns to aid, and the estimated effectiveness of aid is highly sensitive to the choice of estimator and the set of control variables. When investment and human capital are controlled for, no positive effect of aid is found. Yet, aid continues to impact on growth via...... investment. We conclude by stressing the need for more theoretical work before this kind of cross-country regression is used for policy purposes....

  1. Robust Nonstationary Regression


    This paper provides a robust statistical approach to nonstationary time series regression and inference. Fully modified extensions of traditional robust statistical procedures are developed which allow for endogeneities in the nonstationary regressors and serial dependence in the shocks that drive the regressors and the errors that appear in the equation being estimated. The suggested estimators involve semiparametric corrections to accommodate these possibilities and they belong to the same ...


    Constanţa-Nicoleta BODEA


    In this communication we will discuss two regression credibility models from Non-Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices, a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state) and the collective estimate (based on aggregate USA data). To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace for a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance and the complicated mathematical properties of conditional expectations and of conditional covariances.

  3. Modified Regression Correlation Coefficient for Poisson Regression Model

    Kaengthong, Nattacha; Domthong, Uthumporn


    This study gives attention to indicators of predictive power for the Generalized Linear Model (GLM), which are widely used but often subject to some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables with multicollinearity among them. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on bias and the root mean square error (RMSE).
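
    The coefficient in question is simply the correlation between Y and the fitted values E(Y|X) from the Poisson GLM; a minimal sketch with simulated data:

      set.seed(1)
      n  <- 200
      x1 <- rnorm(n); x2 <- rnorm(n)
      y  <- rpois(n, exp(0.5 + 0.4 * x1 - 0.3 * x2))
      fit <- glm(y ~ x1 + x2, family = poisson)
      cor(y, fitted(fit))   # regression correlation coefficient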

  4. Caudal Regression Syndrome

    Karim Hardani


    A 10-month-old baby presented with developmental delay. He had flaccid paralysis on physical examination. An MRI of the spine revealed malformation of the ninth and tenth thoracic vertebral bodies with complete agenesis of the rest of the spine below that level. The thoracic spinal cord ends at the level of the fifth thoracic vertebra, with agenesis of the posterior arches of the eighth, ninth and tenth thoracic vertebral bodies. The roots of the cauda equina appear tightened down and backward and end in a subdermal fibrous fatty tissue at the level of the ninth and tenth thoracic vertebral bodies (closed meningocele). These findings are consistent with caudal regression syndrome.

  5. Recursive Algorithm For Linear Regression

    Varanasi, S. V.


    Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations and facilitates search for minimum order of linear-regression model that fits set of data satisfactorily.
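
    The NASA program recurses over model order; as a related illustration, here is a hedged sketch of the classical recursive least-squares update over observations, which likewise avoids refitting from scratch:

      rls <- function(X, y, delta = 1e4) {
        p <- ncol(X)
        theta <- rep(0, p)
        P <- diag(delta, p)                     # large initial covariance
        for (i in seq_len(nrow(X))) {
          xi <- X[i, ]
          g  <- drop(P %*% xi) / (1 + drop(t(xi) %*% P %*% xi))  # gain vector
          theta <- theta + g * (y[i] - drop(xi %*% theta))
          P <- P - outer(g, xi) %*% P
        }
        theta
      }
      set.seed(1)
      X <- cbind(1, rnorm(100))
      y <- drop(X %*% c(2, 3)) + rnorm(100)
      rls(X, y)    # close to coef(lm(y ~ X[, 2]))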

  6. Regression in autistic spectrum disorders.

    Stefanatos, Gerry A


    A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously-acquired skills. This may involve a loss of speech or social responsivity, but often entails both. This paper critically reviews the phenomenon of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.

  7. Combining Alphas via Bounded Regression

    Zura Kakushadze


    We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, typically, there is insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.

  8. Linear regression in astronomy. I

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh


    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
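
    The five slope estimators compared in the paper have closed forms; a hedged sketch following the formulas of Isobe et al. (1990):

      line_slopes <- function(x, y) {
        sxx <- sum((x - mean(x))^2)
        syy <- sum((y - mean(y))^2)
        sxy <- sum((x - mean(x)) * (y - mean(y)))
        b1  <- sxy / sxx                   # OLS(Y|X)
        b2  <- syy / sxy                   # OLS(X|Y), expressed as a Y-on-X slope
        b3  <- (b1 * b2 - 1 + sqrt((1 + b1^2) * (1 + b2^2))) / (b1 + b2)   # bisector
        b4  <- ((b2 - 1 / b1) + sign(sxy) * sqrt(4 + (b2 - 1 / b1)^2)) / 2 # orthogonal
        b5  <- sign(sxy) * sqrt(b1 * b2)   # reduced major axis
        c(ols_yx = b1, ols_xy = b2, bisector = b3, orthogonal = b4, rma = b5)
      }
      set.seed(1)
      x <- rnorm(50); y <- 2 * x + rnorm(50)
      line_slopes(x, y)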

  9. Time-adaptive quantile regression

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik


    An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....
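
    The simplex-based updating scheme itself is not reproduced here; a hedged static sketch shows the objective being solved, the check loss rho_tau(u) = u * (tau - (u < 0)), minimized numerically:

      check_loss <- function(beta, x, y, tau) {
        u <- y - beta[1] - beta[2] * x
        sum(u * (tau - (u < 0)))
      }
      set.seed(1)
      x <- runif(200)
      y <- 1 + 2 * x + rnorm(200) * (0.5 + x)          # heteroscedastic noise
      fit90 <- optim(c(0, 0), check_loss, x = x, y = y, tau = 0.9)
      fit90$par   # intercept and slope of the conditional 0.9-quantile line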

  10. Linear regression in astronomy. II

    Feigelson, Eric D.; Babu, Gutti J.


    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  11. Polynomial Regression on Riemannian Manifolds

    Hinkle, Jacob; Fletcher, P Thomas; Joshi, Sarang


    In this paper we develop the theory of parametric polynomial regression in Riemannian manifolds and Lie groups. We show application of Riemannian polynomial regression to shape analysis in Kendall shape space. Results are presented, showing the power of polynomial regression on the classic rat skull growth data of Bookstein as well as the analysis of the shape changes associated with aging of the corpus callosum from the OASIS Alzheimer's study.

  12. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung


    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects, by comparing results to those from an interaction term in linear regression. The research questions which each model answers, their…

  13. Quantile regression theory and applications

    Davino, Cristina; Vistocco, Domenico


    A guide to the implementation and interpretation of Quantile Regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model and diagnostic tools. Each methodological aspect is explored and…

  14. Business applications of multiple regression

    Richardson, Ronny


    This second edition of Business Applications of Multiple Regression describes the use of the statistical procedure called multiple regression in business situations, including forecasting and understanding the relationships between variables. The book assumes a basic understanding of statistics but reviews correlation analysis and simple regression to prepare the reader to understand and use multiple regression. The techniques described in the book are illustrated using both Microsoft Excel and a professional statistical program. Along the way, several real-world data sets are analyzed in detail…

  15. Testing discontinuities in nonparametric regression

    Dai, Wenlin


    In nonparametric regression, one often needs to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method of [13] H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100…

  16. Logistic Regression: Concept and Application

    Cokluk, Omay


    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…

  17. Fungible weights in logistic regression.

    Jones, Jeff A; Waller, Niels G


    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights.

  18. Regression Testing Cost Reduction Suite

    Mohamed Alaa El-Din


    The estimated cost of software maintenance exceeds 70 percent of total software costs [1], and a large portion of this maintenance expense is devoted to regression testing. Regression testing is an expensive and frequently executed maintenance activity used to revalidate modified software. Any reduction in the cost of regression testing would help to reduce the software maintenance cost. Test suites once developed are reused and updated frequently as the software evolves. As a result, some test cases in the test suite may become redundant when the software is modified over time, since the requirements covered by them are also covered by other test cases. Due to the resource and time constraints for re-executing large test suites, it is important to develop techniques to minimize available test suites by removing redundant test cases. In general, the test suite minimization problem is NP complete. This paper focuses on proposing an effective approach for reducing the cost of the regression testing process. The proposed approach is applied to a real-time case study. It was found that the reduction in the cost of each regression testing cycle is high for programs containing a large number of selected statements, which in turn maximizes the benefits of using the approach in regression testing of complex software systems. The reduction in the regression test suite size will reduce the effort and time required by the testing teams to execute the regression test suite. Since regression testing is done more frequently in the software maintenance phase, the overall software maintenance cost can be reduced considerably by applying the proposed approach.

  19. Rank regression: an alternative regression approach for data with outliers.

    Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin


    Linear regression models are widely used in mental health and related health services research. However, the classic linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the generated estimates are unreliable when study data includes outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad-hoc rules used for data trimming are based on subjective criteria so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that includes outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares it to the results generated using classical regression models and semi-parametric regression models.
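
    A hedged sketch of one classical rank regression: minimize Jaeckel's dispersion of residuals under Wilcoxon scores, which is far less sensitive to outliers than least squares (the intercept is recovered separately, e.g. from the residual median):

      jaeckel <- function(b, x, y) {
        e <- y - b * x
        r <- rank(e); n <- length(e)
        a <- sqrt(12) * (r / (n + 1) - 0.5)   # Wilcoxon scores (sum to zero)
        sum(a * e)                            # dispersion; free of the intercept
      }
      set.seed(1)
      x <- rnorm(100)
      y <- 1 + 2 * x + rnorm(100)
      y[1:5] <- y[1:5] + 20                   # inject gross outliers
      b_rank <- optimize(jaeckel, c(-10, 10), x = x, y = y)$minimum
      a_rank <- median(y - b_rank * x)        # intercept from residual median
      rbind(rank_fit = c(a_rank, b_rank),
            ols_fit  = coef(lm(y ~ x)))       # OLS slope is pulled by the outliers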



    This letter presents a new discriminative model for Information Retrieval (IR), referred to as the Ordinal Regression Model (ORM). ORM is different from most existing models in that it views IR as an ordinal regression (i.e. ranking) problem instead of binary classification. It is noted that the task of IR is to rank documents according to the user's information need, so IR can be viewed as an ordinal regression problem. Two parameter learning algorithms for ORM are presented. One is a perceptron-based algorithm. The other is the ranking Support Vector Machine (SVM). The effectiveness of the proposed approach has been evaluated on the task of ad hoc retrieval using three English Text REtrieval Conference (TREC) sets and two Chinese TREC sets. Results show that ORM significantly outperforms the state-of-the-art language model approaches and the OKAPI system in all test sets, and that it is more appropriate to view IR as ordinal regression rather than binary classification.

  1. Multiple Regression and Its Discontents

    Snell, Joel C.; Marsh, Mitchell


    Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.

  3. Regression methods for medical research

    Tai, Bee Choo


    Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demand an understanding of more complex and sophisticated analytic procedures. The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the…

  4. Forecasting with Dynamic Regression Models

    Pankratz, Alan


    One of the most widely used tools in statistical forecasting, the single equation regression model, is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the autocorrelation patterns of regression disturbances. It also includes six case studies.

  5. Wrong Signs in Regression Coefficients

    McGee, Holly


    When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.

  6. From Rasch scores to regression

    Christensen, Karl Bang


    Rasch models provide a framework for measurement and modelling latent variables. Having measured a latent variable in a population a comparison of groups will often be of interest. For this purpose the use of observed raw scores will often be inadequate because these lack interval scale properties...... This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables and latent regression models based on the distribution of the score....

  7. A Matlab program for stepwise regression

    Yanhong Qi


    Stepwise linear regression is a multi-variable regression technique for identifying statistically significant variables in the linear regression equation. In the present study, we present a Matlab program for stepwise regression.
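
    The Matlab program itself is not listed in the record; a hedged equivalent in base R uses step(), which selects variables by AIC rather than by significance tests:

      set.seed(1)
      n   <- 100
      dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
      dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)    # x3 is irrelevant
      full <- lm(y ~ x1 + x2 + x3, data = dat)
      step(full, direction = "both", trace = 0)      # typically retains x1 and x2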

  8. XRA image segmentation using regression

    Jin, Jesse S.


    Segmentation is an important step in image analysis. Thresholding is one of the most important approaches. There are several difficulties in segmentation, such as automatically selecting a threshold, dealing with intensity distortion, and noise removal. We have developed an adaptive segmentation scheme by applying the Central Limit Theorem in regression. A Gaussian regression is used to separate the distribution of background from foreground in a single peak histogram. The separation will help to automatically determine the threshold. A small 3 by 3 window is applied and the mode of the local histogram is used to overcome noise. Thresholding is based on local weighting, where regression is used again for parameter estimation. A connectivity test is applied to the final results to remove impulse noise. We have applied the algorithm to x-ray angiogram images to extract brain arteries. The algorithm works well for single peak distribution where there is no valley in the histogram. The regression provides a method to apply knowledge in clustering. Extending regression for multiple-level segmentation needs further investigation.

  9. Biplots in Reduced-Rank Regression

    Braak, ter C.J.F.; Looman, C.W.N.


    Regression problems with a number of related response variables are typically analyzed by separate multiple regressions. This paper shows how these regressions can be visualized jointly in a biplot based on reduced-rank regression. Reduced-rank regression combines multiple regression and principal components…

  10. Interpretation of Standardized Regression Coefficients in Multiple Regression.

    Thayer, Jerome D.

    The extent to which standardized regression coefficients (beta values) can be used to determine the importance of a variable in an equation was explored. The beta value and the part correlation coefficient--also called the semi-partial correlation coefficient and reported in squared form as the incremental "r squared"--were compared for…

  11. Inferential Models for Linear Regression

    Zuoyi Zhang


    Linear regression is arguably one of the most widely used statistical methods in applications. However, important problems, especially variable selection, remain a challenge for classical modes of inference. This paper develops a recently proposed framework of inferential models (IMs) in the linear regression context. In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense. Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis --- including model checking, variable selection, and prediction --- and for uncertain inference in general.

  12. [Is regression of atherosclerosis possible?].

    Thomas, D; Richard, J L; Emmerich, J; Bruckert, E; Delahaye, F


    Experimental studies have shown the regression of atherosclerosis in animals given a cholesterol-rich diet and then given a normal diet or hypolipidemic therapy. Despite favourable results of clinical trials of primary prevention modifying the lipid profile, the concept of atherosclerosis regression in man remains very controversial. The methodological approach is difficult: this is based on angiographic data and requires strict standardisation of angiographic views and reliable quantitative techniques of analysis which are available with image processing. Several methodologically acceptable clinical coronary studies have shown not only stabilisation but also regression of atherosclerotic lesions with reductions of about 25% in total cholesterol levels and of about 40% in LDL cholesterol levels. These reductions were obtained either by drugs as in CLAS (Cholesterol Lowering Atherosclerosis Study), FATS (Familial Atherosclerosis Treatment Study) and SCOR (Specialized Center of Research Intervention Trial), by profound modifications in dietary habits as in the Lifestyle Heart Trial, or by surgery (ileo-caecal bypass) as in POSCH (Program On the Surgical Control of the Hyperlipidemias). On the other hand, trials with non-lipid lowering drugs such as the calcium antagonists (INTACT, MHIS) have not shown significant regression of existing atherosclerotic lesions but only a decrease on the number of new lesions. The clinical benefits of these regression studies are difficult to demonstrate given the limited period of observation, relatively small population numbers and the fact that in some cases the subjects were asymptomatic. The decrease in the number of cardiovascular events therefore seems relatively modest and concerns essentially subjects who were symptomatic initially. The clinical repercussion of studies of prevention involving a single lipid factor is probably partially due to the reduction in progression and anatomical regression of the atherosclerotic plaque

  13. Nonparametric regression with filtered data

    Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid


    We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.

  14. Logistic regression for circular data

    Al-Daffaie, Kadhem; Khan, Shahjahan


    This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
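
    The linear-circular embedding amounts to entering the circular predictor through its cosine and sine; a hedged sketch with simulated data (the weather variables are stand-ins, not the Toowoomba records):

      set.seed(1)
      n <- 300
      theta <- runif(n, 0, 2 * pi)                   # circular predictor
      p     <- plogis(-0.5 + 1.5 * cos(theta) + 0.8 * sin(theta))
      rain  <- rbinom(n, 1, p)                       # binary response
      fit <- glm(rain ~ cos(theta) + sin(theta), family = binomial)
      summary(fit)$coefficients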

  15. Quasi-least squares regression

    Shults, Justine


    Drawing on the authors' substantial expertise in modeling longitudinal and clustered data, Quasi-Least Squares Regression provides a thorough treatment of quasi-least squares (QLS) regression-a computational approach for the estimation of correlation parameters within the framework of generalized estimating equations (GEEs). The authors present a detailed evaluation of QLS methodology, demonstrating the advantages of QLS in comparison with alternative methods. They describe how QLS can be used to extend the application of the traditional GEE approach to the analysis of unequally spaced longitudinal data…

  16. Regression of lumbar disk herniation

    G. Yu Evzikov


    Compression of the spinal nerve root, giving rise to pain and to sensory and motor disorders in the area of its innervation, is the most vivid manifestation of a herniated intervertebral disk. Different treatment modalities, including neurosurgery, for these conditions are discussed. There has been recent evidence that herniated disks can spontaneously regress. The paper describes a female patient with a large lateralized disc extrusion that caused compression of the nerve root S1, leading to obvious myotonic and radicular syndrome. Magnetic resonance imaging has shown that the clinical manifestations of discogenic radiculopathy, as well as myotonic syndrome and morphological changes, completely regressed 8 months later. The likely mechanism is inflammation-induced resorption of a large herniated disk fragment, which agrees with the data available in the literature. A decision to perform neurosurgery, for which the patient had indications, was made during her first consultation. After regression of the discogenic radiculopathy, there was only moderate pain caused by musculoskeletal diseases (facet syndrome, piriformis syndrome) that were successfully eliminated by minimally invasive techniques.

  17. Heteroscedasticity checks for regression models


    For checking on heteroscedasticity in regression models, a unified approach is proposed to constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not affected sensitively by the choice of smoothing parameters which are involved in estimation of the nonparametric regression function. The limiting null distribution of the test statistic remains the same in a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear, and linear autoregressive models.
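
    The paper's own statistics are not reproduced here; the following sketch illustrates the general recipe of a residual-based heteroscedasticity check calibrated by the wild bootstrap, the resampling scheme the abstract singles out as consistent. The statistic (covariance of squared residuals with the covariate) is a simplified stand-in.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    x = rng.uniform(0, 1, n)
    y = 1 + 2 * x + rng.normal(0, 0.5 + x, n)     # variance grows with x: heteroscedastic

    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                               # OLS residuals

    def stat(resid):
        # covariance of squared residuals with the covariate; zero under homoscedasticity
        return abs(np.cov(resid ** 2, x)[0, 1])

    t_obs = stat(e)
    B = 999
    t_boot = np.empty(B)
    for b in range(B):
        v = rng.choice([-1.0, 1.0], n)             # Rademacher multipliers (wild bootstrap)
        y_b = X @ beta + e * v                     # resample under the fitted mean
        e_b = y_b - X @ np.linalg.lstsq(X, y_b, rcond=None)[0]
        t_boot[b] = stat(e_b)
    p_value = (1 + np.sum(t_boot >= t_obs)) / (B + 1)
    print(p_value)                                 # small p-value flags heteroscedasticity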

  18. Cactus: An Introduction to Regression

    Hyde, Hartley


    When the author first used "VisiCalc," he thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates, he learned to use multiple linear regression software and suddenly it all clicked into…

  19. Growth Regression and Economic Theory

    Elbers, Chris; Gunning, Jan Willem


    In this note we show that the standard, loglinear growth regression specification is consistent with one and only one model in the class of stochastic Ramsey models. This model is highly restrictive: it requires a Cobb-Douglas technology and a 100% depreciation rate, and it implies that risk does not af

  20. Correlation Weights in Multiple Regression

    Waller, Niels G.; Jones, Jeff A.


    A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…
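
    As a small numerical illustration of the comparison described above, the sketch below computes the population R-squared achieved by OLS (beta) weights and by correlation weights for an invented predictor correlation structure.

    import numpy as np

    R = np.array([[1.0, 0.5],                      # predictor intercorrelation matrix
                  [0.5, 1.0]])
    r_xy = np.array([0.4, 0.5])                    # predictor-criterion correlations

    b_ols = np.linalg.solve(R, r_xy)               # OLS (beta) weights
    b_cor = r_xy                                   # correlation weights

    def pop_r2(w):
        # squared correlation between the weighted composite and the criterion
        return (w @ r_xy) ** 2 / (w @ R @ w)

    print(pop_r2(b_ols), pop_r2(b_cor))            # OLS is optimal; correlation weights come close here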

  1. Ridge Regression for Interactive Models.

    Tate, Richard L.


    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
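
    The following sketch illustrates the centering step mentioned above: with raw scores far from zero, a predictor and its product term are nearly collinear, and mean-centering removes this non-essential multicollinearity. The simulated data are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(5, 1, 1000)                     # raw scores, far from zero
    z = rng.normal(5, 1, 1000)
    print(np.corrcoef(x, x * z)[0, 1])             # high: non-essential multicollinearity

    xc, zc = x - x.mean(), z - z.mean()
    print(np.corrcoef(xc, xc * zc)[0, 1])          # near zero after centering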

  2. Logistic regression: a brief primer.

    Stoltzfus, Jill C


    Regression techniques are versatile in their application to medical research because they can measure associations, predict outcomes, and control for confounding variable effects. As one such technique, logistic regression is an efficient and powerful way to analyze the effect of a group of independent variables on a binary outcome by quantifying each independent variable's unique contribution. Using components of linear regression reflected in the logit scale, logistic regression iteratively identifies the strongest linear combination of variables with the greatest probability of detecting the observed outcome. Important considerations when conducting logistic regression include selecting independent variables, ensuring that relevant assumptions are met, and choosing an appropriate model building strategy. For independent variable selection, one should be guided by such factors as accepted theory, previous empirical investigations, clinical considerations, and univariate statistical analyses, with acknowledgement of potential confounding variables that should be accounted for. Basic assumptions that must be met for logistic regression include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers. Additionally, there should be an adequate number of events per independent variable to avoid an overfit model, with commonly recommended minimum "rules of thumb" ranging from 10 to 20 events per covariate. Regarding model building strategies, the three general types are direct/standard, sequential/hierarchical, and stepwise/statistical, with each having a different emphasis and purpose. Before reaching definitive conclusions from the results of any of these methods, one should formally quantify the model's internal validity (i.e., replicability within the same data set) and external validity (i.e., generalizability beyond the current sample). The resulting logistic regression model
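
    Two of the practical checks above lend themselves to a short sketch: counting events per independent variable against the 10 to 20 rule of thumb, and fitting a multiple logistic model. The data and variable count are invented; statsmodels is used for the fit.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n, p = 500, 4
    X = rng.normal(size=(n, p))
    eta = -1 + X @ np.array([0.8, 0.0, -0.5, 0.3])
    y = rng.binomial(1, 1 / (1 + np.exp(-eta)))    # binary outcome

    epv = min(y.sum(), n - y.sum()) / p            # events per independent variable
    print(f"events per variable: {epv:.1f}")       # compare against the 10-20 rule of thumb

    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
    print(fit.params)                              # log-odds coefficients
    print(np.exp(fit.params))                      # odds ratios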

  3. Regression Verification Using Impact Summaries

    Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana


    Regression verification techniques are used to prove equivalence of syntactically similar programs. Checking equivalence of large programs, however, can be computationally expensive. Existing regression verification techniques rely on abstraction and decomposition techniques to reduce the computational effort of checking equivalence of the entire program. These techniques are sound but not complete. In this work, we propose a novel approach to improve scalability of regression verification by classifying the program behaviors generated during symbolic execution as either impacted or unimpacted. Our technique uses a combination of static analysis and symbolic execution to generate summaries of impacted program behaviors. The impact summaries are then checked for equivalence using an off-the-shelf decision procedure. We prove that our approach is both sound and complete for sequential programs, with respect to the depth bound of symbolic execution. Our evaluation on a set of sequential C artifacts shows that reducing the size of the summaries can help reduce the cost of software equivalence checking. Various reduction, abstraction, and compositional techniques have been developed to help scale software verification techniques to industrial-sized systems. Although such techniques have greatly increased the size and complexity of systems that can be checked, analysis of large software systems remains costly. Regression analysis techniques, e.g., regression testing [16], regression model checking [22], and regression verification [19], restrict the scope of the analysis by leveraging the differences between program versions. These techniques are based on the idea that if code is checked early in development, then subsequent versions can be checked against a prior (checked) version, leveraging the results of the previous analysis to reduce analysis cost of the current version. Regression verification addresses the problem of proving equivalence of closely related program

  4. Polynomial Regressions and Nonsense Inference

    Daniel Ventosa-Santaulària


    Full Text Available Polynomial specifications are widely used, not only in applied economics, but also in epidemiology, physics, political analysis and psychology, just to mention a few examples. In many cases, the data employed to estimate such specifications are time series that may exhibit stochastic nonstationary behavior. We extend Phillips’ results (Phillips, P. Understanding spurious regressions in econometrics. J. Econom. 1986, 33, 311–340.) by proving that an inference drawn from polynomial specifications, under stochastic nonstationarity, is misleading unless the variables cointegrate. We use a generalized polynomial specification as a vehicle to study its asymptotic and finite-sample properties. Our results, therefore, lead to a call to be cautious whenever practitioners estimate polynomial regressions.
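
    A quick simulation in the spirit of this warning (not the authors' exact design): regressing one random walk on a polynomial of an independent random walk routinely produces large t-statistics despite the absence of any true relationship.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    x = np.cumsum(rng.normal(size=n))              # I(1) series, independent of y
    y = np.cumsum(rng.normal(size=n))              # another independent random walk

    X = sm.add_constant(np.column_stack([x, x ** 2]))
    fit = sm.OLS(y, X).fit()
    print(fit.tvalues)                             # typically far outside +/-2 despite no relation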

  5. Producing The New Regressive Left

    Crone, Christine

    to be a committed artist, and how that translates into supporting al-Assad’s rule in Syria; the Ramadan programme Harrir Aqlak’s attempt to relaunch an intellectual renaissance and to promote religious pluralism; and finally, al-Mayadeen’s cooperation with the pan-Latin American TV station TeleSur and its ambitions... What becomes clear from the analytical chapters is the emergence of the new cross-ideological alliance of The New Regressive Left. This emerging coalition between Shia Muslims, religious minorities, parts of the Arab Left, secular cultural producers, and the remnants of the political, strategic resistance coalition (Iran, Hizbollah, Syria), capitalises on a series of factors that bring them together in spite of their otherwise diverse worldviews and agendas. The New Regressive Left is united by resistance against the growing influence of Saudi Arabia in the religious, cultural, political, economic...

  6. Quantile Regression With Measurement Error

    Wei, Ying


    Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
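
    The corrected estimator itself is beyond a short sketch, but the bias it targets is easy to exhibit: below, a naive median regression on an error-prone covariate (via statsmodels' QuantReg) attenuates the slope toward zero. The data are simulated.

    import numpy as np
    from statsmodels.regression.quantile_regression import QuantReg

    rng = np.random.default_rng(5)
    n = 2000
    x = rng.normal(size=n)                         # true covariate
    w = x + rng.normal(scale=0.8, size=n)          # observed with measurement error
    y = 1 + 2 * x + rng.normal(size=n)

    X = np.column_stack([np.ones(n), w])
    naive = QuantReg(y, X).fit(q=0.5)              # median regression on the noisy covariate
    print(naive.params)                            # slope biased toward zero (true value is 2)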

  7. Heteroscedasticity checks for regression models

    ZHU; Lixing


    [1] Carroll, R. J., Ruppert, D., Transformation and Weighting in Regression, New York: Chapman and Hall, 1988.
    [2] Cook, R. D., Weisberg, S., Diagnostics for heteroscedasticity in regression, Biometrika, 1983, 70: 1–10.
    [3] Davidian, M., Carroll, R. J., Variance function estimation, J. Amer. Statist. Assoc., 1987, 82: 1079–1091.
    [4] Bickel, P., Using residuals robustly I: Tests for heteroscedasticity, Ann. Statist., 1978, 6: 266–291.
    [5] Carroll, R. J., Ruppert, D., On robust tests for heteroscedasticity, Ann. Statist., 1981, 9: 205–209.
    [6] Eubank, R. L., Thomas, W., Detecting heteroscedasticity in nonparametric regression, J. Roy. Statist. Soc., Ser. B, 1993, 55: 145–155.
    [7] Diblasi, A., Bowman, A., Testing for constant variance in a linear model, Statist. and Probab. Letters, 1997, 33: 95–103.
    [8] Dette, H., Munk, A., Testing heteroscedasticity in nonparametric regression, J. R. Statist. Soc. B, 1998, 60: 693–708.
    [9] Müller, H. G., Zhao, P. L., On a semi-parametric variance function model and a test for heteroscedasticity, Ann. Statist., 1995, 23: 946–967.
    [10] Stute, W., Manteiga, G., Quindimil, M. P., Bootstrap approximations in model checks for regression, J. Amer. Statist. Assoc., 1998, 93: 141–149.
    [11] Stute, W., Thies, G., Zhu, L. X., Model checks for regression: An innovation approach, Ann. Statist., 1998, 26: 1916–1939.
    [12] Shorack, G. R., Wellner, J. A., Empirical Processes with Applications to Statistics, New York: Wiley, 1986.
    [13] Efron, B., Bootstrap methods: Another look at the jackknife, Ann. Statist., 1979, 7: 1–26.
    [14] Wu, C. F. J., Jackknife, bootstrap and other re-sampling methods in regression analysis, Ann. Statist., 1986, 14: 1261–1295.
    [15] Härdle, W., Mammen, E., Comparing non-parametric versus parametric regression fits, Ann. Statist., 1993, 21: 1926–1947.
    [16] Liu, R. Y., Bootstrap procedures under some non-i.i.d. models, Ann. Statist., 1988, 16: 1696–1708.
    [17

  8. Clustered regression with unknown clusters

    Barman, Kishor


    We consider a collection of prediction experiments, which are clustered in the sense that groups of experiments exhibit similar relationship between the predictor and response variables. The experiment clusters as well as the regression relationships are unknown. The regression relationships define the experiment clusters, and in general, the predictor and response variables may not exhibit any clustering. We call this prediction problem clustered regression with unknown clusters (CRUC) and in this paper we focus on linear regression. We study and compare several methods for CRUC, demonstrate their applicability to the Yahoo Learning-to-rank Challenge (YLRC) dataset, and investigate an associated mathematical model. CRUC is at the crossroads of many prior works and we study several prediction algorithms with diverse origins: an adaptation of the expectation-maximization algorithm, an approach inspired by K-means clustering, the singular value thresholding approach to matrix rank minimization u...
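
    A minimal version of the alternating idea behind the K-means-inspired approach mentioned above: assign each experiment to the regression that fits it best, then refit each cluster's coefficients, and repeat. The toy data and the initialisation scheme are invented.

    import numpy as np

    rng = np.random.default_rng(6)
    K, n_exp, n_obs = 2, 40, 30
    true_betas = np.array([[1.0, 2.0], [3.0, -1.0]])
    Xs = rng.normal(size=(n_exp, n_obs, 2))
    labels_true = rng.integers(K, size=n_exp)      # hidden cluster of each experiment
    Ys = np.einsum('eij,ej->ei', Xs, true_betas[labels_true]) \
         + rng.normal(scale=0.1, size=(n_exp, n_obs))

    betas = rng.normal(size=(K, 2))                # random initialisation
    for _ in range(20):
        # assignment step: each experiment joins the cluster with the smallest residual
        sse = np.stack([((Ys - Xs @ b) ** 2).sum(axis=1) for b in betas], axis=1)
        labels = sse.argmin(axis=1)
        # refit step: pooled least squares within each cluster
        for k in range(K):
            Xk = Xs[labels == k].reshape(-1, 2)
            Yk = Ys[labels == k].reshape(-1)
            if len(Yk):
                betas[k] = np.linalg.lstsq(Xk, Yk, rcond=None)[0]
    print(betas)                                   # close to the true cluster coefficients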

  9. Robust nonlinear regression in applications

    Lim, Changwon; Sen, Pranab K.; Peddada, Shyamal D.


    Robust statistical methods, such as M-estimators, are needed for nonlinear regression models because of the presence of outliers/influential observations and heteroscedasticity. Outliers and influential observations are commonly observed in many applications, especially in toxicology and agricultural experiments. For example, dose response studies, which are routinely conducted in toxicology and agriculture, sometimes result in potential outliers, especially in the high dose gr...
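
    As a hedged sketch of robust nonlinear fitting in this spirit (not the authors' estimator), the code below fits an invented exponential dose-response model with a Huber loss via scipy, so that outlying high-dose observations are down-weighted.

    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(7)
    dose = np.linspace(0, 10, 60)
    resp = 5 * (1 - np.exp(-0.4 * dose)) + rng.normal(0, 0.2, 60)
    resp[-3:] += 4.0                               # outliers in the high-dose group

    def residuals(theta):
        a, b = theta                               # saturating dose-response curve
        return resp - a * (1 - np.exp(-b * dose))

    robust = least_squares(residuals, x0=[1.0, 1.0], loss='huber', f_scale=0.5)
    plain = least_squares(residuals, x0=[1.0, 1.0])  # ordinary nonlinear least squares
    print(robust.x, plain.x)                       # the robust fit stays near (5, 0.4)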

  10. Astronomical Methods for Nonparametric Regression

    Steinhardt, Charles L.; Jermyn, Adam


    I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less commonly used techniques such as Multivariate Adaptive Regression Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.

  11. Genetics Home Reference: caudal regression syndrome

    Caudal regression syndrome is a disorder that impairs the development ...

  12. On Weighted Support Vector Regression

    Han, Xixuan; Clemmensen, Line Katrine Harder


    We propose a new type of weighted support vector regression (SVR), motivated by modeling local dependencies in time and space in prediction of house prices. The classic weights of the weighted SVR are added to the slack variables in the objective function (OF‐weights). This procedure directly...... the differences and similarities of the two types of weights by demonstrating the connection between the Least Absolute Shrinkage and Selection Operator (LASSO) and the SVR. We show that an SVR problem can be transformed to a LASSO problem plus a linear constraint and a box constraint. We demonstrate...
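
    A rough Python analogue of slack-variable (OF) weighting: scikit-learn's SVR accepts per-sample weights that scale the penalty on each observation's slack. The house-price flavour of the example, and the recency weights, are invented.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(8)
    X = rng.uniform(0, 10, (200, 1))               # e.g. distance to the city centre
    y = np.sin(X.ravel()) + rng.normal(0, 0.1, 200)
    recency = rng.uniform(0, 1, 200)               # hypothetical recency of each sale

    model = SVR(kernel='rbf', C=10.0, epsilon=0.05)
    model.fit(X, y, sample_weight=recency)         # recent observations weigh more
    print(model.predict([[5.0]]))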

  13. Multiatlas segmentation as nonparametric regression.

    Awate, Suyash P; Whitaker, Ross T


    This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.

  14. Prediction, Regression and Critical Realism

    Næss, Petter


    This paper considers the possibility of prediction in land use planning, and the use of statistical research methods in analyses of relationships between urban form and travel behaviour. Influential writers within the tradition of critical realism reject the possibility of predicting social phenomena. This position is fundamentally problematic to public planning. Without at least some ability to predict the likely consequences of different proposals, the justification for public sector intervention into market mechanisms will be frail. Statistical methods like regression analyses are commonly… of prediction necessary and possible in spatial planning of urban development. Finally, the political implications of positions within theory of science rejecting the possibility of predictions about social phenomena are addressed.

  15. Nonparametric Regression with Common Shocks

    Eduardo A. Souza-Rodrigues


    Full Text Available This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small) number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.

  16. Practical Session: Multiple Linear Regression

    Clausel, M.; Grégoire, G.


    Three exercises are proposed to illustrate simple linear regression. The first investigates the influence of several factors on atmospheric pollution; it has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of and is based on data coming from 20 U.S. cities. Exercise 2 is an introduction to model selection, whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of

  17. Lumbar herniated disc: spontaneous regression

    Yüksel, Kasım Zafer


    Background Low back pain is a frequent condition that results in substantial disability and causes admission of patients to neurosurgery clinics. The aim of this study was to evaluate and present the therapeutic outcomes in lumbar disc hernia (LDH) patients treated by means of a conservative approach, consisting of bed rest and medical therapy. Methods This retrospective cohort study was carried out in the neurosurgery departments of hospitals in Kahramanmaraş city; 23 patients diagnosed with LDH at the levels of L3−L4, L4−L5 or L5−S1 were enrolled. Results The average age was 38.4 ± 8.0 years, and the chief complaint was low back pain and sciatica radiating to one or both lower extremities. Conservative treatment was administered. Neurological examination findings, durations of treatment and intervals until symptomatic recovery were recorded. Lasègue tests and neurosensory examination revealed that mild neurological deficits existed in 16 of our patients. Previously, 5 patients had received physiotherapy and 7 patients had been on medical treatment. The numbers of patients with LDH at the levels of L3−L4, L4−L5, and L5−S1 were 1, 13, and 9, respectively. All patients reported that they had benefited from medical treatment and bed rest, and radiologic improvement was observed simultaneously on MRI scans. The average duration until symptomatic recovery and/or regression of LDH symptoms was 13.6 ± 5.4 months (range: 5−22). Conclusions It should be kept in mind that lumbar disc hernias can regress with medical treatment and rest, without surgery, and there should be an awareness that these patients can recover radiologically. This must be taken into account during decision making for surgical intervention in LDH patients devoid of indications for emergent surgery. PMID:28119770

  18. Credit Scoring Problem Based on Regression Analysis

    Khassawneh, Bashar Suhil Jad Allah


    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. The aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models, and to analyze the resulting model functions with statistical tools. Keywords: Data mining, linear regression, logistic regression....

  19. Varying-coefficient functional linear regression

    Wu, Yichao; Müller, Hans-Georg; 10.3150/09-BEJ231


    Functional linear regression analysis aims to model regression relations which include a functional predictor. The analog of the regression parameter vector or matrix in conventional multivariate or multiple-response linear regression models is a regression parameter function in one or two arguments. If, in addition, one has scalar predictors, as is often the case in applications to longitudinal studies, the question arises how to incorporate these into a functional regression model. We study a varying-coefficient approach where the scalar covariates are modeled as additional arguments of the regression parameter function. This extension of the functional linear regression model is analogous to the extension of conventional linear regression models to varying-coefficient models and shares its advantages, such as increased flexibility; however, the details of this extension are more challenging in the functional case. Our methodology combines smoothing methods with regularization by truncation at a finite numb...

  20. Functional Regression for Quasar Spectra

    Ciollaro, Mattia; Freeman, Peter; Genovese, Christopher; Lei, Jing; O'Connell, Ross; Wasserman, Larry


    The Lyman-alpha forest is a portion of the observed light spectrum of distant galactic nuclei which allows us to probe remote regions of the Universe that are otherwise inaccessible. The observed Lyman-alpha forest of a quasar light spectrum can be modeled as a noisy realization of a smooth curve that is affected by a `damping effect' which occurs whenever the light emitted by the quasar travels through regions of the Universe with higher matter concentration. To decode the information conveyed by the Lyman-alpha forest about the matter distribution, we must be able to separate the smooth `continuum' from the noise and the contribution of the damping effect in the quasar light spectra. To predict the continuum in the Lyman-alpha forest, we use a nonparametric functional regression model in which both the response and the predictor variable (the smooth part of the damping-free portion of the spectrum) are function-valued random variables. We demonstrate that the proposed method accurately predicts the unobserv...

  1. Knowledge and Awareness: Linear Regression

    Monika Raghuvanshi


    Full Text Available Knowledge and awareness are factors guiding the development of an individual. These may seem simple and practicable, but in reality a proper combination of these is a complex task. An economically driven state of development in younger generations is an impediment to the correct manner of development. As youths are at the learning phase, they can be molded to follow a correct lifestyle. Awareness and knowledge are important components of any formal or informal environmental education. The purpose of this study is to evaluate the relationship of these components among students of secondary/senior secondary schools who have undergone a formal study of environment in their curricula. A suitable instrument was developed in order to measure the elements of Awareness and Knowledge among the participants of the study. Data was collected from various secondary and senior secondary school students in the age group 14 to 20 years using a cluster sampling technique in the city of Bikaner, India. Linear regression analysis was performed using the IBM SPSS 23 statistical tool. There is a weak relationship between knowledge and awareness of environmental issues, caused by the mishandling of routine practices; hence, one component can be complemented by the other for improvement in both. Knowledge and awareness are crucial factors and can provide huge opportunities in any field. Resource utilization for economic solutions may pave the way for eco-friendly products and practices. If green practices are inculcated at the learning phase, they may become normal routine. This will also help in repletion of the environment.

  2. Streamflow forecasting using functional regression

    Masselot, Pierre; Dabo-Niang, Sophie; Chebana, Fateh; Ouarda, Taha B. M. J.


    Streamflow, as a natural phenomenon, is continuous in time, and so are the meteorological variables which influence its variability. In practice, it can be of interest to forecast the whole flow curve instead of points (daily or hourly). To this end, this paper introduces functional linear models and adapts them to hydrological forecasting. More precisely, functional linear models are regression models based on curves instead of single values. They make it possible to consider the whole process instead of a limited number of time points or features. We apply these models to analyse the flow volume and the whole streamflow curve during a given period by using precipitation curves. The functional model is shown to lead to encouraging results. The potential of functional linear models to detect special features that would have been hard to see otherwise is pointed out. The functional model is also compared to the artificial neural network approach, and the advantages and disadvantages of both models are discussed. Finally, future research directions involving the functional model in hydrology are presented.

  3. Principal component regression analysis with SPSS.

    Liu, R X; Kuang, J; Gong, Q; Hou, X L


    The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and the determination of the 'best' equation. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance of multicollinearity, and a simpler, faster and accurate statistical analysis is achieved through principal component regression with SPSS.
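
    The same workflow the paper carries out in SPSS can be sketched in Python: standardise the predictors, extract principal components, and regress the response on the retained components. The collinear toy data are invented.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(9)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.05, size=100)     # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = 3 * x1 + rng.normal(size=100)

    # principal component regression: standardise -> PCA -> OLS on the components
    pcr = make_pipeline(StandardScaler(), PCA(n_components=1), LinearRegression())
    pcr.fit(X, y)
    print(pcr.score(X, y))                         # stable fit despite the multicollinearity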

  4. Spontaneous Regression of an Incidental Spinal Meningioma

    Yilmaz, Ali; Kizilay, Zahir; Sair, Ahmet; Avcil, Mucahit; Ozkul, Ayca


    AIM: The regression of meningioma has been reported in the literature before. Although regression may be associated with hemorrhage, calcification or the withdrawal of certain drugs, it is rarely observed spontaneously. CASE REPORT...

  5. Common pitfalls in statistical analysis: Logistic regression.

    Ranganathan, Priya; Pramesh, C S; Aggarwal, Rakesh


    Logistic regression analysis is a statistical technique to evaluate the relationship between various predictor variables (either categorical or continuous) and an outcome which is binary (dichotomous). In this article, we discuss logistic regression analysis and the limitations of this technique.

  6. Semiparametric regression during 2003–2007

    Ruppert, David


    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.
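
    A minimal example of the low-rank penalized splines this literature is built on: a truncated-line basis with a ridge penalty on the knot coefficients, fitted by penalized least squares. The smoothing parameter is fixed by hand here rather than estimated through the mixed-model representation the abstract mentions.

    import numpy as np

    rng = np.random.default_rng(10)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

    knots = np.linspace(0, 1, 20)[1:-1]            # low rank: 18 interior knots
    Z = np.maximum(x[:, None] - knots[None, :], 0) # truncated-line basis
    C = np.column_stack([np.ones_like(x), x, Z])   # intercept, slope, knot terms

    lam = 1.0                                      # smoothing parameter (fixed by hand)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))   # penalise only the knot coefficients
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    fitted = C @ coef                              # smooth fit to the noisy curve
    print(np.round(coef[:2], 2))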

  7. Unbalanced Regressions and the Predictive Equation

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoreti...

  8. Standards for Standardized Logistic Regression Coefficients

    Menard, Scott


    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  9. Synthesizing Regression Results: A Factored Likelihood Method

    Wu, Meng-Jia; Becker, Betsy Jane


    Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. This method uses the correlations reported…

  10. Regression Analysis by Example. 5th Edition

    Chatterjee, Samprit; Hadi, Ali S.


    Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…

  13. Regression with Sparse Approximations of Data

    Noorzad, Pardis; Sturm, Bob L.


    We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of k-nearest neighbors regression (k-NNR), and more generally, local polynomial kernel regression. Unlike k-NNR, however, SPARROW can adapt the number of regressors to use based...

  14. Assumptions of Multiple Regression: Correcting Two Misconceptions

    Matt N. Williams


    Full Text Available In 2002, an article entitled "Four assumptions of multiple regression that researchers should always test" by Osborne and Waters was published in PARE. This article has gone on to be viewed more than 275,000 times (as of August 2013), and it is one of the first results displayed in a Google search for "regression assumptions". While Osborne and Waters' efforts in raising awareness of the need to check assumptions when using regression are laudable, we note that the original article contained at least two fairly important misconceptions about the assumptions of multiple regression: firstly, that multiple regression requires the assumption of normally distributed variables; and secondly, that measurement errors necessarily cause underestimation of simple regression coefficients. In this article, we clarify that multiple regression models estimated using ordinary least squares require the assumption of normally distributed errors in order for trustworthy inferences, at least in small samples, but not the assumption of normally distributed response or predictor variables. Secondly, we point out that regression coefficients in simple regression models will be biased (toward zero) estimates of the relationships between variables of interest when measurement error is uncorrelated across those variables, but that when correlated measurement error is present, regression coefficients may be either upwardly or downwardly biased. We conclude with a brief corrected summary of the assumptions of multiple regression when using ordinary least squares.

  15. Functional linear regression via canonical analysis

    He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; 10.3150/09-BEJ228


    We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.

  16. Regression in children with autism spectrum disorders.

    Malhi, Prahbhjot; Singhi, Pratibha


    To understand the characteristics of autistic regression and to compare the clinical and developmental profile of children with autism spectrum disorders (ASD) in whom parents report developmental regression with age-matched ASD children in whom no regression is reported. Participants were 35 (mean age = 3.57 y, SD = 1.09) children with ASD in whom parents reported developmental regression before age 3 y and a group of age and IQ matched 35 ASD children in whom parents did not report regression. All children were recruited from the outpatient Child Psychology Clinic of the Department of Pediatrics of a tertiary care teaching hospital in North India. Multi-disciplinary evaluations including neurological, diagnostic, cognitive, and behavioral assessments were done. Parents were asked in detail about the age at onset of regression, type of regression, milestones lost, and event, if any, related to the regression. In addition, the Childhood Autism Rating Scale (CARS) was administered to assess symptom severity. The mean age at regression was 22.43 mo (SD = 6.57) and a large majority (66.7%) of the parents reported regression between 12 and 24 mo. Most (75%) of the parents of the regression-autistic group reported regression in the language domain, particularly in the expressive language sector, usually between 18 and 24 mo of age. Regression of language was not an isolated phenomenon, and regression in other domains was also reported, including social skills (75%) and cognition (31.25%). In the majority of cases (75%) the regression reported was slow and subtle. There were no significant differences in the motor, social, self-help, and communication functioning between the two groups as measured by the DP II. There were also no significant differences between the two groups on the total CARS score and total number of DSM IV symptoms endorsed. However, the regressed children had significantly (t = 2.36, P = .021) more social deficits as per the DSM IV as

  17. Using Regression Mixture Analysis in Educational Research

    Cody S. Ding


    Full Text Available Conventional regression analysis is typically used in educational research. Usually such an analysis implicitly assumes that a common set of regression parameter estimates captures the population characteristics represented in the sample. In some situations, however, this implicit assumption may not be realistic, and the sample may contain several subpopulations such as high math achievers and low math achievers. In these cases, conventional regression models may provide biased estimates since the parameter estimates are constrained to be the same across subpopulations. This paper advocates the application of regression mixture models, also known as latent class regression analysis, in educational research. Regression mixture analysis is more flexible than conventional regression analysis in that latent classes in the data can be identified and regression parameter estimates can vary within each latent class. An illustration of regression mixture analysis is provided based on an authentic dataset. The strengths and limitations of regression mixture models are discussed in the context of educational research.

  18. Regression modeling methods, theory, and computation with SAS

    Panik, Michael


    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  19. Beta blockers & left ventricular hypertrophy regression.

    George, Thomas; Ajit, Mullasari S; Abraham, Georgi


    Left ventricular hypertrophy (LVH), particularly in hypertensive patients, is a strong predictor of adverse cardiovascular events. Identifying LVH helps not only in prognostication but also in the choice of therapeutic drugs. The prevalence of LVH is age-linked and has a direct correlation to the severity of hypertension. Adequate control of blood pressure, most importantly central aortic pressure, and blocking the effects of cardiomyocyte stimulatory growth factors like Angiotensin II help in regression of LVH. Among the various antihypertensives, ACE inhibitors and angiotensin receptor blockers are more potent than other drugs in regressing LVH. Beta blockers, especially the newer cardioselective ones, still have a role in regressing LVH, albeit a minor one. A meta-analysis of various studies on LVH regression shows many lacunae. There have been no consistent criteria for defining LVH and documenting LVH regression. This article reviews current evidence on the role of beta blockers in LVH regression.

  20. Applied regression analysis a research tool

    Pantula, Sastry; Dickey, David


    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  1. High-dimensional regression with unknown variance

    Giraud, Christophe; Verzelen, Nicolas


    We review recent results for high-dimensional sparse linear regression in the practical case of unknown variance. Different sparsity settings are covered, including coordinate-sparsity, group-sparsity and variation-sparsity. The emphasis is on non-asymptotic analyses and feasible procedures. In addition, a small numerical study compares the practical performance of three schemes for tuning the Lasso estimator, and some references are collected for some more general models, including multivariate regression and nonparametric regression.

  2. Regression calibration with heteroscedastic error variance.

    Spiegelman, Donna; Logan, Roger; Grove, Douglas


    The problem of covariate measurement error with heteroscedastic measurement error variance is considered. Standard regression calibration assumes that the measurement error has a homoscedastic measurement error variance. An estimator is proposed to correct regression coefficients for covariate measurement error with heteroscedastic variance. Point and interval estimates are derived. Validation data containing the gold standard must be available. This estimator is a closed-form correction of the uncorrected primary regression coefficients, which may be of logistic or Cox proportional hazards model form, and is closely related to the version of regression calibration developed by Rosner et al. (1990). The primary regression model can include multiple covariates measured without error. The use of these estimators is illustrated in two data sets, one taken from occupational epidemiology (the ACE study) and one taken from nutritional epidemiology (the Nurses' Health Study). In both cases, although there was evidence of moderate heteroscedasticity, there was little difference in estimation or inference using this new procedure compared to standard regression calibration. It is shown theoretically that unless the relative risk is large or measurement error severe, standard regression calibration approximations will typically be adequate, even with moderate heteroscedasticity in the measurement error model variance. In a detailed simulation study, standard regression calibration performed either as well as or better than the new estimator. When the disease is rare and the errors normally distributed, or when measurement error is moderate, standard regression calibration remains the method of choice.
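
    A bare-bones sketch of standard regression calibration for a linear outcome model (the paper treats logistic and Cox models, and its heteroscedastic-variance correction is not reproduced here): the error-prone covariate is replaced by its estimated conditional expectation, learned from validation data containing the gold standard. All data are simulated.

    import numpy as np

    rng = np.random.default_rng(12)
    n_main, n_val = 2000, 300
    x_main = rng.normal(size=n_main)                     # true covariate, unobserved in the main study
    w_main = x_main + rng.normal(scale=0.7, size=n_main) # observed surrogate
    y = 0.5 + 1.0 * x_main + rng.normal(scale=0.5, size=n_main)

    x_val = rng.normal(size=n_val)                       # validation study: both X and W observed
    w_val = x_val + rng.normal(scale=0.7, size=n_val)

    # calibration model E[X | W], estimated from the validation data
    g1, g0 = np.polyfit(w_val, x_val, deg=1)
    x_hat = g0 + g1 * w_main                             # imputed covariate for the main study

    naive = np.polyfit(w_main, y, deg=1)[0]              # attenuated slope
    corrected = np.polyfit(x_hat, y, deg=1)[0]           # regression-calibrated slope
    print(naive, corrected)                              # corrected slope is near the true 1.0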

  3. Enhanced piecewise regression based on deterministic annealing

    ZHANG JiangShe; YANG YuQian; CHEN XiaoWen; ZHOU ChengHu


    Regression is one of the important problems in statistical learning theory. This paper proves the global convergence of the piecewise regression algorithm based on deterministic annealing and the continuity of the global minimum of free energy w.r.t. temperature, and derives a new simplified formula to compute the initial critical temperature. A new enhanced piecewise regression algorithm using "migration of prototypes" is proposed to eliminate "empty cells" in the annealing process. Numerical experiments on several benchmark datasets show that the new algorithm can remove redundancy and improve generalization of the piecewise regression model.

  4. Geodesic least squares regression on information manifolds

    Verdoolaege, Geert (Department of Applied Physics, Ghent University, Ghent, Belgium and Laboratory for Plasma Physics, Royal Military Academy, Brussels, Belgium)


    We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.

  5. [From clinical judgment to linear regression model].

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O


    When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R(2)) indicates the importance of the independent variables in the outcome.
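
    A short worked example of the equation in the text, Y = a + bx, with invented clinical values: the slope, intercept and coefficient of determination are estimated by ordinary least squares.

    import numpy as np

    age = np.array([25, 32, 41, 50, 58, 63])       # hypothetical predictor X (years)
    sbp = np.array([118, 121, 128, 134, 141, 146]) # hypothetical outcome Y (mmHg)

    b, a = np.polyfit(age, sbp, deg=1)             # slope "b" and intercept "a"
    r2 = np.corrcoef(age, sbp)[0, 1] ** 2          # coefficient of determination R^2
    print(f"Y = {a:.1f} + {b:.2f}x, R^2 = {r2:.3f}")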

  6. Logistic Regression for Evolving Data Streams Classification

    YIN Zhi-wu; HUANG Shang-teng; XUE Gui-rong


    Logistic regression is a fast classifier and can achieve higher accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties of logistic regression, this paper proposes an algorithm, called the evolutionary logistic regression classifier (ELRClass), to solve the classification of evolving data streams. This algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep this classifier if its performance deteriorates because of bursting noise, or to construct a new classifier if a major concept drift is detected. The intensive experimental results demonstrate the effectiveness of this algorithm.

  7. New ridge parameters for ridge regression

    A.V. Dorugade


    Full Text Available Hoerl and Kennard (1970a) introduced the ridge regression estimator as an alternative to the ordinary least squares (OLS) estimator in the presence of multicollinearity. In ridge regression, the ridge parameter plays an important role in parameter estimation. In this article, a new method for estimating ridge parameters in both situations of ordinary ridge regression (ORR) and generalized ridge regression (GRR) is proposed. The simulation study evaluates the performance of the proposed estimator based on the mean squared error (MSE) criterion and indicates that under certain conditions the proposed estimators perform well compared to OLS and other well-known estimators reviewed in this article.
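
    For concreteness, one classical data-driven choice of the ridge parameter (the Hoerl-Kennard-Baldwin estimate k = p * sigma^2 / (beta' beta), not the new estimators proposed in the article) can be computed and plugged into the ridge estimator as follows; the collinear data are simulated.

    import numpy as np

    rng = np.random.default_rng(11)
    n, p = 100, 3
    X = rng.normal(size=(n, p))
    X[:, 2] = X[:, 0] + rng.normal(scale=0.01, size=n)  # induce multicollinearity
    y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = ((y - X @ beta_ols) ** 2).sum() / (n - p)  # residual variance estimate
    k = p * sigma2 / (beta_ols @ beta_ols)              # HKB ridge parameter
    beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    print(k, beta_ridge)                                # shrunken, stabilised coefficients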

  8. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem-free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…

  9. Incremental Net Effects in Multiple Regression

    Lipovetsky, Stan; Conklin, Michael


    A regular problem in regression analysis is estimating the comparative importance of the predictors in the model. This work considers the 'net effects', or shares of the predictors in the coefficient of multiple determination, which is a widely used characteristic of the quality of a regression model. Estimation of the net effects can be a…

  10. Regression Analysis and the Sociological Imagination

    De Maio, Fernando


    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  11. Dealing with Outliers: Robust, Resistant Regression

    Glasser, Leslie


    Least-squares linear regression is the best of statistics and it is the worst of statistics. The reasons for this paradoxical claim, arising from possible inapplicability of the method and the excessive influence of "outliers", are discussed and substitute regression methods based on median selection, which is both robust and resistant, are…

  12. Competing Risks Quantile Regression at Work

    Dlugosz, Stephan; Lo, Simon M. S.; Wilke, Ralf


    Despite its emergence as a frequently used method for the empirical analysis of multivariate data, quantile regression is yet to become a mainstream tool for the analysis of duration data. We present a pioneering empirical study on the grounds of a competing risks quantile regression model. We use...

  13. Implementing Variable Selection Techniques in Regression.

    Thayer, Jerome D.

    Variable selection techniques in stepwise regression analysis are discussed. In stepwise regression, variables are added or deleted from a model in sequence to produce a final "good" or "best" predictive model. Stepwise computer programs are discussed and four different variable selection strategies are described. These…

  14. Regression Model With Elliptically Contoured Errors

    Arashi, M; Tabatabaey, S M M


    For the regression model where the errors follow the elliptically contoured distribution (ECD), we consider the least squares (LS), restricted LS (RLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators for the regression parameters. We compare the quadratic risks of the estimators to determine the relative dominance properties of the five estimators.

  16. A Simulation Investigation of Principal Component Regression.

    Allen, David E.

    Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…

  17. Should metacognition be measured by logistic regression?

    Rausch, Manuel; Zehetleitner, Michael


    Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Atherosclerotic plaque regression: fact or fiction?

    Shanmugam, Nesan; Román-Rego, Ana; Ong, Peter; Kaski, Juan Carlos


    Coronary artery disease is the major cause of death in the western world. The formation and rapid progression of atheromatous plaques can lead to serious cardiovascular events in patients with atherosclerosis. The better understanding, in recent years, of the mechanisms leading to atheromatous plaque growth and disruption and the availability of powerful HMG CoA-reductase inhibitors (statins) has permitted the consideration of plaque regression as a realistic therapeutic goal. This article reviews the existing evidence underpinning current therapeutic strategies aimed at achieving atherosclerotic plaque regression. In this review we also discuss imaging modalities for the assessment of plaque regression, predictors of regression and whether plaque regression is associated with a survival benefit.

  19. Pathological assessment of liver fibrosis regression

    WANG Bingqiong


    Full Text Available Hepatic fibrosis is the common pathological outcome of chronic hepatic diseases. An accurate assessment of fibrosis degree provides an important reference for a definite diagnosis of diseases, treatment decision-making, treatment outcome monitoring, and prognostic evaluation. At present, many clinical studies have proven that regression of hepatic fibrosis and early-stage liver cirrhosis can be achieved by effective treatment, and a correct evaluation of fibrosis regression has become a hot topic in clinical research. Liver biopsy has long been regarded as the gold standard for the assessment of hepatic fibrosis, and thus it plays an important role in the evaluation of fibrosis regression. This article reviews the clinical application of current pathological staging systems in the evaluation of fibrosis regression from the perspectives of semi-quantitative scoring system, quantitative approach, and qualitative approach, in order to propose a better pathological evaluation system for the assessment of fibrosis regression.

  20. Quantile regression applied to spectral distance decay

    Rocchini, D.; Cade, B.S.


    Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trends in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The estimated decay rates were statistically nonzero, with higher species similarity where habitats are spectrally more similar. We demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression.
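
    A minimal Python sketch of the contrast drawn above, assuming statsmodels and synthetic, zero-inflated similarity data (names and data-generating choices are illustrative only):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        spectral_dist = rng.uniform(0, 1, 300)
        # Similarity decays with spectral distance; excess zeros add noise below the trend.
        similarity = np.clip(1.0 - spectral_dist + rng.normal(0, 0.2, 300), 0, None)
        similarity[rng.random(300) < 0.3] = 0.0

        X = sm.add_constant(spectral_dist)
        ols = sm.OLS(similarity, X).fit()               # mean trend
        q90 = sm.QuantReg(similarity, X).fit(q=0.9)     # upper-quantile trend
        print("OLS slope:", ols.params[1], "90th-percentile slope:", q90.params[1])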

  1. Hypotheses testing for fuzzy robust regression parameters

    Kula, Kamile Sanli (Ahi Evran University, Department of Mathematics, 40200 Kirsehir, Turkey); Apaydin, Aysen (Ankara University, Department of Statistics, 06100 Ankara, Turkey)


    The classical least squares (LS) method is widely used in regression analysis because its estimates are easy to compute and traditional. However, LS estimators are very sensitive to outliers and to other deviations from the basic assumptions of normal theory [Huynh H. A comparison of four approaches to robust regression. Psychol Bull 1982;92:505-12; Stephenson D. 2000; Xu R, Li C. Multidimensional least-squares fitting with a fuzzy model. Fuzzy Sets and Systems 2001;119:215-23]. If outliers exist in the data set, robust methods are preferred for estimating parameter values. We proposed a fuzzy robust regression method using fuzzy numbers when x is crisp and Y is a triangular fuzzy number; in case of outliers in the data set, a weight matrix is defined by the membership function of the residuals. In fuzzy robust regression, fuzzy sets and fuzzy regression analysis are used in the ranking of residuals and in the estimation of regression parameters, respectively [Sanli K, Apaydin A. Fuzzy robust regression analysis based on the ranking of fuzzy sets. Internat J Uncertainty Fuzziness and Knowledge-Based Syst 2008;16:663-81]. In this study, standard deviation estimates are obtained for the parameters by means of the defined weight matrix. Moreover, we propose another point of view on hypothesis testing for the parameters.

  2. Regression modeling of ground-water flow

    Cooley, R.L.; Naff, R.L.


    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. The text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of the basic statistics necessary to properly apply regression techniques, and then to the main topic: the exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to give the student practice with nearly all the methods presented for modeling and statistical analysis. Three computer programs implement the more complex methods: a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium; a program to calculate a measure of model nonlinearity with respect to the regression parameters; and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  3. Relative risk regression analysis of epidemiologic data.

    Prentice, R L


    Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiologic cohort study, relative risk regression methods extend conventional survival data methods and binary response (e.g., logistic) regression models by taking explicit account of the time to disease occurrence while allowing arbitrary baseline disease rates, general censorship, and time-varying risk factors. This latter feature is particularly relevant to many environmental risk assessment problems wherein one wishes to relate disease rates at a particular point in time to aspects of a preceding risk factor history. Relative risk regression methods also adapt readily to time-matched case-control studies and to certain less standard designs. The uses of relative risk regression methods are illustrated and the state of development of these procedures is discussed. It is argued that asymptotic partial likelihood estimation techniques are now well developed in the important special case in which the disease rates of interest have interpretations as counting process intensity functions. Estimation of relative risks processes corresponding to disease rates falling outside this class has, however, received limited attention. The general area of relative risk regression model criticism has, as yet, not been thoroughly studied, though a number of statistical groups are studying such features as tests of fit, residuals, diagnostics and graphical procedures. Most such studies have been restricted to exponential form relative risks as have simulation studies of relative risk estimation
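
    For readers wanting a concrete starting point, the snippet below fits a Cox proportional hazards model, the classical core of relative risk regression, using the lifelines Python package; the bundled Rossi recidivism data merely stand in for an epidemiologic cohort.

        from lifelines import CoxPHFitter
        from lifelines.datasets import load_rossi

        df = load_rossi()                    # week = time to event, arrest = event indicator
        cph = CoxPHFitter()
        cph.fit(df, duration_col="week", event_col="arrest")
        cph.print_summary()                  # exp(coef) column gives the relative risks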

  4. Variable and subset selection in PLS regression

    Høskuldsson, Agnar


    The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation when we use different subsets of X, than…
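
    The subset point can be sketched with sklearn on synthetic, spectra-like data: cross-validated PLS scores are compared for the full variable set and for an informative subset (all names and sizes below are invented).

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)
        X = rng.normal(size=(60, 20))                   # 20 spectral-like variables
        y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 60)

        pls = PLSRegression(n_components=3)
        print(cross_val_score(pls, X, y, cv=5).mean())          # all variables
        print(cross_val_score(pls, X[:, :5], y, cv=5).mean())   # informative subset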

  5. Applied Regression Modeling A Business Approach

    Pardoe, Iain


    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a…

  6. Regressive language in severe head injury.

    Thomsen, I V; Skinhoj, E


    In a follow-up study of 50 patients with severe head injuries, three patients had echolalia. One patient with initially global aphasia had echolalia for some weeks when he started talking. Another patient with severe diffuse brain damage, dementia, and emotional regression had echolalia; the dysfunction was considered a detour performance. In the third patient, echolalia and palilalia were details in a total pattern of regression lasting for months. The patient, who had extensive frontal atrophy secondary to a very severe head trauma, presented an extreme state of regression, returning to a foetal body pattern and behaving like a baby.

  7. Regression of altitude-produced cardiac hypertrophy.

    Sizemore, D. A.; Mcintyre, T. W.; Van Liere, E. J.; Wilson, M. F.


    The rate of regression of cardiac hypertrophy with time has been determined in adult male albino rats. The hypertrophy was induced by intermittent exposure to simulated high altitude. The percentage hypertrophy was much greater (46%) in the right ventricle than in the left (16%). The regression could be adequately fitted by a single exponential function with a half-time of 6.73 ± 0.71 days (90% CI). There was no significant difference in the rates of regression for the two ventricles.

  8. Regression of altitude-produced cardiac hypertrophy.

    Sizemore, D. A.; Mcintyre, T. W.; Van Liere, E. J.; Wilson, M. F.


    The rate of regression of cardiac hypertrophy with time has been determined in adult male albino rats. The hypertrophy was induced by intermittent exposure to simulated high altitude. The percentage hypertrophy was much greater (46%) in the right ventricle than in the left (16%). The regression could be adequately fitted by a single exponential function with a half-time of 6.73 ± 0.71 days (90% CI). There was no significant difference in the rates of regression for the two ventricles.

  9. A comparison of regression and regression-kriging for soil characterization using remote sensing imagery

    In precision agriculture, regression has been used widely to quantify the relationship between soil attributes and other environmental variables. However, the spatial correlation existing in soil samples usually makes the regression model suboptimal. In this study, a regression-kriging method was attemp…
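
    The general regression-kriging recipe, a linear trend on covariates plus a Gaussian process kriging the residuals over coordinates, can be sketched as follows; the data and names are invented, and this is not the study's implementation.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(3)
        coords = rng.uniform(0, 10, size=(100, 2))      # sample locations
        covariates = rng.normal(size=(100, 2))          # e.g. remote-sensing bands
        soil = covariates @ np.array([1.5, -0.8]) + np.sin(coords[:, 0]) \
               + rng.normal(0, 0.1, 100)

        trend = LinearRegression().fit(covariates, soil)
        resid = soil - trend.predict(covariates)        # spatially correlated residuals
        gp = GaussianProcessRegressor(kernel=RBF(2.0) + WhiteKernel(0.01)).fit(coords, resid)

        # Prediction = regression trend + kriged residual.
        pred = trend.predict(covariates) + gp.predict(coords)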

  10. Multiple Instance Regression with Structured Data

    Wagstaff, Kiri L.; Lane, Terran; Roper, Alex


    This slide presentation reviews the use of multiple instance regression with structured data from multiple and related data sets. It applies the concept to a practical problem: estimating crop yield from remotely sensed, country-wide weekly observations.

  11. Prediction of Dynamical Systems by Symbolic Regression

    Quade, Markus; Shafi, Kamran; Niven, Robert K; Noack, Bernd R


    We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting a...

  12. Some Simple Computational Formulas for Multiple Regression

    Aiken, Lewis R., Jr.


    Short-cut formulas are presented for direct computation of the beta weights, the standard errors of the beta weights, and the multiple correlation coefficient for multiple regression problems involving three independent variables and one dependent variable. (Author)
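
    The flavor of such short-cut formulas is easy to reproduce: with standardized variables, the beta weights solve R_xx · b = r_xy and the squared multiple correlation is b'r_xy. A small numpy sketch with made-up correlations for three predictors:

        import numpy as np

        Rxx = np.array([[1.0, 0.3, 0.2],
                        [0.3, 1.0, 0.4],
                        [0.2, 0.4, 1.0]])     # correlations among the three predictors
        rxy = np.array([0.5, 0.4, 0.3])       # predictor-criterion correlations

        beta = np.linalg.solve(Rxx, rxy)      # standardized beta weights
        R2 = beta @ rxy                       # squared multiple correlation
        print(beta, R2)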

  13. Spontaneous Regression of an Incidental Spinal Meningioma

    Ali Yilmaz


    AIM: The regression of meningiomas has been reported in the literature before. Although regression may be induced by hemorrhage, calcification, or the withdrawal of some drugs, it is rarely observed spontaneously. CASE REPORT: We report a 17-year-old man with an incidentally detected cervical meningioma. Cervical MRI showed an extradural, cranio-caudally contrast-enhanced lesion at the C2-C3 levels of the cervical spinal cord. Despite slight compression of the spinal cord, he had no symptoms and refused any kind of surgical approach. The meningioma was followed with control MRI and spontaneously regressed within six months. There were no signs of hemorrhage or calcification. CONCLUSION: Although it is a rare condition, clinicians should consider that meningiomas, especially those diagnosed incidentally, may regress spontaneously.

  14. Spontaneous Regression of an Incidental Spinal Meningioma.

    Yilmaz, Ali; Kizilay, Zahir; Sair, Ahmet; Avcil, Mucahit; Ozkul, Ayca


    The regression of meningiomas has been reported in the literature before. Although regression may be induced by hemorrhage, calcification, or the withdrawal of some drugs, it is rarely observed spontaneously. We report a 17-year-old man with an incidentally detected cervical meningioma. Cervical MRI showed an extradural, cranio-caudally contrast-enhanced lesion at the C2-C3 levels of the cervical spinal cord. Despite slight compression of the spinal cord, he had no symptoms and refused any kind of surgical approach. The meningioma was followed with control MRI and spontaneously regressed within six months. There were no signs of hemorrhage or calcification. Although it is a rare condition, clinicians should consider that meningiomas, especially those diagnosed incidentally, may regress spontaneously.

  15. Vectors, a tool in statistical regression theory

    Corsten, L.C.A.


    Using linear algebra, this thesis develops linear regression analysis, including analysis of variance, covariance analysis, special experimental designs, linear and fertility adjustments, and the analysis of experiments at different places and times. The determination of the orthogonal projection, yielding e…

  16. Patterns of Regression in Rett Syndrome

    J Gordon Millichap


    Patterns and features of regression in a case series of 53 girls and women with Rett syndrome were studied at the Institute of Child Health and Great Ormond Street Children’s Hospital, London, UK.

  17. A new bivariate negative binomial regression model

    Faroughi, Pouya; Ismail, Noriszura


    This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check the overdispersion and goodness-of-fit of the model. An application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression has a better fit than the bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.
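
    As a hedged illustration of the negative binomial building block (the BNB-1 joint p.m.f. itself is not part of standard libraries), a univariate negative binomial regression can be fitted with statsmodels on synthetic claim counts:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(4)
        age = rng.uniform(20, 70, 500)                   # illustrative rating factor
        X = sm.add_constant(age)
        claims = rng.poisson(np.exp(0.01 * age - 0.5))   # stand-in counts

        nb = sm.GLM(claims, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
        print(nb.params, nb.aic)                         # AIC, the criterion used above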

  18. Heteroscedastic regression analysis method for mixed data

    FU Hui-min; YUE Xiao-rui


    The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type-I censored data and type-II censored data from the location-scale distribution. The best unbiased estimates of the regression coefficients, as well as the confidence limits of the location parameter and scale parameter, were given. Furthermore, the point estimates and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method, which is only suitable for complete data from the normal distribution, can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. The presented method therefore has a broad range of promising applications.

  19. Superquantile Regression: Theory, Algorithms, and Applications


    The least-squares and quantile regression models adjust differently to changes in the data set: when an observation is moved upwards, the least-squares fit follows it, while the quantile regression model hardly changes; moving the observation even further upwards produces no further change in the quantile regression function.

  20. Marginal longitudinal semiparametric regression via penalized splines

    Al Kadiri, M.


    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been proposed, a relative simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.
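
    A penalized-spline fit in miniature, assuming sklearn 1.0+ for SplineTransformer and using a ridge penalty in place of the paper's Bayesian/BUGS machinery:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import SplineTransformer
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(5)
        t = np.sort(rng.uniform(0, 1, 200))[:, None]    # observation times
        y = np.sin(2 * np.pi * t[:, 0]) + rng.normal(0, 0.3, 200)

        # alpha plays the role of the smoothing penalty on the spline coefficients.
        model = make_pipeline(SplineTransformer(n_knots=20, degree=3), Ridge(alpha=1.0))
        model.fit(t, y)
        smooth = model.predict(t)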

  1. Boosted regression tree, table, and figure data

    Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).
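
    For orientation, a boosted regression tree of the kind used in the publication can be sketched with sklearn's gradient boosting; the predictors below are invented stand-ins for watershed covariates.

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(6)
        X = rng.normal(size=(300, 5))                   # e.g. land use, climate, soils
        nutrient = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0, 0.2, 300)

        brt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.01,
                                        max_depth=3).fit(X, nutrient)
        print(brt.feature_importances_)                 # relative influence of predictors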

  2. Fuzzy multiple linear regression: A computational approach

    Juang, C. H.; Huang, X. H.; Fleming, J. W.


    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  3. Discriminative Elastic-Net Regularized Linear Regression.

    Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen


    In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations for the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB code for our methods is available at…
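
    The elastic-net penalty at the core of the framework can be illustrated with sklearn's generic regressor (not the authors' ENLR classifier); l1_ratio mixes the lasso and ridge penalties, yielding coefficients that are both sparse and shrunken.

        import numpy as np
        from sklearn.linear_model import ElasticNet

        rng = np.random.default_rng(7)
        X = rng.normal(size=(100, 50))
        y = X[:, :5] @ np.ones(5) + rng.normal(0, 0.1, 100)

        enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
        print((enet.coef_ != 0).sum(), "active coefficients out of 50")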

  4. Spontaneous regression of metastatic Merkel cell carcinoma.

    Hassan, S J


    Merkel cell carcinoma is a rare aggressive neuroendocrine carcinoma of the skin predominantly affecting elderly Caucasians. It has a high rate of local recurrence and regional lymph node metastases. It is associated with a poor prognosis. Complete spontaneous regression of Merkel cell carcinoma has been reported but is a poorly understood phenomenon. Here we present a case of complete spontaneous regression of metastatic Merkel cell carcinoma demonstrating a markedly different pattern of events from those previously published.

  5. The Infinite Hierarchical Factor Regression Model

    Rai, Piyush


    We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.

  6. Marginal longitudinal semiparametric regression via penalized splines.

    Kadiri, M Al; Carroll, R J; Wand, M P


    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been proposed, a relative simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.

  7. Multiple-Instance Regression with Structured Data

    Wagstaff, Kiri L.; Lane, Terran; Roper, Alex


    We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.

  8. [Iris movement mediates pupillary membrane regression].

    Morizane, Yuki


    In the course of mammalian lens development, a transient capillary meshwork called the pupillary membrane (PM) forms. It is located in the pupil area to nourish the anterior surface of the lens, and then regresses to clear the optical path. Although the involvement of the apoptotic process has been reported in PM regression, the initiating factor remains unknown. We initially found that regression of the PM coincided with the development of iris motility, and that iris movement caused cessation and resumption of blood flow within the PM. Therefore, we investigated whether the developing capacity of the iris to constrict and dilate can function as an essential signal that induces apoptosis in the PM. Continuous inhibition of iris movement with mydriatic agents suppressed apoptosis of the PM and resulted in the persistence of the PM in rats. The distribution of apoptotic cells in the regressing PM was diffuse and showed no apparent localization. These results indicated that iris movement induced regression of the PM by changing the blood flow within it. This study suggests the importance of physiological interactions between tissues, in this case the iris and the PM, as a signal to advance vascular regression during organ development.

  9. Post-processing through linear regression

    B. Van Schaeybroeck


    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast, and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

    These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR), which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
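
    The simplest scheme in this comparison, an ordinary least-squares correction of a raw forecast, looks like this in Python (synthetic forecast-observation pairs, not the authors' code):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(8)
        obs = rng.normal(15, 5, 400)                        # verifying observations
        forecast = 0.8 * obs + 2.0 + rng.normal(0, 2, 400)  # biased, noisy model output

        ols = sm.OLS(obs, sm.add_constant(forecast)).fit()
        corrected = ols.predict(sm.add_constant(forecast))  # post-processed forecast
        print(ols.params)                                   # intercept and slope correction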

  10. Post-processing through linear regression

    van Schaeybroeck, B.; Vannitsem, S.


    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast, and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR), which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.

  11. Regression Test Selection for C# Programs

    Nashat Mansour


    We present a regression test selection technique for C# programs. C# is fairly new and is often used within the Microsoft .NET framework to give programmers a solid base to develop a variety of applications. Regression testing is done after modifying a program. Regression test selection refers to selecting a suitable subset of test cases from the original test suite to be rerun; it aims to provide confidence that the modifications are correct and did not affect other, unmodified parts of the program. The regression test selection technique presented in this paper accounts for C#.NET-specific features. Our technique is based on three phases: the first phase builds an Affected Class Diagram consisting of classes that are affected by the change in the source code. The second phase builds a C# Interclass Graph (CIG) from the affected class diagram based on C#-specific features; in this phase, we reduce the number of selected test cases. The third phase involves further reduction and a new metric for assigning weights to test cases in order to prioritize the selected test cases. We have empirically validated the proposed technique using case studies. The empirical results show the usefulness of the proposed regression testing technique for C#.NET programs.

  12. Regression analysis using dependent Polya trees.

    Schörgendorfer, Angela; Branscum, Adam J


    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion.

  13. Hyperglycemia impairs atherosclerosis regression in mice.

    Gaudreault, Nathalie; Kumar, Nikit; Olivas, Victor R; Eberlé, Delphine; Stephens, Kyle; Raffai, Robert L


    Diabetic patients are known to be more susceptible to atherosclerosis and its associated cardiovascular complications. However, the effects of hyperglycemia on atherosclerosis regression remain unclear. We hypothesized that hyperglycemia impairs atherosclerosis regression by modulating the biological function of lesional macrophages. HypoE (Apoe(h/h)Mx1-Cre) mice express low levels of apolipoprotein E (apoE) and develop atherosclerosis when fed a high-fat diet. Atherosclerosis regression occurs in these mice upon plasma lipid lowering induced by a change in diet and the restoration of apoE expression. We examined the morphological characteristics of regressed lesions and assessed the biological function of lesional macrophages isolated with laser-capture microdissection in euglycemic and hyperglycemic HypoE mice. Hyperglycemia induced by streptozotocin treatment impaired lesion size reduction (36% versus 14%) and lipid loss (38% versus 26%) after the reversal of hyperlipidemia. However, decreases in lesional macrophage content and remodeling in both groups of mice were similar. Gene expression analysis revealed that hyperglycemia impaired cholesterol transport by modulating ATP-binding cassette A1, ATP-binding cassette G1, scavenger receptor class B family member (CD36), scavenger receptor class B1, and wound healing pathways in lesional macrophages during atherosclerosis regression. Hyperglycemia impairs both reduction in size and loss of lipids from atherosclerotic lesions upon plasma lipid lowering without significantly affecting the remodeling of the vascular wall.

  14. Competing Risks Quantile Regression at Work

    Dlugosz, Stephan; Lo, Simon M. S.; Wilke, Ralf


    Despite its emergence as a frequently used method for the empirical analysis of multivariate data, quantile regression is yet to become a mainstream tool for the analysis of duration data. We present a pioneering empirical study on the grounds of a competing risks quantile regression model. We use large-scale maternity duration data with multiple competing risks derived from German linked social security records to analyse how public policies are related to the length of economic inactivity of young mothers after giving birth. Our results show that the model delivers detailed insights into the distribution of transitions out of maternity leave. It is found that cumulative incidences implied by the quantile regression model differ from those implied by a proportional hazards model. To foster the use of the model, we make an R-package (cmprskQR) available.




    The main objective of this work is to adapt the STR-Tree model, which combines a Smooth Transition Regression model with Classification and Regression Trees (CART), for use in classification. To this end, some changes were made to its structural form and to its estimation. Because classification is performed on binary dependent variables, the techniques employed in logistic regression become necessary; accordingly, the estimation of the para…

  16. Unsupervised K-Nearest Neighbor Regression

    Kramer, Oliver


    In many scientific disciplines structures in high-dimensional data have to be found, e.g., in stellar spectra, in genome data, or for face recognition tasks. In this work we present a novel approach to non-linear dimensionality reduction. It is based on fitting K-nearest neighbor regression to the unsupervised regression framework for learning low-dimensional manifolds. Similar to related approaches that are mostly based on kernel methods, unsupervised K-nearest neighbor (UKNN) regression optimizes latent variables with respect to the data-space reconstruction error, employing the K-nearest neighbor heuristic. The problem of optimizing latent neighborhoods is difficult to solve, but the UKNN formulation allows an efficient strategy of iteratively embedding latent points into fixed neighborhood topologies. The approach is tested experimentally.


    Bogdan OANCEA


    In this paper we present a way to solve the linear regression model with R and Hadoop using the RHadoop library. We show how the linear regression model can be solved even for very large models that require special technologies. For storing the data we used Hadoop, and for computation we used R; the interface between R and Hadoop is the open-source library RHadoop. We present the main features of the Hadoop and R software systems and the way of interconnecting them. We then show how the least squares solution for the linear regression problem can be expressed in terms of the map-reduce programming paradigm and how it can be implemented using the RHadoop library.
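
    The map-reduce formulation of least squares mentioned above can be sketched in plain Python instead of R/RHadoop: each map step emits its chunk's X'X and X'y, and the reduce step sums them and solves the normal equations.

        import numpy as np

        rng = np.random.default_rng(9)
        X = np.column_stack([np.ones(10_000), rng.normal(size=(10_000, 3))])
        y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 10_000)

        def map_chunk(Xc, yc):
            return Xc.T @ Xc, Xc.T @ yc        # per-chunk sufficient statistics

        parts = [map_chunk(Xc, yc)
                 for Xc, yc in zip(np.array_split(X, 10), np.array_split(y, 10))]
        XtX = sum(p[0] for p in parts)         # reduce: element-wise sums
        Xty = sum(p[1] for p in parts)
        beta = np.linalg.solve(XtX, Xty)       # identical to full-data OLS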

  18. Rapidly Regressive Unilateral Fetal Pleural Effusion

    Tuncay Yuce


    Intrauterine pleural effusion of the fetal lungs rarely regresses without intervention. In our case we treated a woman at 32 weeks of gestation whose pregnancy was complicated by fetal pleural effusion and polyhydramnios. A therapeutic thoracocentesis was planned and she received two courses of betamethasone prior to the procedure. On the day of the planned procedure, a substantial regression of the pleural effusion was observed and the procedure was postponed. During her antenatal follow-up a complete regression of the pleural effusion was observed, and after delivery the pleural effusion did not relapse. These findings hint that antenatal steroids may play a role in the treatment of fetal pleural effusion, which is known to be resistant to treatment modalities during both the antenatal and postnatal periods. [Cukurova Med J 2015; 40(Suppl 1): 25-28]

  19. Uncertainty quantification in DIC with Kriging regression

    Wang, Dezhi; DiazDelaO, F. A.; Wang, Weizhuo; Lin, Xiaoshan; Patterson, Eann A.; Mottershead, John E.


    A Kriging regression model is developed as a post-processing technique for the treatment of measurement uncertainty in classical subset-based Digital Image Correlation (DIC). Regression is achieved by regularising the sample-point correlation matrix using a local, subset-based, assessment of the measurement error with assumed statistical normality and based on the Sum of Squared Differences (SSD) criterion. This leads to a Kriging-regression model in the form of a Gaussian process representing uncertainty on the Kriging estimate of the measured displacement field. The method is demonstrated using numerical and experimental examples. Kriging estimates of displacement fields are shown to be in excellent agreement with 'true' values for the numerical cases and in the experimental example uncertainty quantification is carried out using the Gaussian random process that forms part of the Kriging model. The root mean square error (RMSE) on the estimated displacements is produced and standard deviations on local strain estimates are determined.




    Ordinary least squares is a parameter estimation method that minimizes the residual sum of squares. If multicollinearity is found in the data, an unbiased estimator with minimum variance cannot be reached. Multicollinearity is a linear correlation between the independent variables in a model. Jackknife Ridge Regression (JRR) is an extension of Generalized Ridge Regression (GRR) for solving multicollinearity. Generalized Ridge Regression overcomes the bias of estimators caused by the presence of multicollinearity by adding a different bias parameter for each independent variable in the least squares equation, after transforming the data into an orthogonal form. Beyond that, JRR can reduce the bias of the ridge estimator. The results showed that the JRR model outperforms the GRR model.
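
    The basic ridge mechanism that GRR and JRR refine (per-variable bias parameters and a jackknife step are the refinements) is visible in a short comparison on nearly collinear data:

        import numpy as np
        from sklearn.linear_model import LinearRegression, Ridge

        rng = np.random.default_rng(10)
        x1 = rng.normal(size=200)
        x2 = x1 + rng.normal(0, 0.01, 200)              # nearly collinear with x1
        X = np.column_stack([x1, x2])
        y = x1 + rng.normal(0, 0.1, 200)

        print(LinearRegression().fit(X, y).coef_)       # large, offsetting coefficients
        print(Ridge(alpha=1.0).fit(X, y).coef_)         # shrunken, stable coefficients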

  1. On Solving Lq-Penalized Regressions

    Tracy Zhou Wu


    Lq-penalized regression arises in multidimensional statistical modelling, where all or part of the regression coefficients are penalized to achieve both accuracy and parsimony of statistical models. There is often substantial computational difficulty except in the quadratic penalty case. The difficulty is partly due to the nonsmoothness of the objective function, inherited from the use of the absolute value. We propose a new solution method for the general Lq-penalized regression problem based on space transformation, and thus efficient optimization algorithms. The new method has immediate applications in statistics, notably in penalized spline smoothing problems. In particular, the LASSO problem is shown to be polynomial time solvable. Numerical studies show the promise of our approach.
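
    One standard route around the nonsmooth absolute-value term in the L1 case is proximal soft-thresholding (ISTA); the sketch below is generic and is not the authors' space-transformation algorithm.

        import numpy as np

        def ista(X, y, lam, n_iter=500):
            """LASSO via iterative soft-thresholding."""
            L = np.linalg.norm(X, 2) ** 2               # Lipschitz constant of the gradient
            beta = np.zeros(X.shape[1])
            for _ in range(n_iter):
                z = beta + X.T @ (y - X @ beta) / L     # gradient step on the smooth part
                beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox of lam*|.|_1
            return beta

        rng = np.random.default_rng(11)
        X = rng.normal(size=(100, 20))
        y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 100)
        print(ista(X, y, lam=5.0).round(2))             # sparse solution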

  2. Principal component regression for crop yield estimation

    Suryanarayana, T M V


    This book highlights the estimation of crop yield in Central Gujarat, especially with regard to the development of Multiple Regression Models and Principal Component Regression (PCR) models using climatological parameters as independent variables and crop yield as the dependent variable. It subsequently compares the multiple linear regression (MLR) and PCR results, and discusses the significance of PCR for crop yield estimation. In this context, the book also covers Principal Component Analysis (PCA), a statistical procedure used to reduce a number of correlated variables into a smaller number of uncorrelated variables called principal components (PC). This book will be helpful to students and researchers starting their work on climate and agriculture, mainly focusing on estimation models. The flow of chapters takes readers along a smooth path, from understanding climate, weather and the impact of climate change, through downscaling techniques, and finally towards the development of…
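
    The two-step PCR recipe can be miniaturized with sklearn: PCA on correlated climatological inputs, then linear regression on the component scores (all data below are invented).

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(12)
        climate = rng.normal(size=(40, 6))              # e.g. rainfall, temperature, humidity
        climate[:, 3:] = climate[:, :3] + rng.normal(0, 0.1, (40, 3))   # correlated columns
        crop_yield = climate[:, 0] + rng.normal(0, 0.2, 40)

        pcr = make_pipeline(PCA(n_components=3), LinearRegression())
        pcr.fit(climate, crop_yield)
        print(pcr.score(climate, crop_yield))           # R^2 of the PCR fit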

  3. A tutorial on Bayesian Normal linear regression

    Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens


    Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
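
    The closed-form posterior that makes this setting tractable is short: with prior beta ~ N(m0, S0) and known noise variance sigma^2, the posterior covariance is Sn = (S0^{-1} + X'X/sigma^2)^{-1} and the posterior mean is Sn(S0^{-1} m0 + X'y/sigma^2). A numpy sketch with invented prior values:

        import numpy as np

        rng = np.random.default_rng(13)
        X = np.column_stack([np.ones(30), rng.uniform(0, 1, 30)])
        sigma2 = 0.05 ** 2                               # assumed known
        y = X @ np.array([0.2, 1.5]) + rng.normal(0, np.sqrt(sigma2), 30)

        m0 = np.zeros(2)                                 # prior mean (from past calibrations)
        S0 = np.eye(2)                                   # prior covariance

        Sn = np.linalg.inv(np.linalg.inv(S0) + X.T @ X / sigma2)
        mn = Sn @ (np.linalg.inv(S0) @ m0 + X.T @ y / sigma2)
        print(mn, np.sqrt(np.diag(Sn)))                  # posterior means and sds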

  4. Characterizing and estimating rice brown spot disease severity using stepwise regression, principal component regression and partial least-square regression


    Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, measurement of hyperspectral leaf reflectance in rice (Oryza sativa L.) was conducted on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda de Haan) through the wavelength range from 350 to 2 500 nm. The percentage of leaf surface lesions was estimated and defined as the disease severity. Statistical methods including multiple stepwise regression, principal component analysis and partial least-square regression were utilized to calculate and estimate the disease severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-square regression with seven extracted factors could most effectively predict disease severity compared with the other statistical methods, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the disease severity of rice brown spot using hyperspectral reflectance data at the leaf level.

  5. Removing Malmquist bias from linear regressions

    Verter, Frances


    Malmquist bias is present in all astronomical surveys where sources are observed above an apparent brightness threshold. Those sources which can be detected at progressively larger distances are progressively more limited to the intrinsically luminous portion of the true distribution. This bias does not distort any of the measurements, but distorts the sample composition. We have developed the first treatment to correct for Malmquist bias in linear regressions of astronomical data. A demonstration of the corrected linear regression, which is computed in four steps, is presented.

  6. Multicollinearity in cross-sectional regressions

    Lauridsen, Jørgen; Mur, Jesùs


    The paper examines robustness of results from cross-sectional regression paying attention to the impact of multicollinearity. It is well known that the reliability of estimators (least-squares or maximum-likelihood) gets worse as the linear relationships between the regressors become more acute. We resolve the discussion in a spatial context, looking closely into the behaviour shown, under several unfavourable conditions, by the most outstanding misspecification tests when collinear variables are added to the regression. A Monte Carlo simulation is performed. The conclusions point to the fact that these statistics react in different ways to the problems posed.

  7. Currency Substitution and the Regressivity of Inflationary Taxation

    Federico A. Sturzeneger


    The purpose of this paper is to show that in the presence of financial adaptation or currency substitution, the inflation tax is extremely regressive. This regressivity arises from the existence of a fixed cost of switching to inflation-proof transaction technologies. This fixed cost makes it optimal only for those agents with sufficiently high incomes to switch out of domestic currency. The effects are illustrated and quantified for a particular case.

  8. Demonstration of a Fiber Optic Regression Probe

    Korman, Valentin; Polzin, Kurt A.


    The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while the second and third are increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics make it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers change, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high-temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for…

  9. A Skew-Normal Mixture Regression Model

    Liu, Min; Lin, Tsung-I


    A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…

  10. Predicting Social Trust with Binary Logistic Regression

    Adwere-Boamah, Joseph; Hufstedler, Shirley


    This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…

  11. The Geometry of Enhancement in Multiple Regression.

    Waller, Niels G


    In linear multiple regression, "enhancement" is said to occur when R² = b'r > r'r, where b is a p×1 vector of standardized regression coefficients and r is a p×1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b ≡ r and enhancement cannot occur. When p = 2, for all full-rank R_xx ≠ I, R_xx = E[xx'] = VΛV' (where VΛV' denotes the eigen decomposition of R_xx; λ1 > λ2), the set [Formula: see text] contains four vectors; the set [Formula: see text] contains an infinite number of vectors. When p ≥ 3 (and λ1 > λ2 > ⋯ > λp), both sets contain an uncountably infinite number of vectors. Geometrical arguments demonstrate that B1 occurs at the intersection of two hyper-ellipsoids in ℝ^p. Equations are provided for populating the sets B1 and B2 and for demonstrating that maximum enhancement occurs when b is collinear with the eigenvector associated with λp (the smallest eigenvalue of the predictor correlation matrix). These equations are used to illustrate the logic and the underlying geometry of enhancement in population multiple-regression models. R code for simulating population regression models that exhibit enhancement of any degree and any number of predictors is included in Appendices A and B.

  12. Regression Segmentation for M³ Spinal Images.

    Wang, Zhijie; Zhen, Xiantong; Tay, KengYeow; Osman, Said; Romano, Walter; Li, Shuo


    Clinical routine often requires the analysis of spinal images of multiple anatomic structures in multiple anatomic planes from multiple imaging modalities (M³). Unfortunately, existing methods for segmenting spinal images are still limited to one specific structure, in one specific plane, or from one specific modality (S³). In this paper, we propose a novel approach, Regression Segmentation, that is for the first time able to segment M³ spinal images in one single unified framework. This approach formulates the segmentation task innovatively as a boundary regression problem: modeling a highly nonlinear mapping function from substantially diverse M³ images directly to desired object boundaries. Leveraging the advancement of sparse kernel machines, regression segmentation is fulfilled by a multi-dimensional support vector regressor (MSVR) which operates in an implicit, high-dimensional feature space where M³ diversity and specificity can be systematically categorized, extracted, and handled. The proposed regression segmentation approach was thoroughly tested on images from 113 clinical subjects including both disc and vertebral structures, in both sagittal and axial planes, and from both MRI and CT modalities. The overall result reaches a high dice similarity index (DSI) of 0.912 and a low boundary distance (BD) of 0.928 mm. With our unified and extendable framework, an efficient clinical tool for M³ spinal image segmentation can easily be achieved, and will substantially benefit the diagnosis and treatment of spinal diseases.

  13. A Spline Regression Model for Latent Variables

    Harring, Jeffrey R.


    Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…

  14. Assessing risk factors for periodontitis using regression

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa


    Multivariate statistical analysis is indispensable for assessing the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking Status and Plaque Index. The multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL). The regression coefficients are obtained along with the p-values from the respective significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues, defined as an Attachment Loss greater than or equal to 4 mm at 25% or more (AL≥4mm/≥25%) of the sites surveyed. The association measures include the odds ratios together with the corresponding 95% confidence intervals.

  15. Selecting a Regression Saturated by Indicators

    Hendry, David F.; Johansen, Søren; Santos, Carlos

    We consider selecting a regression model, using a variant of Gets, when there are more variables than observations, in the special case that the variables are impulse dummies (indicators) for every observation. We show that the setting is unproblematic if tackled appropriately, and obtain...

  16. Functional data analysis of generalized regression quantiles

    Guo, Mengmeng


    Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations.

  17. Nonparametric regression with martingale increment errors

    Delattre, Sylvain


    We consider the problem of adaptive estimation of the regression function in a framework where we replace ergodicity assumptions (such as independence or mixing) by another structural assumption on the model. Namely, we propose adaptive upper bounds for kernel estimators with data-driven bandwidth (Lepski's selection rule) in a regression model where the noise is an increment of a martingale. It includes, as very particular cases, the usual i.i.d. regression and auto-regressive models. The cornerstone tool for this study is a new result for self-normalized martingales, called "stability", which is of independent interest. In the first part, we only use the martingale increment structure of the noise. We give an adaptive upper bound using a random rate that involves the occupation time near the estimation point. Thanks to this approach, the theoretical study of the statistical procedure is disconnected from usual ergodicity properties like mixing. Then, in the second part, we make a link with the usual minimax th…

  18. Finite Algorithms for Robust Linear Regression

    Madsen, Kaj; Nielsen, Hans Bruun


    The Huber M-estimator for robust linear regression is analyzed. Newton type methods for solution of the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate efficiency and indicate that this kind of approach may...
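
    A minimal sketch of Huber M-estimation in R using MASS::rlm, which fits by iteratively reweighted least squares rather than the finite Newton-type algorithms analyzed in the entry; the data are simulated with two gross outliers.

        library(MASS)

        set.seed(2)
        x <- seq(1, 10, length.out = 50)
        y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)
        y[c(10, 40)] <- y[c(10, 40)] + 8                 # two gross outliers

        ols   <- lm(y ~ x)
        huber <- rlm(y ~ x, psi = psi.huber, k = 1.345)  # Huber M-estimator

        coef(ols)    # pulled toward the outliers
        coef(huber)  # close to the true values (2, 0.5)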

  20. Modeling confounding by half-sibling regression

    Schölkopf, Bernhard; Hogg, David W; Wang, Dun


    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both...

  2. Structural Break Tests Robust to Regression Misspecification

    Abi Morshed, Alaa; Andreou, E.; Boldea, Otilia


    Structural break tests developed in the literature for regression models are sensitive to model misspecification. We show - analytically and through simulations - that the sup Wald test for breaks in the conditional mean and variance of a time series process exhibits severe size distortions when the...

  3. Panel data specifications in nonparametric kernel regression

    Czekaj, Tomasz Gerard; Henningsen, Arne

    parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...

  4. The M Word: Multicollinearity in Multiple Regression.

    Morrow-Howell, Nancy


    Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…

  5. Genetic Programming Transforms in Linear Regression Situations

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  6. Macroeconomic Forecasting Using Penalized Regression Methods

    Smeekes, Stephan; Wijler, Etiënne


    We study the suitability of lasso-type penalized regression techniques when applied to macroeconomic forecasting with high-dimensional datasets. We consider the performance of the lasso-type methods when the true DGP is a factor model, contradicting the sparsity assumption underlying penalized regression...

  7. Deriving the Regression Line with Algebra

    Quintanilla, John A.


    Exploration with spreadsheets and reliance on previous skills can lead students to determine the line of best fit. To perform linear regression on a set of data, students in Algebra 2 (or, in principle, Algebra 1) do not have to settle for using the mysterious "black box" of their graphing calculators (or other classroom technologies).…
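
    In the spirit of the entry, the least-squares line can be computed from the textbook formulas and checked against R's lm(); a minimal sketch on simulated data.

        set.seed(3)
        x <- runif(30, 0, 10)
        y <- 3 + 2 * x + rnorm(30)

        b <- cov(x, y) / var(x)      # slope: S_xy / S_xx
        a <- mean(y) - b * mean(x)   # the line passes through (mean(x), mean(y))
        c(intercept = a, slope = b)

        coef(lm(y ~ x))              # identical, up to floating point error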

  8. Prediction of dynamical systems by symbolic regression

    Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.


    We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.

  9. Progression and regression of the atherosclerotic plaque.

    de Feyter, P J; Vos, J; Deckers, J W


    In animals in which atherosclerosis was induced experimentally (by a high cholesterol diet), regression of the atherosclerotic lesion was demonstrated after serum cholesterol was reduced by cholesterol-lowering drugs or a low-fat diet. Whether regression of advanced coronary artery lesions also takes place in humans after a similar intervention remains conjectural. However, several randomized studies, primarily employing lipid-lowering intervention or comprehensive changes in lifestyle, have demonstrated, using serial angiograms, that it is possible to achieve less progression, arrest, or even (small) regression of atherosclerotic lesions. The lipid-lowering trials (NHLBI, CLAS, POSCH, FATS, SCOR and STARS) studied 1240 symptomatic patients, mostly men, with moderately elevated cholesterol levels and moderately severe angiographically proven coronary artery disease. A variety of lipid-lowering drugs, in addition to a diet, were used over an intervention period ranging from 2 to 3 years. In all but one study (NHLBI), the progression of coronary atherosclerosis was less in the treated group, but regression was induced in only a few patients. The overall relative risks of progression and regression of coronary atherosclerosis were 0.62 and 2.13, respectively. The induced angiographic differences were small and did not produce any significant haemodynamic benefit. The most important result was that the disease process could be stabilized in the majority of patients. Three comprehensive lifestyle change trials (the Lifestyle Heart study, STARS and the Heidelberg Study) studied 183 patients, who were subjected to stress management and/or intensive exercise, in addition to a low-fat diet, over a period ranging from 1 to 3 years. All three trials demonstrated less progression and more regression, with overall relative risks of 0.40 and 2.35, respectively, in the intervention groups. Angiographic trials demonstrated that retardation or arrest of coronary atherosclerosis was possible.

  10. Logistic regression when binary predictor variables are highly correlated.

    Barker, L; Brown, C

    Standard logistic regression can produce estimates having large mean square error when predictor variables are multicollinear. Ridge regression and principal components regression can reduce the impact of multicollinearity in ordinary least squares regression. Generalizations of these, applicable in the logistic regression framework, are alternatives to standard logistic regression. It is shown that estimates obtained via ridge and principal components logistic regression can have smaller mean square error than estimates obtained through standard logistic regression. Recommendations for choosing among standard, ridge and principal components logistic regression are developed. Published in 2001 by John Wiley & Sons, Ltd.
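
    A minimal sketch of ridge-penalized logistic regression, one of the generalizations discussed above, using the glmnet package (alpha = 0 selects the ridge penalty) on simulated data with two highly correlated binary predictors.

        library(glmnet)

        set.seed(4)
        n <- 200
        x1 <- rbinom(n, 1, 0.5)
        x2 <- ifelse(runif(n) < 0.95, x1, 1 - x1)   # nearly collinear with x1
        x3 <- rbinom(n, 1, 0.5)
        X  <- cbind(x1, x2, x3)
        y  <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * x3))

        cv <- cv.glmnet(X, y, family = "binomial", alpha = 0)  # ridge logistic
        coef(cv, s = "lambda.min")   # shrunken, more stable coefficient estimates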

  11. Model selection in kernel ridge regression

    Exterkate, Peter


    Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties, and the tuning parameters associated with all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...
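
    A self-contained sketch of kernel ridge regression with a Gaussian kernel: the dual coefficients solve (K + λI)α = y and predictions use kernel evaluations against the training points.

        set.seed(5)
        n <- 100
        x <- sort(runif(n, -3, 3))
        y <- sin(x) + rnorm(n, sd = 0.2)

        gauss_kernel <- function(a, b, sigma = 0.5)
          exp(-outer(a, b, function(u, v) (u - v)^2) / (2 * sigma^2))

        lambda <- 0.1
        K      <- gauss_kernel(x, x)
        alpha  <- solve(K + lambda * diag(n), y)    # dual coefficients

        x_new <- seq(-3, 3, length.out = 200)
        y_hat <- gauss_kernel(x_new, x) %*% alpha   # smooth nonlinear fit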

  12. Online support vector regression for reinforcement learning

    Yu Zhenhua; Cai Yuanli


    The goal in reinforcement learning is to learn the value of state-action pairs in order to maximize the total reward. For continuous states and actions in the real world, the representation of value functions is critical. Furthermore, the samples in value functions are obtained sequentially. Therefore, an online support vector regression (OSVR) is set up, which is a function approximator to estimate value functions in reinforcement learning. OSVR updates the regression function by analyzing the possible variation of support vector sets after new samples are inserted into the training set. To evaluate the OSVR learning ability, it is applied to the mountain-car task. The simulation results indicate that the OSVR has a preferable convergence speed and can solve continuous problems that are infeasible using a lookup table.
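
    For orientation, a batch (not online) support vector regression fit in R with the e1071 package; the OSVR of the entry would instead update such a model incrementally as samples arrive.

        library(e1071)

        set.seed(6)
        d <- data.frame(x = seq(0, 2 * pi, length.out = 100))
        d$y <- sin(d$x) + rnorm(100, sd = 0.1)

        fit  <- svm(y ~ x, data = d, type = "eps-regression", kernel = "radial")
        yhat <- predict(fit, d)   # smooth approximation of the target function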

  13. Constrained regression models for optimization and forecasting

    P.J.S. Bruwer


    Full Text Available Linear regression models and the interpretation of such models are investigated. In practice, problems often arise with the interpretation and use of a given regression model in spite of the fact that researchers may be quite "satisfied" with the model. In this article methods are proposed which overcome these problems. This is achieved by constructing a model in which the "area of experience" of the researcher is taken into account. This area of experience is represented as a convex hull of the available data points. With the aid of a linear programming model it is shown how conclusions can be formed in a practical way regarding aspects such as optimal levels of decision variables and forecasting.

  14. Controlling attribute effect in linear regression

    Calders, Toon


    In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models. © 2013 IEEE.

  15. Multiple Kernel Spectral Regression for Dimensionality Reduction

    Bing Liu


    Full Text Available Traditional manifold learning algorithms, such as locally linear embedding, Isomap, and Laplacian eigenmap, only provide the embedding results of the training samples. To solve the out-of-sample extension problem, spectral regression (SR) solves the problem of learning an embedding function by establishing a regression framework, which can avoid eigen-decomposition of dense matrices. Motivated by the effectiveness of SR, we incorporate multiple kernel learning (MKL) into SR for dimensionality reduction. The proposed approach (termed MKL-SR) seeks an embedding function in the Reproducing Kernel Hilbert Space (RKHS) induced by the multiple base kernels. An MKL-SR algorithm is proposed to improve the performance of kernel-based SR (KSR) further. Furthermore, the proposed MKL-SR algorithm can be performed in supervised, unsupervised, and semi-supervised situations. Experimental results on supervised classification and semi-supervised classification demonstrate the effectiveness and efficiency of our algorithm.

  16. A Gibbs Sampler for Multivariate Linear Regression

    Mantz, Adam B


    Kelly (2007, hereafter K07) described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modeled by a flexible mixture of Gaussians rather than assumed to be uniform. Here I extend the K07 algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Second, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamica...
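
    A minimal Gibbs sampler for the textbook special case, Bayesian linear regression with conjugate priors and no measurement errors; the K07 algorithm and its extension add covariate measurement errors and a Gaussian-mixture (or Dirichlet process) prior on the covariates. All names here are illustrative.

        set.seed(7)
        n <- 100
        X <- cbind(1, rnorm(n))
        y <- X %*% c(1, 2) + rnorm(n, sd = 0.5)

        tau2 <- 100              # prior variance of beta ~ N(0, tau2 * I)
        a0 <- b0 <- 0.01         # inverse-gamma hyperparameters for sigma2
        p  <- ncol(X); n_iter <- 2000
        sigma2 <- 1
        draws  <- matrix(NA, n_iter, p + 1)
        XtX <- crossprod(X); Xty <- crossprod(X, y)

        for (s in 1:n_iter) {
          # beta | sigma2, y ~ N(V %*% Xty / sigma2, V)
          V    <- solve(XtX / sigma2 + diag(p) / tau2)
          beta <- V %*% Xty / sigma2 + t(chol(V)) %*% rnorm(p)
          # sigma2 | beta, y ~ Inv-Gamma(a0 + n/2, b0 + RSS/2)
          rss    <- sum((y - X %*% beta)^2)
          sigma2 <- 1 / rgamma(1, a0 + n / 2, b0 + rss / 2)
          draws[s, ] <- c(beta, sigma2)
        }
        colMeans(draws[-(1:500), ])   # posterior means, near (1, 2, 0.25)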

  17. Wavelet Scattering Regression of Quantum Chemical Energies

    Hirn, Matthew; Poilvert, Nicolas


    We introduce multiscale invariant dictionaries to estimate quantum chemical energies of organic molecules, from training databases. Molecular energies are invariant to isometric atomic displacements, and are Lipschitz continuous to molecular deformations. Similarly to density functional theory (DFT), the molecule is represented by an electronic density function. A multiscale invariant dictionary is calculated with wavelet scattering invariants. It cascades a first wavelet transform which separates scales, with a second wavelet transform which computes interactions across scales. Sparse scattering regressions give state-of-the-art results over two databases of organic planar molecules. On these databases, the regression error is of the order of the error produced by DFT codes, but at a fraction of the computational cost.

  18. Are increases in cigarette taxation regressive?

    Borren, P; Sutton, M


    Using the latest published data from Tobacco Advisory Council surveys, this paper re-evaluates the question of whether or not increases in cigarette taxation are regressive in the United Kingdom. The extended data set shows no evidence of increasing price-elasticity by social class as found in a major previous study. To the contrary, there appears to be no clear pattern in the price responsiveness of smoking behaviour across different social classes. Increases in cigarette taxation, while reducing smoking levels in all groups, fall most heavily on men and women in the lowest social class. Men and women in social class five can expect to pay eight and eleven times more of a tax increase respectively, than their social class one counterparts. Taken as a proportion of relative incomes, the regressive nature of increases in cigarette taxation is even more pronounced.

  19. Nonexistence in Reciprocal and Logarithmic Regression

    Josef Bukac


    Fitting logarithmic (b ln(c+x), a + b ln(c+x)) or reciprocal (b/(c+x), a + b/(c+x)) regression models to data by the least squares method requires determining the closure of the set of each type of these functions defined on a finite domain. It follows that a minimal solution may not exist, but it does exist when the closure is considered.
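
    A small R illustration in the spirit of the entry: when the data follow a linear trend, the reciprocal model a + b/(c + x) can approximate it only in a limit of the parameters, so nls() may fail to converge because no minimizer exists in the open parameter set.

        set.seed(8)
        x <- 1:20
        y <- 2 + 0.5 * x + rnorm(20, sd = 0.1)      # data with a linear trend

        try(nls(y ~ a + b / (c + x),
                start = list(a = 1, b = 1, c = 1))) # typically stops with an error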

  20. Logistic regression a self-learning text

    Kleinbaum, David G


    This textbook provides students and professionals in the health sciences with a presentation of the use of logistic regression in research. The text is self-contained and designed to be used both in class and as a tool for self-study. It arises from the author's many years of experience teaching this material, and the notes on which it is based have been extensively used throughout the world.

  1. Curvatures for Parameter Subsets in Nonlinear Regression


    The relative curvature measures of nonlinearity proposed by Bates and Watts (1980) are extended to an arbitrary subset of the parameters in a normal, nonlinear regression model. In particular, the subset curvatures proposed indicate the validity of linearization-based approximate confidence intervals for single parameters. The derivation produces the original Bates-Watts measures directly from the likelihood function. When the intrinsic curvature is negligible, the Bates-Watts parameter-effec...

  2. Realization of Ridge Regression in MATLAB

    Dimitrov, S.; Kovacheva, S.; Prodanova, K.


    The least squares estimator (LSE) of the coefficients in the classical linear regression model is unbiased. In the case of multicollinearity of the columns of the design matrix, the LSE has very large variance, i.e., the estimator is unstable. A more stable (but biased) estimator can be constructed using a ridge estimator (RE). In this paper the basic methods of obtaining ridge estimators and numerical procedures for their realization in MATLAB are considered. An application to a pharmacokinetics problem is presented.

  3. Bayesian Inference of a Multivariate Regression Model

    Marick S. Sinay


    Full Text Available We explore Bayesian inference of a multivariate linear regression model using a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior: a multivariate normal distribution for the regression coefficients and an inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such a structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  4. Leukemia prediction using sparse logistic regression.

    Tapio Manninen

    Full Text Available We describe a supervised prediction method for diagnosis of acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on a regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data driven and no prior biological knowledge is used. The described solution scored a 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model only uses a small fraction of the flow cytometry measurements, making our solution highly economical.

  5. Outlier Detection Using Nonconvex Penalized Regression

    She, Yiyuan


    This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1 penalty corresponds to soft thresholding. We introduce a thresholding (denoted by Θ) based iterative procedure for outlier detection (Θ-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that Θ-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most O(np) (and sometimes much less), avoiding an O(np²) least squares estimate. We describe the connection between Θ-IPOD and M-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression...
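
    A rough sketch of the mean-shift idea behind Θ-IPOD: alternate an ordinary least-squares fit on shift-corrected responses with hard thresholding of the per-observation shifts γ. The threshold value here is arbitrary, not the paper's tuning procedure.

        set.seed(9)
        n <- 100
        x <- rnorm(n)
        y <- 1 + 2 * x + rnorm(n)
        y[1:5] <- y[1:5] + 10            # planted outliers

        lambda <- 4                      # hard threshold (assumed, not tuned)
        gamma  <- rep(0, n)
        for (it in 1:50) {
          z   <- y - gamma               # remove current outlier shifts
          fit <- lm(z ~ x)
          r   <- y - fitted(fit)         # residuals against the clean fit
          gamma <- ifelse(abs(r) > lambda, r, 0)   # hard thresholding step
        }
        which(gamma != 0)                # flagged outliers (should include 1..5)
        coef(fit)                        # robust coefficient estimates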

  6. General regression and representation model for classification.

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (the weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.

  7. A Dirty Model for Multiple Sparse Regression

    Jalali, Ali; Sanghavi, Sujay


    Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of ℓ1/ℓq-norm block regularizations with q > 1 for such problems; however, these could actually perform worse in sample complexity -- vis-a-vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...
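
    For reference, a single-task lasso fit with glmnet; the entry's "dirty model" instead decomposes the coefficients of several related regressions into shared and task-specific sparse parts.

        library(glmnet)

        set.seed(10)
        n <- 80; p <- 200
        X <- matrix(rnorm(n * p), n, p)
        beta <- c(3, -2, 1.5, rep(0, p - 3))   # sparse ground truth
        y <- X %*% beta + rnorm(n)

        cv <- cv.glmnet(X, y)                  # lasso: alpha = 1 is the default
        coef(cv, s = "lambda.1se")[1:5, ]      # recovers the few active variables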

  8. Satellite rainfall retrieval by logistic regression

    Chiu, Long S.


    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.

  9. The comparison of robust partial least squares regression with robust principal component regression on a real

    Polat, Esra; Gunay, Suleyman


    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and increases their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of the RPCR and RSIMPLS methods on an econometric data set, hence making a comparison of the two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R² value and the Robust Component Selection (RCS) statistic.

  10. Interpreting parameters in the logistic regression model with random effects

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben


    Keywords: interpretation; interval odds ratio; logistic regression; median odds ratio; normally distributed random effects.
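
    A sketch of a random-intercept logistic model with lme4 and the median odds ratio of Larsen et al., MOR = exp(sqrt(2σ²) Φ⁻¹(0.75)), computed from the estimated random-effect variance; the clustered data are simulated.

        library(lme4)

        set.seed(11)
        m <- 50; k <- 20                        # 50 clusters of 20 subjects
        cluster <- rep(1:m, each = k)
        u <- rnorm(m, sd = 1)                   # cluster-level random effects
        x <- rnorm(m * k)
        y <- rbinom(m * k, 1, plogis(-1 + 0.5 * x + u[cluster]))

        fit <- glmer(y ~ x + (1 | cluster), family = binomial)
        s2  <- as.numeric(VarCorr(fit)$cluster) # random-intercept variance
        exp(sqrt(2 * s2) * qnorm(0.75))         # median odds ratio (MOR)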

  11. Mission assurance increased with regression testing

    Lang, R.; Spezio, M.

    Knowing what to test is an important attribute in any testing campaign, especially when it has to be right or the mission could be in jeopardy. The New Horizons mission, developed and operated by the Johns Hopkins University Applied Physics Laboratory, received a planned major upgrade to its Mission Operations and Control (MOC) ground system architecture. Early in the mission planning it was recognized that the ground system platform would require an upgrade to assure continued support of the technology used for spacecraft operations. With the planned update of the six-year-old operational ground architecture from Solaris 8 to Solaris 10, it was critical that the new architecture maintain critical operations and control functions. The New Horizons spacecraft is heading to its historic rendezvous with Pluto in July 2015 and then proceeding into the Kuiper Belt. This paper discusses the Independent Software Acceptance Testing (ISAT) regression test campaign that played a critical role in assuring the continued success of the New Horizons mission. The New Horizons ISAT process was designed to assure that all the requirements were being met for the ground software functions developed to support the mission objectives. The ISAT team developed a test plan with a series of test case designs. The test objectives were to verify that the software developed from the requirements functioned as expected in the operational environment. As the test cases were developed and executed, a regression test suite was identified at the functional level. This regression test suite would serve as a crucial resource in assuring that the operational system continued to function as required with such a large scale change being introduced. Some of the New Horizons ground software changes required modifications to the most critical functions of the operational software. Of particular concern was that the new MOC architecture (Solaris 10) is Intel based and little endian, while the legacy architecture (Solaris 8) was SPA...

  12. Mapping geogenic radon potential by regression kriging.

    Pásztor, László; Szabó, Katalin Zsuzsanna; Szatmári, Gábor; Laborczi, Annamária; Horváth, Ákos


    Radon ((222)Rn) gas is produced in the radioactive decay chain of uranium ((238)U), which is an element that is naturally present in soils. Radon is transported mainly by diffusion and convection mechanisms through the soil, depending mainly on the physical and meteorological parameters of the soil, and can enter and accumulate in buildings. Health risks originating from indoor radon concentration can be attributed to natural factors and are characterized by the geogenic radon potential (GRP). Identification of areas with high health risks requires spatial modeling, that is, mapping of radon risk. In addition to geology and meteorology, physical soil properties play a significant role in the determination of GRP. In order to compile a reliable GRP map for a model area in Central Hungary, spatial auxiliary information representing GRP-forming environmental factors was taken into account to support the spatial inference of the locally measured GRP values. Since the number of measured sites was limited, efficient spatial prediction methodologies were sought to construct a reliable map for a larger area. Regression kriging (RK) was applied for the interpolation, using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. Firstly, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. The overall accuracy of the map was tested by Leave-One-Out Cross-Validation. Furthermore, the spatial reliability of the resultant map is estimated by the calculation of the 90% prediction interval of the local prediction values. The applicability of the applied method as well as that of the map is discussed briefly. Copyright © 2015 Elsevier B.V. All rights reserved.
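
    A rough sketch of the two-step regression kriging scheme with the sp and gstat packages, on simulated data with a single hypothetical auxiliary variable: a linear model supplies the trend, its residuals are kriged, and the final prediction is the sum of the two components.

        library(sp); library(gstat)

        set.seed(12)
        n <- 200
        d <- data.frame(x = runif(n), y = runif(n), elev = runif(n, 100, 300))
        d$grp <- 0.01 * d$elev + rnorm(n, sd = 0.3)        # target variable

        trend <- lm(grp ~ elev, data = d)                  # deterministic component
        d$res <- residuals(trend)
        coordinates(d) <- ~ x + y                          # promote to spatial object

        v    <- variogram(res ~ 1, d)                      # residual variogram
        vfit <- fit.variogram(v, vgm(0.1, "Sph", 0.5, 0.01))

        newd <- data.frame(x = runif(50), y = runif(50), elev = runif(50, 100, 300))
        coordinates(newd) <- ~ x + y
        kr <- krige(res ~ 1, d, newd, model = vfit)        # krige the residuals
        pred <- predict(trend, newdata = as.data.frame(newd)) + kr$var1.pred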

  13. An Application on Multinomial Logistic Regression Model

    Abdalla M El-Habil


    Full Text Available This study aims to identify an application of the Multinomial Logistic Regression model, which is one of the important methods for categorical data analysis. This model deals with one nominal/ordinal response variable that has more than two categories, whether nominal or ordinal. It has been applied in data analysis in many areas, for example health, social, behavioral, and educational research. To identify the model in a practical way, we used real data on physical violence against children, from a survey of Youth 2003 which was conducted by the Palestinian Central Bureau of Statistics (PCBS). A segment of the population of children in the age group (10-14 years) residing in Gaza governorate, of size 66,935, had been selected, and the response variable consisted of four categories. Eighteen explanatory variables were used for building the primary multinomial logistic regression model. The model had been tested through a set of statistical tests to ensure its appropriateness for the data. Also, the model had been tested by randomly selecting two observations of the data used, predicting the classified group of each observation from the values of the explanatory variables. We concluded that by using the multinomial logistic regression model we are able to define accurately the relationship between the group of explanatory variables and the response variable, to identify the effect of each of the variables, and to predict the classification of any individual case.
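
    A minimal multinomial logistic regression in R with nnet::multinom on simulated data (not the PCBS survey), showing the per-category coefficient sets and relative risk ratios.

        library(nnet)

        set.seed(13)
        n <- 300
        x1 <- rnorm(n); x2 <- rbinom(n, 1, 0.5)
        pr <- cbind(1, exp(0.8 * x1 + 0.5 * x2), exp(-0.5 * x1 + x2))
        y  <- factor(apply(pr, 1, function(p) sample(3, 1, prob = p)))

        fit <- multinom(y ~ x1 + x2, trace = FALSE)
        summary(fit)     # one coefficient set per non-reference category
        exp(coef(fit))   # relative risk ratios vs. the reference category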

  14. Hierarchical linear regression models for conditional quantiles

    TIAN Maozai; CHEN Gemai


    The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models, but it cannot deal effectively with data that have a hierarchical structure. In practice, the existence of such data hierarchies is neither accidental nor ignorable; it is a common phenomenon. To ignore this hierarchical data structure risks overlooking the importance of group effects, and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid. On the other hand, hierarchical models take a hierarchical data structure into account and have many applications in statistics, ranging from overdispersion to constructing min-max estimators. However, hierarchical models are virtually mean regression models; therefore, they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates. Furthermore, the estimated coefficient vector (marginal effects) is sensitive to an outlier observation on the dependent variable. In this article, a new approach, which is based on the Gauss-Seidel iteration and takes full advantage of quantile regression and hierarchical models, is developed. On the theoretical front, we also consider the asymptotic properties of the new method, obtaining simple conditions for n^(1/2)-convergence and asymptotic normality. We also illustrate the use of the technique with real educational data, which are hierarchical, and show how the results can be explained.

  15. [Refractive regression after intraocular lens implantation].

    Ma, Z Z; Momose, A


    A study of refractive changes after IOL implantation in 147 eyes revealed that astigmatism tended to increase, and the natural regressive course followed a negative exponential function, with the steep phase within 3 weeks for spherical errors and within 5 weeks for cylindrical errors. One week after surgery, the axis of astigmatism was predominantly with the rule; 2 months after the operation, patients with preoperative with-the-rule astigmatism (WRA) changed into various astigmatic axial directions, while 76.4% of the patients with preoperative against-the-rule astigmatism (ARA) reverted to ARA. Those eyes in which the astigmatic axis was not horizontal 1 week after the operation ended up with stronger astigmatism at 2 months.

  16. Learning regulatory programs by threshold SVD regression.

    Ma, Xin; Xiao, Luo; Wong, Wing Hung


    We formulate a statistical model for the regulation of global gene expression by multiple regulatory programs and propose a thresholding singular value decomposition (T-SVD) regression method for learning such a model from data. Extensive simulations demonstrate that this method offers improved computational speed and higher sensitivity and specificity over competing approaches. The method is used to analyze microRNA (miRNA) and long noncoding RNA (lncRNA) data from The Cancer Genome Atlas (TCGA) consortium. The analysis yields previously unidentified insights into the combinatorial regulation of gene expression by noncoding RNAs, as well as findings that are supported by evidence from the literature.

  17. Privacy Preserving Linear Regression on Distributed Databases

    Fida K. Dankar


    Full Text Available Studies that combine data from multiple sources can tremendously improve the outcome of the statistical analysis. However, combining data from these various sources for analysis poses privacy risks. A number of protocols have been proposed in the literature to address the privacy concerns; however, they do not fully deliver on either privacy or complexity. In this paper, we present a (theoretical) privacy preserving linear regression model for the analysis of data owned by several sources. The protocol uses a semi-trusted third party and delivers on privacy and complexity.

  18. Cyclodextrin promotes atherosclerosis regression via macrophage reprogramming


    Atherosclerosis is an inflammatory disease linked to elevated blood cholesterol concentrations. Despite ongoing advances in the prevention and treatment of atherosclerosis, cardiovascular disease remains the leading cause of death worldwide. Continuous retention of apolipoprotein B ... that increases cholesterol solubility in preventing and reversing atherosclerosis. We showed that CD treatment of murine atherosclerosis reduced atherosclerotic plaque size and CC load and promoted plaque regression even with a continued cholesterol-rich diet. Mechanistically, CD increased oxysterol production ... of CD as well as for augmented reverse cholesterol transport. Because CD treatment in humans is safe and CD beneficially affects key mechanisms of atherogenesis, it may therefore be used clinically to prevent or treat human atherosclerosis.

  19. Inferring gene regression networks with model trees

    Aguilar-Ruiz Jesus S


    Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear...

  20. Neutrosophic Correlation and Simple Linear Regression

    A. A. Salama


    Full Text Available Since the world is full of indeterminacy, the neutrosophics have found their place in contemporary research. The fundamental concepts of the neutrosophic set were introduced by Smarandache. Recently, Salama et al. introduced the concept of the correlation coefficient of neutrosophic data. In this paper, we introduce and study the concepts of correlation and the correlation coefficient of neutrosophic data in probability spaces and study some of their properties. We also introduce and study the neutrosophic simple linear regression model. Possible applications to data processing are touched upon.

  1. Paraneoplastic pemphigus regression after thymoma resection

    Stergiou Eleni


    Full Text Available Abstract Background Among human neoplasms, thymomas are associated with the highest frequency of paraneoplastic autoimmune diseases. Case presentation A case of a 42-year-old woman with paraneoplastic pemphigus as the first manifestation of thymoma is reported. Transsternal complete thymoma resection achieved pemphigus regression. The clinical correlations between pemphigus and thymoma are presented. Conclusion Our case report provides further evidence for the important role of autoantibodies in the pathogenesis of paraneoplastic skin diseases in thymoma patients. It also documents the improvement of the associated pemphigus after radical treatment of the thymoma.

  2. Spectral density regression for bivariate extremes

    Castro Camilo, Daniela


    We introduce a density regression model for the spectral density of a bivariate extreme value distribution, that allows us to assess how extremal dependence can change over a covariate. Inference is performed through a double kernel estimator, which can be seen as an extension of the Nadaraya–Watson estimator where the usual scalar responses are replaced by mean constrained densities on the unit interval. Numerical experiments with the methods illustrate their resilience in a variety of contexts of practical interest. An extreme temperature dataset is used to illustrate our methods. © 2016 Springer-Verlag Berlin Heidelberg

  3. Stability Analysis for Regularized Least Squares Regression

    Rudin, Cynthia


    We discuss stability for a class of learning algorithms with respect to noisy labels. The algorithms we consider are for regression, and they involve the minimization of regularized risk functionals, such as L(f) := (1/N) Σ_i (f(x_i) − y_i)² + λ ||f||_H². We shall call the algorithm 'stable' if, when y_i is a noisy version of f*(x_i) for some function f* in H, the output of the algorithm converges to f* as the regularization term and noise simultaneously vanish. We consider two flavors of...

  4. Parametric Regression Models Using Reversed Hazard Rates

    Asokan Mulayath Variyath


    Full Text Available Proportional hazard regression models are widely used in survival analysis to understand and exploit the relationship between survival time and covariates. For left censored survival times, reversed hazard rate functions are more appropriate. In this paper, we develop a parametric proportional reversed hazard rates model using an inverted Weibull distribution. The estimation and construction of confidence intervals for the parameters are discussed. We assess the performance of the proposed procedure based on a large number of Monte Carlo simulations. We illustrate the proposed method using a real case example.

  5. Bayesian regression of piecewise homogeneous Poisson processes

    Diego Sevilla


    Full Text Available In this paper, a Bayesian method for piecewise regression is adapted to handle counting process data distributed as Poisson. A numerical code in Mathematica is developed and tested by analyzing simulated data. The resulting method is valuable for detecting breaking points in the count rate of time series for Poisson processes. Cite as: D. J. R. Sevilla, Papers in Physics 7, 070018 (2015).

  6. Remission and regression of diabetic nephropathy

    Hovind, Peter; Tarnow, Lise; Parving, Hans-Henrik


    Diabetic kidney disease is considered to be an irreversible and inexorably progressive disease. Therefore, prevention of the development of ESRD is extremely important. Animal studies have demonstrated that regression of existing renal morphologic lesions is feasible. In a sizable fraction of type 1 diabetic patients with overt nephropathy, remission (decrease in albuminuria ...) ... diabetic nephropathy (rate of decline in GFR ...) ... treatment. Furthermore, remission of nephrotic-range albuminuria ... in diabetic patients, an aggressive multifactorial approach, aiming at lowering blood pressure and albuminuria and improving glycemic control, must be applied.

  7. SDE based regression for random PDEs

    Bayer, Christian


    A simulation based method for the numerical solution of PDE with random coefficients is presented. By the Feynman-Kac formula, the solution can be represented as conditional expectation of a functional of a corresponding stochastic differential equation driven by independent noise. A time discretization of the SDE for a set of points in the domain and a subsequent Monte Carlo regression lead to an approximation of the global solution of the random PDE. We provide an initial error and complexity analysis of the proposed method along with numerical examples illustrating its behaviour.

  9. Fixed kernel regression for voltammogram feature extraction

    Acevedo Rodriguez, F. J.; López-Sastre, R. J.; Gil-Jiménez, P.; Ruiz-Reyes, N.; Maldonado Bascón, S.


    Cyclic voltammetry is an electroanalytical technique for obtaining information about substances under analysis without the need for complex flow systems. However, classifying the information in voltammograms obtained using this technique is difficult. In this paper, we propose the use of fixed kernel regression as a method for extracting features from these voltammograms, reducing the information to a few coefficients. The proposed approach has been applied to a wine classification problem with accuracy rates of over 98%. Although the method is described here for extracting voltammogram information, it can be used for other types of signals.
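
    For comparison, a minimal kernel regression smoother on a noisy two-peak signal using base R's ksmooth (a Nadaraya-Watson estimator); the entry's method instead regresses onto a fixed set of kernels and keeps the resulting coefficients as features.

        set.seed(14)
        t <- seq(0, 1, length.out = 400)
        signal <- dnorm(t, 0.3, 0.05) - 0.5 * dnorm(t, 0.7, 0.08)  # two "peaks"
        obs <- signal + rnorm(400, sd = 0.5)

        sm <- ksmooth(t, obs, kernel = "normal", bandwidth = 0.05)
        plot(t, obs, col = "grey"); lines(sm, lwd = 2)   # denoised curve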

  10. Bayesian model selection in Gaussian regression

    Abramovich, Felix


    We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.

  11. Quantile regression modeling for Malaysian automobile insurance premium data

    Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz


    Quantile regression is more robust to outliers than mean regression models. Traditional mean regression models like the Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of changes in the estimates of regression parameters (rating classes) on the magnitude of the response variable (pure premium). We then compare the results of the quantile regression model with a Gamma regression model. The results from quantile regression show that the estimates for some rating classes increase as the quantile increases while others decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = 0.5) is always narrower than that of Gamma regression for all risk factors.
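
    A minimal sketch contrasting quantile regression (quantreg::rq) with a Gamma GLM on simulated right-skewed, premium-like data: rq() fits several quantiles at once, while the GLM gives a single mean effect.

        library(quantreg)

        set.seed(15)
        n <- 500
        x <- runif(n)                                      # a rating factor
        y <- rgamma(n, shape = 2, rate = 2 / exp(1 + x))   # skewed response, mean exp(1 + x)

        rq_fit  <- rq(y ~ x, tau = c(0.25, 0.5, 0.75))     # several quantiles at once
        glm_fit <- glm(y ~ x, family = Gamma(link = "log"))

        coef(rq_fit)      # effects can differ across quantiles
        coef(glm_fit)     # one mean effect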

  12. Bayesian nonlinear regression for large p small n problems

    Chakraborty, Sounak


    Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik's ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use Markov chain Monte Carlo techniques for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. © 2012 Elsevier Inc.

  13. Estimation of pyrethroid pesticide intake using regression ...

    Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation of pesticide intakes for a defined demographic community, and (2) comparison of dietary pesticide intakes between the composite and individual samples. Extant databases were useful for assigning individual samples to composites, but they could not provide the breadth of information needed to facilitate measurable levels in every composite. Composite sample measurements were found to be good predictors of pyrethroid pesticide levels in their individual sample constituents where sufficient measurements are available above the method detection limit. Statistical inference shows little evidence of differences between individual and composite measurements and suggests that regression modeling of food groups based on composite dietary samples may provide an effective tool for estimating dietary pesticide intake for a defined population. The research presented in the journal article will improve the community's ability to determine exposures through the dietary route with a less burdensome and costly method.

  14. Revisit of Sheppard corrections in linear regression


    Dempster and Rubin (D&R), in their JRSSB paper, considered the statistical error caused by data rounding in a linear regression model and compared the Sheppard correction, the BRB correction and the ordinary LSE by simulations. Some asymptotic results as the rounding scale tends to 0 were also presented. In previous research, we found that the ordinary sample variance of rounded data from normal populations is always inconsistent, while the sample mean of rounded data is consistent if and only if the true mean is a multiple of half the rounding scale. In the light of these results, in this paper we further investigate rounding errors in linear regressions. We notice that these results are the basic reason why the Sheppard corrections perform better than other methods in D&R's examples, and that their conclusion in general cases is incorrect. Examples in which the Sheppard correction works worse than the BRB correction are also given. Furthermore, we propose a new approach to estimate the parameters, called the "two-stage estimator", and establish the consistency and asymptotic normality of the new estimators.
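
    A quick numerical illustration of Sheppard's variance correction: for data rounded to a grid of width h, subtracting h²/12 from the sample variance of the rounded values approximately recovers the variance of the unrounded data.

        set.seed(16)
        h <- 1
        z <- rnorm(1e6, mean = 0.3, sd = 2)
        zr <- round(z / h) * h          # data recorded on a grid of width h

        var(z)                          # about 4, the true variance
        var(zr)                         # inflated by rounding
        var(zr) - h^2 / 12              # Sheppard-corrected, close to var(z)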

  15. Regression Models for Count Data in R

    Christian Kleiber


    The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It re-uses the design and functionality of the basic R functions, just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate over-dispersion and excess zeros, two problems that typically occur in count data sets in economics and the social sciences, better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.
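
    For readers outside R, here is a minimal sketch of the zero-inflated Poisson likelihood of the kind zeroinfl() fits, written with numpy/scipy; this illustrates the model class, not the pscl implementation, and the simulated data are hypothetical.

```python
# Zero-inflated Poisson: with probability pi the count is a structural zero,
# otherwise it is Poisson(lam). Both parts depend on covariates X.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

def zip_negloglik(params, X, y):
    k = X.shape[1]
    beta, gamma = params[:k], params[k:]   # count part, zero-inflation part
    lam = np.exp(X @ beta)                 # Poisson mean
    pi = expit(X @ gamma)                  # probability of a structural zero
    pois = -lam + y * np.log(lam) - gammaln(y + 1.0)
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * np.exp(-lam)),
                  np.log(1 - pi) + pois)
    return -ll.sum()

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
y[rng.random(500) < 0.3] = 0               # inject excess zeros
fit = minimize(zip_negloglik, np.zeros(4), args=(X, y), method="BFGS")
print(fit.x)   # [beta0, beta1, gamma0, gamma1]
```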

  16. Regression Models For Saffron Yields in Iran

    S. H, Sanaeinejad; S. N, Hosseini

    Saffron is an important crop in social and economic terms in Khorassan Province (northeast of Iran). In this research we tried to evaluate trends in saffron yield in recent years and to study the relationship between saffron yield and climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in the cities of Birjand, Ghaen and Ferdows. Climatological data for the same periods were provided by the database of the Khorassan Climatology Center. The climatological data included temperature, rainfall, relative humidity and sunshine hours for Model I, and temperature and rainfall for Model II. The results showed that the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81, respectively. The coefficients of determination for the same cities for Model II were 0.53, 0.50 and 0.72, respectively. Multiple regression analysis indicated that among the weather variables, temperature was the key parameter for variation in saffron yield. It was concluded that increasing spring temperature was the main cause of the decline in saffron yield in recent years across the province. Finally, the yield trend was predicted for the last 5 years using time series analysis.

  17. Incremental Support Vector Learning for Ordinal Regression.

    Gu, Bin; Sheng, Victor S; Tay, Keng Yeow; Romano, Walter; Li, Shuo


    Support vector ordinal regression (SVOR) is a popular method for tackling ordinal regression problems. However, until now no effective algorithms have been proposed for incremental SVOR learning, due to the complicated formulations of SVOR. Recently, an interesting accurate on-line algorithm was proposed for training ν-support vector classification (ν-SVC), which can handle a quadratic formulation with a pair of equality constraints. In this paper, we first present a modified SVOR formulation based on a sum-of-margins strategy. The formulation has multiple constraints, and each constraint includes a mixture of an equality and an inequality. Then, we extend the accurate on-line ν-SVC algorithm to the modified formulation and propose an effective incremental SVOR algorithm. The algorithm can handle a quadratic formulation with multiple constraints, where each constraint is constituted of an equality and an inequality. More importantly, it tackles the conflicts between the equality and inequality constraints. We also provide a finite convergence analysis for the algorithm. Numerical experiments on several benchmark and real-world data sets show that the incremental algorithm converges to the optimal solution in a finite number of steps, and is faster than existing batch and incremental SVOR algorithms. Meanwhile, the modified formulation has better accuracy than the existing incremental SVOR algorithm, and is as accurate as the sum-of-margins based formulation of Shashua and Levin.

  18. Bayesian multimodel inference for geostatistical regression models.

    Devin S Johnson

    The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set, which has been analyzed previously by other researchers investigating model selection methods. Our results confirmed the previous analysis, suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution-tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index, and negatively related to percent watershed disturbance.

  19. Supporting Regularized Logistic Regression Privately and Efficiently.

    Wenfa Li

    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, the social sciences, information technology, and beyond. These domains often involve data on human subjects that are subject to strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work focuses on safeguarding regularized logistic regression, a widely used statistical model that has not yet been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as research consortia or networks as widely seen in genetics, epidemiology, and the social sciences. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications in various disciplines, including genetic and biomedical studies, the smart grid, and network analysis.


  1. Variable Selection in Logistic Regression Model

    ZHANG Shangli; ZHANG Lili; QIU Kuanmin; LU Ying; CAI Baigen


    Variable selection is one of the most important problems in pattern recognition. In the linear regression model, there are many methods that can solve this problem, such as the Least absolute shrinkage and selection operator (LASSO) and many improved LASSO methods, but there are few variable selection methods for generalized linear models. We study the variable selection problem in the logistic regression model. We propose a new variable selection method, the logistic elastic net, and prove that it has a grouping effect, which means that strongly correlated predictors tend to be in or out of the model together. The logistic elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the LASSO is not a very satisfactory variable selection method when p is much larger than n. The advantage and effectiveness of this method are demonstrated on real leukemia data and in a simulation study.
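
    The authors' logistic elastic net is their own estimator; as a readily available stand-in, scikit-learn's elastic-net-penalized logistic regression exhibits the same grouping behavior in a p >> n setting. The data below are synthetic and the penalty settings are illustrative.

```python
# Elastic-net logistic regression in a p >> n problem with a strongly
# correlated predictor pair; the grouping effect keeps their coefficients close.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 60, 500                                   # many more predictors than cases
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)    # strongly correlated pair
y = (X[:, 0] + X[:, 2] + rng.normal(size=n) > 0).astype(int)

clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X, y)
# Correlated predictors 0 and 1 tend to receive similar coefficients.
print(clf.coef_[0, :3])
```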

  2. Scientific Progress or Regress in Sports Physiology?

    Böning, Dieter


    In modern societies there is strong belief in scientific progress, but, unfortunately, a parallel partial regress occurs because of often avoidable mistakes. Mistakes are mainly forgetting, erroneous theories, errors in experiments and manuscripts, prejudice, selected publication of "positive" results, and fraud. An example of forgetting is that methods introduced decades ago are used without knowing the underlying theories: Basic articles are no longer read or cited. This omission may cause incorrect interpretation of results. For instance, false use of actual base excess instead of standard base excess for calculation of the number of hydrogen ions leaving the muscles raised the idea that an unknown fixed acid is produced in addition to lactic acid during exercise. An erroneous theory led to the conclusion that lactate is not the anion of a strong acid but a buffer. Mistakes occur after incorrect application of a method, after exclusion of unwelcome values, during evaluation of measurements by false calculations, or during preparation of manuscripts. Co-authors, as well as reviewers, do not always carefully read papers before publication. Peer reviewers might be biased against a hypothesis or an author. A general problem is selected publication of positive results. An example of fraud in sports medicine is the presence of doped subjects in groups of investigated athletes. To reduce regress, it is important that investigators search both original and recent articles on a topic and conscientiously examine the data. All co-authors and reviewers should read the text thoroughly and inspect all tables and figures in a manuscript.

  3. Downscaling Wind Forecasts via Clustering and Regression

    Lee, H. S.; Zhang, Y.; Liu, Y.; Wu, L.; He, Y.; Schaake, J. C.


    Wind is an important weather variable and a key determinant of evaporation, snowfall and coastal flooding. At present, wind information from medium-range weather forecasts is of limited accuracy, and the associated resolution is often too coarse to be used directly for hydrologic prediction purposes. This work presents a statistical post-processing framework that will be used to generate fine-scale wind products to serve NOAA's National Water Model effort. The prototype of this framework consists of two components: a) a cluster analysis module that classifies Automated Surface Observing System (ASOS) stations into multiple groups based on elevation and/or surface roughness lengths derived from the National Land Cover Database 2011 (NLCD2011), and b) a regression module based on the Heteroscedastic Extended Logistic Regression (HXLR) technique that statistically downscales Global Ensemble Forecast System (GEFS) wind hindcasts to the location of the closest station within each identified cluster. The efficacy of the framework is assessed for a region that is roughly the service area of NOAA's Middle Atlantic River Forecast Center (MARFC). For this region, GEFS wind hindcasts are downscaled and corrected using a digital elevation model and the National Land Cover Database; observations from ASOS serve both as the predictands for establishing the relationship and as the reference for validation. Our results show that this framework considerably enhances the quality of wind forecasts, with the Nash-Sutcliffe efficiency of the downscaled wind speed improved by 0.2-0.4 relative to the raw GEFS forecasts.


  5. A reconsideration of the concept of regression.

    Dowling, A Scott


    Regression has been a useful psychoanalytic concept, linking present mental functioning with past experiences and levels of functioning. The concept originated as an extension of the evolutionary zeitgeist of the day as enunciated by H. Spencer and H. Jackson and applied by Freud to psychological phenomena. The value system implicit in the contrast of evolution/progression vs. dissolution/regression has given rise to unfortunate and powerful assumptions of social, cultural, developmental and individual value, as embodied in notions of "higher," "lower," "primitive," "mature," "archaic," and "advanced." The unhelpful results of these assumptions are evident, for example, in attitudes concerning cultural, sexual, and social "correctness," same-sex object choice, and goals of treatment. An alternative, a continuously constructed, continuously emerging mental life, in analogy to the ever-changing, continuous physical body, is suggested. This view retains the fundamentals of psychoanalysis, for example, unconscious mental life, drive, defense, and psychic structure, but stresses a functional, ever-changing, present-oriented understanding of mental life as contrasted with a static, onion-layered view.

  6. Shape regression for vertebra fracture quantification

    Lund, Michael Tillge; de Bruijne, Marleen; Tanko, Laszlo B.; Nielsen, Mads


    Accurate and reliable identification and quantification of vertebral fractures constitute a challenge both in clinical trials and in the diagnosis of osteoporosis. Various efforts have been made to develop reliable, objective, and reproducible methods for assessing vertebral fractures, but at present there is no consensus concerning a universally accepted diagnostic definition of vertebral fractures. In this project we investigate whether it is possible to accurately reconstruct the shape of a normal vertebra, using a neighbouring vertebra as prior information. The reconstructed shape can then be used to develop a novel vertebra fracture measure, by comparing the segmented vertebra shape with its reconstructed normal shape. The vertebrae in lateral X-rays of the lumbar spine were manually annotated by a medical expert. With this dataset we built a shape model, with equidistant point distribution between the four corner points. Based on the shape model, a multiple linear regression model of a normal vertebra shape was developed for each dataset using leave-one-out cross-validation. The reconstructed shape was calculated for each dataset using these regression models. The prediction error for the annotated shape was on average 3%.

  7. Jackknife bias reduction for polychotomous logistic regression.

    Bull, S B; Greenwood, C M; Hauck, W W


    Despite theoretical and empirical evidence that the usual MLEs can be misleading in finite samples, and some evidence that bias-reduced estimates are less biased and more efficient, such estimates have not seen wide application in practice. One can obtain bias-reduced estimates by jackknife methods, with or without full iteration, or by use of higher-order terms in a Taylor series expansion of the log-likelihood to approximate the asymptotic bias. We provide details of these methods for polychotomous logistic regression with a nominal categorical response. We conducted a Monte Carlo comparison of the jackknife and Taylor series estimates in moderate sample sizes in a general logistic regression setting, to investigate dichotomous and trichotomous responses and a mixture of correlated and uncorrelated binary and normal covariates. We found the approximate two-step jackknife and the Taylor series methods useful when the ratio of the number of observations to the number of parameters is greater than 15, but we cannot recommend the two-step and the fully iterated jackknife estimates when this ratio is less than 20, especially when there are large effects, binary covariates, or multicollinearity in the covariates.
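
    For orientation, here is the generic fully iterated jackknife bias correction, sketched for a scalar statistic; the paper applies the same construction to polychotomous logistic MLEs, which is not reproduced here.

```python
# Jackknife bias correction: bias_hat = (n-1) * (mean of leave-one-out
# estimates - full-sample estimate); subtract it from the full estimate.
import numpy as np

def jackknife_bias_corrected(estimator, data):
    n = len(data)
    theta_full = estimator(data)
    theta_loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    bias = (n - 1) * (theta_loo.mean() - theta_full)
    return theta_full - bias

rng = np.random.default_rng(2)
sample = rng.exponential(scale=2.0, size=50)
biased_var = lambda x: np.var(x)          # divides by n, so biased downward
print(biased_var(sample), jackknife_bias_corrected(biased_var, sample))
```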

  8. Heterogeneity in drug abuse among juvenile offenders: is mixture regression more informative than standard regression?

    Montgomery, Katherine L; Vaughn, Michael G; Thompson, Sanna J; Howard, Matthew O


    Research on juvenile offenders has largely treated this population as a homogeneous group. However, recent findings suggest that this at-risk population may be considerably more heterogeneous than previously believed. This study compared mixture regression analyses with standard regression techniques in an effort to explain how known factors such as distress, trauma, and personality are associated with drug abuse among juvenile offenders. Researchers recruited 728 juvenile offenders from Missouri juvenile correctional facilities for participation in this study. Researchers investigated past-year substance use in relation to the following variables: demographic characteristics (gender, ethnicity, age, familial use of public assistance), antisocial behavior, and mental illness symptoms (psychopathic traits, psychiatric distress, and prior trauma). Results indicated that both the standard and mixture regression approaches identified significant variables related to past-year substance use in this population; however, the mixture regression methods provided greater specificity in the results. Mixture regression analytic methods may help policy makers and practitioners better understand and intervene with the substance-related subgroups of juvenile offenders.

  9. Introduction to the use of regression models in epidemiology.

    Bender, Ralf


    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models, the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer occurrence can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs, so they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrative examples from cancer research.

  10. Adaptive Rank Penalized Estimators in Multivariate Regression

    Bunea, Florentina; Wegkamp, Marten


    We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix. The consistency results are valid not only in the classic asymptotic regime, when the number of responses $n$ and predictors $p$ stay bounded and the number of observations $m$ grows, but also when either, or both, $n$ and $p$ grow, possibly much faster than $m$. Our finite sample prediction and estimation performance bounds show that the RSC estimator achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly ...
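
    A simplified sketch of the idea follows: compute the best rank-r fit from the SVD of the OLS fitted values and pick the rank minimizing a penalized Frobenius criterion. The penalty used here (mu times the rank) is a simplification of the RSC's parameter-count penalty, and mu is an illustrative value.

```python
# Reduced-rank regression with a rank penalty: truncate the SVD of the OLS
# fitted matrix at each rank r and keep the r minimizing SSE + mu * r.
import numpy as np

def reduced_rank_select(X, Y, mu):
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, d, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    best = (np.inf, 0)
    for r in range(1, len(d) + 1):
        fit_r = (U[:, :r] * d[:r]) @ Vt[:r]        # best rank-r approximation
        crit = np.sum((Y - fit_r) ** 2) + mu * r
        if crit < best[0]:
            best = (crit, r)
    return best[1]

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
B = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 6))   # true rank 2
Y = X @ B + rng.normal(size=(200, 6))
print("selected rank:", reduced_rank_select(X, Y, mu=100.0))
```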

  11. Macrophages, Dendritic Cells, and Regression of Atherosclerosis

    Jonathan E. Feig


    Atherosclerosis is the number one cause of death in the Western world. It results from the interaction between modified lipoproteins and monocyte-derived cells such as macrophages, dendritic cells, T cells, and other cellular elements of the arterial wall. This inflammatory process can ultimately lead to the development of complex lesions, or plaques, that protrude into the arterial lumen. Ultimately, plaque rupture and thrombosis can occur, leading to the clinical complications of myocardial infarction or stroke. Although each of these cell types plays a role in the pathogenesis of atherosclerosis, in this review the focus will be primarily on the monocyte-derived cells: macrophages and dendritic cells. The roles of these cell types in atherogenesis will be highlighted. Finally, the mechanisms of atherosclerosis regression as they relate to these cells will be discussed.

  12. Levetiracetam-induced reversible autistic regression.

    Camacho, Ana; Espín, José Carlos; Nuñez, Noemí; Simón, Rogelio


    Levetiracetam is a commonly prescribed antiepileptic drug, and is generally well tolerated, but can eventually cause behavioral disturbances. These disturbances seem more frequent in children and in patients with a previous psychiatric history. We report on reversible autistic regression induced by levetiracetam in a 6-year-old girl with spastic cerebral palsy, mild cognitive deficiency, and focal epilepsy. She was diagnosed with pervasive developmental disorder, and demonstrated mild to moderate impairment in pragmatic language and interactions with peers. After the introduction of levetiracetam, she developed stereotypies, and her social and communicative skills deteriorated severely. She also exhibited mood lability. When the medication was discontinued, a dramatic response occurred, with a complete resolution of new abnormal findings. Levetiracetam can provoke unusual behavioral adverse effects in certain patients who are biologically more vulnerable. Copyright © 2012 Elsevier Inc. All rights reserved.

  13. Robust Mediation Analysis Based on Median Regression

    Yuan, Ying; MacKinnon, David P.


    Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
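
    A toy sketch of the core idea follows, replacing the usual OLS path estimates with median (q = 0.5) regressions via statsmodels; the heavy-tailed errors are simulated, and the authors' full inferential procedure is not reproduced.

```python
# Median-regression mediation: estimate path a (X -> M) and path b
# (M -> Y adjusting for X) at the median, then take the product a*b.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
m = 0.6 * x + rng.standard_t(df=3, size=n)       # heavy-tailed errors
y = 0.5 * m + 0.2 * x + rng.standard_t(df=3, size=n)

Xa = sm.add_constant(x)
a = sm.QuantReg(m, Xa).fit(q=0.5).params[1]      # X -> M path
Xb = sm.add_constant(np.column_stack([m, x]))
b = sm.QuantReg(y, Xb).fit(q=0.5).params[1]      # M -> Y path, adjusting for X
print("indirect (mediated) effect a*b:", a * b)  # roughly 0.6 * 0.5 = 0.3
```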

  14. Model Selection in Kernel Ridge Regression

    Exterkate, Peter

    Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. We interpret the latter two kernels in terms of their smoothing properties, and we relate the tuning parameters associated with all these kernels to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, we provide guidelines for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study confirms the practical usefulness of these rules of thumb. Finally, the flexible and smooth functional forms provided by the Gaussian and Sinc kernels make them widely...

  15. Logistic regression against a divergent Bayesian network

    Noel Antonio Sánchez Trujillo


    This article is a discussion of two statistical tools used for prediction and causality assessment: logistic regression and Bayesian networks. Using data from a simulated example from a study assessing factors that might predict pulmonary emphysema (where fingertip pigmentation and smoking are considered), we posed the following questions. Is pigmentation a confounding, causal or predictive factor? Is there perhaps another factor, like smoking, that confounds? Is there a synergy between pigmentation and smoking? The results, in terms of prediction, are similar with the two techniques; regarding causation, differences arise. We conclude that, in decision-making, the combination of a statistical tool, used with common sense, and prior evidence, which may take years or even centuries to develop, is better than the automatic and exclusive use of statistical resources.

  16. Remaining Phosphorus Estimate Through Multiple Regression Analysis



    The remaining phosphorus (Prem), the P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated Prem estimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) figured as Prem predictor variables. Prem can be estimated with acceptable accuracy using the above-mentioned approach, and pHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy on the Prem estimates.

  17. Nonparametric additive regression for repeatedly measured data

    Carroll, R. J.


    We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements. We allow for a working covariance matrix for the regression errors, showing that our method is most efficient when the correct covariance matrix is used. The component functions achieve the known asymptotic variance lower bound for the scalar argument case. Smooth backfitting also leads directly to design-independent biases in the local linear case. Simulations show our estimator has smaller variance than the usual kernel estimator. This is also illustrated by an example from nutritional epidemiology. © 2009 Biometrika Trust.

  18. Estimates on compressed neural networks regression.

    Zhang, Yongquan; Li, Youmei; Sun, Jianyong; Ji, Jiabing


    When the number of neural elements n in a neural network is larger than the sample size m, the overfitting problem arises, since there are more parameters than actual data (more variables than constraints). In order to overcome the overfitting problem, we propose to reduce the number of neural elements by using a compressed projection A which does not need to satisfy the Restricted Isometry Property (RIP). By applying probability inequalities and approximation properties of feedforward neural networks (FNNs), we prove that solving the FNN regression learning algorithm in the compressed domain instead of the original domain reduces the sample error at the price of an increased (but controlled) approximation error, where covering number theory is used to estimate the excess error, and an upper bound on the excess error is given.
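
    A minimal sketch of the compression idea follows, with a plain linear model standing in for the FNN: project p features down to d << p with a random Gaussian matrix (used here without any RIP verification, as the abstract suggests is unnecessary) and solve the regression in the compressed domain. All dimensions are illustrative.

```python
# Regression after compressed projection: fit in the d-dimensional compressed
# domain X @ A instead of the original p-dimensional domain.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
m, p, d = 100, 2000, 50                    # more features than samples
X = rng.normal(size=(m, p))
w = np.zeros(p); w[:5] = 2.0               # sparse true signal
y = X @ w + 0.1 * rng.normal(size=m)

A = rng.normal(size=(p, d)) / np.sqrt(d)   # random compressed projection A
model = LinearRegression().fit(X @ A, y)   # solve in the compressed domain
print("train R^2 in compressed domain:", model.score(X @ A, y))
```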

  19. Adaptive regression for modeling nonlinear relationships

    Knafl, George J


    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  20. Subgroup finding via Bayesian additive regression trees.

    Sivaganesan, Siva; Müller, Peter; Huang, Bin


    We provide a Bayesian decision theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows flexible modeling of the response variable irrespective of potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility, where the expectation is taken with respect to the posterior predictive distribution of the response, and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using a simulated data set and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.

  1. Least square regularized regression in sum space.

    Xu, Yong-Li; Chen, Di-Rong; Li, Han-Xiong; Liu, Lu


    This paper proposes a least square regularized regression algorithm in a sum space of reproducing kernel Hilbert spaces (RKHSs) for nonflat function approximation, and obtains the solution of the algorithm by solving a system of linear equations. This algorithm can approximate the low- and high-frequency components of the target function with large and small scale kernels, respectively. The convergence and learning rate are analyzed. We measure the complexity of the sum space by its covering number and demonstrate that the covering number can be bounded by the product of the covering numbers of the basic RKHSs. For a sum space of RKHSs with Gaussian kernels, by choosing appropriate parameters we trade off the sample error and regularization error and obtain a polynomial learning rate, which is better than that in any single RKHS. The utility of this method is illustrated with two simulated data sets and five real-life databases.
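
    A bare-bones numpy sketch of the idea follows: regularized least squares with a sum of two Gaussian kernels, one wide (low-frequency) and one narrow (high-frequency). Kernel widths and the regularization constant are illustrative choices, not the paper's.

```python
# Kernel ridge regression in a sum space: the kernel is the sum of a wide and
# a narrow Gaussian kernel, capturing trend and local detail simultaneously.
import numpy as np

def gauss_kernel(a, b, sigma):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, 120))
y = np.sin(x) + 0.3 * np.sin(8 * x) + 0.1 * rng.normal(size=120)

lam = 1e-2
K = gauss_kernel(x, x, sigma=2.0) + gauss_kernel(x, x, sigma=0.2)  # sum kernel
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)               # closed form

x_new = np.linspace(0, 10, 5)
K_new = gauss_kernel(x_new, x, 2.0) + gauss_kernel(x_new, x, 0.2)
print(K_new @ alpha)   # predictions combining both frequency scales
```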

  2. Entrepreneurial intention modeling using hierarchical multiple regression

    Marina Jeger


    The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.

  3. Congenital osteofibrous dysplasia Campanacci: spontaneous postbioptic regression.

    Jobke, Björn; Bohndorf, Klaus; Vieth, Volker; Werner, Mathias


    Osteofibrous dysplasia Campanacci is a rare benign bone tumor most frequently observed in early childhood. The exclusive localization in the tibia is very characteristic. Congenital primary bone tumors are an absolute rarity. We report the case of a newborn with histologically proven osteofibrous dysplasia Campanacci of the tibia and a regular radiographic follow-up. After a small open biopsy and a spontaneous minor fracture, the lesion rapidly remodeled within 1½ months and almost completely regressed, with restitutio ad integrum. Surgical intervention for this tumor entity in childhood has been shown to have a high recurrence rate, but due to the lack of experience with newborns, guidelines do not exist. We analyze the radiologic and histologic differential diagnosis of juvenile adamantinoma and emphasize that congenital peripheral bone tumors should be treated conservatively when malignancy is excluded.

  4. Early development and regression in Rett syndrome.

    Lee, J Y L; Leonard, H; Piek, J P; Downs, J


    This study utilized developmental profiling to examine symptoms in 14 girls with genetically confirmed Rett syndrome whose families were participating in the Australian Rett Syndrome Database or InterRett database. Regression was mostly characterized by loss of hand and/or communication skills (13/14); one girl instead demonstrated a slowing of skill development. Social withdrawal and inconsolable crying often developed simultaneously (9/14), with social withdrawal lasting for a shorter duration than inconsolable crying. Previously acquired gross motor skills declined in just over half of the sample (8/14), mostly observed as a loss of balance. Early abnormalities such as vomiting and strabismus were also seen. Our findings provide additional insight into the early clinical profile of Rett syndrome.

  5. Statistical learning from a regression perspective

    Berk, Richard A


    This textbook considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. As a first approximation, this can be seen as an extension of nonparametric regression. This fully revised new edition includes important developments over the past 8 years. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis derives from sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. A continued emphasis on the implications for practice runs through the text. Among the statistical learning procedures examined are bagging, random forests, boosting, support vector machines and neural networks. Response variables may be quantitative or categorical. As in the first edition, a unifying theme is supervised learning that can be trea...

  6. Logistic regression applied to natural hazards: rare event logistic regression with replications

    M. Guns


    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulation into rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods, and allows some of the limitations of previous developments to be overcome through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.
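
    A simplified sketch of the replication idea follows: refit the logistic model on many bootstrap resamples and record how often each predictor is significant, so that variable selection is not driven by a single sample. The significance threshold and replication count are illustrative; this is not the authors' exact procedure.

```python
# Logistic regression with replications: selection frequency of each predictor
# across bootstrap refits, for a rare (imbalanced) binary outcome.
import numpy as np
from scipy.special import expit
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, p = 400, 5
X = rng.normal(size=(n, p))
y = rng.binomial(1, expit(-2.5 + 1.2 * X[:, 0]))   # rare positive events

selected = np.zeros(p)
n_rep = 200
for _ in range(n_rep):
    idx = rng.integers(0, n, n)                    # bootstrap replication
    res = sm.Logit(y[idx], sm.add_constant(X[idx])).fit(disp=0)
    selected += (res.pvalues[1:] < 0.05)           # skip the intercept
print("selection frequency per predictor:", selected / n_rep)
```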

  7. Multicollinearity and correlation among local regression coefficients in geographically weighted regression

    Wheeler, David; Tiefelsdorf, Michael


    Present methodological research on geographically weighted regression (GWR) focuses primarily on extensions of the basic GWR model, while ignoring well-established diagnostic tests commonly used in standard global regression analysis. This paper investigates multicollinearity issues surrounding the local GWR coefficients at a single location and the overall correlation between GWR coefficients associated with two different exogenous variables. Results indicate that the local regression coefficients are potentially collinear even if the underlying exogenous variables in the data-generating process are uncorrelated. Based on these findings, applied GWR research should exercise caution in substantively interpreting the spatial patterns of local GWR coefficients. An empirical disease-mapping example is used to motivate the GWR multicollinearity problem. Controlled experiments are performed to systematically explore coefficient dependency issues in GWR. These experiments specify global models that use eigenvectors from a spatial link matrix as exogenous variables.

  8. Learning Inverse Rig Mappings by Nonlinear Regression.

    Holden, Daniel; Saito, Jun; Komura, Taku


    We present a framework to design inverse rig-functions: functions that map low-level representations of a character's pose, such as joint positions or surface geometry, to the representation used by animators, called the animation rig. Animators design scenes using an animation rig, a framework widely adopted in animation production which allows animators to design character poses and geometry via intuitive parameters and interfaces. Yet most state-of-the-art computer animation techniques control characters through raw, low-level representations such as joint angles, joint positions, or vertex coordinates. This difference often prevents the adoption of state-of-the-art techniques in animation production. Our framework solves this issue by learning a mapping between the low-level representations of the pose and the animation rig. We use nonlinear regression techniques, learning from example animation sequences designed by the animators. When new motions are provided in the skeleton space, the learned mapping is used to estimate the rig controls that reproduce such a motion. We introduce two nonlinear functions for producing such a mapping: Gaussian process regression and feedforward neural networks. The appropriate solution depends on the nature of the rig and the amount of data available for training. We show our framework applied to various examples including articulated biped characters, quadruped characters, facial animation rigs, and deformable characters. With our system, animators have the freedom to apply any motion synthesis algorithm to arbitrary rigging and animation pipelines for immediate editing. This greatly improves the productivity of 3D animation, while retaining the flexibility and creativity of artistic input.

  9. Collaborative regression-based anatomical landmark detection

    Gao, Yaozong; Shen, Dinggang


    Anatomical landmark detection plays an important role in medical image analysis, e.g. for registration, segmentation and quantitative analysis. Among the various existing methods for landmark detection, regression-based methods have recently attracted much attention due to their robustness and efficiency. In these methods, landmarks are localised through voting from all image voxels, which is completely different from the classification-based methods that use voxel-wise classification to detect landmarks. Despite their robustness, the accuracy of regression-based landmark detection methods is often limited due to (1) the inclusion of uninformative image voxels in the voting procedure, and (2) the lack of effective ways to incorporate inter-landmark spatial dependency into the detection step. In this paper, we propose a collaborative landmark detection framework to address these limitations. The concept of collaboration is reflected in two aspects. (1) Multi-resolution collaboration. A multi-resolution strategy is proposed to hierarchically localise landmarks by gradually excluding uninformative votes from faraway voxels. Moreover, for informative voxels near the landmark, a spherical sampling strategy is also designed at the training stage to improve their prediction accuracy. (2) Inter-landmark collaboration. A confidence-based landmark detection strategy is proposed to improve the detection accuracy of ‘difficult-to-detect’ landmarks by using spatial guidance from ‘easy-to-detect’ landmarks. To evaluate our method, we conducted experiments extensively on three datasets for detecting prostate landmarks and head & neck landmarks in computed tomography images, and also dental landmarks in cone beam computed tomography images. The results show the effectiveness of our collaborative landmark detection framework in improving landmark detection accuracy, compared to other state-of-the-art methods.

  10. Logistic Regression Applied to Seismic Discrimination

    BG Amindan; DN Hagedorn


    The usefulness of logistic discrimination was examined in an effort to learn how it performs in a regional seismic setting. Logistic discrimination provides an easily understood method, works with user-defined models and few assumptions about the population distributions, and handles both continuous and discrete data. Seismic event measurements from a data set compiled by Los Alamos National Laboratory (LANL) of Chinese events recorded at station WMQ were used in this demonstration study. PNNL applied logistic regression techniques to the data. All possible combinations of the Lg and Pg measurements were tried, and a best-fit logistic model was created. The best combination of Lg and Pg frequencies for predicting the source of a seismic event (earthquake or explosion) used Lg(3.0-6.0 Hz) and Pg(3.0-6.0 Hz) as the predictor variables. A cross-validation test showed that this model was able to correctly predict 99.7% of earthquakes and 98.0% of explosions in the given data set. Two other models were identified that used Pg and Lg measurements from the 1.5-3.0 Hz frequency range. Although these other models did a good job of correctly predicting the earthquakes, they were not as effective at predicting the explosions. Two possible biases were discovered that affect the predicted probabilities for each outcome. The first bias was due to this being a case-controlled study: the sampling fractions caused a bias in the probabilities calculated using the models. The second bias is caused by a change in the proportions for each event: if at a later date the proportions (a priori probabilities) of explosions versus earthquakes change, this would bias the predicted probability for an event. When using logistic regression, the user needs to be aware of the possible biases and what effect they will have on the predicted probabilities.

  11. A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

    Smith, Paul F; Ganesh, Siva; Liu, Ping


    Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR) in predicting the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R^2 values for the MLRs were higher than the proportion-of-variance-explained values for the RFRs: 6/9 of them were ≥ 0.70, compared to 4/9 for the RFRs. Even the variables with the lowest R^2 values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion-of-variance-explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. For this data set, MLR appeared to be superior to RFR in terms of predictive value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data, although RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
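
    The comparison is easy to reproduce in outline with scikit-learn; the synthetic, mostly linear data below stand in for the neurochemical measurements and are only meant to illustrate the workflow.

```python
# MLR vs. RFR on held-out data: with a linear generating process, the linear
# model typically wins, echoing the finding reported above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(400, 8))
y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=400)   # mostly linear truth

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, model in [("MLR", LinearRegression()),
                    ("RFR", RandomForestRegressor(n_estimators=200,
                                                  random_state=0))]:
    print(name, "test R^2:", model.fit(X_tr, y_tr).score(X_te, y_te))
```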

  12. Support Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting

    Hong-Juan Li


    Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by the strong non-linear learning capability of support vector regression (SVR), this paper presents an SVR model hybridized with the empirical mode decomposition (EMD) method and auto regression (AR) for electric load forecasting. The electric load data of the New South Wales (Australia) market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

  13. Ridge regression estimator: combining unbiased and ordinary ridge regression methods of estimation

    Sharad Damodar Gore


    The statistical literature has several methods for coping with multicollinearity. This paper introduces a new shrinkage estimator, called modified unbiased ridge (MUR). This estimator is obtained from unbiased ridge regression (URR) in the same way that ordinary ridge regression (ORR) is obtained from ordinary least squares (OLS). Properties of MUR are derived. Results on its matrix mean squared error (MMSE) are obtained. MUR is compared with ORR and URR in terms of MMSE. These results are illustrated with an example based on data generated by Hoerl and Kennard (1975).
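
    For reference, here is the ORR baseline that URR and MUR modify, in plain numpy; k is an illustrative ridge constant, and the near-collinear data are synthetic.

```python
# Ordinary ridge regression: add k to the diagonal of X'X to stabilize the
# OLS solution under multicollinearity.
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(9)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=50)     # near-collinear columns
y = X @ np.array([1.0, -1.0, 1.0]) + rng.normal(size=50)

print("OLS  :", ols(X, y))           # unstable under multicollinearity
print("ridge:", ridge(X, y, k=1.0))  # shrunken, more stable estimates
```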

  14. Spatial vulnerability assessments by regression kriging

    Pásztor, László; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor


    information representing IEW or GRP forming environmental factors was taken into account to support the spatial inference of the locally experienced IEW frequency and the measured GRP values, respectively. An efficient spatial prediction methodology, regression kriging (RK), was applied to construct reliable maps, using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. First, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. Application of RK also provides the possibility of inherent accuracy assessment. The resulting maps are characterized by global and local measures of accuracy. Additionally, the method enables interval estimation for the spatial extent of areas in predefined risk categories. All of these outputs provide a useful contribution to spatial planning, action planning and decision making. Acknowledgement: Our work was partly supported by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).
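
    A bare-bones 1-D sketch of the two RK steps described above: fit a linear trend on auxiliary covariates, then apply simple kriging to the residuals under an assumed exponential covariance with hand-fixed parameters (a real application would estimate a variogram from the data).

```python
# Regression kriging = regression trend + kriged residuals.
import numpy as np

def exp_cov(h, sill=1.0, range_par=2.0):
    return sill * np.exp(-np.abs(h) / range_par)

rng = np.random.default_rng(10)
s = np.sort(rng.uniform(0, 10, 80))               # observation locations
aux = np.column_stack([np.ones(80), np.sin(s)])   # auxiliary predictors
z = aux @ np.array([3.0, 2.0]) + rng.normal(0, 0.5, 80)

beta, *_ = np.linalg.lstsq(aux, z, rcond=None)    # 1) regression (trend)
resid = z - aux @ beta

s0 = 5.0                                          # prediction location
K = exp_cov(s[:, None] - s[None, :]) + 1e-6 * np.eye(80)
k0 = exp_cov(s - s0)
resid_hat = k0 @ np.linalg.solve(K, resid)        # 2) simple kriging of residuals

trend_hat = np.array([1.0, np.sin(s0)]) @ beta
print("RK prediction at s0:", trend_hat + resid_hat)   # sum of both components
```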

  15. Deep Wavelet Scattering for Quantum Energy Regression

    Hirn, Matthew

    Physical functionals are usually computed as solutions of variational problems or from solutions of partial differential equations, which may require huge computations for complex systems. Quantum chemistry calculation of ground state molecular energies is such an example. Indeed, if $x$ is a quantum molecular state, then the ground state energy $E_0(x)$ is the minimum eigenvalue solution of the time-independent Schrödinger equation, which is computationally intensive for large systems. Machine learning algorithms do not simulate the physical system but estimate solutions by interpolating values provided by a training set of known examples $\{(x_i, E_0(x_i))\}_i$, taking advantage of physical invariants. Linear regression of $E_0$ over a dictionary $\Phi = \{\phi_k\}_k$ computes an approximation $\tilde{E}_0$ as $\tilde{E}_0(x) = \sum_k w_k \phi_k(x)$, where the weights $\{w_k\}_k$ are selected to minimize the error between $E_0$ and $\tilde{E}_0$ on the training set. The key to such a regression approach then lies in the design of the dictionary $\Phi$. It must be intricate enough to capture the essential variability of $E_0(x)$ over the molecular states $x$ of interest, while simple enough so that evaluation of $\Phi(x)$ is significantly less intensive than a direct quantum mechanical computation (or approximation) of $E_0(x)$. In this talk we present a novel dictionary $\Phi$ for the regression of quantum mechanical energies based on the scattering transform of an intermediate, approximate electron density representation $\rho_x$ of the state $x$. The scattering transform has the architecture of a deep convolutional network, composed of an alternating sequence of linear filters and nonlinear maps. Whereas in many deep learning tasks the linear filters are learned from the training data, here the physical properties of $E_0$ (invariance to isometric transformations of the state $x$, stability to deformations of $x$) are leveraged to design a collection of linear filters $\rho_x * \psi_\lambda$ for an appropriate wavelet $\psi$. These linear filters are composed with the nonlinear modulus
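
    Stripped to its core, the regression step above is an ordinary least-squares fit of weights over dictionary evaluations. In the sketch below a random cosine feature map stands in for the scattering dictionary, and the "energy" target is a toy function; only the linear-regression-over-a-dictionary structure is faithful to the talk.

```python
# Fit weights w minimizing ||E0 - Phi @ w||^2 over a fixed dictionary Phi(x).
import numpy as np

rng = np.random.default_rng(11)
n_train, n_feat = 200, 40
states = rng.normal(size=(n_train, 8))        # toy stand-in "states" x_i

W = rng.normal(size=(8, n_feat))
Phi = np.cos(states @ W)                      # placeholder dictionary Phi(x)
E0 = np.sin(states).sum(axis=1)               # toy target "energy" E0(x)

w, *_ = np.linalg.lstsq(Phi, E0, rcond=None)  # least-squares weights
E0_hat = Phi @ w
print("train RMSE:", np.sqrt(np.mean((E0 - E0_hat) ** 2)))
```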

  16. Least-Squares Linear Regression and Schrodinger's Cat: Perspectives on the Analysis of Regression Residuals.

    Hecht, Jeffrey B.

    The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…

  17. Estimating Loess Plateau Average Annual Precipitation with Multiple Linear Regression Kriging and Geographically Weighted Regression Kriging

    Qiutong Jin


    Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression kriging (MLRK) and geographically weighted regression kriging (GWRK) methods were employed, using precipitation data from the period 1980-2010 from 435 meteorological stations. The predictors in regression kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps, at a 500 m spatial resolution, interpolated by MLRK and GWRK had high accuracy and captured detailed spatial distribution; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.

  18. Using AMMI, factorial regression and partial least squares regression models for interpreting genotype x environment interaction.

    Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.


    Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEI). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th

  19. Testing the gonadal regression-cytoprotection hypothesis.

    Crawford, B A; Spaliviero, J A; Simpson, J M; Handelsman, D J


    Germinal damage is an almost universal accompaniment of cancer treatment, as the result of bystander damage to the testis from cytotoxic drugs and/or irradiation. Cancer treatment for the most common cancers of the reproductive age group in men has improved such that most are now treated with curative intent, and many others are treated with a likelihood of prolonged survival, so that the preservation of fertility is an important component of post-treatment quality of life. This has led to the consideration of developing adjuvant treatments that may reduce the gonadal toxicity of cancer therapy. One dominant hypothesis has been based on the supposition that the immature testis is resistant to cytotoxin damage. Hence, if hormonal treatment were able to cause spermatogenic regression to an immature state via an effective withdrawal of gonadotrophin secretion, the testis might be maintained temporarily in a protected state during cytotoxin exposure. However, clinical studies have been disappointing, and have also been unable to test the hypothesis definitively thus far, due to the inability to completely suppress gonadotrophin secretion. Similarly, experimental models have given conflicting results and, at best, a modest cytoprotection. To definitively test this hypothesis experimentally, we used the fact that the hpg mouse has complete gonadotrophin deficiency but can undergo the induction of full spermatogenesis by testosterone. Thus, if complete gonadotrophin deficiency were an advantage during cytotoxin exposure, then the hpg mouse should exhibit some degree of germinal protection against cytotoxin-induced damage. We therefore administered three different cytotoxins (200 mg/kg procarbazine, 9 mg/kg doxorubicin, 8 Gy of X irradiation), to produce a range of severity in testicular damage and mechanism of action, to either phenotypically normal or hpg mice. Testis weight and homogenization-resistant spermatid numbers were measured to evaluate the

  20. Automation of Flight Software Regression Testing

    Tashakkor, Scott B.


    NASA is developing the Space Launch System (SLS) to be a heavy-lift launch vehicle supporting human and scientific exploration beyond Earth orbit. SLS will have a common core stage, an upper stage, and different permutations of boosters and fairings to perform various crewed or cargo missions. Marshall Space Flight Center (MSFC) is writing the Flight Software (FSW) that will operate the SLS launch vehicle. The FSW is developed in an incremental manner based on "Agile" software techniques. As the FSW is incrementally developed, the functionality of the code needs to be tested continually to ensure that the integrity of the software is maintained. Manually testing the functionality on an ever-growing set of requirements and features is not an efficient solution, so testing needs to be done automatically to ensure it is comprehensive. To support test automation, a framework for a regression test harness has been developed and used on SLS FSW. The test harness provides a modular design approach that can compile or read in the required information specified by the developer of the test. The modularity provides independence between groups of tests and the ability to add and remove tests without disturbing others. This provides the SLS FSW team a time-saving feature that is essential to meeting SLS Program technical and programmatic requirements. During development of SLS FSW, this technique has proved to be a useful tool to ensure all requirements have been tested and that desired functionality is maintained as changes occur. It also provides a mechanism for developers to check the functionality of the code that they have developed. With this system, automation of regression testing is accomplished through a scheduling tool and/or commit hooks. Key advantages of this test harness capability include execution support for multiple independent test cases, the ability for developers to specify precisely what they are testing and how, the ability to add
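
    The modular-harness idea lends itself to a small illustration. The sketch below is a hypothetical Python analogue built on the standard unittest module, not the SLS tool itself: each test group is a self-contained class, so groups can be added or removed independently, and the runner entry point is what a scheduler or commit hook would invoke. All names and checks are invented.

```python
# Hypothetical regression-test harness sketch (not the SLS FSW tool).
import unittest

class GuidanceRegressionTests(unittest.TestCase):
    """One independent group of regression tests; other groups live in
    their own modules and can be added or removed without touching this one."""

    def test_attitude_command_clamped(self):
        # Pin down previously verified behaviour: commands stay in actuator limits.
        command = max(-1.0, min(1.0, 1.7))   # stand-in for a real FSW output
        self.assertLessEqual(abs(command), 1.0)

    def test_telemetry_checksum_stable(self):
        # A regression check that a known payload still produces a known checksum.
        payload = b"\x01\x02\x03"
        self.assertEqual(sum(payload) % 256, 6)

if __name__ == "__main__":
    # A scheduler or commit hook simply runs this script; unittest discovery
    # compiles the information each test class specifies.
    unittest.main(verbosity=2)
```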

  1. Laplacian embedded regression for scalable manifold regularization.

    Chen, Lin; Tsang, Ivor W; Xu, Dong


    Semi-supervised learning (SSL), as a powerful tool to learn from a limited number of labeled data and a large number of unlabeled data, has been attracting increasing attention in the machine learning community. In particular, the manifold regularization framework has laid solid theoretical foundations for a large family of SSL algorithms, such as Laplacian support vector machine (LapSVM) and Laplacian regularized least squares (LapRLS). However, most of these algorithms are limited to small-scale problems due to the high computational cost of the matrix inversion operation involved in the optimization problem. In this paper, we propose a novel framework called Laplacian embedded regression by introducing an intermediate decision variable into the manifold regularization framework. By using the ε-insensitive loss, we obtain the Laplacian embedded support vector regression (LapESVR) algorithm, which inherits the sparse solution from SVR. Also, we derive Laplacian embedded RLS (LapERLS) corresponding to RLS under the proposed framework. Both LapESVR and LapERLS possess a simpler form of a transformed kernel, which is the summation of the original kernel and a graph kernel that captures the manifold structure. The benefits of the transformed kernel are two-fold: (1) we can deal with the original kernel matrix and the graph Laplacian matrix in the graph kernel separately and (2) if the graph Laplacian matrix is sparse, we only need to perform the inverse operation for a sparse matrix, which is much more efficient when compared with that for a dense one. Inspired by kernel principal component analysis, we further propose to project the introduced decision variable into a subspace spanned by a few eigenvectors of the graph Laplacian matrix in order to better reflect the data manifold, as well as accelerate the calculation of the graph kernel, allowing our methods to efficiently and effectively cope with large-scale SSL problems. Extensive experiments on both toy and real

  2. Mirror Prescription Regression: A Differential Interferometric Technique

    Brian M. Robinson


    Full Text Available We present a remote, differential method for measuring the prescription of aspheric mirrors using null interferometry in the center-of-curvature configuration. The method requires no equipment beyond that used in a basic interferometry setup (i.e., there are no shearing elements or absolute distance meters). We chose this configuration because of its widespread use. However, the method is generalizable to other configurations with an adjustment of the governing equation. The method involves taking a series of interferograms before and after small, known misalignments are applied to the mirror in the interferometry setup, and calculating the prescription (e.g., radius of curvature and conic constant) of the mirror, based on these differential measurements, using a nonlinear regression. We apply this method successfully to the testing of a Space Optics Research Lab off-axis parabola with a known focal length of 152.4 mm, a diameter of 76.2 mm, and an off-axis angle of 12°.
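
    The regression step at the end of the method can be sketched in isolation. The example below is an illustrative reconstruction, not the authors' differential procedure: it fits the radius of curvature R and conic constant k of the standard conic sag equation to simulated noisy sag data with scipy's curve_fit; the noise level and sampling are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def conic_sag(r, R, k):
    """Standard conic sag equation: surface height vs radial coordinate."""
    return r**2 / (R * (1.0 + np.sqrt(1.0 - (1.0 + k) * r**2 / R**2)))

rng = np.random.default_rng(0)
r = np.linspace(0.0, 38.1, 200)                # 76.2 mm diameter aperture
z_true = conic_sag(r, R=304.8, k=-1.0)         # parabola: R = 2f = 304.8 mm
z_meas = z_true + rng.normal(0, 1e-5, r.size)  # ~10 nm simulated noise

(R_fit, k_fit), _ = curve_fit(conic_sag, r, z_meas, p0=(300.0, -0.9))
print(f"R = {R_fit:.2f} mm, conic constant k = {k_fit:.4f}")  # expect ~304.8, ~-1
```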

  3. Boosted Regression Tree Models to Explain Watershed ...

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations, as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watersheds.
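
    As a rough illustration of the modeling approach (not the study's data or code), scikit-learn's gradient-boosted trees can stand in for a BRT model; the predictor names and the response below are fabricated.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
X = rng.uniform(0, 1, size=(n, 3))   # stand-ins: % urban, wetness index, latitude
y = 2.0 * X[:, 0]**2 + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
brt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                max_depth=3, subsample=0.7, random_state=0)
brt.fit(X_tr, y_tr)
print("held-out R^2:", round(brt.score(X_te, y_te), 3))

# Permutation importance plays the role of BRT's relative-influence measures
imp = permutation_importance(brt, X_te, y_te, random_state=0)
for name, score in zip(["urban", "wetness", "latitude"], imp.importances_mean):
    print(f"{name}: {score:.3f}")
```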

  4. Semiparametric Bayesian Regression with Applications in Astronomy

    Broadbent, Mary Elizabeth

    In this thesis we describe a class of Bayesian semiparametric models, known as Levy Adaptive Regression Kernels (LARK); a novel method for posterior computation for those models; and the applications of these models in astronomy, in particular to the analysis of the photon fluence time series of gamma-ray bursts. Gamma-ray bursts are bursts of photons which arrive in a varying number of overlapping pulses with a distinctive "fast-rise, exponential decay" shape in the time domain. LARK models allow us to do inference both on the number of pulses and on the parameters which describe the pulses, such as incident time or decay rate. In Chapter 2, we describe a novel method to aid posterior computation in infinitely-divisible models, of which LARK models are a special case, when the posterior is evaluated through Markov chain Monte Carlo. This is applied in Chapter 3, where a time series representing the photon fluence in a single energy channel is analyzed using LARK methods. Due to the effect of the discriminators on BATSE and other instruments, it is important to model the gamma-ray bursts in the incident space. Chapter 4 describes the first model of bursts in the incident photon space, instead of after they have been distorted by the discriminators; since to model photons as they enter the detector is to model both the energy and the arrival time of the incident photon, this model is also the first to jointly model the time and energy domains.

  5. Sparse Regression as a Sparse Eigenvalue Problem

    Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai


    We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic 10^3 speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n^4) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage, which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9], also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization.
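
    The forward half of GSLS is conceptually simple even without the partitioned-inverse speed-ups the paper contributes. A minimal sketch, assuming plain least squares refits at each step:

```python
import numpy as np

def greedy_sparse_ls(X, y, k):
    """Forward greedy selection (ORMP-style): at each step add the column
    that most reduces the residual sum of squares. No partitioned-inverse
    acceleration; every candidate is refit from scratch for clarity."""
    n, p = X.shape
    selected = []
    for _ in range(k):
        best_j, best_sse = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            sse = np.sum((y - X[:, cols] @ beta) ** 2)
            if sse < best_sse:
                best_j, best_sse = j, sse
        selected.append(best_j)
    beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return selected, beta

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 4] - 2 * X[:, 11] + 0.1 * rng.normal(size=100)
print(greedy_sparse_ls(X, y, k=2)[0])   # expect columns [4, 11]
```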

  6. Sliced Inverse Regression for Time Series Analysis

    Chen, Li-Sue


    In this thesis, general nonlinear models for time series data are considered. A basic form is $x_t = f(\beta_1^T X_{t-1}, \beta_2^T X_{t-1}, \ldots, \beta_k^T X_{t-1}, \varepsilon_t)$, where $x_t$ is the observed time series, $X_t$ is the vector of the first $d$ time lags, $(x_t, x_{t-1}, \ldots, x_{t-d+1})$, $f$ is an unknown function, the $\beta_i$'s are unknown vectors, and the $\varepsilon_t$'s are independently distributed. Special cases include AR and TAR models. We investigate the feasibility of applying SIR/PHD (Li 1990, 1991) (the sliced inverse regression and principal Hessian directions methods) in estimating the $\beta_i$'s. PCA (principal component analysis) is brought in to check one critical condition for SIR/PHD. Through simulation and a study on three well-known data sets (the Canadian lynx, U.S. unemployment rate, and sunspot numbers), we demonstrate how SIR/PHD can effectively retrieve the interesting low-dimensional structures for time series data.
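
    Setting aside the thesis's time-series specifics, the core SIR computation (Li 1991) is compact: standardize X, slice the data on the ordered response, and eigen-decompose the weighted covariance of the slice means. A minimal sketch on simulated data:

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Minimal sliced inverse regression: directions maximizing the
    between-slice variation of E[X | y] (Li 1991)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Whiten X with the inverse square root of its covariance
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals**-0.5) @ evecs.T
    Z = Xc @ W
    # Slice on sorted y and accumulate the covariance of slice means
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    vals, vecs = np.linalg.eigh(M)
    # Leading eigenvectors, mapped back to the original coordinates
    return W @ vecs[:, ::-1][:, :n_dirs]

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
beta_true = np.array([1.0, -1.0, 0, 0, 0]) / np.sqrt(2)
y = np.sin(X @ beta_true) + 0.1 * rng.normal(size=500)
b = sir_directions(X, y)[:, 0]
print(np.round(b / np.linalg.norm(b), 2))   # approx. ±beta_true
```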

  7. Free Software Development. 1. Fitting Statistical Regressions

    Lorentz JÄNTSCHI


    Full Text Available The present paper is focused on modeling of statistical data processing, with applications in the field of material science and engineering. A new method of data processing is presented and applied to a set of 10 Ni–Mn–Ga ferromagnetic ordered shape memory alloys that are known to exhibit phonon softening and soft mode condensation into a premartensitic phase prior to the martensitic transformation itself. The method allows one to identify the correlations between data sets and to exploit them later in the statistical study of the alloys. An algorithm for computing the data was implemented in the hypertext preprocessor language (PHP); a hypertext markup language interface for it was also realized and put onto an educational web server, and it is accessible via the HTTP protocol at the address ... The program, run for the set of alloys, allows one to identify groups of alloy properties and gives a qualitative measure of the correlations between properties. Surfaces of property dependencies are also fitted.

  8. Gravitational Wave Emulation Using Gaussian Process Regression

    Doctor, Zoheyr; Farr, Ben; Holz, Daniel


    Parameter estimation (PE) for gravitational wave signals from compact binary coalescences (CBCs) requires reliable template waveforms which span the parameter space. Waveforms from numerical relativity are accurate but computationally expensive, so approximate templates are typically used for PE. These 'approximants', while quick to compute, can introduce systematic errors and bias PE results. We describe a machine learning method for generating CBC waveforms and uncertainties using existing accurate waveforms as a training set. Coefficients of a reduced order waveform model are computed and each treated as arising from a Gaussian process. These coefficients and their uncertainties are then interpolated using Gaussian process regression (GPR). As a proof of concept, we construct a training set of approximant waveforms (rather than NR waveforms) in the two-dimensional space of chirp mass and mass ratio and interpolate new waveforms with GPR. We demonstrate that the mismatch between interpolated waveforms and approximants is below the 1% level for an appropriate choice of training set and GPR kernel hyperparameters.
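
    The interpolation step can be illustrated with scikit-learn's GaussianProcessRegressor. The training grid and the 'coefficient' below are synthetic stand-ins for one reduced-order-model coefficient over (chirp mass, mass ratio); in the method each coefficient would get its own GP.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(3)
theta = rng.uniform([5.0, 0.1], [50.0, 1.0], size=(40, 2))  # (chirp mass, mass ratio)
coeff = np.sin(theta[:, 0] / 10.0) * theta[:, 1]            # fabricated coefficient

# Anisotropic RBF kernel: separate length scales for the two parameters
kernel = ConstantKernel(1.0) * RBF(length_scale=[10.0, 0.3])
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                               n_restarts_optimizer=5, random_state=0)
gpr.fit(theta, coeff)

# Interpolate the coefficient, with uncertainty, at a new parameter point
mean, std = gpr.predict(np.array([[30.0, 0.5]]), return_std=True)
print(f"coefficient = {mean[0]:.3f} ± {std[0]:.3f}")
```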

  9. Deep Human Parsing with Active Template Regression.

    Liang, Xiaodan; Liu, Si; Shen, Xiaohui; Yang, Jianchao; Liu, Luoqi; Dong, Jian; Lin, Liang; Yan, Shuicheng


    In this work, the human parsing task, namely decomposing a human image into semantic fashion/body regions, is formulated as an active template regression (ATR) problem, where the normalized mask of each fashion/body item is expressed as the linear combination of the learned mask templates, and then morphed to a more precise mask with the active shape parameters, including position, scale and visibility of each semantic region. The mask template coefficients and the active shape parameters together can generate the human parsing results, and are thus called the structure outputs for human parsing. A deep Convolutional Neural Network (CNN) is utilized to build the end-to-end relation between the input human image and the structure outputs for human parsing. More specifically, the structure outputs are predicted by two separate networks. The first CNN uses max-pooling and is designed to predict the template coefficients for each label mask, while the second CNN omits max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters. For a new image, the structure outputs of the two networks are fused to generate the probability of each label for each pixel, and super-pixel smoothing is finally used to refine the human parsing result. Comprehensive evaluations on a large dataset demonstrate the significant superiority of the ATR framework over other state-of-the-art methods for human parsing. In particular, the F1-score reaches 64.38 percent with our ATR framework, significantly higher than the 44.76 percent of the state-of-the-art algorithm [28].

  10. Rank-preserving regression: a more robust rank regression model against outliers.

    Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M


    Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.

  11. Dimension Reduction and Discretization in Stochastic Problems by Regression Method

    Ditlevsen, Ove Dalager


    The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given. Keywords: Random fields discretization, Linear regression, Stochastic interpolation, Slepian models, Stochastic finite elements.

  12. Regression Benchmarking: An Approach to Quality Assurance in Performance


    The paper presents a short summary of our work in the area of regression benchmarking and its application to software development. Specifically, we explain the concept of regression benchmarking, the requirements for employing regression testing in a software project, and methods used for analyzing the vast amounts of data resulting from repeated benchmarking. We present the application of regression benchmarking on a real software project and conclude with a glimpse at the challenges for the future.

  13. [Spatial interpolation of soil organic matter using regression Kriging and geographically weighted regression Kriging].

    Yang, Shun-hua; Zhang, Hai-tao; Guo, Long; Ren, Yan


    Relative elevation and stream power index were selected as auxiliary variables based on correlation analysis for mapping soil organic matter. Geographically weighted regression Kriging (GWRK) and regression Kriging (RK) were used for spatial interpolation of soil organic matter and compared with ordinary Kriging (OK), which acts as a control. The results indicated that soil organic matter was significantly positively correlated with relative elevation, whilst it had a significantly negative correlation with stream power index. Semivariance analysis showed that both soil organic matter content and its residuals (including the ordinary least squares regression residual and the GWR residual) had strong spatial autocorrelation. Interpolation accuracies of the different methods were estimated based on a data set of 98 validation samples. Results showed that the mean error (ME), mean absolute error (MAE) and root mean square error (RMSE) of RK were respectively 39.2%, 17.7% and 20.6% lower than the corresponding values of OK, with a relative improvement (RI) of 20.63. GWRK showed a similar tendency, with its ME, MAE and RMSE respectively 60.6%, 23.7% and 27.6% lower than those of OK, with an RI of 59.79. Therefore, both RK and GWRK significantly improved the accuracy of OK interpolation of soil organic matter due to their incorporation of auxiliary variables. In addition, GWRK performed obviously better than RK in this study; its improved performance should be attributed to the consideration of sample spatial locations.

  14. Regional flood frequency analysis using spatial proximity and basin characteristics: Quantile regression vs. parameter regression technique

    Ahn, Kuk-Hyun; Palmer, Richard


    Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and the parameter regression technique (PRT). The QRT develops prediction equations for flood quantiles at average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years, whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in the Northeastern United States. Results show that the generalized extreme value (GEV) distribution properly represents flood frequencies at the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.

  15. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    Azen, Razia; Traxel, Nicole


    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…

  16. Relationship between Multiple Regression and Selected Multivariable Methods.

    Schumacker, Randall E.

    The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…

  17. Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

    Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.


    The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method is based on linear regression for the estimation of constants. Interval arithmetic

  18. Meta-Regression: A Framework for Robust Reactive Optimization

    McClary, Dan; Syrotiuk, Violet R.; Kulahci, Murat


    Maintaining optimal performance as the conditions of a system change is a challenging problem. To solve this problem, we present meta-regression, a general methodology for alleviating traditional difficulties in nonlinear regression modelling. Meta-regression allows for reactive optimization, in ... of a nonlinear system...

  19. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    Azen, Razia; Traxel, Nicole


    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…

  20. Sparse reduced-rank regression with covariance estimation

    Chen, Lisha


    Improving the predictive performance of multiple response regression, compared with separate linear regressions, is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition, we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  1. Enhancement of Visual Field Predictions with Pointwise Exponential Regression (PER) and Pointwise Linear Regression (PLR).

    Morales, Esteban; de Leon, John Mark S; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph


    The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, Glaucoma Hemifield Test (GHT), and weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values of the differences between the observed and the predicted thresholds at the last two follow-ups indicated better model predictions. The mean (SD) follow-up times for the smoothing and prediction phases were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were: unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions; PER also predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. The application of smoothing algorithms on VF data can improve forecasting in VF points to assist in treatment decisions.
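
    The two pointwise models are easy to contrast on a single VF location. A minimal sketch with synthetic thresholds (the decay rate and noise level are assumptions, not the study's values):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical series of visual-field thresholds (dB) at one location over time
t = np.arange(0, 11.0)                       # years of follow-up
rng = np.random.default_rng(4)
thresh = 30.0 * np.exp(-0.08 * t) + rng.normal(0, 0.8, t.size)

# Pointwise linear regression (PLR): threshold ~ a + b * t
b_lin, a_lin = np.polyfit(t, thresh, 1)      # slope first, then intercept

# Pointwise exponential regression (PER): threshold ~ A * exp(r * t)
(A, r), _ = curve_fit(lambda t, A, r: A * np.exp(r * t), t, thresh,
                      p0=(30.0, -0.05))

t_future = 12.0
print("PLR prediction:", a_lin + b_lin * t_future)
print("PER prediction:", A * np.exp(r * t_future))
```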

  2. Regression and Sparse Regression Methods for Viscosity Estimation of Acid Milk from Its SLS Features

    Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann;


    Statistical solutions find widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods for a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. From this investigation, we propose the optimal solution for regression estimation in the case of noisy and inconsistent optical measurements, which is the case in many practical measurement systems. The principal component regression (PCR), partial least squares (PLS) and least angle regression (LAR) methods are compared...

  3. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Meaney, Christopher; Moineddin, Rahim


    In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to those of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25), linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
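
    One cell of such a simulation is short to write with statsmodels. Here the fractional logit is fitted as a binomial GLM with a logit link on the (0,1) response, a standard quasi-likelihood usage; the beta-distribution parameters are illustrative choices, and the beta regression models themselves are omitted for brevity.

```python
import numpy as np
import statsmodels.api as sm

# Two-sample design with a (0,1)-bounded response drawn from beta distributions
rng = np.random.default_rng(5)
y0 = rng.beta(2.0, 6.0, size=200)     # group 0: mean 0.25
y1 = rng.beta(3.0, 5.0, size=200)     # group 1: mean 0.375
y = np.concatenate([y0, y1])
X = sm.add_constant(np.repeat([0.0, 1.0], 200))

# Linear regression (OLS) on the bounded response
ols = sm.OLS(y, X).fit()

# Fractional logit: binomial GLM with logit link, valid for y in (0,1)
frac = sm.GLM(y, X, family=sm.families.Binomial()).fit()

print("OLS group difference:", ols.params[1])
mu = frac.predict(np.array([[1.0, 0.0], [1.0, 1.0]]))
print("Fractional-logit group difference:", mu[1] - mu[0])
```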

  4. The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?

    Jiangshan Lai

    Full Text Available Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
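
    The two fitting strategies differ mainly in their assumed error structure, which a few lines make concrete. In this sketch the data are simulated with multiplicative error, the case that favors log-transformed LR; all coefficients are invented.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import linregress

# Simulated diameter (cm) and coarse root biomass (kg) with multiplicative error
rng = np.random.default_rng(6)
D = rng.uniform(5, 50, 150)
biomass = 0.02 * D**2.4 * np.exp(rng.normal(0, 0.3, D.size))

# Method 1: linear regression on log-transformed data, log(B) = log(a) + b*log(D)
res = linregress(np.log(D), np.log(biomass))
a_lr, b_lr = np.exp(res.intercept), res.slope

# Method 2: nonlinear regression B = a * D^b (implicitly assumes additive error)
(a_nlr, b_nlr), _ = curve_fit(lambda D, a, b: a * D**b, D, biomass, p0=(0.01, 2.0))

print(f"log-LR: a = {a_lr:.4f}, b = {b_lr:.2f}")
print(f"NLR   : a = {a_nlr:.4f}, b = {b_nlr:.2f}")
```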

  5. Asymptotic theory of nonparametric regression estimates with censored data

    施沛德; 王海燕; 张利华


    For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data-driven criterion, we also obtain...

  6. Regression-kriging for characterizing soils with remote sensing data

    Yufeng GE; J.Alex THOMASSON; Ruixiu SUI; James WOOTEN


    In precision agriculture, regression has been used widely to quantify the relationship between soil attributes and other environmental variables. However, spatial correlation existing in soil samples usually violates a basic assumption of regression: sample independence. In this study, a regression-kriging method was attempted in relating soil properties to the remote sensing image of a cotton field near Vance, Mississippi, USA. The regression-kriging model was developed and tested by using 273 soil samples collected from the field. The result showed that by properly incorporating the spatial correlation information of regression residuals, the regression-kriging model generally achieved higher prediction accuracy than the stepwise multiple linear regression model. Most strikingly, a 50% increase in prediction accuracy was shown in soil sodium concentration. Potential usages of regression-kriging in future precision agriculture applications include real-time soil sensor development and digital soil mapping.
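
    The regression-kriging recipe itself (a trend regression plus spatial interpolation of the residuals) is compact. In the sketch below, Gaussian process regression stands in for kriging of the residuals (the two are formally close relatives), and the covariate and field layout are fabricated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(7)
coords = rng.uniform(0, 1000, size=(273, 2))   # sample locations (m)
ndvi = rng.uniform(0.1, 0.8, size=(273, 1))    # hypothetical covariate
# Soil property = trend on the covariate + spatially correlated residual
spatial = 0.5 * np.sin(coords[:, 0] / 200.0)
soil = 2.0 + 3.0 * ndvi[:, 0] + spatial + 0.1 * rng.normal(size=273)

# Step 1: regression of the target on the auxiliary variable
trend = LinearRegression().fit(ndvi, soil)
resid = soil - trend.predict(ndvi)

# Step 2: spatial interpolation of the residuals (GP regression as a
# stand-in for kriging)
gp = GaussianProcessRegressor(kernel=RBF(200.0) + WhiteKernel(0.01),
                              normalize_y=True).fit(coords, resid)

# Step 3: prediction = regression trend + interpolated residual
new_xy, new_ndvi = np.array([[500.0, 500.0]]), np.array([[0.4]])
pred = trend.predict(new_ndvi) + gp.predict(new_xy)
print("regression-kriging prediction:", pred[0])
```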

  7. Local Linear Regression for Data with AR Errors

    Runze Li; Yan Li


    In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using the local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of the auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with a working-independence error structure. We illustrate the proposed methodology by an analysis of a real data set.
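
    For reference, the working-independence baseline that the proposed procedure improves upon is just a kernel-weighted least squares fit at each target point. A minimal local linear smoother, applied to data with AR(1) errors:

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel of
    bandwidth h (working-independence baseline; the paper's estimator
    additionally models the AR error structure)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta[0]   # intercept = fitted value at x0

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0, 2 * np.pi, 300))
# AR(1) errors, the correlation structure the paper is concerned with
e = np.zeros(300)
for t in range(1, 300):
    e[t] = 0.6 * e[t - 1] + rng.normal(0, 0.2)
y = np.sin(x) + e

grid = np.linspace(0.5, 5.5, 5)
print([round(local_linear(g, x, y, h=0.3), 2) for g in grid])  # ~ sin(grid)
```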

  8. Logistic Regression Model on Antenna Control Unit Autotracking Mode


Laird, Daniel T.

    This paper presents an antenna autotracking model using logistic regression modeling, illustrating the testing of a hypothesis against an alternative hypothesis, and presents an example of ... (Air Force Test Center, Edwards AFB, CA; report 412TW-PA-15240.)

  9. Sleep in children with autism with and without autistic regression.

    Giannotti, Flavia; Cortesi, Flavia; Cerquiglini, Antonella; Vagnoni, Cristina; Valente, Donatella


    The purpose of the present investigation was to characterize and compare traditional sleep architecture and non-rapid eye movement (NREM) sleep microstructure in a well-defined cohort of children with regressive and non-regressive autism, and in typically developing children (TD). We hypothesized that children with regressive autism would demonstrate a greater degree of sleep disruption either at a macrostructural or microstructural level and a more problematic sleep as reported by parents. Twenty-two children with non-regressive autism, 18 with regressive autism without comorbid pathologies and 12 with TD, aged 5-10 years, underwent standard overnight multi-channel polysomnographic evaluation. Parents completed a structured questionnaire (Children's Sleep Habits Questionnaire, CSHQ). The initial hypothesis, that regressed children have more disrupted sleep, was supported by our findings: they scored significantly higher than non-regressed peers on the CSHQ, particularly on the bedtime resistance, sleep onset delay, sleep duration and night wakings subdomains, and both groups scored higher than typically developing controls. Regressive subjects had significantly less efficient sleep, less total sleep time, prolonged sleep latency, prolonged REM latency and more time awake after sleep onset than non-regressive children and the TD group. Regressive children showed lower cyclic alternating pattern (CAP) rates and A1 index in light sleep than non-regressive and TD children. Our findings suggest that, even though no particular differences in sleep architecture were found between the two groups of children with autism, those who experienced regression showed more sleep disorders and a disruption of sleep both from a macrostructural and from a microstructural viewpoint. © 2010 European Sleep Research Society.

  10. Penalized Weighted Least Squares for Outlier Detection and Robust Regression

    Gao, Xiaoli; Fang, Yixin


    To conduct regression analysis for data contaminated with outliers, many approaches have been proposed for simultaneous outlier detection and robust regression, as is the approach proposed in this manuscript. This new approach is called "penalized weighted least squares" (PWLS). By assigning each observation an individual weight and incorporating a lasso-type penalty on the log-transformation of the weight vector, the PWLS is able to perform outlier detection and robust regression simultaneously...

  11. Combining logistic regression and neural networks to create predictive models.

    Spackman, K. A.


    Neural networks are being used widely in medicine and other areas to create predictive models from data. The statistical method that most closely parallels neural networks is logistic regression. This paper outlines some ways in which neural networks and logistic regression are similar, shows how a small modification of logistic regression can be used in the training of neural network models, and illustrates the use of this modification for variable selection and predictive model building with...
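
    The parallel can be made literal in a few lines: logistic regression is a single sigmoid 'neuron' trained with cross-entropy gradient descent. A minimal sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=400) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # forward pass: one sigmoid "neuron"
    grad_w = X.T @ (p - y) / len(y)      # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                     # gradient-descent update
    b -= lr * grad_b

print("recovered weights:", np.round(w, 2))   # close to true_w
```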

  12. Neighborhood social capital and crime victimization: comparison of spatial regression analysis and hierarchical regression analysis.

    Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro


    Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to residents of Arakawa Ward, Tokyo, aged 20-69 years. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.

  13. Neither fixed nor random: weighted least squares meta-regression.

    Stanley, T D; Doucouliagos, Hristos


    Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis, and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how and explain why an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias, is as good as FE-MRA in all cases, and is better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd.
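
    The unrestricted WLS-MRA estimator is simple to compute: weight each study by its inverse squared standard error and let the regression scale absorb excess heterogeneity, rather than fixing it at one as FE-MRA does. A sketch with fabricated meta-analytic data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical meta-analytic data: effect estimates, standard errors, moderator
rng = np.random.default_rng(10)
k = 40
moderator = rng.uniform(0, 1, k)
se = rng.uniform(0.05, 0.3, k)
effect = 0.2 + 0.5 * moderator + rng.normal(0, se)

# Unrestricted WLS-MRA: inverse-variance weights (1/SE^2), with the
# multiplicative scale estimated from the data rather than constrained to 1
X = sm.add_constant(moderator)
wls = sm.WLS(effect, X, weights=1.0 / se**2).fit()
print(wls.params)   # intercept and moderator coefficient
print(wls.scale)    # > 1 signals heterogeneity beyond sampling error
```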

  14. Tax Evasion, Information Reporting, and the Regressive Bias Hypothesis

    Boserup, Simon Halphen; Pinje, Jori Veng

    A robust prediction from the tax evasion literature is that optimal auditing induces a regressive bias in effective tax rates compared to statutory rates. If correct, this will have important distributional consequences. Nevertheless, the regressive bias hypothesis has never been tested empirically. Using a unique data set, we provide evidence in favor of the regressive bias prediction, but only when controlling for the tax agency's use of third-party information in predicting true incomes. In aggregate data, the regressive bias vanishes because of the systematic use of third-party information...

  15. Compound Identification Using Penalized Linear Regression on Metabolomics.

    Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho


    Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson's correlation along with the penalized linear regression are proposed in this study.
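
    The singularity problem and the penalized fix can be shown directly: with more library entries than m/z bins, ordinary least squares has no unique solution, while ridge and lasso fits remain well defined. All sizes and spectra below are fabricated.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# High-dimensional setting: far more library spectra (columns) than m/z bins (rows)
rng = np.random.default_rng(11)
n_mz, n_compounds = 200, 1000
library = rng.exponential(1.0, size=(n_mz, n_compounds))
truth = np.zeros(n_compounds)
truth[[17, 42]] = [0.7, 0.3]                  # query is a mix of two compounds
query = library @ truth + 0.01 * rng.normal(size=n_mz)

# Penalized regressions stay well posed despite the singular design
ridge = Ridge(alpha=1.0).fit(library, query)
lasso = Lasso(alpha=0.001, max_iter=50000).fit(library, query)
print("lasso-selected compounds:", np.nonzero(lasso.coef_ > 0.05)[0])
# The true mixture components (17 and 42) should dominate the selection
```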

  16. Regression on manifolds: Estimation of the exterior derivative

    Aswani, Anil; Tomlin, Claire; 10.1214/10-AOS823


    Collinearity and near-collinearity of predictors cause difficulties when doing regression. In these cases, variable selection becomes untenable because of mathematical issues concerning the existence and numerical stability of the regression coefficients, and interpretation of the coefficients is ambiguous because gradients are not defined. Using a differential geometric interpretation, in which the regression coefficients are interpreted as estimates of the exterior derivative of a function, we develop a new method to do regression in the presence of collinearities. Our regularization scheme can improve estimation error, and it can be easily modified to include lasso-type regularization. These estimators also have simple extensions to the "large $p$, small $n$" context.

  17. Regression calibration with more surrogates than mismeasured variables

    Kipnis, Victor


    In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach consisting of substituting an estimated conditional expectation of the true covariate given observed data in the logistic regression. The other is a novel two-stage approach when the logistic regression is fitted to multiple surrogates, and then a linear combination of estimated slopes is formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors asserted superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.

  18. A logistic regression estimating function for spatial Gibbs point processes

    Baddeley, Adrian; Coeurjolly, Jean-François; Rubak, Ege

    We propose a computationally efficient logistic regression estimating function for spatial Gibbs point processes. The sample points for the logistic regression consist of the observed point pattern together with a random pattern of dummy points. The estimating function is closely related...

  19. Spatial correlation in Bayesian logistic regression with misclassification

    Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose


    Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data...

  20. Changes in persistence, spurious regressions and the Fisher hypothesis

    Kruse, Robinson; Ventosa-Santaulària, Daniel; Noriega, Antonio E.

    Declining inflation persistence has been documented in numerous studies. When such series are analyzed in a regression framework in conjunction with other persistent time series, spurious regressions are likely to occur. We propose to use the coefficient of determination R2 as a test statistic to...

  1. Quantile Regression in the Study of Developmental Sciences

    Petscher, Yaacov; Logan, Jessica A. R.


    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
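
    The contrast with ordinary regression is easy to demonstrate with statsmodels: fitting several quantiles of a heteroscedastic response yields slopes that fan out across the distribution, while OLS reports only the average relation. A minimal sketch:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
x = rng.uniform(0, 10, 500)
# Heteroscedastic data: the predictor's effect varies across the distribution
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x)

X = sm.add_constant(x)
for q in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(y, X).fit(q=q)
    print(f"quantile {q}: slope = {fit.params[1]:.2f}")
# OLS recovers only the average slope (about 0.5)
print("OLS slope:", round(sm.OLS(y, X).fit().params[1], 2))
```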

  2. Regression Commonality Analysis: A Technique for Quantitative Theory Building

    Nimon, Kim; Reio, Thomas G., Jr.


    When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…

  3. Stochastic Approximation Methods for Latent Regression Item Response Models

    von Davier, Matthias; Sinharay, Sandip


    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  4. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    Guoqi Qian


    Full Text Available Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining methods, and it is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
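
    The iterative partition-and-regression loop described above is short enough to sketch in full for the least squares case: alternately assign each point to the regression line that fits it best, then refit each line on its cluster (robust variants would swap in a different fitting step). The two-line data set is fabricated.

```python
import numpy as np

def regression_clustering(x, y, k=2, n_iter=50, seed=0):
    """Iterative partition and regression with least squares refits."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(x))
    X = np.column_stack([np.ones_like(x), x])
    for _ in range(n_iter):
        betas = []
        for j in range(k):
            mask = labels == j
            if mask.sum() < 2:                 # guard against empty clusters
                betas.append(np.zeros(2))
            else:
                betas.append(np.linalg.lstsq(X[mask], y[mask], rcond=None)[0])
        # Reassign each point to its best-fitting regression line
        resid = np.abs(y[:, None] - X @ np.column_stack(betas))
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas

rng = np.random.default_rng(12)
x = rng.uniform(0, 10, 300)
group = rng.integers(2, size=300)
y = np.where(group == 0, 1 + 2 * x, 8 - x) + 0.3 * rng.normal(size=300)
labels, betas = regression_clustering(x, y)
print([np.round(b, 1) for b in betas])  # ~ (1, 2) and (8, -1), in either order
```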

  5. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric


    Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining methods, and it is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939

  6. Interpreting Bivariate Regression Coefficients: Going beyond the Average

    Halcoussis, Dennis; Phillips, G. Michael


    Statistics, econometrics, investment analysis, and data analysis classes often review the calculation of several types of averages, including the arithmetic mean, geometric mean, harmonic mean, and various weighted averages. This note shows how each of these can be computed using a basic regression framework. By recognizing when a regression model…
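
    Several such identities take one line each: an intercept-only OLS fit returns the arithmetic mean, intercept-only WLS with suitable weights returns weighted and harmonic means, and the geometric mean comes from regressing log-transformed data. (These constructions are standard results and not necessarily the note's exact derivations.)

```python
import numpy as np
import statsmodels.api as sm

x = np.array([2.0, 4.0, 8.0])
w = np.array([1.0, 1.0, 2.0])
ones = np.ones_like(x)

# Arithmetic mean: intercept-only OLS
print(sm.OLS(x, ones).fit().params[0])                    # 4.667
# Weighted mean: intercept-only WLS with the desired weights
print(sm.WLS(x, ones, weights=w).fit().params[0])         # 5.5
# Harmonic mean: intercept-only WLS with weights 1/x
print(sm.WLS(x, ones, weights=1.0 / x).fit().params[0])   # 3.429
# Geometric mean: exponentiate the intercept-only OLS fit of log(x)
print(np.exp(sm.OLS(np.log(x), ones).fit().params[0]))    # 4.0
```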

  7. Augmenting Data with Published Results in Bayesian Linear Regression

    de Leeuw, Christiaan; Klugkist, Irene


    In most research, linear regression analyses are performed without taking into account published results (i.e., reported summary statistics) of similar previous studies. Although the prior density in Bayesian linear regression could accommodate such prior knowledge, formal models for doing so are absent from the literature. The goal of this…

  8. Who Will Win?: Predicting the Presidential Election Using Linear Regression

    Lamb, John H.


    This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…

  9. Augmenting Data with Published Results in Bayesian Linear Regression

    de Leeuw, Christiaan; Klugkist, Irene


    In most research, linear regression analyses are performed without taking into account published results (i.e., reported summary statistics) of similar previous studies. Although the prior density in Bayesian linear regression could accommodate such prior knowledge, formal models for doing so are absent from the literature. The goal of this…

  10. Testing hypotheses for differences between linear regression lines

    Stanley J. Zarnoch


    Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...

  11. Who Will Win?: Predicting the Presidential Election Using Linear Regression

    Lamb, John H.


    This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…

  12. Abnormal behavior of the least squares estimate of multiple regression

    陈希孺; 安鸿志


    An example is given to reveal the abnormal behavior of the least squares estimate of multiple regression. It is shown that the least squares estimate of the multiple linear regression may be "improved" in the sense of weak consistency when nuisance parameters are introduced into the model. A discussion on the implications of this finding is given.

  13. An Effect Size for Regression Predictors in Meta-Analysis

    Aloe, Ariel M.; Becker, Betsy Jane


    A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…

  14. Mixed Frequency Data Sampling Regression Models: The R Package midasr

    Eric Ghysels


    Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess the forecasting accuracy of the MIDAS regression model, and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of the application of MIDAS regression.

  15. Using Weighted Least Squares Regression for Obtaining Langmuir Sorption Constants

    One of the most commonly used models for describing phosphorus (P) sorption to soils is the Langmuir model. To obtain model parameters, the Langmuir model is fit to measured sorption data using least squares regression. Least squares regression is based on several assumptions including normally dist...
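
    As a hedged sketch of the weighted least squares fit discussed above (the data and the error-variance assumption are ours, not the paper's):

      import numpy as np
      from scipy.optimize import curve_fit

      def langmuir(c, s_max, k):
          # Langmuir isotherm: sorbed P as a function of solution concentration c
          return s_max * k * c / (1.0 + k * c)

      # Hypothetical sorption data: concentration (mg/L) and sorbed P (mg/kg)
      c = np.array([0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 40.0])
      s = np.array([35.0, 60.0, 110.0, 160.0, 205.0, 235.0, 250.0])

      # Assumed weighting: measurement error grows with the response, so passing
      # sigma = s weights each point by roughly 1/s^2
      popt, pcov = curve_fit(langmuir, c, s, p0=[250.0, 0.1], sigma=s)
      print("Smax = %.1f, k = %.3f" % tuple(popt))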

  16. About regression-kriging: from equations to case studies

    Hengl, T.; Heuvelink, G.B.M.; Rossiter, D.G.


    This paper discusses the characteristics of regression-kriging (RK), its strengths and limitations, and illustrates these with a simple example and three case studies. RK is a spatial interpolation technique that combines a regression of the dependent variable on auxiliary variables (such as land su

  17. A Methodology for Generating Placement Rules that Utilizes Logistic Regression

    Wurtz, Keith


    The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…

  18. General Nature of Multicollinearity in Multiple Regression Analysis.

    Liu, Richard


    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
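
    A common way to screen for the problem described above is the variance inflation factor; a minimal Python sketch on simulated data (the threshold and variable names are illustrative only):

      import numpy as np
      import pandas as pd
      from statsmodels.stats.outliers_influence import variance_inflation_factor
      from statsmodels.tools import add_constant

      rng = np.random.default_rng(1)
      x1 = rng.normal(size=100)
      df = pd.DataFrame({
          "x1": x1,
          "x2": x1 + 0.05 * rng.normal(size=100),  # nearly collinear with x1
          "x3": rng.normal(size=100),
      })
      X = add_constant(df)

      # A VIF above roughly 10 is a common rule-of-thumb flag for harmful
      # multicollinearity
      for i, name in enumerate(X.columns[1:], start=1):  # skip the constant
          print(name, round(variance_inflation_factor(X.values, i), 1))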

  19. Prediction accuracy and stability of regression with optimal scaling transformations

    Kooij, van der Anita J.


    The central topic of this thesis is the CATREG approach to nonlinear regression. This approach finds optimal quantifications for categorical variables and/or nonlinear transformations for numerical variables in regression analysis. (CATREG is implemented in SPSS Categories by the author of the thesi

  20. Comparison of Regularized Regression Methods for ~Omics Data

    Acharjee, A.; Finkers, H.J.; Visser, R.G.F.; Maliepaard, C.A.


    Background: In this study, we compare methods that can be used to relate a phenotypic trait of interest to an ~omics data set, where the number of variables outnumbers by far the number of samples. Methods: We apply univariate regression and different regularized multiple regression methods: ridge r

  1. Regression away from the mean: Theory and examples.

    Schwarz, Wolf; Reike, Dennis


    Using a standard repeated measures model with arbitrary true score distribution and normal error variables, we present some fundamental closed-form results which explicitly indicate the conditions under which regression effects towards (RTM) and away from the mean are expected. Specifically, we show that for skewed and bimodal distributions many or even most cases will show a regression effect that is in expectation away from the mean, or that is not just towards but actually beyond the mean. We illustrate our results in quantitative detail with typical examples from experimental and biometric applications, which exhibit a clear regression away from the mean ('egression from the mean') signature. We aim not to repeal cautionary advice against potential RTM effects, but to present a balanced view of regression effects, based on a clear identification of the conditions governing the form that regression effects take in repeated measures designs. © 2017 The British Psychological Society.
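
    A minimal simulation sketch of the paper's point, under assumptions of our choosing (exponential true scores, unit-variance normal errors): cases selected slightly above the grand mean at occasion 1 can have an expected occasion-2 score on the other side of the mean.

      import numpy as np

      rng = np.random.default_rng(42)
      n = 500_000
      t = rng.exponential(1.0, n)            # skewed true scores, E[T] = 1
      x1 = t + rng.normal(0.0, 1.0, n)       # occasion 1 = true score + error
      x2 = t + rng.normal(0.0, 1.0, n)       # occasion 2, same true score

      sel = (x1 > 1.1) & (x1 < 1.3)          # slightly above the grand mean of ~1.0
      print("grand mean:        ", x1.mean())
      print("mean x1 | selected:", x1[sel].mean())   # ~1.2, above the mean
      print("mean x2 | selected:", x2[sel].mean())   # ~0.88, beyond the mean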

  2. Spontaneous regression together with increased calcification of incidental meningioma

    Kengo Hirota


    Full Text Available Background: Regression of meningioma has been reported after hemorrhage or hormonal withdrawal. However, meningioma regression is rarely observed spontaneously. Case Description: A right falx meningioma was incidentally diagnosed and was followed every year by magnetic resonance imaging (MRI) for over 7 years. The tumor, with a maximum diameter of 4 cm, showed a slightly high density and was enhanced on computed tomography (CT), and a high intensity with a low-intensity core on T2 MRI, with significant edema. The meningioma gradually shrank together with a decrease of edema and an increase of calcification. The initial volume, 25.5 cm3, regressed linearly to less than half, 9.9 cm3. Conclusion: Here, we report a case of an incidentally diagnosed meningioma that regressed spontaneously. The pattern of the regression was similar to that following gamma knife radiosurgery.

  3. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami


    A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, the logistic regression model is used to calculate the odds; when the dependent variable has ordered levels, the model is an ordinal logistic regression. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine the value of a population based on a sample. The purpose of this research is the parameter estimation of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research produces a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.

  4. Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation

    Kekatos, Vassilis


    Volterra and polynomial regression models play a major role in nonlinear system identification and inference tasks. Exciting applications ranging from neuroscience to genome-wide association analysis build on these models with the additional requirement of parsimony. This requirement has high interpretative value, but unfortunately cannot be met by least-squares based or kernel regression methods. To this end, compressed sampling (CS) approaches, already successful in linear regression settings, can offer a viable alternative. The viability of CS for sparse Volterra and polynomial models is the core theme of this work. A common sparse regression task is initially posed for the two models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type algorithm is developed for sparse polynomial regressions. The identifiability of polynomial models is critically challenged by dimensionality. However, following the CS principle, when these models are sparse, they could be recovered by far fewer measurements. ...
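
    The paper builds on (weighted) Lasso-based schemes; as a rough unweighted sketch of sparse polynomial regression (simulated data, illustrative penalty):

      import numpy as np
      from sklearn.linear_model import Lasso
      from sklearn.preprocessing import PolynomialFeatures

      rng = np.random.default_rng(0)
      n, p = 100, 10
      X = rng.normal(size=(n, p))
      # Sparse second-order truth: only x0, x3 and the x1*x2 interaction matter
      y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 3.0 * X[:, 1] * X[:, 2] \
          + 0.1 * rng.normal(size=n)

      poly = PolynomialFeatures(degree=2, include_bias=False)
      Z = poly.fit_transform(X)          # all monomials up to degree 2
      model = Lasso(alpha=0.05).fit(Z, y)

      for name, coef in zip(poly.get_feature_names_out(), model.coef_):
          if abs(coef) > 1e-3:
              print(name, round(coef, 2))  # recovers the three active terms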

  5. Tax Evasion, Information Reporting, and the Regressive Bias Hypothesis

    Boserup, Simon Halphen; Pinje, Jori Veng

    A robust prediction from the tax evasion literature is that optimal auditing induces a regressive bias in effective tax rates compared to statutory rates. If correct, this will have important distributional consequences. Nevertheless, the regressive bias hypothesis has never been tested empirically. Using a unique data set, we provide evidence in favor of the regressive bias prediction but only when controlling for the tax agency's use of third-party information in predicting true incomes. In aggregate data, the regressive bias vanishes because of the systematic use of third-party information. These results are obtained both in simple reduced-form regressions and in a data-calibrated state-of-the-art model.

  6. Regression of atherosclerosis: which components regress and what influences their reversal.

    Kramsch, D M; Blankenhorn, D H


    Two major features of atherosclerosis may be distinguished: (a) atherosis caused by lipid infiltration in cells and extracellularly and (b) sclerosis caused by connective tissue deposition and by functional disturbance of the endothelium, leading to impairment of endothelium-derived relaxing factor (EDRF) release and reduced arterial compliance. Atherosis rarely causes clinical symptoms and, furthermore, is reversible by lowering of LDL-C. However, the clinically significant human lesion is the fibrous atheromatous plaque. Regression of established fibrous lesions by maximal lowering of LDL-C is slow in native coronary arteries and nonexistent in bypass grafts. However, there is good evidence from recent clinical trials that even in arteries with significant fibrous plaques, functional improvement of sclerosis (increased compliance) can be achieved by maximally reducing LDL-C in spite of remaining structural connective tissue abnormalities, presumably through normalization of EDRF. Atherosclerosis, therefore, appears to be best influenced by lipid lowering when it is in its atherosis phase. In the plaques of advanced atherosclerosis, when connective tissue sclerosis is present, agents working through mechanisms other than lipid lowering appear to be required for plaque reversal.

  7. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Koon, Sharon; Petscher, Yaacov


    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  8. Using Regression Equations Built from Summary Data in the Psychological Assessment of the Individual Case: Extension to Multiple Regression

    Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.


    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…

  9. Use of probabilistic weights to enhance linear regression myoelectric control

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.


    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks; similar trends were observed in amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

  10. Nonlinear Forecasting With Many Predictors Using Kernel Ridge Regression

    Exterkate, Peter; Groenen, Patrick J.F.; Heij, Christiaan

    This paper puts forward kernel ridge regression as an approach for forecasting with many predictors that are related nonlinearly to the target variable. In kernel ridge regression, the observed predictor variables are mapped nonlinearly into a high-dimensional space, where estimation of the predictive regression model is based on a shrinkage estimator to avoid overfitting. We extend the kernel ridge regression methodology to enable its use for economic time-series forecasting, by including lags of the dependent variable or other individual variables as predictors, as typically desired in macroeconomic and financial applications. Monte Carlo simulations as well as an empirical application to various key measures of real economic activity confirm that kernel ridge regression can produce more accurate forecasts than traditional linear and nonlinear methods for dealing with many predictors based...
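
    A hedged illustration of the approach — nonlinear mapping plus shrinkage, with lags of the dependent variable as predictors — using scikit-learn's KernelRidge on simulated data (all settings are ours, not the paper's):

      import numpy as np
      from sklearn.kernel_ridge import KernelRidge

      rng = np.random.default_rng(3)
      T = 300
      y = np.zeros(T)
      for t in range(1, T):                   # a nonlinear autoregressive process
          y[t] = np.sin(y[t - 1]) + 0.1 * rng.normal()

      X = np.column_stack([y[1:-1], y[:-2]])  # lags y_{t-1}, y_{t-2}
      target = y[2:]

      model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
      model.fit(X[:-50], target[:-50])
      pred = model.predict(X[-50:])
      print("out-of-sample RMSE:", np.sqrt(np.mean((pred - target[-50:]) ** 2)))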

  11. Use of probabilistic weights to enhance linear regression myoelectric control.

    Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J


    Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts' law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks; similar trends were observed in amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
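
    As a toy sketch of the weighting scheme described above (one DOF, hypothetical feature distributions and class means; the study's intramuscular EMG pipeline is far richer):

      import numpy as np

      rng = np.random.default_rng(37)
      n = 600
      cls = rng.integers(0, 3, n)                 # 0: no movement, 1: flex, 2: extend
      x = np.array([0.0, 2.0, -2.0])[cls] + rng.normal(0.0, 0.7, n)  # EMG feature
      v_target = np.array([0.0, 1.0, -1.0])[cls]  # desired velocity per class

      w = np.polyfit(x, v_target, 1)              # linear regression for velocity

      # Gaussian class models fitted to each class's feature distribution
      mu = np.array([x[cls == k].mean() for k in range(3)])
      sd = np.array([x[cls == k].std() for k in range(3)])

      def weighted_output(xi):
          like = np.exp(-0.5 * ((xi - mu) / sd) ** 2) / sd
          post = like / like.sum()                # P(class | feature), equal priors
          p_move = 1.0 - post[0]                  # probability movement was intended
          return p_move * np.polyval(w, xi)       # damp output when movement unlikely

      print(weighted_output(0.1), weighted_output(2.2))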

  12. Impact of multicollinearity on small sample hydrologic regression models

    Kroll, Charles N.; Song, Peter


    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
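
    A small cross-validated comparison in the spirit of the study (simulated collinear data; the sample size, correlation and model settings are illustrative):

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(7)
      n = 40                                  # small sample, as in regional regression
      x1 = rng.normal(size=n)
      x2 = x1 + 0.1 * rng.normal(size=n)      # highly correlated explanatory variables
      X = np.column_stack([x1, x2])
      y = x1 + x2 + rng.normal(size=n)

      models = {
          "OLS": LinearRegression(),
          "PCR": make_pipeline(PCA(n_components=1), LinearRegression()),
          "PLS": PLSRegression(n_components=1),
      }
      for name, model in models.items():
          score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
          print(name, round(score, 3))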

  13. Calculation of Solar Radiation by Using Regression Methods

    Kızıltan, Ö.; Şahin, M.


    In this study, solar radiation was estimated at 53 locations over Turkey with varying climatic conditions using the Linear, Ridge, Lasso, Smoother, Partial least squares, KNN and Gaussian process regression methods. Data from 2002 and 2003 were used to obtain the regression coefficients of the relevant methods. The coefficients were obtained based on the input parameters: month, altitude, latitude, longitude and land-surface temperature (LST). The values for LST were obtained from the data of the National Oceanic and Atmospheric Administration Advanced Very High Resolution Radiometer (NOAA-AVHRR) satellite. Solar radiation for 2004 was then calculated using the obtained coefficients in the regression methods, and the results were compared statistically. The most successful method was Gaussian process regression; the least successful was lasso regression. The mean bias error (MBE) of the Gaussian process regression method was 0.274 MJ/m2, its root mean square error (RMSE) was 2.260 MJ/m2, and its correlation coefficient was 0.941. These statistical results are consistent with the literature, and the Gaussian process regression method is recommended for other studies.

  14. FBH1 Catalyzes Regression of Stalled Replication Forks

    Fugger, Kasper; Mistrik, Martin; Neelsen, Kai J


    DNA replication fork perturbation is a major challenge to the maintenance of genome integrity. It has been suggested that processing of stalled forks might involve fork regression, in which the fork reverses and the two nascent DNA strands anneal. Here, we show that FBH1 catalyzes regression of a model replication fork in vitro and promotes fork regression in vivo in response to replication perturbation. Cells respond to fork stalling by activating checkpoint responses requiring signaling through stress-activated protein kinases. Importantly, we show that FBH1, through its helicase activity... These findings suggest a model whereby FBH1 promotes early checkpoint signaling by remodeling of stalled DNA replication forks.

  15. Developmental regression in autism: research and conceptual questions

    Carolina Lampreia


    Full Text Available The subject of developmental regression in autism has gained importance and a growing number of studies have been conducted in recent years. It is a major issue indicating that there is not a unique form of autism onset. However the phenomenon itself and the concept of regression have been the subject of some debate: there is no consensus on the existence of regression, as there is no consensus on its definition. The aim of this paper is to review the research literature in this area and to introduce some conceptual questions about its existence and its definition.

  16. Improved estimation in a non-Gaussian parametric regression

    Pchelintsev, Evgeny


    The paper considers the problem of estimating the parameters in a continuous time regression model with a non-Gaussian noise of pulse type. The noise is specified by the Ornstein-Uhlenbeck process driven by the mixture of a Brownian motion and a compound Poisson process. Improved estimates for the unknown regression parameters, based on a special modification of the James-Stein procedure with smaller quadratic risk than the usual least squares estimates, are proposed. The developed estimation scheme is applied for the improved parameter estimation in the discrete time regression with the autoregressive noise depending on unknown nuisance parameters.

  17. Nonlinear wavelet estimation of regression function with random design

    张双林; 郑忠国


    The nonlinear wavelet estimator of a regression function with random design is constructed. The optimal uniform convergence rate of the estimator in a ball of the Besov space B^s_{p,q} is proved under quite general assumptions. The adaptive nonlinear wavelet estimator with near-optimal convergence rate in a wide range of smoothness function classes is also constructed. The properties of the nonlinear wavelet estimator given for random design regression, requiring only a bounded third-order moment of the error, can be compared with those of the nonlinear wavelet estimator given in the literature for equally spaced fixed design regression with i.i.d. Gaussian error.


    ZhuZhongyi; WeiBocheng


    In this paper, the estimation method based on the "generalized profile likelihood" for the conditionally parametric models in the paper by Severini and Wong (1992) is extended to fixed design semiparametric nonlinear regression models. For these semiparametric nonlinear regression models, the resulting estimator of the parametric component of the model is shown to be asymptotically efficient, and the strong convergence rate of the nonparametric component is investigated. Many results (for example, Chen (1988), Gao & Zhao (1993), Rice (1986) et al.) are extended to fixed design semiparametric nonlinear regression models.

  19. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    Nielsen, Allan Aasbjerg


    This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy, including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying and satellite positioning application examples. In these application areas we are typically interested in the parameters of the model, typically 2- or 3-D positions, and not in predictive modelling, which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...

  20. On a method of perfect regression using sinusoidal expansion

    Sinha, Nilotpal Kanti


    We present a new method of weighted least square regression that gives a curve of fit with any desired degree of accuracy for a given set of data points. By applying this iterative process infinitely, we show that every finite set of coplanar points can be expanded as a sinusoidal series in infinitely many ways. Thus, given any set of finite data points, we can obtain infinitely many perfect regression curves which give a perfect match between the given data points and the values given by the regression.

  1. Support vector regression model for complex target RCS predicting

    Wang Gu; Chen Weishi; Miao Jungang


    The electromagnetic scattering computation has developed rapidly for many years; some computing problems for complex and coated targets cannot be solved by using the existing theory and computing models. A computing model based on data is established to make up for the insufficiency of theoretical models. Based on the support vector regression method, which is formulated on the principle of minimizing a structural risk, a data model to predict the unknown radar cross section of certain appointed targets is given. Comparison between the actual data and the results of this predicting model proved that the support vector regression method is workable and offers comparable precision.

  2. Nonparametric Least Squares Estimation of a Multivariate Convex Regression Function

    Seijo, Emilio


    This paper deals with the consistency of the least squares estimator of a convex regression function when the predictor is multidimensional. We characterize and discuss the computation of such an estimator via the solution of certain quadratic and linear programs. Mild sufficient conditions for the consistency of this estimator and its subdifferentials in fixed and stochastic design regression settings are provided. We also consider a regression function which is known to be convex and componentwise nonincreasing and discuss the characterization, computation and consistency of its least squares estimator.

  3. Using ridge regression in systematic pointing error corrections

    Guiar, C. N.


    A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
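
    The stabilizing effect of ridge's bias on collinear regressors can be seen in a few lines; a minimal sketch with two nearly identical regressors standing in for correlated pointing-model terms (data simulated, not the Voyager tracking data):

      import numpy as np
      from sklearn.linear_model import LinearRegression, Ridge

      rng = np.random.default_rng(9)
      az = rng.uniform(0.0, 2.0 * np.pi, 60)
      r1, r2 = np.sin(az), np.sin(az + 0.02)   # nearly collinear regressors
      X = np.column_stack([r1, r2])
      y = r1 + r2 + 0.05 * rng.normal(size=60)

      print("OLS:  ", LinearRegression().fit(X, y).coef_.round(2))  # unstable, inflated
      print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_.round(2))    # shrunken, stable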

  4. Poor smokers, poor quitters, and cigarette tax regressivity.

    Remler, Dahlia K


    The traditional view that excise taxes are regressive has been challenged. I document the history of the term regressive tax, show that traditional definitions have always found cigarette taxes to be regressive, and illustrate the implications of the greater price responsiveness observed among the poor. I explain the different definitions of tax burden: accounting, welfare-based willingness to pay, and welfare-based time inconsistent. Progressivity (equity across income groups) is sensitive to the way in which tax burden is assessed. Analysis of horizontal equity (fairness within a given income group) shows that cigarette taxes heavily burden poor smokers who do not quit, no matter how tax burden is assessed.

  5. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    Hong Wang

    Full Text Available Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
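
    A minimal sketch of the bagged L1-penalized logistic idea on simulated unbalanced data (the paper's clustering step is omitted; note that estimator= is spelled base_estimator= on scikit-learn older than 1.2):

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import BaggingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      # Unbalanced synthetic credit data: roughly 5% "bad" cases
      X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95],
                                 random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

      base = LogisticRegression(penalty="l1", solver="liblinear", C=0.5,
                                class_weight="balanced")
      ensemble = BaggingClassifier(estimator=base, n_estimators=25, random_state=0)
      ensemble.fit(X_tr, y_tr)
      print("AUC:", round(roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1]), 3))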

  6. Research and analyze of physical health using multiple regression analysis

    T. S. Kyi


    Full Text Available This paper presents research that attempts to create a mathematical model of "healthy people" using the method of regression analysis. The factors are the physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight-height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.), and the response variable is an indicator of physical working capacity. After performing multiple regression analysis, useful multiple regression models were obtained that can predict the physical performance of boys aged fourteen to seventeen years. This paper presents the development of the regression model for sixteen-year-old boys and analyzes the results.

  7. Robust Depth-Weighted Wavelet for Nonparametric Regression Models

    Lu LIN


    In nonparametric regression models, the original regression estimators, including the kernel estimator, Fourier series estimator and wavelet estimator, are always constructed as a weighted sum of the data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to perturbations in the data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and the depth-weighted wavelet estimation is defined. The new estimation is robust to perturbations in the data and attains a very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.

  8. Radiation regression patterns after cobalt plaque insertion for retinoblastoma

    Buys, R.J.; Abramson, D.H.; Ellsworth, R.M.; Haik, B.


    An analysis of 31 eyes of 30 patients who had been treated with cobalt plaques for retinoblastoma disclosed that a type I radiation regression pattern developed in 15 patients; type II, in one patient; and type III, in five patients. Nine patients had a regression pattern characterized by complete destruction of the tumor, the surrounding choroid, and all of the vessels in the area into which the plaque was inserted. This resulting white scar, corresponding to the sclerae only, was classified as a type IV radiation regression pattern. There was no evidence of tumor recurrence in patients with type IV regression patterns, with an average follow-up of 6.5 years after receiving cobalt plaque therapy. Twenty-nine of these 30 patients had been unsuccessfully treated with at least one other modality (i.e., light coagulation, cryotherapy, external beam radiation, or chemotherapy).

  9. Predicting Nigeria budget allocation using regression analysis: A ...

    Budget is used by the Government as a guiding tool for planning and management of its resources to aid in ...

  10. Detecting ecological breakpoints: a new tool for piecewise regression

    Alessandro Ferrarini


    Full Text Available Simple linear regression tries to determine a linear relationship between a given predictor variable X and a dependent variable Y. Since most environmental problems involve complex relationships, the X-Y relationship is often better modeled through a regression where, instead of fitting a single straight line to the data, the algorithm allows the fit to bend. Piecewise regressions do just that, since they emphasize local, instead of global, rules connecting predictor and dependent variables. In this work, a tool called RolReg is proposed as an implementation of Krummel's method to detect breakpoints in regression models. RolReg, which is freely available upon request from the author, could be useful to detect proper breakpoints in ecological laws.
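
    RolReg itself is available only from the author; as a generic illustration of breakpoint detection by grid search over a two-segment linear fit (our own construction, not Krummel's method):

      import numpy as np

      def piecewise_sse(x, y, bp):
          # Total SSE of two straight lines fitted left and right of breakpoint bp
          sse = 0.0
          for mask in (x <= bp, x > bp):
              if mask.sum() < 3:
                  return np.inf
              coef = np.polyfit(x[mask], y[mask], 1)
              sse += np.sum((y[mask] - np.polyval(coef, x[mask])) ** 2)
          return sse

      rng = np.random.default_rng(5)
      x = np.sort(rng.uniform(0, 10, 200))
      y = np.where(x < 4, x, 4 + 3 * (x - 4)) + 0.3 * rng.normal(size=200)

      candidates = np.linspace(1, 9, 161)
      best = min(candidates, key=lambda bp: piecewise_sse(x, y, bp))
      print("estimated breakpoint:", round(best, 2))   # close to the true value of 4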

  11. Nonlinear and Non Normal Regression Models in Physiological Research


    Applications of nonlinear and non-normal regression models are increasingly common for the appropriate interpretation of complex phenomena in the biomedical sciences. This paper critically reviews some applications of these models in physiological research.

  12. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng


    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.

  13. ARC Code TI: Block-GP: Scalable Gaussian Process Regression

    National Aeronautics and Space Administration — Block GP is a Gaussian Process regression framework for multimodal data, that can be an order of magnitude more scalable than existing state-of-the-art nonlinear...

  14. Block-GP: Scalable Gaussian Process Regression for Multimodal Data

    National Aeronautics and Space Administration — Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finances. In many cases,...

  15. Distributed Monitoring of the R2 Statistic for Linear Regression

    National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...

  16. Fuzzy rule-based support vector regression system

    Ling WANG; Zhichun MU; Hui GUO


    In this paper, we design a fuzzy rule-based support vector regression system. The proposed system utilizes the advantages of the fuzzy model and support vector regression to extract support vectors and generate fuzzy if-then rules from the training data set. Based on the first-order linear Takagi-Sugeno (TS) model, the structure of the rules is identified by support vector regression and then the consequent parameters of the rules are tuned by the global least squares method. Our model is applied to a real-world regression task. The simulation results give promising performance in terms of a set of fuzzy rules, which can be easily interpreted by humans.

  17. Novel algorithm for constructing support vector machine regression ensemble

    Li Bo; Li Xinjun; Zhao Zhiyan


    A novel algorithm for constructing a support vector machine regression ensemble is proposed. For regression prediction, a support vector machine regression (SVMR) ensemble is built by resampling from the given training data sets repeatedly and aggregating several independent SVMRs, each of which is trained on a replicated training set. After training, the independently trained SVMRs need to be aggregated in an appropriate combination manner. Generally, linear weighting is used, like the expert weighting score in boosting regression, but it has no optimization capacity. Three combination techniques are proposed: simple arithmetic mean, linear least squares error weighting, and nonlinear hierarchical combining that uses another upper-layer SVMR to combine several lower-layer SVMRs. Finally, simulation experiments demonstrate the accuracy and validity of the presented algorithm.
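
    A minimal sketch of the simplest of the three combination rules above — bootstrap-resampled SVRs combined by arithmetic mean — on simulated data (all parameters illustrative):

      import numpy as np
      from sklearn.svm import SVR

      rng = np.random.default_rng(31)
      X = rng.uniform(-3, 3, size=(200, 1))
      y = np.sinc(X).ravel() + 0.1 * rng.normal(size=200)

      # Train several SVRs on bootstrap replicates of the training set
      members = []
      for seed in range(10):
          idx = np.random.default_rng(seed).integers(0, len(X), len(X))
          members.append(SVR(kernel="rbf", C=10.0).fit(X[idx], y[idx]))

      # Aggregate by simple arithmetic mean of the member predictions
      X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
      print(np.mean([m.predict(X_test) for m in members], axis=0).round(3))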


    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR), which does not require ...

  19. Maximum likelihood polynomial regression for robust speech recognition

    LU Yong; WU Zhenyang


    The linear hypothesis is the main disadvantage of maximum likelihood linear regression (MLLR). This paper applies the polynomial regression method to model adaptation and establishes a nonlinear model adaptation algorithm using maximum likelihood polynomial regression.

  20. Correlation-regression model for physico-chemical quality of ...


    Key words: Groundwater, water quality, bore well, water supply, correlation, regression.

  1. Determinants of Inequality in Cameroon: A Regression-Based ...


    This paper applies the regression-based inequality decomposition approach to explore ...

  2. Estimating the exceedance probability of rain rate by logistic regression

    Chiu, Long S.; Kedem, Benjamin


    Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
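
    A minimal sketch of modeling exceedance probability with logistic regression (simulated covariate and threshold; the paper's partial-likelihood treatment of dependence is not reproduced):

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(11)
      n = 2000
      area_avg = rng.gamma(shape=2.0, scale=2.0, size=n)   # covariate (mm/h)
      # True exceedance probability rises with the area-averaged rain rate
      p_true = 1.0 / (1.0 + np.exp(-(area_avg - 4.0)))
      exceeds = rng.random(n) < p_true      # 1 if pixel rain rate exceeds threshold

      model = LogisticRegression().fit(area_avg.reshape(-1, 1), exceeds)
      print(model.predict_proba([[6.0]])[0, 1])   # P(exceedance | covariate = 6)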

  3. Testing for Stock Market Contagion: A Quantile Regression Approach

    S.Y. Park (Sung); W. Wang (Wendun); N. Huang (Naijing)


    Regarding the asymmetric and leptokurtic behavior of financial data, we propose a new contagion test in the quantile regression framework that is robust to model misspecification. Unlike conventional correlation-based tests, the proposed quantile contagion test

  4. Group Lasso for high dimensional sparse quantile regression models

    Kato, Kengo


    This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\ell_2$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\ell_1$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\ell_1$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of primitive regularity conditions, the group Lasso estimator c...
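
    The group-Lasso estimator itself is not in standard libraries; for the underlying quantile regression model, a minimal statsmodels sketch on heteroskedastic simulated data:

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(13)
      n = 500
      x = rng.uniform(0, 10, n)
      y = 1.0 + 0.5 * x + (0.2 + 0.3 * x) * rng.normal(size=n)  # noise grows with x
      df = pd.DataFrame({"x": x, "y": y})

      # Slopes differ across quantiles because the noise is heteroskedastic
      for q in (0.1, 0.5, 0.9):
          fit = smf.quantreg("y ~ x", df).fit(q=q)
          print(f"tau={q}: intercept={fit.params['Intercept']:.2f}, "
                f"slope={fit.params['x']:.2f}")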

  5. Spontaneous regression of a cyst of the cavum septi pellucidi

    Kocer, N.; Kantarci, F.; Mihmalli, I.; Islak, C.; Cokyueksel, O. [Istanbul University, Cerrahpasa Medical Faculty, Department of Radiology, Istanbul (Turkey)


    A 20-year-old woman with secondary amenorrhoea and an empty sella turcica was found to have a cyst of the cavum septi pellucidi (CSP) on MRI. The cyst had regressed spontaneously on follow-up MRI. (orig.)

  6. An Efficient Local Algorithm for Distributed Multivariate Regression

    National Aeronautics and Space Administration — This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm is designed for distributed...

  7. A Scalable Local Algorithm for Distributed Multivariate Regression

    National Aeronautics and Space Administration — This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm can be used for distributed...

  8. Research of the Control Domain of Edges in Regression Testing

    GAO Jian-Hua


    Regression testing is the process of validating modified software to provide confidence that the changed parts of the software behave as intended and that the unchanged parts have not been adversely affected by the modifications. The goal of regression testing is to reduce the test suite by testing the new characteristics and the modified parts of a program with the original test suite. Regression testing is a high-cost testing method. This paper presents a regression test selection technique that can reduce the test suite on the basis of the Control Flow Graph (CFG). It imports the inheritance strategy of object-oriented languages to determine an edge's control domain and thereby reduce the test suite size effectively. We implement the idea by coding the edges. An algorithm is also presented at last.

  9. An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy

    Merlo, Juan; Wagner, Philippe; Ghith, Nermin


    BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that disting...

  10. Kernel Partial Least Squares for Nonlinear Regression and Discrimination

    Rosipal, Roman; Clancy, Daniel (Technical Monitor)


    This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.

  11. Spectral Experts for Estimating Mixtures of Linear Regressions

    Chaganty, Arun Tejasvi; Liang, Percy


    Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of ...

  12. Spontaneous regression of lung metastasis from osteosarcoma: a case report.

    Bacci, Gaetano; Palmerini, Emanuela; Staals, Eric L; Ferrari, Stefano; Battaglia, Milva; Longhi, Alessandra; Bertoni, Franco; Briccoli, Antonio


    A case of spontaneous regression of a pulmonary metastasis from high-grade osteosarcoma is reported. The metastasis developed 5 years after chemotherapy and amputation for a distal femur osteosarcoma. The sarcomatous nature of the lesion was histologically confirmed. No treatment was attempted owing to the patient's refusal. The patient was followed up every 3 months and a spontaneous regression of the lesion was documented. Seven years after the diagnosis of lung metastases, no pulmonary nodules or other signs of relapse are present.

  13. Combining regression trees and radial basis function networks.

    Orr, M; Hallam, J; Takezawa, K; Murra, A; Ninomiya, S; Oide, M; Leonard, T


    We describe a method for non-parametric regression which combines regression trees with radial basis function networks. The method is similar to that of Kubat, who was first to suggest such a combination, but has some significant improvements. We demonstrate the features of the new method, compare its performance with other methods on DELVE data sets and apply it to a real world problem involving the classification of soybean plants from digital images.

  14. Adaptive Regression and Classification Models with Applications in Insurance

    Jekabsons Gints


    Full Text Available Nowadays, in the insurance industry the use of predictive modeling by means of regression and classification techniques is becoming increasingly important and popular. The success of an insurance company largely depends on the ability to perform such tasks as credibility estimation, determination of insurance premiums, estimation of probability of claim, detecting insurance fraud, managing insurance risk. This paper discusses regression and classification modeling for such types of prediction problems using the method of Adaptive Basis Function Construction

  15. Innovative regression in the foreign trade commodity structure of Ukraine

    O.V. Zubko


    Full Text Available The dynamics of the indicators of Ukraine's foreign trade in innovative products are analyzed. The existence of innovative regression in the structure of Ukraine's foreign trade is proved. The threats to economic growth arising from Ukraine's foreign trade disproportion are identified. The reasons for innovative regression in the economy of Ukraine are outlined. The priority of stimulating the development of Ukraine's foreign trade in innovative products is substantiated.

  16. Geometric Properties of AR(q) Nonlinear Regression Models

    LIU Ying-ar; WEI Bo-cheng


    This paper is devoted to a study of geometric properties of AR(q) nonlinear regression models. We present geometric frameworks for the regression parameter space and the autoregression parameter space, respectively, based on the weighted inner product determined by the Fisher information matrix. Several geometric properties related to statistical curvatures are given for the models. The results of this paper extend the work of Bates & Watts (1980, 1988) [1,2] and Seber & Wild (1989) [3].

  17. Classification and Regression Trees(CART) Theory and Applications

    Timofeev, Roman


    This master thesis is devoted to Classification and Regression Trees (CART). CART is a classification method which uses historical data to construct decision trees. Depending on the available information about the dataset, a classification tree or a regression tree can be constructed. The constructed tree can then be used for the classification of new observations. The first part of the thesis describes the fundamental principles of tree construction, different splitting algorithms and pruning procedures. Seco...

  18. 3D Regression Heat Map Analysis of Population Study Data.

    Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard


    Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease.
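
    A toy sketch of the exhaustive regression idea behind the visualization (simulated data; the real method adds a third feature axis and the visual encoding): score every two-feature combination against the target and rank them.

      import numpy as np
      from itertools import combinations
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(17)
      n, p = 300, 6
      X = rng.normal(size=(n, p))
      target = 0.8 * X[:, 0] + 0.6 * X[:, 2] * X[:, 3] + 0.5 * rng.normal(size=n)

      scores = {}
      for i, j in combinations(range(p), 2):
          model = LinearRegression().fit(X[:, [i, j]], target)
          scores[(i, j)] = round(model.score(X[:, [i, j]], target), 3)  # R^2

      # The top-scoring pairs would be the hot cells of the heat map
      print(sorted(scores.items(), key=lambda kv: -kv[1])[:5])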

  19. Identification of Influential Points in a Linear Regression Model

    Jan Grosz


    Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods for the detection of outliers and leverage points are described. These procedures can also be used for one-sample (independent) datasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.
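
    The article's three specific procedures aside, a standard baseline diagnostic (Cook's distance and leverage) takes a few lines in statsmodels; the planted outlier and the cut-off are illustrative:

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(19)
      x = rng.normal(size=50)
      y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=50)
      x[0], y[0] = 4.0, -3.0               # plant one influential point

      fit = sm.OLS(y, sm.add_constant(x)).fit()
      influence = fit.get_influence()
      cooks_d = influence.cooks_distance[0]
      leverage = influence.hat_matrix_diag

      # A common rough cut-off for Cook's distance is 4/n
      print("flagged:", np.where(cooks_d > 4 / len(x))[0])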

  20. Segmented Regression Based on B-Splines with Solved Examples

    Miloš Kaňka


    Full Text Available The subject of the paper is segmented linear, quadratic, and cubic regression based on B-spline basis functions. In this article we present the formulas for the computation of B-splines of order one, two, and three that are needed to construct linear, quadratic, and cubic regression. We list some interesting properties of these functions. For a clearer understanding we give the solutions of a couple of elementary exercises regarding these functions.
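
    Rather than re-deriving the basis formulas, a hedged sketch of segmented regression on a cubic B-spline basis using scikit-learn's SplineTransformer (requires scikit-learn 1.0 or later; data simulated):

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import SplineTransformer

      rng = np.random.default_rng(23)
      x = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
      y = np.sin(x).ravel() + 0.2 * rng.normal(size=200)

      # Cubic B-spline basis (degree=3) with interior knots, then ordinary least
      # squares on the basis columns: segmented regression with smooth joins
      model = make_pipeline(SplineTransformer(degree=3, n_knots=8), LinearRegression())
      model.fit(x, y)
      print("R^2:", round(model.score(x, y), 3))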

  1. Logistic regression for risk factor modelling in stuttering research.

    Reed, Phil; Wu, Yaqionq


    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.

  2. Geometric effects of fuel regression rate in hybrid rocket motors

    CAI GuoBiao; ZHANG YuanJun; WANG PengFei; HUI Tian; ZHAO Sheng; YU NanJia


    The geometric configuration of the solid fuel is a key parameter affecting the fuel regression rate in hybrid rocket motors. In this paper, a semi-empirical regression rate model is developed to investigate the geometric effect on the fuel regression rate by incorporating the hydraulic diameter into the classical model. The semi-empirical model indicates that the fuel regression rate decreases with increasing hydraulic diameter and is proportional to d_h^(-0.2) when convective heat transfer is dominant. Then a numerical model considering turbulence, combustion, solid fuel pyrolysis, and a solid-gas coupling model is established to further investigate the geometric effect. Eight motors with different solid fuel grains are simulated, and four methods of scaling the regression rate between different solid fuel grains are compared. The results indicate that the solid fuel regression rates are approximately the same when the hydraulic diameters are equal. The numerical results verify the accuracy of the semi-empirical model.

  3. Asymptotic theory of nonparametric regression estimates with censored data


    For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not yet been obtained. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data-driven criterion, we also obtain the asymptotic optimality of the AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.

  4. Learning a Nonnegative Sparse Graph for Linear Regression.

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung


    Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning an NNSG for linear regression, was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perspective for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.

  5. Direction of Effects in Multiple Linear Regression Models.

    Wiedermann, Wolfgang; von Eye, Alexander


    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
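
    A minimal sketch of the residual-third-moment idea on simulated data (the paper's formal inference procedures are not implemented here): with a skewed predictor and normal errors, the correctly oriented model leaves the more symmetric residuals.

      import numpy as np
      from scipy.stats import skew

      rng = np.random.default_rng(29)
      n = 5000
      x = rng.exponential(1.0, n)              # non-normal "cause"
      y = 1.5 * x + rng.normal(0.0, 1.0, n)    # true direction: x -> y

      def residual_skew(a, b):
          # Skewness of residuals from regressing a on b (with intercept)
          slope, intercept = np.polyfit(b, a, 1)
          return skew(a - (intercept + slope * b))

      print("|skew|, model y ~ x:", abs(round(residual_skew(y, x), 3)))  # near 0
      print("|skew|, model x ~ y:", abs(round(residual_skew(x, y), 3)))  # clearly larger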

  6. Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned

    Bettina Grün


    Beta regression – an increasingly popular approach for modeling rates and proportions – is extended in various directions: (a) bias correction/reduction of the maximum likelihood estimator, (b) beta regression tree models by means of recursive partitioning, (c) latent class beta regression by means of finite mixture models. All three extensions may be of importance for enhancing the beta regression toolbox in practice to provide more reliable inference and capture both observed and unobserved/latent heterogeneity in the data. Using the analogy of Smithson and Verkuilen (2006), these extensions make beta regression not only “a better lemon squeezer” (compared to classical least squares regression) but a full-fledged modern juicer offering lemon-based drinks: shaken and stirred (bias correction and reduction), mixed (finite mixture model), or partitioned (tree model). All three extensions are provided in the R package betareg (version 2.4-0 or later), building on generic algorithms and implementations for bias correction/reduction, model-based recursive partitioning, and finite mixture models, respectively. Specifically, the new functions betatree() and betamix() reuse the object-oriented flexible implementation from the R packages party and flexmix, respectively.

  7. Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

    Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo


    Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The resulting discriminative yet compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms state-of-the-art algorithms. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

  8. Scientific articles recommendation with topic regression and relational matrix factorization

    Ming YANG; Ying-ming LI; Zhongfei (Mark) ZHANG


    In this paper we study the problem of recommending scientific articles to users in an online community with a new perspective of considering topic regression modeling and article relational structure analysis simultaneously. First, we present a novel topic regression model, the topic regression matrix factorization (tr-MF), to solve the problem. The main idea of tr-MF lies in extending the matrix factorization with probabilistic topic modeling. In particular, tr-MF introduces a regression model to regularize user factors through probabilistic topic modeling under the basic hypothesis that users share similar preferences if they rate similar sets of items. Consequently, tr-MF provides interpretable latent factors for users and items, and makes accurate predictions for community users. To incorporate the relational structure into the framework of tr-MF, we introduce relational matrix factorization. Through combining tr-MF with the relational matrix factorization, we propose the topic regression collective matrix factorization (tr-CMF) model. In addition, we also present the collaborative topic regression model with relational matrix factorization (CTR-RMF), which combines the existing collaborative topic regression (CTR) model and relational matrix factorization (RMF). From this point of view, CTR-RMF can be considered an appropriate baseline for tr-CMF. Further, we demonstrate the efficacy of the proposed models on a large subset of the data from CiteULike, a bibliography sharing service. The proposed models outperform the state-of-the-art matrix factorization models by a significant margin. Specifically, the proposed models are effective in making predictions for users with only a few ratings or even no ratings, and support tasks that are specific to a certain field, neither of which has been addressed in the existing literature.

  9. Evaluation of Linear Regression Simultaneous Myoelectric Control Using Intramuscular EMG.

    Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J


    The objective of this study was to evaluate the ability of linear regression models to decode patterns of muscle coactivation from intramuscular electromyogram (EMG) and provide simultaneous myoelectric control of a virtual 3-DOF wrist/hand system. Performance was compared to the simultaneous control of conventional myoelectric prosthesis methods using intramuscular EMG (parallel dual-site control), an approach that requires users to independently modulate individual muscles in the residual limb, which can be challenging for amputees. Linear regression control was evaluated in eight able-bodied subjects during a virtual Fitts' law task and was compared to performance of eight subjects using parallel dual-site control. An offline analysis also evaluated how different types of training data affected prediction accuracy of linear regression control. The two control systems demonstrated similar overall performance; however, the linear regression method demonstrated improved performance for targets requiring use of all three DOFs, whereas parallel dual-site control demonstrated improved performance for targets that required use of only one DOF. Subjects using linear regression control could more easily activate multiple DOFs simultaneously, but often experienced unintended movements when trying to isolate individual DOFs. Offline analyses also suggested that the method used to train linear regression systems may influence controllability. Linear regression myoelectric control using intramuscular EMG provided an alternative to parallel dual-site control for 3-DOF simultaneous control at the wrist and hand. The two methods demonstrated different strengths in controllability, highlighting the tradeoff between providing simultaneous control and the ability to isolate individual DOFs when desired.
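
    A toy sketch of the decoding step: ordinary least squares mapping EMG features to simultaneous velocity commands for three DOFs. The feature extraction, channel count, and training signals below are hypothetical stand-ins, not the study's data.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    emg = np.abs(rng.normal(size=(500, 8)))       # e.g., mean-absolute-value features, 8 channels
    W_true = rng.normal(size=(8, 3))
    vel = emg @ W_true + 0.1 * rng.normal(size=(500, 3))  # intended 3-DOF velocities

    decoder = LinearRegression().fit(emg, vel)    # one linear map, all DOFs at once
    commands = decoder.predict(emg[:5])           # simultaneous wrist/hand velocity commands
    ```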

  10. Counting people with low-level features and Bayesian regression.

    Chan, Antoni B; Vasconcelos, Nuno


    An approach to the problem of estimating the size of inhomogeneous crowds, which are composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking is proposed. Instead, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic-texture motion model. A set of holistic low-level features is extracted from each segmented region, and a function that maps features into estimates of the number of people per segment is learned with Bayesian regression. Two Bayesian regression models are examined. The first is a combination of Gaussian process regression with a compound kernel, which accounts for both the global and local trends of the count mapping but is limited by the real-valued outputs that do not match the discrete counts. We address this limitation with a second model, which is based on a Bayesian treatment of Poisson regression that introduces a prior distribution on the linear weights of the model. Since exact inference is analytically intractable, a closed-form approximation is derived that is computationally efficient and kernelizable, enabling the representation of nonlinear functions. An approximate marginal likelihood is also derived for kernel hyperparameter learning. The two regression-based crowd counting methods are evaluated on a large pedestrian data set, containing very distinct camera views, pedestrian traffic, and outliers, such as bikes or skateboarders. Experimental results show that regression-based counts are accurate regardless of the crowd size, outperforming the count estimates produced by state-of-the-art pedestrian detectors. Results on 2 h of video demonstrate the efficiency and robustness of the regression-based crowd size estimation over long periods of time.
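
    The first of the two Bayesian models can be sketched with a compound kernel: a linear (dot-product) term for the global trend of the count mapping plus an RBF term for local structure. This is an illustrative scikit-learn stand-in, not the authors' implementation; the segment features and counts are synthetic.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import DotProduct, RBF, WhiteKernel

    rng = np.random.default_rng(3)
    X = rng.uniform(0, 1, size=(200, 5))            # holistic low-level segment features
    counts = np.round(20 * X[:, 0] + 3 * np.sin(6 * X[:, 1]) + rng.normal(0, 1, 200))

    kernel = DotProduct() + RBF() + WhiteKernel()   # global linear + local RBF + noise
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, counts)
    pred = gpr.predict(X[:5])                       # real-valued count estimates
    ```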

  11. Plotting partial correlation and regression in ecological studies

    J. Moya-Laraño


    Multiple regression, the general linear model (GLM), and the generalized linear model (GLZ) are widely used in ecology. The widespread use of graphs that include fitted regression lines to document patterns in simple linear regression can be easily extended to these multivariate techniques in plots that show the partial relationship of the dependent variable with each independent variable. However, the latter procedure is not nearly as widely used in ecological studies. In fact, a brief review of the recent ecological literature showed that in ca. 20% of the papers the results of multiple regression are displayed by plotting the dependent variable against the raw values of the independent variable. This procedure may be misleading because the value of the partial slope may change in magnitude and even in sign relative to the slope obtained in simple least-squares regression. Plots of partial relationships should be used in these situations. Using numerical simulations and real data we show how displaying plots of partial relationships may also be useful for: (1) visualizing the true scatter of points around the partial regression line, and (2) identifying influential observations and non-linear patterns more efficiently than using plots of residuals vs. fitted values. With the aim of helping in the assessment of data quality, we show how partial residual plots (residuals from the overall model + predicted values from the explanatory variable, vs. the explanatory variable) should only be used in restricted situations, and how partial regression plots (residuals of Y on the remaining explanatory variables vs. residuals of the target explanatory variable on the remaining explanatory variables) should be the ones displayed in publications because they accurately reflect the scatter of partial correlations. Similarly, these partial plots can be applied to visualize the effect of continuous variables in GLM and GLZ for normal distributions and identity link functions.
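
    A compact sketch of the partial regression (added-variable) plot the authors recommend, via the Frisch-Waugh-Lovell construction; the slope of the plotted residuals equals the partial slope from the full multiple regression. Variable names are hypothetical.

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    def added_variable_plot(y, X, j):
        # Residualize both y and X[:, j] on the remaining predictors (plus
        # intercept), then plot one set of residuals against the other.
        n = len(y)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        ry = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
        rx = X[:, j] - others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        slope = (rx @ ry) / (rx @ rx)   # equals the partial slope of the full model
        plt.scatter(rx, ry, s=10)
        plt.plot(np.sort(rx), slope * np.sort(rx))
        plt.xlabel(f"x{j} | other predictors")
        plt.ylabel("y | other predictors")
        return slope
    ```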

  12. Associative Regressive Decision Rule Mining for Predicting Customer Satisfactory Patterns

    P. Suresh


    Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments, and attitudes toward entities, products, services, and their attributes. With the rapid development of the Internet, potential customers provide a satisfactory volume of product/service reviews. A high volume of customer reviews was developed for products/services through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for service providers and to improve customer satisfaction based on review comments. Associative Regression based Decision Rule Mining performs two steps to improve the customer satisfaction level. Initially, a Machine Learning Bayes Sentiment Classifier (MLBSC) is used to classify the class labels for each service review. After that, the regressive factor of the opinion words and class labels is checked for association between the words using various probabilistic rules. Based on the probabilistic rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by customers, together with their review comments. The Associative Regressive Decision Rule helps the service provider make decisions on improving the customer satisfaction level. The experimental results reveal that the Associative Regression Decision Rule Mining (ARDRM) technique improves performance in terms of true positive rate, associative regression factor, regressive decision rule generation time, and review detection accuracy of similar patterns.

  13. Alternative regression models to assess increase in childhood BMI

    Mansmann Ulrich


    Background: Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods: Different regression approaches to predict childhood BMI were compared by goodness-of-fit measures and means of interpretation, including generalized linear models (GLMs), quantile regression, and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class, and weight gain in the first 2 years of life were considered as risk factors for obesity. Results: GAMLSS showed a much better fit regarding the estimation of risk factor effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI, and weight gain in the first 2 years were directly, and meal frequency was inversely, significantly associated with body composition in every model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely, significantly associated with body composition in GLM models, but they were in GAMLSS and partly in quantile regression models. Risk-factor-specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion: GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  14. A Multiple Regression Approach to Normalization of Spatiotemporal Gait Features.

    Wahid, Ferdous; Begg, Rezaul; Lythgo, Noel; Hass, Chris J; Halgamuge, Saman; Ackland, David C


    Normalization of gait data is performed to reduce the effects of intersubject variations due to physical characteristics. This study reports a multiple regression normalization approach for spatiotemporal gait data that takes into account intersubject variations in self-selected walking speed and physical properties including age, height, body mass, and sex. Spatiotemporal gait data including stride length, cadence, stance time, double support time, and stride time were obtained from healthy subjects including 782 children, 71 adults, 29 elderly subjects, and 28 elderly Parkinson's disease (PD) patients. Data were normalized using standard dimensionless equations, a detrending method, and a multiple regression approach. After normalization using dimensionless equations and the detrending method, weak to moderate correlations between walking speed, physical properties, and spatiotemporal gait features remained, whereas normalization using the multiple regression method reduced these correlations to weak values. Normalization using dimensionless equations and detrending resulted in significant differences in stride length and double support time of PD patients; the multiple regression approach, however, revealed significant differences in these features as well as in cadence, stance time, and stride time. The proposed multiple regression normalization may be useful in machine learning, gait classification, and clinical evaluation of pathological gait patterns.
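
    One plausible reading of the multiple-regression normalization, sketched below: regress each gait feature on walking speed and physical properties, then divide the observed feature by its predicted value so that the normalized feature is decorrelated from those covariates. The arrays are hypothetical, not the study's data.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    n = 300
    Z = np.column_stack([
        rng.uniform(0.8, 1.6, n),   # self-selected walking speed (m/s)
        rng.uniform(5, 80, n),      # age (years)
        rng.uniform(1.0, 1.9, n),   # height (m)
        rng.uniform(15, 100, n),    # body mass (kg)
        rng.integers(0, 2, n),      # sex (coded 0/1)
    ])
    stride_length = 0.8 * Z[:, 0] + 0.3 * Z[:, 2] + rng.normal(0, 0.05, n)

    model = LinearRegression().fit(Z, stride_length)
    normalized = stride_length / model.predict(Z)   # ~1, with covariate effects removed
    print(np.corrcoef(normalized, Z[:, 0])[0, 1])   # correlation with speed shrinks
    ```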

  15. Buffalos milk yield analysis using random regression models

    A.S. Schierholt


    Data comprising 1,719 milk yield records from 357 females (predominantly of the Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth order. The random regression models included the effects of herd-year and month of parity at the date of the control; regression coefficients for age of females, to describe the fixed part of the lactation curve; and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random regression model using third-order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.

  16. Association between regression and self injury among children with autism.

    Lance, Eboni I; York, Janet M; Lee, Li-Ching; Zimmerman, Andrew W


    Self injurious behaviors (SIBs) are challenging clinical problems in individuals with autism spectrum disorders (ASDs). This study is one of the first and largest to utilize inpatient data to examine the associations between autism, developmental regression, and SIBs. Medical records of 125 neurobehavioral hospitalized patients with diagnoses of ASDs and SIBs between 4 and 17 years of age were reviewed. Data were collected from medical records on the type and frequency of SIBs and a history of language, social, or behavioral regression during development. The children with a history of any type of developmental regression (social, behavioral, or language) were more likely to have a diagnosis of autistic disorder than other ASD diagnoses. There were no significant differences in the occurrence of self injurious or other problem behaviors (such as aggression or disruption) between children with and without regression. Regression may influence the diagnostic considerations in ASDs but does not seem to influence the clinical phenotype with regard to behavioral issues. Additional data analyses explored the frequencies and subtypes of SIBs and other medical diagnoses in ASDs, with intellectual disability and disruptive behavior disorder found most commonly.

  17. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    Ulbrich, N.; Bader, Jon B.


    Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can be derived directly from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  18. Optimization of Regression Models of Experimental Data Using Confirmation Points

    Ulbrich, N.


    A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at regression-model-independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
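
    The metric itself is easy to state in code: fit on one subset, compute the standard deviation of the PRESS residuals there and of the ordinary residuals at the confirmation points, and keep the larger of the two. A minimal sketch with hypothetical design matrices, not the report's implementation:

    ```python
    import numpy as np

    def search_metric(X_fit, y_fit, X_conf, y_conf):
        beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
        # PRESS residuals: e_i / (1 - h_ii), with h_ii the hat-matrix diagonal
        H = X_fit @ np.linalg.pinv(X_fit.T @ X_fit) @ X_fit.T
        press = (y_fit - X_fit @ beta) / (1.0 - np.diag(H))
        conf_resid = y_conf - X_conf @ beta
        # The larger of the two standard deviations is the search metric value
        return max(press.std(ddof=1), conf_resid.std(ddof=1))
    ```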

  19. Regression Model Optimization for the Analysis of Experimental Data

    Ulbrich, N.


    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.

  20. Study of Rapid-Regression Liquefying Hybrid Rocket Fuels

    Zilliac, Greg; DeZilwa, Shane; Karabeyoglu, M. Arif; Cantwell, Brian J.; Castellucci, Paul


    A report describes experiments directed toward the development of paraffin-based hybrid rocket fuels that burn at regression rates greater than those of conventional hybrid rocket fuels like hydroxyl-terminated polybutadiene. The basic approach followed in this development is to use materials such that a hydrodynamically unstable liquid layer forms on the melting surface of a burning fuel body. Entrainment of droplets from the liquid/gas interface can substantially increase the rate of fuel mass transfer, leading to surface regression faster than can be achieved using conventional fuels. The higher regression rate eliminates the need for the complex multi-port grain structures of conventional solid rocket fuels, making it possible to obtain acceptable performance from single-port structures. The high-regression-rate fuels contain no toxic or otherwise hazardous components and can be shipped commercially as non-hazardous commodities. Among the experiments performed on these fuels were scale-up tests using gaseous oxygen. The data from these tests were found to agree with data from small-scale, low-pressure and low-mass-flux laboratory tests and to confirm the expectation that these fuels would burn at high regression rates, chamber pressures, and mass fluxes representative of full-scale rocket motors.

  1. Hybrid rocket fuel combustion and regression rate study

    Strand, L. D.; Ray, R. L.; Anderson, F. A.; Cohen, N. S.


    The objectives of this study are to develop hybrid fuels (1) with higher regression rates and reduced dependence on fuel grain geometry and (2) that maximize potential specific impulse using low-cost materials. A hybrid slab window motor system was developed to screen candidate fuels - their combustion behavior and regression rate. Combustion behavior diagnostics consisted of video and high-speed motion picture coverage. The mean fuel regression rates were determined by before-and-after measurements of the fuel slabs. The fuel for this initial investigation consisted of hydroxyl-terminated polybutadiene binder with coal and aluminum fillers. At low oxidizer flux levels (and corresponding fuel regression rates) the filled-binder fuels burn in a layered fashion, forming an aluminum-containing binder/coal surface melt that, in turn, forms into filigrees or flakes that are stripped off by the crossflow. This melt process appears to diminish with increasing oxidizer flux level. Heat transfer by radiation is a significant contributor, producing the desired increase in magnitude and reduction in flow dependency (power-law exponent) of the fuel regression rate.

  2. Regression analysis for solving diagnosis problem of children's health

    Cherkashina, Yu A.; Gerget, O. M.


    The paper presents results of scientific research devoted to the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, and vascular endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given. A binary logistic regression procedure is discussed. The basic results of the research are presented: a classification table of predicted and observed values is shown, and the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, and the general regression equation is written from them. Based on the results of the logistic regression, an ROC analysis was performed; the sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow the diagnosis of children's health with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.
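
    A scikit-learn sketch of the described pipeline: binary logistic regression followed by ROC analysis. The feature matrix stands in for the hemostatic and blood-test measurements; everything here is synthetic and hypothetical.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 6))                 # e.g., hemostatic and blood-test parameters
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 200) > 0).astype(int)

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(X)[:, 1]
    fpr, tpr, thresholds = roc_curve(y, scores)   # sensitivity = tpr, specificity = 1 - fpr
    print("coefficients:", clf.coef_, "AUC:", roc_auc_score(y, scores))
    ```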

  3. Sample Size Requirements for Traditional and Regression-Based Norms.

    Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas


    Test norms enable determining the position of an individual test taker in the group. The most frequently used approach to obtain test norms is traditional norming. Regression-based norming may be more efficient than traditional norming and is rapidly growing in popularity, but little is known about its technical properties. A simulation study was conducted to compare the sample size requirements for traditional and regression-based norming by examining the 95% interpercentile ranges for percentile estimates as a function of sample size, norming method, size of covariate effects on the test score, test length, and number of answer categories in an item. Provided the assumptions of the linear regression model hold in the data, for a subdivision of the total group into eight equal-size subgroups, we found that regression-based norming requires samples 2.5 to 5.5 times smaller than traditional norming. Sample size requirements are presented for each norming method, test length, and number of answer categories. We emphasize that additional research is needed to establish sample size requirements when the assumptions of the linear regression model are violated.

  4. Simple and multiple linear regression: sample size considerations.

    Hanley, James A


    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Analyzing genotype-by-environment interaction using curvilinear regression

    Dulce Gamito Santinhos Pereira


    In the context of multi-environment trials, where a series of experiments is conducted across different environmental conditions, the analysis of the structure of genotype-by-environment interaction is an important topic. This paper presents a generalization of joint regression analysis for cases where the response (e.g., yield) is not linear across environments and can be written as a second- (or higher) order polynomial or another nonlinear function. After identifying the common-form regression function for all genotypes, we propose a selection procedure based on the adaptation of two tests: (i) a test for parallelism of regression curves; and (ii) a test of coincidence for those regressions. When the hypothesis of parallelism is rejected, subgroups of genotypes where the responses are parallel (or coincident) should be identified. The use of the Scheffé multiple comparison method for regression coefficients in second-order polynomials allows the genotypes to be grouped into two types: one with upward-facing concavity (i.e., potential yield growth), and the other with downward-facing concavity (i.e., the yield approaches saturation). Theoretical results for genotype comparison and genotype selection are illustrated with an example of yield from a non-orthogonal series of experiments with winter rye (Secale cereale L.). We deleted 10% of the data at random to show that our methodology is fully applicable to incomplete data sets, often observed in multi-environment trials.

  6. Quantile regression provides a fuller analysis of speed data.

    Hewson, Paul


    Considerable interest already exists in assessing percentiles of speed distributions; for example, monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued that these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test of whether the 85th percentile speed has changed following an intervention, in an analogous way to using the t-test to determine if the mean speed has changed, by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper demonstrates the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models, and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated.
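
    A statsmodels sketch of the test described: model the 85th percentile with a before/after indicator and read the intervention effect off the indicator's coefficient. The speed data here are simulated and the column names hypothetical.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    before = rng.normal(48, 8, 400)             # simulated pre-intervention speeds
    after = rng.normal(45, 7, 400)              # simulated post-intervention speeds
    df = pd.DataFrame({
        "speed": np.concatenate([before, after]),
        "after": np.repeat([0, 1], 400),
    })

    fit = smf.quantreg("speed ~ after", df).fit(q=0.85)
    print(fit.params["after"], fit.pvalues["after"])  # change in the 85th percentile
    ```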

  7. Regression of Pathological Cardiac Hypertrophy: Signaling Pathways and Therapeutic Targets

    Hou, Jianglong; Kang, Y. James


    Pathological cardiac hypertrophy is a key risk factor for heart failure. It is associated with increased interstitial fibrosis, cell death and cardiac dysfunction. The progression of pathological cardiac hypertrophy has long been considered irreversible. However, recent clinical observations and experimental studies have produced evidence showing the reversal of pathological cardiac hypertrophy. Left ventricle assist devices used in heart failure patients for bridging to transplantation not only improve peripheral circulation but also often cause reverse remodeling of the geometry and recovery of the function of the heart. Dietary supplementation with physiologically relevant levels of copper can reverse pathological cardiac hypertrophy in mice. Angiogenesis is essential and vascular endothelial growth factor (VEGF) is a constitutive factor for the regression. The action of VEGF is mediated by VEGF receptor-1, whose activation is linked to cyclic GMP-dependent protein kinase-1 (PKG-1) signaling pathways, and inhibition of cyclic GMP degradation leads to regression of pathological cardiac hypertrophy. Most of these pathways are regulated by hypoxia-inducible factor. Potential therapeutic targets for promoting the regression include: promotion of angiogenesis, selective enhancement of VEGF receptor-1 signaling pathways, stimulation of PKG-1 pathways, and sustainment of hypoxia-inducible factor transcriptional activity. More exciting insights into the regression of pathological cardiac hypertrophy are emerging. The time of translating the concept of regression of pathological cardiac hypertrophy to clinical practice is coming. PMID:22750195

  8. The use of GLS regression in regional hydrologic analyses

    Griffis, V. W.; Stedinger, J. R.


    To estimate flood quantiles and other statistics at ungauged sites, many organizations employ an iterative generalized least squares (GLS) regression procedure to estimate the parameters of a model of the statistic of interest as a function of basin characteristics. The GLS regression procedure accounts for differences in available record lengths and spatial correlation in concurrent events by using an estimator of the sampling covariance matrix of available flood quantiles. Previous studies by the US Geological Survey using the LP3 distribution have neglected the impact of uncertainty in the weighted skew on quantile precision. The needed relationship is developed here and its use is illustrated in a regional flood study with 162 sites from South Carolina. The performance of a pooled regression model is compared to separate models for each hydrologic region: statistical tests recommend an interesting hybrid of the two which is both surprising and hydrologically reasonable. The statistical analysis is augmented with new diagnostic metrics including a condition number to check for multicollinearity, a new pseudo-R² appropriate for use with GLS regression, and two error variance ratios. GLS regression for the standard deviation demonstrates that again a hybrid model is attractive, and that GLS rather than an OLS or WLS analysis is appropriate for the development of regional standard deviation models.

  9. Classification of microarray data with penalized logistic regression

    Eilers, Paul H. C.; Boer, Judith M.; van Ommen, Gert-Jan; van Houwelingen, Hans C.


    Classification of microarray data needs a firm statistical basis. In principle, logistic regression can provide it, modeling the probability of membership of a class with (transforms of) linear combinations of explanatory variables. However, classical logistic regression does not work for microarrays, because generally there will be far more variables than observations. One problem is multicollinearity: estimating equations become singular and have no unique and stable solution. A second problem is over-fitting: a model may fit a data set well but perform badly when used to classify new data. We propose penalized likelihood as a solution to both problems. The values of the regression coefficients are constrained in a similar way as in ridge regression. All variables play an equal role; there is no ad hoc selection of the most relevant or most expressed genes. The dimension of the resulting system of equations is equal to the number of variables, and generally will be too large for most computers, but it can be dramatically reduced with the singular value decomposition of some matrices. The penalty is optimized with AIC (Akaike's Information Criterion), which essentially is a measure of prediction performance. We find that penalized logistic regression performs well on a public data set (the MIT ALL/AML data).
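
    The ridge-type constraint the authors describe corresponds to L2-penalized logistic regression; below is a scikit-learn stand-in (not the authors' SVD-based implementation) on a synthetic problem with many more variables than observations.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    X = rng.normal(size=(72, 3000))          # 72 samples, 3000 "genes": p >> n
    y = (X[:, :10].sum(axis=1) + rng.normal(0, 1, 72) > 0).astype(int)

    # C is the inverse penalty strength; smaller C shrinks all coefficients,
    # keeping the otherwise singular estimating equations well-posed.
    clf = LogisticRegression(penalty="l2", C=0.01, max_iter=5000).fit(X, y)
    print(clf.score(X, y))
    ```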

  10. A Bayesian approach to linear regression in astronomy

    Sereno, Mauro


    Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical modeling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in the form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modeling. The R package LIRA (LInear Regression in Astronomy) is made available to perform the regression.

  11. Exploring compact reinforcement-learning representations with linear regression

    Walsh, Thomas J; Diuk, Carlos; Littman, Michael L


    This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.

  12. Technological progress and regress in pre-industrial times

    Aiyar, Shekhar; Dalgaard, Carl-Johan Lars; Moav, Omer


    This paper offers micro-foundations for the dynamic relationship between technology and population in the pre-industrial world, accounting for both technological progress and the hitherto neglected but common phenomenon of technological regress. A positive feedback between population and the adoption of new techniques that increase the division of labor explains technological progress. A transient shock to productivity or population induces the neglect of some techniques rendered temporarily unprofitable, which are therefore not transmitted to the next generation. Productivity remains constrained by the smaller stock of knowledge and technology has thereby regressed. A slow process of rediscovery is required for the economy to reach its previous level of technological sophistication and population size. The model is employed to analyze specific historical examples of technological regress.

  13. Approaches to Low Fuel Regression Rate in Hybrid Rocket Engines

    Dario Pastrone


    Hybrid rocket engines are promising propulsion systems which present appealing features such as safety, low cost, and environmental friendliness. On the other hand, certain issues hamper the development hoped for. The present paper discusses approaches addressing improvements to one of the most important among these issues: low fuel regression rate. To highlight the consequence of such an issue and to better understand the concepts proposed, fundamentals are summarized. Two approaches are presented (multiport grain and high mixture ratio) which aim at reducing negative effects without enhancing the regression rate. Furthermore, fuel material changes and nonconventional geometries of grain and/or injector are presented as methods to increase the fuel regression rate. Although most of these approaches are still at the laboratory or concept scale, many of them are promising.

  14. SCA2 Presenting With Cognitive Regression in Childhood

    Ramocki, Melissa B.; Chapieski, Lynn; McDonald, Ryan O.; Fernandez, Fabio; Malphrus, Amy D.


    Spinocerebellar ataxia type 2 (SCA2) typically presents in adulthood with progressive ataxia, dysarthria, tremor, and slow saccadic eye movements. Childhood onset SCA2 is rare and only the infantile onset form has been well-characterized clinically. This article describes a girl who met all developmental milestones until age 3 ½ years when she experienced cognitive regression that preceded motor regression by 6 months. A diagnosis of SCA2 was delayed until she presented to our emergency room at age 7 years. We document the results of her neuropsychological evaluation at both timepoints. This case broadens the spectrum of SCA2 presentation in childhood, highlights the importance of considering a spinocerebellar ataxia in a child who presents with cognitive regression only, and extends currently available clinical information to help clinicians discuss prognosis in childhood SCA2. PMID:18344458

  15. Spinocerebellar ataxia type 2 presenting with cognitive regression in childhood.

    Ramocki, Melissa B; Chapieski, Lynn; McDonald, Ryan O; Fernandez, Fabio; Malphrus, Amy D


    Spinocerebellar ataxia type 2 typically presents in adulthood with progressive ataxia, dysarthria, tremor, and slow saccadic eye movements. Childhood-onset spinocerebellar ataxia type 2 is rare, and only the infantile-onset form has been well characterized clinically. This article describes a girl who met all developmental milestones until age 3(1/2) years, when she experienced cognitive regression that preceded motor regression by 6 months. A diagnosis of spinocerebellar ataxia type 2 was delayed until she presented to the emergency department at age 7 years. This report documents the results of her neuropsychologic evaluation at both time points. This case broadens the spectrum of spinocerebellar ataxia type 2 presentation in childhood, highlights the importance of considering a spinocerebellar ataxia in a child who presents with cognitive regression only, and extends currently available clinical information to help clinicians discuss the prognosis in childhood spinocerebellar ataxia type 2.

  16. A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries

    Aida Mustapha


    In many telecommunication companies, the Entrepreneur Development Unit (EDU) is responsible for managing a large group of vendors that hold contracts with the company. This unit assesses the vendors' performance in terms of revenue and profitability on a yearly basis and uses the information in arranging suitable development trainings. The main challenge faced by this unit, however, is to obtain the annual revenue data from the vendors due to time constraints. This paper presents a regression approach to predict the vendors' annual revenues based on their previous records so the assessment exercise can be expedited. Three regression methods were investigated: linear regression, the sequential minimal optimization algorithm, and M5rules. The results were analysed and discussed.

  17. FBH1 Catalyzes Regression of Stalled Replication Forks

    Kasper Fugger


    DNA replication fork perturbation is a major challenge to the maintenance of genome integrity. It has been suggested that processing of stalled forks might involve fork regression, in which the fork reverses and the two nascent DNA strands anneal. Here, we show that FBH1 catalyzes regression of a model replication fork in vitro and promotes fork regression in vivo in response to replication perturbation. Cells respond to fork stalling by activating checkpoint responses requiring signaling through stress-activated protein kinases. Importantly, we show that FBH1, through its helicase activity, is required for early phosphorylation of ATM substrates such as CHK2 and CtIP as well as hyperphosphorylation of RPA. These phosphorylations occur prior to apparent DNA double-strand break formation. Furthermore, FBH1-dependent signaling promotes checkpoint control and preserves genome integrity. We propose a model whereby FBH1 promotes early checkpoint signaling by remodeling of stalled DNA replication forks.

  18. Nonparametric instrumental regression with non-convex constraints

    Grasmair, M.; Scherzer, O.; Vanhems, A.


    This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.

  19. Modelling multimodal photometric redshift regression with noisy observations

    Kügler, S D


    In this work, we try to extend the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are capable of treating the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach therefore makes it possible to treat the multimodal regression problem in an adequate fashion. In addition, input uncertainty on the magnitudes can be included quite naturally, and lastly, the proposed algorithm allows in principle to make predictions outside the training values, which makes it a fascinating opportunity for the detection of high-redshift quasars.

  20. Partial least squares regression in the social sciences

    Megan L. Sawatsky


    Partial least squares regression (PLSR) is a statistical modeling technique that extracts latent factors to explain both predictor and response variation. PLSR is particularly useful as a data exploration technique because it is highly flexible (e.g., there are few assumptions, and variables can be highly collinear). While gaining importance across a diverse number of fields, its application in the social sciences has been limited. Here, we provide a brief introduction to PLSR, directed towards a novice audience with limited exposure to the technique; demonstrate its utility as an alternative to more classic approaches (multiple linear regression, principal component regression); and apply the technique to a hypothetical dataset using JMP statistical software (with references to SAS software).
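
    The article's examples use JMP/SAS; an equivalent quick start in Python (an assumption of this note, not the article's own code) uses scikit-learn's PLSRegression on deliberately collinear predictors.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(8)
    latent = rng.normal(size=(100, 2))                    # two latent factors
    X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(100, 10))
    y = latent @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=100)

    pls = PLSRegression(n_components=2).fit(X, y)
    print(pls.score(X, y))   # R^2; the components explain predictor and response variation
    ```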

  1. Wavelet regression model in forecasting crude oil price

    Hamid, Mohd Helmie; Shabri, Ani


    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series at different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), autoregressive integrated moving average (ARIMA), and generalized autoregressive conditional heteroscedasticity (GARCH) models using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
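
    A rough sketch of the WMLR idea using PyWavelets and scikit-learn: decompose the series with a DWT, rebuild each band as a full-length sub-series, and regress the next-day price on today's components. The wavelet choice, decomposition level, and one-step-ahead setup are illustrative assumptions, not the study's exact configuration.

    ```python
    import numpy as np
    import pywt
    from sklearn.linear_model import LinearRegression

    def dwt_components(signal, wavelet="db4", level=3):
        # Reconstruct each approximation/detail band as a full-length sub-series
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        comps = []
        for i in range(len(coeffs)):
            isolated = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
            comps.append(pywt.waverec(isolated, wavelet)[: len(signal)])
        return np.column_stack(comps)

    price = np.cumsum(np.random.default_rng(9).normal(0, 1, 512)) + 60.0  # toy series
    X = dwt_components(price)[:-1]   # wavelet components observed today
    y = price[1:]                    # price tomorrow
    model = LinearRegression().fit(X, y)
    ```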

  2. Robust Bayesian Regularized Estimation Based on t Regression Model

    Zean Li


    The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and put different penalization parameters on different regression coefficients. We also consider Bayesian t regression with the adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and a real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.

  3. Zipf’s law for Chinese cities: Rolling sample regressions

    Peng, Guohua


    We study the validity of Zipf’s Law in a data set of Chinese city sizes for the years 1999-2004, when the numbers of cities remain almost constant after a rapid urbanization process during the period of the market-oriented economy and reform-open policy. Previous investigations are restricted to log-log rank-size regression for a fixed sample. In contrast, we use rolling sample regression methods in which the sample is changing with the truncation point. The intuition is that if the distribution is Pareto with a coefficient one (Zipf’s law holds), rolling sample regressions should yield a constant coefficient regardless of what the sample is. We find that the Pareto exponent is almost monotonically decreasing in the truncation point; the mean estimated coefficient is 0.84 for the full dataset, which is not so far from 1.
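
    The rolling-sample idea in a few lines: rank cities by size, then re-estimate the log-log rank-size slope as the truncation point grows. Under an exact Zipf law the estimated exponent would stay near 1 regardless of the cutoff. The city sizes below are simulated from a Pareto distribution, not the Chinese data.

    ```python
    import numpy as np

    rng = np.random.default_rng(10)
    sizes = np.sort(rng.pareto(1.0, 660) + 1.0)[::-1]   # Pareto tail index 1 (Zipf case)
    ranks = np.arange(1, len(sizes) + 1)

    for cutoff in range(60, len(sizes) + 1, 120):       # rolling truncation points
        slope, _ = np.polyfit(np.log(sizes[:cutoff]), np.log(ranks[:cutoff]), 1)
        print(cutoff, -slope)                           # Pareto exponent estimate
    ```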

  4. Joint regression analysis and AMMI model applied to oat improvement

    Oliveira, A.; Oliveira, T. A.; Mejza, S.


    In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely the AMMI model, joint regression analysis (JRA), and multiple comparison tests. A genotype stability analysis of oat (Avena sativa L.) grain yield was carried out using data from the Portuguese Plant Breeding Board: a sample of 22 different genotypes grown during the years 2002, 2003, and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model for studying and estimating phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae package available in R for the AMMI model analysis.

  5. The role of innate immunity in spontaneous regression of cancer

    J A Thomas


    Nature has provided us with infections - acute and chronic - and these infections have both harmful and beneficial effects on the human system. Worldwide, a number of chronic infections are associated with a risk of cancer, but it is also known that cancer regresses when associated with acute infections such as bacterial, viral, fungal, and protozoal infections. Acute infections have been known to cure chronic diseases since the time of Hippocrates. The benefits of these fever-producing acute infections have been applied in cancer vaccinology, as in Coley's toxins. Immune cells like natural killer cells, macrophages, and dendritic cells have taken greater precedence in cancer immunity than ever before. This review provides an insight into the benefits of fever and its role in the prevention of cancer, the significance of common infections in cancer regression, the dual nature of our immune system, and the role of the often overlooked primary innate immunity in tumor immunology and spontaneous regression of cancer.


    Geeta Nagpal


    Due to the intangible nature of "software", accurate and reliable software effort estimation is a challenge in the software industry. Very accurate estimates of software development effort are difficult to expect because of the inherent uncertainty in software development projects and the complex and dynamic interaction of factors that impact software development. Heterogeneity exists in software engineering datasets because data are made available from diverse sources. It can be reduced by defining certain relationships between the data values and classifying them into different clusters. This study focuses on how the combination of clustering and regression techniques can reduce the potential loss of predictive effectiveness caused by the heterogeneity of the data. Using a clustered approach creates subsets of data having a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than the other regression techniques used in the analysis.

  7. Multiple regression for physiological data analysis: the problem of multicollinearity.

    Slinker, B K; Glantz, S A


    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
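
    One common diagnostic for the redundancy described above, not specific to this article, is the variance inflation factor (VIF). A small sketch using statsmodels on invented data:

        import numpy as np
        from statsmodels.stats.outliers_influence import variance_inflation_factor

        rng = np.random.default_rng(1)
        x1 = rng.normal(size=100)
        x2 = x1 + 0.05 * rng.normal(size=100)            # changes almost in parallel with x1
        x3 = rng.normal(size=100)                        # independent predictor
        X = np.column_stack([np.ones(100), x1, x2, x3])  # column of ones = intercept

        # A VIF far above 10 for a predictor flags severe multicollinearity
        for i, name in enumerate(["x1", "x2", "x3"], start=1):
            print(name, variance_inflation_factor(X, i))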

  8. Regression of oral lichenoid lesions after replacement of dental restorations.

    Mårell, L; Tillberg, A; Widman, L; Bergdahl, J; Berglund, A


    The aim of the study was to determine the prognosis and to evaluate the regression of lichenoid contact reactions (LCR) and oral lichen planus (OLP) after replacement of the dental restorative materials suspected of causing the lesions. Forty-four referred patients with oral lesions participated in a follow-up study that was initiated an average of 6 years after the first examination at the Department of Odontology, i.e. the baseline examination. The patients underwent an odontological clinical examination and answered a questionnaire with questions regarding dental health, medical and psychological health, and treatments undertaken from baseline to follow-up. After the exchange of dental materials, regression of oral lesions was significantly more frequent among patients with LCR than among those with OLP. As no cases of OLP regressed after an exchange of materials, a proper diagnosis has to be made to avoid unnecessary replacement of intact restorations in patients with OLP.

  9. Analysis of some methods for reduced rank Gaussian process regression

    Quinonero-Candela, J.; Rasmussen, Carl Edward


    While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...

  10. Theory of net analyte signal vectors in inverse regression

    Bro, R.; Andersen, Charlotte Møller


    The net analyte signal and the net analyte signal vector are useful measures in building and optimizing multivariate calibration models. In this paper a theory for their use in inverse regression is developed. The theory of net analyte signal was originally derived from classical least squares in spectral calibration, where the responses of all pure analytes and interferents are assumed to be known. However, in chemometrics, inverse calibration models such as partial least squares regression are more abundant, and several tools for calculating the net analyte signal in inverse regression models have been proposed. These methods yield different results and most do not provide results that are in accordance with the chosen calibration model. In this paper a thorough development of a calibration-specific net analyte signal vector is given. This definition turns out to be almost identical to the one...

  11. Epidemiology of CKD Regression in Patients under Nephrology Care.

    Borrelli, Silvio; Leonardis, Daniela; Minutolo, Roberto; Chiodini, Paolo; De Nicola, Luca; Esposito, Ciro; Mallamaci, Francesca; Zoccali, Carmine; Conte, Giuseppe


    Chronic Kidney Disease (CKD) regression is considered an infrequent renal outcome, limited to early stages, and associated with higher mortality. However, the prevalence, prognosis and clinical correlates of CKD regression remain undefined in the setting of nephrology care. This is a multicenter prospective study in 1418 patients with established CKD (eGFR: 60-15 ml/min/1.73 m²) under nephrology care in 47 outpatient clinics in Italy for at least one year. We defined CKD regressors as those with ΔGFR ≥ 0 ml/min/1.73 m²/year, where ΔGFR was estimated as the absolute difference between eGFR measured at baseline and at the follow-up visit 18-24 months later. Outcomes were End Stage Renal Disease (ESRD) and all-cause mortality. Overall, 391 patients (27.6%) were identified as regressors, as they showed an eGFR increase between the baseline visit in the renal clinic and the follow-up visit. In multivariate regression analyses the regressor status was not associated with CKD stage. Low proteinuria was the main factor associated with CKD regression, accounting per se for 48% of the likelihood of this outcome. Lower systolic blood pressure, higher BMI and absence of autosomal polycystic kidney disease (PKD) were additional predictors of CKD regression. In regressors, ESRD risk was 72% lower (HR: 0.28; 95% CI 0.14-0.57) than in non-regressors. CKD regression occurs in about one-fourth of patients receiving renal care in nephrology units and correlates with low proteinuria, blood pressure and the absence of PKD. This condition portends a better renal prognosis, mostly in earlier CKD stages, with no excess risk for mortality.

  12. Sparse Regression by Projection and Sparse Discriminant Analysis

    Qi, Xin


    Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  13. Bayesian Lasso and multinomial logistic regression on GPU.

    Češnovar, Rok; Štrumbelj, Erik


    We describe an efficient Bayesian parallel GPU implementation of two classic statistical models: the Lasso and multinomial logistic regression. We focus on parallelizing the key components: matrix multiplication, matrix inversion, and sampling from the full conditionals. Our GPU implementations of Bayesian Lasso and multinomial logistic regression achieve 100-fold speedups on mid-level and high-end GPUs. Substantial speedups of 25-fold can also be achieved on older and lower-end GPUs. The samplers are implemented in OpenCL and can be used on any type of GPU and on other types of computational units, making them convenient and advantageous in practice compared to related work.

  14. Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model

    Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.


    This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses a binary logistic regression method with time series data of the normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of the training step is to identify the best parameters of the regression model using a gradient descent algorithm. The best fit of the model can then be used to classify sugarcane and non-sugarcane areas. The experiment shows high accuracy and successfully maps the sugarcane plantation area, obtaining a best Cohen's Kappa value of 0.7833 (strong agreement) with 89.167% accuracy.
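
    A minimal sketch of the training step as described (binary logistic regression fitted by batch gradient descent; the NDVI values and pixel labels below are invented for illustration):

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def fit_logistic(X, y, lr=0.5, n_iter=5000):
            # batch gradient descent on the mean logistic log-loss
            w = np.zeros(X.shape[1])
            for _ in range(n_iter):
                grad = X.T @ (sigmoid(X @ w) - y) / len(y)
                w -= lr * grad
            return w

        # toy NDVI time series: one row per pixel, first column is the intercept term
        X = np.array([[1, 0.2, 0.3], [1, 0.7, 0.8],
                      [1, 0.1, 0.2], [1, 0.8, 0.9]], dtype=float)
        y = np.array([0, 1, 0, 1], dtype=float)   # 1 = sugarcane, 0 = non-sugarcane
        w = fit_logistic(X, y)
        labels = sigmoid(X @ w) >= 0.5            # classify pixels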

  15. Credit Scoring Model Hybridizing Artificial Intelligence with Logistic Regression

    Han Lu


    Today the most commonly used techniques for credit scoring are artificial intelligence and statistics. In this paper, we propose a new way to combine these two kinds of models. By using logistic regression to filter out variables with a high degree of correlation, artificial intelligence models reduce complexity and accelerate convergence, while hybridizing these models with logistic regression gives them better statistical interpretability, thus improving the effect of the artificial intelligence models. With experiments on a German data set, we find an interesting phenomenon, defined as ‘dimensional interference’, with support vector machines, and cross-validation shows that the new method is of considerable help in credit scoring.

  16. Implementing fuzzy polynomial interpolation (FPI) and fuzzy linear regression (LFR)

    Maria Cristina Floreno


    This paper presents some preliminary results arising within a general framework concerning the development of software tools for fuzzy arithmetic. The program is in a preliminary stage. What has already been implemented consists of a set of routines for elementary operations, optimized function evaluation, interpolation and regression; some of these have been applied to real problems. This paper describes a prototype of a C++ library for polynomial interpolation of fuzzifying functions, a set of routines in FORTRAN for fuzzy linear regression, and a program with a graphical user interface allowing the use of such routines.

  17. Dramatic regression of presumed acquired retinal astrocytoma with photodynamic therapy

    Samuray Tuncer


    Photodynamic therapy (PDT) has been used for the treatment of various intraocular tumors, including choroidal hemangioma, vasoproliferative tumor, amelanotic choroidal melanoma and choroidal neovascular membrane due to choroidal osteoma. This case report documents the effect of PDT on a presumed acquired retinal astrocytoma. A 42-year-old female with a juxtapapillary acquired astrocytoma was treated with a single session of PDT using standard parameters. The tumor showed dramatic regression over 6 months into a fibrotic scar. It remained regressed and stable, with 20/20 vision, after 51 months of follow-up. We believe that PDT can be used as a primary treatment for acquired retinal astrocytoma.

  18. Tax Evasion, Information Reporting, and the Regressive Bias Prediction

    Boserup, Simon Halphen; Pinje, Jori Veng


    Models of rational tax evasion and optimal enforcement invariably predict a regressive bias in the effective tax system, which reduces redistribution in the economy. Using Danish administrative data, we show that a calibrated structural model of this type replicates moments and correlations of tax evasion and audit probabilities once we account for information reporting in the tax compliance game. When conditioning on information reporting, we find that both reduced-form evidence and simulations exhibit the predicted regressive bias. However, in the overall economy, this bias is negated by the tax agency's use of information reports and revenue-maximizing disposition of audit resources.

  19. Cardiovascular Response Identification Based on Nonlinear Support Vector Regression

    Wang, Lu; Su, Steven W.; Chan, Gregory S. H.; Celler, Branko G.; Cheng, Teddy M.; Savkin, Andrey V.

    This study experimentally investigates the relationships between central cardiovascular variables and oxygen uptake based on nonlinear analysis and modeling. Ten healthy subjects were studied using cycle-ergometry exercise tests with constant workloads ranging from 25 W to 125 W. Breath-by-breath gas exchange, heart rate, cardiac output, stroke volume and blood pressure were measured at each stage. The modeling results showed that the nonlinear modeling method (Support Vector Regression) outperforms the traditional regression method (reducing estimation error by between 59% and 80% and testing error by between 53% and 72%) and is the ideal approach for the modeling of physiological data, especially with small training data sets.
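
    The study's exact models and data are not reproduced here; this toy sketch only contrasts an RBF-kernel SVR with ordinary least squares on a synthetic, nonlinear workload/uptake relation (all values invented):

        import numpy as np
        from sklearn.svm import SVR
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error

        # synthetic stand-in for workload (W) vs. oxygen-uptake measurements
        rng = np.random.default_rng(2)
        workload = rng.uniform(25, 125, size=(80, 1))
        vo2 = 0.5 + 0.01 * workload[:, 0] ** 1.3 + 0.1 * rng.normal(size=80)

        svr = SVR(kernel="rbf", C=10.0).fit(workload, vo2)
        ols = LinearRegression().fit(workload, vo2)
        # in-sample errors, for illustration only
        print("SVR MSE:", mean_squared_error(vo2, svr.predict(workload)))
        print("OLS MSE:", mean_squared_error(vo2, ols.predict(workload)))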

  20. Regression Rate Study in HTPB/GOX Hybrid Rocket Motors.

    Philmon George


    The theoretical and experimental studies on hybrid rocket motor combustion research are briefly reviewed, and the need for a clear understanding of the hybrid rocket fuel regression rate mechanism is brought out. A test facility established at the Indian Institute of Technology, Madras, for hybrid rocket motor research is described. The results of an experimental study on a hydroxyl-terminated polybutadiene and gaseous oxygen hybrid rocket motor are presented. Fuel grains with ammonium perchlorate "additive" have shown enhanced oxidizer mass flux dependence. Smaller grains have higher regression rates than larger ones.

  1. An Approach to the Programming of Biased Regression Algorithms.


    Due to the near nonexistence of computer algorithms for calculating estimators and ancillary statistics that are needed for biased regression methodologies, many users of these methodologies are forced to write their own programs. Brute-force coding of such programs can result in a great waste of computer core and computing time, as well as inefficient and inaccurate computing techniques. This article proposes some guides to more efficient programming by taking advantage of mathematical similarities among several of the more popular biased regression estimators.
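
    One such mathematical similarity: several popular biased estimators (ridge regression being the prototype) share the closed form b(k) = (X'X + kI)⁻¹X'y, which can be computed as an augmented least-squares problem instead of by brute-force inversion. A minimal sketch of that idea (our own formulation, not the article's program):

        import numpy as np

        def ridge(X, y, k):
            # b(k) = (X'X + kI)^{-1} X'y, computed via an augmented least-squares
            # problem, which is numerically stabler than an explicit inverse
            p = X.shape[1]
            X_aug = np.vstack([X, np.sqrt(k) * np.eye(p)])
            y_aug = np.concatenate([y, np.zeros(p)])
            b, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
            return b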

  2. Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil

    Newton Carneiro Affonso da Costa Jr.


    This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative of regression analysis, for both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, as the validity of this methodology was found to differ between ratios.

  3. Time series analysis using semiparametric regression on oil palm production

    Yundari, Pasaribu, U. S.; Mukhaiyar, U.


    This paper presents a semiparametric kernel regression method, which has shown flexibility and ease of mathematical calculation, especially in estimating density and regression functions. A kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis: we first assume that the parameters exist, and then use nonparametric estimation for the remainder, an approach called semiparametric. The optimum bandwidth is selected by minimizing an approximation of the Mean Integrated Squared Error (MISE).
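
    A minimal sketch of the flavor of such an estimator (Nadaraya-Watson with a Gaussian kernel; the leave-one-out criterion below is a practical stand-in for the MISE approximation the paper uses, and all names are ours):

        import numpy as np

        def nw_regress(x_train, y_train, x_eval, h):
            # Nadaraya-Watson estimator with a Gaussian kernel and bandwidth h
            d = (x_eval[:, None] - x_train[None, :]) / h
            w = np.exp(-0.5 * d ** 2)
            return (w @ y_train) / w.sum(axis=1)

        def pick_bandwidth(x, y, grid):
            # leave-one-out CV error as a surrogate for minimizing MISE
            errors = []
            for h in grid:
                err = 0.0
                for i in range(len(x)):
                    mask = np.arange(len(x)) != i
                    err += (y[i] - nw_regress(x[mask], y[mask], x[i:i + 1], h)[0]) ** 2
                errors.append(err)
            return grid[int(np.argmin(errors))]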

  4. Geographically weighted regression and multicollinearity: dispelling the myth

    Fotheringham, A. Stewart; Oshan, Taylor M.


    Geographically weighted regression (GWR) extends the familiar regression framework by estimating a set of parameters for any number of locations within a study area, rather than producing a single parameter estimate for each relationship specified in the model. Recent literature has suggested that GWR is highly susceptible to the effects of multicollinearity between explanatory variables and has proposed a series of local measures of multicollinearity as an indicator of potential problems. In this paper, we employ a controlled simulation to demonstrate that GWR is in fact very robust to the effects of multicollinearity. Consequently, the contention that GWR is highly susceptible to multicollinearity issues needs rethinking.

  5. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    Henrard, S; Speybroeck, N; Hermans, C


    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regression, for binary outcomes, and multiple linear regression for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method, developed by Breiman in 1984, is non-parametric and non-linear, and is based on the repeated partitioning of a sample into subgroups according to a certain criterion. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using it specifically in haemophilia have been published to date. Two previously published examples using CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable.
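
    A sketch of a classification tree using scikit-learn's CART implementation on made-up haemophilia-like covariates (all variable names, thresholds, and data are invented for illustration):

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, export_text

        # illustrative data: age, clotting-factor level and BMI vs. a binary outcome
        rng = np.random.default_rng(3)
        X = np.column_stack([rng.integers(10, 70, 200),
                             rng.uniform(0.0, 40.0, 200),
                             rng.uniform(16.0, 35.0, 200)])
        y = (X[:, 1] < 5).astype(int)   # outcome driven by severe factor deficiency

        tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20).fit(X, y)
        print(export_text(tree, feature_names=["age", "factor_level", "bmi"]))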

  6. Do clinical and translational science graduate students understand linear regression? Development and early validation of the REGRESS quiz.

    Enders, Felicity


    Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 responded postcourse, and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse, a statistically significant improvement. The REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising, with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions.

  7. The application of Dynamic Linear Bayesian Models in hydrological forecasting: Varying Coefficient Regression and Discount Weighted Regression

    Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa


    A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.

  8. Estimation of reference evapotranspiration using multivariate fractional polynomial, Bayesian regression, and robust regression models in three arid environments

    Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad


    The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance, and hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models, including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression, in Ardestan, Esfahan, and Kashan. The results were compared with the Food and Agriculture Organization (FAO) Penman-Monteith (FAO-PM) method to select the best model. The results show that at a monthly scale, all models provided close agreement with the FAO-PM values (R² > 0.95 and RMSE < 12.07 mm month⁻¹). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.

  9. Robust regression for large-scale neuroimaging studies.

    Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand


    Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies.
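
    A hedged illustration of why robust estimators help with such cohorts (a toy sketch using scikit-learn's Huber regressor, not the authors' pipeline; data invented):

        import numpy as np
        from sklearn.linear_model import HuberRegressor, LinearRegression

        rng = np.random.default_rng(4)
        X = rng.normal(size=(300, 1))
        y = 2.0 * X[:, 0] + rng.normal(size=300)
        y[:10] += 25.0                              # artifact-like gross outliers

        print(LinearRegression().fit(X, y).coef_)   # slope biased by the outliers
        print(HuberRegressor().fit(X, y).coef_)     # stays near the true slope 2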

  10. Score normalization using logistic regression with expected parameters

    Aly, Robin


    State-of-the-art score normalization methods use generative models that rely on sometimes unrealistic assumptions. We propose a novel parameter estimation method for score normalization based on logistic regression. Experiments on the Gov2 and CluewebA collection indicate that our method is consiste

  11. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Pedro Henrique Melo Albuquerque

    This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower's geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated the viability of using the GWLR technique to develop credit scoring models for the target population in the study.

  12. Simulation Experiments in Practice : Statistical Design and Regression Analysis

    Kleijnen, J.P.C.


    In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. Statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic t

  13. Sample Size Requirements for Traditional and Regression-Based Norms

    Oosterhuis, H.E.M.; van der Ark, L.A.; Sijtsma, K.


    Test norms enable determining the position of an individual test taker in the group. The most frequently used approach to obtain test norms is traditional norming. Regression-based norming may be more efficient than traditional norming and is rapidly growing in popularity, but little is known about

  14. DYNA3D/ParaDyn Regression Test Suite Inventory

    Lin, Jerry I. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)


    The following table constitutes an initial assessment of feature coverage across the regression test suite used for DYNA3D and ParaDyn. It documents the regression test suite at the time of preliminary release 16.1 in September 2016. The columns of the table represent groupings of functionalities, e.g., material models. Each problem in the test suite is represented by a row in the table. All features exercised by the problem are denoted by a check mark (√) in the corresponding column. The definition of "feature" has not been subdivided to its smallest unit of user input, e.g., algorithmic parameters specific to a particular type of contact surface. This represents a judgment to give code developers and users a reasonable impression of feature coverage without expanding the width of the table by several multiples. All regression testing is run in parallel, typically with eight processors, except for problems involving features only available in serial mode. Many are strictly regression tests acting as a check that the codes continue to produce adequately repeatable results as development unfolds, compilers change, and platforms are replaced. A subset of the tests represents true verification problems that have been checked against analytical or other benchmark solutions. Users are welcome to submit documented problems for inclusion in the test suite, especially if they heavily exercise, and depend upon, features that are currently underrepresented.

  15. Enhance-Synergism and Suppression Effects in Multiple Regression

    Lipovetsky, Stan; Conklin, W. Michael


    Relations between pairwise correlations and the coefficient of multiple determination in regression analysis are considered. The conditions for the occurrence of enhance-synergism and suppression effects, when multiple determination exceeds the total of squared correlations of the dependent variable with the regressors, are discussed, as stated compactly below. It…
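
    In the two-regressor case the conditions discussed above have a compact statement. With $r_{y1}$ and $r_{y2}$ the correlations of the regressors with the dependent variable and $r_{12}$ their inter-correlation (standard textbook identities, not quoted from the article):

        R^2 = \frac{r_{y1}^2 + r_{y2}^2 - 2\, r_{y1} r_{y2} r_{12}}{1 - r_{12}^2}

    Enhance-synergism occurs when $R^2 > r_{y1}^2 + r_{y2}^2$. As a suppression example, a regressor with $r_{y2} = 0$ but $r_{12} \neq 0$ gives $R^2 = r_{y1}^2 / (1 - r_{12}^2) > r_{y1}^2$, so a regressor uncorrelated with the dependent variable still raises multiple determination.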

  16. Evaluation Applications of Regression Analysis with Time-Series Data.

    Veney, James E.


    The application of time series analysis is described, focusing on the use of regression analysis for analyzing time series in a way that may make it more readily available to an evaluation practice audience. Practical guidelines are suggested for decision makers in government, health, and social welfare agencies. (SLD)

  17. Modeling energy expenditure in children and adolescents using quantile regression

    Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). The study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...
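
    A brief sketch of quantile-dependent covariate effects using statsmodels' QuantReg (synthetic, heteroscedastic data, not the study's dataset; names are ours):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # synthetic stand-in: EE vs. body mass, with spread growing in mass
        rng = np.random.default_rng(5)
        mass = rng.uniform(30, 90, 500)
        ee = 500 + 15 * mass + 0.2 * mass * rng.normal(size=500)
        df = pd.DataFrame({"ee": ee, "mass": mass})

        # quantile-dependent covariate effects: one slope per quantile
        for q in (0.1, 0.5, 0.9):
            fit = smf.quantreg("ee ~ mass", df).fit(q=q)
            print(q, round(fit.params["mass"], 2))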

  18. Using regression models to determine the poroelastic properties of cartilage.

    Chung, Chen-Yuan; Mansour, Joseph M


    The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000 s) during relaxation. These equations illustrate the effect of each individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations), could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000 s are included in the set of equations. However, reasonable estimates of the out-of-plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000 s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800 s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds.
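
    A schematic of the inversion step the abstract describes: with fitted linear equations L = A p + c relating load intensity L at five time points to the five material properties p, the properties follow by solving the linear system (the coefficients below are random placeholders, not the paper's regression estimates):

        import numpy as np

        rng = np.random.default_rng(6)
        A = rng.normal(size=(5, 5))         # placeholder slopes from the regressions
        c = rng.normal(size=5)              # placeholder intercepts
        props_true = np.array([1.0, 0.8, 0.3, 0.05, 2.0])

        load = A @ props_true + c                    # load intensity at five times
        props_est = np.linalg.solve(A, load - c)     # invert the five equations
        print(np.allclose(props_est, props_true))    # True: properties recovered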

  19. Regression-based Multi-View Facial Expression Recognition

    Rudovic, Ognjen; Patras, Ioannis; Pantic, Maja


    We present a regression-based scheme for multi-view facial expression recognition based on 2-D geometric features. We address the problem by mapping facial points (e.g. mouth corners) from non-frontal to frontal view where further recognition of the expressions can be performed using a state-of-the-